class: center, middle, inverse, title-slide

.title[
# Non-linearity
]
.subtitle[
## EDUC 643: Unit 5 Part II
]
.author[
### David D. Liebowitz
]

---
# Roadmap

<img src="Roadmap5.jpg" width="90%" style="display: block; margin: auto;" />

---
# Goals of the unit

- Describe in writing and verbally the assumptions we violate when we fit a non-linear relationship with a linear model
- Transform non-linear relationships into linear ones by using logarithmic scales
- Estimate regression models using logarithmic scales and interpret the results
- Estimate models with quadratic and higher-order polynomial terms (special kinds of interactions)
- Select between transformation options

---
class: middle, inverse
# Non-linearity

---
# $ and learning

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---
# $ and learning

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
# $ and learning

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

--

.small[***If assumptions hold***, each $10,000 difference in total spending is associated, on average, with a 4.3 scale-score point difference in reading scores.]

--

.small[.blue[**But do they?**]]

---
# Linear?

```r
# Fit the model
fit <- lm(read_score ~ total_spending, data=pisa)

# Generate residual vs. fitted plot
pisa$resid <- resid(fit)
pisa$fitted <- fitted(fit)

ggplot(pisa, aes(fitted, resid)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red", linetype="dashed")
```

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---
<img src="pisa_spending.png" width="1048" style="display: block; margin: auto;" />

---
# Make it nice

--

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

--

At low levels of spending, the relationship between ***total_spending*** and ***read_score*** has a large magnitude. At higher levels of spending, it seems much more modest (negative?).
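---
# Eyeballing the changing slope

Before formalizing anything, a quick way to see the changing slope is to fit separate linear models on either side of a candidate cut point. A minimal sketch (the $50,000 cut is an arbitrary illustration, not a value from the analysis):

```r
# Split the data at an (arbitrary) spending cut point
lo <- subset(pisa, total_spending < 50000)
hi <- subset(pisa, total_spending >= 50000)

# Compare the slopes on either side of the cut
coef(lm(read_score ~ total_spending, data = lo))
coef(lm(read_score ~ total_spending, data = hi))
```

If the two slope estimates differ sharply, a single line is probably the wrong summary of the relationship.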
---
# Piecewise

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---
# Piecewise

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---
# Piecewise

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---
# Piecewise

While it is true, as we've said before, that .red-pink[*locally all relationships are linear*], we've identified some emerging issues:

.small[
- Cut points are arbitrary, and these choices may substantially alter the nature of the observed relationship
- With large data, "eyeballing" linear sub-segments is impossible
- Increasing loss of power (larger standard errors and confidence intervals, greater influence of outliers)
- .red-pink[**Overfitting**] risks increase
  + Analysis conforms too closely to your specific data, but generalizes poorly to the population of inference
]

--

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

--

**Solutions**: transformations and polynomials

---
class: middle, inverse
## Logarithmic transformations in X

---
# Log transformations

- We can posit a .red-pink[**non-linear relationship**] between X and Y *in the population*
- Any non-linear relationship implies that the relationship between X and Y is relative to a particular value of X and/or Y, not absolute (the slope is non-constant)
- .red-pink[**Transformations**] (i.e., spreading out the values of our X and Y variables in some cases and compressing them in others) allow us to fit non-linear relationships within the existing machinery of the general linear model

---
# Log transformations in life

.pull-left[
<img src="octave.png" width="224" style="display: block; margin: auto;" />
`\(\uparrow\)` .small[1 octave = doubling of cycles-per-second]
<img src="covid.png" width="660" style="display: block; margin: auto;" />
]
.pull-right[
<img src="seismograph.jpg" width="355" style="display: block; margin: auto;" />
.tiny2[
| Seismic-wave amplitude | Location             | Richter Scale |
|------------------------|----------------------|---------------|
| 1,000,000              | Christchurch, 2010   | 6.0           |
| 10,000,000             | Port-au-Prince, 2010 | 7.0           |
| 100,000,000            | Sichuan, 2008        | 8.0           |
| 1,000,000,000          | Sumatra, 2004        | 9.0           |
]
`\(\uparrow\)` .small[1 Richter = 10x] `\(\uparrow\)` .small[SWA]
]

---
# A log 🌳 you say??

Logs are the function we can perform to "undo" raising a number to a power. If a number is equal to a base raised to a power `\((x = base^{power})\)`, then the logarithm of `\(x\)` in a given base is the power to which you would have to raise that base to get `\(x\)`:

.pull-left[
**Exponents**

`\(10 = 10^1\)`

`\(100=10^2\)`

`\(1,000 = 10^3\)`

`\(10,000 = 10^4\)`

`\(100,000 = 10^5\)`
]
.pull-right[
**Logarithms**

`\(\text{log}_{10}(10)=1\)`

`\(\text{log}_{10}(100)=2\)`

`\(\text{log}_{10}(1,000)=3\)`

`\(\text{log}_{10}(10,000)=4\)`

`\(\text{log}_{10}(100,000)=5\)`
]

--

Each 1-unit increase in a base-10 logarithm represents a 10-fold increase in `\(x\)`.

--

Can have logarithms of different bases.
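---
# Logs in R

We can check the tables above directly in R, since `log10()`, `log2()`, and `log()` (the natural log) are all base-R functions. A quick sketch:

```r
log10(10000)  # 4: each ten-fold step in x moves log10(x) by 1
log2(32)      # 5: each doubling of x moves log2(x) by 1
log(1)        # 0 in any base
log(0)        # -Inf: zero is outside the domain
log(-5)       # NaN (with a warning): so are negatives
```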
---
# A log 🌲 you say??

Logs are the function we can perform to "undo" raising a number to a power. If a number is equal to a base raised to a power `\((x = base^{power})\)`, then the logarithm of `\(x\)` in a given base is the power to which you would have to raise that base to get `\(x\)`:

.pull-left[
**Exponents**

`\(2 = 2^1\)`

`\(4 = 2^2\)`

`\(8 = 2^3\)`

`\(16 = 2^4\)`

`\(32 = 2^5\)`
]
.pull-right[
**Logarithms**

`\(\text{log}_{2}(2)=1\)`

`\(\text{log}_{2}(4)=2\)`

`\(\text{log}_{2}(8)=3\)`

`\(\text{log}_{2}(16)=4\)`

`\(\text{log}_{2}(32)=5\)`
]

--

Each 1-unit increase in a base-2 logarithm represents a doubling of `\(x\)`.

--

Can say this as: "Log base 2 of 32 is 5" or "Log base 10 of 1,000 is 3"

---
# Understanding logs

<img src="log_scale.png" width="940" style="display: block; margin: auto;" />

--

**Some key concepts:**

- Taking logs spreads out the distance between small (closer to 0) values and compresses the distance between large (further from zero) values
- Log base anything of 1 is 0
- Log base anything of 0 is undefined (can't raise anything to a power and get 0)
- Log base anything of a negative number is undefined (technically a complex number)
- Taking logs is a .red-pink[**monotonic**] transformation; it doesn't change the order of any of the underlying raw values

---
# $ and scores?

Let's try transforming our X variable (*total_spending*) on a logarithmic scale; can do this directly in our plot:

```r
log_flag <- flag +
  xlab("Total spending, age 6-15 (Log10 $)") +
  scale_x_log10(breaks=c(10000, 50000, 100000, 300000),
                label=scales::comma)
```

---
# $ and scores?

Let's try transforming our X variable (*total_spending*) on a logarithmic scale; can do this directly in our plot:

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---
# $ and scores?

Let's try transforming our X variable (*total_spending*) on a logarithmic scale; can do this directly in our plot:

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---
## Regress read on `\(\text{log}_{10}(spend)\)`

```r
summary(lm(read_score ~ log10(total_spending), data=pisa))
```

```
## 
## Call:
## lm(formula = read_score ~ log10(total_spending), data = pisa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -136.50  -20.83   11.00   22.42   59.11 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -78.03      69.14  -1.129    0.263    
## log10(total_spending)   112.74      14.46   7.798 8.06e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35.59 on 63 degrees of freedom
## Multiple R-squared:  0.4911, Adjusted R-squared:  0.4831 
## F-statistic:  60.8 on 1 and 63 DF,  p-value: 8.062e-11
```
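---
# Checking the interpretation

To convince ourselves that this slope means "112.7 points per ten-fold spending difference," we can generate predictions one order of magnitude apart and difference them. A minimal sketch, re-using the model from the previous slide:

```r
fit_log <- lm(read_score ~ log10(total_spending), data = pisa)

# Predicted reading scores at $10,000 and $100,000 of spending
preds <- predict(fit_log,
                 newdata = data.frame(total_spending = c(10000, 100000)))
diff(preds)  # ~112.7, the log10 slope
```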
---
# Conceptually

.pull-left[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

`$$\hat{READ}_j = 428+0.00043 \times SPEND_j$$`
]
.pull-right[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

`$$\hat{READ}_j = -78.03 + 112.74 \times \text{log}_{10}(SPEND_j)$$`
]

--

- In ed/dev psych this kind of curve is typically called a "learning curve"; represents a standard rate of learning
- More broadly, "increasing exponential decay" or "diminishing marginal returns"

---
# Interpret

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

**Some alternative ways to describe this relationship:**

.tiny[
- Average reading scores in the population of countries sitting for the 2018 PISA reading test were 112.7 points higher for every ten-fold increase in cumulative educational spending on children aged 6-15.
- As cumulative education spending on children aged 6-15 is ten times higher, reading scores in the population of countries sitting for the 2018 PISA reading test were 112.7 points higher, on average.
- We predict that two countries that spend an order of magnitude (e.g., $10,000 vs. $100,000) apart on cumulative educational expenditures on children aged 6-15 will have PISA reading scores 112.7 points apart.
]

---
class: middle, inverse
# Log transformations in Y
## aka Exponential growth curve

---
# GDP and PPE

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---
# GDP and PPE

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

---
# An alternative model

The relationship of GDP to PPE is relative to their respective values. The relationship has a smaller magnitude when GDP per capita is smaller and a larger magnitude when GDP per capita is larger.

--

Can use a log transformation to capture the non-absolute (non-constant) nature of the slope:

`$$PPE_j = \beta_0 * 2^{(\beta_1 GDP_j + \varepsilon)}$$`

`$$\text{log}_2(PPE_j) = \text{log}_2 \beta_0 + \beta_1 GDP_j + \varepsilon$$`

---
# Interpreting this

Can interpret log outcomes as percent changes because:

`$$Y_1 = \beta_0 2^{\beta_1 X}$$`

`$$Y_2 = \beta_0 2^{\beta_1(X+1)} = \beta_0 2^{\beta_1 X} 2^{\beta_1}$$`

`$$\frac{Y_2}{Y_1} = \frac{\beta_0 2^{\beta_1 X} 2^{\beta_1}}{\beta_0 2^{\beta_1 X}} = 2^{\beta_1}$$`

So, `\(Y_2\)` is `\(2^{\beta_1}\)` times larger than `\(Y_1\)`!

--

Depends on key properties of logs:

- `\(\text{log}(xy) = \text{log}(x) + \text{log}(y)\)`
- `\(\text{log}(x^p) = p \cdot \text{log}(x)\)`

--

.red-pink[**Percent growth rate**] = `\(\large 100*(2^{\beta_1} - 1)\)`

.small[Regress log(Y) on X and substitute the estimated slope into the equation for the percent growth rate to obtain the estimated percent growth rate per unit change in X.]

`\(Y_2 = 2^{\beta_1} Y_1\)` is the same thing as saying the percent growth rate is `\(100*(2^{\beta_1} - 1)\)`
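---
# Growth rate as a function

The percent-growth-rate formula is easy to wrap in a helper so we aren't re-deriving it each time. A small sketch (the function name is ours, not from any package):

```r
# Percent difference in Y per 1-unit difference in X,
# from a regression of log2(Y) on X
pct_growth <- function(b1, base = 2) 100 * (base^b1 - 1)

pct_growth(1)    # 100: a slope of 1 means Y doubles per unit of X
pct_growth(0.5)  # ~41.4
```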
---
# Visualized Y transformation

```r
oecd$log2ppe <- log2(oecd$ppe)

log_ppe <- ggplot(oecd, aes(x=gdp, y=log2ppe))
```

---
# Visualized Y transformation

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

---
## Regress `\(\text{log}_{2}(ppe)\)` on gdp

```r
summary(lm(log2(ppe) ~ gdp, oecd))
```

```
...
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39728 -0.09378  0.01867  0.11920  0.31357 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.176e+01  1.113e-01   105.7   <2e-16 ***
## gdp         3.899e-05  2.484e-06    15.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1712 on 32 degrees of freedom
## Multiple R-squared:  0.8851, Adjusted R-squared:  0.8815 
## F-statistic: 246.5 on 1 and 32 DF,  p-value: < 2.2e-16
...
```

--

**Percent growth rate**: `\(100(2^{0.000039} - 1) = 0.0027\%\)`; for each $1 more of GDP per person, PPE is 0.0027% higher; or, for each $1,000 more of GDP per person, PPE is 2.7% higher

---
# Interpreting log Y results

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

`$$\text{log}_2 (\hat{PPE}_j) = 11.8 + 0.000039 * GDP_j$$`

> Per capita gross domestic product (GDP) is a strong predictor of yearly per-student expenditure from primary through tertiary education. In particular, if we compare two countries whose GDPs differ by $1,000, we would predict that the wealthier country would have per-pupil expenditure that is 2.7 ***percent*** higher than the country with the smaller economy.

---
class: middle, inverse
# Log-log transformations
## aka proportional growth

---
# Which 🌲 to harvest?

- Could theoretically select a log of any base to transform outcome or predictor or both to a linear relationship
- Much more sensible to restrict yourself to base 10, base 2, or the .red-pink[**natural log**]; comes from Euler's number `\((e)\)`

`$$e = \lim_{n \to \infty}(1 + \frac{1}{n})^n \approx 2.718281828459...$$`

- .red-pink[**Natural log**]: `\(\text{log}_{2.718...}(x) = \text{log}_e(x) = \text{ln}(x)\)`

---
# All the countries

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />

---
# Log-log transformations

```r
oecd2$lngdp <- log(oecd2$gdp)
oecd2$lnppe <- log(oecd2$ppe)

ln_ppe <- ggplot(oecd2, aes(x=lngdp, y=lnppe))
```

---
# Log-log transformations

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" />

---
## Regress `\(\text{ln}(ppe)\)` on `\(\text{ln}(gdp)\)`

```r
summary(lm(log(ppe) ~ log(gdp), oecd2))
```

```
...
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43570 -0.04076  0.01302  0.07489  0.26542 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.39273    0.72674   -0.54    0.592    
## log(gdp)     0.91274    0.06801   13.42 3.83e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1509 on 34 degrees of freedom
## Multiple R-squared:  0.8412, Adjusted R-squared:  0.8365 
## F-statistic: 180.1 on 1 and 34 DF,  p-value: 3.826e-15
...
```

--

`$$\hat{LnPPE}_j = -0.39 + 0.91 * LnGDP_j$$`
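---
# Elasticity, numerically

We can sanity-check the "percent-for-percent" reading of the log-log slope with simple arithmetic. A sketch using the estimate above:

```r
b1 <- 0.91274

# Ratio of predicted PPE for two countries whose GDPs differ by 1%
1.01^b1               # ~1.0091

# The exact percent difference implied by the slope
100 * (1.01^b1 - 1)   # ~0.91%
```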
---
# Interpreting this

Can interpret log-log relationships in percent terms. `\(\color{red}{\beta_1}\)` .red-pink[**represents the % change in Y per 1% change in X.**]

.pull-left[
.small[**Postulated model:**]

- `\(Y = \beta_0 X^{\beta_1}e^{\varepsilon}\)`
- `\(\text{ln}(Y) = \text{ln}(\beta_0 X^{\beta_1}e^{\varepsilon})\)`
- `\(\text{ln}(Y) = \text{ln}(\beta_0) + \text{ln}(X^{\beta_1}) + \text{ln}(e^{\varepsilon})\)`
- `\(\text{ln}(Y) = \text{ln}(\beta_0) + \beta_1 \text{ln}(X) + \varepsilon\)`
]
.pull-right[
.small[**Imagine**] `\(Y_1\)` .small[**and**] `\(Y_2\)` .small[**are 1% (or 0.01) apart**:]

- `\(Y_1 = \beta_0 X^{\beta_1}\)`
- `\(Y_2 = \beta_0(1.01X)^{\beta_1} = \beta_0 X^{\beta_1}(1.01)^{\beta_1}\)`
- `\(\frac{Y_2}{Y_1} = \frac{\beta_0 X^{\beta_1}(1.01)^{\beta_1}}{\beta_0 X^{\beta_1}} = (1.01)^{\beta_1}\)`

So `\(Y_2\)` is `\((1.01)^{\beta_1}\)` times larger than `\(Y_1\)`
]

--

Regress ln(Y) on ln(X), and the slope estimate is the estimated percent difference in Y per 1 percent difference in X

---
# Interpret log-log relationship

```r
summary(lm(log(ppe) ~ log(gdp), oecd2))
```

```
...
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43570 -0.04076  0.01302  0.07489  0.26542 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.39273    0.72674   -0.54    0.592    
## log(gdp)     0.91274    0.06801   13.42 3.83e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1509 on 34 degrees of freedom
## Multiple R-squared:  0.8412, Adjusted R-squared:  0.8365 
## F-statistic: 180.1 on 1 and 34 DF,  p-value: 3.826e-15
...
```

--

"A 1 percent change in GDP predicts a 0.91 percent change in PPE"

---
# Interpret log-log relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" />

`$$\text{ln}(\hat{PPE}_j) = -0.39 + 0.91 \times \text{ln}(GDP_j)$$`

> We predict that, on average, comparing two countries with GDP per capita separated by 1 percent, the wealthier country will spend 0.91 percent more on its pupils across primary through tertiary education.

---
## "Forbidden" log transformations

So far, we've been dealing with situations in which all the variables we needed to transform were non-zero. In fact, this is often not the case:

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" />

--

Many other instances: counts of behaviors, individual income, absences, scale scores, etc.
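---
# What zeros do to logs

The zero problem is easy to demonstrate: R returns `-Inf` for `log(0)`, which `lm()` cannot use, so zero cases drop out of (or break) the model. A toy sketch with made-up values:

```r
absences <- c(0, 0, 2, 5, 11)

log(absences)
#> [1]      -Inf      -Inf 0.6931472 1.6094379 2.3978953

# Only the non-zero cases survive a log transformation
sum(is.finite(log(absences)))  # 3 of 5 observations
```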
---
## "Forbidden" log transformations

Traditional approach:

- Add a small "starter" value to all raw values (+1, +0.1, +0.01, +0.001, etc.)
- Take log of this .red-pink[**"zero-inflated"**] variable

.red[**DO NOT DO THIS!!!**]

- The value selected for the starter and the proportion of 0s in your data can result in wildly inconsistent coefficient estimates
- You'll address this issue in EDUC 645 with Poisson regression
- Can also (potentially) be addressed with an inverse hyperbolic sine transformation

---
.pull-left[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" />
.small[
- Regress Y on log(X)
- `\(Y = \hat{\beta_0} + \hat{\beta_1}\text{log}(X)\)`
- "Every doubling (or whatever base) of X associated with `\(\hat{\beta_1}\)` diff in Y"
]
]
.pull-right[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />
.small[
- Regress log(Y) on X
- `\(\text{log}(Y) = \hat{\beta_0} + \hat{\beta}_1 X\)`
- Every 1-unit diff in X associated with `\(100(e^{\hat{\beta_1}} - 1)\)` % diff in Y
]
]

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" />
.small[
- Regress log(Y) on log(X)
- `\(\text{log}(Y) = \hat{\beta_0} + \hat{\beta_1}\text{log}(X)\)`
- Every 1% diff in X associated with `\(\hat{\beta_1}\)` percent diff in Y
]

---
class: middle, inverse
# Quadratic terms: a special kind of interaction

---
# Quadratic model

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" />

The effect of a predictor can differ by the value of that same predictor:

`$$Y = \beta_0 + \beta_1 X_1 + \beta_2 (X_1 * X_1) + \varepsilon$$`

`$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \varepsilon$$`

--

Can point upwards or downwards, but **all quadratic relationships are .red-pink[non-monotonic]; the relationship both rises and falls (or falls and rises)**

---
# A quadratic relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-41-1.png" style="display: block; margin: auto;" />

--

.blue[**Which direction will the quadratic line of best fit point?**]

---
# A quadratic relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-42-1.png" style="display: block; margin: auto;" />

---
# A quadratic relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" />

--

We can represent quadratic fits mathematically in generic form: `\(y = \beta_0 + \beta_1 x + \beta_2 x^2\)`.

--

.blue[**Challenge: what signs will each of the three coefficients take for the above relationship?**]

---
# Fitting the quadratic

```r
summary(lm(read_score ~ total_spending + I(total_spending^2), pisa))
```

```
...
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.511 -15.722   3.806  22.651  59.394 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.728e+02  9.665e+00  38.574  < 2e-16 ***
## total_spending       1.750e-03  1.798e-04   9.732 4.22e-14 ***
## I(total_spending^2) -5.260e-09  6.498e-10  -8.096 2.70e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 31.34 on 62 degrees of freedom
## Multiple R-squared:  0.6117, Adjusted R-squared:  0.5992 
## F-statistic: 48.84 on 2 and 62 DF,  p-value: 1.834e-13
...
```

--

Fitted equation: `\(\hat{read} = 372.8 + 0.00175 * spend - 0.00000000526 * spend^2\)`.

--

.blue[**How do our model fit statistics compare to the linear version?**]
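---
# Where does the parabola turn?

A quadratic fit implies a turning point at `\(-\hat{\beta_1}/(2\hat{\beta_2})\)`, which is worth computing before trusting the shape. A sketch pulling the coefficients from the model above:

```r
fit_quad <- lm(read_score ~ total_spending + I(total_spending^2), data = pisa)
b <- coef(fit_quad)

# Spending level at which predicted scores stop rising
-b["total_spending"] / (2 * b["I(total_spending^2)"])
#> ~166,000 (spending dollars)
```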
---
# The "right" fit to data

.pull-left[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" /><img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-45-2.png" style="display: block; margin: auto;" />
]
.small[
.pull-right[
- A declining relationship between spending and performance doesn't make much substantive sense, so we would probably not use a quadratic fit for our full data
- However, without Qatar and Luxembourg, a quadratic describes the relationship quite nicely

<br>
<br>

- Don't extrapolate the shape of the parabola to the left of the y-axis
- Shouldn't assume the y values will be higher to the left of the y-axis
]
]

---
class: middle, inverse
# Higher-order polynomials

---
# Cubics

.small[We needn't restrict ourselves to quadratic relationships when we transform our data. Many relationships, for example, are cubic (third-power) in nature. This is particularly true when there are measurement issues in the tails and/or floor/ceiling effects.]

.pull-left[
**Strong cubic**
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" />
]

--

.pull-right[
**Our DIBELS data**
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-47-1.png" style="display: block; margin: auto;" />
]

--

`$$\hat{W20\_ORF} = 2.81 + 1.47*F19\_ORF - 0.0010*F19\_ORF^2 - 0.000017 * F19\_ORF^3$$`
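---
# Fitting a cubic in R

Higher-order terms work just like the quadratic: add `I(x^2)` and `I(x^3)` terms, or use `poly()`. A sketch (the data frame `dibels` and the lower-case column names are our assumptions; adjust to your data):

```r
# Two equivalent ways to fit a third-order polynomial
fit_cubic  <- lm(w20_orf ~ f19_orf + I(f19_orf^2) + I(f19_orf^3), data = dibels)
fit_cubic2 <- lm(w20_orf ~ poly(f19_orf, 3, raw = TRUE), data = dibels)

summary(fit_cubic)
```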
---
# Other approaches

.pull-left[
.small[
There are an infinite number of potentially effective transformations:

- Squares, cubes, quartics, quintics, ...
- Square roots, cube roots, fourth roots, ...
- Logarithms (of any base), antilogarithms
- Inverses
- Trigonometric functions
- Hyperbolic functions
- Combinations of above...
]
]
.pull-right[
.small[
Approaches to achieve local linearity:

- Splines
- Locally estimated scatterplot smoothing (LOESS)
]
]

--

**Some emerging issues:**

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" />

---
class: middle, inverse
# Synthesis and wrap-up

---
# Different approaches

.pull-left[
**Empirical approach**
.small[
- Notice presence of non-linearity in relationship
- Find an *ad-hoc* transformation of either the predictor, the outcome, or both that renders the relationship linear
- Use OLS in the transformed world, and conduct inference there
- De-transform fitted model to produce sensible plots
]
]

--

.pull-right[
**Theory-driven approach**
.small[
- Use theory or knowledge from prior research to postulate a non-linear model
- Use non-linear regression (`nls` or other estimation packages) (part of the .red-pink[**Generalized Linear Model**] family) to fit the postulated trend in the real world and conduct inference there
- Interpret parameter estimates directly
- **We are not learning how to do this, but worth exploring yourself**
]
]

---
# The Ladder and the Bulge

.pull-left[
.red-pink[**Tukey's Ladder**]
<img src="tukey_ladder.png" width="609" style="display: block; margin: auto;" />
]
.pull-right[
.red-pink[**Tukey's Bulge**]
<img src="tukey_bulge.png" width="733" style="display: block; margin: auto;" />
]

---
## Putting non-linearity together

.small[
- **Remember to check your linearity assumption**
  + Use bivariate scatter plots
  + Use residual and Q-Q plots to diagnose
- **Make sensible transformations**
  + Logarithmic, inverse, root and other functions can allow a return to a world of linearity and permit you to use the GLM tools of OLS to estimate non-linear relationships
  + Best to use transformations that are the most straightforward to interpret
  + Use Tukey's Bulge to guide what kind of transformation you will attempt
  + There is no one "right" transformation for a given data shape
  + Start with transforming x before y
  + Generally, do **not** use a "start" to log transform data that includes 0s
  + Inspect scatter plots post-transformation to check for success in linearizing (see the sketch following this slide)
    - With large data, can be hard to see; consider binscatter options (by hand or `binsreg`; more on this in our next unit)
- **Predictors can interact with themselves**
  + Quadratic and cubic models provide a flexible strategy for fitting non-linear models, especially those that cannot be linearized by logarithms
  + Be careful about overfitting and model instability with polynomials of order >3!
  + Quadratics and logs will often produce [similar fitted lines](https://daviddliebowitz.github.io/EDUC643_23W/slides/EDUC643_16_nonlinearity.html#95); a quadratic allows a direct statistical test for non-linearity, while a logarithm may fit with theory better and/or be more readily interpretable
]
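---
# A post-transformation check

A minimal sketch of the "inspect post-transformation" step from the checklist: re-plot residuals against fitted values before and after transforming, and look for the curvature to disappear.

```r
fit_raw <- lm(read_score ~ total_spending, data = pisa)
fit_log <- lm(read_score ~ log10(total_spending), data = pisa)

# Side-by-side residual vs. fitted plots (base graphics for brevity)
par(mfrow = c(1, 2))
plot(fitted(fit_raw), resid(fit_raw), main = "Raw X")
abline(h = 0, lty = 2)
plot(fitted(fit_log), resid(fit_log), main = "log10(X)")
abline(h = 0, lty = 2)
```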
---
# Goals of the unit

- Describe in writing and verbally the assumptions we violate when we fit a non-linear relationship with a linear model
- Transform non-linear relationships into linear ones by using logarithmic scales
- Estimate regression models using logarithmic scales and interpret the results
- Estimate and interpret models with quadratic and higher-order polynomial terms (special kinds of interactions)
- Select between transformation options

---
# To-Dos

### Reading:
- **By 3/6 class**: McIntosh et al. (2021) and discussion questions

### Assignment 4:
- Due March 10, 12:01p

### Final
- Due March 20, 12:01p

### Re- (late) submissions
- Everything due March 14, 5:00p (no exceptions)
- Assignments with scores <90% only
- Earn up to 90%

---
# Log vs. quadratic

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-51-1.png" style="display: block; margin: auto;" />
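---
# Reproducing the comparison

The figure above can be approximated by overlaying both fits on one scatter; a sketch, assuming the `pisa` data from earlier in the deck:

```r
library(ggplot2)

ggplot(pisa, aes(x = total_spending, y = read_score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ log10(x),
              se = FALSE, color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2),
              se = FALSE, color = "red") +
  labs(x = "Total spending, age 6-15 ($)", y = "PISA reading score")
```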