class: center, middle, inverse, title-slide

.title[
# Non-linearity
]
.subtitle[
## EDUC 643: Unit 5 Part II
]
.author[
### David D. Liebowitz
]

---
# Roadmap

<img src="Roadmap5.jpg" width="90%" style="display: block; margin: auto;" />

---
# Goals of the unit

- Describe in writing and verbally the assumptions we violate when we fit a non-linear relationship with a linear model
- Transform non-linear relationships into linear ones by using logarithmic scales
- Estimate regression models using logarithmic scales and interpret the results
- Estimate models with quadratic and higher-order polynomial terms (special kinds of interactions)
- Select between transformation options

---
class: middle, inverse
# Non-linearity

---
# $ and learning

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---
# $ and learning

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
# $ and learning

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

--

.small[***If assumptions hold***, each $10,000 difference in total spending is associated, on average, with a 4.3 scale-score point difference in reading scores.]

--

.small[.blue[**But do they?**]]

---
# Linear?

```r
# Fit the model
fit <- lm(read_score ~ total_spending, data=pisa)

# Generate residual vs. fitted plot
pisa$resid <- resid(fit)
pisa$fitted <- fitted(fit)

ggplot(pisa, aes(fitted, resid)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red", linetype="dashed")
```

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---
<img src="pisa_spending.png" width="1048" style="display: block; margin: auto;" />

---
# Make it nice

--

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

--

At low levels of spending, the relationship between ***total_spending*** and ***read_score*** has a large magnitude. At higher levels of spending, it seems much more modest (negative?).
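---
# Eyeballing the changing slope

Before formalizing anything, a quick way to see the changing slope is to fit separate linear models on either side of a candidate cut point. A minimal sketch (the $50,000 cut is an arbitrary illustration, not a value from the analysis):

```r
# Split the data at an (arbitrary) spending cut point
lo <- subset(pisa, total_spending < 50000)
hi <- subset(pisa, total_spending >= 50000)

# Compare the slopes on either side of the cut
coef(lm(read_score ~ total_spending, data = lo))
coef(lm(read_score ~ total_spending, data = hi))
```

If the two slope estimates differ sharply, a single line is probably the wrong summary of the relationship.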
---
# Piecewise

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---
# Piecewise

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---
# Piecewise

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---
# Piecewise

While it is true, as we've said before, that .red-pink[*locally all relationships are linear*], we've identified some emerging issues:

.small[
- Cut points are arbitrary, and these choices may substantially alter the nature of the observed relationship
- With large data, "eyeballing" linear sub-segments is impossible
- Increasing loss of power (larger standard errors and confidence intervals, greater influence of outliers)
- .red-pink[**Overfitting**] risks increase
  + Analysis conforms too closely to your specific data, but generalizes poorly to the population of inference
]

--

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

--

**Solutions**: transformations and polynomials

---
class: middle, inverse
## Logarithmic transformations in X

---
# Log transformations

- We can posit a .red-pink[**non-linear relationship**] between X and Y *in the population*
- Any non-linear relationship implies that the relationship between X and Y is relative to a particular value of X and/or Y, not absolute (the slope is non-constant)
- .red-pink[**Transformations**] (i.e., spreading out the values of our X and Y variables in some cases and compressing them in others) allow us to fit non-linear relationships within the existing machinery of the general linear model

---
# Log transformations in life

.pull-left[
<img src="octave.png" width="224" style="display: block; margin: auto;" />
`\(\uparrow\)` .small[1 octave = doubling of cycles-per-second]
<img src="covid.png" width="660" style="display: block; margin: auto;" />
]
.pull-right[
<img src="seismograph.jpg" width="355" style="display: block; margin: auto;" />
.tiny2[
| Seismic-wave amplitude | Location             | Richter Scale |
|------------------------|----------------------|---------------|
| 1,000,000              | Christchurch, 2010   | 6.0           |
| 10,000,000             | Port-au-Prince, 2010 | 7.0           |
| 100,000,000            | Sichuan, 2008        | 8.0           |
| 1,000,000,000          | Sumatra, 2004        | 9.0           |
]
`\(\uparrow\)` .small[1 Richter = 10x] `\(\uparrow\)` .small[SWA]
]

---
# A log 🌳 you say??

Logs are the function we can perform to "undo" raising a number to a power. If a number is equal to a base raised to a power `\((x = base^{power})\)`, then the logarithm of `\(x\)` in a given base is the power to which you would have to raise that base to get `\(x\)`:

.pull-left[
**Exponents**

`\(10 = 10^1\)`

`\(100=10^2\)`

`\(1,000 = 10^3\)`

`\(10,000 = 10^4\)`

`\(100,000 = 10^5\)`
]
.pull-right[
**Logarithms**

`\(\text{log}_{10}(10)=1\)`

`\(\text{log}_{10}(100)=2\)`

`\(\text{log}_{10}(1,000)=3\)`

`\(\text{log}_{10}(10,000)=4\)`

`\(\text{log}_{10}(100,000)=5\)`
]

--

Each 1-unit increase in a base-10 logarithm represents a 10-fold increase in `\(x\)`.

--

Can have logarithms of different bases.
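---
# Logs in R

We can check the tables above directly in R, since `log10()`, `log2()`, and `log()` (the natural log) are all base-R functions. A quick sketch:

```r
log10(10000)  # 4: each ten-fold step in x moves log10(x) by 1
log2(32)      # 5: each doubling of x moves log2(x) by 1
log(1)        # 0 in any base
log(0)        # -Inf: zero is outside the domain
log(-5)       # NaN (with a warning): so are negatives
```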
---
# A log 🌲 you say??

Logs are the function we can perform to "undo" raising a number to a power. If a number is equal to a base raised to a power `\((x = base^{power})\)`, then the logarithm of `\(x\)` in a given base is the power to which you would have to raise that base to get `\(x\)`:

.pull-left[
**Exponents**

`\(2 = 2^1\)`

`\(4 = 2^2\)`

`\(8 = 2^3\)`

`\(16 = 2^4\)`

`\(32 = 2^5\)`
]
.pull-right[
**Logarithms**

`\(\text{log}_{2}(2)=1\)`

`\(\text{log}_{2}(4)=2\)`

`\(\text{log}_{2}(8)=3\)`

`\(\text{log}_{2}(16)=4\)`

`\(\text{log}_{2}(32)=5\)`
]

--

Each 1-unit increase in a base-2 logarithm represents a doubling of `\(x\)`.

--

Can say this as: "Log base 2 of 32 is 5" or "Log base 10 of 1,000 is 3"

---
# Understanding logs

<img src="log_scale.png" width="940" style="display: block; margin: auto;" />

--

**Some key concepts:**

- Taking logs spreads out the distance between small (closer to 0) values and compresses the distance between large (further from zero) values
- Log base anything of 1 is 0
- Log base anything of 0 is undefined (can't raise anything to a power and get 0)
- Log base anything of a negative number is undefined (technically a complex number)
- Taking logs is a .red-pink[**monotonic**] transformation; it doesn't change the order of any of the underlying raw values

---
# $ and scores?

Let's try transforming our X variable (*total_spending*) on a logarithmic scale; can do this directly in our plot:

```r
log_flag <- flag +
  xlab("Total spending, age 6-15 (Log10 $)") +
  scale_x_log10(breaks=c(10000, 50000, 100000, 300000),
                label=scales::comma)
```

---
# $ and scores?

Let's try transforming our X variable (*total_spending*) on a logarithmic scale; can do this directly in our plot:

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---
# $ and scores?

Let's try transforming our X variable (*total_spending*) on a logarithmic scale; can do this directly in our plot:

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---
## Regress read on `\(\text{log}_{10}(spend)\)`

```r
summary(lm(read_score ~ log10(total_spending), data=pisa))
```

```
## 
## Call:
## lm(formula = read_score ~ log10(total_spending), data = pisa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -136.50  -20.83   11.00   22.42   59.11 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -78.03      69.14  -1.129    0.263    
## log10(total_spending)   112.74      14.46   7.798 8.06e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35.59 on 63 degrees of freedom
## Multiple R-squared:  0.4911, Adjusted R-squared:  0.4831 
## F-statistic:  60.8 on 1 and 63 DF,  p-value: 8.062e-11
```
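---
# Checking the interpretation

To convince ourselves that this slope means "112.7 points per ten-fold spending difference," we can generate predictions one order of magnitude apart and difference them. A minimal sketch, re-using the model from the previous slide:

```r
fit_log <- lm(read_score ~ log10(total_spending), data = pisa)

# Predicted reading scores at $10,000 and $100,000 of spending
preds <- predict(fit_log,
                 newdata = data.frame(total_spending = c(10000, 100000)))
diff(preds)  # ~112.7, the log10 slope
```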
---
# Conceptually

.pull-left[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

`$$\hat{READ}_j = 428+0.00043 \times SPEND_j$$`
]
.pull-right[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

`$$\hat{READ}_j = -78.03 + 112.74 \times \text{log}_{10}(SPEND_j)$$`
]

--

- In ed/dev psych this kind of curve is typically called a "learning curve"; represents a standard rate of learning
- More broadly, "increasing exponential decay" or "diminishing marginal returns"

---
# Interpret

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

**Some alternative ways to describe this relationship:**

.tiny[
- Average reading scores in the population of countries sitting for the 2018 PISA reading test were 112.7 points higher for every ten-fold increase in cumulative educational spending on children aged 6-15.
- As cumulative education spending on children aged 6-15 is ten times higher, reading scores in the population of countries sitting for the 2018 PISA reading test were 112.7 points higher, on average.
- We predict that two countries that spend an order of magnitude (e.g., $10,000 vs. $100,000) apart on cumulative educational expenditures on children aged 6-15 will have PISA reading scores 112.7 points apart.
]

---
class: middle, inverse
# Log transformations in Y
## aka Exponential growth curve

---
# GDP and PPE

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---
# GDP and PPE

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

---
# An alternative model

The relationship of GDP to PPE is relative to their respective values. The relationship has a smaller magnitude when GDP per capita is smaller and a larger magnitude when GDP per capita is larger.

--

Can use a log transformation to capture the non-absolute (non-constant) nature of the slope:

`$$PPE_j = \beta_0 * 2^{(\beta_1 GDP_j + \varepsilon)}$$`

`$$\text{log}_2(PPE_j) = \text{log}_2 \beta_0 + \beta_1 GDP_j + \varepsilon$$`

---
# Interpreting this

Can interpret log outcomes as percent changes because:

`$$Y_1 = \beta_0 2^{\beta_1 X}$$`

`$$Y_2 = \beta_0 2^{\beta_1(X+1)} = \beta_0 2^{\beta_1 X} 2^{\beta_1}$$`

`$$\frac{Y_2}{Y_1} = \frac{\beta_0 2^{\beta_1 X} 2^{\beta_1}}{\beta_0 2^{\beta_1 X}} = 2^{\beta_1}$$`

So, `\(Y_2\)` is `\(2^{\beta_1}\)` times larger than `\(Y_1\)`!

--

Depends on key properties of logs:

- `\(\text{log}(xy) = \text{log}(x) + \text{log}(y)\)`
- `\(\text{log}(x^p) = p \cdot \text{log}(x)\)`

--

.red-pink[**Percent growth rate**] = `\(\large 100*(2^{\beta_1} - 1)\)`

.small[Regress log(Y) on X and substitute the estimated slope into the equation for the percent growth rate to obtain the estimated percent growth rate per unit change in X.]

`\(Y_2 = 2^{\beta_1} Y_1\)` is the same thing as saying the percent growth rate is `\(100*(2^{\beta_1} - 1)\)`
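---
# Growth rate as a function

The percent-growth-rate formula is easy to wrap in a helper so we aren't re-deriving it each time. A small sketch (the function name is ours, not from any package):

```r
# Percent difference in Y per 1-unit difference in X,
# from a regression of log2(Y) on X
pct_growth <- function(b1, base = 2) 100 * (base^b1 - 1)

pct_growth(1)    # 100: a slope of 1 means Y doubles per unit of X
pct_growth(0.5)  # ~41.4
```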
---
# Visualized Y transformation

```r
oecd$log2ppe <- log2(oecd$ppe)

log_ppe <- ggplot(oecd, aes(x=gdp, y=log2ppe))
```

---
# Visualized Y transformation

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

---
## Regress `\(\text{log}_{2}(ppe)\)` on gdp

```r
summary(lm(log2(ppe) ~ gdp, oecd))
```

```
...
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39728 -0.09378  0.01867  0.11920  0.31357 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.176e+01  1.113e-01   105.7   <2e-16 ***
## gdp         3.899e-05  2.484e-06    15.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1712 on 32 degrees of freedom
## Multiple R-squared:  0.8851, Adjusted R-squared:  0.8815 
## F-statistic: 246.5 on 1 and 32 DF,  p-value: < 2.2e-16
...
```

--

**Percent growth rate**: `\(100(2^{0.000039} - 1) = 0.0027\%\)`; for each $1 more of GDP per person, PPE is 0.0027% higher; or, for each $1,000 more of GDP per person, PPE is 2.7% higher

---
# Interpreting log Y results

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

`$$\text{log}_2 (\hat{PPE}_j) = 11.8 + 0.000039 * GDP_j$$`

> Per capita gross domestic product (GDP) is a strong predictor of yearly per-student expenditure from primary through tertiary education. In particular, if we compare two countries whose GDPs differ by $1,000, we would predict that the wealthier country would have per-pupil expenditure that is 2.7 ***percent*** higher than the country with the smaller economy.

---
class: middle, inverse
# Log-log transformations
## aka proportional growth

---
# Which 🌲 to harvest?

- Could theoretically select a log of any base to transform outcome or predictor or both to a linear relationship
- Much more sensible to restrict yourself to base 10, base 2, or the .red-pink[**natural log**]; comes from Euler's number `\((e)\)`

`$$e = \lim_{n \to \infty}(1 + \frac{1}{n})^n \approx 2.718281828459...$$`

- .red-pink[**Natural log**]: `\(\text{log}_{2.718...}(x) = \text{log}_e(x) = \text{ln}(x)\)`

---
# All the countries

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />

---
# Log-log transformations

```r
oecd2$lngdp <- log(oecd2$gdp)
oecd2$lnppe <- log(oecd2$ppe)

ln_ppe <- ggplot(oecd2, aes(x=lngdp, y=lnppe))
```

---
# Log-log transformations

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" />

---
## Regress `\(\text{ln}(ppe)\)` on `\(\text{ln}(gdp)\)`

```r
summary(lm(log(ppe) ~ log(gdp), oecd2))
```

```
...
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43570 -0.04076  0.01302  0.07489  0.26542 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.39273    0.72674   -0.54    0.592    
## log(gdp)     0.91274    0.06801   13.42 3.83e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1509 on 34 degrees of freedom
## Multiple R-squared:  0.8412, Adjusted R-squared:  0.8365 
## F-statistic: 180.1 on 1 and 34 DF,  p-value: 3.826e-15
...
```

--

`$$\hat{LnPPE}_j = -0.39 + 0.91 * LnGDP_j$$`
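---
# Elasticity, numerically

We can sanity-check the "percent-for-percent" reading of the log-log slope with simple arithmetic. A sketch using the estimate above:

```r
b1 <- 0.91274

# Ratio of predicted PPE for two countries whose GDPs differ by 1%
1.01^b1               # ~1.0091

# The exact percent difference implied by the slope
100 * (1.01^b1 - 1)   # ~0.91%
```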
---
# Interpreting this

Can interpret log-log relationships in percent terms. `\(\color{red}{\beta_1}\)` .red-pink[**represents the % change in Y per 1% change in X.**]

.pull-left[
.small[**Postulated model:**]

- `\(Y = \beta_0 X^{\beta_1}e^{\varepsilon}\)`
- `\(\text{ln}(Y) = \text{ln}(\beta_0 X^{\beta_1}e^{\varepsilon})\)`
- `\(\text{ln}(Y) = \text{ln}(\beta_0) + \text{ln}(X^{\beta_1}) + \text{ln}(e^{\varepsilon})\)`
- `\(\text{ln}(Y) = \text{ln}(\beta_0) + \beta_1 \text{ln}(X) + \varepsilon\)`
]
.pull-right[
.small[**Imagine**] `\(Y_1\)` .small[**and**] `\(Y_2\)` .small[**are 1% (or 0.01) apart**:]

- `\(Y_1 = \beta_0 X^{\beta_1}\)`
- `\(Y_2 = \beta_0(1.01X)^{\beta_1} = \beta_0 X^{\beta_1}(1.01)^{\beta_1}\)`
- `\(\frac{Y_2}{Y_1} = \frac{\beta_0 X^{\beta_1}(1.01)^{\beta_1}}{\beta_0 X^{\beta_1}} = (1.01)^{\beta_1}\)`

So `\(Y_2\)` is `\((1.01)^{\beta_1}\)` times larger than `\(Y_1\)`
]

--

Regress ln(Y) on ln(X), and the slope estimate is the estimated percent difference in Y per 1 percent difference in X

---
# Interpret log-log relationship

```r
summary(lm(log(ppe) ~ log(gdp), oecd2))
```

```
...
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43570 -0.04076  0.01302  0.07489  0.26542 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.39273    0.72674   -0.54    0.592    
## log(gdp)     0.91274    0.06801   13.42 3.83e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1509 on 34 degrees of freedom
## Multiple R-squared:  0.8412, Adjusted R-squared:  0.8365 
## F-statistic: 180.1 on 1 and 34 DF,  p-value: 3.826e-15
...
```

--

"A 1 percent change in GDP predicts a 0.91 percent change in PPE"

---
# Interpret log-log relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" />

`$$\text{ln}(\hat{PPE}_j) = -0.39 + 0.91 \times \text{ln}(GDP_j)$$`

> We predict that, on average, comparing two countries with GDP per capita separated by 1 percent, the wealthier country will spend 0.91 percent more on its pupils across primary through tertiary education.

---
## "Forbidden" log transformations

So far, we've been dealing with situations in which all the variables we needed to transform were non-zero. In fact, this is often not the case:

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" />

--

Many other instances: counts of behaviors, individual income, absences, scale scores, etc.
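---
# What zeros do to logs

The zero problem is easy to demonstrate: R returns `-Inf` for `log(0)`, which `lm()` cannot use, so zero cases drop out of (or break) the model. A toy sketch with made-up values:

```r
absences <- c(0, 0, 2, 5, 11)

log(absences)
#> [1]      -Inf      -Inf 0.6931472 1.6094379 2.3978953

# Only the non-zero cases survive a log transformation
sum(is.finite(log(absences)))  # 3 of 5 observations
```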
---
## "Forbidden" log transformations

Traditional approach:

- Add a small "starter" value to all raw values (+1, +0.1, +0.01, +0.001, etc.)
- Take log of this .red-pink[**"zero-inflated"**] variable

.red[**DO NOT DO THIS!!!**]

- The value selected for the starter and the proportion of 0s in your data can result in wildly inconsistent coefficient estimates
- You'll address this issue in EDUC 645 with Poisson regression
- Can also (potentially) be addressed with an inverse hyperbolic sine transformation

---
.pull-left[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" />
.small[
- Regress Y on log(X)
- `\(Y = \hat{\beta_0} + \hat{\beta_1}\text{log}(X)\)`
- "Every doubling (or whatever base) of X associated with `\(\hat{\beta_1}\)` diff in Y"
]
]
.pull-right[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />
.small[
- Regress log(Y) on X
- `\(\text{log}(Y) = \hat{\beta_0} + \hat{\beta}_1 X\)`
- Every 1-unit diff in X associated with `\(100(e^{\hat{\beta_1}} - 1)\)` % diff in Y
]
]

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" />
.small[
- Regress log(Y) on log(X)
- `\(\text{log}(Y) = \hat{\beta_0} + \hat{\beta_1}\text{log}(X)\)`
- Every 1% diff in X associated with `\(\hat{\beta_1}\)` percent diff in Y
]

---
class: middle, inverse
# Quadratic terms: a special kind of interaction

---
# Quadratic model

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" />

The effect of a predictor can differ by the value of that same predictor:

`$$Y = \beta_0 + \beta_1 X_1 + \beta_2 (X_1 * X_1) + \varepsilon$$`

`$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \varepsilon$$`

--

Can point upwards or downwards, but **all quadratic relationships are .red-pink[non-monotonic]; the relationship both rises and falls (or falls and rises)**

---
# A quadratic relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-41-1.png" style="display: block; margin: auto;" />

--

.blue[**Which direction will the quadratic line of best fit point?**]

---
# A quadratic relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-42-1.png" style="display: block; margin: auto;" />

---
# A quadratic relationship

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" />

--

We can represent quadratic fits mathematically in generic form: `\(y = \beta_0 + \beta_1 x + \beta_2 x^2\)`.

--

.blue[**Challenge: what signs will each of the three coefficients take for the above relationship?**]

---
# Fitting the quadratic

```r
summary(lm(read_score ~ total_spending + I(total_spending^2), pisa))
```

```
...
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.511 -15.722   3.806  22.651  59.394 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.728e+02  9.665e+00  38.574  < 2e-16 ***
## total_spending       1.750e-03  1.798e-04   9.732 4.22e-14 ***
## I(total_spending^2) -5.260e-09  6.498e-10  -8.096 2.70e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 31.34 on 62 degrees of freedom
## Multiple R-squared:  0.6117, Adjusted R-squared:  0.5992 
## F-statistic: 48.84 on 2 and 62 DF,  p-value: 1.834e-13
...
```

--

Fitted equation: `\(\hat{read} = 372.8 + 0.00175 * spend - 0.00000000526 * spend^2\)`.

--

.blue[**How do our model fit statistics compare to the linear version?**]
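---
# Where does the parabola turn?

A quadratic fit implies a turning point at `\(-\hat{\beta_1}/(2\hat{\beta_2})\)`, which is worth computing before trusting the shape. A sketch pulling the coefficients from the model above:

```r
fit_quad <- lm(read_score ~ total_spending + I(total_spending^2), data = pisa)
b <- coef(fit_quad)

# Spending level at which predicted scores stop rising
-b["total_spending"] / (2 * b["I(total_spending^2)"])
#> ~166,000 (spending dollars)
```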
---
# The "right" fit to data

.pull-left[
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" /><img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-45-2.png" style="display: block; margin: auto;" />
]
.small[
.pull-right[
- A declining relationship between spending and performance doesn't make much substantive sense, so we would probably not use a quadratic fit for our full data
- However, without Qatar and Luxembourg, a quadratic describes the relationship quite nicely

<br>
<br>

- Don't extrapolate the shape of the parabola to the left of the y-axis
- Shouldn't assume the y values will be higher to the left of the y-axis
]
]

---
class: middle, inverse
# Higher-order polynomials

---
# Cubics

.small[We needn't restrict ourselves to quadratic relationships when we transform our data. Many relationships, for example, are cubic (third-power) in nature. This is particularly true when there are measurement issues in the tails and/or floor/ceiling effects.]

.pull-left[
**Strong cubic**
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" />
]

--

.pull-right[
**Our DIBELS data**
<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-47-1.png" style="display: block; margin: auto;" />
]

--

`$$\hat{W20\_ORF} = 2.81 + 1.47*F19\_ORF - 0.0010*F19\_ORF^2 - 0.000017 * F19\_ORF^3$$`
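---
# Fitting a cubic in R

Higher-order terms work just like the quadratic: add `I(x^2)` and `I(x^3)` terms, or use `poly()`. A sketch (the data frame `dibels` and the lower-case column names are our assumptions; adjust to your data):

```r
# Two equivalent ways to fit a third-order polynomial
fit_cubic  <- lm(w20_orf ~ f19_orf + I(f19_orf^2) + I(f19_orf^3), data = dibels)
fit_cubic2 <- lm(w20_orf ~ poly(f19_orf, 3, raw = TRUE), data = dibels)

summary(fit_cubic)
```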
---
# Other approaches

.pull-left[
.small[
There are an infinite number of potentially effective transformations:

- Squares, cubes, quartics, quintics, ...
- Square roots, cube roots, fourth roots, ...
- Logarithms (of any base), antilogarithms
- Inverses
- Trigonometric functions
- Hyperbolic functions
- Combinations of above...
]
]
.pull-right[
.small[
Approaches to achieve local linearity:

- Splines
- Locally estimated scatterplot smoothing (LOESS)
]
]

--

**Some emerging issues:**

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" />

---
class: middle, inverse
# Synthesis and wrap-up

---
# Different approaches

.pull-left[
**Empirical approach**
.small[
- Notice presence of non-linearity in relationship
- Find an *ad-hoc* transformation of either the predictor, the outcome, or both that renders the relationship linear
- Use OLS in the transformed world, and conduct inference there
- De-transform fitted model to produce sensible plots
]
]

--

.pull-right[
**Theory-driven approach**
.small[
- Use theory or knowledge from prior research to postulate a non-linear model
- Use non-linear regression (`nls` or other estimation packages) (part of the .red-pink[**Generalized Linear Model**] family) to fit the postulated trend in the real world and conduct inference there
- Interpret parameter estimates directly
- **We are not learning how to do this, but worth exploring yourself**
]
]

---
# The Ladder and the Bulge

.pull-left[
.red-pink[**Tukey's Ladder**]
<img src="tukey_ladder.png" width="609" style="display: block; margin: auto;" />
]
.pull-right[
.red-pink[**Tukey's Bulge**]
<img src="tukey_bulge.png" width="733" style="display: block; margin: auto;" />
]

---
## Putting non-linearity together

.small[
- **Remember to check your linearity assumption**
  + Use bivariate scatter plots
  + Use residual and Q-Q plots to diagnose
- **Make sensible transformations**
  + Logarithmic, inverse, root and other functions can allow a return to a world of linearity and permit you to use the GLM tools of OLS to estimate non-linear relationships
  + Best to use transformations that are the most straightforward to interpret
  + Use Tukey's Bulge to guide what kind of transformation you will attempt
  + There is no one "right" transformation for a given data shape
  + Start with transforming x before y
  + Generally, do **not** use a "start" to log transform data that includes 0s
  + Inspect scatter plots post-transformation to check for success in linearizing (see the sketch following this slide)
    - With large data, can be hard to see; consider binscatter options (by hand or `binsreg`; more on this in our next unit)
- **Predictors can interact with themselves**
  + Quadratic and cubic models provide a flexible strategy for fitting non-linear models, especially those that cannot be linearized by logarithms
  + Be careful about overfitting and model instability with polynomials of order >3!
  + Quadratics and logs will often produce [similar fitted lines](https://daviddliebowitz.github.io/EDUC643_23W/slides/EDUC643_16_nonlinearity.html#95); a quadratic allows a direct statistical test for non-linearity, while a logarithm may fit with theory better and/or be more readily interpretable
]
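---
# A post-transformation check

A minimal sketch of the "inspect post-transformation" step from the checklist: re-plot residuals against fitted values before and after transforming, and look for the curvature to disappear.

```r
fit_raw <- lm(read_score ~ total_spending, data = pisa)
fit_log <- lm(read_score ~ log10(total_spending), data = pisa)

# Side-by-side residual vs. fitted plots (base graphics for brevity)
par(mfrow = c(1, 2))
plot(fitted(fit_raw), resid(fit_raw), main = "Raw X")
abline(h = 0, lty = 2)
plot(fitted(fit_log), resid(fit_log), main = "log10(X)")
abline(h = 0, lty = 2)
```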
---
# Goals of the unit

- Describe in writing and verbally the assumptions we violate when we fit a non-linear relationship with a linear model
- Transform non-linear relationships into linear ones by using logarithmic scales
- Estimate regression models using logarithmic scales and interpret the results
- Estimate and interpret models with quadratic and higher-order polynomial terms (special kinds of interactions)
- Select between transformation options

---
# To-Dos

### Reading:
- **By 3/6 class**: McIntosh et al. (2021) and discussion questions

### Assignment 4:
- Due March 10, 12:01p

### Final
- Due March 20, 12:01p

### Re- (late) submissions
- Everything due March 14, 5:00p (no exceptions)
- Assignments with scores <90% only
- Earn up to 90%

---
# Log vs. quadratic

<img src="EDUC643_16_nonlinearity_files/figure-html/unnamed-chunk-51-1.png" style="display: block; margin: auto;" />
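---
# Reproducing the comparison

The figure above can be approximated by overlaying both fits on one scatter; a sketch, assuming the `pisa` data from earlier in the deck:

```r
library(ggplot2)

ggplot(pisa, aes(x = total_spending, y = read_score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ log10(x),
              se = FALSE, color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2),
              se = FALSE, color = "red") +
  labs(x = "Total spending, age 6-15 ($)", y = "PISA reading score")
```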