class: center, middle, inverse, title-slide .title[ # Interactions ] .subtitle[ ## EDUC 643: Unit 5 Part I ] .author[ ### David D. Liebowitz ] --- # Roadmap <img src="Roadmap5.jpg" width="90%" style="display: block; margin: auto;" /> --- # Goals of the unit - Describe the main effects assumption and how this assumption can be relaxed using the statistical interaction model - Describe in writing and verbally the concept of statistical interaction - Estimate and interpret regression models with interactions between categorical and continuous predictors - Estimate and interpret regression models with interactions between categorical and categorical predictors - Estimate and interpret regression models with interactions between continuous predictors - Visualize interaction effects graphically - Describe statistical power and Type II error challenges resulting from interactions .gray[ - Describe in writing and verbally the assumptions we violate when we fit a non-linear relationship with a linear model - Transform non-linear relationships into linear ones by using logarithmic scales - Estimate regression models using logarithmic scales and interpret the results - Estimate models with quadratic and higher-order polynomial terms (special kinds of interactions) - Select between transformation options ] --- # Our motivating question .small[A team of researchers based at the .green[**University of Oregon**] aimed to understand the effects of the COVID-19 pandemic on students' early literacy skills.<sup>1</sup>] <img src="dibels_team.png" width="1281" style="display: block; margin: auto;" /> .small[Ann Swindells Professor in Special Education [Gina Biancarosa](https://education.uoregon.edu/directory/faculty/all/ginab), former UO doctoral students David Fainstein, Chris Ives, and Dave Furjanic, along with CTL Research Manager Patrick Kennedy, used data from assessments of 471,456 students across 1,684 schools on the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) to
analyze the extent to which students' Oral Reading Fluency (ORF) scores differed across four waves of DIBELS assessment prior to and during the pandemic.] .small[Their study is published in [*The Elementary School Journal*](https://www.journals.uchicago.edu/doi/full/10.1086/730115).] .footnote[[1] For various reasons, the pandemic is a ["lousy natural experiment"](https://www.educationnext.org/covid-19-pandemic-lousy-natural-experiment-for-studying-the-effects-online-learning/) for examining the effects of a particular policy response (e.g., virtual schooling). However, it is quite possible to seek to understand its global effects via just the type of analysis Furjanic et al. conducted.] --- # Our answer to-date <table style="NAborder-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> (1) </th> <th style="text-align:center;"> (2) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:center;"> 62.344*** </td> <td style="text-align:center;"> 16.075*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.449) </td> <td style="text-align:center;"> (0.326) </td> </tr> <tr> <td style="text-align:left;"> Winter 2020 </td> <td style="text-align:center;"> 25.562*** </td> <td style="text-align:center;"> 25.562*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.696) </td> <td style="text-align:center;"> (0.332) </td> </tr> <tr> <td style="text-align:left;"> Fall 2020 </td> <td style="text-align:center;"> -2.791*** </td> <td style="text-align:center;"> -2.791*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.631) </td> <td style="text-align:center;"> (0.311) </td> </tr> <tr> <td style="text-align:left;"> Winter 2021 </td> <td style="text-align:center;"> 19.454*** </td> <td
style="text-align:center;"> 19.454*** </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1.5px"> </td> <td style="text-align:center;box-shadow: 0px 1.5px"> (0.700) </td> <td style="text-align:center;box-shadow: 0px 1.5px"> (0.330) </td> </tr> <tr> <td style="text-align:left;"> <b>Covariates?</b> </td> <td style="text-align:center;"> <b>No</b> </td> <td style="text-align:center;"> <b>Yes</b> </td> </tr> <tr> <td style="text-align:left;"> Num.Obs. </td> <td style="text-align:center;"> 21584 </td> <td style="text-align:center;"> 21584 </td> </tr> <tr> <td style="text-align:left;"> R2 </td> <td style="text-align:center;"> 0.102 </td> <td style="text-align:center;"> 0.795 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001</td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Cells report coefficients and heteroscedastic-robust standard errors in parentheses. Each observation is a school-grade-test value. Covariates include grade-level and total school enrollment.</td></tr> </tfoot> </table> --- # Might differ by context? 
<table style="NAborder-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto;" class="table"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> (1) </th> <th style="text-align:center;"> (2) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:center;"> 62.344*** </td> <td style="text-align:center;"> 16.075*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.449) </td> <td style="text-align:center;"> (0.326) </td> </tr> <tr> <td style="text-align:left;"> Winter 2020 </td> <td style="text-align:center;"> 25.562*** </td> <td style="text-align:center;"> 25.562*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.696) </td> <td style="text-align:center;"> (0.332) </td> </tr> <tr> <td style="text-align:left;"> Fall 2020 </td> <td style="text-align:center;"> -2.791*** </td> <td style="text-align:center;"> -2.791*** </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.631) </td> <td style="text-align:center;"> (0.311) </td> </tr> <tr> <td style="text-align:left;"> Winter 2021 </td> <td style="text-align:center;"> 19.454*** </td> <td style="text-align:center;"> 19.454*** </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1.5px"> </td> <td style="text-align:center;box-shadow: 0px 1.5px"> (0.700) </td> <td style="text-align:center;box-shadow: 0px 1.5px"> (0.330) </td> </tr> <tr> <td style="text-align:left;"> <b>Covariates?</b> </td> <td style="text-align:center;"> <b>No</b> </td> <td style="text-align:center;"> <b>Yes</b> </td> </tr> <tr> <td style="text-align:left;"> Num.Obs. 
</td> <td style="text-align:center;"> 21584 </td> <td style="text-align:center;"> 21584 </td> </tr> <tr> <td style="text-align:left;"> R2 </td> <td style="text-align:center;"> 0.102 </td> <td style="text-align:center;"> 0.795 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001</td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Cells report coefficients and heteroscedastic-robust standard errors in parentheses. Each observation is a school-grade-test value. Covariates include grade-level and total school enrollment.</td></tr> </tfoot> </table> --- # Some definitions .large[.red-pink[**Statistical interactions**]] - When the relationship between one predictor variable and the outcome differs by the level of another predictor - A statistical term .large[.red-pink[**Moderation effects**]] - A substantive interpretation of the statistical interaction - In practice, used interchangeably with interactions -- .large[.red-pink[**Mediation effects**]] - A third variable that "explains" why or how one predictor variable is related to the outcome - Question predictor related to the mediator variable, which in turn is related to the outcome - Requires some strong assumptions to interpret as "mediation" or "mechanism" that are highly dependent on research design - We'll return to this in our final unit (*not what we're talking about now*) --- # Statistical interactions abound **Teacher-child interaction quality moderates social risks of problem behavior<sup>1</sup>** > "We found a negative interaction between early peer context problems and classroom instructional support in the prediction of disconnected play. In classrooms with low instructional quality, children who displayed early problem behavior in peer contexts displayed higher disconnected play in the spring. However, in classrooms with higher instructional support, this association was weakened" (pg. 
9) <img src="interaction.png" width="704" style="display: block; margin: auto;" /> .footnote[[1] Bulotsky-Shearer, R., Fernandez, V., Bichay-Awadalla, K., Bailey, J., Futterer, J., & Qi, C. (2020). Teacher-child interaction quality moderates social risks associated with problem behavior in preschool classroom contexts. *Journal of Applied Developmental Psychology, 67*, 101103.] --- # Main effects model <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> **Parallel lines**: Fitted lines are parallel because the main effects model assumes that the effect of each predictor is identical and independent of the values of all other predictors in the model. --- # Interaction model <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> **Non-parallel lines**: Fitted lines are **not** assumed parallel because the interaction model allows the effect of each predictor to differ by the values of other predictor(s) in the model. --- # Some distinctions .pull-left[ **Ordinal interaction** .small[ - Direction of predictor's effect consistent across moderator's levels, but **magnitude** differs ] <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Disordinal interaction** .small[ - **Direction** of predictor's relationship with outcome differs across moderator's levels ] <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] -- These are somewhat arbitrary distinctions, as all non-parallel lines eventually intersect. However, the key is to constrain your interpretation **within the range of your data**. -- Other interaction terms you may encounter: ***synergistic interaction***, ***buffering interaction***. These are subject- and discipline-specific. --- # Additional research questions .large[ 1.
How did students' Oral Reading Fluency (ORF) trajectories differ pre- and post-pandemic-onset? + Main effects model 2. To what extent did differences in students' ORF trajectories pre- and post-pandemic-onset differ by the proportion of students receiving free- or reduced-price lunch (FRPL) in their schools? + Interaction model: **categorical X continuous** 3. To what extent did differences in students' ORF trajectories pre- and post-pandemic-onset differ by the Title I status of their schools? + Interaction model: **categorical x categorical** 4. To what extent do students' average ORF scores differ by the rate of FRPL-receipt and enrollment in their schools? + Interaction model: **continuous X continuous** ] --- class: middle, inverse # Interactions: categorical and continuous --- # Period as predictor ```r summary(lm(mean_orf ~ period, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ period, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -84.178 -30.719 2.705 28.803 122.202 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 62.3436 0.4921 126.701 < 2e-16 *** ## periody1_moy 25.5620 0.6959 36.734 < 2e-16 *** ## periody2_boy -2.7914 0.6959 -4.011 6.06e-05 *** ## periody2_moy 19.4541 0.6959 27.957 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 36.15 on 21580 degrees of freedom ## Multiple R-squared: 0.1021, Adjusted R-squared: 0.1019 ## F-statistic: 817.7 on 3 and 21580 DF, p-value: < 2.2e-16 ``` --- # *FRPL_PROP* as predictor ```r summary(lm(mean_orf ~ frpl_prop, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ frpl_prop, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -87.926 -30.046 1.351 27.450 130.115 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 90.2822 0.4410 204.70 <2e-16 *** ## frpl_prop -37.1145 0.7802 -47.57 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 36.29 on 21582 degrees of freedom ## Multiple R-squared: 0.09491, Adjusted R-squared: 0.09486 ## F-statistic: 2263 on 1 and 21582 DF, p-value: < 2.2e-16 ``` --- # Both ```r summary(lm(mean_orf ~ period + frpl_prop, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ period + frpl_prop, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -95.916 -27.204 1.903 27.194 121.217 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 79.7260 0.5788 137.743 < 2e-16 *** ## periody1_moy 25.5620 0.6581 38.843 < 2e-16 *** ## periody2_boy -2.7914 0.6581 -4.242 2.23e-05 *** ## periody2_moy 19.4541 0.6581 29.562 < 2e-16 *** ## frpl_prop -37.1145 0.7349 -50.501 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 34.18 on 21579 degrees of freedom ## Multiple R-squared: 0.197, Adjusted R-squared: 0.1968 ## F-statistic: 1323 on 4 and 21579 DF, p-value: < 2.2e-16 ``` --- # Testing for an interaction 1\. **Create a cross-product term**, which is the product (the interaction) of the two predictors whose interaction you want to test + We can create this term by hand or we can ask R to do this for us within our estimating equation 2\. **Include the cross-product in a multiple-regression model** ***alongside the constituent main effects*** of the variables that make up the cross-product term `$$MEAN\_ORF_j = \beta_0 + \beta_1 FRPL_j + \beta_2 PERIOD_j + \color{red}{\beta_3 FRPL \times PERIOD_j} + \varepsilon_j$$` + R will automatically do this if we specify our interaction term appropriately + For now, assume you should **always** include the main effects (e.g., `\(FRPL_j\)` and `\(PERIOD_j\)`). Way down the line, there are cases where it might make sense to not do this. 3\. Test `\(H_0\)`: `\(\beta_{\text{cross-product}}=0\)` --- # What will parameters mean? 
`$$MEAN\_ORF_j = \beta_0 + \beta_1 FRPL_j + \color{green}{\beta_2 PERIOD_j} + \color{red}{\beta_3 FRPL \times PERIOD_j} + \varepsilon_j$$` `\(\color{green}{\beta_2}\)` tells us the difference in the **intercept** when FRPL=0. I.e., how did ORF scores differ at different time periods for schools with no students receiving FRPL. `\(\color{red}{\beta_3}\)` tells us the difference in the **slope**. I.e., how did ORF scores differ for different levels of FRPL-receipt in a school, at different time periods across the different waves of the assessments --- # Estimating in R ```r summary(lm(mean_orf ~ period * frpl_prop, dibels_long)) ``` ``` ... ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 78.3362 0.8304 94.331 < 2e-16 *** ## periody1_moy 28.4229 1.1744 24.202 < 2e-16 *** ## periody2_boy -3.5915 1.1744 -3.058 0.002230 ** ## periody2_moy 22.9528 1.1744 19.544 < 2e-16 *** ## frpl_prop -34.1469 1.4690 -23.245 < 2e-16 *** ## periody1_moy:frpl_prop -6.1084 2.0775 -2.940 0.003282 ** ## periody2_boy:frpl_prop 1.7084 2.0775 0.822 0.410903 ## periody2_moy:frpl_prop -7.4703 2.0775 -3.596 0.000324 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 34.16 on 21576 degrees of freedom ## Multiple R-squared: 0.198, Adjusted R-squared: 0.1978 ## F-statistic: 761.1 on 7 and 21576 DF, p-value: < 2.2e-16 ... ``` --- # Estimating in R ```r summary(lm(mean_orf ~ period * frpl_prop, dibels_long)) ``` ``` ... ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 78.3362 0.8304 94.331 < 2e-16 *** *## periody1_moy 28.4229 1.1744 24.202 < 2e-16 *** *## periody2_boy -3.5915 1.1744 -3.058 0.002230 ** *## periody2_moy 22.9528 1.1744 19.544 < 2e-16 *** ## frpl_prop -34.1469 1.4690 -23.245 < 2e-16 *** ## periody1_moy:frpl_prop -6.1084 2.0775 -2.940 0.003282 ** ## periody2_boy:frpl_prop 1.7084 2.0775 0.822 0.410903 ## periody2_moy:frpl_prop -7.4703 2.0775 -3.596 0.000324 *** ## --- ## Signif. 
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 34.16 on 21576 degrees of freedom ## Multiple R-squared: 0.198, Adjusted R-squared: 0.1978 ## F-statistic: 761.1 on 7 and 21576 DF, p-value: < 2.2e-16 ... ``` --- # Estimating in R ```r summary(lm(mean_orf ~ period * frpl_prop, dibels_long)) ``` ``` ... ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 78.3362 0.8304 94.331 < 2e-16 *** ## periody1_moy 28.4229 1.1744 24.202 < 2e-16 *** ## periody2_boy -3.5915 1.1744 -3.058 0.002230 ** ## periody2_moy 22.9528 1.1744 19.544 < 2e-16 *** *## frpl_prop -34.1469 1.4690 -23.245 < 2e-16 *** ## periody1_moy:frpl_prop -6.1084 2.0775 -2.940 0.003282 ** ## periody2_boy:frpl_prop 1.7084 2.0775 0.822 0.410903 ## periody2_moy:frpl_prop -7.4703 2.0775 -3.596 0.000324 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 34.16 on 21576 degrees of freedom ## Multiple R-squared: 0.198, Adjusted R-squared: 0.1978 ## F-statistic: 761.1 on 7 and 21576 DF, p-value: < 2.2e-16 ... ``` --- # Estimating in R ```r summary(lm(mean_orf ~ period * frpl_prop, dibels_long)) ``` ``` ... ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 78.3362 0.8304 94.331 < 2e-16 *** ## periody1_moy 28.4229 1.1744 24.202 < 2e-16 *** ## periody2_boy -3.5915 1.1744 -3.058 0.002230 ** ## periody2_moy 22.9528 1.1744 19.544 < 2e-16 *** ## frpl_prop -34.1469 1.4690 -23.245 < 2e-16 *** *## periody1_moy:frpl_prop -6.1084 2.0775 -2.940 0.003282 ** *## periody2_boy:frpl_prop 1.7084 2.0775 0.822 0.410903 *## periody2_moy:frpl_prop -7.4703 2.0775 -3.596 0.000324 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 34.16 on 21576 degrees of freedom ## Multiple R-squared: 0.198, Adjusted R-squared: 0.1978 ## F-statistic: 761.1 on 7 and 21576 DF, p-value: < 2.2e-16 ... 
``` -- .red-pink[**Do not interpret main effects or interactions by themselves!**] --- **For Fall 2019:** $$ `\begin{aligned} \hat{MEAN\_ORF}_j = & \color{red}{78.3} + \color{red}{(-34.1)*FRPL} + 28.4*(0) + (-3.6)*(0) + 23.0*(0) + \\ & (-6.1)*FRPL \times (0) + 1.7*FRPL \times (0) + (-7.5)*FRPL \times (0) \\ = & 78.3 - 34.1*FRPL \end{aligned}` $$ -- **For Winter 2020:** $$ `\begin{aligned} \hat{MEAN\_ORF}_j = & \color{red}{78.3} + \color{red}{(-34.1)*FRPL} + \color{red}{28.4*(1)} + (-3.6)*(0) + 23.0*(0) + \\ & \color{red}{(-6.1)*FRPL \times (1)} + 1.7*FRPL \times (0) + (-7.5)*FRPL \times (0) \\ = & 106.7 - 40.2*FRPL \end{aligned}` $$ -- **For Fall 2020:** $$ `\begin{aligned} \hat{MEAN\_ORF}_j = & \color{red}{78.3} + \color{red}{(-34.1)*FRPL} + 28.4*(0) + \color{red}{(-3.6)*(1)} + 23.0*(0) + \\ & (-6.1)*FRPL \times (0) + \color{red}{1.7*FRPL \times (1)} + (-7.5)*FRPL \times (0) \\ = & 74.7 - 32.4*FRPL \end{aligned}` $$ -- **For Winter 2021:** $$ `\begin{aligned} \hat{MEAN\_ORF}_j = & \color{red}{78.3} + \color{red}{(-34.1)*FRPL} + 28.4*(0) + (-3.6)*(0) + \color{red}{23.0*(1)} + \\ & (-6.1)*FRPL \times (0) + 1.7*FRPL \times (0) + \color{red}{(-7.5)*FRPL \times (1)} \\ = & 101.3 - 41.6*FRPL \end{aligned}` $$ -- .red-pink[**Until we do the math, we know only whether the slopes are statistically different, not by how much!**] --- # Show results ```r fit5 <- lm(mean_orf ~ period * frpl_prop, data=dibels_long) df5 <- margins::margins(fit5, at = list(period=c("y1_boy", "y1_moy", "y2_boy", "y2_moy"))) # Use prototypical values in resulting dataset to show results proto2 <- ggplot(data=df5, aes(x=frpl_prop, y=fitted, color=period)) + geom_smooth(method='lm') + xlab("Proportion receiving FRPL") + ylab("Predicted ORF") + ylim(35, 110) + scale_color_discrete(name = "Period", breaks=c("y1_boy", "y1_moy", "y2_boy", "y2_moy"), labels=c("Fall 2019","Winter 2020", "Fall 2020", "Winter 2021")) + theme_minimal(base_size=16) ``` --- # Show results <img
src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> --- # Another way? ``` ## ## Call: ## lm(formula = mean_orf ~ frpl_prop, data = subset(dibels_long, ## period == "y1_boy")) ## ## Residuals: ## Min 1Q Median 3Q Max ## -74.154 -25.613 -0.189 25.572 80.831 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 78.3362 0.7582 103.31 <2e-16 *** ## frpl_prop -34.1469 1.3413 -25.46 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 31.19 on 5394 degrees of freedom ## Multiple R-squared: 0.1073, Adjusted R-squared: 0.1071 ## F-statistic: 648.1 on 1 and 5394 DF, p-value: < 2.2e-16 ``` --- # Another way? ``` ## ## Call: ## lm(formula = mean_orf ~ frpl_prop, data = subset(dibels_long, ## period == "y2_moy")) ## ## Residuals: ## Min 1Q Median 3Q Max ## -92.378 -31.799 3.262 29.915 121.097 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 101.2890 0.9037 112.08 <2e-16 *** ## frpl_prop -41.6172 1.5986 -26.03 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 37.18 on 5394 degrees of freedom ## Multiple R-squared: 0.1116, Adjusted R-squared: 0.1115 ## F-statistic: 677.7 on 1 and 5394 DF, p-value: < 2.2e-16 ``` --- ## Value (and perils) of interaction 1. As we'll see momentarily, interaction models can be fit whether one (or more) of the predictors are dichotomous or continuous 2. Interaction models provide an easy statistical test of whether the slopes differ across groups (or across levels of a continuous predictor) 3. 
Interaction models keep the sample intact (you don’t need to break it down into many different groups) + Has some implementation advantages, but the purported statistical power advantages have been historically misunderstood and overstated + For more on this, check out Andrew Gelman's, ["You need 16 times the sample size to estimate an interaction than to estimate a main effect"](https://statmodeling.stat.columbia.edu/2018/03/15/need-16-times-sample-size-estimate-interaction-estimate-main-effect/) 4. Can be quite hard to clearly interpret + Visualization critical to communicate --- class: middle, inverse # Interactions: categorical and categorical --- # What are our categories? ```r table(dibels_long$title1, exclude=NULL) ``` ``` ## ## Not Title I Title I schoolwide Title I targeted Missing ## 2752 15956 2124 752 ``` ```r dibels_long %>% group_by(title1) %>% summarize(mean= mean(mean_orf)) ``` ``` ## # A tibble: 4 x 2 ## title1 mean ## <fct> <dbl> ## 1 Not Title I 89.0 ## 2 Title I schoolwide 70.3 ## 3 Title I targeted 78.6 ## 4 Missing 52.9 ``` --- # Title I as predictor ```r summary(lm(mean_orf ~ title1, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ title1, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -85.215 -32.520 2.159 28.532 133.689 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 89.0149 0.7132 124.814 <2e-16 *** ## title1Title I schoolwide -18.7034 0.7722 -24.220 <2e-16 *** ## title1Title I targeted -10.4537 1.0806 -9.674 <2e-16 *** ## title1Missing -36.1616 1.5395 -23.490 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 37.41 on 21580 degrees of freedom ## Multiple R-squared: 0.03796, Adjusted R-squared: 0.03783 ## F-statistic: 283.8 on 3 and 21580 DF, p-value: < 2.2e-16 ``` --- # Both ```r summary(lm(mean_orf ~ period + title1, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ period + title1, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -93.169 -30.793 3.011 28.610 124.791 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 78.4587 0.7929 98.954 < 2e-16 *** ## periody1_moy 25.5620 0.6810 37.533 < 2e-16 *** ## periody2_boy -2.7914 0.6810 -4.099 4.17e-05 *** ## periody2_moy 19.4541 0.6810 28.565 < 2e-16 *** ## title1Title I schoolwide -18.7034 0.7302 -25.615 < 2e-16 *** ## title1Title I targeted -10.4537 1.0217 -10.231 < 2e-16 *** ## title1Missing -36.1616 1.4556 -24.843 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 35.38 on 21577 degrees of freedom ## Multiple R-squared: 0.14, Adjusted R-squared: 0.1398 ## F-statistic: 585.6 on 6 and 21577 DF, p-value: < 2.2e-16 ``` --- # Interaction ```r summary(lm(mean_orf ~ period * title1, dibels_long)) ``` ``` ... ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 77.6809 1.3487 57.598 < 2e-16 *** ## periody1_moy 27.0794 1.9073 14.198 < 2e-16 *** ## periody2_boy -3.6222 1.9073 -1.899 0.0576 . 
## periody2_moy 21.8790 1.9073 11.471 < 2e-16 *** ## title1Title I schoolwide -17.8299 1.4604 -12.209 < 2e-16 *** ## title1Title I targeted -10.3759 2.0434 -5.078 3.85e-07 *** ## title1Missing -32.5905 2.9113 -11.195 < 2e-16 *** ## periody1_moy:title1Title I schoolwide -1.7557 2.0653 -0.850 0.3953 ## periody2_boy:title1Title I schoolwide 1.0411 2.0653 0.504 0.6142 ## periody2_moy:title1Title I schoolwide -2.7795 2.0653 -1.346 0.1784 ## periody1_moy:title1Title I targeted -0.6469 2.8899 -0.224 0.8229 ## periody2_boy:title1Title I targeted 0.8777 2.8899 0.304 0.7613 ## periody2_moy:title1Title I targeted -0.5420 2.8899 -0.188 0.8512 ## periody1_moy:title1Missing -4.4702 4.1171 -1.086 0.2776 ## periody2_boy:title1Missing -0.7234 4.1171 -0.176 0.8605 ## periody2_moy:title1Missing -9.0909 4.1171 -2.208 0.0272 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 35.38 on 21568 degrees of freedom ## Multiple R-squared: 0.1404, Adjusted R-squared: 0.1398 ## F-statistic: 234.8 on 15 and 21568 DF, p-value: < 2.2e-16 ... ``` --- # Interpretation .blue[**You try!**] Construct an equation of the form: $$ `\begin{aligned} \hat{MEAN\_ORF}_j = & \beta_0 + \beta_1 P_2 + \beta_2 P_3 + \beta_3 P_4 + \\ & \beta_4 T_2 + \beta_5 T_3 + \beta_6 T_4 + \\ & \beta_7 P_2 \times T_2 + \beta_8 P_2 \times T_3 + \beta_{9} P_2 \times T_4 + \\ & \beta_{10} P_3 \times T_2 + \beta_{11} P_3 \times T_3 + \beta_{12} P_3 \times T_4 + \\ & \beta_{13} P_4 \times T_2 + \beta_{14} P_4 \times T_3 + \beta_{15} P_4 \times T_4 \\ \end{aligned}` $$ with the coefficients from the previous slide and determine the fitted equation comparing Title I schoolwide schools in Winter 2021 to Title I schoolwide schools in Fall 2019. 
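--- # Checking your answer in R Once you have written out the fitted comparison by hand, `predict()` offers a sanity check: in a categorical `\(\times\)` categorical interaction model, the fitted difference between two cells equals the sum of the relevant main-effect and interaction coefficients. The sketch below uses **simulated stand-in data** with the same factor structure as the DIBELS data (the data and effect sizes are invented for illustration, so the numbers will not match the slides).

```r
# Sketch with simulated data; dibels_long is not reproduced here and the
# "true" effects (+20 for Winter 2021, -15 for Title I schoolwide) are made up
set.seed(643)
periods <- c("y1_boy", "y1_moy", "y2_boy", "y2_moy")
titles  <- c("Not Title I", "Title I schoolwide", "Title I targeted", "Missing")
sim <- expand.grid(period = factor(periods, levels = periods),
                   title1 = factor(titles, levels = titles),
                   rep = 1:100)
sim$mean_orf <- 75 + 20 * (sim$period == "y2_moy") -
  15 * (sim$title1 == "Title I schoolwide") + rnorm(nrow(sim), sd = 2)

fit <- lm(mean_orf ~ period * title1, data = sim)

# Fitted values for the two cells we want to compare
cells <- data.frame(
  period = factor(c("y2_moy", "y1_boy"), levels = periods),
  title1 = factor("Title I schoolwide", levels = titles))
yhat <- predict(fit, newdata = cells)
comparison <- unname(yhat[1] - yhat[2])

# Identical answer from the coefficients: the Winter 2021 main effect
# plus the Winter 2021 x Title I schoolwide interaction
b <- coef(fit)
unname(b["periody2_moy"] + b["periody2_moy:title1Title I schoolwide"])
```

The same pattern applies to your hand-derived equation: the cell comparison picks up every coefficient whose indicator variables switch on for that cell, and nothing else.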
--- # Visualization ```r # Let's just focus on ORF changes in first fall of pandemic fit6 <- lm(mean_orf ~ period * title1, dibels_long) df6 <- margins::margins(fit6, at = list(period=c("y2_boy", "y2_moy"), title1 = c("Not Title I", "Title I schoolwide", "Title I targeted", "Missing"))) # Show results for each category categ <- ggplot(data=df6, aes(x=period, y=fitted, ymin=fitted-1.96*se.fitted, ymax=fitted+1.96*se.fitted, group=title1, color=title1)) + geom_pointrange(position=position_dodge(width=0.2)) + ylab("Predicted ORF") + xlab(" ") + scale_x_discrete(labels= c("y2_boy" = "Fall 2020", "y2_moy" = "Winter 2021")) + ylim(0, 110) + theme_minimal(base_size=16) + theme(legend.title = element_blank()) ``` --- # Visualization <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> --- # Visualization <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> -- .tiny[For simplicity just focusing on primary question predictor interacted with one other variable (moderator). Could include multiple covariates and/or three-way interactions.] -- .tiny[.red-pink[**Warning!**] It gets complicated (and underpowered) fast!] --- class: middle, inverse # Interactions: continuous and continuous --- # School enrollment as predictor ```r summary(lm(mean_orf ~ school_enroll, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ school_enroll, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -84.247 -31.962 1.117 28.587 136.732 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 60.073893 0.568179 105.73 <2e-16 *** ## school_enroll 0.038062 0.001505 25.28 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 37.59 on 21582 degrees of freedom ## Multiple R-squared: 0.02876, Adjusted R-squared: 0.02872 ## F-statistic: 639.2 on 1 and 21582 DF, p-value: < 2.2e-16 ``` --- # Both ```r summary(lm(mean_orf ~ frpl_prop + school_enroll, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ frpl_prop + school_enroll, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -93.355 -28.850 1.075 26.915 134.743 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 79.010988 0.684100 115.50 <2e-16 *** ## frpl_prop -35.319032 0.776608 -45.48 <2e-16 *** ## school_enroll 0.030953 0.001447 21.40 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 35.91 on 21581 degrees of freedom ## Multiple R-squared: 0.1137, Adjusted R-squared: 0.1136 ## F-statistic: 1384 on 2 and 21581 DF, p-value: < 2.2e-16 ``` --- # Interaction ```r summary(lm(mean_orf ~ frpl_prop * school_enroll, dibels_long)) ``` ``` ## ## Call: ## lm(formula = mean_orf ~ frpl_prop * school_enroll, data = dibels_long) ## ## Residuals: ## Min 1Q Median 3Q Max ## -92.138 -28.539 1.124 26.924 134.563 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 81.125811 1.031551 78.644 < 2e-16 *** ## frpl_prop -39.378046 1.673133 -23.536 < 2e-16 *** ## school_enroll 0.024601 0.002733 9.000 < 2e-16 *** ## frpl_prop:school_enroll 0.012675 0.004628 2.739 0.00617 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 35.9 on 21580 degrees of freedom ## Multiple R-squared: 0.114, Adjusted R-squared: 0.1139 ## F-statistic: 925.7 on 3 and 21580 DF, p-value: < 2.2e-16 ``` --- # Prototypical values? 
.blue[**How could we choose meaningful values to demonstrate the differing relationship?**] -- ```r quantile(dibels_long$school_enroll, probs = seq(0, 1, 0.1)) ``` ``` ## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% ## 25 120 182 233 283 326 370 414 472 564 1014 ``` -- Maybe at roughly the 10<sup>th</sup>, median (50<sup>th</sup>), and 90<sup>th</sup> percentiles? -- Say 120, 326, and 600 students? --- ## Displaying continuous interactions ```r fit7 <- lm(mean_orf ~ frpl_prop * school_enroll, dibels_long) df7 <- margins::margins(fit7, at = list(school_enroll=c(120, 326, 600))) # Use prototypical values in resulting dataset to show results cont <- ggplot(data=df7, aes(x=frpl_prop, y=fitted, color=as.factor(school_enroll))) + geom_smooth(method='lm') + xlab("Proportion receiving FRPL") + ylab("Predicted ORF") + ylim(35, 100) + scale_color_discrete(name = "School Enrollment", breaks=c(120, 326, 600), labels=c("~10th pctile (120 stu.)", "Median (326 stu.)", "~90th pctile (600 stu.)")) + theme_minimal(base_size=16) ``` --- ## Displaying continuous interactions <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" /> --- ## Can also add a confidence band ```r cont + geom_ribbon(aes(ymin=fitted-1.96*se.fitted, ymax=fitted+1.96*se.fitted, fill=as.factor(school_enroll)), alpha=0.3, linetype=0, show.legend = F) ``` <img src="EDUC643_13_interactions_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" /> --- class: middle, inverse # Synthesis and wrap-up --- ### Synthesize interactions - **Statistical interactions are ubiquitous** + An interaction tells us that the relationship between one predictor and the outcome differs by levels of another + The standard regression model, which initially assumes that there are no interactions (the main effects assumption), can be easily modified to accommodate their presence + Many substantive theories suggest that relationships will be interactive -
**Test for a statistical interaction by including a cross-product term** + The cross-product is literally the product of the two constituent variables + When incorporating an interaction (moderation) term, be careful about: - Removing the main effects - Statistical power + Graph out the fitted model to ensure correct interpretation - Explore the `sjPlot` package for more ways to visualize interactions - **More learning (beyond this course)** + Centering and/or standardizing variables can aid with interpretation + The `contrast` and `emtrends` functions in the `emmeans` package can quickly test whether fitted values at different levels of predictor and moderator are significantly different - **Predictors can interact with themselves** (next sub-unit: non-linearities) --- # Goals of the unit - Describe the main effects assumption and how this assumption can be relaxed using the statistical interaction model - Describe in writing and verbally the concept of statistical interaction - Estimate and interpret regression models with interactions between categorical and continuous predictors - Estimate and interpret regression models with interactions between categorical and categorical predictors - Estimate and interpret regression models with interactions between continuous predictors - Visualize interaction effects graphically - Describe statistical power and Type II error challenges resulting from interactions .gray[ - Describe in writing and verbally the assumptions we violate when we fit a non-linear relationship with a linear model - Transform non-linear relationships into linear ones by using logarithmic scales - Estimate regression models using logarithmic scales and interpret the results - Estimate models with quadratic and higher-order polynomial terms (special kinds of interactions) - Select between transformation options ] --- # To-Dos ### Reading: - **Finish by Feb. 27**: LSWR Chapter 16.2 ### Assignment 3: - Due Feb.
28, 12:01pm (noon) ### Assignment 4 (last one!!!): - Just on interactions (not non-linearity) - Due Mar. 10, 12:01pm (noon) ### Quiz 4: - **NOW!!** Due 5pm tomorrow, Feb. 26