class: center, middle, inverse, title-slide

.title[
# EDLD 650 Review Session
]
.author[
### David D. Liebowitz
]

---

<style type="text/css">
.inverse {
  background-color : #2293bf;
}
</style>

# SAT score and income

<img src="EDLD_650_Review_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

# A closer look

<img src="EDLD_650_Review_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

# A closer look

<img src="EDLD_650_Review_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

# Graphical fixed effects

```r
library(dplyr)
# Each school's mean SAT, then each student's deviation from that mean
df <- df %>% group_by(school) %>% mutate(mean_SAT = mean(SAT))
df <- df %>% mutate(demean_SAT = SAT - mean_SAT)
```

--

<img src="EDLD_650_Review_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

# Graphical fixed effects

```r
# Each school's mean income at 30, then each student's deviation from that mean
df <- df %>% group_by(school) %>% mutate(mean_income = mean(income30))
df <- df %>% mutate(demean_income = income30 - mean_income)
```

--

<img src="EDLD_650_Review_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

# Graphical fixed effects

```r
# Each school's mean income at 30, then each student's deviation from that mean
df <- df %>% group_by(school) %>% mutate(mean_income = mean(income30))
df <- df %>% mutate(demean_income = income30 - mean_income)
```

<img src="EDLD_650_Review_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

# A naive regression

```r
summary(lm(income30 ~ SAT, data = df))
```

```
...
#>
#> Call:
#> lm(formula = income30 ~ SAT, data = df)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -4815.2  -957.9   -64.6   961.1  5934.6
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 11769.532    573.360   20.53   <2e-16 ***
#> SAT            76.573      1.146   66.83   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
...
```

--

`$$INCOME_{i} = \beta_{0} + \beta_{1} SAT_{i} + \varepsilon_{i}$$`

---

# Adjust for school

```r
summary(lm(income30 ~ SAT + school, data = df))
```

```
...
#> lm(formula = income30 ~ SAT + school, data = df)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -3652.7  -720.8     9.6   701.0  4041.7
#>
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)
#> (Intercept)    48399.823    843.842  57.357   <2e-16 ***
#> SAT               -1.761      1.777  -0.991    0.322
#> schoolSchool 2  2494.792     66.184  37.695   <2e-16 ***
#> schoolSchool 3  5078.168    101.285  50.137   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
...
```

--

`$$INCOME_{ij} = \beta_{0} + \beta_{1} SAT_{ij} + \beta_{2} SCHOOL2_{j} + \beta_{3} SCHOOL3_{j} + \varepsilon_{ij}$$`

---

# Cluster-adjusted SEs

```r
summary(fixest::feols(income30 ~ SAT | school, data = df))
```

```
#> OLS estimation, Dep. Var.: income30
#> Observations: 3,000
#> Fixed-effects: school: 3
#> Standard-errors: Clustered (school)
#>     Estimate Std. Error   t value Pr(>|t|)
#> SAT -1.76127    1.81864 -0.968456  0.43498
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1,076.5     Adj. R2: 0.781517
#>                 Within R2: 3.28e-4
```

--

**What happened to the standard errors? Is that right? Why or why not?**

---

# Fixed vs. random

### Fixed:

`$$INCOME_{ij} = \beta_{0} + \beta_{1} SAT_{ij} + \beta_{2} SCHOOL2_{j} + \beta_{3} SCHOOL3_{j} + \varepsilon_{ij}$$`

--

### Random:

`$$INCOME_{ij} = \gamma_{0} + \gamma_{1} SAT_{ij} + (\nu_{j} + \varepsilon_{ij})$$`

---

### Random intercepts

`$$INCOME_{ij} = \gamma_{0} + \gamma_{1} SAT_{ij} + \color{red}{(\nu_{j} + \varepsilon_{ij})}$$`

There is now a composite residual with two parts *for each individual*: a school-level residual `\((\nu_{j})\)` and an individual-level residual `\((\varepsilon_{ij})\)`. Each residual term can be summarized by its variance/SD and by its correlation with the other residual terms.

--

`\(\nu_{j}\)`: unique to school *j*; identical for all students in the *j*th school; iid across schools

`\(\varepsilon_{ij}\)`: unique to student *i* in school *j*; iid across students

--

Can calculate the **intra-class correlation**, *a summary of the proportion of outcome variability attributable to differences across schools*, where `\(\sigma^{2}_{s}\)` is the variance of the school-level residual and `\(\sigma^{2}_{i}\)` is the variance of the individual-level residual:

`$$\hat{\rho} = \frac{\sigma^{2}_{s}}{\sigma^{2}_{s}+\sigma^{2}_{i}}$$`

--

.small[Random effects estimation procedures fit the model iteratively: they maximize a likelihood (maximum likelihood or restricted maximum likelihood, REML) by numerical optimization, estimating the variance at each hierarchical level of the model along with the coefficients.]

---

# Estimate random intercept model

```r
summary(lme4::lmer(income30 ~ SAT + (1 | school), data = df))
```

```
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: income30 ~ SAT + (1 | school)
#>    Data: df
#>
#> REML criterion at convergence: 50411
#>
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.3911 -0.6687  0.0083  0.6518  3.7524
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  school   (Intercept) 6440493  2538
#>  Residual             1160413  1077
#> Number of obs: 3000, groups:  school, 3
#>
#> Fixed effects:
#>              Estimate Std. Error t value
#> (Intercept) 50892.981   1713.254  29.705
#> SAT            -1.699      1.776  -0.957
#>
#> Correlation of Fixed Effects:
#>     (Intr)
#> SAT -0.518
```

---

# Treatment effect?

```
...
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.3911 -0.6687  0.0083  0.6518  3.7524
#>
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  school   (Intercept) 6440493  2538
#>  Residual             1160413  1077
#> Number of obs: 3000, groups:  school, 3
#>
#> Fixed effects:
#>              Estimate Std. Error t value
#> (Intercept) 50892.981   1713.254  29.705
#> SAT            -1.699      1.776  -0.957
#>
#> Correlation of Fixed Effects:
#>     (Intr)
#> SAT -0.518
...
```

--

**What's different/same compared to fixed effect estimates?**
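
---

# Computing the ICC

A minimal sketch of the intra-class correlation for the random intercept model, assuming the two variance components are copied by hand from the `lmer()` output shown earlier. The names `var_school` and `var_student` are illustrative, not objects from the original analysis.

```r
# Variance components as reported under "Random effects" in the lmer() output
var_school  <- 6440493  # school (Intercept) variance
var_student <- 1160413  # Residual (student-level) variance

# ICC: share of residual outcome variance attributable to differences across schools
icc <- var_school / (var_school + var_student)
print(round(icc, 3))
```

The ICC comes out near 0.85: conditional on SAT, most of the remaining variation in income lies between schools rather than within them. With the fitted model saved to an object (say, a hypothetical `fit`), the same two variances should be available in the `vcov` column of `as.data.frame(lme4::VarCorr(fit))`.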