class: center, middle, inverse, title-slide .title[ # An Introduction to Causal Inference ] .subtitle[ ## EDLD 650: Advanced Research Methods Seminar ] .author[ ### David D. Liebowitz ] --- <style type="text/css"> .inverse { background-color : #2293bf; } </style> # Why causal research? (I) .pull-left[ <img src="museum.jpg" width="90%" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="museum2.jpg" width="120%" style="display: block; margin: auto;" /> ] --- # Why causal research? (II) - .large[**Abstract**]: We estimate the relationship between X and Y. - .large[**Intro**]: It would be important to know whether X causes Y. - .large[**Data and Analytic Strategy**]: Our data and research design are observational, and so we are unable to identify the causal impact of X on Y. - .large[**Results**]: We find that a one-percentage point difference in X is associated with a 4.5 percentage point difference in Y. - .large[**Discussion**]: A major limitation of our study is that we cannot rule out the possibility of confounders or reverse causality. Thus, while we cannot say whether X causes Y, our findings show this is a strong possibility and future research should explicitly explore it. - .large[**Conclusion**]: But really 😉, X causes Y. --- # Why .red[*careful*] causal research? .pull-left[ <img src="cuddy.jpg" width="90%" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="cuddy_art.jpg" width="90%" style="display: block; margin: auto;" /> ] -- <img src="cuddy_art2.jpg" width="90%" style="display: block; margin: auto;" /> --- ## Descriptive and causal research > **Quality causal research question**: Did the Success for All whole-school intervention improve students' reading achievement? -- > **Quality descriptive research question**<sup>1</sup>: Do the teachers of English Learner students in self-contained classrooms have different pedagogical skill levels than teachers of non-English Learners? .footnote[[1] Helpful resource: Loeb et al. (2017). [Descriptive analysis in education: A guide for researchers.](../readings/Loeb et al 2017.pdf) (NCEE 2017-4023). Washington, DC: US DoE, IES] -- .pull-left[ .red[**Don't attempt to answer a question that is inherently (or implicitly) causal using a correlational approach!** .small[*We only care about the relationship between museum-going and mortality if it is a directionally causal one!*]] ] .pull-right[ <img src="lanes.jpg" width="30%" style="display: block; margin: auto;" /> ] -- .large[**The overarching goal of this course**]: To provide you with (some of) the tools to be effective consumers and producers of causal research --- # Roadmap <img src="causal_id.jpg" width="1707" style="display: block; margin: auto;" /> --- # Agenda 1. **Introduction** - Correlation `\(\neq\)` causality - Roadmap - Agenda/goals 2. **A Causal Framework** - Experiments and potential outcomes - Class 1 Questions (Sections I and II) - Complexificating it - A word about DAGs 3. **Break** 4. **Nested data** - Class 1 Questions (Section III) 5. **Difference-in-differences** 6. **Conclusions** - Key course expectations & logistics - To-dos - Plus/deltas --- # Goals for today 1. Articulate in words and simple graphical representations challenges in identifying causal relationships in quantitative data 2. Articulate in words and using simple mathematical terms a framework for identifying causal relationships in quantitative data 3. 
Describe (conceptually) unit fixed effects and their strengths (and limitations) in research designs seeking to identify causal relationships 4. Describe the conceptual approach to identifying causal effects using the difference-in-differences framework --- class: middle, inverse # Causal frameworks --- # 5 conditions of causal claims [William Shadish, Thomas Cook and Donald Campbell (2002)](https://books.google.com/books/about/Experimental_and_Quasi_experimental_Desi.html?id=o7jaAAAAMAAJ) adapt John Stuart Mill's critical conditions that must exist in order to defend the claim that one thing causes another: .pull-left[ 1. Cause must precede effect in time 2. Identified mechanism 3. Consistency 4. Responsiveness 5. No plausible alternative explanation ] .pull-right[ <img src="shadish.jpg" width="80%" style="display: block; margin: auto;" /> ] --- class: middle, inverse # Experiments and potential outcomes --- # Sliding doors .pull-left[ * What if you missed your train (or didn't)? * What if you had never been born? * What if the Beatles never existed? * What if the Nazis won WWII? ] .pull-right[ <img src="counterfactuals.jpg" width="963" style="display: block; margin: auto;" /> ] --- # An "ideal" experiment Hypothetically, we could draw a random sample from a defined population: .pull-left[ <img src="group-stick-people.jpg" width="60%" style="display: block; margin: auto;" /> ] .pull-right[ - We could .blue[**implement the treatment**] for each participant - And also concurrently .blue[**NOT implement the treatment**] - .small[We would need to be able to turn back time, and erase the impact and memory of the treatment in each case] ] -- While this is obviously impossible, we can imagine that each participant has a value of the outcome that could .blue[**potentially**] be revealed under the following experimental conditions: `\(Y_{i}^{1}\)` = potential value of outcome for `\(i^{th}\)` person, when treated `\((D_{i} = 1)\)` `\(Y_{i}^{0}\)` = potential value of outcome for `\(i^{th}\)` person, when .blue[**NOT**] treated `\((D_{i} = 0)\)` --- # An "ideal" experiment `\(Y_{i}^{1}\)` = potential value of outcome for `\(i^{th}\)` person, when treated `\((D_{i} = 1)\)` `\(Y_{i}^{0}\)` = potential value of outcome for `\(i^{th}\)` person, when .blue[**NOT**] treated `\((D_{i} = 0)\)` The .blue[**Individual Treatment Effect (ITE)**] is the difference in potential outcome values between treatment and control conditions, for each individual: `$$ITE_{i} = Y_{i}^{1} - Y_{i}^{0}$$` -- .red[**We never actually observe this!!!**] -- The .blue[**Average Treatment Effect (ATE)**] is the average of the individual treatment effects across all participants: `$$\hat{ATE} = \frac{1}{n}{\sum_{i}^n ITE_{i}}$$` -- If the ATE differed from zero, we could claim that the treatment *caused* the effect because there would be no other explanation for the differences detected between the treatment and control conditions! --- # RCTs: the next best thing? An "ideal" experiment such as this one is impossible because the same group of people cannot concurrently receive and not receive treatment. .red[**We have a missing data problem**]. We cannot actually estimate .blue[individual treatment effects] in practice, but if we are willing to make a few reasonable assumptions, we can still estimate the .blue[average treatment effect]. This is particularly true when we conduct a .blue[randomized control trial (RCT)].
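Here is a minimal simulated sketch of that logic (entirely made-up data and effect sizes, not from any study): we generate *both* potential outcomes for every participant, which is impossible with real data, and then show that random assignment lets the simple difference in observed group means recover the ATE.

```r
# A sketch with simulated potential outcomes -- hypothetical numbers, not real data
set.seed(650)
n  <- 100000
y0 <- rnorm(n, mean = 50, sd = 10)         # potential outcome if untreated
y1 <- y0 + rnorm(n, mean = 5, sd = 3)      # potential outcome if treated; true ATE = 5
mean(y1 - y0)                              # the ATE, computable only because we simulated both

d     <- rbinom(n, 1, 0.5)                 # exogenous, random assignment to treatment
y_obs <- ifelse(d == 1, y1, y0)            # in reality we observe only ONE outcome per person
mean(y_obs[d == 1]) - mean(y_obs[d == 0])  # difference in observed means recovers ~5
```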
-- .pull-left[ <img src="group-stick-people.jpg" width="60%" style="display: block; margin: auto;" /> ] .pull-right[ We can draw our random sample, and randomly assign each participant to the .blue[**Treatment**] (where we measure their value of `\(Y_{i}^{1}\)` ) or .red[**Control**] (where we measure their value of `\(Y_{i}^{0}\)` ) condition. ] -- `$$\hat{ATE} = \frac{1}{n_{1}}{\sum_{i:D_{i}=1} Y_{i}^{1}} - \frac{1}{n_{0}}{\sum_{i:D_{i}=0} Y_{i}^{0}}$$` --- # Importance of exogeneity The big idea in a randomized experiment is that treatment variation is .blue[**exogenously and randomly assigned**]. An external (or "exogenous") agent, usually the researcher, determines who is treated `\((D_{i} = 1)\)` and who is not `\((D_{i} = 0)\)`. .pull-left[ - .small[Values of all observed and unobserved characteristics of the participants are randomized across treatment and control groups.] - .small[Members of the treatment and control groups are then equivalent, on average, in the population (.blue[“equal in expectation”]) before the experiment begins, on every possible dimension.] - .small[The values of the treatment variable, D, will also be completely uncorrelated with all characteristics of participants, observed and unobserved, in the population.] ] .pull-right[ <img src="researcher.jpg" width="70%" style="display: block; margin: auto;" /> - Exogenous and random treatment variation validates the causal attribution of an experiment. This is referred to as the research design's .blue[**internal validity**]. ] --- # A simple *t*-test The great thing about experiments is that the cleaner the design, the simpler the analysis: Population average treatment effect: `\(\mu_{1} - \mu_{0}\)` Estimated by the sample mean difference: `\(\bar{Y_{1}} - \bar{Y_{0}}\)` To test for a treatment effect, conduct a two-sample *t*-test: `$$t_{obs} = \frac{(\bar{Y_{1}} - \bar{Y_{0}})}{\sqrt{\frac{s^2}{n_1}+\frac{s^2}{n_0}}}$$` `$$s^2 = \frac{(n_1 - 1)s_{1}^2 + (n_{0} - 1)s_{0}^2}{n_1 + n_0 -2}$$` `\(t_{crit} = t_{df = n_1 + n_0 -2}^{(\alpha = 0.05)}\)` ; if `\(|t_{obs}| > t_{crit}\)`, then reject `\(H_0\)`!!! -- No need for a pre-test, no need for controls, no need for complex statistical models! --- # But OLS works too In an experiment, a critical assumption of the general linear model (the foundation for OLS) is automatically satisfied: `$$Y_{i} = \beta_{0} + \beta_{1}D_{i} + \varepsilon_{i}$$` In a randomized experiment, the residuals are uncorrelated with the values of the treatment variable `\((D_{i})\)` because the values of the treatment variable are assigned at random, rendering them uncorrelated with everything, including the residuals. -- .pull-left[*Reminder of key OLS assumption*: residuals must be .blue[**"independent and identically distributed" (i.i.d.)**]. By independent we mean residuals must be uncorrelated with everything else, including the predictor(s) in the model, otherwise our estimates of the regression parameters will be .red[**biased**].] .pull-right[ <img src="bias.jpg" width="90%" style="display: block; margin: auto;" /> ] --- # But OLS works BETTER! Even in the most basic of well-executed RCTs, researchers will add covariates.
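Why does this help? A quick simulated sketch (made-up data; `x` stands in for a hypothetical baseline covariate such as a pre-test): the treatment estimate is unbiased either way, but the covariate soaks up residual variance, so the standard error on the treatment effect shrinks.

```r
# Simulated RCT sketch: adding a prognostic covariate shrinks the standard error
set.seed(650)
n <- 1000
x <- rnorm(n)                     # hypothetical baseline covariate (e.g., a pre-test)
d <- rbinom(n, 1, 0.5)            # randomly assigned treatment
y <- 2 + 1 * d + 3 * x + rnorm(n) # true treatment effect = 1

summary(lm(y ~ d))$coefficients["d", ]      # unbiased, but a larger standard error
summary(lm(y ~ d + x))$coefficients["d", ]  # still ~1, with a smaller SE (more power)
```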
-- .pull-left[ <img src="variation_expl.jpg" width="120%" style="display: block; margin: auto;" /> ] .pull-right[ `$$Y_{i} = \beta_{0} + \beta_{1}D_{i} + \color{red}{\varepsilon_{i}}$$` .small[Once you add X, the part of Y that is now predicted by X (but wasn't predicted by D, by design) is no longer part of the residual] `$$Y_{i} = \beta_{0} + \beta_{1}D_{i} + \beta_{2}X_{i} + \color{purple}\varepsilon_{i}^{\prime}$$` .small[Reduced residual variance means smaller standard errors, larger *t*-statistics and **MAWWR POWER**!!!] ] --- # Cold-calling .pull-left[ .purple[**Purpose**] - Formative assessment - Fair distribution of participation - Shared accountability for deep understanding of complex and technical readings .purple[**Norms**] - Questions posted by Wednesday - Preparation is expected - These are hard concepts; mistakes are expected - Judgments on accuracy of responses are about the responses, not the individual - Questions and responses are about learning, not performance ] .pull-right[ .purple[**Structure**] - All cold calls will be telegraphed - Questions will come directly from question list - Random draw (w/ replacement) from class list - Ample wait time; multiple "at-bats" - Teaching staff will identify incomplete or incorrect responses and seek clarification - Extension questions on a volunteer basis ] --- class: middle, inverse # Class 1 Discussion Questions ## Sections I and II --- class: middle, inverse # More complexity --- ## Threats to experimental validity .large[.purple[**1. Contamination of treatment-control contrast**]] - violations of the Stable Unit Treatment Value Assumption (SUTVA) - an important assumption: the selection of others into an intervention should not affect your outcome .large[.purple[**2. Cross-overs (aka non-compliance)**]] .large[.purple[**3. Attrition**]] .large[.purple[**4. Participation in experiment affects behavior**]] - Hawthorne and John Henry effects -- > There is much to explore in these threats to validity. We will address some in the Instrumental Variables unit, but they could fill entire courses. --- # Keep it real Of course, in the real world, there are many reasons researchers are unable to conduct experiments: .pull-left[ - Cost - Time - Willing partners - Ethics - Representativeness - Power - ... ] .pull-right[ <img src="realworld.jpg" width="90%" style="display: block; margin: auto;" /> ] -- Thus, in this course, we will primarily concern ourselves with the goal of .blue[**recovering credibly causal estimates of treatment effects in observational data**]. -- But this is **hard**. --- ## Correlation `\(\neq\)` causation pt. 562 RQ: What is the relationship between Oregon's annual per capita divorce rate and the U.S. per capita annual beef consumption? -- <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> *On the 10 o'clock news tonight: does U.S. beef consumption cause more "beefs" between Oregonians and their spouses?* --- # Divorce and Beef Do increases in U.S. beef consumption **cause** increases in Oregon's divorce rate? -- <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> -- This is a classic problem of a .blue[**confounder**]!<sup>1</sup> .footnote[[1] More fun with [spurious correlations](https://www.tylervigen.com/spurious-correlations)]
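A tiny simulation sketch (entirely made-up variables and magnitudes) of what a confounder does here: a common trend `z` drives both series, manufacturing a strong correlation even though neither series causes the other.

```r
# Toy confounder sketch: a shared trend produces correlation with no causal link
set.seed(650)
z       <- 0.5 * (1:50) + rnorm(50)   # the confounder: something trending over time
beef    <- 10 + 0.8 * z + rnorm(50)   # "beef consumption" driven only by z
divorce <- 4 + 0.3 * z + rnorm(50)    # "divorce rate" driven only by z
cor(beef, divorce)                    # strong correlation, zero causal connection
coef(lm(divorce ~ beef + z))["beef"]  # conditioning on the confounder: ~0
```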
--- ## Why correlation `\(\neq\)` causation? Common barriers in attributing causality to observed co-relationships include: - .purple[Confounders]: a third variable causes changes in X and also in Y - [Colliders](http://www.the100.ci/2017/03/14/that-one-weird-third-variable-problem-nobody-ever-mentions-conditioning-on-a-collider/): a third variable that is caused by both the predictor and the outcome; conditioning on it can induce a spurious relationship or make a true causal relationship disappear! - .purple[Reverse causation]: X may cause Y **or** Y may cause X - .purple[Simpson's Paradox]: a third variable may reverse the correlation - Also, **lack** of correlation `\(\neq\)` **lack** of causality <img src="causalinf.jpg" width="40%" style="display: block; margin: auto;" /> h/t [@causalinf](https://twitter.com/causalinf) --- ## Directed acyclic graphs (DAGs) [Directed Acyclical Graphs (DAGs)](https://journals.sagepub.com/doi/pdf/10.1177/2515245917745629) can help us visualize the assumptions necessary to estimate causal relationships in observational data through graphical representation. <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- # Spurious correlation <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> -- > .small[*It is easy to prove that the wearing of tall hats and the carrying of umbrellas enlarges the chest, prolongs life, and confers comparative immunity from disease...A university degree, a daily bath, the owning of thirty pairs of trousers, a knowledge of Wagner’s music, a pew in church, anything, in short, that implies more means and better nurture…can be statistically palmed off as a magic spell conferring all sorts of privileges...The mathematician whose correlations would fill a Newton with admiration, may, in collecting and accepting data and drawing conclusions from them, fall into quite crude errors by just such popular oversights.* -George Bernard Shaw (1906)] --- # A DAG-gone example <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> - Directed Acyclical Graphs (DAGs) help us visualize the assumptions necessary to estimate causal relationships in observational data - .red-pink[**Nodes**] represent variables; .purple[**arrows**] represent directional causal effects; missing arrow implies lack of a causal path - Effects are either: - direct `\((D \rightarrow Y)\)`; i.e., the causal effect of D (college) on Y (earnings); **or** --- # A DAG-gone example <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> - Directed Acyclical Graphs (DAGs) help us visualize the assumptions necessary to estimate causal relationships in observational data - .red-pink[**Nodes**] represent variables; .purple[**arrows**] represent directional causal effects; missing arrow implies lack of a causal path - Effects are either: - direct `\((D \rightarrow Y)\)`; i.e., the causal effect of D (college) on Y (earnings); or - indirect `\((D \leftarrow X \rightarrow Y)\)`; i.e., a backdoor path created by a confounder - Here, conditioning on X (observed family characteristics) closes the backdoor and allows a causal estimate
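If you want to check a conditioning strategy mechanically, the `dagitty` package (not required for this course; shown only as an optional sketch) can encode the DAG above and report the minimal adjustment set:

```r
# Optional sketch: encode the simple college/earnings DAG and ask for adjustment sets
library(dagitty)
g <- dagitty("dag { X -> D ; X -> Y ; D -> Y }")   # X = family characteristics
adjustmentSets(g, exposure = "D", outcome = "Y")   # returns { X }: condition on X
```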
--- # DAGs follow two rules .blue[**Rule 1: No bidirectional arrows**] .red[**ILLEGAL!! Not "directed"**] <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" /> -- .blue[**Rule 2: No feedback loops**] .red[**ILLEGAL!! Not "acyclic"**] <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" /> --- # Confounders <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> We often hope that conditioning on the confounder closes **all** backdoor paths and thus allows us to estimate the direct effect of D on Y: - `\(D \rightarrow Y\)`: causal effect of D on Y - `\(D \leftarrow I \rightarrow Y\)`: income influences both college and earnings - `\(D \leftarrow PE \rightarrow I \rightarrow Y\)`: parental education influences family income which influences own earnings - `\(D \leftarrow X \rightarrow PE \rightarrow I \rightarrow Y\)`: .small[unobserved background characteristics influence parental education, family income, college attendance and own earnings] --- # Confounders <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> We often hope that conditioning on the confounder closes **all** backdoor paths and thus allows us to estimate the direct effect of D on Y: - `\(D \rightarrow Y\)`: causal effect of D on Y - `\(D \leftarrow I \rightarrow Y\)`: income influences both college and earnings - `\(D \leftarrow PE \rightarrow I \rightarrow Y\)`: parental education influences family income which influences own earnings - `\(D \leftarrow X \rightarrow PE \rightarrow I \rightarrow Y\)`: .small[unobserved background characteristics influence parental education, family income, college attendance and own earnings] - .red[**BUT**] is it true that family background has no direct effect on earnings? --- # Colliders <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> - Career choice is a .blue[collider]. - No need to condition on it as the backdoor path is already closed - .red[Leave colliders alone!] Beware of conditioning on them and thereby opening non-causal paths and introducing bias. - Here, doing so might underestimate the effect of going to college --- ## Where DAGs get tricky (for me) <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" /> -- .small[DAGs can be an intuitive and careful way of thinking through causal research design (see [Pearl, 2009](http://bayes.cs.ucla.edu/BOOK-2K/)). They also risk encouraging the researcher to believe she can solve by analysis what is broken by design (see [Imbens, 2020](https://arxiv.org/abs/1907.07271)). In this class, we'll use the .blue[**potential outcomes framework**] and rely on research designs in which we can credibly argue that .blue[**assignment to treatment is exogenous or based on observable characteristics**], but concepts such as confounders, colliders and closing backdoor paths are valuable parts of your toolkit. You can learn much more about DAGs than I have presented here in our SEM sequence (EDLD 633/634)!] --- class: middle, inverse # Break --- class: middle, inverse # Nested Data --- # What is nested data? .small[Recall the Success for All evaluation from *Methods Matter*]<sup>1</sup>

```r
library(haven)  # for read_dta()
library(here)   # for project-relative file paths
ch7_sfa <- read_dta(here("data/ch7_sfa.dta"))
```

```
...
#>   schid stuid     wattack sfa   ppvt  sch_ppvt
#>   <fct> <dbl>     <dbl>   <fct> <dbl> <dbl>
#> 1 1     10158087  469     1     89    90.6
#> 2 1     10217961  486     1     83    90.6
#> 3 1     10486718  501     1     90    90.6
...
```

```
...
#>   schid stuid     wattack sfa   ppvt  sch_ppvt
#>   <fct> <dbl>     <dbl>   <fct> <dbl> <dbl>
#> 1 41    31979390  473     0     78    83.6
#> 2 41    31989400  485     0     65    83.6
...
```

.footnote[[1] Most datasets from *MM* available from [UCLA stats site](https://stats.idre.ucla.edu/other/examples/methods-matter/).] --- # Modeling nested data .large[**Physical nesting**] - Our data can be nested in multiple units: students inside classrooms, classrooms inside schools, schools inside districts, districts inside states, etc. .large[**Conceptual nesting**] - If we observe students across multiple years, we will have multiple observations nested inside students - If we administer assessments multiple times, we will have tests nested inside students .blue[Each of these forms of nesting has implications for how we model treatment effects (and on our standard errors).] In the SfA example, we want to capture the effect of receiving the SfA treatment, .blue[**independent of the effect of the unobserved and observed qualities of the school the student attends**]. --- # Two common approaches .large[.purple[**Random intercepts (aka random effects)**]] `$$WATTACK_{ij} = \gamma_{0} + \gamma_{1}SFA_{j} + (\varepsilon_{ij} + \nu_{j})$$` You may also have seen this written as: `$$WATTACK_{ij} = \gamma_{00} + \gamma_{01}SFA_{j} + (\varepsilon_{ij} + \nu_{0j})$$` .red[**THESE ARE IDENTICAL!**] --- # Two common approaches .large[.purple[**Random intercepts (aka random effects)**]] `$$WATTACK_{ij} = \gamma_{0} + \gamma_{1}SFA_{j} + (\varepsilon_{ij} + \nu_{j})$$` You may also have seen this written as: `$$WATTACK_{ij} = \gamma_{00} + \gamma_{01}SFA_{j} + (\varepsilon_{ij} + \nu_{0j})$$` .large[.purple[**Fixed intercepts (aka fixed effects)**]] `$$WATTACK_{ij} = \sum_{j=1}^{J}\alpha_{j}S_{ij} + \gamma_{1}SFA_\color{red}{ij} + \varepsilon_{ij}$$` .red[Notice the within-school variation in treatment in this hypothetical example] > A note on notation: fixed effects are often represented with capital Greek letters `\((e.g., \Gamma_j, \Pi_t, \Delta_k)\)`. Vectors of covariates are often represented with vector notation `\((e.g., \textbf{X}_{ij})\)` --- # What is a fixed effect doing? <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-32-1.gif" style="display: block; margin: auto;" /> .small[h/t [@nickchk](https://twitter.com/nickchk)] --- # Random v. Fixed Effects .small[

|             | Random effects | Fixed effects |
|-------------|----------------|---------------|
| Strengths   | - Minimal loss of power <br> - Preserves (almost all of) outcome variance | - Accounts for observed and unobserved, time-invariant, within-group differences <br> - Reduces outcome variance to only that relevant to estimating treatment effect |
| Limitations | - Introduces bias if any correlation between predictors and group-level residuals <br> - Less transparent (more complex) interpretation | - Sacrifices degrees of freedom <br> - Cannot have hierarchically nested fixed effects <br> - Cannot have fixed effect collinear with level of treatment <br> - Cannot include adjustments ("controls") that are invariant within unit |

]
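To make the last limitation concrete, here is a brief sketch using the `ch7_sfa` data loaded earlier (in these data, SfA was assigned at the school level, so the treatment indicator does not vary within schools):

```r
# Sketch: school fixed effects are collinear with a school-level treatment indicator
fe <- lm(wattack ~ factor(schid) + sfa, data = ch7_sfa)
coef(fe)["sfa1"]         # NA: the school dummies fully absorb the school-level treatment

re <- lme4::lmer(wattack ~ sfa + (1 | schid), data = ch7_sfa)
lme4::fixef(re)["sfa1"]  # the random-intercept model can still estimate the SfA contrast
```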
--- # Random v. Fixed Effects ### Some guidelines: - Preference should be informed by data structure, analytic strategy and context<sup>1</sup> - In both cases, need to pay attention to how you calculate standard errors - Often disciplinary preferences - Generally, with long panels (many w/in grouping unit observations) and in non-experimental settings where we seek to estimate treatment effects, fixed effects are preferable .footnote[[1] .small[See Clark & Linzer [(PSRM, 2015)](../readings/Clark Linzer 2015.pdf) for a short, minimally technical, summary. *Note*: mixed models with both fixed- and random-intercepts are possible as well as are many other multi-level models (random slopes, random slopes and intercepts, etc.). Consider taking our multi-level modeling sequence (EDLD 628/629) to learn more.]] --- ## Random intercepts application

```r
sfa <- lme4::lmer(wattack ~ sfa + (1 | schid), data=ch7_sfa)
summary(sfa)
```

```
...
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  schid    (Intercept)  75.69    8.70
#>  Residual             314.23   17.73
#> Number of obs: 2334, groups:  schid, 41
#>
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  475.302      2.035 233.616
#> sfa1           4.366      2.844   1.535
#>
#> Correlation of Fixed Effects:
#>      (Intr)
#> sfa1 -0.715
...
```

--- ## Random intercepts application

```
...
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
*#>  schid    (Intercept)  75.69    8.70
*#>  Residual             314.23   17.73
#> Number of obs: 2334, groups:  schid, 41
#>
#> Fixed effects:
#>             Estimate Std. Error t value
*#> (Intercept)  475.302      2.035 233.616
*#> sfa1           4.366      2.844   1.535
#>
#> Correlation of Fixed Effects:
#>      (Intr)
#> sfa1 -0.715
...
```

Compare the intra-class correlation (ICC) `\((\hat{\rho})\)` w/ Table 7.1 in *MM* (p. 114): `$$\hat{\rho} = \frac{75.69}{75.69 + 314.23} = 0.194$$` --- class: middle, inverse # Class 1 Discussion Questions ## Section III 1. Review your answers to Section III 2. Revise any of your answers based on the information from the past slides 3. What is still unclear? Turn-and-talk with a neighbor to see if you can gain clarity 4. We will share out any outstanding questions for the group to answer --- # Roadmap <img src="causal_id.jpg" width="110%" style="display: block; margin: auto;" /> --- class: middle, inverse # Difference-in-differences (DD) --- ## S. London cholera outbreak 1854 .pull-left[ - London was a crowded, dirty city w/ waste disposed of directly into the Thames River - Disease poorly understood; cholera widely believed to be caused by miasma & contagion - Outbreak in S. London in summer of 1854 killing over 5,000 - Followed an earlier outbreak in 1849 that had killed >6,000 - Physician John Snow had developed a theory that these illnesses were water-borne and set out to prove it ] .pull-right[ <img src="snow.jpg" width="1079" style="display: block; margin: auto;" /> ] --- # The "Grand Experiment" (I) .pull-left[ - Water is supplied to households by competing private companies: 1. Southwark & Vauxhall 2. 
Lambeth - Southwark & Vauxhall water from Thames - Lambeth from Thames until 1852, then from Ditton (22 miles upstream) - Some portions of the city receive water from only one of the companies; others from both ] .pull-right[ <img src="london_water.jpg" width="1067" style="display: block; margin: auto;" /> ] --- # The "Grand Experiment" (II) .pull-left[ - When the companies supply the same area, water is distributed quasi-randomly across households - Snow tallies the deaths in all districts supplied by one, the other, or both companies as well as the deaths in the 1849 outbreak ] .pull-right[ <img src="mode_commun.jpg" width="1369" style="display: block; margin: auto;" /> ] .blue[**We will now pause this history lesson for a short methodological break**] --- # So many differences!!! What is one approach by which we might estimate the effects of a policy change or intervention?

|        | Treatment group |
|--------|-----------------|
| Before | `\(Y_{0}\)` |
| After  | `\(Y_{1}\)` |

-- Could just subtract the mean value of "before" levels of the outcome from the mean value of "after": `\(\Delta Y = \bar{Y_{1}} - \bar{Y_{0}}\)` .red[BUT], there could be lots of other things going on in between those two times! --- # The "difference" in DD What is one approach by which we might estimate the effects of a policy change or intervention?

|        | Treatment group | "Control" group |
|--------|-----------------|-----------------|
| Before | `\(Y_{0}^{D=1}\)` | `\(Y_{0}^{D=0}\)` |
| After  | `\(Y_{1}^{D=1}\)` | `\(Y_{1}^{D=0}\)` |

-- Difference-in-differences (DD) estimates are the difference of two differences: `\(\hat{ATE} = \color{blue}{(Y_{1}^{D=1} - Y_{0}^{D=1})} - \color{purple}{(Y_{1}^{D=0} - Y_{0}^{D=0})}\)` --- # Graphical DD <img src="EDLD_650_1_intro_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" /> --- # John Snow's DD **Table XII**. Deaths per 10,000 in homes served by Lambeth and Southwark & Vauxhall, 1849 and 1854

|               | Treatment = Lambeth | Control = S&V | Diff-in-Diff |
|---------------|---------------------|---------------|--------------|
| Before = 1849 | 85 | 135 | |
| After = 1854  | 19 | 147 | |
| Difference    | -66 | 12 | .blue[**-78**] |

-- .pull-left[ <img src="table12.jpg" width="495" height="60%" style="display: block; margin: auto;" /> ] .pull-right[ .small[Clients of Southwark & Vauxhall experienced more deaths per 10,000 in the 1854 cholera outbreak than in the 1849 one. The Lambeth clients, therefore, might have expected to have more also, but they had MANY fewer. The only thing that changed was the source of the Lambeth water. From this evidence, Snow claimed that **the only possible cause was the water!**] ] --- ## DD by regression We can get the same results for a two-period DD in a regression framework, which allows us to: - Add statistical adjustments (see previous discussion on value in experiments) - Model various functional forms, and more! `\(Y_{it} = \beta_{0} + \beta_{1}TREAT_{it} + \beta_{2}AFTER_{it} + \color{purple}{\beta_{3}}TREAT \times AFTER_{it} + \varepsilon_{it}\)` where *TREAT* = 1 if in the treatment group and = 0 if in the control group and ... *AFTER* = 1 if observed after the treatment occurred (even if you didn't experience the treatment) and *AFTER* = 0 if before treatment; **OR** `\(CHOLERA_{it} = \beta_{0} + \beta_{1}LAMBETH_{it} + \beta_{2}1854_{it} + \color{purple}{\beta_{3}}LAMBETH \times 1854_{it} + \varepsilon_{it}\)` Here, `\(\color{purple}{\beta_{3}}\)` is our causal parameter of interest.
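A minimal sketch of estimating `\(\color{purple}{\beta_{3}}\)` with `lm()`, using the four group-level rates from Table XII above (group means only, so this toy version reproduces the point estimate but not useful standard errors):

```r
# Sketch: two groups x two periods; the interaction term is the DD estimate
dd <- data.frame(
  deaths = c(85, 19, 135, 147),   # deaths per 10,000, from Table XII
  treat  = c(1, 1, 0, 0),         # 1 = Lambeth, 0 = Southwark & Vauxhall
  after  = c(0, 1, 0, 1)          # 1 = 1854 outbreak, 0 = 1849 outbreak
)
lm(deaths ~ treat * after, data = dd)   # coefficient on treat:after = -78
```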
We can interpret `\(\color{purple}{\beta_{3}}\)` as the causal effect, on the death rate of residents, of living in a home whose water supply was switched away from the Thames (the Lambeth supply) rather than continuing to be drawn from it. --- class: middle, inverse # Synthesis and wrap-up --- # Goals for today 1. Articulate in words and simple graphical representations challenges in identifying causal relationships in quantitative data 2. Articulate in words and using simple mathematical terms a framework for identifying causal relationships in quantitative data 3. Describe (conceptually) unit fixed effects and their strengths (and limitations) in research designs seeking to identify causal relationships 4. Describe the conceptual approach to identifying causal effects using the difference-in-differences framework --- # Key logistics .small[ - Review syllabus carefully - Prepare questions in advance (partner work encouraged) - Class canceled Jan. 15 for MLK Jr. Day; online video and submission of written responses to class questions - Review session + Review DD details + What else? (multi-level models? residuals/standard errors? notation?) + When? - Data Analysis and Replication Exercises (DAREs) - Project proposal by January 28 - Meet w/ teaching staff to discuss at least once - In-class scholarly presentation (March 11) - Written final research project (March 20; *optional feedback by March 13*) ] --- class: inverse # To-dos ### Week 2: Difference-in-differences ### Readings for next week: - Murnane & Willett, Chapter 8 - Dynarski (2003), Does aid matter? - Further, MHE: Ch. 5; 'Metrics: Ch. 5, Mixtape: Chs. 8 & 9 ### Assignments Due - Complete student survey on Canvas (Jan. 10) - Watch recorded video and submit written responses to Class 2 questions (Jan. 16) - DARE #1 due: 11:59pm January 21 --- # Feedback ## Plus/Deltas - What worked about today's class? - What could be improved or changed about the pedagogical process of today's class? ## Clear/Murky - What substantively is most clear to you or got clarified during class today? - What is the muddiest substantive topic for you? - .blue[*For today only, could you please indicate (a) what times you are available next week for a review session; (b) what topics would you like to see included in the review?*]