General Guidelines

The purpose of this assignment is to practice the concepts and vocabulary we have been modeling in class and implement some of the techniques we have learned. You may work on your own or collaborate with one partner. Please make sure that you engage in a a full, fair and mutually-agreeable collaboration if you do choose to collaborate. If you do collaborate, you should plan, execute and write-up your analyses together, not simply divide the work. Please make sure to indicate clearly when your work is joint and any other individual or resource (outside of class material) you consulted in your responses.

Submission Requirements

Please upload below two files on Canvas:

  1. An .html, .doc(x) or .pdf file that includes your typed responses (in your own words and not identical to anybody else’s except your partner), tables, and/or figures to the problems
  2. The .Rmd or .R file that you used to render the tables and figures in the above html/pdf.

If you have prior experience, you may choose to compose your answers in RMarkdown and render to a Word doc or pdf. If so, you can simply set up echo = TRUE and include = TRUE in your .Rmd file to allow both code and answers to display in a single output file. If you have no idea what the preceding sentence meant, ignore it.

Objectives of this assignment

Data Background

The dataset we’ll be using in this assignment is drawn from the NCRECE Teacher Professional Development Study (PDS) data. This professional development study was a randomized controlled evaluation of two forms of professional development (PD) - coursework (phase 1) and consultancy (phase 2) - delivered to about 490 early childhood education teachers across the nation. These PD supports aimed to improve teachers’ implementation of language/literacy activities and interactions with children, as well as promote gains in children’s social and academic development.

While the primary goals of the PD intervention were to improve language and literacy skills, researchers theorized that if teachers became more effective at teaching literacy, they might also improve the quality of their relationships with their students and families. In turn, these improved relationships might lead to other positive effects of the PD intervention. One pair of researchers (Hanno and Gonzalez, 2020) wanted to learn whether the PD intervention improved student attendance.

The paired assignments 1 and 2 will ask you to answer two research questions:

  1. Did teacher participation in a consultancy-based professional development decrease rates of student chronic absenteeism?
  2. Did the effects of teacher-participation in a consultancy-based professional development differ by students’ gender identities?

Sample. Our dataset is student-level data for 942 preschool students, which is the same sample used in the original Hanno and Gonzalez (2020) paper. The authors test several approaches to compute the outcome variable: chronic absenteeism. For simplifying reasons, the approach we use in our sample is one from one of their the robustness checks rather the one used in the main results of their paper.

Key variables. The outcome measure in our data is whether a student is “chronically absent” from school. In many jurisdictions, students are labelled “chronically absent” if they miss 10 percent or more of the school year. Given the typical 180-day school, students are, therefore, defined as chronically absent if they miss 18 or more days in a year.

Details on all variables are as follows:

Assignment Details

1. Read in the dataset

Open your RStudio, create a project and save it. Go to the root directory of the project and create folders named: “Code”, “Data”, “Figures” and “Tables.” Download the cat.csv dataset and store it in the folder “Data”. Create an R script (or .Rmd) file in the Code folder. Read the data into your R environment. You do not need to include this response in your memo; only in your code.

2. Observed relationship between absenteeism and treat

2.1. How many students in the treatment group were chronically absent and not chronically absent, respectively? How many students in the control group were chronically absent/not chronically absent? Write a 1-2 sentence response and create a 2x2 contingency table (frequencies in cell) to demonstrate your answer. (1 points)

2.2. What percent of students in the treatment group were chronically absent and not chronically absent, respectively? How about in the control group? Write a 1-2 sentence response and create a 2x2 table (percentages in cell) to demonstrate your answer. (1 points)

2.3. Create one figure to visualize the numbers of chronically absent/not chronically absent students in the treatment and control groups. Make sure to label the x- and y-axes. (2 points)

2.4. Based on your observations from 2.1 to 2.3, do you think there is a relationship between absenteeism and treat? Why or why not? (1 point)

3. Chi-squared goodness-of-fit test of categorical association

3.1. Our first research question of interest is whether teacher participation in professional development decreased rates of student chronic absenteeism. State your null hypothesis for this research question. (2 point)

3.2 If there were NO relationship between absenteeism and treat, on average in the population, how many students would you expect to be chronically absent/not chronically absent in each of the treatment/control groups? Re-create the table from question 2.1 to show these expected values and interpret the values in this new table in 2-3 sentences. (2 points)

3.3. Calculate the Chi-squared statistic by hand using the tables you generated in 2.1 and 3.2. In one sentence, state what this Chi-squared statistic represents. (2 points)

3.4. Perform a Chi-squared goodness-of-fit test in R to examine the relationship between absenteeism and treat. Write 3-4 sentences to interpret your results and answer the research question: Were students less likely to be chronically absent if their teachers participated the consultancy PD intervention? (2 points)

4. Sub-sample comparisons

4.1. We are also interested in whether absenteeism and treat are related for female students. Perform a Chi-squared goodness-of-fit test in R to investigate the relationship between absenteeism and treat for female students. Write 2-3 sentences to state your null hypothesis, interpret your results, and answer the research question: Were female students less likely to be chronically absent if their teachers participated in the consultancy PD intervention? (1 point)

4.2. Do the same for male students. (1 point)