Introduction

The onset of the pandemic has had an undeniable impact on the field of education. During this workshop we have focused on that impact narrowly, on how the pandemic has impacted statewide data systems and the inferences they support. In this section, we will first frame and then characterize enrollment and participation. As noted in the next section lead by Adam Van Iwaarden, enrollment and participation has generally received little attention in educational assessment, as census testing was the norm. In contrast, now missing data may continue to complicate our analyses for as long as those students who didn’t test in 2020, 2021 or both travel through the educational system. Given this, we recommend that states and their partners develop a comprehensive set of analyses to understand enrollment and participation and run these analyses regularly. The burden of doing so can be reduced substantially by standing these analyses up in R, and in particular structuring the analyses using a set of custom functions that are summarized through one or more .Rmd files. In many ways, the impact of the pandemic on the field of measurement is leading the field towards practices that would have been useful all along.

Enrollment and Participation

With some exceptions, prior to the pandemic enrollment and participation in state assessment overlapped heavily in grades 3-8 and select grades in high school. Prior shifts towards non-participation, e.g., as a result of the opt out movement, tended to be localized and short-lived. However, in 2020 enrollment and participation decoupled completely due to a federal waiver on state testing, and in 2021 the overlap in enrollment and participation in assessment varied greatly from state to state.

For those seeking to understand shifts in enrollment and participation, key is understanding what is represented in the data they have access to. Data cleaning procedures across state and vendor partners are often complex, meaning it can be difficult to understand whether a dataset includes all enrolled students, or just those students who (a) are eligible to test, (b) who tested, (c) who tested and have valid scores, or (d) some other set of students. In addition, what constitutes enrollment and participation can and often does vary from state to state.

Acknowledging that the way we define enrollment and participation may not apply to any given context, we suggest the following working definitions:

  • Enrollement: A student is enrolled if he or she is identified as attending a school within the state (e.g., based on definitions such as Full Academic Year 1 (FAY))
  • Participation: A student is participating if he or she has received a score of record on the state assessment.

During the 2020-2021 school year, enrollment fell by 3% nationally with state level decreases ranging from less than 1% to slightly above 5% (see also Ed Weeks’ reporting. In many cases, states have likely already identified students who have re-enrolled, and those who have not, meaning that one goal valuable goal of upcoming state analysis is to connect examinations of growth and status to patterns of enrollment and participation (in essence, treating these groups of students as a defacto subgroup).

Exploring Participation

We suggest starting with basic descriptions of participation and moving to increasingly complex analysis of who tested. Note that in the above section we addressed both enrollment and participation, here we focus solely on participation due to time constraints. All of the analysis and considerations shown still apply to enrollment, and in many cases shifts in enrollment may be more consequential than shifts in participation, as shifts in enrollment signal changes in the population of students being served in the state. However, enrollment involves considering the presence and absence of students across years, whereas participation generally involves only considering the presence and absence of students* within* each year, then comparing across years.

We start with some simple summary tables at the overall state level, as well as some descriptions of schools - the key unit of inference for school identification and support under ESSA. Finally, it’s quite easy to get lost in this kind of complex analysis. Students are nested in schools who are in turn nested in districts, and on top of that complexity, there is also grade and subject to consider, as well as host of student background variables and past student achievement. The key is to keep returning to the questions that are most value to you, as well as the costs and benefits of each analysis2.

Load Packages and Create Data

To start, first load in the required packages, which includes the SPGData package. This package includes a simulated data set, the sgpData_LONG_COVID object that will be used as the basis of this work. To simplify our analysis, we will focus just on grades 6 to 8, e.g., middle schools, in English Language arts during the 2017-18 (i.e. 2018) to 2020-21 (i.e., 2021) school years (noting that 2020 is missing). In doing so, we are losing a great deal of data, ending up with a small set of “partial” cohorts. Part of this reduction is simply the nature of any and all analysis involving schools (e.g., which typically divides up students into elementary, middle and high schools).

###   Load required packages
##    Data and data management  
  require(SGPdata)
  require(data.table)
  library(reshape2) 

### Create Data Subset
  Demo_COVID <- SGPdata::sgpData_LONG_COVID
  Demo_COVID <- Demo_COVID[Demo_COVID$GRADE %in% c("6", "7", "8") &
                           Demo_COVID$YEAR  %in% c("2018", "2019", "2021", "2022") &
                           Demo_COVID$CONTENT_AREA == "ELA",]
  #Note that the GRADE and YEAR variables are character class 
  
### Introduce Missing Data
  #Here we introduce missing data as a function of school membership 
  #The workshop section on Missing Data and Multiple Imputation provides a 
  #much more detailed and nuanced approach to missing data 
  set.seed(145)
  school_missing <- unique(Demo_COVID$SCHOOL_NUMBER[Demo_COVID$YEAR == "2021"])
  school_missing <- data.frame(SCHOOL_NUMBER   = school_missing, 
                             PERCENT_MISSING = rlnorm(length(school_missing), meanlog = -.15, sdlog = 1)/20)
  school_missing$PERCENT_MISSING[school_missing$PERCENT_MISSING < 0] <-0 #correct any outside of range probs
  school_missing$PERCENT_MISSING[school_missing$PERCENT_MISSING > 1] <-1
  
  for(i in school_missing$SCHOOL_NUMBER){
    #Introducing Missingess at the School Level - 2021
    school_n    <- nrow(Demo_COVID[Demo_COVID$YEAR == "2021" & Demo_COVID$SCHOOL_NUMBER == i,])
    school_prob <- school_missing$PERCENT_MISSING[school_missing$SCHOOL_NUMBER == i]
    missing_pattern <- rbinom(school_n , size=1, prob=school_prob)
    school_scores  <- Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2021" & Demo_COVID$SCHOOL_NUMBER == i]
    school_scores[missing_pattern == 1] <- NA
    Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2021" & Demo_COVID$SCHOOL_NUMBER == i] <- school_scores 
    
    #Introducing Reduced Missingess at the School Level - 2021
    school_n    <- nrow(Demo_COVID[Demo_COVID$YEAR == "2022" & Demo_COVID$SCHOOL_NUMBER == i,])
    school_prob <- school_missing$PERCENT_MISSING[school_missing$SCHOOL_NUMBER == i]/2
    missing_pattern <- rbinom(school_n , size=1, prob=school_prob)
    school_scores  <- Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2022" & Demo_COVID$SCHOOL_NUMBER == i]
    school_scores[missing_pattern == 1] <- NA
    Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2022" & Demo_COVID$SCHOOL_NUMBER == i] <- school_scores 
  }

The Student Level

After loading the data, we’ll start by constructing a simple summary of the overall number of non-participating students by year. The table below provides the overall participation rates, both across the state as well as broken down by grade and by ethnicity.

Overall Participation Rates
GRADE SUBGROUP 2018 2019 2021 2022
All All 100 100 91.8 95.8
6 All 100 100 91.9 95.5
7 All 100 100 91.6 95.6
8 All 100 100 91.9 96.2
All African American 100 100 93.1 96.0
All Asian 100 100 90.8 95.4
All Hispanic 100 100 91.6 96.1
All Other 100 100 91.9 95.8
All White 100 100 92.0 95.7

This kind of summary table can be specified to an arbitrary level of complexity, eventually ending up at the same level of complexity provided in current state level aggregate reporting (e.g., large spreadsheets that contain a great number of “intersections” of the data). We suggest considering the data in sufficient depth to understand overall trends, then isolating specific grades, schools or groups to investigate. One tool we found useful was logistic regression, as demonstrated below:

  explore.data <- Demo_COVID[YEAR == "2021",]
  explore.data$NON_TESTED <- ifelse(is.na(explore.data$SCALE_SCORE), 1, 0)  

   #B. Convert Categorical Variables to Factors
  explore.data$NON_TESTED <- as.factor(explore.data$NON_TESTED)
  explore.data$GRADE      <- as.factor(explore.data$GRADE)
  explore.data$ETHNICITY  <- factor(explore.data$ETHNICITY,
                                         levels = c("White", "African American",
                                                    "Asian", "Hispanic", "Other"))
  explore.data$IEP_STATUS <- as.factor(explore.data$IEP_STATUS)
  explore.data$GENDER     <- as.factor(explore.data$GENDER)
  
  #C. Define Demographic factors 
    #C1. Get Demographic Factors 
    fit.demos <- c("GRADE", "ETHNICITY", "FREE_REDUCED_LUNCH_STATUS",
                   "ELL_STATUS", "IEP_STATUS", "GENDER")
    formula.text <- paste0("NON_TESTED ~", paste0(fit.demos, collapse = " + "))
    logistic.fit <- glm(formula=formula.text, family="binomial", data=explore.data)
    summary(logistic.fit)
## 
## Call:
## glm(formula = formula.text, family = "binomial", data = explore.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5000  -0.4217  -0.4079  -0.3938   2.3663  
## 
## Coefficients:
##                                                  Estimate Std. Error z value
## (Intercept)                                      -2.51798    0.06108 -41.224
## GRADE7                                            0.04056    0.05784   0.701
## GRADE8                                            0.01596    0.05928   0.269
## ETHNICITYAfrican American                        -0.14480    0.10039  -1.442
## ETHNICITYAsian                                    0.14948    0.06811   2.195
## ETHNICITYHispanic                                 0.05185    0.06129   0.846
## ETHNICITYOther                                    0.01995    0.08841   0.226
## FREE_REDUCED_LUNCH_STATUSFree Reduced Lunch: Yes  0.10194    0.04901   2.080
## ELL_STATUSELL: Yes                                0.20983    0.08252   2.543
## IEP_STATUSIEP: Yes                               -0.04556    0.07681  -0.593
## GENDERGender: Male                               -0.02858    0.04839  -0.591
##                                                  Pr(>|z|)    
## (Intercept)                                        <2e-16 ***
## GRADE7                                             0.4831    
## GRADE8                                             0.7877    
## ETHNICITYAfrican American                          0.1492    
## ETHNICITYAsian                                     0.0282 *  
## ETHNICITYHispanic                                  0.3976    
## ETHNICITYOther                                     0.8215    
## FREE_REDUCED_LUNCH_STATUSFree Reduced Lunch: Yes   0.0375 *  
## ELL_STATUSELL: Yes                                 0.0110 *  
## IEP_STATUSIEP: Yes                                 0.5531    
## GENDERGender: Male                                 0.5547    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 13104  on 23135  degrees of freedom
## Residual deviance: 13081  on 23125  degrees of freedom
## AIC: 13103
## 
## Number of Fisher Scoring iterations: 5

  1. It is unclear to us how FAY may have changed definition(s) in states in light of the pandemic. Similar concerns about shifts in the meaning of variables like chronic absenteeism.↩︎

  2. For example, in early work I ran some simple logistic models that predicted non-participation as a function of content area, grade and subgroup. In doing so, I included three way interactions (e.g., content area x grade x subgroup) which let me see where participation varied (e.g., particularly low for male students who were identified as both FRL and IEP). With some more work, I could have use the significant results from these models to calculate participation rates to calculate partipcation rates for subsets of students that are particularly low. However this work ultimately did not end up being as useful as other more direct investigations, as it would have required additional scaffolding to connect it to the ways in which states usually understand participation.↩︎