The onset of the pandemic has had an undeniable impact on the field of education. During this workshop we have focused on that impact narrowly: on how the pandemic has affected statewide data systems and the inferences they support. In this section, we will first frame and then characterize enrollment and participation. As noted in the next section, led by Adam Van Iwaarden, enrollment and participation have generally received little attention in educational assessment, as census testing was the norm. In contrast, missing data may now continue to complicate our analyses for as long as those students who didn’t test in 2020, 2021, or both travel through the educational system. Given this, we recommend that states and their partners develop a comprehensive set of analyses to understand enrollment and participation, and run these analyses regularly. The burden of doing so can be reduced substantially by standing these analyses up in R, and in particular by structuring them as a set of custom functions whose results are summarized through one or more .Rmd files. In many ways, the impact of the pandemic on the field of measurement is leading the field towards practices that would have been useful all along.
With some exceptions, prior to the pandemic, enrollment and participation in state assessment overlapped heavily in grades 3-8 and select grades in high school. Prior shifts towards non-participation, e.g., as a result of the opt-out movement, tended to be localized and short-lived. However, in 2020 enrollment and participation decoupled completely due to a federal waiver on state testing, and in 2021 the overlap between enrollment and participation in assessment varied greatly from state to state.
For those seeking to understand shifts in enrollment and participation, the key is understanding what is represented in the data they have access to. Data cleaning procedures across state and vendor partners are often complex, meaning it can be difficult to tell whether a dataset includes all enrolled students, or just those students who (a) were eligible to test, (b) tested, (c) tested and have valid scores, or (d) belong to some other set of students. In addition, what constitutes enrollment and participation can and often does vary from state to state.
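One way to make this question concrete is to count how many records fall into each candidate population before running any other analysis. The sketch below does this on a small toy file; the `students` data frame and its `ELIGIBLE` column are hypothetical and used only for illustration, and a real state file may encode eligibility and validity quite differently.

```r
# Toy student file. The column names and values are illustrative
# assumptions, not the layout of any particular state or vendor file.
students <- data.frame(
  ID          = 1:8,
  ELIGIBLE    = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  SCALE_SCORE = c(510, 495, NA, 530, NA, 470, NA, NA),
  VALID_CASE  = c("VALID_CASE", "VALID_CASE", NA, "INVALID_CASE",
                  NA, "VALID_CASE", NA, NA)
)

# A record counts as "tested" here when it has a scale score
has_score <- !is.na(students$SCALE_SCORE)

# Count the records in each nested population
audit <- c(
  enrolled     = nrow(students),
  eligible     = sum(students$ELIGIBLE),
  tested       = sum(has_score),
  tested_valid = sum(has_score & students$VALID_CASE %in% "VALID_CASE")
)
audit
```

If the counts for two of these populations are identical in a real file, that is a hint that the file has already been filtered to the smaller group.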
Acknowledging that the way we define enrollment and participation may not apply to any given context, we suggest the following working definitions:
During the 2020-2021 school year, enrollment fell by 3% nationally, with state-level decreases ranging from less than 1% to slightly above 5% (see also Ed Week’s reporting). In many cases, states have likely already identified students who have re-enrolled and those who have not, meaning that one valuable goal of upcoming state analysis is to connect examinations of growth and status to patterns of enrollment and participation (in essence, treating these groups of students as a de facto subgroup).
We suggest starting with basic descriptions of participation and moving to increasingly complex analysis of who tested. Note that in the above section we addressed both enrollment and participation; here we focus solely on participation due to time constraints. All of the analyses and considerations shown still apply to enrollment, and in many cases shifts in enrollment may be more consequential than shifts in participation, as shifts in enrollment signal changes in the population of students being served in the state. However, enrollment involves considering the presence and absence of students across years, whereas participation generally involves only considering the presence and absence of students *within* each year, then comparing across years.
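The difference between the two comparisons can be sketched on a toy long-format file. The `records` data frame below is hypothetical; it only assumes one row per student per year, with a missing `SCALE_SCORE` marking a student who was enrolled but did not test.

```r
# Toy long file: one row per student per year (illustrative values only)
records <- data.frame(
  ID          = c(1, 2, 3, 4, 1, 2, 3, 5),
  YEAR        = c(rep("2019", 4), rep("2021", 4)),
  SCALE_SCORE = c(500, 520, 480, 510, 505, NA, NA, 490)
)

# Participation: within each year, the percent of records with a score
participation <- tapply(!is.na(records$SCALE_SCORE), records$YEAR, mean) * 100

# Enrollment: compare which students are present across years
ids_2019 <- records$ID[records$YEAR == "2019"]
ids_2021 <- records$ID[records$YEAR == "2021"]
left_after_2019 <- setdiff(ids_2019, ids_2021)  # present in 2019 only
new_in_2021     <- setdiff(ids_2021, ids_2019)  # present in 2021 only

participation
left_after_2019
new_in_2021
```

In real data, a student who is absent in the later year may simply have aged out of the tested grades, so a cross-year comparison like this would need to be restricted to students who were expected to still be enrolled.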
We start with some simple summary tables at the overall state level, as well as some descriptions of schools - the key unit of inference for school identification and support under ESSA. Finally, it’s quite easy to get lost in this kind of complex analysis. Students are nested in schools, which are in turn nested in districts, and on top of that complexity there is also grade and subject to consider, as well as a host of student background variables and past student achievement. The key is to keep returning to the questions that are of most value to you, as well as the costs and benefits of each analysis (see footnote 2).
To start, load the required packages, including the SGPdata package. This package includes a simulated data set, the sgpData_LONG_COVID object, that will be used as the basis of this work. To simplify our analysis, we will focus just on grades 6 to 8 (i.e., middle schools) in English Language Arts during the 2017-18 (i.e., 2018) to 2021-22 (i.e., 2022) school years (noting that 2020 is missing). In doing so, we are losing a great deal of data, ending up with a small set of “partial” cohorts. Part of this reduction is simply the nature of any analysis involving schools (which typically divides students into elementary, middle, and high schools).
### Load required packages
## Data and data management
require(SGPdata)
require(data.table)
library(reshape2)
### Create Data Subset
Demo_COVID <- SGPdata::sgpData_LONG_COVID
Demo_COVID <- Demo_COVID[Demo_COVID$GRADE %in% c("6", "7", "8") &
                         Demo_COVID$YEAR %in% c("2018", "2019", "2021", "2022") &
                         Demo_COVID$CONTENT_AREA == "ELA",]
#Note that the GRADE and YEAR variables are character class
### Introduce Missing Data
#Here we introduce missing data as a function of school membership
#The workshop section on Missing Data and Multiple Imputation provides a
#much more detailed and nuanced approach to missing data
set.seed(145)
school_missing <- unique(Demo_COVID$SCHOOL_NUMBER[Demo_COVID$YEAR == "2021"])
school_missing <- data.frame(SCHOOL_NUMBER = school_missing,
                             PERCENT_MISSING = rlnorm(length(school_missing), meanlog = -.15, sdlog = 1)/20)
school_missing$PERCENT_MISSING[school_missing$PERCENT_MISSING < 0] <- 0 #correct any outside of range probs
school_missing$PERCENT_MISSING[school_missing$PERCENT_MISSING > 1] <- 1
for(i in school_missing$SCHOOL_NUMBER){
  #Introducing Missingness at the School Level - 2021
  school_n <- nrow(Demo_COVID[Demo_COVID$YEAR == "2021" & Demo_COVID$SCHOOL_NUMBER == i,])
  school_prob <- school_missing$PERCENT_MISSING[school_missing$SCHOOL_NUMBER == i]
  missing_pattern <- rbinom(school_n, size=1, prob=school_prob)
  school_scores <- Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2021" & Demo_COVID$SCHOOL_NUMBER == i]
  school_scores[missing_pattern == 1] <- NA
  Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2021" & Demo_COVID$SCHOOL_NUMBER == i] <- school_scores
  #Introducing Reduced Missingness at the School Level - 2022
  #(each school's 2021 probability is halved)
  school_n <- nrow(Demo_COVID[Demo_COVID$YEAR == "2022" & Demo_COVID$SCHOOL_NUMBER == i,])
  school_prob <- school_missing$PERCENT_MISSING[school_missing$SCHOOL_NUMBER == i]/2
  missing_pattern <- rbinom(school_n, size=1, prob=school_prob)
  school_scores <- Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2022" & Demo_COVID$SCHOOL_NUMBER == i]
  school_scores[missing_pattern == 1] <- NA
  Demo_COVID$SCALE_SCORE[Demo_COVID$YEAR == "2022" & Demo_COVID$SCHOOL_NUMBER == i] <- school_scores
}
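Before looking at results, it can help to work out what the simulation settings imply. School-level missingness probabilities are drawn from a lognormal distribution and divided by 20, so the typical school-level rate follows from the standard lognormal median and mean formulas. The sketch below computes both and checks them against a large simulated sample; the variable names are ours and are not part of the workshop code.

```r
# Settings copied from the rlnorm() call in the simulation code
meanlog <- -0.15
sdlog   <- 1

# Analytic median and mean of a lognormal draw divided by 20
median_rate <- exp(meanlog) / 20               # about 0.043
mean_rate   <- exp(meanlog + sdlog^2 / 2) / 20 # about 0.071

# Check against a large simulated sample, capped at 1 as in the original
set.seed(145)
draws <- pmin(rlnorm(100000, meanlog = meanlog, sdlog = sdlog) / 20, 1)
sim_summary <- c(median = median(draws), mean = mean(draws))

round(c(median_rate = median_rate, mean_rate = mean_rate), 3)
round(sim_summary, 3)
```

Because the distribution is right-skewed, most simulated schools lose only a few percent of their scores while a small number lose far more. The average school-level rate of roughly 7% in 2021 (and about half that in 2022) gives a rough expectation against which to compare the observed state-level participation rates.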
After loading the data, we’ll start by constructing a simple summary of the overall number of non-participating students by year. The table below provides the overall participation rates, both across the state as well as broken down by grade and by ethnicity.
GRADE | SUBGROUP | 2018 | 2019 | 2021 | 2022 |
---|---|---|---|---|---|
All | All | 100 | 100 | 91.8 | 95.8 |
6 | All | 100 | 100 | 91.9 | 95.5 |
7 | All | 100 | 100 | 91.6 | 95.6 |
8 | All | 100 | 100 | 91.9 | 96.2 |
All | African American | 100 | 100 | 93.1 | 96.0 |
All | Asian | 100 | 100 | 90.8 | 95.4 |
All | Hispanic | 100 | 100 | 91.6 | 96.1 |
All | Other | 100 | 100 | 91.9 | 95.8 |
All | White | 100 | 100 | 92.0 | 95.7 |
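The code that produced this table is not shown, so the following is only a minimal sketch of how such a table could be built with a small custom function, in the spirit of the function-based workflow recommended earlier. The function name `participation_rate` and the toy data are hypothetical; the function assumes a `YEAR` column and treats a missing `SCALE_SCORE` as non-participation.

```r
# Percent of records with a scale score, by year and (optionally) by one
# grouping variable. A hypothetical helper, not the workshop's own code.
participation_rate <- function(data, group = NULL) {
  tested <- !is.na(data$SCALE_SCORE)
  if (is.null(group)) {
    # One rate per year
    return(round(100 * tapply(tested, data$YEAR, mean), 1))
  }
  # One row per group level, one column per year
  round(100 * tapply(tested, list(data[[group]], data$YEAR), mean), 1)
}

# Toy data (illustrative values only)
toy <- data.frame(
  YEAR        = rep(c("2019", "2021"), each = 6),
  GRADE       = rep(c("6", "6", "7", "7", "8", "8"), times = 2),
  SCALE_SCORE = c(500, 510, 520, 530, 540, 550,
                  505, NA, 525, 535, NA, NA)
)

overall  <- participation_rate(toy)
by_grade <- participation_rate(toy, "GRADE")
overall
by_grade
```

Applied to the workshop data, the same function could presumably be called as `participation_rate(Demo_COVID, "ETHNICITY")`, or with `"SCHOOL_NUMBER"` as the grouping variable to describe the distribution of school-level participation rates.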
This kind of summary table can be specified to an arbitrary level of complexity, eventually ending up at the same level of complexity provided in current state level aggregate reporting (e.g., large spreadsheets that contain a great number of “intersections” of the data). We suggest considering the data in sufficient depth to understand overall trends, then isolating specific grades, schools or groups to investigate. One tool we found useful was logistic regression, as demonstrated below:
explore.data <- Demo_COVID[YEAR == "2021",]
explore.data$NON_TESTED <- ifelse(is.na(explore.data$SCALE_SCORE), 1, 0)
#B. Convert Categorical Variables to Factors
explore.data$NON_TESTED <- as.factor(explore.data$NON_TESTED)
explore.data$GRADE <- as.factor(explore.data$GRADE)
explore.data$ETHNICITY <- factor(explore.data$ETHNICITY,
                                 levels = c("White", "African American",
                                            "Asian", "Hispanic", "Other"))
explore.data$IEP_STATUS <- as.factor(explore.data$IEP_STATUS)
explore.data$GENDER <- as.factor(explore.data$GENDER)
#C. Define Demographic factors
#C1. Get Demographic Factors
fit.demos <- c("GRADE", "ETHNICITY", "FREE_REDUCED_LUNCH_STATUS",
               "ELL_STATUS", "IEP_STATUS", "GENDER")
formula.text <- paste0("NON_TESTED ~", paste0(fit.demos, collapse = " + "))
logistic.fit <- glm(formula=formula.text, family="binomial", data=explore.data)
summary(logistic.fit)
##
## Call:
## glm(formula = formula.text, family = "binomial", data = explore.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5000 -0.4217 -0.4079 -0.3938 2.3663
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -2.51798 0.06108 -41.224
## GRADE7 0.04056 0.05784 0.701
## GRADE8 0.01596 0.05928 0.269
## ETHNICITYAfrican American -0.14480 0.10039 -1.442
## ETHNICITYAsian 0.14948 0.06811 2.195
## ETHNICITYHispanic 0.05185 0.06129 0.846
## ETHNICITYOther 0.01995 0.08841 0.226
## FREE_REDUCED_LUNCH_STATUSFree Reduced Lunch: Yes 0.10194 0.04901 2.080
## ELL_STATUSELL: Yes 0.20983 0.08252 2.543
## IEP_STATUSIEP: Yes -0.04556 0.07681 -0.593
## GENDERGender: Male -0.02858 0.04839 -0.591
## Pr(>|z|)
## (Intercept) <2e-16 ***
## GRADE7 0.4831
## GRADE8 0.7877
## ETHNICITYAfrican American 0.1492
## ETHNICITYAsian 0.0282 *
## ETHNICITYHispanic 0.3976
## ETHNICITYOther 0.8215
## FREE_REDUCED_LUNCH_STATUSFree Reduced Lunch: Yes 0.0375 *
## ELL_STATUSELL: Yes 0.0110 *
## IEP_STATUSIEP: Yes 0.5531
## GENDERGender: Male 0.5547
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13104 on 23135 degrees of freedom
## Residual deviance: 13081 on 23125 degrees of freedom
## AIC: 13103
##
## Number of Fisher Scoring iterations: 5
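The coefficients above are on the log-odds scale, which can be hard to connect to participation rates. As a rough worked example, the sketch below converts two of the printed estimates into a probability and an odds ratio. The numbers are copied from the output above; reading the reference group as a grade 6, White, non-FRL, non-ELL, non-IEP student of the non-male gender category is our interpretation of the factor coding, not something the output states directly.

```r
# Estimates copied from the summary output above
intercept <- -2.51798   # log-odds of non-participation, reference group
b_ell     <-  0.20983   # change in log-odds for ELL: Yes

# Inverse logit turns log-odds into a probability
p_reference <- plogis(intercept)
p_ell       <- plogis(intercept + b_ell)

# Exponentiating a coefficient gives an odds ratio
odds_ratio_ell <- exp(b_ell)

round(c(p_reference = p_reference, p_ell = p_ell,
        odds_ratio_ell = odds_ratio_ell), 3)
```

Under these assumptions the model-implied non-participation rate is roughly 7.5% for the reference group and roughly 9% for an otherwise similar ELL student. Recall that missingness was simulated as a function of school membership only, so modest demographic differences like this one most likely reflect how students are distributed across schools or chance.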
1. It is unclear to us how definitions of full academic year (FAY) may have changed in states in light of the pandemic. Similar concerns apply to shifts in the meaning of variables like chronic absenteeism.
2. For example, in early work I ran some simple logistic models that predicted non-participation as a function of content area, grade, and subgroup. In doing so, I included three-way interactions (e.g., content area x grade x subgroup), which let me see where participation varied (e.g., particularly low for male students who were identified as both FRL and IEP). With some more work, I could have used the significant results from these models to calculate participation rates for subsets of students with particularly low participation. However, this work ultimately did not end up being as useful as other, more direct investigations, as it would have required additional scaffolding to connect it to the ways in which states usually understand participation.