Researchers and policymakers are starting to examine the impact of the Covid pandemic on student learning. One question of interest concerns the extent to which we can compare students’ current test scores to those from before the pandemic (e.g., in 2019). For example, due to differences in participation rates across the years, the composition of students at a school in 2019 and 2021 may be substantially different. There are also likely to be non-negligible amounts of missing data in 2021.
One statistical method to address this missingness is multiple imputation (MI). Broadly, MI uses information from the observed data to generate a set of plausible values for the missing observations in the data. This procedure is repeated many times, accounting for sampling error that arises when generating these values. Model parameter estimates are then pooled across the imputed data sets (Enders, 2010; Fox & Weisberg, 2018). As Fox and Weisberg (2018) write, MI “takes into account not only uncertainty due to residual variation - that is, the inability to predict missing values without error from the observed data (e.g., by sampling from the estimated error distribution for a continuous variable or sampling from the estimated conditional probability distribution of a factor) - but also uncertainty in the parameter estimates used to obtain the predictions (by sampling from the estimated distribution of the parameters of the imputation model)” (p. 3).
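The pooling step described above can be sketched in a few lines. The example below assumes the standard Rubin's rules for combining a point estimate across imputed data sets; the source does not spell out the pooling formula, and the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their sampling variances (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical values: one school's mean scale score in m = 5 imputed data sets.
est = [501.0, 499.0, 500.0, 502.0, 498.0]
var = [4.0, 4.0, 4.0, 4.0, 4.0]
pooled_mean, pooled_var = pool_rubin(est, var)
print(pooled_mean, pooled_var)
```

The between-imputation term is what carries the extra uncertainty due to the missing values: if every imputed data set gave the same estimate, the total variance would reduce to the within-imputation variance.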
There are numerous MI methods available, largely differentiated by the model used to generate the imputed values. In the context of learning loss analyses, users may implement MI to estimate mean scale score or student growth percentile (SGP) values to draw comparisons between pre- and post-pandemic test results. Notably, such comparisons are aimed not at accountability initiatives but at providing a better understanding of how students are progressing. Such information can facilitate supportive programs to foster students’ learning.
Using a preliminary simulation, we evaluate the efficacy of multiple imputation for creating aggregated, “adjusted” scale scores and SGPs when data are missing across testing years. Observations were amputed from a simulated data set (available in the SGPdata R package; Betebenner et al., 2021). The data include scale scores and SGPs, as well as school characteristics and student demographics. These data were amputed to reflect patterns of either missing completely at random (MCAR) or missing at random (MAR; see Enders [2010] or Fox and Weisberg [2018] for a review of missingness types). For the MAR data, observations were amputed based on school number, scale score, and either SGP (“Status with Growth”) or free/reduced lunch (FRL) and English language learner (ELL) status (“Status with Demographics”). Either 30%, 50%, or 70% of the observations were amputed to create the missing data files. Note that a Covid impact was not incorporated into the simulated data used for these analyses.
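To illustrate the difference between the two missingness mechanisms, here is a minimal sketch of amputation under MCAR and MAR. It is not the procedure used in this study: the function names, the single fully observed covariate `prior`, and the direction of the dependence (higher prior scores are more likely to go missing) are all assumptions made for the example.

```python
import math
import random

def ampute_mcar(values, prop, rng):
    """Set a fixed proportion of values to None, chosen uniformly at random."""
    k = round(prop * len(values))
    drop = set(rng.sample(range(len(values)), k))
    return [None if i in drop else v for i, v in enumerate(values)]

def ampute_mar(values, covariate, prop, rng):
    """Set a fixed proportion of values to None, where rows with a larger
    (fully observed, positive) covariate are more likely to be dropped.
    Weighted sampling without replacement via Efraimidis-Spirakis keys."""
    k = round(prop * len(values))
    keys = [math.log(1.0 - rng.random()) / w for w in covariate]
    order = sorted(range(len(values)), key=lambda i: keys[i], reverse=True)
    drop = set(order[:k])
    return [None if i in drop else v for i, v in enumerate(values)]

rng = random.Random(2021)
prior = [float(i) for i in range(1, 1001)]  # hypothetical observed prior scores
current = [p + 5.0 for p in prior]          # hypothetical current scores

mcar = ampute_mcar(current, 0.30, rng)
mar = ampute_mar(current, prior, 0.50, rng)
```

Under MCAR the amputed rows are a simple random subset, whereas under MAR the probability of being amputed depends on a variable that remains observed.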
Six imputation methods were compared: cross-sectional models fit with the pan package (L2PAN) and the lmer function (L2LMER); their longitudinal counterparts fit with pan (L2PAN_LONG) and lmer (L2LMER_LONG); and the two methods labeled PMM and RQ in the tables below.
These methods were also compared to the condition where no imputation was implemented (i.e., “Observed”). All MI analyses were conducted using the mice package (van Buuren & Groothuis-Oudshoorn, 2011), with calls to corresponding R packages (e.g., pan [Zhao & Schafer, 2018] and lme4 [Bates et al., 2015]). Here, we focus on the ability of these MI methods to accurately impute either mean scale scores or SGPs. Specifically, if we consider the complete simulated data to be the (population-level) parameter values, then we are interested in the extent to which the imputed values align with the “true” values from the complete data set. Data are either aggregated at the grade and content area level (e.g., Grade 3 Math, Grade 3 ELA, etc.) within each school, or aggregated at the school level. Note that observations for which either the grade/content area size or the school size is less than 10 are removed from the summary analyses.
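The two aggregation levels and the minimum-size rule can be sketched as follows. The record field names (`school`, `grade`, `content_area`, `scale_score`) and the helper `aggregate_means` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def aggregate_means(records, keys, min_size=10):
    """Mean scale score per group, dropping groups with fewer than min_size students."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["scale_score"])
    return {g: sum(v) / len(v) for g, v in groups.items() if len(v) >= min_size}

# Hypothetical school "A": 12 Grade 3 students and 4 Grade 4 students.
records = [
    {"school": "A", "grade": 3, "content_area": "MATHEMATICS", "scale_score": 400 + i}
    for i in range(12)
] + [
    {"school": "A", "grade": 4, "content_area": "MATHEMATICS", "scale_score": 500}
    for _ in range(4)
]

by_gc = aggregate_means(records, ["school", "grade", "content_area"])
by_school = aggregate_means(records, ["school"])
print(by_gc)
print(by_school)
```

In this toy example the Grade 4 group is dropped from the grade/content area summary because it has fewer than 10 students, but those students still contribute to the school-level mean.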
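In this summary, we quantify the performance of the aforementioned MI methods using three indices: percent bias, the (simplified) confidence interval (CI) coverage rate, and the (simplified) \(F_1\) statistic.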
Recent research suggests that an MI method is performing relatively well when the percent bias is less than 5% (Miri et al., 2020; Qi et al., 2010) and the coverage rate is greater than 0.90 (Demirtas, 2004; Qi et al., 2010). Additionally, a p-value for the \(F_1\) statistic greater than \(\alpha\) indicates that we fail to reject the null hypothesis of equivalent true and imputed values.
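As a rough illustration of the first two indices, the sketch below uses one common definition of each: absolute percent bias of the average estimate relative to the true value, and the share of normal-approximation intervals that contain the true value. These definitions, the function names, and the numbers are assumptions; the exact "simplified" formulas used in this report are not reproduced here.

```python
def percent_bias(estimates, true_value):
    """Absolute percent bias of the average estimate relative to the true value."""
    avg = sum(estimates) / len(estimates)
    return 100.0 * abs(avg - true_value) / abs(true_value)

def coverage_rate(estimates, std_errors, true_value, z=1.96):
    """Share of intervals (estimate +/- z * SE) that contain the true value."""
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return hits / len(estimates)

# Hypothetical imputed mean scale scores for one group whose true mean is 500.
estimates = [505.0, 510.0, 515.0, 510.0]
std_errors = [4.0, 4.0, 4.0, 4.0]

pb = percent_bias(estimates, 500.0)
cr = coverage_rate(estimates, std_errors, 500.0)
print(pb, cr)
```

With these made-up numbers the percent bias is well under the 5% threshold while the coverage rate is far below 0.90, which shows why the two indices are reported side by side: small bias does not guarantee well-calibrated intervals.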
We first compare the six MI methods on average percent bias and simplified CI coverage rate (CR) as a function of a variety of factors, including grade, percentage missing, and missingness type. The data are either aggregated at the grade/content area level (GC), or at the school level.
Grade | L2PAN Percent Bias (SS) | L2PAN Percent Bias (SGP) | L2PAN CR (SS) | L2PAN CR (SGP) | L2PAN_LONG Percent Bias (SS) | L2PAN_LONG Percent Bias (SGP) | L2PAN_LONG CR (SS) | L2PAN_LONG CR (SGP) | LMER Percent Bias (SS) | LMER Percent Bias (SGP) | LMER CR (SS) | LMER CR (SGP) | LMER_LONG Percent Bias (SS) | LMER_LONG Percent Bias (SGP) | LMER_LONG CR (SS) | LMER_LONG CR (SGP) | PMM Percent Bias (SS) | PMM Percent Bias (SGP) | PMM CR (SS) | PMM CR (SGP) | RQ Percent Bias (SS) | RQ Percent Bias (SGP) | RQ CR (SS) | RQ CR (SGP) | Observed Percent Bias (SS) | Observed Percent Bias (SGP) | Observed CR (SS) | Observed CR (SGP) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30% Missing | ||||||||||||||||||||||||||||
3 | 0.223 | | 0.920 | | 0.223 | | 0.920 | | 0.761 | | 0.810 | | 0.761 | | 0.810 | | 0.233 | | 0.957 | | 0.227 | | 0.910 | | 0.233 | | 0.957 | |
4 | 0.198 | | 0.927 | | 0.198 | | 0.927 | | 0.708 | | 0.804 | | 0.708 | | 0.804 | | 0.206 | | 0.964 | | 0.201 | | 0.923 | | 0.206 | | 0.964 | |
5 | 0.166 | 2.805 | 0.946 | 0.952 | 0.431 | 7.407 | 0.911 | 0.824 | 0.680 | 8.156 | 0.825 | 0.743 | 0.689 | 8.179 | 0.831 | 0.748 | 0.408 | 6.267 | 0.850 | 0.845 | 0.413 | 6.335 | 0.830 | 0.829 | 0.408 | 6.267 | 0.850 | 0.845 |
6 | 0.152 | 3.000 | 0.946 | 0.948 | 0.418 | 7.284 | 0.853 | 0.747 | 0.612 | 7.817 | 0.789 | 0.694 | 0.634 | 7.878 | 0.799 | 0.701 | 0.360 | 6.072 | 0.805 | 0.798 | 0.361 | 6.121 | 0.795 | 0.787 | 0.360 | 6.072 | 0.805 | 0.798 |
7 | 0.121 | 2.138 | 0.948 | 0.950 | 0.386 | 5.853 | 0.765 | 0.725 | 0.584 | 6.963 | 0.748 | 0.650 | 0.596 | 6.984 | 0.738 | 0.650 | 0.335 | 5.116 | 0.765 | 0.755 | 0.342 | 5.166 | 0.750 | 0.754 | 0.335 | 5.116 | 0.765 | 0.755 |
8 | 0.116 | 2.103 | 0.946 | 0.954 | 0.389 | 7.119 | 0.721 | 0.634 | 0.538 | 8.387 | 0.725 | 0.563 | 0.531 | 8.354 | 0.733 | 0.571 | 0.359 | 6.327 | 0.715 | 0.696 | 0.364 | 6.340 | 0.695 | 0.687 | 0.359 | 6.327 | 0.715 | 0.696 |
50% Missing | ||||||||||||||||||||||||||||
3 | 0.360 | | 0.909 | | 0.360 | | 0.909 | | 1.255 | | 0.732 | | 1.254 | | 0.731 | | 0.374 | | 0.967 | | 0.357 | | 0.903 | | 0.374 | | 0.967 | |
4 | 0.340 | | 0.909 | | 0.340 | | 0.909 | | 1.158 | | 0.725 | | 1.155 | | 0.727 | | 0.361 | | 0.969 | | 0.344 | | 0.905 | | 0.361 | | 0.969 | |
5 | 0.315 | 5.205 | 0.942 | 0.947 | 0.735 | 12.110 | 0.861 | 0.745 | 1.107 | 12.705 | 0.746 | 0.692 | 1.133 | 12.729 | 0.749 | 0.691 | 0.677 | 10.431 | 0.790 | 0.782 | 0.688 | 10.597 | 0.766 | 0.751 | 0.677 | 10.431 | 0.790 | 0.782 |
6 | 0.259 | 5.394 | 0.945 | 0.942 | 0.727 | 11.907 | 0.790 | 0.662 | 1.004 | 12.415 | 0.709 | 0.630 | 1.036 | 12.446 | 0.725 | 0.637 | 0.586 | 10.119 | 0.749 | 0.728 | 0.592 | 10.245 | 0.736 | 0.710 | 0.586 | 10.119 | 0.749 | 0.728 |
7 | 0.193 | 3.280 | 0.953 | 0.951 | 0.648 | 9.411 | 0.696 | 0.660 | 0.929 | 10.182 | 0.677 | 0.609 | 0.947 | 10.208 | 0.672 | 0.610 | 0.533 | 7.872 | 0.708 | 0.699 | 0.540 | 7.935 | 0.710 | 0.699 | 0.533 | 7.872 | 0.708 | 0.699 |
8 | 0.218 | 3.723 | 0.930 | 0.936 | 0.682 | 11.835 | 0.619 | 0.524 | 0.939 | 13.530 | 0.653 | 0.468 | 0.932 | 13.544 | 0.658 | 0.474 | 0.621 | 10.751 | 0.599 | 0.576 | 0.625 | 10.814 | 0.597 | 0.571 | 0.621 | 10.751 | 0.599 | 0.576 |
70% Missing | ||||||||||||||||||||||||||||
3 | 0.517 | | 0.915 | | 0.517 | | 0.915 | | 1.735 | | 0.684 | | 1.735 | | 0.684 | | 0.648 | | 0.975 | | 0.542 | | 0.918 | | 0.648 | | 0.975 | |
4 | 0.480 | | 0.914 | | 0.480 | | 0.914 | | 1.624 | | 0.659 | | 1.624 | | 0.659 | | 0.614 | | 0.978 | | 0.507 | | 0.914 | | 0.614 | | 0.978 | |
5 | 0.464 | 7.496 | 0.946 | 0.947 | 0.999 | 15.988 | 0.860 | 0.694 | 1.546 | 16.559 | 0.684 | 0.659 | 1.570 | 16.601 | 0.696 | 0.660 | 0.931 | 14.276 | 0.758 | 0.745 | 0.939 | 14.405 | 0.743 | 0.715 | 0.931 | 14.276 | 0.758 | 0.745 |
6 | 0.416 | 7.908 | 0.949 | 0.935 | 1.027 | 15.472 | 0.717 | 0.606 | 1.398 | 16.016 | 0.652 | 0.586 | 1.453 | 16.056 | 0.667 | 0.587 | 0.802 | 13.635 | 0.716 | 0.690 | 0.809 | 13.807 | 0.699 | 0.666 | 0.802 | 13.635 | 0.716 | 0.690 |
7 | 0.304 | 4.910 | 0.950 | 0.942 | 0.918 | 12.710 | 0.640 | 0.598 | 1.321 | 13.169 | 0.607 | 0.581 | 1.348 | 13.239 | 0.600 | 0.588 | 0.749 | 11.051 | 0.660 | 0.653 | 0.756 | 11.108 | 0.655 | 0.644 | 0.749 | 11.051 | 0.660 | 0.653 |
8 | 0.297 | 5.764 | 0.938 | 0.938 | 0.973 | 16.160 | 0.553 | 0.450 | 1.294 | 17.581 | 0.605 | 0.423 | 1.286 | 17.568 | 0.603 | 0.422 | 0.865 | 14.922 | 0.565 | 0.525 | 0.864 | 14.928 | 0.569 | 0.513 | 0.865 | 14.922 | 0.565 | 0.525 |
Grade | L2PAN Percent Bias (SS) | L2PAN Percent Bias (SGP) | L2PAN CR (SS) | L2PAN CR (SGP) | L2PAN_LONG Percent Bias (SS) | L2PAN_LONG Percent Bias (SGP) | L2PAN_LONG CR (SS) | L2PAN_LONG CR (SGP) | LMER Percent Bias (SS) | LMER Percent Bias (SGP) | LMER CR (SS) | LMER CR (SGP) | LMER_LONG Percent Bias (SS) | LMER_LONG Percent Bias (SGP) | LMER_LONG CR (SS) | LMER_LONG CR (SGP) | PMM Percent Bias (SS) | PMM Percent Bias (SGP) | PMM CR (SS) | PMM CR (SGP) | RQ Percent Bias (SS) | RQ Percent Bias (SGP) | RQ CR (SS) | RQ CR (SGP) | Observed Percent Bias (SS) | Observed Percent Bias (SGP) | Observed CR (SS) | Observed CR (SGP) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30% Missing | ||||||||||||||||||||||||||||
3 | 0.619 | | 0.862 | | 0.619 | | 0.862 | | 1.648 | | 0.567 | | 1.648 | | 0.567 | | 0.656 | | 0.915 | | 0.592 | | 0.847 | | 0.656 | | 0.915 | |
4 | 0.558 | | 0.849 | | 0.558 | | 0.849 | | 1.418 | | 0.557 | | 1.418 | | 0.557 | | 0.554 | | 0.921 | | 0.549 | | 0.843 | | 0.554 | | 0.921 | |
5 | 0.262 | 4.605 | 0.946 | 0.913 | 0.454 | 7.885 | 0.917 | 0.758 | 1.448 | 9.951 | 0.544 | 0.613 | 1.253 | 9.506 | 0.625 | 0.651 | 0.407 | 6.629 | 0.858 | 0.853 | 0.420 | 6.858 | 0.826 | 0.816 | 0.407 | 6.629 | 0.858 | 0.853 |
6 | 0.210 | 4.107 | 0.933 | 0.909 | 0.508 | 7.594 | 0.823 | 0.714 | 1.261 | 8.685 | 0.522 | 0.621 | 1.130 | 8.472 | 0.606 | 0.660 | 0.343 | 6.132 | 0.827 | 0.822 | 0.353 | 6.362 | 0.798 | 0.795 | 0.343 | 6.132 | 0.827 | 0.822 |
7 | 0.246 | 4.209 | 0.926 | 0.902 | 0.528 | 6.928 | 0.688 | 0.709 | 1.534 | 8.611 | 0.331 | 0.577 | 1.253 | 8.291 | 0.453 | 0.607 | 0.393 | 6.308 | 0.761 | 0.740 | 0.393 | 6.410 | 0.741 | 0.726 | 0.393 | 6.308 | 0.761 | 0.740 |
8 | 0.215 | 3.485 | 0.917 | 0.909 | 0.553 | 8.093 | 0.662 | 0.586 | 1.392 | 10.283 | 0.341 | 0.463 | 1.053 | 9.801 | 0.483 | 0.493 | 0.434 | 7.267 | 0.691 | 0.686 | 0.455 | 7.465 | 0.661 | 0.652 | 0.434 | 7.267 | 0.691 | 0.686 |
50% Missing | ||||||||||||||||||||||||||||
3 | 1.082 | | 0.815 | | 1.082 | | 0.815 | | 2.536 | | 0.435 | | 2.536 | | 0.435 | | 1.146 | | 0.900 | | 1.098 | | 0.792 | | 1.146 | | 0.900 | |
4 | 1.022 | | 0.792 | | 1.022 | | 0.792 | | 2.350 | | 0.409 | | 2.350 | | 0.409 | | 1.052 | | 0.913 | | 1.038 | | 0.783 | | 1.052 | | 0.913 | |
5 | 0.410 | 7.068 | 0.946 | 0.924 | 0.729 | 12.045 | 0.874 | 0.716 | 2.283 | 13.473 | 0.422 | 0.641 | 1.700 | 12.864 | 0.600 | 0.680 | 0.643 | 10.396 | 0.814 | 0.810 | 0.668 | 11.033 | 0.775 | 0.750 | 0.643 | 10.396 | 0.814 | 0.810 |
6 | 0.361 | 6.905 | 0.931 | 0.908 | 0.839 | 11.990 | 0.743 | 0.636 | 2.028 | 12.343 | 0.417 | 0.602 | 1.518 | 12.078 | 0.577 | 0.630 | 0.566 | 9.981 | 0.765 | 0.748 | 0.595 | 10.368 | 0.733 | 0.706 | 0.566 | 9.981 | 0.765 | 0.748 |
7 | 0.385 | 5.860 | 0.920 | 0.905 | 0.863 | 10.274 | 0.596 | 0.653 | 2.434 | 11.184 | 0.232 | 0.578 | 1.615 | 10.861 | 0.454 | 0.605 | 0.613 | 9.673 | 0.688 | 0.677 | 0.613 | 9.442 | 0.685 | 0.661 | 0.613 | 9.673 | 0.688 | 0.677 |
8 | 0.369 | 5.865 | 0.901 | 0.887 | 0.875 | 12.565 | 0.572 | 0.498 | 2.256 | 14.802 | 0.228 | 0.420 | 1.416 | 14.414 | 0.473 | 0.441 | 0.700 | 11.679 | 0.603 | 0.586 | 0.702 | 11.850 | 0.599 | 0.556 | 0.700 | 11.679 | 0.603 | 0.586 |
70% Missing | ||||||||||||||||||||||||||||
3 | 1.714 | | 0.794 | | 1.714 | | 0.794 | | 3.594 | | 0.314 | | 3.594 | | 0.314 | | 1.836 | | 0.873 | | 1.782 | | 0.764 | | 1.836 | | 0.873 | |
4 | 1.640 | | 0.769 | | 1.640 | | 0.769 | | 3.444 | | 0.295 | | 3.444 | | 0.295 | | 1.724 | | 0.893 | | 1.671 | | 0.755 | | 1.724 | | 0.893 | |
5 | 0.642 | 10.629 | 0.951 | 0.930 | 1.026 | 16.033 | 0.867 | 0.683 | 3.217 | 16.539 | 0.312 | 0.649 | 1.933 | 16.406 | 0.630 | 0.664 | 0.902 | 14.304 | 0.788 | 0.777 | 0.937 | 15.242 | 0.759 | 0.715 | 0.902 | 14.304 | 0.788 | 0.777 |
6 | 0.556 | 10.041 | 0.930 | 0.910 | 1.162 | 15.789 | 0.685 | 0.582 | 2.901 | 15.572 | 0.325 | 0.590 | 1.751 | 15.749 | 0.598 | 0.594 | 0.790 | 13.415 | 0.723 | 0.700 | 0.847 | 14.188 | 0.686 | 0.649 | 0.790 | 13.415 | 0.723 | 0.700 |
7 | 0.588 | 8.366 | 0.921 | 0.906 | 1.222 | 13.242 | 0.509 | 0.599 | 3.470 | 13.632 | 0.160 | 0.579 | 1.772 | 13.758 | 0.514 | 0.570 | 0.830 | 12.891 | 0.659 | 0.643 | 0.834 | 12.713 | 0.648 | 0.615 | 0.830 | 12.891 | 0.659 | 0.643 |
8 | 0.593 | 9.505 | 0.897 | 0.871 | 1.196 | 16.893 | 0.495 | 0.435 | 3.202 | 18.518 | 0.165 | 0.390 | 1.561 | 18.269 | 0.515 | 0.414 | 0.940 | 15.870 | 0.567 | 0.535 | 0.944 | 16.025 | 0.567 | 0.519 | 0.940 | 15.870 | 0.567 | 0.535 |
Grade | L2PAN Percent Bias (SS) | L2PAN Percent Bias (SGP) | L2PAN CR (SS) | L2PAN CR (SGP) | L2PAN_LONG Percent Bias (SS) | L2PAN_LONG Percent Bias (SGP) | L2PAN_LONG CR (SS) | L2PAN_LONG CR (SGP) | LMER Percent Bias (SS) | LMER Percent Bias (SGP) | LMER CR (SS) | LMER CR (SGP) | LMER_LONG Percent Bias (SS) | LMER_LONG Percent Bias (SGP) | LMER_LONG CR (SS) | LMER_LONG CR (SGP) | PMM Percent Bias (SS) | PMM Percent Bias (SGP) | PMM CR (SS) | PMM CR (SGP) | RQ Percent Bias (SS) | RQ Percent Bias (SGP) | RQ CR (SS) | RQ CR (SGP) | Observed Percent Bias (SS) | Observed Percent Bias (SGP) | Observed CR (SS) | Observed CR (SGP) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30% Missing | ||||||||||||||||||||||||||||
3 | 1.473 | | 0.484 | | 1.475 | | 0.482 | | 2.486 | | 0.293 | | 2.486 | | 0.293 | | 1.469 | | 0.568 | | 1.484 | | 0.457 | | 1.469 | | 0.568 | |
4 | 1.364 | | 0.461 | | 1.362 | | 0.460 | | 2.229 | | 0.289 | | 2.229 | | 0.289 | | 1.361 | | 0.543 | | 1.376 | | 0.434 | | 1.361 | | 0.543 | |
5 | 0.323 | 5.022 | 0.937 | 0.911 | 0.450 | 7.499 | 0.921 | 0.795 | 2.149 | 8.637 | 0.286 | 0.669 | 1.812 | 8.310 | 0.406 | 0.722 | 0.397 | 6.681 | 0.849 | 0.847 | 0.410 | 6.796 | 0.789 | 0.820 | 0.397 | 6.681 | 0.849 | 0.847 |
6 | 0.273 | 4.780 | 0.906 | 0.890 | 0.796 | 7.457 | 0.636 | 0.718 | 1.905 | 8.333 | 0.304 | 0.627 | 1.610 | 7.974 | 0.441 | 0.696 | 0.353 | 6.294 | 0.834 | 0.822 | 0.358 | 6.350 | 0.785 | 0.786 | 0.353 | 6.294 | 0.834 | 0.822 |
7 | 0.314 | 4.523 | 0.914 | 0.902 | 0.789 | 6.482 | 0.512 | 0.696 | 2.354 | 7.290 | 0.143 | 0.617 | 1.856 | 6.725 | 0.215 | 0.678 | 0.407 | 6.095 | 0.768 | 0.753 | 0.404 | 6.255 | 0.723 | 0.734 | 0.407 | 6.095 | 0.768 | 0.753 |
8 | 0.293 | 4.513 | 0.892 | 0.886 | 0.730 | 8.145 | 0.481 | 0.588 | 2.168 | 9.643 | 0.138 | 0.535 | 1.599 | 9.143 | 0.242 | 0.588 | 0.430 | 7.438 | 0.694 | 0.673 | 0.423 | 7.408 | 0.648 | 0.647 | 0.430 | 7.438 | 0.694 | 0.673 |
50% Missing | ||||||||||||||||||||||||||||
3 | 2.676 | | 0.313 | | 2.676 | | 0.313 | | 3.887 | | 0.151 | | 3.887 | | 0.151 | | 2.664 | | 0.403 | | 2.663 | | 0.282 | | 2.664 | | 0.403 | |
4 | 2.520 | | 0.318 | | 2.520 | | 0.318 | | 3.702 | | 0.163 | | 3.702 | | 0.163 | | 2.521 | | 0.411 | | 2.521 | | 0.287 | | 2.521 | | 0.411 | |
5 | 0.490 | 7.599 | 0.934 | 0.913 | 0.683 | 11.463 | 0.899 | 0.738 | 3.330 | 12.074 | 0.176 | 0.685 | 2.323 | 11.720 | 0.400 | 0.731 | 0.641 | 10.294 | 0.800 | 0.793 | 0.658 | 10.693 | 0.729 | 0.758 | 0.641 | 10.294 | 0.800 | 0.793 |
6 | 0.447 | 7.698 | 0.898 | 0.881 | 1.259 | 11.689 | 0.529 | 0.655 | 3.078 | 12.105 | 0.172 | 0.618 | 2.069 | 11.535 | 0.446 | 0.684 | 0.565 | 9.895 | 0.775 | 0.746 | 0.612 | 10.438 | 0.700 | 0.708 | 0.565 | 9.895 | 0.775 | 0.746 |
7 | 0.469 | 6.662 | 0.903 | 0.893 | 1.234 | 9.769 | 0.404 | 0.645 | 3.594 | 9.951 | 0.083 | 0.618 | 2.232 | 9.672 | 0.246 | 0.666 | 0.634 | 9.817 | 0.695 | 0.673 | 0.617 | 9.651 | 0.650 | 0.657 | 0.634 | 9.817 | 0.695 | 0.673 |
8 | 0.458 | 6.747 | 0.885 | 0.876 | 1.126 | 12.720 | 0.403 | 0.501 | 3.449 | 14.350 | 0.068 | 0.474 | 1.963 | 13.834 | 0.259 | 0.494 | 0.671 | 11.711 | 0.609 | 0.580 | 0.677 | 12.094 | 0.566 | 0.562 | 0.671 | 11.711 | 0.609 | 0.580 |
70% Missing | ||||||||||||||||||||||||||||
3 | 4.097 | | 0.226 | | 4.097 | | 0.226 | | 5.488 | | 0.080 | | 5.488 | | 0.080 | | 4.075 | | 0.305 | | 4.081 | | 0.201 | | 4.075 | | 0.305 | |
4 | 3.933 | | 0.245 | | 3.933 | | 0.245 | | 5.394 | | 0.092 | | 5.394 | | 0.092 | | 3.959 | | 0.311 | | 3.947 | | 0.215 | | 3.959 | | 0.311 | |
5 | 0.747 | 10.554 | 0.939 | 0.921 | 1.367 | 15.214 | 0.952 | 0.690 | 4.635 | 15.291 | 0.112 | 0.679 | 2.425 | 15.274 | 0.495 | 0.697 | 0.961 | 13.960 | 0.743 | 0.758 | 0.984 | 14.628 | 0.672 | 0.713 | 0.961 | 13.960 | 0.743 | 0.758 |
6 | 0.720 | 10.956 | 0.902 | 0.893 | 1.636 | 15.429 | 0.484 | 0.603 | 4.400 | 15.257 | 0.089 | 0.605 | 2.130 | 15.085 | 0.515 | 0.630 | 0.808 | 13.518 | 0.726 | 0.695 | 0.914 | 14.246 | 0.622 | 0.652 | 0.808 | 13.518 | 0.726 | 0.695 |
7 | 0.684 | 8.897 | 0.908 | 0.905 | 1.635 | 12.798 | 0.328 | 0.603 | 5.061 | 12.756 | 0.051 | 0.601 | 2.214 | 12.957 | 0.358 | 0.602 | 0.888 | 13.460 | 0.655 | 0.627 | 0.859 | 13.170 | 0.607 | 0.600 | 0.888 | 13.460 | 0.655 | 0.627 |
8 | 0.732 | 10.794 | 0.879 | 0.854 | 1.523 | 16.992 | 0.339 | 0.444 | 4.894 | 18.173 | 0.033 | 0.436 | 1.943 | 18.206 | 0.399 | 0.423 | 0.903 | 15.971 | 0.582 | 0.549 | 0.917 | 16.316 | 0.561 | 0.547 | 0.903 | 15.971 | 0.582 | 0.549 |
Percent Missing | L2PAN Percent Bias (SS) | L2PAN Percent Bias (SGP) | L2PAN CR (SS) | L2PAN CR (SGP) | L2PAN_LONG Percent Bias (SS) | L2PAN_LONG Percent Bias (SGP) | L2PAN_LONG CR (SS) | L2PAN_LONG CR (SGP) | LMER Percent Bias (SS) | LMER Percent Bias (SGP) | LMER CR (SS) | LMER CR (SGP) | LMER_LONG Percent Bias (SS) | LMER_LONG Percent Bias (SGP) | LMER_LONG CR (SS) | LMER_LONG CR (SGP) | PMM Percent Bias (SS) | PMM Percent Bias (SGP) | PMM CR (SS) | PMM CR (SGP) | RQ Percent Bias (SS) | RQ Percent Bias (SGP) | RQ CR (SS) | RQ CR (SGP) | Observed Percent Bias (SS) | Observed Percent Bias (SGP) | Observed CR (SS) | Observed CR (SGP) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MCAR | ||||||||||||||||||||||||||||
30% Missing | 0.074 | 1.801 | 0.925 | 0.929 | 0.184 | 5.449 | 0.795 | 0.637 | 0.527 | 5.993 | 0.513 | 0.572 | 0.528 | 6.011 | 0.515 | 0.582 | 0.170 | 4.564 | 0.817 | 0.686 | 0.172 | 4.623 | 0.762 | 0.670 | 0.170 | 4.564 | 0.817 | 0.686 |
50% Missing | 0.138 | 3.474 | 0.918 | 0.917 | 0.320 | 8.911 | 0.741 | 0.544 | 0.876 | 9.501 | 0.413 | 0.503 | 0.878 | 9.528 | 0.416 | 0.505 | 0.291 | 7.615 | 0.795 | 0.611 | 0.288 | 7.720 | 0.723 | 0.585 | 0.291 | 7.615 | 0.795 | 0.611 |
70% Missing | 0.238 | 5.056 | 0.901 | 0.904 | 0.465 | 11.817 | 0.701 | 0.479 | 1.243 | 12.236 | 0.348 | 0.454 | 1.248 | 12.259 | 0.351 | 0.463 | 0.464 | 10.428 | 0.755 | 0.552 | 0.431 | 10.511 | 0.701 | 0.536 | 0.464 | 10.428 | 0.755 | 0.552 |
MAR (Status with Demographics) | ||||||||||||||||||||||||||||
30% Missing | 0.305 | 2.904 | 0.726 | 0.874 | 0.356 | 5.932 | 0.638 | 0.600 | 1.415 | 7.374 | 0.174 | 0.480 | 1.287 | 6.981 | 0.210 | 0.518 | 0.341 | 4.734 | 0.697 | 0.709 | 0.323 | 5.095 | 0.609 | 0.660 | 0.341 | 4.734 | 0.697 | 0.709 |
50% Missing | 0.521 | 4.977 | 0.647 | 0.855 | 0.607 | 9.157 | 0.550 | 0.524 | 2.263 | 10.257 | 0.103 | 0.473 | 1.869 | 9.725 | 0.164 | 0.511 | 0.598 | 7.709 | 0.612 | 0.637 | 0.580 | 8.230 | 0.502 | 0.580 | 0.598 | 7.709 | 0.612 | 0.637 |
70% Missing | 0.830 | 7.918 | 0.598 | 0.843 | 0.948 | 12.033 | 0.473 | 0.481 | 3.255 | 12.494 | 0.053 | 0.471 | 2.380 | 12.419 | 0.148 | 0.467 | 0.934 | 10.435 | 0.536 | 0.593 | 0.927 | 11.268 | 0.430 | 0.527 | 0.934 | 10.435 | 0.536 | 0.593 |
MAR (Status with Growth) | ||||||||||||||||||||||||||||
30% Missing | 0.665 | 3.299 | 0.385 | 0.856 | 0.669 | 5.570 | 0.345 | 0.632 | 2.181 | 6.478 | 0.031 | 0.513 | 1.968 | 6.142 | 0.044 | 0.571 | 0.674 | 4.839 | 0.347 | 0.693 | 0.694 | 4.955 | 0.297 | 0.666 | 0.674 | 4.839 | 0.347 | 0.693 |
50% Missing | 1.237 | 5.242 | 0.306 | 0.846 | 1.221 | 8.776 | 0.245 | 0.562 | 3.492 | 9.265 | 0.012 | 0.517 | 2.860 | 8.884 | 0.029 | 0.560 | 1.249 | 7.631 | 0.250 | 0.633 | 1.269 | 7.947 | 0.204 | 0.589 | 1.249 | 7.631 | 0.250 | 0.633 |
70% Missing | 1.961 | 7.527 | 0.258 | 0.847 | 2.077 | 11.580 | 0.160 | 0.503 | 4.988 | 11.564 | 0.003 | 0.494 | 3.603 | 11.510 | 0.028 | 0.515 | 1.999 | 10.335 | 0.174 | 0.587 | 2.036 | 10.865 | 0.138 | 0.535 | 1.999 | 10.335 | 0.174 | 0.587 |
The following figures provide more nuanced insight into the performance of the six MI methods as a function of important factors, including the percentage and type of missingness as well as the grade/content area size. Analyses are separated by the imputed value (i.e., either the scale score or the SGP).
Many of the above figures are replicated here, aggregating at the school level rather than by grade and content area within a school.
The following models are preliminary mechanisms for understanding which factors are related to relatively worse MI performance. To simplify these models, we examine either raw bias or absolute bias.
We use the fixest package (Berge, 2018) to regress the bias variables on grade/content area size, percentage missing, missingness type, and imputation method; grade and content area are also included in the model as fixed effects. Currently, these are simple additive models. More complex models may be incorporated in future analyses (e.g., including interactions, a random effect for the school, etc.).
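The distinction between raw and absolute bias matters for how the models are read, because over- and under-estimates cancel in raw bias but not in absolute bias. A minimal sketch with made-up numbers (the helper names and values are hypothetical):

```python
def raw_bias(imputed, true):
    """Signed difference between the imputed and true aggregate."""
    return imputed - true

def absolute_bias(imputed, true):
    """Magnitude of the difference, ignoring direction."""
    return abs(imputed - true)

# Hypothetical (imputed mean, true mean) pairs for four groups.
pairs = [(52.0, 50.0), (47.0, 50.0), (51.0, 50.0), (50.0, 50.0)]

mean_raw = sum(raw_bias(i, t) for i, t in pairs) / len(pairs)
mean_abs = sum(absolute_bias(i, t) for i, t in pairs) / len(pairs)
print(mean_raw, mean_abs)
```

Here the average raw bias is zero even though the typical group is off by 1.5 points, so a predictor can be unrelated to raw bias and still be related to absolute bias.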
Term | Scale Scores | SGPs |
---|---|---|
N | 0.0004 (0.0021) | 0.0054* (0.0021) |
MISS_PERC50%Missing | 2.046*** (0.3475) | -0.1588** (0.0377) |
MISS_PERC70%Missing | 4.432*** (0.7589) | -0.3639** (0.0732) |
MISS_TYPEDEMOG | 5.709*** (0.6697) | -0.0496 (0.0292) |
MISS_TYPEGROWTH | 10.36*** (1.535) | 0.0746 (0.0493) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER_LONG | -0.2410 (0.8601) | -0.3768. (0.1911) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG | -6.195*** (1.314) | -0.4023. (0.2021) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER | 1.698*** (0.2889) | -0.3242 (0.1859) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN | -5.253*** (1.052) | -0.0308 (0.1878) |
i(var=IMP_METHOD,ref=“Observed”)RQ | -5.282*** (1.060) | -0.2582 (0.2059) |
i(var=IMP_METHOD,ref=“Observed”)PMM | -5.290*** (1.072) | -0.3420 (0.2087) |
Fixed-Effects: | —————– | —————— |
GRADE^CONTENT_AREA | Yes | Yes |
________________________________________ | _________________ | __________________ |
S.E.: Clustered | by: GRA.^CON. | by: GRA.^CON. |
Observations | 96,075 | 51,849 |
R2 | 0.35696 | 0.00418 |
Within R2 | 0.32018 | 0.00356 |
\(~\)
Term | Scale Scores | SGPs |
---|---|---|
N | -0.0100*** (0.0017) | -0.0152*** (0.0012) |
MISS_PERC50%Missing | 2.763*** (0.2529) | 1.574*** (0.0909) |
MISS_PERC70%Missing | 5.864*** (0.6021) | 3.130*** (0.1531) |
MISS_TYPEDEMOG | 3.600*** (0.4151) | 0.5515*** (0.0712) |
MISS_TYPEGROWTH | 8.268*** (1.261) | 0.5947*** (0.0801) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER_LONG | 1.117 (0.8447) | 2.581*** (0.2756) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG | -3.355*** (0.5251) | 2.312*** (0.2846) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER | 2.959*** (0.3078) | 2.643*** (0.2569) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN | -4.621*** (0.8489) | 0.0985 (0.1049) |
i(var=IMP_METHOD,ref=“Observed”)RQ | -3.921*** (0.6587) | 1.848*** (0.2552) |
i(var=IMP_METHOD,ref=“Observed”)PMM | -3.906*** (0.6933) | 1.749*** (0.2606) |
Fixed-Effects: | ——————- | ——————- |
GRADE^CONTENT_AREA | Yes | Yes |
________________________________________ | ___________________ | ___________________ |
S.E.: Clustered | by: GRA.^CON. | by: GRA.^CON. |
Observations | 96,075 | 51,849 |
R2 | 0.35972 | 0.18045 |
Within R2 | 0.33837 | 0.17236 |
\(~\)
We can also re-fit the scale score models using only observations from grades 5 through 8.
Term | Scale Scores | SGPs |
---|---|---|
N | 0.0012 (0.0022) | -0.0095*** (0.0017) |
MISS_PERC50%Missing | 1.062*** (0.0788) | 2.058*** (0.0652) |
MISS_PERC70%Missing | 2.274*** (0.1773) | 4.179*** (0.1362) |
MISS_TYPEDEMOG | 3.814*** (0.2469) | 2.487*** (0.1313) |
MISS_TYPEGROWTH | 6.015*** (0.3615) | 4.800*** (0.1986) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER_LONG | -2.628*** (0.4167) | -1.274** (0.3270) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG | -9.960*** (0.7401) | -4.664*** (0.4131) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER | 0.9643*** (0.1011) | 2.140*** (0.1114) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN | -8.215*** (0.3348) | -7.010*** (0.2442) |
i(var=IMP_METHOD,ref=“Observed”)RQ | -8.256*** (0.3002) | -5.750*** (0.2844) |
i(var=IMP_METHOD,ref=“Observed”)PMM | -8.317*** (0.2633) | -5.831*** (0.2838) |
Fixed-Effects: | —————— | ——————- |
GRADE^CONTENT_AREA | Yes | Yes |
________________________________________ | __________________ | ___________________ |
S.E.: Clustered | by: GRA.^CON. | by: GRA.^CON. |
Observations | 51,849 | 51,849 |
R2 | 0.33165 | 0.34586 |
Within R2 | 0.33026 | 0.34365 |
In these models, the data are aggregated at the school level.
Term | Scale Scores | SGPs |
---|---|---|
(Intercept) | 1.936*** (0.1775) | -0.3654* (0.1493) |
N | -0.0023*** (0.0002) | 0.0010*** (0.0001) |
MISS_PERC50%Missing | 2.044*** (0.1227) | -0.1587 (0.1032) |
MISS_PERC70%Missing | 4.494*** (0.1227) | -0.3560*** (0.1032) |
MISS_TYPEDEMOG | 5.696*** (0.1227) | -0.1420 (0.1032) |
MISS_TYPEGROWTH | 10.36*** (0.1227) | -0.0388 (0.1032) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER_LONG | -0.7282*** (0.1874) | -0.3890* (0.1576) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG | -6.651*** (0.1874) | -0.4009* (0.1576) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER | 1.259*** (0.1874) | -0.3714* (0.1576) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN | -5.680*** (0.1874) | 0.0527 (0.1576) |
i(var=IMP_METHOD,ref=“Observed”)RQ | -5.703*** (0.1874) | -0.2800. (0.1576) |
i(var=IMP_METHOD,ref=“Observed”)PMM | -5.694*** (0.1874) | -0.2982. (0.1576) |
________________________________________ | ___________________ | ___________________ |
S.E. type | Standard | Standard |
Observations | 14,616 | 14,616 |
R2 | 0.46063 | 0.00555 |
Adj. R2 | 0.46022 | 0.00480 |
Term | Scale Scores | SGPs |
---|---|---|
(Intercept) | 2.955*** (0.1578) | 1.437*** (0.0997) |
N | -0.0022*** (0.0002) | -0.0025*** (9.82e-5) |
MISS_PERC50%Missing | 2.423*** (0.1091) | 1.254*** (0.0689) |
MISS_PERC70%Missing | 5.280*** (0.1091) | 2.481*** (0.0689) |
MISS_TYPEDEMOG | 3.929*** (0.1091) | 0.4720*** (0.0689) |
MISS_TYPEGROWTH | 8.547*** (0.1091) | 0.3616*** (0.0689) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER_LONG | 0.4437** (0.1666) | 2.295*** (0.1053) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG | -5.029*** (0.1666) | 2.089*** (0.1053) |
i(var=IMP_METHOD,ref=“Observed”)L2LMER | 2.482*** (0.1666) | 2.368*** (0.1053) |
i(var=IMP_METHOD,ref=“Observed”)L2PAN | -5.526*** (0.1666) | 0.1478 (0.1053) |
i(var=IMP_METHOD,ref=“Observed”)RQ | -5.108*** (0.1666) | 1.668*** (0.1053) |
i(var=IMP_METHOD,ref=“Observed”)PMM | -5.108*** (0.1666) | 1.519*** (0.1053) |
________________________________________ | ___________________ | ____________________ |
S.E. type | Standard | Standard |
Observations | 14,616 | 14,616 |
R2 | 0.48342 | 0.17251 |
Adj. R2 | 0.48303 | 0.17188 |
Before homing in on the particular differences among the MI methods, a handful of general trends merit comment. First, across missingness types and percentages, the percent bias for scale scores is notably lower than the percent bias for SGPs. Second, there is a small but noticeable positive relationship between missingness percentage and percent bias; this relationship is more pronounced for scale scores than for SGPs. Moreover, the scatterplots of percent bias and coverage rates as a function of grade/content area size (\(N\)) indicate that observations with smaller \(N\) were more likely to have higher percent bias on either scale scores or SGPs, as well as greater variation in the CI coverage rates. Finally, the bar plots indicate that there are higher proportions of significant differences between the true and imputed scale score values (based on the simplified \(F_1\) statistic) when data are MAR, particularly based on status and growth. The preliminary regression models also suggest that across the imputation methods, bias tends to increase as percentage missingness increases, as well as when data are MAR compared to MCAR (particularly using status and growth). However, note that the \(R^2\) values were relatively low, so caution is warranted when interpreting the model results.
Overall, the cross-sectional L2PAN method demonstrates the best performance among the examined methods. Specifically, L2PAN was most often able to maintain average percent bias for scale scores and SGPs below 5%, while simultaneously constructing confidence intervals with higher coverage rates. Additionally, L2PAN most often failed to reject the null hypothesis that the imputed and true values are equivalent. In other words, the other MI methods more often found statistically significant differences (using the simplified \(F_1\) statistic) between the imputed and true average scale score and SGP values. Finally, the regression models indicate that L2PAN is negatively related to scale score bias, suggesting reduced scale score bias compared to the “Observed” condition. Relatedly, L2PAN was often not positively related to SGP bias.
The results provide growing evidence that L2PAN is a relatively effective MI method for the types of data and missingness examined thus far. Still, the above analyses indicate that in numerous cases, L2PAN is unable to generate relatively accurate pooled estimates. In this section, we take a deeper dive to identify the conditions wherein L2PAN performs well, as well as the conditions wherein L2PAN is unable to adequately recover the true mean scale scores or SGPs.
Figure 3.1 presents the average SS percent bias for the L2PAN method as a function of grade/content area size, missingness percentage and type, and grade. First, notice in this figure that the average percent bias never exceeds 5%, which is the threshold for poor MI efficacy posited by researchers like Qi and colleagues (2010) and Miri and colleagues (2020). There is a general trend wherein the SS percent bias decreases as the grade/content area size quantile increases, as well as a tendency for the percent bias to be greater when data are missing using status and growth compared to using status and demographics. The largest relative scale score percent bias with L2PAN occurs for grades 3 and 4 when 70% of the data are missing using status and growth.
Figure 3.2 replicates the above analysis, using average scale score coverage rate rather than percent bias.
The above figure indicates that the average coverage rate is greater than 0.90 in a large proportion of conditions. When data are missing based on status and growth for grades 3 and 4, the average scale score coverage rates are particularly low. The results above indicate that for observations in grades 3 and 4, simplified CIs for scale scores tend to be too liberal when data are MAR (particularly using growth rather than demographics); this relationship is exacerbated as the missingness percentage and grade/content area size quantile increase.
We may also use the (simplified) \(F_1\) statistic to flag observations indicating significant differences between the imputed and true values. The figure below plots the proportion of \(F_1\) statistics that result in a rejection of the null hypothesis for the scale scores (using a threshold of \(\alpha=0.1\)), where the null hypothesis is that the imputed and true values are equivalent. We find that cases in grades 3 and 4 when data are MAR (based on status and growth) more often indicate significant differences between the imputed and true values. Again, this more often occurs among larger grade/content area size quantiles.
Across the 13,725 observations in the L2PAN imputed data, 97.8% of observations have an average SS percent bias less than 5%. Moreover, 59.8% of observations have a mean SS coverage rate greater than 0.90. We are particularly interested in whether there are similar characteristics among the observations that do not meet these percent bias and coverage rate thresholds when imputing with L2PAN.
There are 298 observations with SS percent bias greater than 5% using L2PAN. Similarly, there are 5003 observations with SS coverage rates less than 0.90. Altogether, there are 293 observations that have an SS percent bias greater than 5%, a coverage rate less than 0.90, and an \(F_1\) statistic for which we reject the null hypothesis that the imputed and true values are equivalent. We’ll look at this last category of “flagged” observations in more detail.
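The flagging rule just described combines three criteria. A minimal sketch of that rule, using a hypothetical `Observation` record and made-up values, might look like this:

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-observation summary of the three diagnostics."""
    percent_bias: float   # average percent bias, in percent
    coverage_rate: float  # average simplified CI coverage rate
    f1_p_value: float     # p-value for the simplified F_1 statistic


def is_flagged(obs, bias_cut=5.0, coverage_cut=0.90, alpha=0.10):
    """Flag an observation only when all three criteria are met."""
    return (
        obs.percent_bias > bias_cut
        and obs.coverage_rate < coverage_cut
        and obs.f1_p_value < alpha
    )


observations = [
    Observation(percent_bias=6.2, coverage_rate=0.55, f1_p_value=0.01),  # all three
    Observation(percent_bias=1.1, coverage_rate=0.70, f1_p_value=0.03),  # bias is fine
    Observation(percent_bias=7.5, coverage_rate=0.95, f1_p_value=0.02),  # coverage is fine
]
flags = [is_flagged(obs) for obs in observations]
print(f"{sum(flags)} of {len(flags)} observations flagged")
```

Because the criteria are joined with a logical "and", the flagged count can never exceed the count for any single criterion, which is consistent with 293 flagged observations against 298 that exceed the bias threshold.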
We see that the majority of “flagged” cases tend to have grade/content area sizes less than 60. Variation in SS percent bias is also greater among observations with smaller grade/content area sizes. Additionally, these observations are overwhelmingly in grades 3 and 4 when data are missing based on status and growth, particularly for conditions with 50% and 70% of data missing.
Figure 3.7 shows the average SS percent bias for the L2PAN method at the school level, as a function of school size, missingness percentage, and missingness type. As with the grade/content area analyses, the maximum average SS percent bias is relatively low, here never exceeding 2.5%. We see higher average SS percent bias when data are MAR based on status and growth, and as the percentage missing increases. In some cases, percent bias also decreases slightly as the school size quantile increases.
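The size quantiles used to group schools (and grade/content areas) can be illustrated with a short sketch. The number of bins is not stated in this section, so the five bins below are an assumption, as are the function name and the school sizes.

```python
from bisect import bisect_right
from statistics import quantiles


def size_quantile_bins(sizes, n_bins=5):
    """Assign each size to a bin numbered 1..n_bins using sample quantiles."""
    # quantiles() returns the n_bins - 1 cut points between bins.
    cuts = quantiles(sizes, n=n_bins)
    return [bisect_right(cuts, size) + 1 for size in sizes]


# Hypothetical school sizes, already sorted for readability.
school_sizes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
bins = size_quantile_bins(school_sizes)

for size, bin_number in zip(school_sizes, bins):
    print(f"size {size:>3} -> quantile bin {bin_number}")
```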
Figure 3.8 below uses average SS coverage rate rather than percent bias. We see that the average coverage rates are concerningly low when data are MAR, particularly based on characteristics like status and growth. There is not a clear linear relationship between the school size quantile and the average CI coverage rate. For instance, when data are MAR based on status and growth, we find a U-shaped relationship between these two variables.
The figure below plots the proportion of \(F_1\) statistics that result in a rejection of the null hypothesis for the SS (using a threshold of \(\alpha=0.1\)). Again, we find a trend of relatively worse performance by L2PAN for data that are MAR based on status and growth, as well as for higher missingness percentages.
Across the 2088 observations in the L2PAN imputed data at the school level, 99.7% of observations have an average SS percent bias less than 5%. Moreover, 38.6% of observations have a mean SS coverage rate greater than 0.90.
There are 7 observations with SS percent bias greater than 5% using L2PAN. Similarly, there are 1203 observations with SS coverage rates less than 0.90. Putting these conditions together, there are 7 total observations with both SS percent bias greater than 5% and a coverage rate less than 0.90, as well as \(F_1\) statistics for which we reject the null hypothesis that the imputed and true values are equivalent.
In this section, we fit a series of relatively simple classification models. The goal is to identify data features that are related to poor L2PAN performance, providing guidance as to when imputing data with this MI method may warrant caution. The models use logistic regression with up to two-way interactions among the predictors. The baseline condition is an observation with a grade/content area or school size of zero for which 30% of the data are MCAR.
The models are fit using the fixest package. In the first model, the data are analyzed at the grade/content area level. Therefore, fixed effects for grade and content area are included in the model. The outcome is a binary variable where 1 indicates an SS percent bias greater than 5%, a coverage rate less than 0.90, and an \(F_1\) statistic p-value less than 0.10.
In the second model, the data are analyzed at the school level. Because there were so few “flagged” observations in these data (N = 7), the outcome variable is the simplified CI coverage rate.
| | Grade/Content Area: Flagged | School: CI Coverage Rates |
|---|---|---|
| N | -0.1289*** (0.0077) | 0.0021*** (0.0005) |
| MISS_TYPEDEMOG | 14.71*** (1.439) | -1.641*** (0.3311) |
| MISS_TYPEGROWTH | 17.21*** (0.2905) | -3.722*** (0.3507) |
| MISS_PERC50%Missing | 1.092* (0.4278) | -0.2605 (0.3837) |
| MISS_PERC70%Missing | 0.6798 (0.5413) | -0.5595 (0.3716) |
| i(var=N,f=MISS_TYPE)MCAR | 0.1267*** (0.0150) | -0.0011 (0.0007) |
| i(var=N,f=MISS_TYPE)DEMOG | 0.0782* (0.0340) | -0.0016*** (0.0004) |
| i(var=N,f=MISS_TYPE)GROWTH | 0.1240*** (0.0075) | |
| i(var=N,f=MISS_PERC)30%Missing | 0.0146 (0.0122) | -0.0008 (0.0005) |
| i(var=N,f=MISS_PERC)50%Missing | -0.0096* (0.0045) | -0.0003 (0.0005) |
| i(var=MISS_TYPE,f=MISS_PERC)DEMOG x 30%Missing | -1.510. (0.8055) | 0.2962 (0.3899) |
| i(var=MISS_TYPE,f=MISS_PERC)DEMOG x 50%Missing | -1.982* (0.7820) | 0.0075 (0.3797) |
| i(var=MISS_TYPE,f=MISS_PERC)GROWTH x 30%Missing | -3.391*** (0.3375) | 0.3456 (0.3944) |
| i(var=MISS_TYPE,f=MISS_PERC)GROWTH x 50%Missing | -1.344*** (0.1815) | 0.0592 (0.3919) |
| (Intercept) | 2.448*** (0.3337) | |
| Fixed-Effects: | | |
| CONTENT_AREA^GRADE | Yes | No |
| S.E. type | by: CON.^GRA. | Standard |
| Convergence | FALSE | TRUE |
| Observations | 6,318 | 2,088 |
| Squared Cor. | 0.19628 | 0.44657 |
| Pseudo R2 | 0.37202 | 0.34479 |
| BIC | 1,647.0 | 1,910.7 |
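Because the models are logistic regressions, each coefficient in the table is on the log-odds scale. As a rough reading aid, the sketch below exponentiates the `N` coefficient from the grade/content area column into an odds ratio and builds an approximate Wald interval around it. The helper names and the 1.96 multiplier are illustrative assumptions rather than part of the original analysis. That model is also reported with `Convergence: FALSE` and includes interactions involving `N`, so the result is best read as an arithmetic illustration and not as a substantive effect estimate.

```python
import math


def odds_ratio(log_odds_coef):
    """Convert a log-odds coefficient to an odds ratio."""
    return math.exp(log_odds_coef)


def wald_interval(coef, std_error, z=1.96):
    """Approximate 95% Wald interval on the odds-ratio scale (assumed z = 1.96)."""
    return math.exp(coef - z * std_error), math.exp(coef + z * std_error)


# N coefficient and standard error from the grade/content area column.
coef_n, se_n = -0.1289, 0.0077

or_n = odds_ratio(coef_n)
lower, upper = wald_interval(coef_n, se_n)
print(f"odds ratio per additional student: {or_n:.3f} ({lower:.3f}, {upper:.3f})")
```

An odds ratio below 1 corresponds to lower odds of being flagged as the grade/content area size grows, which matches the direction of the descriptive results above.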
Figure 3.13 shows the average SGP percent bias. As previously noted, average SGP percent bias is generally larger than that for scale scores. In many missingness percentage and type categorizations, the largest average SGP percent bias occurs for observations in eighth grade and in the first grade/content area size quantile. We again see a general trend wherein observations with smaller grade/content area size quantiles (i.e., in the first quantile) have larger average bias. The largest average SGP percent bias is approximately 29, occurring when 70% of the data are missing using status and growth, and observations are in the eighth grade and first quantile of grade/content area size.
Whereas the average SS coverage rate was lower than 0.5 for some conditions (see Figure 3.2), the corresponding coverage figure for the SGPs indicates relatively high average coverage rates across the examined conditions. For instance, the average CI coverage rate for the SGPs does not fall below 0.76. Therefore, although L2PAN tends to produce imputed SGP estimates with relatively higher percent bias compared to the imputed SS estimates, the confidence intervals for the SGP estimates more often contain the true SGP values when analyzing at the grade/content area level.
As shown in the figure below, statistically significant differences between the imputed and true mean SGPs were more often found when data were MAR based on status and growth. In some cases, there was also a tendency for higher proportions of rejected null hypotheses among higher grades.
Across the 13725 observations in the L2PAN imputed data, 32.3% of observations have an average SGP percent bias less than 5%. Moreover, 39.9% of observations have a mean SGP coverage rate greater than 0.90. Again, we’ll further examine the “flagged” observations, which have an SGP percent bias greater than 5%, a simplified CI coverage rate less than 0.90, and a statistically significant \(F_1\) test.
Here, there are 319 total observations that we may “flag” for which L2PAN performs relatively worse when imputing the mean SGP values at the grade/content area level.
As with the scale score imputations, L2PAN tends to generate more concerning SGP imputed values for smaller grade/content area sizes. The scatterplot further shows that observations with smaller grade/content area sizes tend to have greater variation in percent bias for the SGPs. The faceted bar plot above suggests that across the missingness type and percentage conditions, L2PAN tends to do worse in terms of the SGP imputation for grade six (and also often grade five). There is also some evidence that imputation efficacy decreases as missingness percentage increases, as well as for MAR compared to MCAR data.
In the above figure, we see that average SGP percent bias is higher among lower school size quantiles, higher missingness percentages, and when data are MAR. However, we find fewer clear trends when examining average SGP coverage rates (see below). Notably, the average coverage rates do not dip below 0.80 when analyzing the SGPs at the school level. Coverage rates tend to be lower for MAR compared to MCAR data.
The figure below plots the proportion of \(F_1\) statistics that result in a rejection of the null hypothesis for the SGPs (again using a threshold of \(\alpha=0.1\)). Notice in this figure that statistically significant differences between the imputed and true mean SGPs were more often found when data were MAR.
Across the 2088 observations in the L2PAN imputed data at the school level, 71.1% of observations have an average SGP percent bias less than 5%. Moreover, 64.4% of observations have a mean SGP coverage rate greater than 0.90.
Here, there are 176 total observations that we may “flag” for which L2PAN performs relatively worse when imputing the SGP values at the school level (again, based on a combination of the SGP percent bias, simplified CI coverage rate, and p-value for the \(F_1\) statistic).
We find that “flagged” observations at the school level are more likely to have smaller school sizes and occur when data are MAR (particularly based on status and growth). There is also a clear positive relationship between the number of “flagged” observations and the percentage of missingness, holding other factors constant.
We fit another series of logistic regressions with up to two-way interactions among the predictors. Here, the outcome for both the grade/content area and school level is a binary variable where 1 now indicates an SGP percent bias greater than 5%, coverage rate less than 0.90, and \(F_1\) statistic p-value less than 0.10.
| | Grade/Content Area | School |
|---|---|---|
| N | -0.0491*** (0.0036) | -0.0005 (0.0007) |
| MISS_TYPEDEMOG | 0.8894* (0.3914) | 0.4954 (0.5394) |
| MISS_TYPEGROWTH | 1.200* (0.5243) | 0.3132 (0.5286) |
| MISS_PERC50%Missing | 9.491*** (0.7164) | 1.692 (1.113) |
| MISS_PERC70%Missing | 11.04*** (0.2832) | 2.396* (1.078) |
| i(var=N,f=MISS_TYPE)MCAR | -0.0079 (0.0082) | -0.0024 (0.0015) |
| i(var=N,f=MISS_TYPE)DEMOG | 9.73e-5 (0.0024) | -0.0009 (0.0009) |
| i(var=N,f=MISS_PERC)30%Missing | 0.0504*** (0.0028) | -0.0014 (0.0012) |
| i(var=N,f=MISS_PERC)50%Missing | 0.0507*** (0.0034) | -0.0001 (0.0009) |
| i(var=N,f=MISS_PERC)70%Missing | 0.0509*** (0.0034) | |
| i(var=MISS_TYPE,f=MISS_PERC)DEMOG x 30%Missing | 10.10*** (0.2562) | 1.802 (1.102) |
| i(var=MISS_TYPE,f=MISS_PERC)DEMOG x 50%Missing | 1.205* (0.5523) | 0.5349 (0.5571) |
| i(var=MISS_TYPE,f=MISS_PERC)GROWTH x 30%Missing | 10.66*** (0.2551) | 2.207* (1.091) |
| i(var=MISS_TYPE,f=MISS_PERC)GROWTH x 50%Missing | 1.591** (0.5072) | 0.5959 (0.5525) |
| (Intercept) | -4.308*** (1.081) | |
| Fixed-Effects: | | |
| CONTENT_AREA^GRADE | Yes | No |
| S.E. type | by: CON.^GRA. | Standard |
| Convergence | FALSE | TRUE |
| Observations | 7,407 | 2,088 |
| Squared Cor. | 0.03162 | 0.03581 |
| Pseudo R2 | 0.10027 | 0.07425 |
| BIC | 2,562.8 | 1,224.8 |
The current simulation was designed to address two related questions. First, can we determine whether MI is an appropriate method for creating “adjusted” scale scores and SGPs when data are missing? Second, if MI is indeed appropriate, which method is most effective and in what data contexts? The above results can be broadly summarized as follows:
The simulation results clearly demonstrate that the efficacy of L2PAN multiple imputation depends upon a nuanced interaction of different data characteristics. It is also likely that other factors (not accounted for in the current analyses) contribute to MI’s performance in this context. For instance, recall that the \(R^2\) values for the exploratory linear regressions were relatively small. There may be additional interaction and polynomial effects on percent bias and CI coverage rates that warrant further exploration.
To address when MI is a plausible methodological option for generating adjusted 2021 scores, we make the following additional recommendations:
In summary, L2PAN is a promising mechanism for dealing with relatively low levels of missingness in scale scores or SGPs. As with any simulation, the results presented here can only be generalized to the simulation conditions examined. Moreover, future work should include diagnostic checks to examine the MI performance with a particular set of data. In other words, even if L2PAN imputation is implemented and is expected to work well (based on the given simulations), supplementary diagnostic analyses would help ensure that the imputation method is working as intended. Stuart and colleagues (2009), as well as Nguyen and colleagues (2017), provide helpful overviews of these diagnostics.