1 Simulation Overview

In a previous vignette, we examined the efficacy of multiple imputation (MI) for handling missing scale score and student growth percentile (SGP) data. In that simulation, observations were systematically removed from a synthetic data set (from the SGPdata R package; Betebenner et al., 2021). The results indicated that, in many contexts, the cross-sectional L2PAN imputation method is a viable approach for creating “adjusted” scale scores and SGPs. Importantly, L2PAN generally performed best with (a) lower missingness percentages, (b) data missing completely at random, and (c) larger school sizes.

We replicated the simulation after incorporating a COVID-19 impact into the synthetic SGPdata data. This vignette summarizes the results of this “impact” simulation. As before, data were amputed with patterns that were missing completely at random (MCAR), missing at random (MAR) based on status and growth, or MAR based on status and demographics. In addition, 30%, 50%, or 70% of the observations were systematically removed (although the realized missingness percentage could vary by school within each of these three levels). Six imputation methods were compared, with some slight differences from the previous “without impact” simulation:

  • Cross-sectional multi-level modeling with pan (L2PAN);
  • Longitudinal multi-level modeling with pan (L2PAN_LONG);
  • Quantile regression (RQ);
  • Random forests (RF);
  • One-level predictive mean matching (PMM); and
  • Multi-level modeling with predictive mean matching (L2PMM).
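
The difference between the MCAR and MAR amputation patterns described above can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the procedure used in this simulation: `ampute_mcar` removes a fixed share of values uniformly at random, while `ampute_mar` makes removal more likely for students with a lower value on an observed driver variable (a simulated prior score standing in for "status"). The logistic weighting, the direction of the effect, and all names are assumptions chosen for illustration.

```python
import math
import random

def ampute_mcar(values, prop, rng):
    """MCAR: remove round(prop * n) values chosen uniformly at random."""
    n_miss = round(prop * len(values))
    drop = set(rng.sample(range(len(values)), n_miss))
    return [None if i in drop else v for i, v in enumerate(values)]

def ampute_mar(values, driver, prop, rng):
    """MAR: removal probability depends only on an observed driver variable.

    Hypothetical rule: a logistic weight that is larger for lower driver values.
    """
    n = len(values)
    n_miss = round(prop * n)
    mean = sum(driver) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in driver) / n) or 1.0
    weights = [1.0 / (1.0 + math.exp((d - mean) / sd)) for d in driver]
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # keep the n_miss indices with the largest keys u ** (1 / w).
    order = sorted(range(n), key=lambda i: rng.random() ** (1.0 / weights[i]), reverse=True)
    drop = set(order[:n_miss])
    return [None if i in drop else v for i, v in enumerate(values)]

# Usage with simulated data (values are illustrative only).
rng = random.Random(2021)
prior = [rng.gauss(500, 50) for _ in range(1000)]   # observed prior score
current = [p + rng.gauss(10, 20) for p in prior]    # current-year score
mcar = ampute_mcar(current, 0.30, rng)
mar = ampute_mar(current, prior, 0.30, rng)
```

Under MCAR the remaining cases are still a random subset of students, whereas under this MAR rule the remaining scores over-represent students with higher prior scores, which is why unadjusted summaries of the observed data can be biased.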

As in the previous simulation, we also compared these methods to the “Observed” condition, in which no imputation was performed. All MI analyses were conducted using the mice package (van Buuren & Groothuis-Oudshoorn, 2011), with calls to corresponding R packages (e.g., pan [Zhao & Schafer, 2018]). Here, we focus on how accurately these MI methods recover mean scale scores and mean SGPs.
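
To make one of the compared methods concrete, the sketch below shows the core idea of predictive mean matching (PMM) with a single predictor, in plain Python. It is a simplified illustration and not the mice implementation: mice's `pmm` method also draws the regression parameters from their posterior distribution so that repeated imputations reflect parameter uncertainty, and it handles many predictors at once. The function name, the donor-pool size `k` (5 by default here, which we believe matches mice's default number of donors), and the example data are assumptions.

```python
import random

def pmm_impute(x, y, k=5, rng=None):
    """Impute missing y (None) by predictive mean matching on one predictor x.

    1. Fit OLS of y on x using the observed cases.
    2. Compute predicted values for every case.
    3. For each missing case, find the k observed cases with the closest
       predicted values and copy the observed y of one donor chosen at random.
    """
    rng = rng or random.Random()
    obs = [i for i, v in enumerate(y) if v is not None]
    mx = sum(x[i] for i in obs) / len(obs)
    my = sum(y[i] for i in obs) / len(obs)
    sxx = sum((x[i] - mx) ** 2 for i in obs)
    sxy = sum((x[i] - mx) * (y[i] - my) for i in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    pred = [intercept + slope * xi for xi in x]
    imputed = list(y)
    for i, v in enumerate(y):
        if v is None:
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            imputed[i] = y[rng.choice(donors)]
    return imputed

# Usage: every fifth outcome is missing.
rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [None if i % 5 == 0 else 2.0 * xi + 1.0 for i, xi in enumerate(x)]
completed = pmm_impute(x, y, k=3, rng=rng)
```

Because every imputed value is copied from an observed donor, PMM cannot produce values outside the observed range, one commonly cited advantage for bounded outcomes such as percentiles.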

The structure of this vignette largely mirrors the summary of the “without impact” simulation. We use the following three indices to operationalize MI performance:

  • Percent bias
    • The absolute value of the ratio of the raw bias (i.e., the average difference between the imputed and true values) to the average true value, multiplied by 100.
    • Ideally less than 5% (Miri et al., 2020; Qi et al., 2010).
  • Simplified confidence interval (CI) coverage rate
    • The proportion of times that the simplified CI contains the average true score; the simplified CI was proposed by Vink and van Buuren (2014) for cases where the complete data set can be considered the population.
    • Ideally around \(1 - \alpha\) (in this case, \(1 - 0.10 = 0.90\); Demirtas, 2004; Qi et al., 2010).
  • Simplified \(\mathbf{F}_1\) statistic
    • Tests the null hypothesis that the true and imputed values are equivalent (van Buuren, 2018; Vink & van Buuren, 2014).
    • A p-value greater than \(\alpha\) denotes a failure to reject the null hypothesis.
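
The three indices can be sketched for a single mean as follows. This Python sketch reflects our reading of Vink and van Buuren (2014) and is not the code used for this vignette: because the complete data are treated as the population, the within-imputation variance is set to zero, so Rubin's total variance reduces to \(T = (1 + 1/m)B\), where \(B\) is the between-imputation variance of the \(m\) estimates, with \(m - 1\) degrees of freedom. The simplified \(\mathbf{F}_1\) statistic is computed here as \((\bar{Q} - Q_0)^2 / T\) for a scalar \(Q\). To stay within the standard library, the critical value `t_crit` (the 0.95 quantile of the t distribution with \(m - 1\) df, for \(\alpha = 0.10\)) is passed in rather than computed. The function name and example values are hypothetical.

```python
import math

def evaluate_imputed_mean(estimates, true_mean, t_crit):
    """Percent bias, simplified CI coverage, and simplified F1 for one mean.

    estimates: the mean computed in each of the m imputed data sets
               (assumed not all identical, so that B > 0).
    true_mean: the mean of the complete (pre-amputation) data.
    t_crit: 0.95 quantile of Student's t with m - 1 df (alpha = 0.10).
    """
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled estimate
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total_var = (1 + 1 / m) * b                            # within-imputation variance taken as 0
    half_width = t_crit * math.sqrt(total_var)
    lower, upper = qbar - half_width, qbar + half_width
    f1 = (qbar - true_mean) ** 2 / total_var               # compared with F(1, m - 1)
    return {
        "percent_bias": 100 * abs((qbar - true_mean) / true_mean),
        "ci": (lower, upper),
        "covered": lower <= true_mean <= upper,
        "f1": f1,
        # The 0.90 quantile of F(1, df) equals t_crit ** 2.
        "reject_null": f1 > t_crit ** 2,
    }

# Usage: m = 5 imputations (an assumption), so df = 4 and t_{0.95, 4} is about 2.1318.
result = evaluate_imputed_mean([500.2, 501.1, 499.8, 500.6, 500.9], 500.0, 2.1318)
```

In this sketch, for a single mean, the \(\mathbf{F}_1\) test rejects exactly when the simplified CI excludes the true value.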

2 Imputation Method Comparison

We first examine the results across the design factors (e.g., type and percentage of missingness, grade and content area, and MI method). Our aim is to determine whether one or more MI methods outperform the others in terms of lower bias and coverage rates closer to the nominal 0.90 level.

2.1 Summary Tables

2.1.1 GC: MCAR

Table 2.1: Mean percent bias and confidence interval coverage rates for scale scores (SS) and student growth percentiles (SGPs) with MCAR data, grade/content area level
Columns: Grade, then four values for each method in the order L2PAN, L2PAN_LONG, RQ, RF, L2PMM, PMM, Observed: Percent Bias (SS), Percent Bias (SGP), CR (SS), CR (SGP), where CR is the confidence interval coverage rate. Rows for grades 3 and 4 contain scale scores only, so each method contributes two values: Percent Bias (SS) and CR (SS).
30% Missing
3 0.210 0.921 0.210 0.921 0.216 0.914 0.226 0.928 0.220 0.912 0.221 0.960 0.221 0.960
4 0.190 0.927 0.190 0.927 0.194 0.924 0.206 0.933 0.203 0.914 0.200 0.966 0.200 0.966
5 0.169 2.916 0.950 0.950 0.220 3.712 0.945 0.911 0.394 6.033 0.858 0.843 0.204 3.233 0.950 0.940 0.370 5.878 0.950 0.917 0.390 5.932 0.865 0.858 0.390 5.932 0.865 0.858
6 0.154 3.147 0.952 0.947 0.252 3.981 0.921 0.874 0.377 6.483 0.806 0.785 0.216 3.879 0.936 0.919 0.361 6.458 0.950 0.899 0.377 6.456 0.807 0.799 0.377 6.456 0.807 0.799
7 0.127 1.932 0.947 0.955 0.268 2.815 0.856 0.879 0.326 4.761 0.752 0.753 0.154 2.383 0.930 0.920 0.311 4.886 0.938 0.893 0.322 4.696 0.757 0.765 0.322 4.696 0.757 0.765
8 0.134 2.527 0.944 0.942 0.171 3.222 0.905 0.861 0.376 6.439 0.716 0.679 0.184 3.276 0.922 0.900 0.374 6.745 0.914 0.838 0.378 6.450 0.717 0.693 0.378 6.450 0.717 0.693
50% Missing
3 0.338 0.905 0.338 0.905 0.339 0.896 0.373 0.890 0.366 0.882 0.362 0.964 0.362 0.964
4 0.287 0.914 0.287 0.914 0.297 0.908 0.335 0.900 0.316 0.887 0.318 0.975 0.318 0.975
5 0.310 4.943 0.942 0.945 0.355 5.743 0.893 0.846 0.652 9.853 0.787 0.767 0.388 6.084 0.904 0.882 0.635 9.760 0.932 0.889 0.649 9.734 0.802 0.791 0.649 9.734 0.802 0.791
6 0.280 5.488 0.945 0.941 0.442 6.822 0.850 0.791 0.617 10.518 0.724 0.706 0.390 6.884 0.879 0.847 0.584 10.249 0.950 0.898 0.611 10.400 0.744 0.729 0.611 10.400 0.744 0.729
7 0.205 3.445 0.948 0.946 0.386 3.930 0.763 0.790 0.506 7.694 0.702 0.682 0.283 4.546 0.877 0.845 0.514 8.049 0.930 0.878 0.510 7.653 0.717 0.699 0.510 7.653 0.717 0.699
8 0.225 3.999 0.938 0.941 0.289 5.271 0.811 0.767 0.642 10.778 0.601 0.566 0.356 6.285 0.845 0.806 0.658 11.433 0.893 0.816 0.643 10.725 0.612 0.587 0.643 10.725 0.612 0.587
70% Missing
3 0.548 0.900 0.548 0.900 0.571 0.909 0.574 0.825 0.572 0.863 0.642 0.970 0.642 0.970
4 0.424 0.915 0.424 0.915 0.443 0.917 0.457 0.848 0.450 0.871 0.541 0.977 0.541 0.977
5 0.525 8.319 0.945 0.943 0.487 8.918 0.813 0.744 0.919 13.843 0.751 0.722 0.632 9.819 0.855 0.816 0.906 13.737 0.928 0.880 0.918 13.751 0.768 0.753 0.918 13.751 0.768 0.753
6 0.446 8.435 0.941 0.934 0.637 9.360 0.743 0.689 0.852 14.563 0.688 0.656 0.612 10.751 0.813 0.773 0.832 14.507 0.928 0.869 0.849 14.402 0.705 0.686 0.849 14.402 0.705 0.686
7 0.331 5.076 0.945 0.942 0.588 5.615 0.642 0.660 0.731 10.782 0.664 0.650 0.466 7.201 0.790 0.769 0.732 11.124 0.920 0.873 0.726 10.731 0.671 0.670 0.726 10.731 0.671 0.670
8 0.391 6.547 0.934 0.926 0.402 7.316 0.704 0.636 0.899 14.900 0.563 0.527 0.583 9.989 0.756 0.700 0.926 15.762 0.893 0.810 0.901 14.980 0.573 0.544 0.901 14.980 0.573 0.544
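
Each cell in these summary tables is an average over many simulation results. A minimal Python sketch of that summary step, assuming a hypothetical list of per-result records (the field names are invented for illustration), averages percent bias and converts coverage indicators into a rate within each grade, missingness-percentage, and method cell.

```python
from collections import defaultdict

def summarize(records):
    """Mean percent bias and coverage rate per (grade, miss_perc, method) cell."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of percent bias, times covered, results]
    for r in records:
        cell = totals[(r["grade"], r["miss_perc"], r["method"])]
        cell[0] += r["percent_bias"]
        cell[1] += int(r["covered"])
        cell[2] += 1
    return {key: (pb_sum / n, n_cov / n) for key, (pb_sum, n_cov, n) in totals.items()}

# Usage with three hypothetical results.
records = [
    {"grade": 5, "miss_perc": 30, "method": "L2PAN", "percent_bias": 0.20, "covered": True},
    {"grade": 5, "miss_perc": 30, "method": "L2PAN", "percent_bias": 0.10, "covered": False},
    {"grade": 5, "miss_perc": 30, "method": "RQ", "percent_bias": 0.40, "covered": True},
]
table = summarize(records)
```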

2.1.2 GC: Status with Demographics

Table 2.2: Mean percent bias and confidence interval coverage rates for scale scores (SS) and student growth percentiles (SGPs) with MAR data (using status with demographics), grade/content area level
Columns: Grade, then four values for each method in the order L2PAN, L2PAN_LONG, RQ, RF, L2PMM, PMM, Observed: Percent Bias (SS), Percent Bias (SGP), CR (SS), CR (SGP), where CR is the confidence interval coverage rate. Rows for grades 3 and 4 contain scale scores only, so each method contributes two values: Percent Bias (SS) and CR (SS).
30% Missing
3 0.606 0.848 0.606 0.848 0.598 0.832 0.693 0.793 0.599 0.831 0.620 0.904 0.620 0.904
4 0.524 0.847 0.524 0.847 0.528 0.837 0.636 0.779 0.536 0.825 0.521 0.923 0.521 0.923
5 0.264 4.456 0.943 0.902 0.483 6.118 0.865 0.726 0.405 6.390 0.840 0.819 0.289 4.526 0.918 0.886 0.481 6.502 0.958 0.906 0.395 6.230 0.855 0.854 0.395 6.230 0.855 0.854
6 0.222 4.238 0.928 0.889 0.381 5.773 0.849 0.706 0.366 6.527 0.800 0.789 0.260 4.714 0.907 0.878 0.416 6.708 0.966 0.913 0.365 6.572 0.816 0.812 0.365 6.572 0.816 0.812
7 0.221 3.916 0.917 0.890 0.426 5.616 0.740 0.691 0.344 5.642 0.766 0.728 0.217 3.422 0.896 0.870 0.435 5.947 0.949 0.875 0.345 5.577 0.776 0.748 0.345 5.577 0.776 0.748
8 0.207 3.466 0.924 0.911 0.324 5.063 0.810 0.691 0.446 7.391 0.680 0.664 0.266 4.799 0.860 0.823 0.487 7.830 0.930 0.849 0.446 7.396 0.697 0.696 0.446 7.396 0.697 0.696
50% Missing
3 1.102 0.799 1.102 0.799 1.104 0.775 1.248 0.681 1.106 0.775 1.163 0.882 1.163 0.882
4 0.924 0.797 0.924 0.797 0.941 0.788 1.095 0.670 0.939 0.773 0.947 0.907 0.947 0.907
5 0.420 6.936 0.946 0.921 0.878 9.262 0.772 0.667 0.652 10.177 0.788 0.768 0.529 7.742 0.868 0.842 0.812 10.324 0.954 0.903 0.642 10.043 0.810 0.801 0.642 10.043 0.810 0.801
6 0.375 6.879 0.924 0.898 0.609 9.333 0.768 0.603 0.608 10.568 0.722 0.704 0.462 8.006 0.837 0.809 0.671 10.619 0.958 0.886 0.607 10.605 0.744 0.730 0.607 10.605 0.744 0.730
7 0.359 5.764 0.917 0.888 0.671 8.276 0.629 0.566 0.545 8.666 0.700 0.673 0.364 5.586 0.839 0.811 0.671 8.780 0.959 0.885 0.546 8.628 0.714 0.700 0.546 8.628 0.714 0.700
8 0.343 5.702 0.925 0.909 0.557 8.325 0.692 0.560 0.685 11.446 0.601 0.579 0.452 7.883 0.776 0.742 0.778 12.047 0.935 0.829 0.696 11.527 0.620 0.610 0.696 11.527 0.620 0.610
70% Missing
3 1.715 0.772 1.715 0.772 1.754 0.746 1.963 0.574 1.741 0.747 1.797 0.865 1.797 0.865
4 1.452 0.782 1.452 0.782 1.491 0.768 1.689 0.587 1.503 0.750 1.549 0.899 1.549 0.899
5 0.644 9.981 0.956 0.938 1.428 12.162 0.664 0.617 0.916 14.049 0.765 0.729 0.830 11.344 0.809 0.800 1.175 13.956 0.949 0.901 0.904 13.905 0.782 0.764 0.904 13.905 0.782 0.764
6 0.575 10.122 0.925 0.907 0.865 13.694 0.657 0.485 0.862 14.388 0.678 0.655 0.714 11.610 0.773 0.747 0.971 14.324 0.958 0.884 0.851 14.396 0.698 0.677 0.851 14.396 0.698 0.677
7 0.529 8.140 0.914 0.896 0.862 10.936 0.538 0.450 0.781 12.022 0.659 0.629 0.588 8.402 0.768 0.741 0.876 11.539 0.958 0.876 0.783 11.995 0.683 0.652 0.783 11.995 0.683 0.652
8 0.565 9.367 0.911 0.895 0.857 12.834 0.556 0.445 0.943 15.907 0.584 0.544 0.720 11.890 0.699 0.651 1.045 15.983 0.935 0.813 0.956 15.865 0.588 0.563 0.956 15.865 0.588 0.563

2.1.3 GC: Status with Growth

Table 2.3: Mean percent bias and confidence interval coverage rates for scale scores (SS) and student growth percentiles (SGPs) with MAR data (using status with growth), grade/content area level
Columns: Grade, then four values for each method in the order L2PAN, L2PAN_LONG, RQ, RF, L2PMM, PMM, Observed: Percent Bias (SS), Percent Bias (SGP), CR (SS), CR (SGP), where CR is the confidence interval coverage rate. Rows for grades 3 and 4 contain scale scores only, so each method contributes two values: Percent Bias (SS) and CR (SS).
30% Missing
3 1.467 0.463 1.467 0.463 1.464 0.443 1.534 0.420 1.439 0.462 1.464 0.549 1.464 0.549
4 1.295 0.457 1.295 0.457 1.301 0.426 1.367 0.399 1.276 0.437 1.292 0.545 1.292 0.545
5 0.345 5.071 0.935 0.899 0.553 6.213 0.864 0.763 0.413 6.312 0.793 0.822 0.366 4.881 0.887 0.887 0.617 6.429 0.968 0.901 0.389 6.262 0.852 0.845 0.389 6.262 0.852 0.845
6 0.301 5.143 0.898 0.868 0.521 6.079 0.751 0.709 0.380 6.683 0.769 0.778 0.310 5.016 0.881 0.877 0.539 6.734 0.978 0.902 0.378 6.737 0.815 0.799 0.378 6.737 0.815 0.799
7 0.289 4.257 0.902 0.887 0.536 5.907 0.643 0.710 0.339 5.284 0.725 0.739 0.277 3.256 0.859 0.892 0.551 5.417 0.977 0.889 0.351 5.206 0.762 0.752 0.351 5.206 0.762 0.752
8 0.270 3.983 0.904 0.898 0.459 5.602 0.662 0.719 0.453 7.520 0.656 0.667 0.330 5.089 0.821 0.822 0.618 7.840 0.951 0.833 0.450 7.477 0.705 0.695 0.450 7.477 0.705 0.695
50% Missing
3 2.549 0.308 2.549 0.308 2.544 0.284 2.629 0.241 2.509 0.309 2.549 0.413 2.549 0.413
4 2.316 0.315 2.316 0.315 2.320 0.276 2.402 0.242 2.281 0.298 2.329 0.416 2.329 0.416
5 0.530 7.459 0.934 0.909 1.062 9.151 0.739 0.681 0.676 9.911 0.727 0.769 0.649 7.560 0.817 0.849 1.047 9.889 0.956 0.882 0.641 9.833 0.806 0.788 0.641 9.833 0.806 0.788
6 0.482 7.994 0.890 0.873 0.813 9.988 0.674 0.617 0.623 10.651 0.692 0.712 0.544 8.280 0.811 0.817 0.810 10.378 0.975 0.889 0.609 10.636 0.748 0.732 0.609 10.636 0.748 0.732
7 0.468 6.852 0.880 0.874 0.837 9.566 0.548 0.572 0.553 8.662 0.665 0.668 0.496 5.633 0.756 0.813 0.819 8.253 0.976 0.886 0.552 8.461 0.698 0.680 0.552 8.461 0.698 0.680
8 0.445 6.624 0.906 0.902 0.754 9.569 0.552 0.564 0.717 12.262 0.583 0.575 0.572 8.655 0.729 0.741 0.940 12.625 0.955 0.812 0.715 12.238 0.620 0.597 0.715 12.238 0.620 0.597
70% Missing
3 3.892 0.236 3.892 0.236 3.892 0.208 3.993 0.148 3.886 0.229 3.905 0.316 3.905 0.316
4 3.626 0.246 3.626 0.246 3.631 0.205 3.722 0.149 3.602 0.232 3.691 0.302 3.691 0.302
5 0.861 10.692 0.929 0.915 2.442 12.843 0.528 0.588 1.040 13.696 0.664 0.730 1.157 10.940 0.718 0.809 1.584 13.426 0.948 0.881 0.990 13.556 0.753 0.754 0.990 13.556 0.753 0.754
6 0.766 11.189 0.889 0.892 1.159 15.106 0.585 0.491 0.918 14.764 0.627 0.659 0.878 11.843 0.716 0.758 1.193 14.352 0.975 0.870 0.865 14.703 0.687 0.678 0.865 14.703 0.687 0.678
7 0.674 9.274 0.889 0.886 1.148 13.562 0.454 0.413 0.784 12.000 0.620 0.627 0.818 8.386 0.657 0.758 1.082 11.227 0.983 0.865 0.790 11.879 0.665 0.646 0.790 11.879 0.665 0.646
8 0.704 10.252 0.893 0.877 1.130 15.104 0.455 0.440 0.969 16.683 0.572 0.543 0.900 12.410 0.636 0.667 1.320 16.337 0.966 0.806 0.970 16.682 0.596 0.544 0.970 16.682 0.596 0.544

2.1.4 School Level

Table 2.4: Mean percent bias and confidence interval coverage rates for scale scores (SS) and student growth percentiles (SGPs) at the school level
Columns: Percent Missing, then four values for each method in the order L2PAN, L2PAN_LONG, RQ, RF, L2PMM, PMM, Observed: Percent Bias (SS), Percent Bias (SGP), CR (SS), CR (SGP), where CR is the confidence interval coverage rate. Rows are grouped by missingness type.
MCAR
30% Missing 0.077 1.963 0.928 0.925 0.101 2.559 0.878 0.840 0.169 4.477 0.766 0.672 0.108 2.403 0.888 0.857 0.164 4.463 0.880 0.794 0.168 4.427 0.808 0.681 0.168 4.427 0.808 0.681
50% Missing 0.144 3.519 0.913 0.908 0.170 4.163 0.829 0.736 0.290 7.438 0.700 0.570 0.203 4.793 0.813 0.742 0.275 7.443 0.846 0.753 0.302 7.379 0.770 0.586 0.302 7.379 0.770 0.586
70% Missing 0.248 5.772 0.886 0.883 0.254 6.273 0.769 0.619 0.435 10.371 0.683 0.524 0.327 7.587 0.716 0.651 0.412 10.505 0.822 0.736 0.477 10.312 0.741 0.538 0.477 10.312 0.741 0.538
MAR (Status with Demographics)
30% Missing 0.291 3.051 0.720 0.852 0.388 4.769 0.597 0.597 0.319 4.899 0.601 0.659 0.362 3.374 0.574 0.789 0.424 5.132 0.669 0.783 0.316 4.878 0.696 0.674 0.316 4.878 0.696 0.674
50% Missing 0.513 5.044 0.635 0.848 0.696 7.427 0.472 0.487 0.563 7.895 0.503 0.579 0.646 5.915 0.429 0.703 0.725 7.998 0.598 0.766 0.567 7.911 0.611 0.601 0.567 7.911 0.611 0.601
70% Missing 0.808 7.426 0.607 0.852 1.098 10.027 0.394 0.426 0.879 10.847 0.434 0.536 1.020 8.633 0.326 0.629 1.101 10.757 0.539 0.756 0.886 10.870 0.534 0.556 0.886 10.870 0.534 0.556
MAR (Status with Growth)
30% Missing 0.669 3.446 0.371 0.843 0.737 4.549 0.286 0.639 0.693 4.879 0.276 0.650 0.760 3.500 0.263 0.794 0.873 4.985 0.353 0.776 0.668 4.892 0.340 0.667 0.668 4.892 0.340 0.667
50% Missing 1.197 5.548 0.295 0.825 1.353 7.421 0.186 0.523 1.227 7.885 0.181 0.567 1.345 5.946 0.163 0.706 1.495 7.793 0.267 0.754 1.188 7.887 0.234 0.585 1.188 7.887 0.234 0.585
70% Missing 1.905 7.853 0.249 0.828 2.308 10.771 0.129 0.410 1.948 10.834 0.132 0.522 2.152 8.667 0.086 0.638 2.312 10.597 0.206 0.734 1.905 10.818 0.159 0.534 1.905 10.818 0.159 0.534

2.2 Summary Figures: Grade/Content Area

2.2.1 Scale Scores

Figure 2.1: Scale score percent bias by imputation method, missingness percentage, and missingness type

Figure 2.2: Scale score coverage rate by imputation method, missingness percentage, and missingness type

Figure 2.3: Scatterplot of scale score percent bias as a function of grade/content area size

Figure 2.4: Scatterplot of scale score percent bias as a function of percentage missing at the grade/content area level

Figure 2.5: Scatterplot of scale score coverage rate as a function of grade/content area size

Figure 2.6: Proportion of times that the imputed SS was found to differ from the true value based on the simplified F1 statistic

Figure 2.7: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of grade/content area size

Figure 2.8: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of percent missing

2.2.2 SGPs

Figure 2.9: SGP percent bias by imputation method, missingness percentage, and missingness type

Figure 2.10: SGP coverage rate by imputation method, missingness percentage, and missingness type

Figure 2.11: Scatterplot of SGP percent bias as a function of grade/content area size

Figure 2.12: Scatterplot of SGP percent bias as a function of percentage missing at the grade/content area level

Figure 2.13: Scatterplot of SGP coverage rate as a function of grade/content area size

Figure 2.14: Proportion of times that the imputed SGP was found to differ from the true value based on the simplified F1 statistic

Figure 2.15: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of grade/content area size

Figure 2.16: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of percent missing

2.3 Summary Figures: School Level

We replicate the figures above using aggregated school-level results.

2.3.1 Scale Scores

Figure 2.17: Scale score percent bias by imputation method, missingness percentage, and missingness type

Figure 2.18: Scale score coverage rate by imputation method, missingness percentage, and missingness type

Figure 2.19: Scatterplot of scale score percent bias as a function of school size

Figure 2.20: Scatterplot of scale score percent bias as a function of percentage missing at the school level

Figure 2.21: Scatterplot of scale score coverage rate as a function of school size

Figure 2.22: Proportion of times that the imputed SS was found to differ from the true value based on the simplified F1 statistic

Figure 2.23: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of school size

Figure 2.24: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of percent missing

2.3.2 SGPs

Figure 2.25: SGP percent bias by imputation method, missingness percentage, and missingness type

Figure 2.26: SGP coverage rate by imputation method, missingness percentage, and missingness type

Figure 2.27: Scatterplot of SGP percent bias as a function of school size

Figure 2.28: Scatterplot of SGP percent bias as a function of percentage missing at the school level

Figure 2.29: Scatterplot of SGP coverage rate as a function of school size

Figure 2.30: Proportion of times that the imputed SGP was found to differ from the true value based on the simplified F1 statistic

Figure 2.31: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of school size

Figure 2.32: Density plot of rejected null hypotheses for the simplified F1 statistic as a function of percent missing

2.4 Basic Regression Models

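The following models are preliminary fixed-effects regression models that regress either raw or absolute bias on (a) school or grade/content area size (N), (b) percentage missing, (c) missingness type, and (d) imputation method. Imputation method enters as a set of indicator variables with the “Observed” condition as the reference level (the i(var=IMP_METHOD,ref="Observed") terms in the tables), so method coefficients are interpreted relative to no imputation. We include both additive and two-way interaction models, all fit using the fixest package (Berge, 2018).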

Note: The \(R^2\) values for the subsequent models are relatively low, especially for the SGP raw-bias models (\(R^2 < 0.01\)). Inferences drawn from these models should therefore be treated with caution.
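
To show what the grade/content area fixed effects and the two reported \(R^2\) rows ("R2" and "Within R2") measure, the Python sketch below fits a single-regressor model with group fixed effects through the within transformation: the outcome and regressor are demeaned inside each group, and the slope is estimated by least squares on the demeaned data. "R2" is computed against the total variation in the outcome, and "Within R2" against the variation that remains after removing group means. This is a simplified stand-in for what fixest does: it handles one regressor only and omits the clustered standard errors. The function name and data are hypothetical.

```python
from collections import defaultdict

def within_fit(groups, x, y):
    """One-regressor OLS with group fixed effects via the within transformation.

    Returns (slope, r2, within_r2): r2 uses total variation in y, while
    within_r2 uses only the variation left after removing group means.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    means = {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}
    xd = [xi - means[g][0] for g, xi in zip(groups, x)]
    yd = [yi - means[g][1] for g, yi in zip(groups, y)]
    slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    # Residuals of the full model (group intercepts + slope) equal the demeaned residuals.
    ssr = sum((b - slope * a) ** 2 for a, b in zip(xd, yd))
    ybar = sum(y) / len(y)
    sst_total = sum((yi - ybar) ** 2 for yi in y)
    sst_within = sum(b * b for b in yd)
    return slope, 1 - ssr / sst_total, 1 - ssr / sst_within

# Usage: two groups with very different baselines.
groups = ["a", "a", "b", "b"]
slope, r2, within_r2 = within_fit(groups, [0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 10.0, 12.0])
```

Most of the variation in this toy outcome lies between the two groups, so R2 is close to 1 while Within R2 is 0.9; the gap between the R2 and Within R2 rows in the tables below reflects the same distinction.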

2.4.1 Grade/Content Area

Table 2.5: Linear fixed-effect regression models for raw bias at the grade/content area level
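Columns (in order): Scale Scores: Additive; Scale Scores: Interaction; SGPs: Additive; SGPs: Interaction. Each entry is a coefficient estimate followed by its standard error in parentheses; rows for interaction terms report values for the two interaction models only.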
N 0.0011 (0.0014) 0.0308** (0.0084) 0.0047* (0.0017) 0.0006 (0.0021)
MISS_PERC50%Missing 1.573*** (0.2930) 1.433*** (0.2609) -0.1290* (0.0481) -0.1505* (0.0562)
MISS_PERC70%Missing 3.645*** (0.6291) 2.857*** (0.5872) -0.2664* (0.0849) -0.2885** (0.0732)
MISS_TYPEDEMOG 3.764*** (0.5795) 7.324*** (0.3485) -0.0401 (0.0638) 0.0451 (0.2141)
MISS_TYPEGROWTH 7.886*** (1.499) 12.38*** (0.6266) 0.0396 (0.0853) 0.2381 (0.3322)
i(var=IMP_METHOD,ref="Observed")L2PAN_LONG -4.267*** (0.8048) 2.452*** (0.4414) 0.0022 (0.2487) 0.0467 (0.1721)
i(var=IMP_METHOD,ref="Observed")L2PAN -4.787*** (0.9246) 2.328*** (0.3280) -0.0994 (0.2040) 0.0329 (0.0785)
i(var=IMP_METHOD,ref="Observed")RQ -4.838*** (0.9402) 2.034*** (0.2855) -0.2937 (0.2580) -0.3277 (0.1804)
i(var=IMP_METHOD,ref="Observed")RF -4.224*** (0.8953) 2.080*** (0.2746) -0.2165 (0.2073) -0.1185 (0.1214)
i(var=IMP_METHOD,ref="Observed")L2PMM -3.889*** (0.6558) 1.738*** (0.2426) -0.4286 (0.2323) -0.4464* (0.1862)
i(var=IMP_METHOD,ref="Observed")PMM -4.934*** (0.9876) 2.128*** (0.3095) -0.4117 (0.2600) -0.3897. (0.1901)
N x MISS_PERC50%Missing -0.0039* (0.0016) 0.0016. (0.0007)
N x MISS_PERC70%Missing -0.0095* (0.0033) 0.0033* (0.0013)
N x MISS_TYPEDEMOG -0.0103* (0.0035) 0.0007 (0.0011)
N x MISS_TYPEGROWTH -0.0277** (0.0081) 5.25e-5 (0.0014)
N x i(IMP_METHOD,ref="Observed")L2PAN_LONG -0.0212*** (0.0038) -0.0003 (0.0024)
N x i(IMP_METHOD,ref="Observed")L2PAN -0.0152* (0.0050) 0.0009 (0.0014)
N x i(IMP_METHOD,ref="Observed")RQ -0.0129* (0.0053) 0.0042 (0.0023)
N x i(IMP_METHOD,ref="Observed")RF -0.0150** (0.0047) 0.0015 (0.0017)
N x i(IMP_METHOD,ref="Observed")L2PMM -0.0107* (0.0038) 0.0046. (0.0021)
N x i(IMP_METHOD,ref="Observed")PMM -0.0129* (0.0055) 0.0044 (0.0025)
MISS_PERC50%Missing x MISS_TYPEDEMOG 1.608*** (0.2596) 0.0029 (0.0367)
MISS_PERC70%Missing x MISS_TYPEDEMOG 3.498*** (0.5972) -0.0876 (0.1074)
MISS_PERC50%Missing x MISS_TYPEGROWTH 3.273*** (0.6279) -0.0061 (0.0446)
MISS_PERC70%Missing x MISS_TYPEGROWTH 7.555*** (1.338) -0.0301 (0.1054)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")L2PAN_LONG -1.337** (0.3180) 0.0062 (0.0785)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")L2PAN_LONG -2.132* (0.6987) 0.0268 (0.1653)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")L2PAN -1.596** (0.3746) -0.0518 (0.0733)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")L2PAN -3.133** (0.8251) -0.1188 (0.1581)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")RQ -1.633** (0.3864) -0.1083 (0.0914)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")RQ -3.141** (0.8506) -0.2158 (0.2051)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")RF -1.399** (0.3562) -0.0979 (0.0726)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")RF -2.581** (0.7695) -0.1919 (0.1605)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")L2PMM -1.325*** (0.2875) -0.1656. (0.0861)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")L2PMM -2.501** (0.6670) -0.2828 (0.1611)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")PMM -1.677** (0.4093) -0.1557 (0.0923)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")PMM -3.236** (0.9079) -0.3005 (0.2025)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")L2PAN_LONG -5.343*** (0.7582) -0.0751 (0.2407)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")L2PAN_LONG -7.942** (1.851) -0.0362 (0.3300)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")L2PAN -5.929*** (0.8663) -0.1344 (0.2245)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")L2PAN -8.249*** (1.858) -0.2694 (0.3099)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")RQ -5.799*** (0.8946) -0.1351 (0.2286)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")RQ -7.976** (1.809) -0.2578 (0.3292)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")RF -5.195*** (0.9000) -0.1042 (0.2076)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")RF -7.333** (1.677) -0.1972 (0.3192)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")L2PMM -4.771*** (0.5485) -0.1231 (0.2142)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")L2PMM -6.562*** (1.306) -0.2610 (0.3138)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")PMM -5.929*** (0.9740) -0.1522 (0.2316)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")PMM -8.270** (1.921) -0.3071 (0.3324)
Fixed-Effects:
GRADE^CONTENT_AREA Yes Yes Yes Yes
S.E.: Clustered by: GRA.^CON. by: GRA.^CON. by: GRA.^CON. by: GRA.^CON.
Observations 97,146 97,146 52,542 52,542
R2 0.34190 0.40344 0.00445 0.00536
Within R2 0.28282 0.34988 0.00349 0.00440

Table 2.6: Linear fixed-effect regression models for absolute bias at the grade/content area level
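Columns (in order): Scale Scores: Additive; Scale Scores: Interaction; SGPs: Additive; SGPs: Interaction. Each entry is a coefficient estimate followed by its standard error in parentheses; rows for interaction terms report values for the two interaction models only.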
N -0.0085*** (0.0014) 0.0179* (0.0065) -0.0146*** (0.0012) -0.0099** (0.0025)
MISS_PERC50%Missing 2.070*** (0.2204) 2.128*** (0.2390) 1.443*** (0.0610) 1.145*** (0.0262)
MISS_PERC70%Missing 4.670*** (0.5063) 4.365*** (0.5163) 3.047*** (0.1319) 2.665*** (0.0891)
MISS_TYPEDEMOG 2.683*** (0.4104) 5.761*** (0.3552) 0.7656*** (0.0493) 1.442*** (0.1116)
MISS_TYPEGROWTH 6.674*** (1.344) 10.80*** (0.6290) 1.046*** (0.0810) 2.505*** (0.1763)
i(var=IMP_METHOD,ref="Observed")L2PAN_LONG -3.407*** (0.5026) 1.810*** (0.3659) 0.9601*** (0.1582) 0.6862*** (0.1235)
i(var=IMP_METHOD,ref="Observed")L2PAN -4.251*** (0.7546) 1.997*** (0.2945) 0.0555 (0.0931) 0.6699*** (0.1036)
i(var=IMP_METHOD,ref="Observed")RQ -3.606*** (0.5816) 2.552*** (0.4868) 1.682*** (0.2676) 2.207*** (0.2058)
i(var=IMP_METHOD,ref="Observed")RF -3.786*** (0.7501) 2.127*** (0.3354) 0.3773. (0.1873) 1.085*** (0.1501)
i(var=IMP_METHOD,ref="Observed")L2PMM -3.274*** (0.4794) 2.260*** (0.4296) 1.679*** (0.2785) 2.225*** (0.2329)
i(var=IMP_METHOD,ref="Observed")PMM -3.569*** (0.6125) 2.624*** (0.4728) 1.679*** (0.2739) 2.195*** (0.2115)
N x MISS_PERC50%Missing -0.0053*** (0.0012) -0.0037*** (0.0005)
N x MISS_PERC70%Missing -0.0123*** (0.0028) -0.0082*** (0.0011)
N x MISS_TYPEDEMOG -0.0042. (0.0023) -0.0016* (0.0005)
N x MISS_TYPEGROWTH -0.0191* (0.0071) -0.0037** (0.0007)
N x i(IMP_METHOD,ref="Observed")L2PAN_LONG -0.0162*** (0.0023) 0.0021 (0.0016)
N x i(IMP_METHOD,ref="Observed")L2PAN -0.0163** (0.0038) 0.0006 (0.0011)
N x i(IMP_METHOD,ref="Observed")RQ -0.0138** (0.0031) 0.0012 (0.0023)
N x i(IMP_METHOD,ref="Observed")RF -0.0174*** (0.0039) -0.0004 (0.0018)
N x i(IMP_METHOD,ref="Observed")L2PMM -0.0111** (0.0028) 0.0024 (0.0025)
N x i(IMP_METHOD,ref="Observed")PMM -0.0144** (0.0033) 0.0012 (0.0025)
MISS_PERC50%Missing x MISS_TYPEDEMOG 1.137*** (0.2015) 0.2287*** (0.0262)
MISS_PERC70%Missing x MISS_TYPEDEMOG 2.451*** (0.4405) 0.3606** (0.0896)
MISS_PERC50%Missing x MISS_TYPEGROWTH 2.696*** (0.6013) 0.3029*** (0.0479)
MISS_PERC70%Missing x MISS_TYPEGROWTH 6.167*** (1.265) 0.5772** (0.1101)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")L2PAN_LONG -1.139*** (0.2159) 0.2995* (0.1037)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")L2PAN_LONG -1.827** (0.5001) 0.5383* (0.2028)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")L2PAN -1.482*** (0.3179) -0.0259 (0.0575)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")L2PAN -2.735** (0.6957) 0.0064 (0.0866)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")RQ -1.223*** (0.2487) 0.6679*** (0.1114)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")RQ -2.316** (0.6035) 1.097** (0.2132)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")RF -1.245** (0.2945) 0.2673* (0.0951)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")RF -2.230** (0.6350) 0.5072* (0.1901)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")L2PMM -1.092*** (0.2144) 0.6307*** (0.1162)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")L2PMM -2.030** (0.5285) 0.9582** (0.2297)
MISS_PERC50%Missing x i(IMP_METHOD,ref="Observed")PMM -1.190*** (0.2681) 0.6675*** (0.1159)
MISS_PERC70%Missing x i(IMP_METHOD,ref="Observed")PMM -2.217** (0.6549) 1.097** (0.2181)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")L2PAN_LONG -4.116*** (0.4859) 0.0917 (0.1576)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")L2PAN_LONG -5.965*** (1.269) -0.5213* (0.2102)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")L2PAN -4.801*** (0.6665) -0.6722*** (0.1123)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")L2PAN -7.112** (1.604) -1.273*** (0.1917)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")RQ -5.068*** (0.7507) -1.298*** (0.1347)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")RQ -7.660** (1.753) -2.276*** (0.1951)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")RF -4.506*** (0.7761) -0.9273*** (0.1157)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")RF -6.971** (1.635) -1.883*** (0.1616)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")L2PMM -4.754*** (0.6372) -1.308*** (0.1173)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")L2PMM -6.948*** (1.475) -2.383*** (0.1520)
MISS_TYPEDEMOG x i(IMP_METHOD,ref="Observed")PMM -5.091*** (0.7480) -1.275*** (0.1372)
MISS_TYPEGROWTH x i(IMP_METHOD,ref="Observed")PMM -7.767*** (1.749) -2.261*** (0.1950)
Fixed-Effects:
GRADE^CONTENT_AREA Yes Yes Yes Yes
S.E.: Clustered by: GRA.^CON. by: GRA.^CON. by: GRA.^CON. by: GRA.^CON.
Observations 97,146 97,146 52,542 52,542
R2 0.33276 0.38934 0.18317 0.19723
Within R2 0.29642 0.35607 0.17733 0.19149

We also re-fit the scale score models using only observations from grades 5 through 8.

Table 2.7: Linear fixed-effect regression models for raw and absolute scale score bias when removing grades 3 and 4
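Columns (in order): Scale Score Raw Bias: Additive; Scale Score Raw Bias: Interaction; Scale Score Absolute Bias: Additive; Scale Score Absolute Bias: Interaction. Each entry is a coefficient estimate followed by its standard error in parentheses; rows for interaction terms report values for the two interaction models only.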
N 0.0019 (0.0014) 0.0067. (0.0029) -0.0080*** (0.0013) 0.0002 (0.0018)
MISS_PERC50%Missing 0.7460*** (0.0896) 2.059*** (0.1201) 1.456*** (0.0690) 2.733*** (0.0768)
MISS_PERC70%Missing 1.854*** (0.2536) 4.262*** (0.2860) 3.254*** (0.1980) 5.647*** (0.2431)
MISS_TYPEDEMOG 2.147*** (0.1595) 8.023*** (0.2453) 1.548*** (0.0822) 6.570*** (0.2140)
MISS_TYPEGROWTH 3.656*** (0.3730) 13.10*** (0.5265) 2.917*** (0.1585) 11.67*** (0.5044)
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG -6.449*** (0.7746) 2.986*** (0.4177) -4.823*** (0.3294) 2.392*** (0.4104)
i(var=IMP_METHOD,ref=“Observed”)L2PAN -7.411*** (0.2719) 2.719*** (0.1721) -6.383*** (0.2014) 2.460*** (0.1342)
i(var=IMP_METHOD,ref=“Observed”)RQ -7.501*** (0.2374) 2.310*** (0.1512) -5.227*** (0.2408) 3.721*** (0.2423)
i(var=IMP_METHOD,ref=“Observed”)RF -6.774*** (0.2405) 2.358*** (0.1315) -5.917*** (0.2000) 2.678*** (0.2268)
i(var=IMP_METHOD,ref=“Observed”)L2PMM -5.747*** (0.2659) 2.008*** (0.1562) -4.595*** (0.2846) 3.298*** (0.2499)
i(var=IMP_METHOD,ref=“Observed”)PMM -7.725*** (0.2192) 2.483*** (0.1346) -5.269*** (0.2473) 3.742*** (0.2406)
N x MISS_PERC50%Missing 0.0002 (0.0008) -0.0023** (0.0005)
N x MISS_PERC70%Missing -0.0009 (0.0020) -0.0055** (0.0014)
N x MISS_TYPEDEMOG -0.0019 (0.0015) 0.0009 (0.0009)
N x MISS_TYPEGROWTH -0.0059. (0.0029) -0.0007 (0.0017)
N x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -0.0117* (0.0043) -0.0101*** (0.0015)
N x i(IMP_METHOD,ref=“Observed”)L2PAN -0.0020 (0.0015) -0.0059** (0.0012)
N x i(IMP_METHOD,ref=“Observed”)RQ 0.0012 (0.0013) -0.0059* (0.0019)
N x i(IMP_METHOD,ref=“Observed”)RF -0.0019 (0.0017) -0.0070** (0.0015)
N x i(IMP_METHOD,ref=“Observed”)L2PMM -0.0009 (0.0015) -0.0044* (0.0018)
N x i(IMP_METHOD,ref=“Observed”)PMM 0.0018 (0.0013) -0.0061* (0.0022)
MISS_PERC50%Missing x MISS_TYPEDEMOG 0.9041*** (0.1102) 0.5824*** (0.0411)
MISS_PERC70%Missing x MISS_TYPEDEMOG 1.845*** (0.1886) 1.221*** (0.1025)
MISS_PERC50%Missing x MISS_TYPEGROWTH 1.494*** (0.1892) 1.015*** (0.0560)
MISS_PERC70%Missing x MISS_TYPEGROWTH 3.759*** (0.4948) 2.593*** (0.2525)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -2.172*** (0.3319) -1.732*** (0.1532)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -3.613* (1.072) -2.998** (0.6656)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN -2.652*** (0.1604) -2.365*** (0.0976)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN -5.464*** (0.3735) -4.677*** (0.2763)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)RQ -2.716*** (0.1338) -1.898*** (0.1300)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)RQ -5.535*** (0.2995) -3.981*** (0.2943)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)RF -2.404*** (0.1338) -2.069*** (0.1049)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)RF -4.757*** (0.3312) -4.017*** (0.2755)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PMM -2.117*** (0.1746) -1.659*** (0.1598)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PMM -4.368*** (0.3852) -3.453*** (0.3944)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)PMM -2.821*** (0.1336) -1.911*** (0.1427)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)PMM -5.795*** (0.3250) -4.024*** (0.3083)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -7.293*** (0.8623) -5.412*** (0.4781)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -12.95*** (1.745) -9.545*** (0.8786)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PAN -8.377*** (0.2375) -6.678*** (0.1862)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PAN -13.52*** (0.5859) -11.66*** (0.4340)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)RQ -8.303*** (0.2112) -7.187*** (0.1673)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)RQ -13.11*** (0.5569) -12.64*** (0.4230)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)RF -7.748*** (0.2094) -6.712*** (0.2066)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)RF -12.11*** (0.4668) -11.63*** (0.4671)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PMM -6.322*** (0.2525) -6.566*** (0.2226)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PMM -10.28*** (0.5286) -11.16*** (0.4990)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)PMM -8.652*** (0.1873) -7.193*** (0.1601)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)PMM -13.71*** (0.5055) -12.72*** (0.3912)
Fixed-Effects: —————— —————— ——————- ——————-
GRADE^CONTENT_AREA Yes Yes Yes Yes
________________________________________ __________________ __________________ ___________________ ___________________
S.E.: Clustered by: GRA.^CON. by: GRA.^CON. by: GRA.^CON. by: GRA.^CON.
Observations 52,542 52,542 52,542 52,542
R2 0.24881 0.37143 0.28053 0.41316
Within R2 0.24476 0.36804 0.27626 0.40967
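The row labels in Table 2.7 suggest fixest-style linear models with treatment-coded imputation-method indicators (reference level “Observed”) and a GRADE^CONTENT_AREA fixed effect, alongside the other covariates shown. As a rough illustration only, the Python sketch below reproduces the point-estimate logic for the method indicators alone. It uses the within (demeaning) transformation on hypothetical toy data after dropping grades 3 and 4. The function name, data layout, and values are assumptions. This is not the fixest implementation, and it omits the clustered standard errors reported in the tables.

```python
import numpy as np


def fe_dummy_regression(y, methods, fe_groups, ref="Observed"):
    """OLS slopes for treatment-coded method indicators, absorbing one fixed effect.

    A sketch only: the fixed effect is removed by within-group demeaning
    (Frisch-Waugh-Lovell), and no standard errors are computed.
    """
    y = np.asarray(y, dtype=float)
    methods = np.asarray(methods)
    fe_groups = np.asarray(fe_groups)
    # One 0/1 indicator per non-reference level; reference rows are all zeros.
    levels = sorted(set(methods.tolist()) - {ref})
    X = np.column_stack([(methods == lvl).astype(float) for lvl in levels])
    # Within transformation: subtract fixed-effect group means from y and X.
    y_w, X_w = y.copy(), X.copy()
    for g in np.unique(fe_groups):
        mask = fe_groups == g
        y_w[mask] -= y[mask].mean()
        X_w[mask] -= X[mask].mean(axis=0)
    coef, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return {lvl: float(b) for lvl, b in zip(levels, coef)}


# Hypothetical toy data: outcome = grade/content area shift + method effect, no noise.
true_effect = {"Observed": 0.0, "L2PAN": -2.0, "RF": -1.5}
group_shift = {(3, "MATH"): 9.0, (5, "MATH"): 4.0, (6, "READING"): 1.0, (8, "READING"): -3.0}
rows = [(g, c, m, shift + eff)
        for (g, c), shift in group_shift.items()
        for m, eff in true_effect.items()
        for _ in range(2)]
grade = np.array([r[0] for r in rows])
fe = np.array([f"{r[0]}^{r[1]}" for r in rows])
method = np.array([r[2] for r in rows])
bias = np.array([r[3] for r in rows])

# As in Table 2.7, drop grades 3 and 4 before fitting.
keep = np.isin(grade, [5, 6, 7, 8])
est = fe_dummy_regression(bias[keep], method[keep], fe[keep])
print(est)  # slopes close to -2.0 (L2PAN) and -1.5 (RF)
```

Demeaning within each GRADE^CONTENT_AREA group gives the same slopes as adding one dummy per group. This is why fixed-effect software can absorb many groups cheaply.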

2.4.2 School Level

In these models, the data are aggregated at the school level.
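The aggregation step can be sketched as follows. This minimal Python example rests on two assumptions that the vignette's exact definitions may not share. First, a school's raw bias is the pooled school mean score (averaged across imputations) minus the true school mean from the complete data. Second, absolute bias is the magnitude of that difference. The record layout (school, true, imputed) and the data are hypothetical. For the “Observed” condition, the pooled mean would instead be the mean of the non-missing scores.

```python
from statistics import mean


def school_bias(records, n_imputations):
    """Raw and absolute bias of pooled school mean scores.

    Each record is a dict with hypothetical keys: "school", "true" (the
    complete-data score), and "imputed" (one value per completed data set;
    observed students simply repeat their true score).
    """
    by_school = {}
    for r in records:
        by_school.setdefault(r["school"], []).append(r)
    out = {}
    for school, rows in by_school.items():
        true_mean = mean(r["true"] for r in rows)
        # School mean within each completed data set, then averaged (Rubin's point estimate).
        per_imp = [mean(r["imputed"][j] for r in rows) for j in range(n_imputations)]
        raw = mean(per_imp) - true_mean
        out[school] = {"N": len(rows), "raw_bias": raw, "abs_bias": abs(raw)}
    return out


records = [
    {"school": "A", "true": 500, "imputed": [500, 500]},  # observed student
    {"school": "A", "true": 520, "imputed": [500, 510]},  # imputed student
    {"school": "B", "true": 480, "imputed": [490, 486]},  # imputed student
]
result = school_bias(records, n_imputations=2)
print(result)
```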

Table 2.8: Linear fixed-effect regression models for raw bias at the school level
Scale Scores: Additive Scale Scores: Interaction SGPs: Additive SGPs: Interaction
(Intercept) 3.313*** (0.1268) -2.369*** (0.2377) -0.1926 (0.1315) 0.1976 (0.2784)
N -0.0023*** (0.0001) 0.0028*** (0.0004) 0.0009*** (0.0001) -0.0007. (0.0004)
MISS_PERC50%Missing 1.563*** (0.0875) 1.642*** (0.2521) -0.1834* (0.0908) -0.2607 (0.2953)
MISS_PERC70%Missing 3.657*** (0.0875) 3.611*** (0.2521) -0.3536*** (0.0908) -0.3676 (0.2953)
MISS_TYPEDEMOG 3.802*** (0.0875) 8.155*** (0.2521) -0.0912 (0.0908) -0.1054 (0.2953)
MISS_TYPEGROWTH 7.908*** (0.0875) 13.73*** (0.2521) 0.0677 (0.0908) 0.3147 (0.2953)
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG -4.773*** (0.1337) 2.662*** (0.3038) -0.0923 (0.1387) -0.2010 (0.3558)
i(var=IMP_METHOD,ref=“Observed”)L2PAN -5.254*** (0.1337) 2.439*** (0.3038) -0.1684 (0.1387) -0.2914 (0.3558)
i(var=IMP_METHOD,ref=“Observed”)RQ -5.332*** (0.1337) 2.030*** (0.3038) -0.5141*** (0.1387) -0.9991** (0.3558)
i(var=IMP_METHOD,ref=“Observed”)RF -4.701*** (0.1337) 2.155*** (0.3038) -0.3176* (0.1387) -0.5147 (0.3558)
i(var=IMP_METHOD,ref=“Observed”)L2PMM -4.367*** (0.1337) 1.723*** (0.3038) -0.5913*** (0.1387) -0.9403** (0.3558)
i(var=IMP_METHOD,ref=“Observed”)PMM -5.423*** (0.1337) 2.128*** (0.3038) -0.6258*** (0.1387) -1.059** (0.3558)
N x MISS_PERC50%Missing -0.0008** (0.0003) 0.0005 (0.0003)
N x MISS_PERC70%Missing -0.0020*** (0.0003) 0.0009** (0.0003)
N x MISS_TYPEDEMOG -0.0023*** (0.0003) 5.18e-5 (0.0003)
N x MISS_TYPEGROWTH -0.0059*** (0.0003) -0.0003 (0.0003)
N x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -0.0030*** (0.0004) 0.0005 (0.0005)
N x i(IMP_METHOD,ref=“Observed”)L2PAN -0.0019*** (0.0004) 0.0009. (0.0005)
N x i(IMP_METHOD,ref=“Observed”)RQ -0.0012** (0.0004) 0.0022*** (0.0005)
N x i(IMP_METHOD,ref=“Observed”)RF -0.0018*** (0.0004) 0.0012* (0.0005)
N x i(IMP_METHOD,ref=“Observed”)L2PMM -0.0009* (0.0004) 0.0019*** (0.0005)
N x i(IMP_METHOD,ref=“Observed”)PMM -0.0013** (0.0004) 0.0022*** (0.0005)
MISS_PERC50%Missing x MISS_TYPEDEMOG 1.605*** (0.1897) 0.0484 (0.2222)
MISS_PERC70%Missing x MISS_TYPEDEMOG 3.477*** (0.1897) -0.1010 (0.2222)
MISS_PERC50%Missing x MISS_TYPEGROWTH 3.267*** (0.1897) 0.1144 (0.2222)
MISS_PERC70%Missing x MISS_TYPEGROWTH 7.545*** (0.1897) 0.0560 (0.2222)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -1.516*** (0.2898) -0.0591 (0.3395)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -2.769*** (0.2898) -0.1091 (0.3395)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN -1.763*** (0.2898) -0.1108 (0.3395)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN -3.737*** (0.2898) -0.2709 (0.3395)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)RQ -1.808*** (0.2898) -0.2136 (0.3395)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)RQ -3.750*** (0.2898) -0.4343 (0.3395)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)RF -1.563*** (0.2898) -0.1560 (0.3395)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)RF -3.177*** (0.2898) -0.3275 (0.3395)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PMM -1.495*** (0.2898) -0.2673 (0.3395)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PMM -3.114*** (0.2898) -0.4810 (0.3395)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)PMM -1.844*** (0.2898) -0.2588 (0.3395)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)PMM -3.831*** (0.2898) -0.5061 (0.3395)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -5.981*** (0.2898) 0.0619 (0.3395)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -8.810*** (0.2898) -0.0522 (0.3395)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PAN -6.540*** (0.2898) 0.0012 (0.3395)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PAN -9.037*** (0.2898) -0.2240 (0.3395)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)RQ -6.413*** (0.2898) -0.0047 (0.3395)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)RQ -8.774*** (0.2898) -0.2285 (0.3395)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)RF -5.804*** (0.2898) 0.0190 (0.3395)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)RF -8.138*** (0.2898) -0.2383 (0.3395)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PMM -5.380*** (0.2898) 0.0168 (0.3395)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PMM -7.366*** (0.2898) -0.2680 (0.3395)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)PMM -6.542*** (0.2898) -0.0015 (0.3395)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)PMM -9.061*** (0.2898) -0.2591 (0.3395)
________________________________________ ___________________ ___________________ ___________________ __________________
S.E. type Standard Standard Standard Standard
Observations 14,616 14,616 14,616 14,616
R2 0.46577 0.58271 0.00730 0.01095
Adj. R2 0.46537 0.58131 0.00655 0.00763

Table 2.9: Linear fixed-effect regression models for absolute bias at the school level
Scale Scores: Additive Scale Scores: Interaction SGPs: Additive SGPs: Interaction
(Intercept) 3.950*** (0.1186) -1.758*** (0.2225) 1.170*** (0.0872) 0.5429** (0.1833)
N -0.0024*** (0.0001) 0.0018*** (0.0003) -0.0025*** (8.66e-5) -0.0007* (0.0003)
MISS_PERC50%Missing 1.781*** (0.0819) 2.096*** (0.2359) 1.181*** (0.0602) 0.8555*** (0.1944)
MISS_PERC70%Missing 4.109*** (0.0819) 4.440*** (0.2359) 2.463*** (0.0602) 2.056*** (0.1944)
MISS_TYPEDEMOG 2.914*** (0.0819) 7.558*** (0.2359) 0.6693*** (0.0602) 1.334*** (0.1944)
MISS_TYPEGROWTH 6.925*** (0.0819) 12.99*** (0.2359) 0.7760*** (0.0602) 1.830*** (0.1944)
i(var=IMP_METHOD,ref=“Observed”)L2PAN_LONG -4.405*** (0.1251) 2.158*** (0.2843) 1.175*** (0.0920) 0.7827*** (0.2342)
i(var=IMP_METHOD,ref=“Observed”)L2PAN -5.100*** (0.1251) 2.513*** (0.2843) 0.4012*** (0.0920) 0.9131*** (0.2342)
i(var=IMP_METHOD,ref=“Observed”)RQ -4.729*** (0.1251) 2.843*** (0.2843) 1.790*** (0.0920) 2.246*** (0.2342)
i(var=IMP_METHOD,ref=“Observed”)RF -4.508*** (0.1251) 2.571*** (0.2843) 0.7818*** (0.0920) 1.276*** (0.2342)
i(var=IMP_METHOD,ref=“Observed”)L2PMM -4.035*** (0.1251) 2.430*** (0.2843) 1.801*** (0.0920) 2.222*** (0.2342)
i(var=IMP_METHOD,ref=“Observed”)PMM -4.756*** (0.1251) 2.942*** (0.2843) 1.799*** (0.0920) 2.233*** (0.2342)
N x MISS_PERC50%Missing -0.0009*** (0.0003) -0.0008*** (0.0002)
N x MISS_PERC70%Missing -0.0021*** (0.0003) -0.0018*** (0.0002)
N x MISS_TYPEDEMOG -0.0013*** (0.0003) -0.0007** (0.0002)
N x MISS_TYPEGROWTH -0.0043*** (0.0003) -0.0007*** (0.0002)
N x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -0.0017*** (0.0004) -8.56e-5 (0.0003)
N x i(IMP_METHOD,ref=“Observed”)L2PAN -0.0019*** (0.0004) -0.0007* (0.0003)
N x i(IMP_METHOD,ref=“Observed”)RQ -0.0012** (0.0004) -0.0007* (0.0003)
N x i(IMP_METHOD,ref=“Observed”)RF -0.0020*** (0.0004) -0.0008* (0.0003)
N x i(IMP_METHOD,ref=“Observed”)L2PMM -0.0009* (0.0004) -0.0005 (0.0003)
N x i(IMP_METHOD,ref=“Observed”)PMM -0.0013*** (0.0004) -0.0007* (0.0003)
MISS_PERC50%Missing x MISS_TYPEDEMOG 1.178*** (0.1776) 0.1787 (0.1463)
MISS_PERC70%Missing x MISS_TYPEDEMOG 2.622*** (0.1776) 0.2929* (0.1463)
MISS_PERC50%Missing x MISS_TYPEGROWTH 2.790*** (0.1776) 0.2173 (0.1463)
MISS_PERC70%Missing x MISS_TYPEGROWTH 6.525*** (0.1776) 0.4116** (0.1463)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -1.445*** (0.2713) 0.4563* (0.2235)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -2.661*** (0.2713) 0.7504*** (0.2235)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN -1.739*** (0.2713) 0.2199 (0.2235)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PAN -3.513*** (0.2713) 0.3497 (0.2235)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)RQ -1.604*** (0.2713) 0.7751*** (0.2235)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)RQ -3.250*** (0.2713) 1.235*** (0.2235)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)RF -1.495*** (0.2713) 0.5149* (0.2235)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)RF -2.935*** (0.2713) 0.8630*** (0.2235)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)L2PMM -1.377*** (0.2713) 0.7340** (0.2235)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)L2PMM -2.777*** (0.2713) 1.149*** (0.2235)
MISS_PERC50%Missing x i(IMP_METHOD,ref=“Observed”)PMM -1.592*** (0.2713) 0.7823*** (0.2235)
MISS_PERC70%Missing x i(IMP_METHOD,ref=“Observed”)PMM -3.192*** (0.2713) 1.243*** (0.2235)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -5.621*** (0.2713) 0.2390 (0.2235)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PAN_LONG -8.104*** (0.2713) -0.1777 (0.2235)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PAN -6.477*** (0.2713) -0.5544* (0.2235)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PAN -9.054*** (0.2713) -0.8419*** (0.2235)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)RQ -6.948*** (0.2713) -1.022*** (0.2235)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)RQ -9.620*** (0.2713) -1.586*** (0.2235)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)RF -6.072*** (0.2713) -0.7431*** (0.2235)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)RF -8.538*** (0.2713) -1.251*** (0.2235)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)L2PMM -6.065*** (0.2713) -1.022*** (0.2235)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)L2PMM -8.193*** (0.2713) -1.637*** (0.2235)
MISS_TYPEDEMOG x i(IMP_METHOD,ref=“Observed”)PMM -7.022*** (0.2713) -0.9921*** (0.2235)
MISS_TYPEGROWTH x i(IMP_METHOD,ref=“Observed”)PMM -9.883*** (0.2713) -1.553*** (0.2235)
________________________________________ ___________________ ___________________ ____________________ ___________________
S.E. type Standard Standard Standard Standard
Observations 14,616 14,616 14,616 14,616
R2 0.46193 0.57924 0.19303 0.20854
Adj. R2 0.46153 0.57782 0.19242 0.20588
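The interaction models in Tables 2.8 and 2.9 include two-way interactions with the imputation method. As a result, a method's effect relative to “Observed” is the sum of several rows. The sketch below adds up the point estimates for L2PAN from the “Scale Scores: Interaction” column of Table 2.8. It assumes that 30% missingness and MCAR are the omitted reference levels and that \(N\) enters uncentered. It ignores sampling uncertainty, and the school size of 100 is an arbitrary illustrative value.

```python
# Point estimates from the "Scale Scores: Interaction" column of Table 2.8.
COEF = {
    "L2PAN": 2.439,            # i(IMP_METHOD, ref = "Observed")L2PAN
    "N x L2PAN": -0.0019,
    "70% x L2PAN": -3.737,     # MISS_PERC70%Missing x L2PAN
    "GROWTH x L2PAN": -9.037,  # MISS_TYPEGROWTH x L2PAN
}


def l2pan_vs_observed(n, miss70=False, growth=False):
    """Predicted school-level raw SS bias for L2PAN minus that for Observed.

    Assumes 30% missing and MCAR are the reference levels and N is uncentered.
    Terms without the method indicator cancel in this difference.
    """
    diff = COEF["L2PAN"] + COEF["N x L2PAN"] * n
    if miss70:
        diff += COEF["70% x L2PAN"]
    if growth:
        diff += COEF["GROWTH x L2PAN"]
    return diff


print(round(l2pan_vs_observed(100), 3))                            # 2.249
print(round(l2pan_vs_observed(100, miss70=True, growth=True), 3))  # -10.525
```

Under these assumptions, the predicted raw bias for L2PAN is about 2.2 scale score units above the Observed condition for a 100-student school in the reference conditions. With 70% of data MAR based on status and growth, it is about 10.5 units below.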

2.5 Key Take-Aways

Before comparing the MI methods, a few general trends merit comment. First, percent bias tends to increase as the percentage of missing data increases. This increase in percent bias (and the corresponding decrease in the simplified CI coverage rate) appears steeper for data MAR based on status and growth than for the other missingness types. Second, each of the three dependent variables is related to unit size. For example, percent bias and CI coverage rates vary more for smaller \(N\) values (at either the grade/content area or school level). Similarly, observations with smaller \(N\) values more often have a simplified \(F_1\) statistic indicating a significant difference between the imputed and true values.

The summary tables in Section 2.1 also highlight numerous conditions with relatively low CI coverage rates. Specifically, the school-level summary shows many coverage rates below 0.50 when data are MAR, particularly when missingness is based on status and growth. The grade/content area summaries suggest that these low scale score coverage rates are largely driven by grades 3 and 4 when data are MAR based on status and growth; these observations also tend to have higher scale score percent bias than those in higher grades. By contrast, for SGP coverage rates, many imputation methods (e.g., RF, L2PMM) showed a negative relationship between coverage rate and grade level, particularly at higher missingness percentages.
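Coverage rates like these can be computed by pooling each replication with Rubin's rules and counting how often the resulting interval contains the true value. The Python sketch below shows one conventional version, which is not necessarily the vignette's “simplified” coverage rate. It assumes each replication supplies m point estimates and their within-imputation variances. It also uses a normal critical value in place of the t reference distribution. All function names and numbers are hypothetical.

```python
from statistics import NormalDist, mean, variance


def pooled_ci(estimates, within_vars, level=0.95):
    """Rubin's rules interval for one scalar quantity (normal approximation)."""
    m = len(estimates)
    q_bar = mean(estimates)                    # pooled point estimate
    u_bar = mean(within_vars)                  # average within-imputation variance
    b = variance(estimates)                    # between-imputation variance
    total = u_bar + (1 + 1 / m) * b            # total variance T
    z = NormalDist().inv_cdf(0.5 + level / 2)  # simplification: z instead of t
    half = z * total ** 0.5
    return q_bar - half, q_bar + half


def coverage_rate(replications, true_value, level=0.95):
    """Share of replications whose pooled interval contains the true value."""
    hits = 0
    for estimates, within_vars in replications:
        lower, upper = pooled_ci(estimates, within_vars, level)
        hits += lower <= true_value <= upper
    return hits / len(replications)


# Two hypothetical replications, each with m = 3 imputations of a school mean.
replications = [
    ([499.0, 501.0, 500.0], [4.0, 4.0, 4.0]),  # interval covers 500
    ([510.0, 512.0, 511.0], [1.0, 1.0, 1.0]),  # interval misses 500
]
print(coverage_rate(replications, true_value=500.0))  # 0.5
```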

Next, we compare results among the examined MI methods. Unlike in the “without impact” simulation, differences in MI efficacy among the imputation methods (holding other factors constant) are less clear. Still, cross-sectional L2PAN tends to outperform the other methods. For example, compared to the other MI methods, L2PAN often shows

  • At the grade/content area level: higher scale score coverage rates, higher SGP coverage rates (particularly for data MAR and higher missingness percentages), and fewer significant \(F_1\) statistics for SGPs.
  • At the school level: higher scale score and SGP coverage rates, fewer significant scale score \(F_1\) statistics when data are MCAR, and fewer significant SGP \(F_1\) statistics.

Still, in many conditions there is no clear “winner” among the MI methods. For example, in certain cases, L2PAN and L2PMM have similar proportions of significant scale score \(F_1\) statistics at the grade/content area level. Moreover, RF and L2PMM often appear to be viable MI options, sometimes showing results similar to those of L2PAN. However, under some conditions RF and L2PMM showed higher proportions of significant \(F_1\) statistics or lower coverage rates than L2PAN.

In the next two sections, we further examine the MI simulation results with either (a) cross-sectional L2PAN or (b) L2PMM. We include L2PMM here because in many cases, this method seemed to perform similarly to L2PAN.

3 Evaluating Cross-Sectional L2PAN

3.1 Scale Scores

3.1.1 Descriptive Statistics: Grade/Content Area

Figure 3.1: Average SS percent bias by grade/content area quantile, grade, and missingness characteristics

Figure 3.2: Average SS coverage rate by grade/content area quantile, grade, and missingness characteristics

Figure 3.3: Proportion of cases where a significant difference between the imputed and true SS value was found using the F1 statistic

3.1.2 Descriptive Statistics: School Level

Figure 3.4: Average SS percent bias by school size quantile and missingness characteristics

Figure 3.5: Average SS coverage rate by school size quantile and missingness characteristics

Figure 3.6: Proportion of cases where a significant difference between the imputed and true SS value was found using the F1 statistic

3.2 Student Growth Percentiles

3.2.1 Descriptive Statistics: Grade/Content Area

Figure 3.7: Average SGP percent bias by grade/content area quantile, grade, and missingness characteristics

Figure 3.8: Average SGP coverage rate by grade/content area quantile, grade, and missingness characteristics

Figure 3.9: Proportion of cases where a significant difference between the imputed and true SGP value was found using the F1 statistic

3.2.2 Descriptive Statistics: School Level

Figure 3.10: Average SGP percent bias by school size quantile and missingness characteristics

Figure 3.11: Average SGP coverage rate by school size quantile and missingness characteristics

Figure 3.12: Proportion of cases where a significant difference between the imputed and true SGP value was found using the F1 statistic

3.3 Key Take-Aways

Beginning with the imputed scale scores, we find (a) higher percent bias, (b) lower coverage rates, and (c) higher proportions of significant \(F_1\) statistics for lower grades, particularly when data are MAR based on status and growth. Performance is similarly worse at the school level under high missingness with data MAR based on status and growth; these school-level results are likely driven by the scale score imputations for grades 3 and 4. Note, however, that scale score percent bias never exceeded the “problematic” threshold of 5%.
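For reference, van Buuren (2018) defines percent bias as \(PB = 100 \times |(\bar{Q} - Q)/Q|\), where \(\bar{Q}\) is the average pooled estimate and \(Q\) is the true value, and uses 5% as an upper limit for acceptable performance. The sketch below assumes this definition, which may not match the vignette's exact computation, and uses hypothetical values.

```python
def percent_bias(pooled_estimates, true_value):
    """PB = 100 * |(mean pooled estimate - true value) / true value|."""
    q_bar = sum(pooled_estimates) / len(pooled_estimates)
    return 100 * abs((q_bar - true_value) / true_value)


def is_problematic(pb, threshold=5.0):
    """Flag percent bias above the 5% benchmark discussed in the text."""
    return pb > threshold


# Hypothetical pooled estimates across two replications.
ss_pb = percent_bias([505.0, 507.0], true_value=500.0)  # scale score mean
sgp_pb = percent_bias([56.0, 58.0], true_value=50.0)    # SGP summary
print(round(ss_pb, 2), is_problematic(ss_pb))    # 1.2 False
print(round(sgp_pb, 2), is_problematic(sgp_pb))  # 14.0 True
```

The true value sits in the denominator. As a result, a given absolute error yields a much smaller percent bias for scale scores (values in the hundreds) than for SGPs (values between 1 and 99). This is worth keeping in mind when comparing the two outcomes.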

Furthermore, for grades 3 and 4 with data MAR, there is evidence of a negative relationship between the \(N\) quantile and scale score coverage rates, and a positive relationship between the \(N\) quantile and the proportion of significant \(F_1\) statistics. In other words, when data are MAR, observations in grades 3 and 4 tend to have lower scale score coverage rates and more significant \(F_1\) statistics when the grade/content area size falls in a higher quantile. These trends differ from those for grades 5 and higher, where there is largely no clear relationship between the \(N\) quantile and the given dependent variable.
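The \(F_1\) results flag cases where the imputed and true values differ significantly. The vignette's simplified statistic may be built differently, so the following rests on an assumption about how such a test can be constructed. The Python sketch computes Rubin's scalar \(D_1\) Wald statistic for \(H_0: Q = Q_0\), with \(Q_0\) set to the true value. It approximates the \(F(1, \nu)\) reference distribution by a chi-square with one degree of freedom, which is reasonable only for large \(\nu\). Names and data are hypothetical.

```python
from statistics import NormalDist, mean, variance


def d1_test(estimates, within_vars, true_value):
    """Scalar D1 Wald test of H0: Q = true_value after Rubin's-rules pooling.

    For one parameter, D1 = (q_bar - q0)^2 / T. Assumption: large reference
    degrees of freedom, so F(1, nu) is replaced by a chi-square(1), i.e. a
    two-sided z test on sqrt(D1).
    """
    m = len(estimates)
    q_bar = mean(estimates)
    total = mean(within_vars) + (1 + 1 / m) * variance(estimates)
    d1 = (q_bar - true_value) ** 2 / total
    p_value = 2 * (1 - NormalDist().cdf(d1 ** 0.5))
    return d1, p_value


# Hypothetical replications (m = 3) for a quantity whose true value is 500.
d1_a, p_a = d1_test([499.0, 501.0, 500.0], [4.0, 4.0, 4.0], 500.0)
d1_b, p_b = d1_test([510.0, 512.0, 511.0], [1.0, 1.0, 1.0], 500.0)
print(d1_a, p_a)   # 0.0 1.0
print(p_b < 0.05)  # True: flagged as a significant difference
```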

We next summarize the results for the imputed SGPs with cross-sectional L2PAN. Here, we find evidence of higher percent bias for lower grade/content area size quantiles, particularly among higher grades. For example, the average SGP percent bias reaches around 22% for grade 8 observations in the first \(N\) quantile with 70% of data missing at random based on status and growth. Looking at SGP coverage rates, we don’t find clear trends as a function of grade/content area quantile or grade level. However, we do see that SGP coverage rates are often lower when data are MAR, and the \(F_1\) statistic is more often significant when data are MAR based on status and growth.

Aggregating at the school level, SGP percent bias tends to increase as the missingness percentage increases and when data are MAR. Moreover, missingness type appears more strongly related to SGP coverage rates than missingness percentage, with slightly lower SGP coverage rates among data MAR based on status and growth. Still, SGP coverage rates do not fall below 0.80; recall that scale score coverage rates were as low as 0.10, likely because of the imputation difficulties with grades 3 and 4. Finally, at the school level, school size quantile shows a noticeable relationship only with SGP percent bias, which decreases as the \(N\) quantile increases.
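Size quantiles like those in these figures can be formed by cutting the distribution of \(N\) at sample quantiles. The sketch below assumes quartiles computed with Python's statistics.quantiles using its default "exclusive" method. The number of bins and the cut-point method used in the vignette are not stated here. The school sizes and percent bias values are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def size_quartile(sizes):
    """Label each unit 1-4 by N quartile (1 = smallest), using sample quartile cut points."""
    cuts = quantiles(sizes, n=4)  # three cut points, default "exclusive" method
    return [bisect_right(cuts, s) + 1 for s in sizes]


def mean_by_quartile(sizes, values):
    """Average an outcome (e.g., percent bias) within each N quartile."""
    groups = {}
    for q, v in zip(size_quartile(sizes), values):
        groups.setdefault(q, []).append(v)
    return {q: sum(vs) / len(vs) for q, vs in sorted(groups.items())}


# Hypothetical school sizes and SGP percent bias values.
sizes = [10, 20, 30, 40, 50, 60, 70, 80]
pct_bias = [9.0, 7.0, 6.0, 5.0, 4.0, 4.0, 3.0, 1.0]
print(size_quartile(sizes))               # [1, 1, 2, 2, 3, 3, 4, 4]
print(mean_by_quartile(sizes, pct_bias))  # {1: 8.0, 2: 5.5, 3: 4.0, 4: 2.0}
```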

4 Evaluating L2PMM

4.1 Scale Scores

4.1.1 Descriptive Statistics: Grade/Content Area

Figure 4.1: Average SS percent bias by grade/content area quantile, grade, and missingness characteristics

Figure 4.2: Average SS coverage rate by grade/content area quantile, grade, and missingness characteristics

Figure 4.3: Proportion of cases where a significant difference between the imputed and true SS value was found using the F1 statistic

4.1.2 Descriptive Statistics: School Level

Figure 4.4: Average SS percent bias by school size quantile and missingness characteristics

Figure 4.5: Average SS coverage rate by school size quantile and missingness characteristics

Figure 4.6: Proportion of cases where a significant difference between the imputed and true SS value was found using the F1 statistic

4.2 Student Growth Percentiles

4.2.1 Descriptive Statistics: Grade/Content Area

Figure 4.7: Average SGP percent bias by grade/content area quantile, grade, and missingness characteristics

Figure 4.8: Average SGP coverage rate by grade/content area quantile, grade, and missingness characteristics

Figure 4.9: Proportion of cases where a significant difference between the imputed and true SGP value was found using the F1 statistic

4.2.2 Descriptive Statistics: School Level

Figure 4.10: Average SGP percent bias by school size quantile and missingness characteristics

Figure 4.11: Average SGP coverage rate by school size quantile and missingness characteristics

Figure 4.12: Proportion of cases where a significant difference between the imputed and true SGP value was found using the F1 statistic

4.3 Key Take-Aways

Many of the trends from the L2PAN results replicated when looking at L2PMM. Starting with the scale scores, there is evidence of higher percent bias for grades 3 and 4 when data are MAR based on status and growth, as well as a general increase in scale score percent bias as the missingness percentage increases. Moreover, we again see substantially lower scale score coverage rates for the lower grades under MAR based on status and growth; the coverage rates decrease for these observations as the grade/content area size quantile increases. When looking at the \(F_1\) statistics for scale scores, a higher proportion are statistically significant for grades 3 and 4, particularly among higher grade/content area size quantiles.

Turning to the SGPs, we again see higher percent bias for higher grades. SGP coverage rates decrease slightly as the missingness percentage increases, but this relationship is small to negligible. Furthermore, when data are MAR, grade 8 observations often had the highest proportion of significant \(F_1\) statistics for SGPs, particularly in smaller grade/content area size quantiles. At the school level, we find evidence of a negative relationship between SGP percent bias and school size quantile, as well as evidence that L2PMM performs worse at higher missingness percentages (holding other factors constant). School-level SGP coverage rates were often lowest in the fourth school size quantile.

5 Summary

The current study examined the efficacy of multiple imputation for creating “adjusted” scale scores and SGPs in data with a simulated COVID-19 impact. In brief, we find evidence that MI may be a viable approach for handling missing data when

  • Cross-sectional L2PAN is used with the mice R package
  • Less than 50% of the data are missing (although note that some researchers posit upper thresholds of 40% for using MI; Jakobsen et al., 2017)
  • Data are not MAR based on status and growth (the missingness pattern under which MI performed worst in this simulation)
  • Grade/content area or school sizes are relatively large

A clear trend throughout these results was that MI struggled to generate accurate scale scores and SGPs when imputing MAR data for grades 3 and 4, particularly when data were missing based on status and growth. In other instances, such as when imputing SGPs, there was higher percent bias for higher grades. These variations indicate that researchers and policymakers should examine MI performance at the grade level when evaluating the method’s accuracy.

It is important to highlight certain limitations of the present simulation study. Specifically, we cannot appropriately generalize our findings beyond the conditions examined in the simulation design. For example, it remains to be seen how L2PAN, L2PMM, and other MI methods perform when data are missing at random based on other characteristics, or are missing not at random (MNAR). If data are MNAR, Jakobsen and colleagues (2017) recommend that analyses be conducted using only the observed cases with an accompanying discussion of the missingness magnitudes (see Figure 1 in Jakobsen et al., 2017).

Although these results shed light on certain conditions wherein MI performs relatively well (in terms of percent bias, simplified CI coverage rate, and the simplified \(F_1\) statistic), it is difficult to clearly pinpoint generalizable thresholds for determining whether MI can be applied to a given data set. Rather, we recommend that descriptive analyses accompany any report on academic status and growth comparisons. These descriptives can include missingness patterns within larger participation analyses, as well as diagnostic checks after imputation to ensure that MI worked relatively well with the data.
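As one example of such descriptives, the sketch below tabulates the share of missing scores within each grade/content area group before imputation. The record keys and group labels are hypothetical, and missing scores are assumed to be stored as None.

```python
from collections import Counter


def missingness_by_group(records, group_key="group", score_key="score"):
    """Share of missing scores (stored as None) within each group."""
    total, missing = Counter(), Counter()
    for r in records:
        g = r[group_key]
        total[g] += 1
        missing[g] += r[score_key] is None
    return {g: missing[g] / total[g] for g in sorted(total)}


# Hypothetical records keyed by grade/content area.
records = [
    {"group": "3_MATHEMATICS", "score": 512},
    {"group": "3_MATHEMATICS", "score": None},
    {"group": "3_MATHEMATICS", "score": 498},
    {"group": "3_MATHEMATICS", "score": 530},
    {"group": "8_READING", "score": None},
    {"group": "8_READING", "score": 601},
]
print(missingness_by_group(records))  # {'3_MATHEMATICS': 0.25, '8_READING': 0.5}
```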

6 References

  • Berge, L. (2018). Efficient estimation of maximum likelihood models with multiple fixed-effects: the R package FENmlm. CREA Discussion Papers.
  • Betebenner, D. W., Van Iwaarden, A. R., & Domingue, B. (2021). SGPdata: Exemplar data sets for student growth percentile (SGP) analyses. R package version 25.1-0.0. https://centerforassessment.github.io/SGPdata/
  • Demirtas, H. (2004). Simulation driven inferences for multiply imputed longitudinal datasets. Statistica Neerlandica, 58(4), 466-482. https://doi.org/10.1111/j.1467-9574.2004.00271.x
  • Jakobsen, J. C., Gluud, C., Wetterslev, J., & Winkel, P. (2017). When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts. BMC Medical Research Methodology, 17(1), 162. https://doi.org/10.1186/s12874-017-0442-1
  • Miri, H. H., Hassanzadeh, J., Khaniki, S. H., Akrami, R., & Sirjani, E. (2020). Accuracy of five multiple imputation methods in estimating prevalence of Type 2 diabetes based on STEPS surveys. Journal of Epidemiology and Global Health, 10(1), 36-41. https://doi.org/10.2991/jegh.k.191207.001
  • Qi, L., Wang, Y.-F., & He, Y. (2010). A comparison of multiple imputation and fully augmented weighted estimators for Cox regression with missing covariates. Statistics in Medicine, 29(25), 2592-2604. https://doi.org/10.1002/sim.4016
  • van Buuren, S. (2018). Flexible imputation of missing data. CRC Press. https://stefvanbuuren.name/fimd/
  • van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/v45/i03/
  • Vink, G., & van Buuren, S. (2014). Pooling multiple imputations when the sample happens to be the population. arXiv Pre-Print 1409.8542.
  • Zhao, J. H., & Schafer, J. L. (2018). pan: Multiple imputation for multivariate panel or clustered data. R package version 1.6.