Function produces multiple imputations for missing data via methods available for the "mice" package.
imputeScaleScore(
    impute.data,
    additional.data = NULL,
    include.additional.missing = TRUE,
    compact.results = TRUE,
    return.current.year.only = FALSE,
    diagnostics.dir = getwd(),
    growth.config = NULL,
    status.config = NULL,
    default.vars = c("CONTENT_AREA", "GRADE", "SCALE_SCORE", "ACHIEVEMENT_LEVEL"),
    demographics = NULL,
    institutions = NULL,
    impute.factors = "SCALE_SCORE",
    impute.long = FALSE,
    impute.method = NULL,
    cluster.institution = FALSE, # TRUE for multilevel, cross-sectional methods
    partial.fill = TRUE,
    parallel.config = NULL, # define cores, packages, cluster.type
    seed = 4224L,
    M = 10,
    maxit = 5,
    verbose = FALSE,
    ...)

impute.data | The incomplete dataset in which to impute student scale score values. |
---|---|
additional.data | By default, the function returns only the data required for an SGP analysis (based on growth.config). This argument allows additional data to be included in what is returned (e.g., 2019 grades not used as priors). |
include.additional.missing | TRUE. The function will create scale scores for students present in the most recent prior year without rows in the current year. If FALSE, assumes that rows with |
compact.results | By default (TRUE), the function returns a single data.table object with a column added for each requested imputation (M). If FALSE, the function returns a list of longitudinal datasets with the current (imputed) and prior (unchanged) student records. This is helpful for diagnostics and ease of use, but also produces more redundant prior data than needed. |
return.current.year.only | FALSE. Return all records from the long data used to create the imputed scale scores. If TRUE, only records from the current year are returned. |
diagnostics.dir | Directory path for diagnostic plots to be placed. Default is the working directory. Additional subdirectories are created internally as needed. |
growth.config | An elongated SGP config script with an entry for each grade/content_area/year cohort that will be analyzed in subsequent simulations. |
status.config | An elongated SGP config script with an entry for each grade/content_area/year cohort that will be analyzed in subsequent simulations. Unlike a growth.config entry, status.config entries use data from the same grade, but from a prior year (i.e. not individual variables). For example you might impute missing 2021 3rd grade ELA scores from 2019 3rd grade school mean scale score, FRL status, etc. |
default.vars | Variables that will be used in extracting cohort records (subject and grade) and other variables from the data that the user may want returned in the imputed data. |
demographics | Demographic (factor/character) variables to use and/or return in the final dataset. |
institutions | Institution IDs that will be used and/or returned in the imputed data. |
impute.factors | Intersection of default.vars, demographics and institutions that will be used in the imputation calculations. An institution included will be used to construct institution level means of other student level impute.factors. For example, with c("SCHOOL_NUMBER", "SCALE_SCORE", "FREE_REDUCED_LUNCH_STATUS"), student level scores and demographics will be used along with their associated school level mean scores and proportion FRL (4 total factors). The default ('SCALE_SCORE') means that only prior scale scores are considered. |
impute.long | Defaults to FALSE, meaning scores are imputed in a "wide", or cross-sectional format. Some imputation methods (e.g., "2l.pan") can be used to impute the data in a longitudinal form (clustered by student over time/grade), in which case this argument should be set to TRUE. |
impute.method | The name of the method from the mice package or other add-on package functions used to impute the scale scores. The default is NULL, which translates to the default method in the mice package - "pmm" for predictive mean matching. |
cluster.institution | Defaults to FALSE, meaning institution IDs will not be used to identify clustering/levels in a multilevel fashion. Typically only used for multilevel methods (e.g., "2l.pan" or "2l.pmm"), but can also be used in other methods to include institution ID as a factor variable. |
partial.fill | Should an attempt be made to fill in some of the demographic and institution ID information based on students' previous values? As part of its process, the imputeScaleScore function takes the long data (impute.data), widens it, and then re-lengthens it, which creates holes for students with incomplete longitudinal records. This part of the function fills in these holes for students with existing missing data. |
parallel.config | The default is NULL, meaning imputations will be calculated sequentially in a call to mice::mice. Alternatively, a list with named elements "cores", "packages" and/or "cluster.type" can be provided. The "cores" element should be a single numeric value for the number of CPU cores/hyperthreads to use. The "packages" element is a character string of package name(s) for the requested imputation methods. The "cluster.type" element specifies the parallel backend to use (typically "FORK" for Linux/Unix/Mac and "PSOCK" for Windows). An example list might look like: "list(packages = c('mice', 'miceadds'), cores=10)". |
seed | A random seed set for the imputation process to allow for replication of results, or for alternative results using the same code. |
M | The number of imputed datasets to return. The default is 10. |
maxit | The number of iterations allowed for each imputation process. See the 'mice' package documentation for details. |
verbose | Defaults to FALSE, meaning progress information from the mice package is not printed out to the console. |
... | Additional arguments for the mice::mice function and any particular imputation method/function. See each function/package documentation for details. |
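
As a rough illustration of the impute.factors description above, the following base R sketch builds the four factors named in that example (student scale score, student FRL status, school mean scale score and school proportion FRL) from toy data. The data values and the derived column names (SCHOOL_MEAN_SCALE_SCORE, SCHOOL_PROP_FRL) are hypothetical and chosen only for this illustration; the names and calculations used internally by imputeScaleScore may differ.

```r
# Toy student-level data; all values are invented for illustration only.
students <- data.frame(
  SCHOOL_NUMBER = c("A", "A", "A", "B", "B"),
  SCALE_SCORE = c(400, 420, 440, 500, 520),
  FREE_REDUCED_LUNCH_STATUS = c("Yes", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# School-level mean of the student-level scale scores
# (hypothetical column name).
students$SCHOOL_MEAN_SCALE_SCORE <- ave(
  students$SCALE_SCORE, students$SCHOOL_NUMBER, FUN = mean
)

# School-level proportion of students with FRL status "Yes"
# (hypothetical column name).
students$SCHOOL_PROP_FRL <- ave(
  as.numeric(students$FREE_REDUCED_LUNCH_STATUS == "Yes"),
  students$SCHOOL_NUMBER, FUN = mean
)

# Two student-level factors plus two school-level aggregates: 4 in total.
impute_factors <- students[, c(
  "SCALE_SCORE", "FREE_REDUCED_LUNCH_STATUS",
  "SCHOOL_MEAN_SCALE_SCORE", "SCHOOL_PROP_FRL"
)]
print(impute_factors)
```

The institution ID itself is used only for grouping here; it does not appear as a factor unless it is requested separately (for example through cluster.institution).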
The function returns either a single imputed data set (with an additional column for each requested imputation) or a list of M imputed datasets from the specified imputation method. The datasets will exclude data for students not used in any of the specified growth.config or status.config cohorts, unless the additional.data argument has been included.
The function returns either a single data set containing additional columns of imputed records (default) or a list of imputed longitudinal data sets.
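
The two return shapes have a loose analogue in the mice package itself. The sketch below calls mice::mice directly on the small nhanes dataset that ships with mice, using the documented defaults for seed and maxit, and then extracts the imputations both as a list and as one wide data set. It is not the internal code of imputeScaleScore: the way the function forwards M, maxit, seed and verbose to mice (for instance verbose to printFlag) is an assumption here, and m = 3 is used only to keep the toy run short.

```r
# Requires the 'mice' package; the block is skipped if it is not installed.
if (requireNamespace("mice", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mice))

  toy <- mice::nhanes  # small example dataset shipped with mice

  # m: number of imputations; maxit: iterations per imputation;
  # "pmm": predictive mean matching; printFlag = FALSE silences progress.
  imp <- mice::mice(toy, m = 3, maxit = 5, method = "pmm",
                    seed = 4224, printFlag = FALSE)

  # A list with one completed data set per imputation.
  as_list <- mice::complete(imp, action = "all")

  # A single wide data set with the completed data sets side by side.
  as_wide <- mice::complete(imp, action = "broad")

  print(length(as_list))
  print(dim(as_wide))
}
```

The list form is convenient for fitting the same model to each imputed data set, while the wide form avoids repeating the unchanged rows.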
not_run({
    data_to_impute <- SGPdata::sgpData_LONG_COVID

    ### Read in STEP 0 SGP configuration scripts
    source("SGP_CONFIG/STEP_0/Ampute_2021/Growth.R")
    source("SGP_CONFIG/STEP_0/Ampute_2021/Status.R")

    ### NOTE: the imputeScaleScore function requires the "mice" package!
    Test_Data_LONG <- imputeScaleScore(
        impute.data = data_to_impute,
        growth.config = growth_config_2021,
        status.config = status_config_2021,
        M = 1)
})
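
A parallel.config list for a run like the one above might be assembled as in the following sketch. The element names (packages, cores, cluster.type) and the backend guidance come from the parallel.config description; leaving one core free and the variable names are choices made only for this illustration. The final call is shown as a comment because it depends on the data and configuration objects from the example.

```r
# Pick a backend following the parallel.config description:
# "FORK" on Linux/Unix/Mac, "PSOCK" on Windows.
cluster_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

# Leave one core free (an illustrative choice); detectCores() can
# return NA on some systems, so fall back to a single core.
n_cores <- parallel::detectCores()
n_cores <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

parallel_config <- list(
  packages = c("mice", "miceadds"),
  cores = n_cores,
  cluster.type = cluster_type
)
str(parallel_config)

# The list would then be passed together with the example objects:
# Test_Data_LONG <- imputeScaleScore(
#     impute.data = data_to_impute,
#     growth.config = growth_config_2021,
#     status.config = status_config_2021,
#     parallel.config = parallel_config,
#     M = 10)
```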