Abstract


Flexible Report Generation

Demonstration Session, NCME 2022 Annual Meeting


Submitted to:

Derek C. Briggs, Ph.D.
The National Council on Measurement in Education
April 23, 2022

Author(s):

Adam R. VanIwaarden and
Damian W. Betebenner


Project Team: Team SGP
For More Information:

Project Code: Github


Acknowledgements:

The authors want to thank our partners and colleagues at the Center for Assessment for their contributions to the creation of this report.

Suggested Citation:

The National Center for the Improvement of Educational Assessment (2022). Flexible Report Generation. Submitted to The National Council on Measurement in Education, San Diego, CA.

Why Be Flexible?

Data analysts are often tasked with writing reports that describe data, analyses, and results associated with a project. Depending upon the nature of the project, such reports are either completely customized or borrow heavily from other reports (e.g., annual reports). GitHub repositories and associated GitHub actions can be used to coordinate the writing as well as the production of final reports for dissemination. The process we demonstrate utilizes R, R Markdown, and several associated R packages as the base tools to construct these reports.

NCME_2022_Project Purpose

The concept of “flexibility” in report generation can be applied in numerous ways. It may mean generating reports in multiple formats (e.g., websites and PDF documents), setting up workflows that can be used in various settings or for any number of clients, or combining analytics and documentation into a seamless process. In building this GitHub repository for our demonstration, we have tried to condense the lessons we have learned on being flexible in our processes of data analytics and documentation. We outline five steps that help us get from raw data to reporting results in this demonstration.

First, a well-structured working environment should be set up, including a uniform directory structure with any required external assets and resources, as well as easily generalized R command and function scripts. Second, the data required to generate the report must be compiled in a consistent data-object structure. Third, any time-consuming and/or state-specific external analyses should be conducted and thoroughly reviewed, and also compiled in a consistent data-object structure. Fourth, all the required meta-data and report content information is compiled from generic (Universal) and custom sources. Lastly, the desired report formats are generated using the appropriate R functions.

This repository is a template for the workflow that we have used over the past year in our efforts with multiple states to begin investigating the academic impact of the COVID-19 pandemic. In these efforts we attempted to create generalizable data formats, standardized methods for analysis and universal content for reporting. This repo serves as an “all-in-one” representation of how we structured our efforts across the various states. Combining the multiple stages and components of these projects into a single repository is helpful in that users do not need to navigate multiple repositories as we have. However, it does mean that this repo is quite complex in and of itself - sorry :wink:

There are README files in many of the sub-component directories of this repo that give more detailed information about their contents. The main components located in this top directory are:

All_States

This component contains the “state” specific data analysis and report generation content. Although this project has been framed as a workflow across multiple states, this could be envisioned in other ways where many projects resemble each other, but separation is required: school level analysis and reporting, different branches of a simulation study, annual analysis/reporting within a single organization, technical reports, etc.

Each “state” has its own subdirectory that houses “Initial_Data_Analysis”, “Report_Analyses”, “Data” and “Documentation” directories. These represent the various stages of our analysis and reporting efforts, and are typically located in different areas of our work environments (see the layout sketch after the list below).

  • Initial_Data_Analysis represents the framework used for typical annual data analysis used to clean, prepare and calculate Student Growth Percentiles (SGPs) for the states we work with. This typically includes:
    • confidential student data (housed securely)
    • R scripts (shared openly on Github)
  • Report_Analyses contains the R scripts used for each state after the initial calculation of SGPs. That is, additional analyses that were carried out in our efforts to investigate academic impact. Typically this is included in the “Documentation” directory/repo, but placed here for emphasis/differentiation.
  • Data is where we keep specifically formatted student data and results from the impact related analyses. Again, housed outside of any Github repo because it contains confidential data.
  • Documentation includes all the R Markdown based code, content and assets for generating reports. This is typically a separate Github repo (with final reports provided to clients, not included in the repo). This is the heart of the “Flexible Report Generation” portion of the NCME Demonstration session.
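For orientation, a single state branch is laid out roughly as follows (the annotations simply summarize the list above):

# State_A/                   (one branch of All_States; the same layout is used for each "state")
#   Initial_Data_Analysis/   # annual data cleaning, preparation and SGP calculation
#   Report_Analyses/         # post-SGP analyses investigating academic impact
#   Data/                    # formatted student data and results (confidential, kept out of the repo)
#   Documentation/           # R Markdown code, content and assets for report generation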

The data, R code and reports are obviously not typically stored together like this, for confidentiality and other reasons. This demonstration uses the simulated student data (sgpData_LONG_COVID) from the SGPdata package.

Universal_Content

The ability to generate reports flexibly and automatically with data and analytic results from multiple sources requires the identification of what content is universal and what must be customized to meet specific circumstances and situations. By “universal”, we mean elements that can (and often should) be used in all cases, with updates or improvements applied consistently. In our experience, every report begins as a fully custom report. As the process is repeated over and over, the pieces that are common to all become apparent and are moved to an external source where they can be shared and accessed.

This is true of R code as well, as spaghetti code morphs into custom functions, and then into formal functions and packages. Whether talking about text or code, the use of universal “parameters” also applies - small bits of information that define how the results are rendered, which are universal in their requirement and application, but whose specification usually depends on the context or use case.

The contents of this directory represent what has been distilled into universal components.

  • Functions - As we create R code to do specific tasks and then re-use that code in other areas, we find it necessary to formalize the code into a function. Here we have an example of that: rather than repeating the same chunk(s) of code to add simulated academic impact into each state’s branch of the repo, we create a function (addImpact) that can be applied universally to the data (a usage sketch follows this list).
    • Since the function is sourced in from this directory, any updates, improvements or bug fixes made to it are applied to each analysis (when it is re-run).
    • The function could be added into a package after it is tested and proven in a project like this.
  • rmarkdown - This is where we store the common bits and blobs of text and other code snippets (to produce tables, plots, etc.), as well as other assets like css and JavaScript code, HTML templates, etc. used to format reports.
    • The Rmd files are used as “children”, which allows the task of creating the report to become one of assembling the parts (children) into the desired combination and order in a master (parent) document.
    • The css, JavaScript code, etc. in the “templates” subdirectory are structured in a way that is typically used (required?) in some packages commonly used to render R Markdown into documents (e.g., rmarkdown, bookdown, pagedown or our Literasee package).
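A minimal usage sketch of sourcing and applying such a universal function (the file name and call signature for addImpact are assumed here and may differ in the repo):

# Source the shared function once from Universal_Content, then apply it in any
# state's analysis scripts instead of copying the code into each branch:
# source(file.path("..", "..", "..", "Universal_Content", "Functions", "addImpact.R"))
# State_A_Data <- addImpact(State_A_Data)  # hypothetical arguments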

Custom_Content

The other side of “Universal_Content” is “Custom_Content” - often we need specific tools for special cases. The “Custom_Content” directory here serves more as a placeholder or template for these components. They really belong in the “State_*” directories (usually put in the “Documentation/assets/rmd” subdirectory). However, the custom content for one project can often serve as a good template for others (similar to, but not quite, universal components). We include this element here for emphasis and differentiation.

Five steps to report generation

Generation of multiple format reports (e.g., a bookdown website and a pagedown PDF document) can generally be conducted in five steps. These steps are laid out generically in the .R scripts included in this repo.

First, a consistent working environment should be set up, including a directory structure with all the required external assets and libraries, as well as the R command and function scripts included in this repo. Second, the data required to generate the report must be compiled in a consistent data-object structure. Third, any time-consuming and/or state-specific external analyses should be conducted and thoroughly reviewed, and also compiled in a consistent data-object structure. Fourth, list-objects that contain all the required meta-data and content indices are compiled from generic (Universal) and custom sources to create dual-format configuration scripts. Lastly, the desired formats are generated using the appropriate R functions.

Included files

  • The 1_Repo_Setup_and_Maintenance.R file contains R code from which required R packages can be installed or updated, and other assets can be copied into the report directory.
    • A script, Universal_Content/Meta_Data/Report_Packages.R, is available to both document and help install/update any R packages required for analyses to be run and the report(s) generated.
    • The “Documentation” directory for each branch (here “State_A” and “State_B”) can be set up using the setupReportDirectory function. This function pulls in assets from the Literasee package.
      • These include css, javascript, pandoc and Rmarkdown (.Rmd) assets the Literasee package needs to create a “NCIEA” themed report and website.
      • Templates for custom child.Rmd files can also be added to the directory (available from the Universal_Content repo/submodule). Alternatively, template custom content can be copied over from another state/project directory.
    • Changes/updates/upgrades to these assets in the Literasee package can be pulled into the “Documentation” directory using the updateAssets() function.
  • The 2_Report_Data.R script runs all formatting, cleaning and subsetting required to compile all data sources into a single dataset that will be used at report run-time (rendering).
    • Create/format/alter/augment one or more raw data sets including State_Assessment, College_Entrance, ELP_Assessment and (potentially multiple) Interim_Assessment data objects.
    • The compiled data must be a named list object called Report_Data, saved in a “Documentation/Data/” directory (NOT included in the Github repo!).
  • The script 3_Report_Analyses.R is meant to house report-specific analyses that can be re/run before the report is compiled and that may take an inordinate amount of time to run or require extensive evaluation and investigation before inclusion in the report.
    • These analyses may be universal enough to run for all states, or may be unique enough for each state that the analysis is customized for each state.
    • Ideally each of the data sources (State_Assessment, College_Entrance, ELP_Assessment and Interim_Assessment) will have the same or similar analysis types (e.g., participation, multiple_imputation, academic_impact, etc.).
    • Some analyses may be short and included directly within child RMD scripts, while others may be placed externally in the “Report_Analyses” directory. The compiled analysis results are stored in a named list object called Report_Analyses, which is saved in a “Documentation/Data” directory (NOT included in the GitHub repo!).
  • Generation of the multiple format reports (e.g., a bookdown website and a pagedown PDF document) typically depends on different types of configuration scripts that list the child documents to knit together: _bookdown.yml and index.Rmd for bookdown, and a parent .Rmd file (e.g., STATE_X_Academic_Impact_Analysis.Rmd) for pagedown.
    • The child documents can be generic (i.e. Universal_Content) or customized/novel content. The 4_Make_Configs.R script creates custom configuration lists that identify 1) state specific meta-data and report parameters and 2) a list of any custom Rmarkdown content to be used in place of, or in addition to, the universal report content.
    • These configuration and content lists are then combined with the Universal_Content configuration and content lists by source(...)’ing the R scripts Universal_Content/Meta_Data/Report_Configs.R and Universal_Content/Meta_Data/Report_Content.R.
    • The scripts are set up to give priority to the custom content, so that generic elements can be easily overridden. The combined custom and universal information list objects are then used by functions in the Literasee package to create the YAML and RMD files that control the report generation output.
  • The 5_Make_Report.R file contains R code to render the report using the pagedown templates (and workable scripts to create a bookdown website).

Reporting Step 1: Create (and maintain) report directory

Install and update R packages required to run analyses and create report. This step may also require us to copy any assets, custom content templates, and other R scripts contained in the “Universal_Content” directory.

#   Set R working directory to the Documentation folder
# setwd("./Documentation")

# Locate the "Universal_Content" directory
universal.content.path <- file.path("")

# Install/update packages used in the report
source("../../../Universal_Content/Meta_Data/Report_Packages.R")

###   Load packages required for report setup
require(Literasee)

template.path <- file.path("../../../Custom_Content/assets/Child_RMD")
setupReportDirectory(custom.content.path = template.path)

# It may be necessary to occasionally update Literasee package assets.
# updateAssets(asset.type=c("css", "js", "pandoc", "rmd", "images"))

# copy additional assets from the Universal_Content directory
R.utils::copyDirectory(to = "assets/fonts",
         from = "../../../Universal_Content/rmarkdown/assets/fonts")

# setwd("..")

Reporting Step 2: Report_Data

In the data preparation and cleaning step of the academic impact analysis, we create a Report_Data object with data (sub)sets from various available sources (including the statewide, English language proficiency, college entrance and/or interim assessments). We format, alter, and augment the available/desired data and store it in a single named list object (Report_Data) that can be passed to 1) standardized and customized analyses and 2) the report generating code/process. In some cases this object can also be used to create the report params.

This step assumes the user is operating with their working directory set to “NCME_2022_Project/All_States/State_A/Documentation”.

# setwd("./Documentation")

Load packages and custom functions.

The SGP package is required for the data analysis with simulated data.

require(SGP)
require(data.table)
require(cfaTools)

State_Assessment

In this data simulation and reporting exercise we are using State_A_SGP directly from the “Initial_Data_Analysis” step. Typically we would need to load data from external sources at this stage (e.g., ‘State_A_SGP.Rdata’, which would have been saved/output in the “Initial_Data_Analysis/State_A_Baseline_SGP_Analyses.R” script).

Here we simply copy the data from the SGP object and name it State_A_Data.

State_A_Data <- copy(State_A_SGP@Data)

Data loading is typically followed by an initial subsetting to select only the report-relevant information (years, content areas and grades to be reported, variables used in report analyses, student/organization/other exclusion criteria, etc.). This process has already been carried out in the “Initial_Data_Analysis/State_A_Data_LONG.R” script.
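For reference, such an initial subset might look something like the following (the filtering values are purely illustrative; this step has already been done for the demonstration data):

# Hypothetical report-relevant subset - the criteria shown are examples only:
# State_A_Data <- State_A_Data[
#   VALID_CASE == "VALID_CASE" &
#   YEAR %in% c("2016", "2017", "2018", "2019", "2021") &
#   CONTENT_AREA %in% c("ELA", "MATHEMATICS") &
#   GRADE %in% as.character(3:8)]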

Create lagged variables

Lagged scale and standardized score variables (and their associated achievement levels) are required for our academic impact analyses. Here we first create a standardized scale score variable that uses 2019 means and standard deviations to create a 2019-referenced standardization. This prevents us from “washing out” impact in 2021, which would happen if we standardized within each year. We also standardize the scores by content area and grade level.
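Conceptually, the Z() call below (from the cfaTools package loaded above) does something like the following within each content area and grade (illustration only; the function itself handles the grouping and reference-year lookup):

# A 2019-referenced z-score, sketched for a single CONTENT_AREA/GRADE group:
# mean.2019 <- mean(scale_score[year == "2019"], na.rm = TRUE)
# sd.2019   <- sd(scale_score[year == "2019"], na.rm = TRUE)
# z         <- (scale_score - mean.2019) / sd.2019  # applied to all years, including 2021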

The SGP analyses already include some version of these lagged and/or standardized variables; however, these are only included for students for whom growth was calculated. This means that students with missing prior scores (and potentially others, such as students with repeated/accelerated grade progressions) would not have these data in some cases. The following code chunk creates these variables.

##    Standardize SCALE_SCORE by CONTENT_AREA and GRADE using 2019 norms
State_A_Data[, SCALE_SCORE_STANDARDIZED :=
                      Z(.SD, "SCALE_SCORE", reference.year = "2019"),
                  by = list(CONTENT_AREA, GRADE),
                  .SDcols = c("YEAR", "CONTENT_AREA", "GRADE", "SCALE_SCORE")]

#   Need to run this again here to get SCALE_SCORE_PRIOR_*YEAR
#   Seems to get deleted in abcSGP - names are too close! Something to look into.

# getShiftedValues DOES NOT add in 2020 (YEAR completely missing from data)
# We need to add in this information with a small data.table - `missing_data`
missing_data <- data.table(YEAR = "2020",
                           GRADE = c(3:8, 3:8),
                           CONTENT_AREA = c(rep("ELA", 6),
                                            rep("MATHEMATICS", 6)))
State_A_Data <- rbindlist(list(State_A_Data, missing_data), fill=TRUE)

shift.key <- c("ID", "CONTENT_AREA", "YEAR", "GRADE", "VALID_CASE")
setkeyv(State_A_Data, shift.key)

getShiftedValues(State_A_Data,
                 shift_amount = c(1L, 2L),
                 shift_variable = c("SCALE_SCORE",
                                    "SCALE_SCORE_STANDARDIZED",
                                    "ACHIEVEMENT_LEVEL"))
# Check initial agreement between `SGP` and `shift` generated lagged variables:
# table(State_A_Data[YEAR==current.year,
#            ACHIEVEMENT_LEVEL_PRIOR, ACHIEVEMENT_LEVEL_LAG_2], exclude=NULL)

#  Clean up - remove 2020 dummy data and rename according to old conventions
State_A_Data <- State_A_Data[YEAR != '2020']
setnames(State_A_Data, gsub("LAG_1", "PRIOR_1YEAR", names(State_A_Data)))
setnames(State_A_Data, gsub("LAG_2", "PRIOR_2YEAR", names(State_A_Data)))

##    Fix 2021 Lags for Grades 3 & 4 - repeaters:
# table(State_A_Data[, YEAR, is.na(SCALE_SCORE_PRIOR_2YEAR)], exclude=NULL)
# table(State_A_Data[, GRADE, is.na(SCALE_SCORE_PRIOR_2YEAR)], exclude=NULL)
State_A_Data[GRADE %in% c(3, 4) | YEAR %in% c("2016", "2017"),
                    SCALE_SCORE_PRIOR_2YEAR := NA]
State_A_Data[GRADE %in% c(3, 4) | YEAR %in% c("2016", "2017"),
                    SCALE_SCORE_STANDARDIZED_PRIOR_2YEAR := NA]
State_A_Data[GRADE %in% c(3, 4) | YEAR %in% c("2016", "2017"),
                    ACHIEVEMENT_LEVEL_PRIOR_2YEAR := NA]
State_A_Data[GRADE == 3 | YEAR == "2016",
                    SCALE_SCORE_PRIOR_1YEAR := NA]
State_A_Data[GRADE == 3 | YEAR == "2016",
                    SCALE_SCORE_STANDARDIZED_PRIOR_1YEAR := NA]
State_A_Data[GRADE == 3 | YEAR == "2016",
                    ACHIEVEMENT_LEVEL_PRIOR_1YEAR := NA]

Create other variables

Through the course of analyzing academic impact, we may find there are additional variables that need to be created often (i.e. used in multiple analyses or in both an analysis and the report generation). In these cases it is best to create those variables here so that the process is standardized across all use cases.

For example, we may need a variable that converts the ACHIEVEMENT_LEVEL and/or ACHIEVEMENT_LEVEL_PRIOR variable into a dichotomous “Proficient/Not Proficient” variable. The following example creates those variables such that they are consistent not only in “State_A” data for use across analyses/reports, but also consistent across states so that variables with the same naming conventions could be used in the same R code or custom functions.

State_A_Data[, ACHIEVEMENT_ProfandAbove := fcase(
  ACHIEVEMENT_LEVEL %in% c("Partially Proficient", "Unsatisfactory"), "Not Proficient",
  ACHIEVEMENT_LEVEL %in% c("Advanced", "Proficient"), "Proficient")]

State_A_Data[YEAR == "2021", ACHIEVEMENT_LEVEL_PRIOR := ACHIEVEMENT_LEVEL_PRIOR_2YEAR]
State_A_Data[YEAR != "2021", ACHIEVEMENT_LEVEL_PRIOR := ACHIEVEMENT_LEVEL_PRIOR_1YEAR]
State_A_Data[, PRIOR_ACHIEVEMENT_ProfandAbove := fcase(
  ACHIEVEMENT_LEVEL_PRIOR %in% c("Partially Proficient", "Unsatisfactory"), "Not Proficient",
  ACHIEVEMENT_LEVEL_PRIOR %in% c("Advanced", "Proficient"), "Proficient")]

Remove redundant/replaced variables

In this data preparation and formatting step, we have created some variables that are now redundant or only needed in the variable creation process. We now do one last data cleaning step to make sure we are only saving the variables that we will need/want.

State_A_Data[, c("SCALE_SCORE_PRIOR",
                      "SCALE_SCORE_PRIOR_STANDARDIZED",
                      "ACHIEVEMENT_LEVEL_PRIOR",
                      "SGP_NORM_GROUP_BASELINE",
                      "SGP_NORM_GROUP_BASELINE_SCALE_SCORES") := NULL]

In this exercise, we are only using a simulated data source that mimics a statewide assessment (i.e. one given annually for accountability purposes). The process outlined above could also be applied to ELP, college entrance, interim or any other data source a state has. If the report analysis and reporting code is set up to accommodate these multiple data sources, then identical reporting can be done for each of these sources as well.

Combine all data and save

Finally we will combine all the data sources that will be used in the report analyses and reporting. Note that, regardless of the state/consortium/organization from which the data comes, we name the data object Report_Data. This allows us to flexibly and automatically apply R code and functions to this data source in a standardized way. In this way, Report_Data is a required parameter for this exercise.

Report_Data <- vector("list", 4);
names(Report_Data) <- c("State_Assessment", "College_Entrance", "ELP_Assessment", "Interim_Assessment")

Report_Data[["State_Assessment"]] <- State_A_Data; rm(State_A_Data)

if (!dir.exists(file.path("..", "Data"))) dir.create(file.path("..", "Data"))
save(Report_Data, file = file.path("..", "Data", "Report_Data.Rdata"))

# setwd("..")

Reporting Step 3: Report_Analyses

In this step we run any external analyses using data from any/all elements of the Report_Data object for the academic impact report and house the results in a separate object named Report_Analyses. This object will eventually be passed to the rendered report. As with Report_Data, this can include State_Assessment, College_Entrance, ELP_Assessment and (potentially multiple) Interim_Assessment branches with multiple analysis result types common to them all. For example, all assessments may have a participation slot, but only one may have a multiple_imputation analysis.

This step assumes the user is operating with their working directory set to “NCME_2022_Project/All_States/State_A/Documentation”.

# setwd("./Documentation")
##   Locate the "Universal_Content" directory
universal.content.path <- file.path("..", "..", "..", "Universal_Content")

#  Load packages used in the Report Analyses (and install/update as necessary).
#  Here we assume updates have been made in Step 1, and that each report analysis
#  script has been run, but sourcing the "Report_Packages.R" may be necessary.

# source(file.path(universal.content.path, "Meta_Data", "Report_Packages.R"))

Load Report_Data and Report_Analyses objects

Running analyses for the report assumes that formatted data is available from the 2_Report_Data.R script. All custom functions and/or R scripts use Report_Data as the data source. Although this script may be sourced in its entirety, it is also possible to load the object and only (re)run certain sections. For that reason, we first load any existing Report_Analyses objects.

if (!exists("Report_Data")) load("../Data/Report_Data.Rdata")

##  Load an existing `Report_Analyses` object or set up list
if (file.exists("../Data/Report_Analyses.Rdata")) {
  load("../Data/Report_Analyses.Rdata")
} else { #  or create a new one and run all analyses
  Report_Analyses <- list()
}

State assessment analyses

For the statewide assessment in this demonstration report for State A, we will conduct two primary analyses: missing data and academic impact. We will also create a table, included in all state reports, showing the percent proficient trend over several years. Because State A was not set up to include missing data, no multiple imputation analysis is done here, and therefore a similar table showing the estimated impact of the missing data on percent proficient is not produced.

###   Declare an assessment flavor
assessment <- "State_Assessment"

###   Missing data visualizations
source("../Report_Analyses/State_A_Missing_Data_2021.R")

###   Academic impact visualizations
source("../Report_Analyses/State_A_Academic_Impact_Visualization.R")

###   Percent Proficient
pct_proficient <- function(achievement_level) {
  tmp.table <- table(achievement_level)
  100*sum(tmp.table["Proficient"])/sum(tmp.table)
}

tmp <- Report_Data[[assessment]][
          YEAR %in% c("2018", "2019", "2021") &
          CONTENT_AREA %in% c("ELA", "MATHEMATICS"),
            .(PERCENT_PROFICIENT= round(pct_proficient(ACHIEVEMENT_ProfandAbove), 1)),
          keyby=c("YEAR", "CONTENT_AREA", "GRADE")]

Report_Analyses[["Summary_Tables"]][[assessment]][[
                 "Achievement"]][["Overall_PctProf"]] <-
                     dcast(tmp, CONTENT_AREA + GRADE ~ YEAR,
                           sep=".", drop=FALSE, value.var="PERCENT_PROFICIENT")

##  Imputation Difference Summaries
# source(file.path(universal.content.path, "Functions",
#                  "Percent_Prof_Imputations_Overall_ContentArea_by_Grade.R"))
# Report_Analyses[["Summary_Tables"]][[assessment]][[
#                  "Imputation"]][["CONTENT_AREA__GRADE"]] <-
#                      State_Assessment_Summaries

Combine all data analyses into Report_Analyses and Save

##  This logic is required for running this script while generating the report:
if (exists("params")) {
  if (params$appendix.c)
    save(Report_Analyses, file = "../../../../Data/Report_Analyses.Rdata")
} else save(Report_Analyses, file = "../Data/Report_Analyses.Rdata")

# rm(params)
# setwd("..")

The Universality of Meta-Data

Before turning to Step 4, it may be useful to first look at the Universal Content with which this step interacts. In the “Universal_Content/Meta_Data” directory there are two files that contain a running list of the elements and resources that are available and/or required for the report generation in Step 5. The purpose of Step 4 is to combine the generic meta-data with the custom meta-data to form a complete picture of what the report will be.

Included files

  • The Report_Configs.R file contains R code that stores general report information in an appropriately formatted (named) list object. This object is given the name report.config and will be available for manipulation upon using source(...) to load it into the working R environment.
    • Information such as the report title/subtitle, author names/affiliations and parameters (data) that are required to create report content (tables/plots/conditional text) are stored in this file.
    • This universal list can be combined with another list of customized meta-data (named custom.config) in order to add client specific meta-data into the report.config list (state name, people to acknowledge, etc.), or to change/update any of the universal meta-data that is stored in the report.config list.
  • The Report_Content.R source code provides child RMD file information also stored as a list object. This meta-data includes the names of child files that all reports will likely contain (at a minimum). This object is given the name rmd.files and will be available for manipulation upon using source(...) to load it into the working R environment.
    • File name and order information is stored as the file name only (not a file path). The file paths are constructed based on the universal and custom child document paths provided in the report.config object (report.config$params$unvrsl.rmd.path and report.config$params$custom.rmd.path); a sketch of this path resolution follows the list.
    • Customized .Rmd documents based on “universal” child documents should be named identically if they are meant to replace the universal version entirely. If they are meant to supplement or are entirely novel, they should be named uniquely and their position in the report noted in the final rmd.files$file.order element of the list.
    • Other information about the file order/content is also included here. For example, if file order should be different for the pagedown output (so as to optimize page breaks or table placement), this can be specified here. Or if the documents include citations that should be included in a “References” section.
    • Supplemental appendices are included in a separate “branch” of the list than the main report.
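One way the path resolution and custom-over-universal priority described above can be implemented (a minimal sketch under assumed names; the actual logic lives in the sourced Meta_Data scripts and Literasee functions and may differ):

# Hypothetical helper: prefer a custom child document when one exists,
# otherwise fall back to the universal version with the same file name.
resolve_child <- function(fname, custom.rmd.path, unvrsl.rmd.path) {
  custom.file <- file.path(custom.rmd.path, fname)
  if (file.exists(custom.file)) custom.file else file.path(unvrsl.rmd.path, fname)
}
# rmd.paths <- sapply(rmd.files$file.order, resolve_child,
#                     custom.rmd.path = report.config$params$custom.rmd.path,
#                     unvrsl.rmd.path = report.config$params$unvrsl.rmd.path)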

These two lists are subsequently used by functions in the Literasee package to create a “NCIEA” themed report and website. See the “NCME_2022_Project/Documentation” subdirectory for example scripts showing how to source(...) these two documents to create the master lists for the “Demonstration_COVID” toy data analysis and render the report/website.

Reporting Step 4: Parameters and configurations

In this step we set up report configuration and content lists. This means we specify any necessary meta-data and parameters required to run the report and create/customize/complete the required YAML and RMD files that produce the report(s).

This step assumes the user is operating with their working directory set to “NCME_2022_Project/All_States/State_A/Documentation”.

# setwd("./Documentation")
##  Load `Literasee` package
require(Literasee)
##  Locate the "Universal_Content" directory
universal.content.path <- file.path("..", "..", "..", "Universal_Content")

Load existing Report_Data and Report_Analyses objects from steps 2 and 3.

if (!exists("Report_Data")) load("../Data/Report_Data.Rdata")
if (!exists("Report_Analyses")) load("../Data/Report_Analyses.Rdata")

Report configurations - data and meta-data.

A custom.config list is created to supply project specific information. It can also be used to override some of the Universal settings (authors, etc.). We create a “master” config list in the “NCME_2022_Project/Universal_Content/Meta_Data” directory, and can merge the two together (prioritizing the elements in the custom list).
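The merge itself happens inside the sourced Report_Configs.R script (Step 4 below); conceptually it amounts to something like the following sketch, not the script’s actual code:

# Custom elements override universal ones; anything not customized falls
# through to the universal defaults:
# report.config <- utils::modifyList(report.config, custom.config)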

In this project, the config list consists of three sub-lists: params, client.info and top.level information.

The params object is given special treatment in rmarkdown. An object of this name can be passed to R and executed internally either by creating a named list and passing it in as an argument to rmarkdown::render, or by creating a section of the report YAML with the proper structure.

The individual elements of params can be any valid R data type. Character strings can be used to provide text (from single words to larger chunks), index a list, be fed into a function’s argument, etc. The same is true for numeric values. A list object could be passed in as well to provide several types of data for a specific task. The params specified here are used to run further analyses on the Report_Data, provide logic to execute code/text chunks conditionally, and supply required state-specific meta-data.
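For illustration, the same parameter can be supplied either programmatically or in the report’s YAML front-matter (the values here are examples only):

# 1) Passed to render() as a named list:
# rmarkdown::render("State_A_Academic_Impact_Analysis.Rmd",
#                   params = list(state.name = "State A", draft = TRUE))
# 2) Declared in the report's YAML header:
# params:
#   state.name: "State A"
#   draft: true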

An exhaustive list of the report params should be kept to ensure all params are defined. In the list below, we define only a few of the possible parameters, and the unspecified ones are filled in internally in the params.Rmd script, which is run at the beginning of the report rendering process.

The list below is semi-exhaustive of what can be supplied to the .Rmd.

params = list(
  state.name = "State A", # required at a minimum
  state.abv = "S.A.",
  state.org = "State A Department of Education",
  state.org.abv = "SADoE",
  draft = TRUE, # NULL to remove draft status
  draft.text = "DRAFT REPORT -- DO NOT CITE", # default if `draft`=TRUE
  keyword = "academic impact", # lower case. Camel applied as needed or can be customized as keyword_camel
  imputations = FALSE,
  min.size.school = 15,  #  N size cutoff for inclusion in summaries/analyses
  min.size.district = 50, # N size cutoff for inclusion in summaries/analyses
  sgp.abv = list( # `SGP` abbreviation for accessing `SGPstateData` meta-data.
    State_Assessment = "State_A"#,
    # College_Entrance = c(),
    # ELP_Assessment = c(),
    # Interim_Assessment = c()
  ),
  sgp.name = list(
    State_Assessment = "State A"
  ),
  test.name = list(
    State_Assessment = "A+ State Assessment Program"
  ),
  test.abv = list(
    State_Assessment = "ASAP"
  ),
  test.url = list(
    State_Assessment = "https://centerforassessment.github.io/SGPdata/"
  ),
  code.url = list(
    State_Assessment = "https://github.com/CenterForAssessment/NCME_2022_Project/tree/main/All_States/State_A/Initial_Data_Analysis"
  ),
  gof.path = list(
    State_Assessment = file.path("..", "Initial_Data_Analysis", "Goodness_of_Fit")
  )
)

The client.info and top.level lists are all text elements that need to be customized for each state. The client.info section is used exclusively in the nciea_report theme in this demonstration project. The top.level list includes information about the report, such as the title, authors, etc., that will be included in the report front-matter.

client.info = list(
  state.name = "State A", # required at a minimum
  state.abv = "D.C.", # for cover page, not SGPstateData
  city.name = "Washington",
  organization = "State A Department of Education",
  org.head = "Secretary Miguel Cardona",
  github.repo = "https://github.com/CenterForAssessment/NCME_2022_Project/tree/main/All_States/State_A",
  acknowledgements = "the entire staff of the SADoE Assessment and Accountability Office, and particularly Maggie Q. Public,"
)

top.level = list(  #  Title/subtitle, author.names, author.affil, date
  title = "Academic Impact in State A",
  subtitle = "Student Achievement and Growth during the COVID-19 Pandemic"
)

A custom.files list supplies a list of the child documents. That is, it defines what content will be included, excluded or customized in the report. As with the custom.config list, we create a “master” content (child) list in the “NCME_2022_Project/Universal_Content/Meta_Data” directory, and can merge the two together (prioritizing the elements in the custom list).

In this project, the content list is comprised of two main sub-lists: report and appendices. The report list below is identical to the one in the master Report_Content.R script; each element references an individual child .Rmd file in the “NCME_2022_Project/Universal_Content/rmarkdown/Child_RMD” directory. One can re-order these files to change the ordering in the report. It is also possible to edit/modify the “universal” child documents and keep the new version in a separate directory (typically “State_X/Documentation/assets/rmd/Custom_Content”).

##  List and order of child .Rmd files to be used in report/appendices
##  The first two must be supplied and appear at the top of the report.
report = list(
  file.order = c(
    "setup.Rmd", # REQUIRED! - load R packages, declares knitr configs, etc.
    "params.Rmd",# REQUIRED! - check/add to `params` supplied by list or yaml
    "0_Abstract.Rmd",
    "1_Intro__Overview.Rmd",
    "1_Intro_Background.Rmd",
    "1_Intro_Methods.Rmd",
    "1_Intro_Data_Sources.Rmd",
    "2_Participate__Analysis.Rmd",
    "2_Participate__Overview.Rmd",
    "2_Participate_Counts.Rmd",
    "3_Impact__Overview.Rmd",
    "3_Impact_Achievement_Analysis.Rmd",
    "3_Impact_Achievement_Overview.Rmd",
    "3_Impact_Growth_Analysis.Rmd",
    "3_Impact_Growth_Overview.Rmd",
    "4_Summary.Rmd"
  ),
  references = TRUE # Are references/citations used?
)

Besides adding/reordering Rmd files through custom.files, one can request a subset of files. This will result in a truncated report, allowing section editing/development. You always need to include setup.Rmd and params.Rmd!
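For example, while drafting a single section one might supply a truncated file list (the file names below are taken from the list above and purely illustrative):

# Render only the introduction while drafting; setup.Rmd and params.Rmd are always required.
# report$file.order <- c("setup.Rmd", "params.Rmd",
#                        "1_Intro__Overview.Rmd", "1_Intro_Background.Rmd")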

This report contains several appendices. The first shows detailed visualizations of academic impact. The second is a detailed description of the data preparation and analysis (created directly from the R code!). Third is the appendix you are reading now (also generated directly from R code) regarding how to create flexible, reproducible reports. The final appendix is simply the R session information detailing the machinery behind the analysis and reporting.

appendices = list(
  A = list(
    title = "Academic Impact Overview",
    file.order = c(
      "params.Rmd",
      "setup_impact_overview_appendix.Rmd",
      "Appendix_Impact_Intro.Rmd",
      "Appendix_Impact_Grade_Level_State.Rmd"
    ),
    references = NULL
  ),
  B = list(
    title = "Initial SGP Analysis",
    file.order = c(
      "params.Rmd",
      "setup_sgp_appendix.Rmd",
      "Appendix_SGP_Analysis.Rmd"
    ),
    references = NULL
  ),
  C = list(
    title = "Impact Report Generation",
    file.order = c(
      "Appendix_Impact_Report_Generation.html"
    ),
    references = NULL
  ),
  R = c()
)

Combine report meta data and generate .Rmd parents

The following script will merge the report.config (universal) and custom.config lists and return ‘report.config’.

custom.config <- list(client.info = client.info, top.level = top.level, params = params)
source(file.path(universal.content.path, "Meta_Data", "Report_Configs.R"))
rm(params) # remove this to prevent conflicts with `render` later.

Now merge the rmd.files (universal) and custom.files lists and return ‘rmd.files’ to be used in next steps. The custom.files will override defaults if they exist in “assets/rmd/Custom_Content”.

custom.files <- list(report = report, appendices = appendices)
source(file.path(universal.content.path, "Meta_Data", "Report_Content.R"))

With the combined meta-data, we can now create the .yml and .Rmd “master/parent” documents for an nciea_report and/or bookdown site. These scripts can also be used as “skeletons” for other reports (such as working drafts, alternate output formats/templates, etc.).

createReportScripts(report_config=report.config, rmd_file_list=rmd.files)
##  Save report YAML and file configurations
save(list=c("report.config", "rmd.files"), file = "Report_Configuration_MetaData.rda")
# setwd("..")

Reporting Step 5: Produce report and appendices

Now the fun begins! In this step we (finally!) generate reports using the results from the analyses we ran and the information we gathered and organized in the four previous steps.

This step assumes the user is operating from different working directories depending on the report we are generating. At this point in the demonstration, we can consider this step a “choose your own adventure” story. On the one hand, we can start at “NCME_2022_Project/All_States/State_A/Documentation” if all the prior steps have been run together already and simply (re)load the necessary data and results objects saved from steps 2 and 3 before moving on to the “Initial_Data_Analysis” report generation. On the other hand, the code is set up so that we could just start here without having run anything yet. That is, the R code and Rmd files are set up to run the analysis code and generate reports automatically afterwards.

We begin by creating appendix B, which pertains to the initial SGP analyses. With those data and results available in the R workspace, we proceed with the report analyses and reporting (generated simultaneously by spinning and evaluating steps 2 through 4; steps 1 and 5 can/should be done separately). Finally we create an academic impact overview appendix and then combine all the main report child .Rmd files into the final report(s).

# setwd("./Documentation")
##  Load `Literasee` package
require(Literasee)

##  Locate the "Universal_Content" directory
universal.content.path <- file.path("..", "..", "..", "Universal_Content")
closet <- "../../../Universal_Content/rmarkdown/closet/"

##  Load existing `Report_Data` and `Report_Analyses` objects from steps 2 and 3.
##  Or... Skip the data loading and proceed to the next step to re-run while reporting.
# if (!exists("Report_Data")) load("../Data/Report_Data.Rdata")
# if (!exists("Report_Analyses")) load("../Data/Report_Analyses.Rdata")

##  Create directories if needed for results
if(!dir.exists("../Documentation/report"))
        dir.create("../Documentation/report", recursive = TRUE)

Initial data analysis and reporting

One intent of this demonstration has been to show how reporting can be automated, and one way this can be done is to combine the (final) data analysis and reporting into a single step. The R code in the “Initial_Data_Analysis” directory has been set up so that a report can be generated from it directly (here to be used as an appendix to the final report). See this link for details on how to spin your goat hair R code into a knit’d report.
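The mechanism behind this is knitr’s spin() workflow, which converts a specially commented .R script into an .Rmd document that can then be rendered; roughly (an illustrative call, not the repo’s exact code):

# spin() turns roxygen-style text comments (#') and chunk options (#+) in an R
# script into an .Rmd file; with knit = FALSE it just writes the .Rmd next to the script.
# knitr::spin("State_A_Baseline_SGP_Analyses.R", knit = FALSE)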

We will first construct a parent .Rmd file from pieces provided in the closet full of skeletons. In order to evaluate the code while generating a report, it seems to be necessary to work from the “Initial_Data_Analysis” directory. The documentation itself is saved where it can be accessed easily in the final report generation.

We will first use a simple, clean template called “working paper”.

setwd("../Initial_Data_Analysis/")

writeLines(c("---", readLines(paste0(closet, "skeleton_sgp_ida_wp.yml")),
             "---", readLines(paste0(closet, "skeleton_sgp_ida.Rmd"))),
           "tmp_file.Rmd")

rmarkdown::render(input="tmp_file.Rmd",
    output_file="../Documentation/report/Appendix_SGP_Analysis_WP.html")
frm.tf <- file.remove("tmp_file.Rmd")

Render again in nciea_report theme

In the final part of step 4, we generated five .Rmd scripts for generating reports with the nciea_report template from the Literasee package we maintain on GitHub. These are all renderable from the “Documentation” directory, with the exception of the initial SGP analysis appendix. Again, since it is rendering R code directly and not just being rendered as text (i.e. eval=TRUE is assumed in the code as it is interpreted), we need to run the analyses and reporting from the “Initial_Data_Analysis” directory. Doing this breaks many of the relative paths in the automatically generated script.

Just as we did for the draft version above, we will first construct an Rmd parent and then render it.

writeLines(c("---", readLines(paste0(closet, "skeleton_sgp_ida_nciea.yml")),
             "---", readLines(paste0(closet, "skeleton_sgp_ida.Rmd"))),
           "tmp_file.Rmd")

rmarkdown::render(input="tmp_file.Rmd",
        output_file="../Documentation/report/Appendix_SGP_Analysis.html")
frm.tf <- file.remove("tmp_file.Rmd")

Academic impact analyses (and reporting)

Now that we have the initial data analyses (re)run we can do another round of simultaneous analysis and report generation. This time we will create the “C” appendix, which is probably what you are reading right now. This again requires us to render the report from a specific directory - this time the “Documentation” directory (which is where we assumed all of the report data set up, analyses, etc. were to be conducted).

In the “Appendix_Impact_Report_Generation.Rmd”, the yaml has been written to create reports in both the “working paper” and “nciea report” themes. We will run them concurrently and generate both formats from the same call to render.

setwd("../Documentation/")

##  Copy Rmd to the "Documentation" directory to match paths in R code.
fcp.tf <- file.copy(
    "assets/rmd/Custom_Content/Appendix_Impact_Report_Generation.Rmd", ".")

rmarkdown::render(
    input = "Appendix_Impact_Report_Generation.Rmd",
    output_file = c("AIRG_WP.html", "Appendix_Impact_Report_Generation.html"),
    output_format = "all", output_dir = "report")

frm.tf <- file.remove("Appendix_Impact_Report_Generation.Rmd")

First draft of final report

We have delayed satisfaction long enough! We now compile our first draft of the final report along with the appendices we just generated. For this we will continue to use the “working paper” template.

We will use a similar process to the one above to assemble a parent document for the draft output. The draft is then rendered and saved in the “report” directory and converted to a pdf there using pagedown::chrome_print.

writeLines(c("---", readLines(paste0(closet, "skeleton_main_wp.yml")),
                  readLines(paste0(closet, "skeleton_main_params.yml")),
             "---", readLines(paste0(closet, "skeleton_main_alt.Rmd"))),
           "report/tmp_file.Rmd")

rmarkdown::render(input="report/tmp_file.Rmd",
                  output_file="State_A_Academic_Impact_Analysis_WP.html")
frm.tf <- file.remove("report/tmp_file.Rmd")
pagedown::chrome_print("report/State_A_Academic_Impact_Analysis_WP.html")

Alternate pagedown templates

There are a handful of templates within the pagedown package one can use to format reports, and the pagedreport package offers three packaged report templates. We can try one out here with our skeletons.

writeLines(c("---", readLines(paste0(closet, "skeleton_main_pgdrpt.yml")),
              readLines(paste0(closet, "skeleton_main_params.yml")),
             "---", readLines(paste0(closet, "skeleton_main_alt.Rmd"))),
           "report/tmp_file.Rmd")

rmarkdown::render(input="report/tmp_file.Rmd",
        output_file="State_A_Academic_Impact_Analysis_PgdRpt.html")
frm.tf <- file.remove("report/tmp_file.Rmd")
pagedown::chrome_print("report/State_A_Academic_Impact_Analysis_PgdRpt.html")

This report looks pretty nice right out-of-the-box. There is some formatting that would need to be cleaned up, and possibly some customization that could be done without too much effort. One thing to point out is that appendices would all need to be rendered separately or included as additional sections at the end of the paper. A common complaint (or at least a frequently requested feature) is that users cannot add prefixes to page, figure or table numbers (e.g., “Figure A1.”, “Table B-2”, etc.). These issues have been worked out in the nciea template, and improvements/workarounds are underway in the “working paper”.

Academic impact overview appendix

We will now turn to an appendix with detailed plots of the academic impact in State A. These plots contain quite a bit of information and were formatted originally to be 11”x17” pdfs. The code for this appendix takes the R graphical objects and converts them to svg images (pdf and html do not play well together) before adding them to the report in a portrait layout. Once rendered and printed to a preliminary PDF report, those pages are then rotated 90 degrees to a landscape layout using the qpdf package.

Note that for this appendix we can use the automatically generated Rmd script created in step 4 without modification.

source("../../../Universal_Content/Functions/covidImpactSubVisualization.R")
rmarkdown::render("report/Academic_Impact_Overview_APPENDIX_A.Rmd")
pagedown::chrome_print("report/Academic_Impact_Overview_APPENDIX_A.html",
                       output = "report/Temp_ApdxA.pdf")
#  Need to remove - seems to mess up attempts to render the `bookdown` site ...
unlink(file.path("report", "_bookdown.yml"))

Adjust academic impact plots produced internally (from GROBS)

##  Manually locate the pages to rotate
rpt.list <- list(
  FRONT = 1:3,      # frontmatter - title page, TOC, any intro, etc.
  ELA = 4,          # Any ELA specific section (text, tables, etc.)
  MATHEMATICS = 11, # Any math specific section (text, tables, etc.)
  BACK = NULL       # backmatter - discussion, conclusions, etc.
)

base.pdf <- "report/Temp_ApdxA.pdf"

##  Rotate landscape pages.
all.pages <- seq(qpdf::pdf_length(base.pdf))
pages.to.rotate <- setdiff(all.pages, unlist(rpt.list, use.names = FALSE))

qpdf::pdf_rotate_pages(
  input = base.pdf,
  output = "report/Academic_Impact_Analyses_APPENDIX_A.pdf", # Must rename
  pages = pages.to.rotate, angle = 90
)

frm.tf <- file.remove(base.pdf)

Final report draft

We now have all the pieces in place to run the final report. We do not need to modify the parent script that createReportScripts produced. This is now the easy part! We then add in the final appendix, which displays the R session information (the user’s system specifications, package versions, etc.).

rmarkdown::render("report/State_A_Academic_Impact_Analysis.Rmd")

here <- getwd()
rmarkdown::render(
    "../../../Universal_Content/rmarkdown/Child_RMD/APPENDIX_R_NCIEA.Rmd",
    output_file = paste0(here, "/report/Appendix_Academic_Impact_R.html"))

Produce PDF reports for the remaining files

pagedown::chrome_print("report/State_A_Academic_Impact_Analysis.html")
# pagedown::chrome_print("report/Appendix_A.html") # Done above with rotation
pagedown::chrome_print("report/Appendix_SGP_Analysis.html")
pagedown::chrome_print("report/Appendix_Impact_Report_Generation.html")
pagedown::chrome_print("report/Appendix_Academic_Impact_R.html")

Creating a bookdown website

It is also possible to create a bookdown website using the same Rmd child documents as those used in the pagedown based reports. As part of the createReportScripts function run in 4_Make_Configs.R, the required yaml and index.Rmd files were created. Given the complexity of some of the appendix generation, it is advised that these scripts be modified to remove the appendices from the _bookdown.yml file first. It may also be necessary to copy some assets (such as figures, plots, css, etc.) from “Universal_Content” to the state “Documentation” directory.

Begin by copying the PDF reports to the “site” directory for download links and then rendering the website.

# if (!dir.exists(file.path("site", "downloads")))
#         dir.create(file.path("site", "downloads"), recursive = TRUE)
#
# file.copy(c(file.path("report", "State_A_Academic_Impact_Analysis.pdf"),
#             file.path("report", "Academic_Impact_Analyses_APPENDIX_A.pdf"),
#             file.path("report", "Appendix_SGP_Analysis.pdf"),
#             file.path("report", "Appendix_Impact_Report_Generation.pdf"),
#             file.path("report", "Appendix_Academic_Impact_R.pdf")),
#           file.path("site", "downloads"), overwrite = TRUE)
#
# bookdown::render_book(".", "bookdown::gitbook") # delete all appendices from yaml
#
# # Serve the site directory on a local host to see the results:
# servr::httw(dir = "site", watch = "site", port=4224)
# servr::daemon_stop()

Bonus Report: The paper from this session

The paper that accompanies this demonstration is built from the README.md files sprinkled throughout the repository. Here’s how I put it together:

## Everything is meant to be run from the "report" directory...
R.utils::copyDirectory(from = "assets/js", to = "../assets/js")
R.utils::copyDirectory(from = "assets/images", to = "../assets/images")
rmarkdown::render("Flexible_Report_Generation.Rmd")
pagedown::chrome_print("Flexible_Report_Generation.html",
                       output = "report/Flexible_Report_Generation.pdf")

# setwd("..")

The End

Hope that gets you started! There is a lot to dig into and we will do our best to help you out and answer any questions.