Why Be Flexible?
Data analysts are often tasked with writing reports that describe data, analyses,
and results associated with a project. Depending upon the nature of the project,
such reports are either completely customized or borrow heavily from other reports
(e.g., annual reports). GitHub repositories and associated GitHub actions can be
used to coordinate the writing as well as the production of final reports for
dissemination. The process we demonstrate utilizes R, R Markdown, and several
associated R packages as the base tools to construct these reports.
NCME_2022_Project Purpose
The concept of “flexibility” in report generation can be applied in numerous ways. It may mean generating reports in multiple formats (e.g., websites and PDF documents), setting up workflows that can be used in various settings or for any number of clients, or combining analytics and documentation into a seamless process. In building this GitHub repository for our demonstration, we have tried to condense the lessons we have learned about being flexible in our processes of data analytics and documentation. We outline five steps that help us get from raw data to reported results in this demonstration.
First, a well-structured working environment should be set up, including a uniform
directory structure with any required external assets and resources, as well as
easily generalized R command and function scripts. Second, the data required to
generate the report must be compiled in a consistent data-object structure. Third,
any time-consuming and/or state-specific external analyses should be conducted
and thoroughly reviewed, and also compiled in a consistent data-object structure.
Fourth, all the required meta-data and report content information is compiled
from generic (Universal) and custom sources. Lastly, the desired report formats
are generated using the appropriate R functions.
This repository is a template for the workflow that we have used over the past year in our efforts with multiple states to begin investigating the academic impact of the COVID-19 pandemic. In these efforts we attempted to create generalizable data formats, standardized methods for analysis and universal content for reporting. This repo serves as an “all-in-one” representation of how we structured our efforts across the various states. Combining the multiple stages and components of these projects into a single repository is helpful in that users do not need to navigate multiple repositories as we have. However, it does mean that this repo is quite complex in and of itself - sorry :wink:
There are detailed README files in many of the sub-component directories of this repo that give more detailed information about their contents. The main components located in this top directory are:
All_States
This component contains the “state” specific data analysis and report generation content. Although this project has been framed as a workflow across multiple states, this could be envisioned in other ways where many projects resemble each other, but separation is required: school level analysis and reporting, different branches of a simulation study, annual analysis/reporting within a single organization, technical reports, etc.
Each “state” has its own sub directory that houses “Initial_Data_Analysis”, “Report_Analyses”, “Data” and “Documentation” directories. These represent the various stages of our analysis and reporting efforts, and are typically located in different areas of our work environments.
- Initial_Data_Analysis represents the framework used for typical annual data
analysis used to clean, prepare and calculate Student Growth Percentiles (SGPs)
for the states we work with. This typically includes:
  - confidential student data (housed securely)
  - R scripts (shared openly on GitHub)
- Report_Analyses contains the R scripts used for each state after the initial calculation of SGPs. That is, additional analyses that were carried out in our efforts to investigate academic impact. Typically this is included in the “Documentation” directory/repo, but placed here for emphasis/differentiation.
- Data is where we keep specifically formatted student data and results from the impact-related analyses. Again, housed outside of any GitHub repo because it contains confidential data.
- Documentation includes all the R Markdown based code, content and assets for generating reports. This is typically a separate Github repo (with final reports provided to clients, not included in the repo). This is the heart of the “Flexible Report Generation” portion of the NCME Demonstration session.
The data, R code and reports obviously are not typically stored together like
this for confidentiality and other considerations. This demonstration uses the
simulated student data (sgpData_LONG_COVID) from the
SGPdata package.
Universal_Content
The ability to generate reports flexibly and automatically with data and analytic results from multiple sources requires the identification of what content is universal and what must be customized to meet specific circumstances and situations. By “universal”, we mean elements that can (and often should) be used in all cases, with updates or improvements applied consistently. In our experience, every report begins as a fully custom report. As the process is repeated over and over, the pieces that are common to all become apparent and are moved to an external source where they can be shared and accessed.
This is true of R code as well, as spaghetti code morphs into custom functions,
and then into formal functions and packages. Whether talking about text or code, the
use of universal “parameters” also applies: small bits of information that define
how the results are rendered, which are universal in their requirement and
application, but whose specification usually depends on the context or use case.
The contents of this directory represent what has been distilled into universal components.
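To make the idea of a universal parameter concrete, here is a small illustration (the parameter names mirror those defined in Step 4; the sentence itself is made up): a shared snippet of report text can key off parameters that each client supplies, so the same child document renders correctly for any state.

```r
## Illustrative only: universal text driven by client-supplied parameters.
params <- list(state.name = "State A", min.size.school = 15)
sprintf("Schools with fewer than %s students are excluded from the %s summaries.",
        params$min.size.school, params$state.name)
```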
- Functions - As we create R code to do specific tasks and then re-use that code in other areas, we find it necessary to formalize the code into a function. Here we have examples of that: rather than repeating the same chunk(s) of code to add simulated academic impact into each state’s branch of the repo, we create a function (addImpact) that can be applied universally to the data.
  - Since the function is sourced in from this directory, any updates, improvements or bug fixes made to it are applied to each analysis (when it is re-run).
  - The function could be added into a package after it is tested and proven in a project like this.
- rmarkdown - This is where we store the common bits and blobs of text and other code snippets (to produce tables, plots, etc.), as well as other assets like css and JavaScript code, HTML templates, etc. used to format reports.
  - The Rmd files are used as “children”, which means the task of creating a report becomes assembling the parts (children) into the desired combination and order in a master (parent) document (see the sketch after this list).
  - The css, JavaScript code, etc. in the “templates” subdirectory are structured in the way typically used (required?) by some packages commonly used to render R Markdown into documents (e.g., rmarkdown, bookdown, pagedown or our Literasee package).
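As a rough sketch of the “children” mechanism (file names taken from the report list in Step 4), a chunk in the parent document with results='asis' can assemble the child documents like so:

```r
## Knit a set of child documents and emit the combined result into the parent.
children <- c("1_Intro__Overview.Rmd", "2_Participate_Counts.Rmd")  # illustrative subset
res <- sapply(children, knitr::knit_child, quiet = TRUE)
cat(res, sep = "\n")
```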
Custom_Content
The other side of “Universal_Content” is “Custom_Content” - often we need specific tools for special cases. The “Custom_Content” directory here serves more as a placeholder or template for these components. They really belong in the “State_*” directories (usually put in the “Documentation/assets/rmd” subdirectory). However, the custom content for one project can often serve as a good template for others (similar to, but not quite, universal components). We include this element here for emphasis and differentiation.
Five steps to report generation
Generation of multiple format reports (e.g., a bookdown website and a pagedown
PDF document) can generally be conducted in five steps. These steps are laid out
generically in the .R scripts included in this repo.
First, a consistent working environment should be set up, including a directory
structure with all the required external assets and libraries, as well as the R
command and function scripts included in this repo. Second, the data required to
generate the report must be compiled in a consistent data-object structure. Third,
any time-consuming and/or state-specific external analyses should be conducted
and thoroughly reviewed, and also compiled in a consistent data-object structure.
Fourth, list-objects that contain all the required meta-data and content indices
are compiled from generic (Universal) and custom sources to create dual-format
configuration scripts. Lastly, the desired formats are generated using the
appropriate R functions.
Included files
- The 1_Repo_Setup_and_Maintenance.R file contains R code from which required R packages can be installed or updated, and other assets can be copied into the report directory.
  - A script, Universal_Content/Meta_Data/Report_Packages.R, is available to both document and help install/update any R packages required for analyses to be run and the report(s) generated.
  - The “Documentation” directory for each branch (here “State_A” and “State_B”) can be set up using the setupReportDirectory function. This function pulls in assets from the Literasee package.
    - These include css, javascript, pandoc and R Markdown (.Rmd) assets the Literasee package needs to create a “NCIEA” themed report and website.
    - Templates for custom child .Rmd files can also be added to the directory (available from the Universal_Content repo/submodule). Alternatively, template custom content can be copied over from another state/project directory.
  - Changes/updates/upgrades to these assets in the Literasee package can be pulled into the “Documentation” directory using the updateAssets() function.
- The 2_Report_Data.R script runs all formatting, cleaning and subsetting required to compile all data sources into a single dataset that will be used at report run-time (rendering).
  - Create/format/alter/augment one or more raw data sets including State_Assessment, College_Entrance, ELP_Assessment and (potentially multiple) Interim_Assessment data objects.
  - The compiled data must be a named list object called Report_Data, saved in a “Documentation/Data/” directory (NOT included in the GitHub repo!).
- The 3_Report_Analyses.R script is meant to house report-specific analyses that can be re/run before the report is compiled and that may take an inordinate amount of time to run or require extensive evaluation and investigation before inclusion in the report.
  - These analyses may be universal enough to run for all states, or unique enough that the analysis is customized for each state.
  - Ideally each of the data sources (State_Assessment, College_Entrance, ELP_Assessment and Interim_Assessment) will have the same or similar analysis types (e.g., participation, multiple_imputation, academic_impact, etc.).
  - Some analyses may be short and included directly within child RMD scripts, while others may be placed externally in the “Report_Analyses” directory.
  - The compiled analysis results are stored in a named list object called Report_Analyses, which is saved in a “Documentation/Data” directory (NOT included in the GitHub repo!).
- Generation of the multiple format reports (e.g., a bookdown website and a pagedown PDF document) typically depends on different types of configuration scripts that list the child documents to knit together: _bookdown.yml and index.Rmd for bookdown, and a parent .Rmd file (e.g., STATE_X_Academic_Impact_Analysis.Rmd) for pagedown.
  - The child documents can be generic (i.e. Universal_Content) or customized/novel content. The 4_Make_Configs.R script creates custom configuration lists that identify 1) state-specific meta-data and report parameters and 2) a list of any custom R Markdown content to be used in place of, or in addition to, the universal report content.
  - These configuration and content lists are then combined with the Universal_Content configuration and content lists by source(...)'ing the R scripts Universal_Content/Meta_Data/Report_Configs.R and Universal_Content/Meta_Data/Report_Content.R.
  - The scripts are set up to give priority to the custom content, so that generic elements can be easily overridden. The combined custom and universal information list objects are then used by functions in the Literasee package to create the YAML and RMD files that control the report generation output.
- The 5_Make_Report.R file contains R code to render the report using the pagedown templates (and workable scripts to create a bookdown website).
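Taken together, and glossing over the working-directory conventions described in each step below, the workflow amounts to running these five scripts in order for a given state/branch (a sketch only):

```r
## End-to-end sketch; each script is documented in the corresponding step below.
# source("1_Repo_Setup_and_Maintenance.R")
# source("2_Report_Data.R")
# source("3_Report_Analyses.R")
# source("4_Make_Configs.R")
# source("5_Make_Report.R")
```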
Reporting Step 1: Create (and maintain) report directory
Install and update R packages required to run analyses and create report.
This step may also require us to copy any assets, custom content templates,
and other R scripts contained in the “Universal_Content” directory.
# Set R working directory to the Documentation folder
# setwd("./Documentation")
# Locate the "Universal_Content" directory
universal.content.path <- file.path("..", "..", "..", "Universal_Content")
# Install/update packages used in the report
source("../../../Universal_Content/Meta_Data/Report_Packages.R")
### Load packages required for report setup
require(Literasee)
template.path <- file.path("../../../Custom_Content/assets/Child_RMD")
setupReportDirectory(custom.content.path = template.path)
# It may be necessary to occasionally update Literasee package assets.
# updateAssets(asset.type=c("css", "js", "pandoc", "rmd", "images"))
# copy additional assets from the Universal_Content directory
R.utils::copyDirectory(to = "assets/fonts",
from = "../../../Universal_Content/rmarkdown/assets/fonts")
# setwd("..")
Reporting Step 2: Report_Data
In the data preparation and cleaning step of the academic impact analysis,
we create a Report_Data object with data (sub)sets from various available
sources (including the statewide, English language proficiency, college entrance
and/or interim assessments). We format, alter, augment the available/desired
data and store it in a single named list object (Report_Data) that can be
passed to 1) standardized and customized analyses and 2) the report generating
code/process. In some cases this object can also be used to create the
report params.
This step assumes the user is operating with their working directory set to “NCME_2022_Project/All_States/State_A/Documentation”.
# setwd("./Documentation")
Load packages and custom functions.
The SGP package is required for the data analysis with simulated data.
require(SGP)
require(data.table)
require(cfaTools)
State_Assessment
In this data simulation and reporting exercise we are using State_A_SGP
directly from the “Initial_Data_Analysis” step. Typically we would need to load
data from external sources at this stage (e.g., ‘State_A_SGP.Rdata’, which would
have been saved/output by the “Initial_Data_Analysis/State_A_Baseline_SGP_Analyses.R”
script).
Here we simply copy the data from the SGP object and name it State_A_Data.
State_A_Data <- copy(State_A_SGP@Data)
Data loading is typically followed by an initial subsetting to select only the report-relevant information (years, content areas and grades to be reported, variables used in report analyses, student/organization/other exclusion criteria, etc.). This process has already been carried out in the “Initial_Data_Analysis/State_A_Data_LONG.R” script.
Create lagged variables
Lagged scale and standardized score variables (and their associated achievement levels) are required for our academic impact analyses. Here we first create a standardized scale score variable that uses 2019 means and standard deviations as the reference. This prevents us from “washing out” impact in 2021, which would happen if we standardized by year. We also standardize the scores by content area and grade level.
The SGP analyses already include some version of these lagged and/or standardized
variables; however, these are only included for students for whom growth was
calculated. This means that students with missing prior scores (and potentially
others such as students with repeated/accelerated grade progressions) would
not have these data in some cases. The following code chunk creates these
variables.
## Standardize SCALE_SCORE by CONTENT_AREA and GRADE using 2019 norms
State_A_Data[, SCALE_SCORE_STANDARDIZED :=
Z(.SD, "SCALE_SCORE", reference.year = "2019"),
by = list(CONTENT_AREA, GRADE),
.SDcols = c("YEAR", "CONTENT_AREA", "GRADE", "SCALE_SCORE")]
# Need to run this again here to get SCALE_SCORE_PRIOR_*YEAR
# Seems to get deleted in abcSGP - names are too close! Something to look into.
# getShiftedValues DOES NOT add in 2020 (YEAR completely missing from data)
# We need to add in this information with a small data.table - `missing_data`
missing_data <- data.table(YEAR = "2020",
GRADE = c(3:8, 3:8),
CONTENT_AREA = c(rep("ELA", 6),
rep("MATHEMATICS", 6)))
State_A_Data <- rbindlist(list(State_A_Data, missing_data), fill=TRUE)
shift.key <- c("ID", "CONTENT_AREA", "YEAR", "GRADE", "VALID_CASE")
setkeyv(State_A_Data, shift.key)
getShiftedValues(State_A_Data,
shift_amount = c(1L, 2L),
shift_variable = c("SCALE_SCORE",
"SCALE_SCORE_STANDARDIZED",
"ACHIEVEMENT_LEVEL"))
# Check initial agreement between `SGP` and `shift` generated lagged variables:
# table(State_A_Data[YEAR==current.year,
# ACHIEVEMENT_LEVEL_PRIOR, ACHIEVEMENT_LEVEL_LAG_2], exclude=NULL)
# Clean up - remove 2020 dummy data and rename according to old conventions
State_A_Data <- State_A_Data[YEAR != '2020']
setnames(State_A_Data, gsub("LAG_1", "PRIOR_1YEAR", names(State_A_Data)))
setnames(State_A_Data, gsub("LAG_2", "PRIOR_2YEAR", names(State_A_Data)))
## Fix 2021 Lags for Grades 3 & 4 - repeaters:
# table(State_A_Data[, YEAR, is.na(SCALE_SCORE_PRIOR_2YEAR)], exclude=NULL)
# table(State_A_Data[, GRADE, is.na(SCALE_SCORE_PRIOR_2YEAR)], exclude=NULL)
State_A_Data[GRADE %in% c(3, 4) | YEAR %in% c("2016", "2017"),
SCALE_SCORE_PRIOR_2YEAR := NA]
State_A_Data[GRADE %in% c(3, 4) | YEAR %in% c("2016", "2017"),
SCALE_SCORE_STANDARDIZED_PRIOR_2YEAR := NA]
State_A_Data[GRADE %in% c(3, 4) | YEAR %in% c("2016", "2017"),
ACHIEVEMENT_LEVEL_PRIOR_2YEAR := NA]
State_A_Data[GRADE == 3 | YEAR == "2016",
SCALE_SCORE_PRIOR_1YEAR := NA]
State_A_Data[GRADE == 3 | YEAR == "2016",
SCALE_SCORE_STANDARDIZED_PRIOR_1YEAR := NA]
State_A_Data[GRADE == 3 | YEAR == "2016",
ACHIEVEMENT_LEVEL_PRIOR_1YEAR := NA]
Create other variables
Through the course of analyzing academic impact, we may find there are additional variables that need to be created often (i.e. used in multiple analyses or in both an analysis and the report generation). In these cases it is best to create those variables here so that the process is standardized across all use cases.
For example, we may need a variable that converts the ACHIEVEMENT_LEVEL and/or
ACHIEVEMENT_LEVEL_PRIOR variable into a dichotomous “Proficient/Not Proficient”
variable. The following example creates those variables such that they are
consistent not only in “State_A” data for use across analyses/reports, but also
consistent across states so that variables with the same naming conventions
could be used in the same R code or custom functions.
State_A_Data[, ACHIEVEMENT_ProfandAbove := fcase(
ACHIEVEMENT_LEVEL %in% c("Partially Proficient", "Unsatisfactory"), "Not Proficient",
ACHIEVEMENT_LEVEL %in% c("Advanced", "Proficient"), "Proficient")]
State_A_Data[YEAR == "2021", ACHIEVEMENT_LEVEL_PRIOR := ACHIEVEMENT_LEVEL_PRIOR_2YEAR]
State_A_Data[YEAR != "2021", ACHIEVEMENT_LEVEL_PRIOR := ACHIEVEMENT_LEVEL_PRIOR_1YEAR]
State_A_Data[, PRIOR_ACHIEVEMENT_ProfandAbove := fcase(
ACHIEVEMENT_LEVEL_PRIOR %in% c("Partially Proficient", "Unsatisfactory"), "Not Proficient",
ACHIEVEMENT_LEVEL_PRIOR %in% c("Advanced", "Proficient"), "Proficient")]
Remove redundant/replaced variables
In this data preparation and formatting step, we have created some variables that are now redundant or only needed in the variable creation process. We now do one last data cleaning step to make sure we are only saving the variables that we will need/want.
State_A_Data[, c("SCALE_SCORE_PRIOR",
"SCALE_SCORE_PRIOR_STANDARDIZED",
"ACHIEVEMENT_LEVEL_PRIOR",
"SGP_NORM_GROUP_BASELINE",
"SGP_NORM_GROUP_BASELINE_SCALE_SCORES") := NULL]
In this exercise, we are only using a simulated data source that mimics a statewide assessment (i.e. one given annually for accountability purposes). The process outlined above could also be applied to ELP, college entrance, interim or any other data source a state has. If the report analysis and reporting code is set up to accommodate these multiple data sources, then identical reporting can be done for each of these sources as well.
Combine all data and save
Finally we will combine all the data sources that will be used in the report
analyses and reporting. Note that, regardless of the state/consortium/organization
from which the data comes, we name the data object Report_Data. This allows
us to flexibly and automatically apply R code and functions to this data
source in a standardized way. In this way, Report_Data is a required parameter
for this exercise.
Report_Data <- vector("list", 4);
names(Report_Data) <- c("State_Assessment", "College_Entrance", "ELP_Assessment", "Interim_Assessment")
Report_Data[["State_Assessment"]] <- State_A_Data; rm(State_A_Data)
if (!dir.exists(file.path("..", "Data"))) dir.create(file.path("..", "Data"))
save(Report_Data, file = file.path("..", "Data", "Report_Data.Rdata"))
# setwd("..")
Reporting Step 3: Report_Analyses
In this step we run any external analyses using data from any/all elements of
the Report_Data object for the academic impact report and house the results
in a separate object named Report_Analyses. This object will eventually be
passed to the rendered report. As with Report_Data, this can include
State_Assessment, College_Entrance, ELP_Assessment and (potentially
multiple) Interim_Assessment branches with multiple analysis results types
common to them all. For example, all assessments may have a participation
slot, but only one may have a multiple_imputation analysis.
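As a rough sketch of that layout (slot names taken from the analysis types mentioned above; the demonstration scripts below also add a Summary_Tables branch organized by assessment):

```r
## Illustrative skeleton only; any given state will populate just some of these.
Report_Analyses <- list(
  State_Assessment = list(
    participation       = NULL,  # participation/missing data summaries
    academic_impact     = NULL,  # academic impact results/visualizations
    multiple_imputation = NULL   # only when missing data analyses are run
  ),
  College_Entrance = list(participation = NULL)
)
```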
This step assumes the user is operating with their working directory set to “NCME_2022_Project/All_States/State_A/Documentation”.
# setwd("./Documentation")
## Locate the "Universal_Content" directory
universal.content.path <- file.path("..", "..", "..", "Universal_Content")
# Load packages used in the Report Analyses (and install/update as necessary).
# Here we assume updates have been made in Step 1, and that each report analysis
# script has been run, but sourcing the "Report_Packages.R" may be necessary.
# source(file.path(universal.content.path, "Meta_Data", "Report_Packages.R"))
Load Report_Data and Report_Analyses objects
Running analyses for the report assumes that formatted data is available from
the 2_Report_Data.R script. All custom functions and/or R scripts use
Report_Data as the data source. Although this script may be sourced in its
entirety, it is also possible to load the object and only (re)run certain
sections. For that reason, we first load any existing Report_Analyses objects.
if (!exists("Report_Data")) load("../Data/Report_Data.Rdata")
## Load an existing `Report_Analyses` object or set up list
if (file.exists("../Data/Report_Analyses.Rdata")) {
load("../Data/Report_Analyses.Rdata")
} else { # or create a new one and run all analyses
Report_Analyses <- list()
}
State assessment analyses
For the statewide assessment in this demonstration report for State A, we will conduct two primary analyses: missing data and academic impact. We will also create a table that will be included in all state reports showing the percent proficient trend over several years. Because the State A data were not set up to include missing data, there is no multiple imputation analysis done here, and therefore a similar table is not produced showing the estimated impact on percent proficient from the missing data (via a multiple imputation analysis).
### Declare an assessment flavor
assessment <- "State_Assessment"
### Missing data visualizations
source("../Report_Analyses/State_A_Missing_Data_2021.R")
### Academic impact visualizations
source("../Report_Analyses/State_A_Academic_Impact_Visualization.R")
### Percent Proficient
pct_proficient <- function(achievement_level) {
tmp.table <- table(achievement_level)
100*sum(tmp.table["Proficient"])/sum(tmp.table)
}
tmp <- Report_Data[[assessment]][
YEAR %in% c("2018", "2019", "2021") &
CONTENT_AREA %in% c("ELA", "MATHEMATICS"),
.(PERCENT_PROFICIENT= round(pct_proficient(ACHIEVEMENT_ProfandAbove), 1)),
keyby=c("YEAR", "CONTENT_AREA", "GRADE")]
Report_Analyses[["Summary_Tables"]][[assessment]][[
"Achievement"]][["Overall_PctProf"]] <-
dcast(tmp, CONTENT_AREA + GRADE ~ YEAR,
sep=".", drop=FALSE, value.var="PERCENT_PROFICIENT")
## Imputation Difference Summaries
# source(file.path(universal.content.path, "Functions",
# "Percent_Prof_Imputations_Overall_ContentArea_by_Grade.R"))
# Report_Analyses[["Summary_Tables"]][[assessment]][[
# "Imputation"]][["CONTENT_AREA__GRADE"]] <-
# State_Assessment_Summaries
Combine all data analyses into Report_Analyses and Save
## This logic is required for running this script while generating the report:
if (exists("params")) {
if (params$appendix.c)
save(Report_Analyses, file = "../../../../Data/Report_Analyses.Rdata")
} else save(Report_Analyses, file = "../Data/Report_Analyses.Rdata")
# rm(params)
# setwd("..")
The Universality of Meta-Data
Before turning to Step 4, it may be useful to first look at the Universal Content with which this step interacts. In the “Universal_Content/Meta_Data” directory there are two files that contain a running list of the elements and resources that are available and/or required for the report generation in Step 5. The purpose of Step 4 is to combine the generic meta-data with the custom meta-data to form a complete picture of what the report will be.
Included files
- The Report_Configs.R file contains R code that stores general report information in an appropriately formatted (named) list object. This object is given the name report.config and will be available for manipulation after using source(...) to load it into the working R environment.
  - Information such as the report title/subtitle, author names/affiliations and parameters (data) that are required to create report content (tables/plots/conditional text) is stored in this file.
  - This universal list can be combined with another list of customized meta-data (named custom.config) in order to add client-specific meta-data to the report.config list (state name, people to acknowledge, etc.), or to change/update any of the universal meta-data stored in the report.config list.
- The Report_Content.R source code provides child RMD file information, also stored as a list object. This meta-data includes the names of child files that all reports will likely contain (at a minimum). This object is given the name rmd.files and will be available for manipulation after using source(...) to load it into the working R environment.
  - File name and order information is stored as the file name only (not a file path). The file paths are constructed based on the universal and custom child document paths provided in the report.config object (report.config$params$unvrsl.rmd.path and report.config$params$custom.rmd.path).
  - Customized .Rmd documents based on “universal” child documents should be named identically if they are meant to replace the universal version entirely. If they are meant to supplement or are entirely novel, they should be named uniquely and their position in the report noted in the final rmd.files$file.order element of the list.
  - Other information about the file order/content is also included here. For example, if the file order should be different for the pagedown output (so as to optimize page breaks or table placement), this can be specified here, as can whether the documents include citations that should be collected in a “References” section.
  - Supplemental appendices are included in a separate “branch” of the list from the main report.
These two lists are subsequently used by functions in the Literasee package
to create a “NCIEA” themed report and website. See the “NCME_2022_Project/Documentation”
subdirectory for example scripts showing how to source(...) these two documents to
create the master lists for the “Demonstration_COVID” toy data analysis and render
the report/website.
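The merge itself happens inside those two sourced scripts. As a minimal sketch of the kind of merge they perform (shown here with modifyList(), one common way to let custom entries override universal defaults; the actual implementation in Report_Configs.R may differ):

```r
## Custom entries take priority; universal defaults are kept where not overridden.
report.config <- list(top.level = list(title  = "Academic Impact Report",
                                       author = "Universal default author"),
                      params    = list(draft  = TRUE))
custom.config <- list(top.level = list(title  = "Academic Impact in State A"))
report.config <- modifyList(report.config, custom.config)
report.config$top.level$title   # "Academic Impact in State A" (custom wins)
report.config$params$draft      # TRUE (universal default retained)
```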
Reporting Step 4: Parameters and configurations
In this step we set up report configuration and content lists. This means we specify any necessary meta-data and parameters required to run the report and create/customize/complete the required YAML and RMD files that produce the report(s).
This step assumes the user is operating with their working directory set to “NCME_2022_Project/All_States/State_A/Documentation”.
# setwd("./Documentation")
## Load `Literasee` package
require(Literasee)
## Locate the "Universal_Content" directory
universal.content.path <- file.path("..", "..", "..", "Universal_Content")
Load existing Report_Data and Report_Analyses objects from steps 2 and 3.
if (!exists("Report_Data")) load("../Data/Report_Data.Rdata")
if (!exists("Report_Analyses")) load("../Data/Report_Analyses.Rdata")
Report configurations - data and meta-data.
A custom.config list is created to supply project-specific information. It
can also be used to override some of the Universal settings (authors, etc.). We
create a “master” config list in the “NCME_2022_Project/Universal_Content/Meta_Data”
directory, and can merge the two together (prioritizing the elements in the
custom list).
In this project, the config list comprises three sub-lists: params,
client.info and top.level information.
The params object is given special treatment in rmarkdown. An object of this
name can be passed to R and executed internally either by creating a named
list and passing it in as an argument to rmarkdown::render or by creating a
section of the report yaml with the proper structure.
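For instance, both routes below supply the same information to the report (values are illustrative; the parent document name matches the one generated later in this step):

```r
## 1) Pass `params` as a named list when rendering:
# rmarkdown::render("State_A_Academic_Impact_Analysis.Rmd",
#                   params = list(state.name = "State A", draft = TRUE))
## 2) Or declare them in the report's YAML header:
# ---
# params:
#   state.name: "State A"
#   draft: true
# ---
```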
The individual elements of params can be any valid R data type. Character
strings can be used to provide text (from single words to larger chunks), index
a list, be fed into a function's argument, etc. The same is true for numeric values.
A list object could be passed in as well to provide several types of data for
a specific task. The params specified here are used to run further analyses
on the Report_Data, provide logic to execute code/text chunks conditionally,
and supply required state-specific meta-data.
An exhaustive list of the report parameters should be kept to ensure all params are
defined. In the list below, we define only a few of the possible parameters,
and the unspecified ones are filled in internally by the params.Rmd script,
which is run at the beginning of the report rendering process.
This list is semi-exhaustive of what can be supplied to the .Rmd.
params = list(
state.name = "State A", # required at a minimum
state.abv = "S.A.",
state.org = "State A Department of Education",
state.org.abv = "SADoE",
draft = TRUE, # NULL to remove draft status
draft.text = "DRAFT REPORT -- DO NOT CITE", # default if `draft`=TRUE
keyword = "academic impact", # lower case. Camel applied as needed or can be customized as keyword_camel
imputations = FALSE,
min.size.school = 15, # N size cutoff for inclusion in summaries/analyses
min.size.district = 50, # N size cutoff for inclusion in summaries/analyses
sgp.abv = list( # `SGP` abbreviation for accessing `SGPstateData` meta-data.
State_Assessment = "State_A"#,
# College_Entrance = c(),
# ELP_Assessment = c(),
# Interim_Assessment = c()
),
sgp.name = list(
State_Assessment = "State A"
),
test.name = list(
State_Assessment = "A+ State Assessment Program"
),
test.abv = list(
State_Assessment = "ASAP"
),
test.url = list(
State_Assessment = "https://centerforassessment.github.io/SGPdata/"
),
code.url = list(
State_Assessment = "https://github.com/CenterForAssessment/NCME_2022_Project/tree/main/All_States/State_A/Initial_Data_Analysis"
),
gof.path = list(
State_Assessment = file.path("..", "Initial_Data_Analysis", "Goodness_of_Fit")
)
)
The client.info and top.level lists are all text elements that need to be
customized for each state. The client.info section is used exclusively in
the nciea_report theme in this demonstration project. The top.level list
includes information about the report, such as the title, authors, etc., that
will be included in the report front-matter.
client.info = list(
state.name = "State A", # required at a minimum
state.abv = "D.C.", # for cover page, not SGPstateData
city.name = "Washington",
organization = "State A Department of Education",
org.head = "Secretary Miguel Cardona",
github.repo = "https://github.com/CenterForAssessment/NCME_2022_Project/tree/main/All_States/State_A",
acknowledgements = "the entire staff of the SADoE Assessment and Accountability Office, and particularly Maggie Q. Public,"
)
# Title/subtitle, author.names, author.affil, date
top.level = list( # Title/subtitle, author.names, author.affil, date
title = "Academic Impact in State A",
subtitle = "Student Achievement and Growth during the COVID-19 Pandemic"
)
A custom.files list supplies a list of the child documents. That is, it
defines what content will be included, excluded or customized in the report.
As with the custom.config list, we create a “master” content (child) list
in the “NCME_2022_Project/Universal_Content/Meta_Data” directory, and can
merge the two together (prioritizing the elements in the custom list).
In this project, the content list comprises two main sub-lists: report
and appendices. The report list below is identical to the one in the master
Report_Content.R script; each entry references an individual child .Rmd file in the
“NCME_2022_Project/Universal_Content/rmarkdown/Child_RMD” directory. One can
re-order these files to change the ordering in the report. It is also possible
to edit/modify the “universal” child documents and keep the new version in a
separate directory (typically “State_X/Documentation/assets/rmd/Custom_Content”).
## List and order of child .Rmd files to be used in report/appendices
## The first two must be supplied and appear at the top of the report.
report = list(
file.order = c(
"setup.Rmd", # REQUIRED! - loads R packages, declares knitr configs, etc.
"params.Rmd", # REQUIRED! - check/add to `params` supplied by list or yaml
"0_Abstract.Rmd",
"1_Intro__Overview.Rmd",
"1_Intro_Background.Rmd",
"1_Intro_Methods.Rmd",
"1_Intro_Data_Sources.Rmd",
"2_Participate__Analysis.Rmd",
"2_Participate__Overview.Rmd",
"2_Participate_Counts.Rmd",
"3_Impact__Overview.Rmd",
"3_Impact_Achievement_Analysis.Rmd",
"3_Impact_Achievement_Overview.Rmd",
"3_Impact_Growth_Analysis.Rmd",
"3_Impact_Growth_Overview.Rmd",
"4_Summary.Rmd"
),
references = TRUE # Are references/citations used?
)
Besides adding/reordering Rmd files through custom.files, one can request a
subset of files. This will result in a truncated report, allowing section
editing/development. You always need to include setup.Rmd and params.Rmd!
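For example, a truncated custom.files report list for drafting just the participation section might look like this (file names taken from the full list above):

```r
## Truncated report for section development; setup.Rmd and params.Rmd stay first.
report <- list(
  file.order = c(
    "setup.Rmd",
    "params.Rmd",
    "2_Participate__Analysis.Rmd",
    "2_Participate__Overview.Rmd",
    "2_Participate_Counts.Rmd"
  ),
  references = TRUE
)
```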
This report contains several appendices. The first shows detailed visualizations
of academic impact. The second is a detailed description of the data preparation
and analysis (created directly from the R code!). Third is the appendix
you are reading now (also generated directly from R code) regarding how to
create flexible, reproducible reports. The final appendix is simply the R
session information detailing the machinery behind the analysis and reporting.
appendices = list(
A = list(
title = "Academic Impact Overview",
file.order = c(
"params.Rmd",
"setup_impact_overview_appendix.Rmd",
"Appendix_Impact_Intro.Rmd",
"Appendix_Impact_Grade_Level_State.Rmd"
),
references = NULL
),
B = list(
title = "Initial SGP Analysis",
file.order = c(
"params.Rmd",
"setup_sgp_appendix.Rmd",
"Appendix_SGP_Analysis.Rmd"
),
references = NULL
),
C = list(
title = "Impact Report Generation",
file.order = c(
"Appendix_Impact_Report_Generation.html"
),
references = NULL
),
R = c()
)
Combine report meta-data and generate .Rmd parents
The following script will merge the report.config (universal) and custom.config lists and return ‘report.config’.
custom.config <- list(client.info = client.info, top.level = top.level, params = params)
source(file.path(universal.content.path, "Meta_Data", "Report_Configs.R"))
rm(params) # remove this to prevent conflicts with `render` later.
Now merge the rmd.files (universal) and custom.files lists and return ‘rmd.files’
to be used in next steps. The custom.files will override defaults if they
exist in “assets/rmd/Custom_Content”.
custom.files <- list(report = report, appendices = appendices)
source(file.path(universal.content.path, "Meta_Data", "Report_Content.R"))
With the combined meta-data, we can now create the .yml and .Rmd “master/parent”
documents for a nciea_report and/or bookdown site. These scripts can also
be used as “skeletons” for other reports (such as working drafts, alternate
output formats/templates, etc.).
createReportScripts(report_config=report.config, rmd_file_list=rmd.files)
## Save report YAML and file configurations
save(list=c("report.config", "rmd.files"), file = "Report_Configuration_MetaData.rda")
# setwd("..")
Reporting Step 5: Produce report and appendices
Now the fun begins! In this step we (finally!) generate reports using the results from the analyses we ran and the information we gathered and organized in the four previous steps.
This step assumes the user is operating from different working directories
depending on the report we are generating. At this point in the demonstration,
we can consider this step a “choose your own adventure” story. On the one hand,
we can start at “NCME_2022_Project/All_States/State_A/Documentation” if all
the prior steps have been run together already and simply (re)load the necessary
data and results objects saved from steps 2 and 3 before moving on
to the “Initial_Data_Analysis” report generation. On the other hand, the code
is set up so that we could just start here without having run anything yet.
That is, the R code and Rmd files are set up to run the analysis code and
generate reports automatically afterwards.
We begin by creating appendix B, which pertains to the initial SGP analyses.
With those data and results available in the R workspace, we proceed with
the report analyses and reporting (generated simultaneously by spinning
and evaluating steps 2 through 4; steps 1 and 5 can/should be done separately).
Finally we create an academic impact overview appendix and then combine all
the main report child .Rmd files into the final report(s).
# setwd("./Documentation")
## Load `Literasee` package
require(Literasee)
## Locate the "Universal_Content" directory
universal.content.path <- file.path("..", "..", "..", "Universal_Content")
closet <- "../../../Universal_Content/rmarkdown/closet/"
## Load existing `Report_Data` and `Report_Analyses` objects from steps 2 and 3.
## Or... Skip the data loading and proceed to the next step to re-run while reporting.
# if (!exists("Report_Data")) load("../Data/Report_Data.Rdata")
# if (!exists("Report_Analyses")) load("../Data/Report_Analyses.Rdata")
## Create directories if needed for results
if(!dir.exists("../Documentation/report"))
dir.create("../Documentation/report", recursive = TRUE)
Initial data analysis and reporting
One intent of this demonstration has been to show how reporting can be automated,
and one way this can be done is to combine the (final) data analysis and reporting
into a single step. The R code in the “Initial_Data_Analysis” directory
has been set up so that a report can be generated from it directly (here to be
used as an appendix to the final report). See this link
for details on how to spin your goat hair R code into a knit’d report.
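As a minimal sketch of that workflow, rmarkdown::render() will accept an .R script directly and apply knitr's spin conventions to its roxygen-style comments (the output file name here is just an example):

```r
## Render an analysis script straight into an HTML report via knitr::spin.
# rmarkdown::render("State_A_Baseline_SGP_Analyses.R",
#                   output_format = "html_document",
#                   output_file   = "Appendix_SGP_Analysis_draft.html")
```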
We will first construct a parent .Rmd file from pieces provided in the closet
full of skeletons. In order to evaluate the code while generating a report,
it seems to be necessary to work from the “Initial_Data_Analysis” directory. The
documentation itself is saved where it can be accessed easily in the final
report generation.
We will first use a simple, clean template called “working paper”.
setwd("../Initial_Data_Analysis/")
writeLines(c("---", readLines(paste0(closet, "skeleton_sgp_ida_wp.yml")),
"---", readLines(paste0(closet, "skeleton_sgp_ida.Rmd"))),
"tmp_file.Rmd")
rmarkdown::render(input="tmp_file.Rmd",
output_file="../Documentation/report/Appendix_SGP_Analysis_WP.html")
frm.tf <- file.remove("tmp_file.Rmd")
Render again in nciea_report theme
In the final part of step 4, we generated five .Rmd scripts for generating
reports with the nciea_report template from the Literasee package we
maintain on GitHub. These are all renderable from the “Documentation”
directory, with the exception of the initial SGP analysis appendix. Again,
since it is rendering R code directly and not just being rendered as text
(i.e. eval=TRUE is assumed in the code as it is interpreted), we need to run
the analyses and reporting from the “Initial_Data_Analysis” directory. Doing
this breaks many of the relative paths in the automatically generated script.
Just as we did for the draft version above, we will first construct an Rmd parent and then render it.
writeLines(c("---", readLines(paste0(closet, "skeleton_sgp_ida_nciea.yml")),
"---", readLines(paste0(closet, "skeleton_sgp_ida.Rmd"))),
"tmp_file.Rmd")
rmarkdown::render(input="tmp_file.Rmd",
output_file="../Documentation/report/Appendix_SGP_Analysis.html")
frm.tf <- file.remove("tmp_file.Rmd")
Academic impact analyses (and reporting)
Now that we have the initial data analyses (re)run, we can do another round of simultaneous analysis and report generation. This time we will create the “C” appendix, which is probably what you are reading right now. This again requires us to render the report from a specific directory - this time the “Documentation” directory (which is where we assumed all of the report data setup, analyses, etc. were to be conducted).
In the “Appendix_Impact_Report_Generation.Rmd”, the yaml has been written to
create reports in both the “working paper” and “nciea report” themes. We will
run them concurrently and generate both formats from the same call to render.
setwd("../Documentation/")
## Copy Rmd to the "Documentation" directory to match paths in R code.
fcp.tf <- file.copy(
"assets/rmd/Custom_Content/Appendix_Impact_Report_Generation.Rmd", ".")
rmarkdown::render(
input = "Appendix_Impact_Report_Generation.Rmd",
output_file = c("AIRG_WP.html", "Appendix_Impact_Report_Generation.html"),
output_format = "all", output_dir = "report")
frm.tf <- file.remove("Appendix_Impact_Report_Generation.Rmd")
First draft of final report
We have delayed satisfaction long enough! We now compile our first draft of the final report along with the appendices we just generated. For this we will continue to use the “working paper” template.
We will use a similar process to the one above to assemble a parent document
for the draft output. The draft is then rendered and saved in the “report”
directory and converted to a pdf there using pagedown::chrome_print.
writeLines(c("---", readLines(paste0(closet, "skeleton_main_wp.yml")),
readLines(paste0(closet, "skeleton_main_params.yml")),
"---", readLines(paste0(closet, "skeleton_main_alt.Rmd"))),
"report/tmp_file.Rmd")
rmarkdown::render(input="report/tmp_file.Rmd",
output_file="State_A_Academic_Impact_Analysis_WP.html")
frm.tf <- file.remove("report/tmp_file.Rmd")
pagedown::chrome_print("report/State_A_Academic_Impact_Analysis_WP.html")
Alternate pagedown templates
There are a handful of templates within the pagedown package one can use to
format reports, and the pagedreport package offers three packaged report
templates that one can use. We can try one out here with our skeletons.
writeLines(c("---", readLines(paste0(closet, "skeleton_main_pgdrpt.yml")),
readLines(paste0(closet, "skeleton_main_params.yml")),
"---", readLines(paste0(closet, "skeleton_main_alt.Rmd"))),
"report/tmp_file.Rmd")
rmarkdown::render(input="report/tmp_file.Rmd",
output_file="State_A_Academic_Impact_Analysis_PgdRpt.html")
frm.tf <- file.remove("report/tmp_file.Rmd")
pagedown::chrome_print("report/State_A_Academic_Impact_Analysis_PgdRpt.html")
This report looks pretty nice right out-of-the-box. There is some formatting
that would need to be cleaned up, and possibly some customization that could
be done without too much effort. One thing to point out is that appendices
would all need to be rendered separately or included as additional sections
at the end of the paper. A common complaint (or at least a frequently requested
feature) is that users cannot add prefixes to page, figure or table numbers
(e.g., “Figure A1.”, “Table B-2”, etc.). These issues have been worked out in the
nciea template, and improvements/workarounds are underway in the “working paper”.
Academic impact overview appendix
We will now turn to an appendix with detailed plots of the academic impact
in State A. These plots contain quite a bit of information and were formatted
originally to be 11”x17” pdfs. The code for this appendix takes the R graphical
objects and converts them to svg images (pdf and html do not play well together)
before adding them to the report in a portrait layout. Once rendered and printed
to a preliminary PDF report, those pages are then rotated 90 degrees to a
landscape layout using the qpdf package.
Note that for this appendix we can use the automatically generated Rmd script created in step 4 without modification.
source("../../../Universal_Content/Functions/covidImpactSubVisualization.R")
rmarkdown::render("report/Academic_Impact_Overview_APPENDIX_A.Rmd")
pagedown::chrome_print("report/Academic_Impact_Overview_APPENDIX_A.html",
output = "report/Temp_ApdxA.pdf")
# Need to remove - seems to mess up attempts to render the `bookdown` site ...
unlink(file.path("report", "_bookdown.yml"))
Adjust academic impact plots produced internally (from GROBS)
## Manually locate the pages to rotate
rpt.list <- list(
FRONT = 1:3, # frontmatter - title page, TOC, any intro, etc.
ELA = 4, # Any ELA specific section (text, tables, etc.)
MATHEMATICS = 11, # Any math specific section (text, tables, etc.)
BACK = NULL # backmatter - discussion, conclusions, etc.
)
base.pdf <- "report/Temp_ApdxA.pdf"
## Rotate landscape pages.
all.pages <- seq(qpdf::pdf_length(base.pdf))
pages.to.rotate <- setdiff(all.pages, unlist(rpt.list, use.names = FALSE))
qpdf::pdf_rotate_pages(
input = base.pdf,
output = "report/Academic_Impact_Analyses_APPENDIX_A.pdf", # Must rename
pages = pages.to.rotate, angle = 90
)
frm.tf <- file.remove(base.pdf)
Final report draft
We now have all the pieces in place to run the final report. We do not need
to modify the parent script that createReportScripts produced. This is now the
easy part! We then add in the final appendix, which displays the R session information
(the user’s system specifications, package versions, etc.).
rmarkdown::render("report/State_A_Academic_Impact_Analysis.Rmd")
here <- getwd()
rmarkdown::render(
"../../../Universal_Content/rmarkdown/Child_RMD/APPENDIX_R_NCIEA.Rmd",
output_file = paste0(here, "/report/Appendix_Academic_Impact_R.html"))
Produce PDF reports for the remaining files
pagedown::chrome_print("report/State_A_Academic_Impact_Analysis.html")
# pagedown::chrome_print("report/Appendix_A.html") # Done above with rotation
pagedown::chrome_print("report/Appendix_SGP_Analysis.html")
pagedown::chrome_print("report/Appendix_Impact_Report_Generation.html")
pagedown::chrome_print("report/Appendix_Academic_Impact_R.html")
Creating a bookdown website
It is also possible to create a bookdown website using the same Rmd child
documents as those used in the pagedown based reports. As part of the
createReportScripts function run in 4_Make_Configs.R, the required yaml
and index.Rmd files were created. Given the complexity of some of the appendix
generation, it is advised that these scripts be modified to remove those from
the _bookdown.yml file first. It may also be necessary to copy some assets
(such as figures, plots, css, etc.) from “Universal_Content” to the state
“Documentation” directory.
Begin by copying the PDF reports to the “site” directory for download links and then rendering the website.
# if (!dir.exists(file.path("site", "downloads")))
# dir.create(file.path("site", "downloads"), recursive = TRUE)
#
# file.copy(c(file.path("report", "State_A_Academic_Impact_Analysis.pdf"),
# file.path("report", "Academic_Impact_Analyses_APPENDIX_A.pdf"),
# file.path("report", "Appendix_SGP_Analysis.pdf"),
# file.path("report", "Appendix_Impact_Report_Generation.pdf"),
# file.path("report", "Appendix_Academic_Impact_R.pdf")),
# file.path("site", "downloads"), overwrite = TRUE)
#
# bookdown::render_book(".", "bookdown::gitbook") # delete all appendices from yaml
#
# # Serve the site directory on a local host to see the results:
# servr::httw(dir = "site", watch = "site", port=4224)
# servr::daemon_stop()
Bonus Report: The paper from this session
The paper that accompanies this demonstration is built from the README.md files sprinkled throughout the repository. Here’s how I put it together:
## Everything is meant to be run from the "report" directory...
R.utils::copyDirectory(from = "assets/js", to = "../assets/js")
R.utils::copyDirectory(from = "assets/images", to = "../assets/images")
rmarkdown::render("Flexible_Report_Generation.Rmd")
pagedown::chrome_print("Flexible_Report_Generation.html",
output = "report/Flexible_Report_Generation.pdf")
# setwd("..")
The End
Hope that gets you started! There is a lot to dig into and we will do our best to help you out and answer any questions.