Flexible Report Generation and Dissemination with GitHub

.title[
# Flexible Report Generation and Dissemination with GitHub
]
.subtitle[
## Using GitHub for Open-Source Analytics, Reporting, and Dissemination of Research
]
.author[
### Adam VanIwaarden
]
.institute[
### Center for Assessment
]
.date[
### April 23rd, 2022<br>NCME 2022 Annual Meeting<br>Demonstration Session
<a href="https://github.com/CenterForAssessment/NCME_2022_Demonstration/blob/main/Presentations/Report_Generation_and_Dissemination_Using_GitHub_and_RMarkdown.Rmd" target="_blank" class="github-corner" aria-label="View source on Github"><svg viewBox="0 0 250 250" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path class="octo-arm" d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;"></path><path class="octo-body" d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor"></path></svg></a>
]

---

<div class="footer" style="left: 5%;"><img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg" style = "max-width: 40%; opacity: 0.5;"></img></div>
<div class="footer" style="left: 7.5%;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg" style = "max-width: 40%; opacity: 0.5;"></img></div>
<div class="footer" style="left: 40%;"><a class="footer">Flexible Report Generation</a></div>

---

#  Flexible Report Generation

---

#  Why be flexible?

If a reporting workflow isn't broke... why change it?

* Upfront costs of time and effort:
  - HURTS!
  - But will pay dividends in streamlined, efficient workflow.

---

#  What is flexibility?

* Ability to use the same process and code:
  - annual reports
  - across multiple clients/projects
  - simulation iterations
  - multiple formats and audiences
  - seamless/simultaneous analysis and reporting

???

Vertical (over time) and horizontal component to stretch your code in different
directions.

---

#  Demonstration, please!

* I took this demonstration as an opportunity to:

--
  - show how we used flexibility while investigating COVID impact

--
  - revisiting and improving that workflow

* Created a [GitHub repo](https://github.com/CenterForAssessment/NCME_2022_Project) that consolidates our efforts
  - typical/annual analyses
  - new analytic approaches
  - reporting

If a reporting workflow isn't broke... why change it?

???

What started out as reason for choosing this path for became an existential question.
It was a painful experience and its still a work in progress, but I'm growing and getting better/stronger/wiser...

---
class: highlight-last-item

#  Demonstration, please!

The [NCME_2022_Project repo](https://github.com/CenterForAssessment/NCME_2022_Project)
has two main components.

* Typical (for us) data analysis

--
  - In part because we need results to report on...

--
  - More importantly, simultaneous analysis and reporting
    + atypical (for us)

--
* Complete workflow of (post-initial) data analysis and reporting

--
  - 5 steps to flexible, extensible reporting.

--
* <hr />

---
class: inverse, center, middle
background-image: url(libs/cfa_assets/img/CFA_Logo_alpha_07.svg)
background-size: contain

# Key Concepts

---

#  Key Concepts

Finding balance between automation & flexibility

* Universal vs Custom Content/Code
  - everything starts off as custom content (too flexible)
  - repeating patterns and code become universal
  - what may seem unique can be applicable in other contexts

???

Both automation and flexibility require identifying what content is universal and
what must be customized to meet specific circumstances and situations.

By "universal", we mean elements that can (and often should) be used in all cases
  - updates or improvements are applied consistently.

Custom = right tool for the right job.

---
class: highlight-last-item

#  Universal vs Custom Content

--
* Make structure and format universal from the beginning
  - data & object naming conventions
  - data structures (tables and lists)
  - file organization

--
* Make your code (`R` and `.Rmd`) flexible from the beginning
  - allow for the idiosyncratic to be universal

--
* ***parameterization***
  - Applies to data, text and code.
  - combines both automation and flexibility
    - universal elements with specific application
--
* <hr />

---

#  Universal & Custom Content

* parameterization
  - small bits of information define how results are rendered
  - as simple as `TRUE`/`FALSE` conditional execution
    - inclusion/exclusion
    - evaluation
    - text bits and chunks
    - whole sections

---

#  R Markdown

This is NOT a demonstration for beginners.

--
* Assumptions
  - familiar and comfortable with data wrangling and analysis in `R`
  - some experience rendering (`knit`ing) documents based on R markdown
  - I don't care if you use Rstudio, but I don't (usually) and the code
    here is meant to be run from the console and/or `source()`d in.
    + But I do give a shortcut in running all the code expeditiously ;-)
--
* If you are a beginner, [start reading](https://bookdown.org/yihui/rmarkdown-cookbook/)
  and practicing, and then circle back to these resources. There are many tutorials
  and examples to get started with online.

---
class: highlight-last-item

#  R Markdown

--
* Rendering packages and output formats

--
  - This demonstration deals almost exclusively with reports generated with
    [`pagedown` package](https://github.com/rstudio/pagedown) templates and output.

--
      - Developed by Yihui Xi and Romain Lesur (mainly), based on [`paged.js`](https://pagedjs.org/documentation/),
        to "paginate the HTML output of R Markdown with CSS for print". I.e., a
        sexier alternative to `LaTeX`.

--
      - This means we end up with a printable, self-contained HTML file and a
        PDF version of our reports.

--
* <hr />

---
class: highlight-last-item

#  R Markdown

- We deal (abstractly and briefly) with creating a `bookdown` website.
--

- Any output format is possible with the base content/scripts we provide/create.
    It (usually) just requires new YAML frontmatter or provided arguments to `rmarkdown::render`.
--

- some big exceptions for:
    * internal environments and working directories
    * asset location expectations

--
  - <hr />

---
class: highlight-last-item

#  R Markdown

Back to...

* parameterization

--
  - `params` is a special object in `rmarkdown`
  - can be a named list/object that is available before you render
  - supplied in the YAML
  - used to provide bits of data, text, logic, etc.
--

* <hr />

---
class: highlight-last-item

#  R Markdown

And on to...

* [parent and child Rmd documents](https://bookdown.org/yihui/rmarkdown-cookbook/child-document.html)

--
  - creates a more manageable, "master" document
    + but the kids *can* be unwieldy at first
  - division of labor
  - build (and refine) in small pieces
--

* <hr />

---
class: inverse, center, middle
background-image: url(libs/cfa_assets/img/CFA_Logo_alpha_07.svg)
background-size: contain

#  (My) Generation

---

# Git a copy for free

You can either fork or download a copy of the [repo from GitHub,](https://github.com/CenterForAssessment/NCME_2022_Project)
or do a direct download from `R`:

```r
utils::download.file(
  url="https://github.com/CenterForAssessment/NCME_2022_Project/archive/refs/heads/main.zip",
  destfile = "NCME_2022_Project.zip", method="auto")
```

---

##  Workflow of the report generation repository

There are 3 subdirectories at the head of the repo. There are "README.md" files
in the most of these and some of their subdirectories.

ALL of the action happens in "All_States". The "Universal_Content" and "Custom_Content"
directories contain shared resources.

I will still add more "State_X" examples, but for now there is only "State_A".
- apologies for the convoluted structure, but trust me, it is necessary.

The repo is in a "blank state" - there are no pre-generated reports. You will have
to create those yourself! Or look at the "Documentation" menu of the
[NCME_2022_Demonstration website.](https://centerforassessment.github.io/NCME_2022_Demonstration/index.html)

---
class: highlight-last-item

#  State A

This represents the various stages of our analysis and reporting efforts, typically 
located in different areas of our work environments.

--
* **Initial_Data_Analysis** is our typical annual data analysis
  - clean and prepare data
  - calculate Student Growth Percentiles (SGPs)
    - confidential student data (housed securely)
    - `R` scripts (shared openly on Github)

--
* <hr />

---

#  State A

* **Report_Analyses** we investigate academic impact.
  - Typically in the "Documentation" directory/repo, but placed here for emphasis/differentiation.

* **Documentation**
  - R Markdown based code, content and assets for generating reports
  - Typically a separate Github repo (with final reports)
  - ***This is the heart of "Flexible Report Generation"***
--
* <hr />

???

This component contains the "state" specific data analysis and report generation
content. Although this project has been framed as a workflow across multiple states,
this could be envisioned in other ways where many projects resemble each other,
but separation is required: school level analysis and reporting, different branches
of a simulation study, annual analysis/reporting within a single organization,
technical reports, etc.

Eventually there will be a 4th directory
* **Data** - student data and results from impact analyses.
  - housed outside of any Github repo

---
class: highlight-last-item

#  State A analysis and reporting workflow

Option 1) - Go through the directories in the order presented above

Initial_Data_Analysis:

* Run the `R` scripts like you normally would.
  - start with the data prep ("State_A_Data_LONG.R")
  - move on to "State_A_Baseline_SGP_Analyses.R"

--
* <hr />

---
class: highlight-last-item

#  State A analysis and reporting workflow

The code is *heavily* commented in a format that you might be unfamiliar with (`Roxygen`).

* Allows you to document your work as you go
  - Turn `R` code into working paper

--
    + Can be executed/sourced like normal code
    + or [`spin`](https://bookdown.org/yihui/rmarkdown-cookbook/spin.html) it into a parent/child
--
  - These scripts eventually get used directly in the report generation process
    as [child-like documents](https://yihui.org/en/2021/06/spin-child/). The data
    is analyzed and documented (in and appendix) simultaneously.
--
* <hr />

---
class: highlight-last-item

#  State A analysis and reporting workflow

--
* Report Analyses:

--
  - Gotcha!
  - These will be run as part of...

--
* Documentation:

- 5 steps through data analysis and report generation.

--
      + 5 scripts to guide you through
      + Report analyses are done as part of step 3

--
* <hr />

---
class: highlight-last-item

##  State A reporting in 5 steps

* `1_Repo_Setup_and_Maintenance.R`
  - install/update `R` packages required for all steps
  - copy assets (from Universal_Content and `Literasee` package)

* `2_Report_Data.R`
  - formatting, cleaning and subsetting data sources
  - "universal" data format, naming conventions, object name, etc.

* `3_Report_Analyses.R`
  - report-specific analyses
  - may be time consuming and/or require evaluation before reporting

* <hr />

---
class: highlight-last-item

##  State A reporting in 5 steps

* `4_Make_Configs.R`
  - create custom configuration lists that identify

--
      1. state specific meta-data and report parameters and
      2. custom Rmarkdown content to be used in place of, or in addition to, the
         universal report content.
--

* `5_Make_Report.R`
  - FINALLY!
  - `R` code to render reports using `pagedown` templates
  - workable scripts to create a`bookdown` website
--

* <hr />

---

#  State A analysis and reporting workflow

Option 2) - Cut to the chase

* The `5_Make_Report.R` script has been layed out in such a way that you can
  (almost) skip right to it and `source()` in that file to run everything.
  - first need to make sure the "Documentation" directory is set up
    and you have the required `R` packages.

* So 5 steps become 2 steps ...

---

#  State A analysis and reporting workflow

Option 2) - Cut to the chase

```r
# Set working directory to "State_A/Documentation"
setwd("NCME_2022_Project/All_States/State_A/Initial_Data_Analysis/")
# Step 1 (if you haven't already)
source("1_Repo_Setup_and_Maintenance.R")
# You might want to restart your R session after step 1.
# Step 5 (because delayed gratification isn't your thing)
source("5_Make_Report.R")
```

You will likely get some warnings about changing working directories and
`knitr::include_graphics` not liking absolute paths, but it should run without
errors.

---

##  Format comparisons

---

##  Format comparisons

---

##  Format comparisons