Introduction

The SGPdata package contains four exemplar data sets for use with student growth percentile (SGP) analyses. One of the data sets, sgpData, specifies data in the WIDE format used with the lower level SGP functions studentGrowthPercentiles and studentGrowthProjections. Two of the data sets, sgpData_LONG and sgptData_LONG, specify data in the LONG format used by higher level functions like abcSGP, prepareSGP, and analyzeSGP. The last data set, sgpData_INSTRUCTOR_NUMBER, is a teacher-student lookup table used to produce teacher level aggregates. The sections that follow discuss these data sets in greater depth.

WIDE data format: sgpData

The data set sgpData is an anonymized panel data set comprising 5 years of annual, vertically scaled assessment data in WIDE format. This exemplar data set models the format for data used with the lower level studentGrowthPercentiles and studentGrowthProjections functions.

The WIDE data format illustrated by sgpData and used by the SGP package can accommodate any number of occurrences but must follow a specific column order. Variable names are irrelevant; what matters is each column's position in the data set:

  • The first column must provide a unique student identifier.
  • The next set of columns must provide the grade level/time associated with the student's assessment occurrences.
  • The next set of columns must provide the numeric scores associated with the student's assessment occurrences.

In sgpData, the first column, ID, provides the unique student identifier. The next 5 columns, GRADE_2013, GRADE_2014, GRADE_2015, GRADE_2016, and GRADE_2017, provide the grade level of the student's assessment score in each of the 5 years. The last 5 columns, SS_2013, SS_2014, SS_2015, SS_2016, and SS_2017, provide the scale scores associated with the student in each of the 5 years. Most students do not have 5 years of test data, so the grades and scores for years without a test are recorded as missing (NA).
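The column-order rule can be illustrated with a small sketch in base R. The data frame toy_wide and its values are hypothetical (they are not part of SGPdata), and the checks are only an informal sanity test, not something the SGP package itself runs.

```r
# Toy WIDE data set (hypothetical values) following the required column order:
# ID first, then the grade columns, then the scale score columns.
toy_wide <- data.frame(
  ID         = c(1001, 1002, 1003),
  GRADE_2016 = c(3, 4, NA),
  GRADE_2017 = c(4, 5, 3),
  SS_2016    = c(412, 455, NA),
  SS_2017    = c(430, 470, 398)
)

# Position, not name, identifies each block of columns.
n_occurrences <- (ncol(toy_wide) - 1) / 2
grade_cols <- 2:(1 + n_occurrences)
score_cols <- (2 + n_occurrences):ncol(toy_wide)

# Informal structural checks before handing the data to an SGP function.
stopifnot(
  !anyDuplicated(toy_wide[[1]]),                 # one row per student
  n_occurrences == round(n_occurrences),         # equal numbers of grade and score columns
  all(sapply(toy_wide[score_cols], is.numeric))  # scores must be numeric
)
```

Student 1003 has no 2016 record, so both the grade and the score for that year are NA, mirroring how sgpData represents years without test data.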

Using WIDE format data like sgpData with the SGP package is, in general, straightforward:

> sgp_g4 <- studentGrowthPercentiles(
        panel.data=sgpData,
        sgp.labels=list(my.year=2015, my.subject="Reading"),
        percentile.cuts=c(1,35,65,99),
        grade.progression=c(3,4))

Please consult the SGP package documentation for more comprehensive guidance on how to use sgpData for SGP calculations.

LONG data format: sgpData_LONG

The data set sgpData_LONG is an anonymized panel data set comprising 5 years of annual, vertically scaled assessment data in LONG format for two content areas (Reading and Mathematics). This exemplar data set models the format for data used with the higher level functions abcSGP, prepareSGP, analyzeSGP, combineSGP, summarizeSGP, visualizeSGP, and outputSGP.

> head(sgpData_LONG)
   VALID_CASE CONTENT_AREA      YEAR      ID LAST_NAME FIRST_NAME GRADE SCALE_SCORE    ACHIEVEMENT_LEVEL       GENDER ETHNICITY FREE_REDUCED_LUNCH_STATUS ELL_STATUS IEP_STATUS GIFTED_AND_TALENTED_PROGRAM_STATUS SCHOOL_NUMBER                  SCHOOL_NAME  EMH_LEVEL DISTRICT_NUMBER                DISTRICT_NAME SCHOOL_ENROLLMENT_STATUS DISTRICT_ENROLLMENT_STATUS STATE_ENROLLMENT_STATUS
1: VALID_CASE  MATHEMATICS 2016_2017 1000372   Daniels      Corey     3         435           Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
2: VALID_CASE  MATHEMATICS 2017_2018 1000372   Daniels      Corey     4         461           Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
3: VALID_CASE  MATHEMATICS 2018_2019 1000372   Daniels      Corey     5         444 Partially Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
4: VALID_CASE      READING 2016_2017 1000372   Daniels      Corey     3         523 Partially Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
5: VALID_CASE      READING 2017_2018 1000372   Daniels      Corey     4         540 Partially Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
6: VALID_CASE      READING 2018_2019 1000372   Daniels      Corey     5         473       Unsatisfactory Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes

We recommend LONG formatted data for use with operational analyses. Managing data in LONG format is simpler than managing data in WIDE format. For example, when updating analyses with another year of data, the new data is appended to the bottom of the existing LONG data set. All higher level functions in the SGP package are designed for use with LONG format data. In addition, these functions often assume the existence of state-specific meta-data embedded in SGPstateData. See the SGP package documentation (https://sgp.io) for more comprehensive guidance on how to use sgpData_LONG for SGP calculations.
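As a minimal sketch of the update step described above, the following base R code appends a new year of records to an existing LONG data set. The two data frames are hypothetical and carry only a few of the columns found in sgpData_LONG; their values are taken from the rows displayed earlier.

```r
# Hypothetical existing LONG data: one row per student, content area, and year.
current_long <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = c("MATHEMATICS", "READING"),
  YEAR         = "2017_2018",
  ID           = "1000372",
  GRADE        = "4",
  SCALE_SCORE  = c(461, 540),
  stringsAsFactors = FALSE
)

# Hypothetical new year of data with the same columns.
new_year <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = c("MATHEMATICS", "READING"),
  YEAR         = "2018_2019",
  ID           = "1000372",
  GRADE        = "5",
  SCALE_SCORE  = c(444, 473),
  stringsAsFactors = FALSE
)

# Updating the LONG data set is a row append; no columns are added.
updated_long <- rbind(current_long, new_year)
```

The same update in WIDE format would require adding a new grade column and a new score column and then re-matching every student, which is the main reason the LONG format is easier to maintain.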

There are 7 required variables when using LONG data with SGP analyses: VALID_CASE, CONTENT_AREA, YEAR, ID, SCALE_SCORE, GRADE, and ACHIEVEMENT_LEVEL (only required if running student growth projections). LAST_NAME and FIRST_NAME are required if creating individual level student growth and achievement plots. All other variables are demographic/student categorization variables used by the summarizeSGP function to create student aggregates.
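A quick way to confirm that a LONG data set carries the required variables is a name check like the sketch below. The helper missing_sgp_vars and the data frame toy_long are hypothetical illustrations and are not part of the SGP or SGPdata packages.

```r
# The 7 variables named above.
required_vars <- c("VALID_CASE", "CONTENT_AREA", "YEAR", "ID",
                   "SCALE_SCORE", "GRADE", "ACHIEVEMENT_LEVEL")

# Hypothetical helper: returns the required variables absent from long_data.
# ACHIEVEMENT_LEVEL is only demanded when projections will be run.
missing_sgp_vars <- function(long_data, projections = TRUE) {
  needed <- if (projections) {
    required_vars
  } else {
    setdiff(required_vars, "ACHIEVEMENT_LEVEL")
  }
  setdiff(needed, names(long_data))
}

# Toy LONG data (hypothetical) that lacks ACHIEVEMENT_LEVEL.
toy_long <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = "MATHEMATICS",
  YEAR         = "2018_2019",
  ID           = "1000372",
  GRADE        = "5",
  SCALE_SCORE  = 444,
  stringsAsFactors = FALSE
)

missing_sgp_vars(toy_long)                       # "ACHIEVEMENT_LEVEL"
missing_sgp_vars(toy_long, projections = FALSE)  # character(0)
```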

The sgpData_LONG data set contains data for 5 years across 2 content areas (Reading and Mathematics).

LONG teacher-student lookup: sgpData_INSTRUCTOR_NUMBER

The data set sgpData_INSTRUCTOR_NUMBER is an anonymized student-instructor lookup table that provides instructor information associated with each student's test record. Note that just as each teacher can (and will) have more than 1 student associated with them, a student can have more than one teacher associated with their test record. That is, multiple teachers could be assigned to the student in a single content area for a given year.
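The many-to-many relationship can be sketched with a base R merge. Both data frames below are hypothetical, and the lookup column names (in particular INSTRUCTOR_NUMBER) and instructor codes are assumptions made for illustration; consult the actual data set for its variable names. This is not how the SGP package itself builds teacher level aggregates.

```r
# Hypothetical student test records in LONG format.
scores <- data.frame(
  ID           = "1000372",
  CONTENT_AREA = c("MATHEMATICS", "READING"),
  YEAR         = "2018_2019",
  SCALE_SCORE  = c(444, 473),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: the student has two Mathematics instructors
# and one Reading instructor in the same year.
lookup <- data.frame(
  ID                = "1000372",
  CONTENT_AREA      = c("MATHEMATICS", "MATHEMATICS", "READING"),
  YEAR              = "2018_2019",
  INSTRUCTOR_NUMBER = c("T01", "T02", "T03"),
  stringsAsFactors = FALSE
)

# Joining on student, content area, and year repeats a test record once
# for every instructor linked to it.
linked <- merge(scores, lookup, by = c("ID", "CONTENT_AREA", "YEAR"))

# A simple instructor level aggregate over the linked records.
teacher_means <- aggregate(SCALE_SCORE ~ INSTRUCTOR_NUMBER,
                           data = linked, FUN = mean)
```

Because the Mathematics record is linked to two instructors, it appears twice in linked and contributes to the aggregate of each instructor.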

Contributions & Requests

If you have a contribution or feature request for the SGPdata package, don’t hesitate to write or open an issue on GitHub.