
NAEP Technical Documentation

Proportion of Variance Accounted for by Principal Components Used in NAEP Population-Structure Models

The population-structure models employed for specific national, state, and combined national and state assessment samples did not directly use the group variable specifications. As in other statistical analyses with a large number of correlated variables, a principal component transformation was applied to the variable contrasts derived from these specifications. The principal components, rather than the original variable contrasts, were used in the analyses so that the estimation procedures would be computationally stable.
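
The transformation can be illustrated with a minimal sketch in Python with NumPy. It assumes the contrasts are stored as a students-by-contrasts array and that the correlation matrix is used (the procedure used before 2022). The function name and the simulated data are hypothetical, and this is not NAEP's production code. Because the component scores are mutually uncorrelated, they avoid the near-collinearity among the original contrasts that can make estimation unstable.

```python
import numpy as np


def principal_components_from_correlation(X):
    """PCA of the correlation matrix of group-variable contrasts (hypothetical helper).

    X is an (n_students, n_contrasts) array, e.g. 0/1 dummy codes.
    Returns eigenvalues (largest first), eigenvectors, and component scores.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each contrast
    R = np.corrcoef(X, rowvar=False)                  # correlation matrix of contrasts
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]                 # reorder largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                              # mutually uncorrelated component scores
    return eigvals, eigvecs, scores


# Illustrative data only: six correlated 0/1 contrasts driven by one latent trait.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = (latent + rng.normal(size=(500, 6)) > 0).astype(float)
eigvals, eigvecs, scores = principal_components_from_correlation(X)
print(np.round(eigvals / eigvals.sum(), 3))  # share of total variance per component
```

The eigenvalues of a correlation matrix sum to the number of contrasts, so each printed share is that component's fraction of the total standardized variance.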

To avoid overfitting the model, a large number, but not all, of the principal components derived from this transformation were used as the variables in estimating the population-structure models. Prior to 2022, the correlation matrix was used in the principal component transformation. Beginning in 2022, five main reporting variables (school-reported gender, IEP status, LEP status, National School Lunch Program eligibility, and school-reported race/ethnicity) were included as direct covariates in the population-structure model. (The new procedure was implemented for all 2022 assessments except the age 9 long-term trend assessments.) The remaining variables were regressed on the five main reporting variables, and a principal component analysis was applied to the residual covariance matrix of this regression. This change was made to ensure that the main reporting variables are fully accounted for in the population-structure models, and to improve parameter estimation by using a covariance matrix. For national assessments, the proportions of variance of the variable contrasts accounted for by the principal components are given for each grade level.
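
The 2022 change can be sketched as follows, again in Python with NumPy and under stated assumptions: the main reporting variables are dummy-coded into a matrix `D` (three hypothetical dummies stand in for the five variables here), each remaining contrast is regressed on `D` by ordinary least squares with an intercept, and the principal components are taken from the covariance matrix of the residuals. The function name, the data, and the choice of three retained components are illustrative only. Because the residuals are orthogonal to `D`, the resulting components do not duplicate information already carried by the directly included reporting variables.

```python
import numpy as np


def residual_principal_components(D, X):
    """Regress the remaining contrasts X on the main reporting dummies D,
    then apply PCA to the covariance matrix of the residuals (hypothetical helper).

    D: (n, p) dummy-coded main reporting variables (kept as direct covariates).
    X: (n, q) remaining group-variable contrasts.
    """
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    D1 = np.column_stack([np.ones(len(D)), D])        # add an intercept column
    beta, *_ = np.linalg.lstsq(D1, X, rcond=None)     # OLS fit, one column per contrast
    E = X - D1 @ beta                                 # residuals: mean zero, orthogonal to D
    S = np.cov(E, rowvar=False)                       # residual covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = E @ eigvecs                              # components of the residual variation
    return eigvals, eigvecs, scores


# Illustrative data only: 3 hypothetical reporting dummies, 5 other contrasts.
rng = np.random.default_rng(1)
n = 400
D = rng.integers(0, 2, size=(n, 3)).astype(float)
X = D @ rng.normal(size=(3, 5)) + rng.normal(size=(n, 5))
eigvals, eigvecs, scores = residual_principal_components(D, X)

# Covariates for the model: the reporting dummies directly, plus leading components.
design = np.column_stack([D, scores[:, :3]])
```

Keeping `D` as separate columns of `design` is what makes the reporting variables "fully accounted for": no truncation of components can discard their variation.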

For the tables linked from this page, beginning with the 2002 assessment year, the following information is provided for each type of contrast:

  • the number of contrasts;
  • the mean proportion of variance explained;
  • the minimum proportion of variance explained;
  • the maximum proportion of variance explained; and
  • the number of contrasts by proportion of variance explained.

For assessments that used the covariance matrix in the principal component transformation, the minimum and maximum proportions of variance explained are also provided for the contrasts that met the minimum sample size requirements for reporting.
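
The summary statistics listed above can be computed from a vector of per-contrast proportions. This sketch (Python with NumPy) assumes one proportion and one student count per contrast; the bin edges, the `min_n` threshold, and every number in the example are hypothetical placeholders, not NAEP values or published results.

```python
import numpy as np

# Hypothetical bin edges for "number of contrasts by proportion of variance
# explained"; the published tables may use different categories.
BIN_EDGES = [0.0, 0.5, 0.7, 0.8, 0.9, 0.95, 1.0]


def summarize_contrast_type(prop, n_students, min_n):
    """Summarize proportions of variance explained for one type of contrast.

    prop: proportion of variance explained for each contrast (0 to 1).
    n_students: sample size behind each contrast.
    min_n: hypothetical minimum sample size required for reporting.
    """
    prop = np.asarray(prop, dtype=float)
    n_students = np.asarray(n_students)
    counts, _ = np.histogram(prop, bins=BIN_EDGES)   # last bin includes 1.0
    summary = {
        "n_contrasts": int(prop.size),
        "mean": float(prop.mean()),
        "min": float(prop.min()),
        "max": float(prop.max()),
        "counts_by_bin": counts.tolist(),
    }
    reportable = prop[n_students >= min_n]           # contrasts meeting the threshold
    if reportable.size:
        summary["min_reportable"] = float(reportable.min())
        summary["max_reportable"] = float(reportable.max())
    return summary


# Made-up example values for five contrasts.
s = summarize_contrast_type(
    prop=[0.99, 0.97, 0.88, 1.00, 0.93],
    n_students=[1200, 340, 25, 2100, 80],
    min_n=50,
)
print(s)
```

In this example the contrast with only 25 students is left out of the reportable range, so `min_reportable` is 0.93 while the overall `min` is 0.88.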

The proportion of variance explained in each table indicates how closely the principal components reflect the variables used to define the groups. If the proportion of variance of a group-defining variable contrast accounted for by the principal components is one, all of the variability of that contrast was taken into account by the population-structure model. If all of the principal components were used in the models, all of the proportions would be one. For the national-level conditioning models, the number of principal components was selected so that the principal components accounted for at least 95 percent of the overall variance of the group-defining variable contrasts. For the state-level conditioning models, the threshold was at least 90 percent, to take into account the smaller sample sizes of state samples. Because not all principal components are retained, some proportions are less than one. The high proportions for student groups show that these groups are well described by the population-structure models. Prior to the 2002 and 2003 assessments, the variance explained by the variables was reported for variables with quadratic multi-degree-of-freedom contrasts.
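
The selection rule and the per-contrast proportions can be made concrete with a short sketch. Assuming eigenvalues λ_m (largest first) and unit eigenvectors v_m of the contrast covariance or correlation matrix S, one standard way to compute the share of contrast j's variance reproduced by the first k components is Σ_{m≤k} λ_m v_jm² / S_jj (with S_jj = 1 for a correlation matrix), and the overall share is Σ_{m≤k} λ_m / trace(S). The function names and simulated data are hypothetical; only the 95 and 90 percent targets come from the text.

```python
import numpy as np


def n_components_for(eigvals, target):
    """Smallest k whose k largest eigenvalues reach `target` share of the trace."""
    share = np.cumsum(eigvals) / np.sum(eigvals)     # eigvals sorted largest first
    k = int(np.searchsorted(share, target)) + 1      # first index where share >= target
    return min(k, len(eigvals))                      # guard against rounding at 1.0


def proportion_explained_per_contrast(S, eigvals, eigvecs, k):
    """Share of each contrast's variance reproduced by the first k components."""
    reproduced = (eigvecs[:, :k] ** 2) @ eigvals[:k]  # sum over m<=k of lambda_m * v_jm^2
    return reproduced / np.diag(S)


# Illustrative data only: eight contrasts driven by two latent factors plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(1000, 8))
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest first

k_national = n_components_for(eigvals, 0.95)          # national rule: at least 95%
k_state = n_components_for(eigvals, 0.90)             # state rule: at least 90%
prop = proportion_explained_per_contrast(S, eigvals, eigvecs, k_national)
print(k_national, k_state, np.round(prop, 3))
```

Passing the total number of contrasts as `k` returns a proportion of one for every contrast, matching the statement that all proportions would be one if all components were used.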


Links to tables of the proportion of variance for each subject area assessment's population-structure models, national and combined national and state assessments, by subject area, year, and grade: Various years, 2000–2022
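Subject area | Year | Grade 4 | Grade 8 | Grade 12
Arts | 2016 | † | R3 | †
Arts | 2008 | † | R3 | †
Civics | 2022 | † | R3 | †
Civics | 2018 | † | R3 | †
Civics | 2014 | † | R3 | †
Civics | 2010 | R3 | R3 | R3
Civics | 2006 | R3 | R3 | R3
Economics | 2012 | † | † | R3
Economics | 2006 | † | † | R3
Geography | 2018 | † | R3 | †
Geography | 2014 | † | R3 | †
Geography | 2010 | R3 | R3 | R3
Geography | 2001 | R2/R3 | R2/R3 | R2/R3
Mathematics | 2022 | R3 | R3 | †
Mathematics | 2019 | R3 | R3 | R3
Mathematics | 2017 | R3 | R3 | †
Mathematics | 2015 | R3 | R3 | R3
Mathematics | 2013 | R3 | R3 | R3
Mathematics | 2011 | R3 | R3 | †
Mathematics | 2009 | R3 | R3 | R3
Mathematics | 2007 | R3 | R3 | †
Mathematics | 2005 | R3 | R3 | R3
Mathematics | 2003 | R3 | R3 | †
Mathematics | 2000 | R2/R3 | R2/R3 | R2/R3
Reading | 2022 | R3 | R3 | †
Reading | 2019 | R3 | R3 | R3
Reading | 2017 | R3 | R3 | †
Reading | 2015 | R3 | R3 | R3
Reading | 2013 | R3 | R3 | R3
Reading | 2011 | R3 | R3 | †
Reading | 2009 | R3 | R3 | R3
Reading | 2007 | R3 | R3 | †
Reading | 2005 | R3 | R3 | R3
Reading | 2003 | R3 | R3 | †
Reading | 2002 | R3 | R3 | R3
Reading | 2000 | R2/R3 | † | †
Science | 2019 | R3 | R3 | R3
Science | 2015 | R3 | R3 | R3
Science | 2011 | † | R3 | †
Science | 2009 | R3 | R3 | R3
Science | 2005 | R3 | R3 | R3
Science | 2000 | R2/R3 | R2/R3 | R2/R3
Technology and engineering literacy (TEL) | 2018 | † | R3 | †
Technology and engineering literacy (TEL) | 2014 | † | R3 | †
U.S. history | 2022 | † | R3 | †
U.S. history | 2018 | † | R3 | †
U.S. history | 2014 | † | R3 | †
U.S. history | 2010 | R3 | R3 | R3
U.S. history | 2006 | R3 | R3 | R3
U.S. history | 2001 | R2/R3 | R2/R3 | R2/R3
Vocabulary | 2015 | R3 | R3 | R3
Vocabulary | 2011 | R3 | R3 | †
Vocabulary | 2009 | R3 | R3 | R3
Writing | 2011 | † | R3 | R3
Writing | 2007 | † | R3 | R3
Writing | 2002 | R3 | R3 | R3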
† Not applicable; subject was not assessed at this grade in this year.
NOTE: R2 is the non-accommodated reporting sample; R3 is the accommodated reporting sample. If sampled students are classified as students with disabilities (SD) or English learners (EL), and school officials, using NAEP guidelines, determine that they can meaningfully participate in the NAEP assessment with accommodation, those students are included in the NAEP assessment with accommodation along with other sampled students, including SD/EL students who do not need accommodations. The R3 sample is more inclusive than the R2 sample and excludes a smaller proportion of sampled students. The R3 sample is the only reporting sample used in NAEP after 2001. In NAEP, vocabulary, reading vocabulary, and meaning vocabulary refer to the same reporting scale. Because preliminary analyses of students' writing performance in the 2017 NAEP writing assessments at grades 4 and 8 revealed potentially confounding factors in measuring performance, those results will not be publicly reported.
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), various years, 2000–2022 Assessments.


Links to tables of the proportion of variance for each subject area assessment's population-structure models, long-term trend assessments, by subject area, year, and age: Various years, 2004–2023
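Subject area | Year | Age 9 | Age 13 | Age 17
Mathematics | 2022/2023 | R3 | R3 | †
Reading | 2022/2023 | R3 | R3 | †
Mathematics | 2020 | R3 | R3 | †
Reading | 2020 | R3 | R3 | †
Mathematics | 2012 | R3 | R3 | R3
Reading | 2012 | R3 | R3 | R3
Mathematics | 2008 | R3 | R3 | R3
Reading | 2008 | R3 | R3 | R3
Mathematics | 2004 | R3 | R3 | R3
Reading | 2004 | R3 | R3 | R3
† Not applicable.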
NOTE: R3 is the accommodated reporting sample. If sampled students are classified as students with disabilities (SD) or English learners (EL), and school officials, using NAEP guidelines, determine that they can meaningfully participate in the NAEP assessment with accommodation, those students are included in the NAEP assessment with accommodation along with other sampled students, including SD/EL students who do not need accommodations. The R3 sample is more inclusive than the R2 (non-accommodated) reporting sample and excludes a smaller proportion of sampled students. The R3 sample is the only reporting sample used in NAEP after 2001. In 2020 and 2022/2023, age 17 was not assessed.
SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), various years, 2004–2023 Mathematics and Reading Long-Term Trend Assessments.






Last updated 03 September 2024 (PG)