Do Test Score Gaps Grow Before, During, or Between the School Years? Measurement Artifacts and What We Can Know in Spite of Them

Gaps in achievement for students of lower socio-economic status (SES) and students of color continue to concern educators and the public. One of the more influential studies to examine this issue was the Beginning School Study (BSS), which began in 1982 with students in Baltimore City Public Schools (Alexander and Entwisle, 2003). The authors found that an achievement gap already existed when students entered elementary school. More importantly, they concluded that the gap widened after each summer break, tripling in size by the end of middle school.
A more recent study by von Hippel and Hamrock (2019) offers evidence against the claims of Alexander and Entwisle (2003), suggesting that the growing gap is an artifact of the testing and measurement methods used in the original research. Von Hippel and Hamrock conclude that the scaling method, Thurstone scaling (frequently used in the 1960s and 1970s), is flawed and is responsible for the original findings. Thurstone scaling has since been replaced in research by more effective methods such as item response theory (IRT). When the data were reanalyzed using IRT, the gaps shrank. The new study concludes that gaps are already significant by the time children start school and remain relatively stable until graduation.
The von Hippel and Hamrock research looked at test score gaps across a range of comparisons: between boys and girls; between black, white, and Hispanic children; between children whose mothers have different levels of education; between children in poor and nonpoor families; and between high-poverty and low-poverty schools. The researchers wanted to know whether gaps grow faster during the summer or during the school year. They were unable to answer this question because the results were inconclusive. Von Hippel and Hamrock did find, however, that the total gap growth from kindergarten to eighth grade is substantially smaller than the gap that already exists when children enter school.
Von Hippel and Hamrock highlight two measurement artifacts that skewed Alexander and Entwisle's results: test score scaling and changes of test content. Scaling is a mathematical method that transforms right and wrong answers into a test score. Not all scales produce the same results, and the choice of scale has important implications for whether and when score gaps appear to grow. In addition to concluding that the gap between SES groups tripled between first and eighth grade, Alexander and Entwisle found that the gap grew mainly over summer vacations. Von Hippel and Hamrock found that the BSS used the California Achievement Test (CAT) Form C, a "fixed-form" paper test. In first grade, all BSS children took a test that contained the same fixed, unvarying set of questions in fall and spring. A fixed form makes sense when the goal is to know whether students are meeting learning expectations within a specific grade.
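The dependence of a measured gap on the scale can be sketched with a toy calculation. This is only an illustration under stated assumptions, not the Thurstone scaling of the CAT or the analysis in either study: the item difficulties, the group abilities, and the use of a simple Rasch (one-parameter IRT) response model are all hypothetical. Two groups are given the same ability gain between fall and spring, so the gap on the ability scale is constant, yet the gap in expected number-correct scores on the same fixed form changes.

```python
import math


def p_correct(ability, difficulty):
    """Rasch (one-parameter IRT) probability of answering an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_raw_score(ability, difficulties):
    """Expected number of correct answers on a fixed form."""
    return sum(p_correct(ability, b) for b in difficulties)


# Hypothetical fixed form: 16 items with difficulties from 0.0 to 3.0 logits.
FORM = [0.2 * i for i in range(16)]

# Hypothetical group abilities in logits. Both groups gain exactly 1.0
# between fall and spring, so the ability gap is 1.0 at both time points.
fall = {"low": -1.0, "high": 0.0}
spring = {"low": 0.0, "high": 1.0}


def gaps(abilities):
    """Return (gap on the ability scale, gap in expected raw scores)."""
    ability_gap = abilities["high"] - abilities["low"]
    raw_gap = (expected_raw_score(abilities["high"], FORM)
               - expected_raw_score(abilities["low"], FORM))
    return ability_gap, raw_gap


fall_ability_gap, fall_raw_gap = gaps(fall)
spring_ability_gap, spring_raw_gap = gaps(spring)

print(f"Ability gap: fall {fall_ability_gap:.2f}, spring {spring_ability_gap:.2f}")
print(f"Raw-score gap: fall {fall_raw_gap:.2f}, spring {spring_raw_gap:.2f}")
```

In this sketch the raw-score gap is larger in spring than in fall even though the two groups learned the same amount, because the form is hard relative to the fall abilities and so compresses differences near the bottom of the score range. Whether a gap appears to grow can therefore depend on the scale on which it is measured.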
But Alexander and Entwisle wanted to understand the impact of summer breaks on learning, not learning during a school year. To obtain this information, they compared scores on a first-grade test taken at the end of the school year with scores on a second-grade test given the following fall. Measuring first graders' knowledge with one test in the spring and then switching to the second-grade test in the fall confounds the effect of the summer break with the effect of the change in test. Von Hippel and Hamrock propose that changing the test form may have distorted the results. The BSS was not the only seasonal learning study to use fixed forms that changed after the summer. Using fixed-form tests was a common research practice from the 1960s into the 1990s. The von Hippel and Hamrock study suggests that the summer learning literature was potentially vulnerable to artifacts related to scaling and changes of test form.
Fixed-form tests have since been replaced by adaptive tests, which are less vulnerable to the artifacts that can distort estimates of summer learning. Adaptive tests do not ask the same questions of all students. Instead, they measure ability by adjusting the difficulty of each question based on the student's performance on earlier questions. Hence, adaptive tests are a better tool for gauging the impact of summer on student achievement.
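The idea of adjusting difficulty to the student can be sketched in a few lines. This is a deliberately simplified staircase procedure, not the algorithm of any real adaptive test, which would typically select and score items with an IRT model. The item bank, the step sizes, and the deterministic student (who answers correctly whenever ability exceeds item difficulty) are all assumptions made for illustration.

```python
def run_adaptive_test(true_ability, item_bank, n_items=6, start=0.0, step=1.0):
    """Administer a simplified staircase-style adaptive test.

    Returns the final ability estimate and the list of
    (item difficulty, answered correctly) pairs that were administered.
    """
    estimate = start
    remaining = list(item_bank)
    administered = []
    for _ in range(n_items):
        if not remaining:
            break
        # Choose the unused item whose difficulty is closest to the estimate.
        item = min(remaining, key=lambda b: abs(b - estimate))
        remaining.remove(item)
        # Simplifying assumption: a deterministic student who answers
        # correctly exactly when ability exceeds the item's difficulty.
        correct = true_ability > item
        administered.append((item, correct))
        # Move the estimate up after a correct answer, down after an
        # incorrect one, and halve the step so the estimate settles.
        estimate += step if correct else -step
        step /= 2
    return estimate, administered


# Hypothetical item bank: difficulties from -3.0 to 3.0 in steps of 0.25.
ITEM_BANK = [i * 0.25 for i in range(-12, 13)]

# Two hypothetical students. With start=0.0 and step=1.0, the halving steps
# can only reach estimates strictly between -2.0 and 2.0.
high_estimate, high_items = run_adaptive_test(1.3, ITEM_BANK)
low_estimate, low_items = run_adaptive_test(-1.2, ITEM_BANK)

print("Higher-ability student:", round(high_estimate, 3), [b for b, _ in high_items])
print("Lower-ability student:", round(low_estimate, 3), [b for b, _ in low_items])
```

The two students start on the same item but are then routed to different questions, each clustered near that student's own level. This is why an adaptive test does not require switching every student to a new fixed form after the summer.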
The von Hippel and Hamrock study concludes that gaps grow fastest in early childhood. The authors find no evidence of a gap doubling between first grade and eighth grade, and some disparities even shrink. The summer gap growth does not hold up when the flawed instrument is replaced with adaptive tests scored on IRT ability scales. When summer learning gaps are present, most of them are small and not easily detectable. The conclusion is that gaps emerge mostly in the first five years of life. Resources currently devoted to closing a summer learning gap that does not appear to exist should be redirected toward early childhood education. The study suggests that early remedial instruction for students who are behind their peers when they enter kindergarten is the most efficacious way to improve overall performance.
Citation: von Hippel, P. T., & Hamrock, C. (2019). Do test score gaps grow before, during, or between the school years? Measurement artifacts and what we can know in spite of them. Sociological Science, 6, 43-80.