Two experiments are reported which test the effect of increased three-term contingency trials on students' correct and incorrect math responses. The results warrant further research to test whether or not rates of presentation of three-term contingency trials are predictors of effective instruction.
Albers, A. E., & Greer, R. D. (1991). Is the three-term contingency trial a predictor of effective instruction? Journal of Behavioral Education, 1(3), 337-354.
The authors investigated the hypothesis that treatment acceptability influences teachers' use of a formative evaluation system (curriculum-based measurement) and, relatedly, the amount of gain effected in math for their students.
Allinder, R. M., & Oats, R. G. (1997). Effects of acceptability on teachers' implementation of curriculum-based measurement and student achievement in mathematics computation. Remedial and Special Education, 18(2), 113-120.
The “Standards for Educational and Psychological Testing” were approved as APA policy by the APA Council of Representatives in August 2013.
American Educational Research Association, American Psychological Association, Joint Committee on Standards for Educational, Psychological Testing (US), & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. American Educational Research Association.
This study evaluated the relationship between scores on high-stakes tests and scores on other measures of learning, such as NAEP and SAT scores. In general, there was no increase in student learning as a function of high-stakes testing.
Amrein, A. L., & Berliner, D. C. (2002). High-Stakes Testing, Uncertainty, and Student Learning. Education Policy Analysis Archives.
The purpose of this study is to assess whether academic achievement in fact increases after the introduction of high-stakes tests. The first objective of this study is to assess whether academic achievement has improved since the introduction of high-stakes testing policies in the 27 states with the highest stakes written into their grade 1-8 testing policies.
Amrein-Beardsley, A., & Berliner, D. C. (2002). The Impact of High-Stakes Tests on Student Academic Performance.
This book was designed as an assessment of standardized testing and its alternatives at the secondary school level.
Archbald, D. A., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school.
This meta-analysis of 78 studies aimed to determine the overall effect size for testing at different frequency levels and to identify other study characteristics related to the effectiveness of frequent testing.
Başol, G., & Johanson, G. (2009). Effectiveness of frequent testing over achievement: A meta analysis study. Journal of Human Sciences, 6(2), 99-121.
There is also little or no evidence for the claim that teachers will be more motivated to improve student learning if teachers are evaluated or monetarily rewarded for student test score gains.
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., ... & Shepard, L. A. (2010). Problems with the Use of Student Test Scores to Evaluate Teachers. EPI Briefing Paper #278. Economic Policy Institute.
Standardized tests play a critical role in tracking and comparing K-12 student progress across time, student demographics, and governing bodies (states, cities, districts). One methodology is to benchmark each state’s proficiency standards against those of the National Assessment of Educational Progress (NAEP) test. This study does just that. Using NAEP as a common yardstick allows a comparison of different state assessments. The results confirm the wide variation in proficiency standards across states. They also document that the significant majority of states have standards that are much lower than those established by NAEP.
Bandeira de Mello, V., Rahman, T., and Park, B.J. (2018). Mapping State Proficiency Standards Onto NAEP Scales: Results From the 2015 NAEP Reading and Mathematics Assessments (NCES 2018-159). U.S. Department of Education, Washington, DC: Institute of Education Sciences, National Center for Education Statistics.
Describes the ways in which accountability methods were built into practicum experiences for specialist- and doctoral-level school psychology trainees at the University of Cincinnati.
Barnett, D. W., Daly III, E. J., Hampshire, E. M., Rovak Hines, N., Maples, K. A., Ostrom, J. K., & Van Buren, A. E. (1999). Meeting performance-based training demands: Accountability in an intervention-based practicum. School Psychology Quarterly, 14(4), 357.
A brief history of high-stakes testing is followed by an analysis of eighteen states with severe consequences attached to their testing programs.
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10.
The later effects of the Direct Instruction Follow Through program were assessed at five diverse sites. Low-income fifth and sixth graders who had completed the full 3 years of this first- through third-grade program were tested on the Metropolitan Achievement Test (Intermediate level) and the Wide Range Achievement Test (WRAT).
Becker, W. C., & Gersten, R. (1982). A follow-up of Follow Through: The later effects of the Direct Instruction Model on children in fifth and sixth grades. American Educational Research Journal, 19(1), 75-92.
This paper uses student-level data from a statewide community college system to examine the validity of placement tests and high school information in predicting course grades and college performance.
Belfield, C. R., & Crosta, P. M. (2012). Predicting Success in College: The Importance of Placement Tests and High School Transcripts. CCRC Working Paper No. 42. Community College Research Center, Columbia University.
As 2021 begins, we can’t make assumptions about what students have learned this school year. Education leaders and teachers, of course, have interacted with students and watched them through computer screens for many months — but we won’t truly know what happened and where learning gaps exist without statewide exams.
Bell-Ellwanger, J. (2021, January 5). Analysis: Spring exams are the best shot state leaders have at knowing what’s happening with their students. The 74.
Since the passage of the No Child Left Behind Act (NCLB) in 2002 and its 2015 update, the Every Student Succeeds Act (ESSA), every third through eighth grader in U.S. public schools now takes tests calibrated to state standards, with the aggregate results made public. In a study of the nation’s largest urban school districts, students took an average of 112 standardized tests between pre-K and grade 12.
Berwick, C. (2019). What Does the Research Say About Testing? Marin County, CA: Edutopia.
This article reports on a 4-year longitudinal study of the effects of Literacy Collaborative (LC), a schoolwide reform model that relies primarily on the one-on-one coaching of teachers as a lever for improving student literacy learning.
Biancarosa, G., Bryk, A. S., & Dexter, E. R. (2010). Assessing the value-added effects of literacy collaborative professional development on student learning. The Elementary School Journal, 111(1), 7-34.
Firm evidence shows that formative assessment is an essential component of classroom work and that its development can raise standards of achievement, Mr. Black and Mr. Wiliam point out. Indeed, they know of no other way of raising standards for which such a strong prima facie case can be made.
Black, P., & Wiliam, D. (2010). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 92(1), 81-90.
The Hexagon Discussion and Analysis Tool helps organizations evaluate new and existing programs and practices. This tool is designed to be used by a team to ensure diverse perspectives are represented in a discussion of the six contextual fit and feasibility factors.
Blase, K., Kiser, L. and Van Dyke, M. (2013). The Hexagon Tool: Exploring Context. Chapel Hill, NC: National Implementation Research Network, FPG Child Development Institute, University of North Carolina at Chapel Hill.
Over the objection of the teachers' union, the Board of Education here on Thursday unanimously approved the nation's largest merit pay program, which calls for rewarding teachers based on how well their students perform on standardized tests.
Blumenthal, R. (2006). Houston ties teachers’ pay to test scores. New York Times, 13.
The goal of this paper is to estimate the extent to which there is differential attrition based on teachers' value-added to student achievement.
Boyd, D., Grossman, P., Lankford, H., Loeb, S., & Wyckoff, J. (2008). Who leaves? Teacher attrition and student achievement. Working Paper No. 14022. Cambridge, MA: National Bureau of Economic Research. Retrieved from https://www.nber.org/papers/w14022
This paper examines New York City elementary school teachers’ decisions to stay in the same school, transfer to another school in the district, transfer to another district, or leave teaching in New York state during the first five years of their careers.
Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2005). Explaining the short careers of high-achieving teachers in schools with low-performing students. American Economic Review, 95(2), 166-171.
By estimating the effect of teacher attributes using a value-added model, the analyses in this paper predict that observable qualifications of teachers resulted in average improved achievement for students in the poorest decile of schools of .03 standard deviations.
Boyd, D., Lankford, H., Loeb, S., Rockoff, J., & Wyckoff, J. (2008). The narrowing gap in New York City teacher qualifications and its implications for student achievement in high‐poverty schools. Journal of Policy Analysis and Management: The Journal of the Association for Public Policy Analysis and Management, 27(4), 793-818.
This article is an extended reanalysis of high-stakes testing on achievement. The paper focuses on the performance of states, over the period 1992 to 2000, on the NAEP mathematics assessments for grades 4 and 8.
Braun, H. (2004). Reconsidering the impact of high-stakes testing. Education Policy Analysis Archives, 12(1).
This fourth edition provides in-depth treatments of critical measurement topics, and the chapter authors are acknowledged experts in their respective fields.
Brennan, R. L. (Ed.) (2006). Educational measurement (4th ed.). Westport, CT: Praeger Publishers.
This book divides itself naturally into two parts. The first part has to do with the situation in which Superintendent Brooks found himself, with his successful campaign in educating his teachers to use standardized tests, with the results which he obtained, with the way he used these results to grade his pupils, to rate his teachers, and to evaluate methods of teaching, and finally with the use he made of intelligence tests.
Brooks, S. S. (1905). Improving schools by standardized tests. Houghton Mifflin.
The National Board for Professional Teaching Standards (NBPTS) assesses teaching practice based on videos and essays submitted by teachers. The authors compared the performance of classrooms of elementary students in Los Angeles randomly assigned to NBPTS applicants and to comparison teachers.
Cantrell, S., Fullerton, J., Kane, T. J., & Staiger, D. O. (2008). National board certification and teacher effectiveness: Evidence from a random assignment experiment (No. w14608). National Bureau of Economic Research.
This paper reviews the four main critiques that have been made of international tests, as well as the rationales and education policy analyses accompanying these critiques. The brief also discusses a set of four critiques around the underlying social meaning and educational policy value of international test comparisons. These comparisons indicate how students in various countries score on a particular test, but do they carry a larger meaning? The paper closes with recommendations based on these critiques.
Carnoy, M. (2015). International Test Score Comparisons and Educational Policy: A Review of the Critiques. National Education Policy Center.
This study developed a zero-to-five index of the strength of accountability in 50 states based on the use of high-stakes testing to sanction and reward schools, and analyzed whether that index is related to student gains on the NAEP mathematics test in 1996–2000.
Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305-331.
This paper discusses the search for a “magic metric” in education: an index or number that would be generally accepted as the most efficient descriptor of a school’s performance in a district.
Celio, M. B. (2013). Seeking the Magic Metric: Using Evidence to Identify and Track School System Quality. In Performance Feedback: Using Data to Improve Educator Performance (Vol. 3, pp. 97-118). Oakland, CA: The Wing Institute.
This report provides a practical “management guide” for an evidence-based key indicator data decision system for school districts and schools.
Celio, M. B., & Harvey, J. (2005). Buried Treasure: Developing A Management Guide From Mountains of School Data. Center on Reinventing Public Education.
With Congress moving rapidly to revise the No Child Left Behind Act (NCLB), no issue has proven more contentious than whether the federal government should continue to require that states test all students in math and reading annually in grades three through eight.
Chingos, M. M., & West, M. R. (2015, January 20). Why Annual Statewide Testing Is Critical to Judging School Quality. Brookings Institution, Brown Center Chalkboard Series.
The most recent incarnation of ESEA, signed into law in January of 2002 by President George W. Bush, is the No Child Left Behind Act (NCLB). We’re now 13 years into NCLB, so reauthorization is long overdue. It is not just the long delay that argues for congressional action, but the extent to which the Obama administration has replaced the provisions of the bill with its own set of priorities implemented through Race to the Top and state waivers.
Chingos, M. M., Dynarski, M., Whitehurst, G., & West, M. (2015, January 8). The case for annual testing. Brookings Institution.
In this report, the author aims to provide an accessible introduction to these new measures of teaching quality and put them into the broader context of concerns over school quality and achievement gaps.
Corcoran, S. P. (2010). Can Teachers Be Evaluated by Their Students' Test Scores? Should They Be? The Use of Value-Added Measures of Teacher Effectiveness in Policy and Practice. Education Policy for Action Series. Annenberg Institute for School Reform at Brown University (NJ1).
One flashpoint in the incendiary debate over standardized testing in American public schools is the area of test preparation. The focus of this chapter is test preparation in achievement testing and its purportedly harmful effects on students and teachers.
Crocker, L. (2005). Teaching for the test: How and why test preparation is appropriate. Defending standardized testing, 159-174.
The Proficiency Illusion reveals that the tests that states use to measure academic progress under the No Child Left Behind Act are creating a false impression of success, especially in reading and especially in the early grades.
Cronin, J., Dahlin, M., Adkins, D., & Kingsbury, G. G. (2007). The Proficiency Illusion. Thomas B. Fordham Institute.
Discusses special education as problem solving, the nature of mild handicaps, and person-centered versus situation-centered problems.
Deno, S. L. (1989). Curriculum-based measurement and special education services: A fundamental and direct relationship.
Three concurrent validity studies were conducted to determine the relationship between performances on formative measures of reading and standardized achievement measures of reading.
Deno, S. L., Mirkin, P. K., & Chiang, B. (1982). Identifying valid measures of reading. Exceptional Children, 49(1), 36-47.
On several issues, our analysis teases out nuances in public opinion by asking variations of questions to randomly selected segments of survey participants. We divided respondents at random into two or more segments and asked each group a different version of the same general question.
Education Next. (2019). Program on education policy and governance, survey 2019.
Based on the experiences of the Collaborative for Academic, Social, and Emotional Learning (CASEL) and reviews of literature addressing implementation failures, observations about failures to "scale up" are presented.
Elias, M. J., Zins, J. E., Graczyk, P. A., & Weissberg, R. P. (2003). Implementation, sustainability, and scaling up of social-emotional and academic innovations in public schools. School Psychology Review, 32(3), 303-319.
Curriculum-based measurement and performance assessments can provide valuable data for making special-education eligibility decisions. Reviews applied research on these assessment approaches and discusses the practical context of treatment validation and decisions about instructional services for students with diverse academic needs.
Elliott, S. N., & Fuchs, L. S. (1997). The Utility of Curriculum-Based Measurement and Performance Assessment as Alternatives to Traditional Intelligence and Achievement Tests. School Psychology Review, 26(2), 224-33.
The theory that measuring performance and coupling it to rewards and sanctions will cause schools and the individuals who work in them to perform at higher levels underpins performance based accountability systems. Such systems are now operating in most states and in thousands of districts, and they represent a significant change from traditional approaches to accountability.
Elmore, R. F., & Fuhrman, S. H. (2001). Holding schools accountable: Is it working? Phi Delta Kappan, 83(1), 67-72.
A disproportionate reliance on SAT scores in college admissions has generated a growing number and volume of complaints. Some applicants, especially members of underrepresented minority groups, believe that the test is culturally biased. Other critics argue that high school GPA and results on SAT subject tests are better than scores on the SAT reasoning test at predicting college success, as measured by grades in college and college graduation.
Espenshade, T. J., & Chung, C. Y. (2010). Standardized admission tests, college performance, and campus diversity. Office of Population Research, Princeton University.
High-quality assessments are essential to effectively educating students, measuring progress, and promoting equity. Done well and thoughtfully, they provide critical information for educators, families, the public, and students themselves and create the basis for improving outcomes for all learners. Done poorly, in excess, or without clear purpose, however, they take valuable time away from teaching and learning, and may drain creative approaches from our classrooms.
Every Student Succeeds Act. (2017). Assessments under Title I, Part A & Title I, Part B: Summary of final regulations
This systematic review synthesizes the findings from 30 studies that compared the performance of students at schools using single-track year-round calendars to the performance of students at schools using a traditional calendar.
Fitzpatrick, D., & Burns, J. (2019). Single‐track year‐round education for improving academic achievement in US K‐12 schools: Results of a meta‐analysis. Campbell Systematic Reviews, 15(3), e1053.
This report focuses on the performance of U.S. students in the major subject area of reading literacy by presenting results from a combined reading literacy scale and three reading literacy subscales: access and retrieve, integrate and interpret, and reflect and evaluate.
Fleischman, H. L., Hopstock, P. J., Pelczar, M. P., & Shelley, B. E. (2010). Highlights from PISA 2009: Performance of U.S. 15-year-old students in reading, mathematics, and science literacy in an international context. (NCES 2011-004). Retrieved from National Center for Education Statistics website: http://nces.ed.gov/pubs2011/2011004.pdf
This assessment of the reliability and validity of skills analysis programs within curriculum-based measurement (CBM), with various groups of handicapped and nonhandicapped youngsters, indicated that the skills analysis programs in spelling and math provided consistent information that related well to the primary graphed CBM scores.
Fuchs, L. S. (1989). The Reliability and Validity of Skills Analysis within Curriculum-Based Measurement. Diagnostique, 14(4), 203-21.
30 special education teachers were assigned randomly to 3 groups: curriculum-based measurement (CBM) with expert system advice (CBM-ES), CBM with no expert system advice (CBM-NES), and control (i.e., no CBM).
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Allinder, R. M. (1991). Effects of expert system advice within curriculum-based measurement on teacher planning and student achievement in spelling. School Psychology Review.
This study assessed the effects of expert system instructional consultation within curriculum-based measurement (CBM).
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement, using a reading maze task. Exceptional Children, 58(5), 436-450.
The purpose of this study was to examine the effects of using computer software to store, graph, and analyze student performance data on teacher efficiency and satisfaction with curriculum-based progress-monitoring procedures.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Hasselbring, T. S. (1987). Using computers with curriculum-based monitoring: Effects on teacher efficiency and satisfaction. Journal of Special Education Technology, 8(4), 14-27.
Examined the role of skills analysis (SA) in curriculum-based measurement (CBM) for the purpose of developing more effective instructional (mathematics) programs. 30 special education teachers implemented 1 of 3 treatments for 15 wks with a total of 91 mildly and moderately handicapped pupils.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (1990). The role of skills analysis in curriculum-based measurement in math. School Psychology Review.
This study examined the effectiveness of innovative curriculum-based measurement (CBM) classwide decision-making structures within general education mathematics instruction, with and without recommendations for how to incorporate CBM feedback into instructional planning.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., Phillips, N. B., & Bentz, J. (1994). Classwide curriculum-based measurement: Helping general educators meet the challenge of student diversity. Exceptional Children, 60(6), 518-537.
The purpose of this study was to investigate technical features of a curriculum-based measurement (CBM) system that addresses a concepts and applications mathematics curriculum (i.e., number concepts, counting, applied computation, geometry, measurement, charts, graphs, money, and problem solving).
Fuchs, L. S., Fuchs, D., Hamlett, C. L., Thompson, A., Roberts, P. H., Kubek, P., & Stecker, P. M. (1994). Technical features of a mathematics concepts and applications curriculum-based measurement system. Diagnostique, 19(4), 23-49.
The purposes of this study were to examine how well 3 measures, representing 3 points on a traditional-alternative mathematics assessment continuum, interrelated and discriminated students achieving above, at, and below grade level and to explore effects of cooperative testing for the most innovative measure (performance assessment).
Fuchs, L. S., Fuchs, D., Karns, K., Hamlett, C., Katzaroff, M., & Dutka, S. (1998). Comparisons among individual and cooperative performance assessments and other measures of mathematics competence. The Elementary School Journal, 99(1), 23-51.
This study assessed the efficiency of and teacher satisfaction with curriculum-based measurement (CBM) when student performance data are collected by teachers or by computers.
Fuchs, L. S., Hamlett, C. L., Fuchs, D., Stecker, P. M., & Ferguson, C. (1988). Conducting curriculum-based measurement with computerized data collection: Effects on efficiency and teacher satisfaction. Journal of Special Education Technology, 9(2), 73-86.
The purpose of this study was to assess the effects of (a) ongoing, systematic assessment of student growth (i.e., curriculum-based measurement) and (b) expert system instructional consultation on teacher planning and student achievement in the area of mathematics operations.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (1991). Effects of curriculum-based measurement and consultation on teacher planning and student achievement in mathematics operations. American Educational Research Journal, 28(3), 617-641.
Although recent data indicate that the learning losses this fall, compared with the same period last year, have not been as dire as predicted, those results likely mask high numbers of missing kids — children who lack technology for online learning or whose parents are unable to supervise their remote schooling.
Gabor, A. (2020, December 27). Education secretary’s first task: Curb standardized tests. Pittsburgh Post-Gazette.
The entrenchment of standardized assessment in America's schools reflects its emergence from the dual traditions of democratic school reform and scientific measurement. Within distinct sociohistorical contexts, ambitious testing pioneers persuaded educators and policymakers to embrace the standardized testing movement.
Gallagher, C. J. (2003). Reconciling a tradition of testing with a new learning paradigm. Educational Psychology Review, 15(1), 83-99.
High-school grades are often viewed as an unreliable criterion for college admissions, owing to differences in grading standards across high schools, while standardized tests are seen as methodologically rigorous, providing a more uniform and valid yardstick for assessing student ability and achievement. The present study challenges that conventional view. The study finds that high-school grade point average (HSGPA) is consistently the best predictor not only of freshman grades in college, the outcome indicator most often employed in predictive-validity studies, but of four-year college outcomes as well.
Geiser, S., & Santelices, M. V. (2007). Validity of High-School Grades in Predicting Student Success beyond the Freshman Year: High-School Record vs. Standardized Tests as Indicators of Four-Year College Outcomes. Research & Occasional Paper Series: CSHE. 6.07. Center for studies in higher education.
In one three-week period, a pandemic has completely changed the national landscape on assessment.
Gewertz, C. (2020). It’s official: All states have been excused from statewide testing this year. Education Week.
This study examined the academic and demographic profile of the pool of prospective teachers and then explored how this profile is affected by teacher testing.
Gitomer, D. H., Latham, A. S., & Ziomek, R. (1999). The academic quality of prospective teachers: The impact of admissions and licensure testing. Princeton, NJ: Educational Testing Service. Retrieved from http://www.ets.org/Media/Research/pdf/RR-03-35.pdf
This paper provides the first empirical examination of National Council on Teacher Quality (NCTQ) ratings, beginning with a descriptive overview of the ratings and documentation of how they evolved from 2013-2016, both in aggregate and for programs with different characteristics.
Goldhaber, D., & Koedel, C. (2019). Public Accountability and Nudges: The Effect of an Information Intervention on the Responsiveness of Teacher Education Programs to External Ratings. American Educational Research Journal, 0002831218820863.
Examined the forecasting accuracy of 2 slope estimation procedures (ordinary-least-squares regression and split-middle trend lines) for reading curriculum-based measurement (CBM), a behavioral approach to the assessment of academic skills that emphasizes the direct measurement of academic behaviors.
Good, R. H., & Shinn, M. R. (1990). Forecasting accuracy of slope estimates for reading curriculum-based measurement: Empirical evidence. Behavioral Assessment.
This report provides information about new teachers' preparation experiences and explores whether particular types of experiences are related to teachers' effectiveness in improving their students' test scores. Prior research indicates that teaching effectiveness is the largest in-school factor affecting student achievement.
Goodson, B., Caswell, L., Price, C., Litwok, D., Dynarski, M., Crowe, E., ... & Rice, A. (2019). Teacher Preparation Experiences and Early Teaching Effectiveness. Executive Summary. NCEE 2019-4010. National Center for Education Evaluation and Regional Assistance.
In this policy proposal I suggest (1) reforms to ensure that the Title I formula gets enough resources to the neediest areas, and (2) improvements in federal guidance and fiscal compliance outreach efforts so that local districts understand the flexibility they have to spend effectively. These are first-order issues for improving high-poverty schools, but they are so deeply mired in technical and bureaucratic detail that they have received little public attention in the reauthorization process.
Gordon, N. (2016). Increasing targeting, flexibility, and transparency in Title I of the Elementary and Secondary Education Act to help disadvantaged students. Policy Proposal, 1.
A significant and eye-opening examination of the current state of the testing movement in the United States, where more than 150 million standardized intelligence, aptitude, and achievement tests are administered annually by schools, colleges, business and industrial firms, government agencies, and the military services.
Goslin, D. A. (1963). The search for ability: Standardized testing in social perspective (Vol. 1). Russell Sage Foundation.
Discusses the uses and abuses of intelligence testing in our educational systems. Dr. Goslin examines teachers' opinions and practices with regard to tests and finds considerable discrepancies between attitude and behavior.
Goslin, D. A. (1967). Teachers and testing. Russell Sage Foundation.
This study examines whether the results of standardized tests are distorted when rewards and sanctions are attached to them.
Greene, J., Winters, M., & Forster, G. (2004). Testing high-stakes tests: Can we believe the results of accountability tests?. The Teachers College Record, 106(6), 1124-1144.
In schools throughout the country, it is testing season--time for students to take the Big Standardized Test (the PARCC, SBA, or your state's alternative). This ritual really blossomed way back in the days of No Child Left Behind, but after all these years, teachers are mostly unexcited about it. There are many problems with the testing regimen, but a big issue for classroom teachers is that the tests do not help the teacher do her job.
Greene, P. (2019, April 24). Why the big standardized test is useless for teachers. Forbes.
When schools pushed the pandemic pause button last spring, one of the casualties was the annual ritual of taking the Big Standardized Test. There were many reasons to skip the test, but in the end, students simply weren’t in school during the usual testing time.
Greene, P. (2020, August 14). Schools should scrap the big standardized test this year. Forbes.
This paper describes a few promising assessment technologies that allow us to capture more direct, repeated, and contextually based measures of student learning, and proposes an improvement-oriented approach to teaching and learning.
Greenwood, C. R., & Maheady, L. (1997). Measurable change in student performance: Forgotten standard in teacher preparation?. Teacher Education and Special Education, 20(3), 265-275.
Test-based accountability systems that attach high stakes to standardized test results have raised a number of issues about educational assessment and accountability. Do these high-stakes tests measure student achievement accurately? How can policymakers and educators attach the right consequences to the results of these tests? And what kinds of tradeoffs do these testing policies introduce?
Hamilton, L. S., Stecher, B. M., & Klein, S. P. (2002). Making sense of test-based accountability in education. Rand Corporation.
The Every Student Succeeds Act (ESSA), passed into law in 2015, explicitly prohibits the federal government from creating incentives to set national standards. The law represents a major departure from recent federal initiatives, such as Race to the Top, which beginning in 2009 encouraged the adoption of uniform content standards and expectations for performance.
Hamlin, D., & Peterson, P. E. (2018). Have states maintained high expectations for student performance? An analysis of 2017 state proficiency standards. Education Next, 18(4), 42-49.
This paper provides direct evidence about the impacts of teacher job matching on productivity and student achievement.
Hanushek, E. A., & Rivkin, S. G. (2010). Constrained job matching: Does teacher job search harm disadvantaged urban schools? Working Paper No. 15816. Cambridge, MA: National Bureau of Economic Research. Retrieved from https://www.nber.org/papers/w15816.pdf
The authors study the effects of various types of education and training on the ability of teachers to promote student achievement.
Harris, D. N., & Sass, T. R. (2011). Teacher training, teacher quality and student achievement. Journal of Public Economics, 95(7–8), 798-812.
This book provides a complete guide to implementing a wide range of problem-solving assessment methods: functional behavioral assessment, interviews, classroom observations, curriculum-based measurement, rating scales, and cognitive instruments.
Harrison, P. L. (2012). Assessment for intervention: A problem-solving approach. Guilford Press.
This report aims to provide the public, along with teachers and leaders in the Great City Schools, with objective evidence about the extent of standardized testing in public schools and how these assessments are used.
Hart, R., Casserly, M., Uzzell, R., Palacios, M., Corcoran, A., & Spurgeon, L. (2015). Student Testing in America's Great City Schools: An Inventory and Preliminary Analysis. Council of the Great City Schools.
This paper summarizes recent evidence on what achievement tests measure; how achievement tests relate to other measures of "cognitive ability" like IQ and grades; the important skills that achievement tests miss or mismeasure, and how much these skills matter in life.
Heckman, J. J., & Kautz, T. (2012). Hard evidence on soft skills. Labour economics, 19(4), 451-464.
The report considers the appropriate uses and misuses of high stakes tests in making decisions for students. The fundamental question is whether test scores lead to consequences that are educationally beneficial.
Heubert, J. P., & Hauser, R. M. (1998). High stakes: Testing for tracking, promotion, and graduation. Retrieved from http://files.eric.ed.gov/fulltext/ED439151.pdf
This meta-analysis examines the reliability and validity of SAT scores and student grades as predictors of student performance in college.
Hezlett, S., Kuncel, N., Vey, A., Ones, D., Campbell, J. & Camara, W. (2001). “The effectiveness of the SAT in predicting success early and late in college: A comprehensive meta-analysis.” Paper presented at the annual meeting of the National Council of Measurement in Education, Seattle, WA.
The purpose of this study is to compare different statistical and methodological approaches to standard setting and determining cut scores using R-CBM and performance on high-stakes tests.
Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34(3), 372.
A meta-analysis on the relationship between the Implicit Association Test (IAT) and corresponding explicit self-report measures was conducted.
Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., & Schmitt, M. (2005). A meta-analysis on the correlation between the Implicit Association Test and explicit self-report measures. Personality and Social Psychology Bulletin, 31(10), 1369-1385.
The School-Wide Evaluation Tool (SET; Sugai, Lewis-Palmer, Todd, & Horner, 2001) was created to provide a rigorous measure of primary prevention practices within school-wide behavior support. In this article, the authors describe the SET and document its psychometric characteristics.
Horner, R. H., Todd, A. W., Lewis-Palmer, T., Irvin, L. K., Sugai, G., & Boland, J. B. (2004). The school-wide evaluation tool (SET) a research instrument for assessing school-wide positive behavior support. Journal of Positive Behavior Interventions, 6(1), 3-12.
In the last 20 years, international surveys assessing learning in reading, mathematics and science have been headline news because they put countries in rank order according to performance. The three best-known surveys are TIMSS, PISA and PIRLS. These surveys offer information about international performance for the use of others in order to drive up education standards everywhere. They also emphasise that their aim is to facilitate dissemination of ideas on which features of education systems lead to the best performances.
International surveys TIMSS, PISA, PIRLS. (2017). Cambridge Assessment International Education.
This study evaluated the effects of high-stakes testing on the achievement levels of students in Chicago Public Schools. The data suggest that even though scores went up on the high-stakes tests, scores on “low-stakes” achievement tests did not improve. This suggests the increases in scores were a function of gains in test-specific skills rather than a general improvement in student learning. These findings give credence to the “teaching to the test” criticisms.
Jacob, B. A. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of public Economics. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.401.6599&rep=rep1&type=pdf
This article reports a drop in ACT scores in 2016. ACT officials attribute the drop to the increasing percentage of high school seniors who have taken the test. Generally, when a larger share of students takes a test - in some cases encouraged by state requirements rather than the students necessarily being college ready - scores go down.
Jaschik, S. (2016, August). ACT Scores Drop as More Take Test. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2016/08/24/average-act-scores-drop-more-people-take-test
The authors highlight an under-appreciated weakness of that approach -- the imprecision of school-level test score means -- and propose a method for better discerning signal from noise in annual school report cards.
Kane, T. J., & Staiger, D. O. (2001). Improving school accountability measures (No. w8156). National Bureau of Economic Research.
In recent years, most states have constructed elaborate accountability systems using school-level test scores. We evaluate the implications for school accountability systems. For instance, rewards or sanctions for schools with scores at either extreme primarily affect small schools and provide weak incentives to large ones.
Kane, T. J., & Staiger, D. O. (2002). The promise and pitfalls of using imprecise school accountability measures. Journal of Economic Perspectives, 16(4), 91-114.
This report presents an in-depth discussion of the analytical methods and findings from the Measures of Effective Teaching (MET) project’s analysis of classroom observations. A nontechnical companion report describes implications for policymakers and practitioners.
Kane, T. J., & Staiger, D. O. (2012). Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains. Research Paper. MET Project. Bill & Melinda Gates Foundation.
The Kansas State Board of Education's Quality Performance Accreditation system is described. Unlike past accreditation methods, which focused on the facilities or institutional characteristics, Quality Performance Accreditation accredits schools based on student performance.
Kansas State Board of Education (199). Kansas Quality Performance Accreditation. Topeka: Author.
This article discusses key issues in identifying evidence-based treatments for children and adolescents. Among the issues discussed are obstacles in transporting treatments from research to clinical services, the weak criteria for delineating whether a treatment is evidence based, and barriers to training therapists.
Kazdin, A. E. (2004). Evidence-based treatments: Challenges and priorities for practice and research. Child and Adolescent Psychiatric Clinics, 13(4), 923-940.
The accidental education benefits of Covid-19.
Kohn, A. (2020, August 18). The accidental education benefits of Covid-19. Education Week.
For decades we’ve been studying, experimenting with, and wrangling over different approaches to improving public education, and there’s still little consensus on what works, and what to do. The one thing people seem to agree on, however, is that schools need to be held accountable—we need to know whether what they’re doing is actually working.
Koretz, D. (2017). The testing charade. University of Chicago Press.
This study evaluated the generalization from high-stakes tests to other measures of achievement. The results suggest that there is little generalization, indicating that improvements in high-stakes test scores are the result of emphasis placed on the tests and time spent in test preparation rather than actual increases in student learning.
Koretz, D. M. (1991). The Effects of High-Stakes Testing on Achievement: Preliminary Findings about Generalization across Tests. ERIC. Retrieved from http://files.eric.ed.gov/fulltext/ED340730.pdf
The research reported here investigated the effects of Maryland School Performance Assessment Program (MSPAP) by surveying teachers and principals in two of the three grades in which MSPAP is administered.
Koretz, D., Mitchell, K., Barron, S., & Keith, S. (1996). The perceived effects of the Maryland school performance assessment program. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Assessment (University of California at Los Angeles).
In recent years, states have sought to increase accountability for public school teachers by implementing a package of reforms centered on high-stakes evaluation systems. We examine the effect of these reforms on the supply and quality of new teachers.
Kraft, M. A., Brunner, E. J., Dougherty, S. M., & Schwegman, D. J. (2020). Teacher accountability reforms and the supply and quality of new teachers. Journal of Public Economics, 188, 104212.
An up-to-date, practical, reader-friendly resource that will help readers navigate today's seemingly ever-changing and complex world of educational testing, assessment, and measurement. The 11th edition presents a balanced perspective of educational testing and assessment, informed by developments and the ever increasing research base.
Kubiszyn, T., & Borich, G. (1987). Educational testing and measurement. Glenview, IL: Scott, Foresman.
By using data collected for the National Longitudinal Evaluation of Comprehensive School Reform (NLECSR), this article explores the factors that predict CSR model implementation and the ways that CSR model implementation varies.
Kurki, A., Boyle, A., & Aladjem, D. K. (2006). Implementation: Measuring and explaining the fidelity of CSR implementation. Journal of Education for Students Placed at Risk, 11(3-4), 255-277.
We did not ask the panel to weigh in on debates over the role of state exams and accountability systems during a pandemic. However, the panel did discuss how districts and states can balance the goal of informing teachers and parents about individual students with the continued need to track system and school progress. The challenges in the upcoming school year will shine a harsh light on the variation in skills, knowledge, and needs that were always present.
Lake, R., & Olson, L. (2020). Learning as We Go: Principles for Effective Assessment during the COVID-19 Pandemic. Center on Reinventing Public Education.
The purpose of this study was to examine the validity of teacher evaluation scores that are derived from an observation tool, adapted from Danielson's Framework for Teaching, designed to assess 22 teaching components from four teaching domains.
Lash, A., Tran, L., & Huang, M. (2016). Examining the Validity of Ratings from a Classroom Observation Instrument for Use in a District's Teacher Evaluation System. REL 2016-135. Regional Educational Laboratory West.
What does it mean to take a scientific approach to instructional productivity? This chapter hopes to contribute to that discussion by examining the role scientific assessment can play in enhancing educational productivity.
Layng, T. J., Stikeleather, G., & Twyman, J. S. (2006). Scientific formative evaluation: The role of individual learners in generating and predicting successful educational outcomes. The scientific basis of educational productivity, 29-44.
In undertaking this study, two goals were established: (1) to obtain a better understanding of how much time students spend taking tests; and (2) to identify the degree to which the tests are mandated by districts or states.
Lazarín, M. (2014). Testing Overload in America's Schools. Center for American Progress.
This study examines the relationship between two dominant measures of teacher quality, teacher qualification and teacher effectiveness (measured by value-added modeling), in terms of their influence on students’ short-term academic growth and long-term educational success (measured by bachelor’s degree attainment).
Lee, S. W. (2018). Pulling back the curtain: Revealing the cumulative importance of high-performing, highly qualified teachers on students’ educational outcome. Educational Evaluation and Policy Analysis, 40(3), 359–381.
During the past two decades, performance-based accountability systems (PBASs), which link financial or other incentives to measured performance as a means of improving services, have gained popularity among policymakers.
Leuschner, K. J. (2010). Are Performance-Based Accountability Systems Effective? Evidence from Five Sectors. Research Brief. RAND Corporation.
This paper explores the power of behavioral economics to influence the level of effort exerted by students in a low-stakes testing environment. It finds a substantial impact on test scores from incentives when the rewards are delivered immediately. There is suggestive evidence that rewards framed as losses outperform those framed as gains.
Levitt, S. D., List, J. A., Neckermann, S., & Sadoff, S. (2016). The behavioralist goes to school: Leveraging behavioral economics to improve educational performance. American Economic Journal: Economic Policy, 8(4), 183-219.
Using a randomized control trial in 11 Chinese primary schools, we studied the effects of pay-for-grades programs on academic cheating. We randomly assigned 82 classrooms into treatment or control conditions, and used a statistical algorithm to determine the occurrence of cheating.
Li, T., & Zhou, Y. (2019). Do Pay-for-Grades Programs Encourage Student Academic Cheating? Evidence from a Randomized Experiment. Frontiers of Education in China, 14(1), 117-137.
The College Board recently released SAT scores for the high school graduating class of 2015. Both math and reading scores declined from 2014, continuing a steady downward trend that has been in place for the past decade. Pundits of contrasting political stripes seized on the scores to bolster their political agendas. Petrilli argued that falling SAT scores show that high schools need more reform. For Burris, the declining scores were evidence of the failure of policies her organization opposes. This article points out that the SAT was never meant to measure national achievement and provides a detailed explanation.
Loveless, T. (2015). No, the sky is not falling: Interpreting the latest SAT scores. Brown Center Chalkboard. Retrieved from https://www.brookings.edu/blog/brown-center-chalkboard/2015/10/01/no-the-sky-is-not-falling-interpreting-the-latest-sat-scores/
This Brown Center Report (BCR) on American Education is the sixth and final edition in the third volume and the 16th issue overall. The series began in 2000. As in the past, the report comprises three studies. Also in keeping with tradition, the first section features recent results from state, national, or international assessments; the second section investigates a thematic topic in education, either by collecting new data or by analyzing existing empirical evidence in a novel way; and the third section looks at one or more education policies.
Loveless, T. (2017). How Well Are American Students Learning? With Sections on the Latest International Test Scores, Foreign Exchange Students, and School Suspensions. The 2017 Brown Center Report on American Education. Retrieved from https://www.brookings.edu/wp-content/uploads/2017/03/2017-brown-center-report-on-american-education.pdf
The long term trend test of the National Assessment of Educational Progress (LTT NAEP) is the longest running test of student achievement that provides a scientifically valid estimate of what American students have learned.
Loveless, T. (2016, October 17). The strange case of the disappearing NAEP. Brookings Institution.
This paper is an examination of the impact of high-stakes testing on minority students. The outcomes suggest that high-stakes testing does not have a positive impact on minority students, and in some instances there are negative effects.
Madaus, G. F., & Clarke, M. (2001). The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data. ERIC. Retrieved from http://files.eric.ed.gov/fulltext/ED450183.pdf
This study presents such an approach where the impact of regular and special education on 11 mildly handicapped children is studied by analyzing their slope of improvement on weekly curriculum-based measures (CBM) reading scores.
Marston, D. (1988). The effectiveness of special education: A time series analysis of reading performance in regular and special education settings. The Journal of Special Education, 21(4), 13-26.
There exists a serious need to examine alternative testing models for making educational decisions. In this chapter, this need is documented from the perspective that the traditional model has failed education in two major ways, at the technical level and at the social policy level. Curriculum-based measurement procedures are proposed to redress some of the issues in these domains.
Marston, D. B. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. R. Shinn (Ed.), The Guilford school practitioner series. Curriculum-based measurement: Assessing special children (pp. 18-78). New York, NY, US: Guilford Press.
In a series of two studies, the relative sensitivity of traditional standardized achievement tests and alternative curriculum-based measures was assessed.
Marston, D., Fuchs, L. S., & Deno, S. L. (1986). Measuring pupil progress: A comparison of standardized achievement tests and curriculum-related measures. Diagnostique, 11(2), 77-90.
School closings and the ever-increasing number of deaths provide the backdrop for a proposal by the Center for American Progress (CAP) to deny waivers of the federally mandated administration of standardized tests in spring 2021. Further, the federal government proposes to add to those assessments in ways that CAP argues would make the test results more useful.
Mathis, W. J., Berliner, D. C., & Glass, G. V. (2020). NEPC Review: Student Assessment During COVID-19 (Center for American Progress, September 2020).
This study examines the relationship between scores on the SAT and retention to second year of college using student level data from the freshman class of 2006 at 106 four-year institutions.
Mattern, K. D., & Patterson, B. F. (2009). Is performance on the SAT related to college retention?.
The Common Core. Just last year, according to a Gallup poll, most Americans had never heard of the Common Core State Standards Initiative, or "Common Core," new guidelines for what kids in grades K–12 should be able to accomplish in reading, writing, and math. Designed to raise student proficiencies so the United States can better compete in a global market, the standards were drafted in 2009 by a group of academics and assessment specialists at the request of the National Governors Association and the Council of Chief State School Officers.
McArdle, E. (2014). What happened to the Common Core. Harvard Ed. Magazine, 14.
This note provides a brief review of work to address the challenges of measuring output and productivity in the education sector, with attention also to issues related to the increasing use of technology in the provision of education services.
McGivney, E., & Foda, K. (n.d.). Productivity measurement in the education sector. Washington, DC: Brookings Institution. https://www.brookings.edu/wp-content/uploads/2017/12/productivity-measurement-in-education.pdf
This study investigated communicative strategies for helping female students cope with “stereotype threat”. The results demonstrate that priming a positive achieved identity (e.g., private college student) can subdue stereotype threat associated with an ascribed identity (e.g., female).
McGlone, M. S., & Aronson, J. (2007). Forewarning and forearming stereotype-threatened students. Communication Education, 56(2), 119-133.
Assessment, or testing, fulfills a vital role in today’s educational environment. Assessment results often are a major force in shaping public perceptions about the capabilities of our students and the quality of our schools. As a primary tool for educators and policymakers, assessment is used for many important purposes.
Missouri Department of Elementary and Secondary Education. (2019). Missouri Assessment Program: Grade level assessments.
Assessments used in Missouri are designed to measure how well students acquire the skills and knowledge described in Missouri’s Learning Standards (MLS). The assessments yield information on academic achievement at the student, class, school, district and state levels.
Missouri Department of Elementary and Secondary Education. (2020). Missouri Assessment Program.
The goal of this guide is to provide useful information about standardized testing, or assessment, for practitioners and non-practitioners who care about public schools. It includes the nature of assessment, types of assessments and tests, and definitions.
Mitchell, R. (2006). A guide to standardized testing: The nature of assessment. Center for Public Education.
The Classroom Environment Scale (CES) helps create a positive school climate in which more students succeed. The instrument evaluates the effects of course content, teaching methods, teacher personality, class composition and characteristics of the overall classroom environment.
Moos, R. H., & Trickett, E. J. (1979). Classroom Environment Scale Manual (2nd Ed.). Palo Alto, CA: Consulting Psychologists Press.
This study examines validity data for SAT scores and student grades for the entering classes of 1976 to 1985.
Morgan, R. (1989). “Analysis of the predictive validity of the SAT and high school grades from 1976 to 1985.” College Board Report No. 89-7. New York: College Board.
This article describes the drop in SAT scores in 2016.
Mulhere, K. (2016, September). SAT Scores Take a Dip. Money. Retrieved from http://money.com/money/4508286/average-sat-scores-class-2016/
The measurement unit disability-adjusted life years (DALYs), used in recent years to quantify the burden of diseases, injuries and risk factors on human populations, is grounded on cogent economic and ethical principles and can guide policies toward delivering more cost-effective and equitable health care.
Murray, C. J., & Acharya, A. K. (1997). Understanding DALYs. Journal of health economics, 16(6), 703-730.
How did U.S. students perform on the most recent assessments? Select a jurisdiction and a result to see how students performed on the latest NAEP assessments.
National Assessment of Education Progress (NAEP). (2020) Nation’s report card. Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education
National Assessment of Educational Progress (NAEP). (2011b). The nation's report card: Reading grade 12 national results. Retrieved from http://nationsreportcard.gov/reading_2009/gr12_national.asp?subtab_id=Tab_3&tab_id=tab2#
This non-technical brochure provides introductory information on the development, administration, scoring, and reporting of the National Assessment of Educational Progress (NAEP). The brochure also provides information about the online resources available on the NAEP website.
National Center for Education Statistics (NCES). (2010a). An introduction to NAEP. (NCES 2010-468). Retrieved from National Center for Education Statistics website: https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2010468
Twelfth-graders’ performance in reading and mathematics improves since 2005. Nationally representative samples of twelfth-graders from 1,670 public and private schools across the nation participated in the 2009 National Assessment of Educational Progress (NAEP).
National Center for Education Statistics (NCES). (2010b). The nation’s report card: Grade 12 reading and mathematics 2009 national and pilot state results. (NCES 2011-455). Retrieved from http://nces.ed.gov/nationsreportcard/pdf/main2009/2011455.pdf
The Data Explorer for the Long-Term Trend assessments provides national mathematics and reading results dating from the 1970s.
National Center for Education Statistics (NCES). (2011a). Data explorer for long-term trend. [Data file]. Retrieved from http://nces.ed.gov/nationsreportcard/lttdata/
Nationally representative samples of 209,000 fourth-graders and 175,200 eighth-graders participated in the 2011 National Assessment of Educational Progress (NAEP) in mathematics.
National Center for Education Statistics (NCES). (2011d). The nation’s report card: mathematics 2011. (NCES 2012-458). Retrieved from http://nces.ed.gov/nationsreportcard/pdf/main2011/2012458.pdf
Percentages of students meeting state proficiency standards and performing at or above the NAEP Proficient level, by subject, grade, and state: 2009
National Center for Education Statistics (NCES). (2011f). Students meeting state proficiency standards and performing at or above the NAEP proficient level: 2009. Retrieved from http://nces.ed.gov/nationsreportcard/studies/statemapping/2009_naep_state_table.asp
In this new national writing assessment sample, 24,100 eighth-graders and 28,100 twelfth-graders engaged with writing tasks and composed their responses on computer. The assessment tasks reflected writing situations common to both academic and workplace settings and asked students to write for several purposes and communicate to different audiences.
National Center for Education Statistics. (2012). The nation's report card: Writing 2011 (NCES 2012-470).
This book looks at how testing affects critical decisions for American students. The text focuses on how testing is used in schools to make decisions about tracking and placement, promotion and retention, and awarding or withholding high school diplomas. This book examines the controversies that emerge when a test score can open or close gates on a student's educational pathway.
National Research Council. (1999). High Stakes: Testing for Tracking, Promotion, and Graduation. Washington, DC: National Academies Press.
The paper examines Campbell’s law: “the more any quantitative social indicator is used for social decision making, the more likely the measure will corrupt the social processes it is intended to monitor.” In education, high-stakes testing has resulted in widespread cheating, exclusion of low-performing students from testing, encouraging students to drop out, and narrowing of the curriculum.
Nichols, S. L., & Berliner, D. C. (2005). The Inevitable Corruption of Indicators and Educators through High-Stakes Testing. Education Policy Research Unit. Retrieved from http://files.eric.ed.gov/fulltext/ED508483.pdf
This study evaluated the relationship between increased accountability via high-stakes testing and progress on other measures of educational progress. The results suggest that high-stakes testing does not have any positive impact on student achievement.
Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006). High-stakes testing and student achievement: Does accountability pressure increase student learning? Education Policy Analysis Archives. Retrieved from http://epaa.asu.edu/ojs/index.php/epaa/article/download/72/198
School reformers and state and federal policymakers turned to standardized testing over the years to get a clearer sense of the return on a national investment in public education that reached $680 billion in 2018-19. They embraced testing to spur school improvement and to ensure the educational needs of traditionally underserved students were being met.
Olson, L., & Jerald, C. (2020). The Big Test.
This volume of PISA 2009 results looks at the progress countries have made in raising student performance and improving equity in the distribution of learning opportunities.
Organisation for Economic Co-operation and Development (OECD). (2010a). PISA 2009 results: Learning trends–Changes in student performance since 2000 (Volume V). Retrieved from https://www.oecd-ilibrary.org/education/pisa-2009-results-learning-trends_9789264091580-en
The OECD’s Programme for International Student Assessment (PISA) surveys, which take place every three years, have been designed to collect information about 15-year-old students in participating countries.
Organization for Economic Co-operation and Development (OECD). (2006). PISA 2006 technical report. Retrieved from http://www.oecd.org/pisa/pisaproducts/42025182.pdf
Student engagement at school and whether students feel hopeful about their future are far better factors to consider when evaluating schools than using standardized test scores, according to the results of the 47th annual PDK/Gallup Poll of the Public’s Attitudes Toward the Public Schools.
PDK/Gallup Poll (2015). Testing lacks public support. Phi Delta Kappan, 97(1), 8–10.
The education reform movement of the past two decades has focused on raising academic standards. Some standards advocates attach a testing mechanism to gauge the extent to which high standards are actually accomplished, whereas some critics accuse the push for standards and testing of impeding reform and perpetuating inequality.
Phelps, R. (2005). Defending standardized testing. Psychology Press.
The Standardized Testing Primer provides non-specialists with a thorough overview of this controversial and complicated topic. It eschews the statistical details of scaling, scoring, and measurement that are widely available in textbooks and at testing organization Web sites, and instead describes standardized testing's social and political roles and its practical uses -- who tests, when, where, and why.
Phelps, R. P. (2007). Standardized testing primer (Vol. 21). Peter Lang.
American teachers are feeling enormous pressure these days to raise their students' scores
on high-stakes tests. As a consequence, some teachers are providing classroom instruction
that incorporates, as practice activities, the actual items on the high-stakes tests.
Popham, W. J. (2001). Teaching to the test? Educational Leadership, 58(6), 16-21.
Donald Campbell was an American social psychologist and noted experimental social science researcher who did pioneering work on methodology and program evaluation. He has also become—posthumously—an unlikely hero of the anti-testing and accountability movement in the United States.
Porter-Magee, K. (2013, February 26). Trust but verify: The real lessons of Campbell’s Law. Thomas B. Fordham Institute.
Former Assistant Secretary of Education Diane Ravitch was once an early advocate of No Child Left Behind, school vouchers, and charter schools. No Child Left Behind required schools to administer yearly state standardized tests. Student progress on those tests was measured to see if schools met their Adequate Yearly Progress (AYP) goals. Schools missing those goals for several years in a row could be restructured, replaced, or shut down.
Ravitch, D. (2011). Standardized testing undermines teaching. National Public Radio.
The authors estimate racial/ethnic achievement gaps in several hundred metropolitan areas and several thousand school districts in the United States using the results of roughly 200 million standardized math and English language arts (ELA) tests administered to public school students from 2009 to 2013. They show that the strongest correlates of achievement gaps are local racial/ethnic differences in parental income and educational attainment, local average parental education levels, and patterns of racial/ethnic segregation, consistent with a theoretical model in which family socioeconomic factors affect educational opportunity partly through residential and school segregation patterns.
Reardon, S. F., Kalogrides, D., & Shores, K. (2019). The geography of racial/ethnic test score gaps. American Journal of Sociology, 124(4), 1164-1221.
In this paper, the authors analyze racial differences in the math section of the general SAT test, using publicly available College Board population data for all of the nearly 1.7 million college-bound seniors in 2015 who took the SAT. The evidence for a stubborn race gap on this test provides a snapshot of the extraordinary magnitude of racial inequality in contemporary American society. Standardized tests are often seen as mechanisms for meritocracy, ensuring fairness in terms of access. But test scores reflect accumulated advantages and disadvantages in each day of life up to the one on which the test is taken. Race gaps on the SAT hold up a mirror to racial inequities in society as a whole. Equalizing educational opportunities and human capital acquisition earlier is the only way to ensure fairer outcomes.
Reeves, R. V., & Halikias, D. (2017). Race gaps in SAT scores highlight inequality and hinder upward mobility. Brookings. Retrieved from https://www.brookings.edu/research/race-gaps-in-sat-scores-highlight-inequality-and-hinder-upward-mobility/
This article shows evidence of a race gap in SAT math scores and discusses some big issues at stake, including: the value of the SAT itself; the case for broader policies that take socioeconomic background into account in college admissions; the obsession with four-year college degrees; and the danger of college as a "bottleneck" in the American opportunity structure.
Reeves, R. V. (2017, February 1). Race gaps in SAT math scores are as big as ever. Brown Center Chalkboard. Retrieved from https://www.brookings.edu/blog/brown-center-chalkboard/2017/02/01/race-gaps-in-sat-math-scores-are-as-big-as-ever/
This review examined the overlap between state-created curriculum evaluation tools and The Hexagon Tool created by the National Implementation Research Network. The author followed systematic procedures while conducting a web search and visiting each state’s department of education website in search of curriculum evaluation tools.
Rolf, R. (2019). State Department of Education Support for Implementation Issues Faced by School Districts during the Curriculum Adoption Process. Oakland, CA: The Wing Institute. https://www.winginstitute.org/student-research-2019.
Amrein and Berliner (2002b) compared National Assessment of Educational Progress (NAEP) results in high-stakes states against the national average for NAEP scores. In this analysis, a comparison group was formed from states that did not attach consequences to their state-wide tests.
Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11, 24.
This research considers relationships between student achievement (knowledge and cognitive skill), teacher efficacy (Gibson & Dembo, 1984), and interactions with assigned coaches (self-report measures) in a sample of 18 grade 7 and 8 history teachers in 36 classes implementing a specific innovation with the help of 6 coaches.
Ross, J. A. (1992). Teacher efficacy and the effects of coaching on student achievement. Canadian Journal of Education, 17(1), 51–65.
Children take one of two types of standardized test, one "norm-referenced," the other "criterion-referenced." Although those names have an arcane ring, most parents are familiar with how the exams differ.
Rothstein, R. (2002, May 22). Lessons: Testing reaches a fork in the road. New York Times. http://www.nytimes.com/2002/05/22/nyregion/lessons-testing-reaches-a-fork-in-the-road.html
This table allows you to compare a student’s SAT® scores with the performance of other 2012 college-bound seniors who took the test some time in high school. Please keep in mind that relationships between test scores and other factors are complex and interdependent. Other factors do not directly affect test performance; rather, they are associated with educational experiences both on tests and in schoolwork.
SAT® Percentile Ranks for 2012 College-Bound Seniors: Critical Reading, Mathematics and Writing Percentile Ranks by Gender and Ethnic Groups. (2012). The College Board. Retrieved from http://secure-media.collegeboard.org/digitalServices/pdf/research/SAT-Percentile-Ranks-by-Gender-Ethnicity-2012.pdf
An Education Week article asking whether it is time to end annual standardized testing.
Sawchuk, S. (2019). Is it time to kill annual testing? Education Week, 8.
Part of President Bush's strategy for the transformation of "American Schools" lies in an accountability system that would track progress toward the nation's education goals as well as provide the impetus for reform. The authors focus primarily on issues of accountability and student achievement.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Research news and comment: Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21(4), 22-27.
In today's political climate, standardized tests are inadequate and misleading as achievement measures. Educators should employ a variety of measures, improve standardized test content and format, and remove incentives for teaching to the test. Focusing on raising test scores distorts instruction and renders scores less credible.
Shepard, L. A. (1989). Why We Need Better Assessments. Educational Leadership, 46(7), 4-9.
Shinn, M. R. (1995). Best practices in curriculum-based measurement and its use in a problem-solving model. Best practices in school psychology III, 547-567.
Curriculum-Based Measurement and Special Services for Children is a concise and convenient guide to CBM that demonstrates why it is a valuable assessment procedure, and how it can be effectively utilized by school professionals.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. Guilford Press.
Developed specifically to overcome problems with traditional standardized instruments--and widely used in both general and special education settings throughout the US--curriculum-based measurement (CBM) comprises brief assessment probes of reading, spelling, written expression, and mathematics that serve both to quantify student performance and to bolster academic achievement.
Shinn, M. R. (Ed.). (1998). Advanced applications of curriculum-based measurement. Guilford Press.
In the before-times, one of the hallmarks of spring for parents, students, and teachers was the ramp-up toward federally mandated standardized tests. COVID-19 had something to say about that last school year, and in mid-March, the U.S. Department of Education granted states a blanket exemption from standardized testing.
Silver, D. & Polikoff, M. (2020, November 16). Getting testy about testing: K–12 parents support canceling standardized testing this spring. That might not be a good idea. The 74.
Introduces a new analytic strategy for comparing the cognitive profiles of children developing reading skills at different rates: a regression-based logic analogous to the reading-level match design, but without some of the methodological problems of that design.
Stanovich, K. E., & Siegel, L. S. (1994). Phenotypic performance profile of children with reading disabilities: A regression-based test of the phonological-core variable-difference model. Journal of Educational Psychology, 86(1), 24.
Effective ongoing assessment, referred to in the education literature as formative assessment or progress monitoring, is indispensable in promoting teacher and student success. Feedback through formative assessment is ranked at or near the top of practices known to significantly raise student achievement. For decades, formative assessment has been found to be effective in clinical settings and, more important, in typical classroom settings. Formative assessment produces substantial results at a cost significantly below that of other popular school reform initiatives such as smaller class size, charter schools, accountability, and school vouchers. It also serves as a practical diagnostic tool available to all teachers. A core component of formal and informal assessment procedures, formative assessment allows teachers to quickly determine if individual students are progressing at acceptable rates and provides insight into where and how to modify and adapt lessons, with the goal of making sure that students do not fall behind.
States, J., Detrich, R. & Keyworth, R. (2017). Overview of Formative Assessment. Oakland, CA: The Wing Institute. http://www.winginstitute.org/student-formative-assessment.
Summative assessment is an appraisal of learning at the end of an instructional unit or at a specific point in time. It compares student knowledge or skills against standards or benchmarks. Summative assessment includes midterm exams, final projects, papers, teacher-designed tests, standardized tests, and high-stakes tests.
States, J., Detrich, R. & Keyworth, R. (2018). Overview of Summative Assessment. Oakland, CA: The Wing Institute. https://www.winginstitute.org/assessment-summative
This investigation contributed to previous research by separating the effects of simply making instructional changes, not based on student performance data, from the effects of making instructional changes in accordance with CBM data.
Stecker, P. M. (1995). Effects of instructional modifications with and without curriculum-based measurement on the mathematics achievement of students with mild disabilities.
The purpose of this document is to provide background information that will be useful in interpreting the 2007 results from the Trends in International Mathematics and Science Study (TIMSS) by comparing its design, features, framework, and items with those of the U.S. National Assessment of Educational Progress and another international assessment in which the United States participates, the Program for International Student Assessment (PISA). The report found, because there are differences in the features, frameworks and items of the national and international assessments, direct comparisons among the assessments are not useful. Rather the results from different studies should be thought of as different lenses through which to view and better understand U.S. student performance.
Stephens, M., and Coleman, M. (2007). Comparing TIMSS with NAEP and PISA in Mathematics and Science. U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved from
This document provides background information that will be useful in interpreting the results from two key international assessments released in November and December 2007 and in comparing these results with recent findings from the U.S. National Assessment of Educational Progress in similar subjects. In sum, there appears to be an advantage in capitalizing on the complementary information presented in national and international assessments. NAEP measures in detail the reading, mathematics, and science knowledge of U.S. students as a whole, and can also provide trend information for individual states, different geographic regions, and demographic population groups. International assessments like PIRLS and PISA add value by providing a method for comparing the performance of U.S. students to the performance of students in other nations. However, their differences need to be recognized when interpreting results.
Stephens, M., Coleman, M. (2007). Comparing PIRLS and PISA with NAEP in Reading, Mathematics, and Science (Working Paper). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved from https://nces.ed.gov/surveys/PISA/pdf/comppaper12082004.pdf
The classroom assessment procedures of 36 teachers in grades 2 to 12 were studied in depth to determine the extent to which they measure students' higher-order thinking skills in mathematics, science, social studies, and language arts.
Stiggins, R. J., Griswald, M., & Green, K. R. (1988). Measuring Thinking Skills Through Classroom Assessment. Paper presented at the 1988 annual meeting of the National Council on Measurement in Education, New Orleans, April.
America has been obsessed with student standardized tests for nearly 20 years. Now it looks like the country is at the beginning of the end of our high-stakes testing mania — both for K-12 “accountability” purposes and in college admissions.
Strauss, V. (2020). It looks like the beginning of the end of America’s obsession with student standardized tests. The Washington Post.
There are growing calls from across the political spectrum for the federal government to allow states to skip giving students federally mandated standardized tests in spring 2021 — but the man that President-elect Joe Biden tapped to be education secretary has indicated support for giving them.
Strauss, V. (2020b, December 30). Calls are growing for Biden to do what DeVos did: Let states skip annual standardized tests this spring. The Washington Post.
What you need to know about standardized testing.
Strauss, V. (2021, February 1). What you need to know about standardized testing. The Washington Post.
A meta-analysis involving 46 studies addressing the validity of this classification of poor readers revealed substantial overlap between the IQ-discrepant and IQ-consistent poor readers
Stuebing, K. K., Fletcher, J. M., LeDoux, J. M., Lyon, G. R., Shaywitz, S. E., & Shaywitz, B. A. (2002). Validity of IQ-discrepancy classifications of reading disabilities: A meta-analysis. American Educational Research Journal, 39(2), 469-518.
This paper examines the use of high stakes testing such as end of course exams in American education. The conclusions are that the exams do not produce substantive changes in instructional practices and the information is useful to measure school and system progress but has limited utility for instructional guidance.
Supovitz, J. (2009). Can high stakes testing leverage educational improvement? Prospects from the last decade of testing and accountability reform. Journal of Educational Change, 10(2-3), 211-227.
Test-based accountability systems — the use of tests to hold individuals or institutions responsible for performance and to reward achievement — have become the cornerstone of U.S. federal education policy, and the past decade has witnessed a widespread adoption of test-based accountability systems in the U.S. Consider just one material manifestation of this burgeoning trend: test sales have grown from approximately $260 million annually in 1997 to approximately $700 million today — nearly a threefold increase.
Supovitz, J. (2021). Is high-stakes testing working? University of Pennsylvania, Graduate School of Education.
The federal role in developing the teacher workforce has increased markedly in the last
decade, but the history of such involvement dates back fifty years. Relying initially on
policies to recruit and train teachers, the federal role has expanded in recent years to
include new policy initiatives and instruments around the themes of accountability,
incentives, and qualifications, while also continuing the historic emphasis on teacher
recruitment, preparation, and development.
Sykes, G., & Dibner, K. (2009). Fifty Years of Federal Teacher Policy: An Appraisal. Center on Education Policy.
The continuing COVID-19 pandemic has forced school districts across the nation to quickly adapt their approach to teaching and learning, with widespread variation in the response. With a new school year now underway, several states require that schools provide some degree of in-person instruction, while other states have left such decisions up to local education and public health officials.
Therriault, S. B. (2020). Back-to-school metrics: How to assess conditions for teaching and learning and to measure student progress during the COVID-19 pandemic. Regional Education Laboratory Program (REL).
The study design was a declarative exposition of potential fallacies in the theoretical underpinnings of cost-effectiveness analysis (CEA).
Ubel, P. A., Nord, E., Gold, M., Menzel, P., Prades, J. L. P., & Richardson, J. (2000). Improving value measurement in cost-effectiveness analysis. Medical Care, 38(9), 892-901.
This article shows different approaches that researchers took to answer questions about the social gradient in education across countries. Comparing some of these results highlights weak service delivery in many developing countries. Even where resources may be similar, social gradients are steep in some countries, indicating much worse educational outcomes for the poor, and public resources are often converted into learning extremely poorly. The differential ability of schools and school systems to convert resources into learning outcomes remains a major impediment to improving educational outcomes, and indeed life chances, for the poor.
Van Der Berg, S. (2015). How does the rich-poor learning gap vary across countries?. Brookings Institution. Retrieved from https://www.brookings.edu/blog/future-development/2015/03/09/how-does-the-rich-poor-learning-gap-vary-across-countries/
Data from the 1992 National Assessment of Educational Progress are used to compare the performance of New Jersey public school children with those from other participating states. The comparisons are made with the raw mean scores and after standardizing all state scores to a common (national U.S.) demographic mixture. It is argued that for most plausible questions about the performance of public schools the standardized scores are more useful.
Wainer, H. (1994). Academic Performance of New Jersey's Public Schools. Education Policy Analysis Archives, 2, 10.
This report describes the first in a series of studies that attempt to characterize the performance of New Jersey's public school system.
Wainer, H. (1994). On the Academic Performance of New Jersey's Public School Children: I. Fourth and Eighth Grade Mathematics in 1992. ETS Research Report Series, 1994(1), i-17.
The use of educational data to make decisions and foster improvement is increasing dramatically. Federal and state accountability mandates have created a strong market for formal achievement testing, both in terms of state achievement tests and benchmarking assessments that help predict performance on these tests.
Wayman, J. C., & Cho, V. (2010). Preparing educators to effectively use student data systems (pp. 105-120). Routledge.
This investigation focused on the effects of two independent variables; (a) teacher-developed goals and monitoring systems versus a curriculum-based measurement (CBM) goal and monitoring system; and (b) individual expert versus group follow-up consultation.
Wesson, C. L. (1990). Curriculum-based measurement and two models of follow-up consultation. Exceptional Children, 57(3), 246-256.
This article describes a CBM procedure that is very efficient and provides the teacher with adequate information for grouping students and monitoring progress throughout the school year.
Wesson, C. L., Vierthaler, J. M., & Haubrich, P. A. (1989). An efficient technique for establishing reading groups. The Reading Teacher, 42(7), 466-469.
This paper presents four studies that examine the time required to implement direct and frequent curriculum-based measurement (CBM) as well as strategies to improve the efficiency of CBM. Ten rural special education resource teachers were the subjects.
Wesson, C., Fuchs, L., Tindal, E., Mirkin, P., & Deno, S. L. (1986). Facilitating the efficiency of on-going curriculum-based measurement. Teacher Education and Special Education, 9(4), 166-172.
In this book, Grant P. Wiggins clarifies the limits of testing in an assessment system. Beginning with the premise that student assessment should improve performance, not just audit it, Wiggins analyzes some time-honored but morally and intellectually problematic practices in test design, such as the use of secrecy, distracters, scoring on a curve, and formats that allow for no explanation by students of their answers.
Wiggins, G. P. (1993). Assessing student performance: Exploring the purpose and limits of testing. Jossey-Bass.
This paper conducts an analytic literature review to examine the use and operationalization of the term "academic success" in multiple academic fields.
York, T. T., Gibson, C., & Rankin, S. (2015). Defining and measuring academic success. Practical Assessment, Research & Evaluation, 20(5), 1–20. Retrieved from https://scholarworks.umass.edu/pare/vol20/iss1/5/
This study investigates the prediction of college success as defined by a student’s college GPA. We predict college GPA mid-way through and at the end of their college careers using high school GPA (HSGPA), college entrance exam scores (SAT/ACT) and an open-ended, performance-based assessment of critical thinking and writing skills (CLA). 3,137 college sophomores and 1,330 college seniors participated in this study.
Zahner, D., Ramsaran, L. M., & Steedle, J. T. (2012). Comparing alternatives in the prediction of college success. Paper presented at the annual meeting of the American Educational Research Association, Vancouver, Canada.
In this study, the reliability of the MAS was reexamined with two independent groups of developmentally disabled individuals who exhibited SIB (N = 55).
Zarcone, J. R., Rodgers, T. A., Iwata, B. A., Rourke, D. A., & Dorsey, M. F. (1991). Reliability analysis of the Motivation Assessment Scale: A failure to replicate. Research in Developmental Disabilities, 12(4), 349-360.
The number of students taking Advanced Placement (AP) tests has grown to more than 2.5 million students annually. Overall test scores have remained relatively constant despite a 60% increase in the number of students taking AP exams since 2006. In school year 2015–16, 20% of students taking an AP test passed and were eligible for college credit. The College Board also reports a continuing trend in the significant increase in the number of low-income students participating in the program. Unfortunately, this trend may be negatively impacted by changes in funding. The federal grant program subsidizing AP tests for low-income students has been replaced by block grants in the Every Student Succeeds Act. These funds may still be applied to subsidize low-income populations but are not mandated for this purpose as in the past.
Zubrzycki, J. (2017). 1 in 5 Public School Students in the Class of 2016 Passed an AP Exam. Education Week.