Two experiments are reported which test the effect of increased three-term contingency trials on students' correct and incorrect math responses. The results warrant further research to test whether or not rates of presentation of three-term contingency trials are predictors of effective instruction.
Albers, A. E., & Greer, R. D. (1991). Is the three-term contingency trial a predictor of effective instruction? Journal of Behavioral Education, 1(3), 337-354.
The “Standards for Educational and Psychological Testing” were approved as APA policy by the APA Council of Representatives in August 2013.
American Educational Research Association, American Psychological Association, Joint Committee on Standards for Educational, Psychological Testing (US), & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. American Educational Research Association.
This study evaluated the relationship between scores on high-stakes tests and scores on other measures of learning, such as NAEP and SAT scores. In general, there was no increase in student learning as a function of high-stakes testing.
Amrein, A. L., & Berliner, D. C. (2002). High-Stakes Testing, Uncertainty, and Student Learning. Education Policy Analysis Archives.
The purpose of this study is to assess whether academic achievement in fact increases after the introduction of high-stakes tests. The first objective of this study is to assess whether academic achievement has improved since the introduction of high-stakes testing policies in the 27 states with the highest stakes written into their grade 1-8 testing policies.
Amrein-Beardsley, A., & Berliner, D. C. (2002). The Impact of High-Stakes Tests on Student Academic Performance.
This book was designed as an assessment of standardized testing and its alternatives at the secondary school level.
Archbald, D. A., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school.
This study uses a meta-analysis of 78 studies to determine the overall effect size for testing at different frequency levels and to identify other study characteristics related to the effectiveness of frequent testing.
Başol, G., & Johanson, G. (2009). Effectiveness of frequent testing over achievement: A meta analysis study. Journal of Human Sciences, 6(2), 99-121.
There is also little or no evidence for the claim that teachers will be more motivated to improve student learning if teachers are evaluated or monetarily rewarded for student test score gains.
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., ... & Shepard, L. A. (2010). Problems with the Use of Student Test Scores to Evaluate Teachers. EPI Briefing Paper# 278. Economic Policy Institute.
Standardized tests play a critical role in tracking and comparing K-12 student progress across time, student demographics, and governing bodies (states, cities, districts). One methodology is to benchmark each state’s proficiency standards against those of the National Assessment of Educational Progress (NAEP) test. This study does just that. Using NAEP as a common yardstick allows a comparison of different state assessments. The results confirm the wide variation in proficiency standards across states. They also document that a significant majority of states have standards that are much lower than those established by the NAEP.
Bandeira de Mello, V., Rahman, T., and Park, B.J. (2018). Mapping State Proficiency Standards Onto NAEP Scales: Results From the 2015 NAEP Reading and Mathematics Assessments (NCES 2018-159). U.S. Department of Education, Washington, DC: Institute of Education Sciences, National Center for Education Statistics.
Describes the ways in which accountability methods were built into practicum experiences for specialist- and doctoral-level school psychology trainees at the University of Cincinnati.
Barnett, D. W., Daly III, E. J., Hampshire, E. M., Rovak Hines, N., Maples, K. A., Ostrom, J. K., & Van Buren, A. E. (1999). Meeting performance-based training demands: Accountability in an intervention-based practicum. School Psychology Quarterly, 14(4), 357.
A brief history of high-stakes testing is followed by an analysis of eighteen states with severe consequences attached to their testing programs.
Beardsley, A., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10.
The later effects of the Direct Instruction Follow Through program were assessed at five diverse sites. Low-income fifth and sixth graders who had completed the full 3 years of this first- through third-grade program were tested on the Metropolitan Achievement Test (Intermediate level) and the Wide Range Achievement Test (WRAT).
Becker, W. C., & Gersten, R. (1982). A follow-up of Follow Through: The later effects of the Direct Instruction Model on children in fifth and sixth grades. American Educational Research Journal, 19(1), 75-92.
This paper uses student-level data from a statewide community college system to examine the validity of placement tests and high school information in predicting course grades and college performance.
Belfield, C. R., & Crosta, P. M. (2012). Predicting Success in College: The Importance of Placement Tests and High School Transcripts. CCRC Working Paper No. 42. Community College Research Center, Columbia University.
This article reports on a 4-year longitudinal study of the effects of Literacy Collaborative (LC), a schoolwide reform model that relies primarily on the one-on-one coaching of teachers as a lever for improving student literacy learning.
Biancarosa, G., Bryk, A. S., & Dexter, E. R. (2010). Assessing the value-added effects of literacy collaborative professional development on student learning. The elementary school journal, 111(1), 7-34.
Firm evidence shows that formative assessment is an essential component of classroom work and that its development can raise standards of achievement, Mr. Black and Mr. Wiliam point out. Indeed, they know of no other way of raising standards for which such a strong prima facie case can be made.
Black, P., & Wiliam, D. (2010). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 92(1), 81-90.
Over the objection of the teachers' union, the Board of Education here on Thursday unanimously approved the nation's largest merit pay program, which calls for rewarding teachers based on how well their students perform on standardized tests.
Blumenthal, R. (2006). Houston ties teachers’ pay to test scores. New York Times, 13.
The goal of this paper is to estimate the extent to which there is differential attrition based on teachers' value-added to student achievement.
Boyd, D., Grossman, P., Lankford, H., Loeb, S., & Wyckoff, J. (2008). Who leaves? Teacher attrition and student achievement. Working Paper No. 14022. Cambridge, MA: National Bureau of Economic Research. Retrieved from https://www.nber.org/papers/w14022
This paper examines New York City elementary school teachers’ decisions to stay in the same school, transfer to another school in the district, transfer to another district, or leave teaching in New York state during the first five years of their careers.
Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2005). Explaining the short careers of high-achieving teachers in schools with low-performing students. American Economic Review, 95(2), 166-171.
By estimating the effect of teacher attributes using a value-added model, the analyses in this paper predict that observable qualifications of teachers resulted in average improved achievement for students in the poorest decile of schools of .03 standard deviations.
Boyd, D., Lankford, H., Loeb, S., Rockoff, J., & Wyckoff, J. (2008). The narrowing gap in New York City teacher qualifications and its implications for student achievement in high‐poverty schools. Journal of Policy Analysis and Management: The Journal of the Association for Public Policy Analysis and Management, 27(4), 793-818.
This article is an extended reanalysis of high-stakes testing on achievement. The paper focuses on the performance of states, over the period 1992 to 2000, on the NAEP mathematics assessments for grades 4 and 8.
Braun, H. (2004). Reconsidering the impact of high-stakes testing. Education Policy Analysis Archives, 12(1).
This fourth edition provides in-depth treatments of critical measurement topics, and the chapter authors are acknowledged experts in their respective fields.
Brennan, R. L. (Ed.) (2006). Educational measurement (4th ed.). Westport, CT: Praeger Publishers.
The National Board for Professional Teaching Standards (NBPTS) assesses teaching practice based on videos and essays submitted by teachers. They compared the performance of classrooms of elementary students in Los Angeles randomly assigned to NBPTS applicants and to comparison teachers.
Cantrell, S., Fullerton, J., Kane, T. J., & Staiger, D. O. (2008). National board certification and teacher effectiveness: Evidence from a random assignment experiment (No. w14608). National Bureau of Economic Research.
This paper reviews the four main critiques that have been made of international tests, as well as the rationales and education policy analyses accompanying these critiques. It also discusses a set of four critiques around the underlying social meaning and educational policy value of international test comparisons. These comparisons indicate how students in various countries score on a particular test, but do they carry a larger meaning? The paper closes with recommendations based on these critiques.
Carnoy, M. (2015). International Test Score Comparisons and Educational Policy: A Review of the Critiques. National Education Policy Center.
This study developed a zero-to-five index of the strength of accountability in 50 states based on the use of high-stakes testing to sanction and reward schools, and analyzed whether that index is related to student gains on the NAEP mathematics test in 1996–2000.
Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305-331.
This paper discusses the search for a “magic metric” in education: an index or number that would be generally accepted as the most efficient descriptor of a school’s performance in a district.
Celio, M. B. (2013). Seeking the Magic Metric: Using Evidence to Identify and Track School System Quality. In Performance Feedback: Using Data to Improve Educator Performance (Vol. 3, pp. 97-118). Oakland, CA: The Wing Institute.
This report provides a practical “management guide” for an evidence-based key indicator data decision system for school districts and schools.
Celio, M. B., & Harvey, J. (2005). Buried Treasure: Developing A Management Guide From Mountains of School Data. Center on Reinventing Public Education.
In this report, the author aims to provide an accessible introduction to these new measures of teaching quality and put them into the broader context of concerns over school quality and achievement gaps.
Corcoran, S. P. (2010). Can Teachers Be Evaluated by Their Students' Test Scores? Should They Be? The Use of Value-Added Measures of Teacher Effectiveness in Policy and Practice. Education Policy for Action Series. Annenberg Institute for School Reform at Brown University (NJ1).
Three concurrent validity studies were conducted to determine the relationship between performances on formative measures of reading and standardized achievement measures of reading.
Deno, S. L., Mirkin, P. K., & Chiang, B. (1982). Identifying valid measures of reading. Exceptional children, 49(1), 36-47.
A disproportionate reliance on SAT scores in college admissions has generated a growing number and volume of complaints. Some applicants, especially members of underrepresented minority groups, believe that the test is culturally biased. Other critics argue that high school GPA and results on SAT subject tests are better than scores on the SAT reasoning test at predicting college success, as measured by grades in college and college graduation.
Espenshade, T. J., & Chung, C. Y. (2010). Standardized admission tests, college performance, and campus diversity. Office of Population Research, Princeton University.
This systematic review synthesizes the findings from 30 studies that compared the performance of students at schools using single‐track year‐round calendars to the performance of students at schools using a traditional calendar.
Fitzpatrick, D., & Burns, J. (2019). Single‐track year‐round education for improving academic achievement in US K‐12 schools: Results of a meta‐analysis. Campbell Systematic Reviews, 15(3), e1053.
This assessment of the reliability and validity of skills analysis programs within curriculum-based measurement (CBM), with various groups of handicapped and nonhandicapped youngsters, indicated that the skills analysis programs in spelling and math provided consistent information that related well to the primary graphed CBM scores.
Fuchs, L. S. (1989). The Reliability and Validity of Skills Analysis within Curriculum-Based Measurement. Diagnostique, 14(4), 203-21.
30 special education teachers were assigned randomly to 3 groups: curriculum-based measurement (CBM) with expert system advice (CBM-ES), CBM with no expert system advice (CBM-NES), and control (i.e., no CBM).
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Allinder, R. M. (1991). Effects of expert system advice within curriculum-based measurement on teacher planning and student achievement in spelling. School Psychology Review.
This study assessed the effects of expert system instructional consultation within curriculum-based measurement (CBM).
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement, using a reading maze task. Exceptional children, 58(5), 436-450.
The purpose of this study was to examine the effects of using computer software to store, graph, and analyze student performance data on teacher efficiency and satisfaction with curriculum-based progress-monitoring procedures.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Hasselbring, T. S. (1987). Using computers with curriculum-based monitoring: Effects on teacher efficiency and satisfaction. Journal of Special Education Technology, 8(4), 14-27.
Examined the role of skills analysis (SA) in curriculum-based measurement (CBM) for the purpose of developing more effective instructional (mathematics) programs. 30 special education teachers implemented 1 of 3 treatments for 15 wks with a total of 91 mildly and moderately handicapped pupils.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (1990). The role of skills analysis in curriculum-based measurement in math. School Psychology Review.
This study examined the effectiveness of innovative curriculum-based measurement (CBM) classwide decision-making structures within general education mathematics instruction, with and without recommendations for how to incorporate CBM feedback into instructional planning.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., Phillips, N. B., & Bentz, J. (1994). Classwide curriculum-based measurement: Helping general educators meet the challenge of student diversity. Exceptional Children, 60(6), 518-537.
The purpose of this study was to investigate technical features of a curriculum-based measurement (CBM) system that addresses a concepts and applications mathematics curriculum (i.e., number concepts, counting, applied computation, geometry, measurement, charts, graphs, money, and problem solving).
Fuchs, L. S., Fuchs, D., Hamlett, C. L., Thompson, A., Roberts, P. H., Kubek, P., & Stecker, P. M. (1994). Technical features of a mathematics concepts and applications curriculum-based measurement system. Diagnostique, 19(4), 23-49.
The purposes of this study were to examine how well 3 measures, representing 3 points on a traditional-alternative mathematics assessment continuum, interrelated and discriminated students achieving above, at, and below grade level and to explore effects of cooperative testing for the most innovative measure (performance assessment).
Fuchs, L. S., Fuchs, D., Karns, K., Hamlett, C., Katzaroff, M., & Dutka, S. (1998). Comparisons among individual and cooperative performance assessments and other measures of mathematics competence. The Elementary School Journal, 99(1), 23-51.
This study assessed the efficiency of and teacher satisfaction with curriculum-based measurement (CBM) when student performance data are collected by teachers or by computers.
Fuchs, L. S., Hamlett, C. L., Fuchs, D., Stecker, P. M., & Ferguson, C. (1988). Conducting curriculum-based measurement with computerized data collection: Effects on efficiency and teacher satisfaction. Journal of Special Education Technology, 9(2), 73-86.
The purpose of this study was to assess the effects of (a) ongoing, systematic assessment of student growth (i.e., curriculum-based measurement) and (b) expert system instructional consultation on teacher planning and student achievement in the area of mathematics operations.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (1991). Effects of curriculum-based measurement and consultation on teacher planning and student achievement in mathematics operations. American Educational Research Journal, 28(3), 617-641.
High-school grades are often viewed as an unreliable criterion for college admissions, owing to differences in grading standards across high schools, while standardized tests are seen as methodologically rigorous, providing a more uniform and valid yardstick for assessing student ability and achievement. The present study challenges that conventional view. The study finds that high-school grade point average (HSGPA) is consistently the best predictor not only of freshman grades in college, the outcome indicator most often employed in predictive-validity studies, but of four-year college outcomes as well.
Geiser, S., & Santelices, M. V. (2007). Validity of High-School Grades in Predicting Student Success beyond the Freshman Year: High-School Record vs. Standardized Tests as Indicators of Four-Year College Outcomes. Research & Occasional Paper Series: CSHE. 6.07. Center for studies in higher education.
This study examined the academic and demographic profile of the pool of prospective teachers and then explored how this profile is affected by teacher testing.
Gitomer, D. H., Latham, A. S., & Ziomek, R. (1999). The academic quality of prospective teachers: The impact of admissions and licensure testing. Princeton, NJ: Educational Testing Service. Retrieved from http://www.ets.org/Media/Research/pdf/RR-03-35.pdf
This paper provides the first empirical examination of National Council on Teacher Quality (NCTQ) ratings, beginning with a descriptive overview of the ratings and documentation of how they evolved from 2013-2016, both in aggregate and for programs with different characteristics.
Goldhaber, D., & Koedel, C. (2019). Public Accountability and Nudges: The Effect of an Information Intervention on the Responsiveness of Teacher Education Programs to External Ratings. American Educational Research Journal, 0002831218820863.
Examined the forecasting accuracy of 2 slope estimation procedures (ordinary-least-squares regression and split-middle trend lines) for reading curriculum-based measurement (CBM), a behavioral approach to the assessment of academic skills that emphasizes the direct measurement of academic behaviors.
Good, R. H., & Shinn, M. R. (1990). Forecasting accuracy of slope estimates for reading curriculum-based measurement: Empirical evidence. Behavioral Assessment.
This policy proposal suggests (1) reforms to ensure that the Title I formula gets enough resources to the neediest areas, and (2) improvements in federal guidance and fiscal compliance outreach efforts so that local districts understand the flexibility they have to spend effectively. These are first-order issues for improving high-poverty schools, but they are so deeply mired in technical and bureaucratic detail that they have received little public attention in the reauthorization process.
Gordon, N. (2016). Increasing targeting, flexibility, and transparency in Title I of the Elementary and Secondary Education Act to help disadvantaged students. Policy Proposal, 1.
This study examines whether the results of standardized tests are distorted when rewards and sanctions are attached to them.
Greene, J., Winters, M., & Forster, G. (2004). Testing high-stakes tests: Can we believe the results of accountability tests? The Teachers College Record, 106(6), 1124-1144.
This paper describes a few promising assessment technologies that allow us to capture more direct, repeated, and contextually based measures of student learning, and proposes an improvement-oriented approach to teaching and learning.
Greenwood, C. R., & Maheady, L. (1997). Measurable change in student performance: Forgotten standard in teacher preparation? Teacher Education and Special Education, 20(3), 265-275.
This paper provides direct evidence about the impacts of school job matching on productivity and student achievement.
Hanushek, E. A., & Rivkin, S. G. (2010). Constrained job matching: Does teacher job search harm disadvantaged urban schools? Working Paper No. 15816. Cambridge, MA: National Bureau of Economic Research. Retrieved from https://www.nber.org/papers/w15816.pdf
The authors study the effects of various types of education and training on the ability of teachers to promote student achievement.
Harris, D. N., & Sass, T. R. (2011). Teacher training, teacher quality and student achievement. Journal of Public Economics, 95(7–8), 798-812.
This report aims to provide the public, along with teachers and leaders in the Great City Schools, with objective evidence about the extent of standardized testing in public schools and how these assessments are used.
Hart, R., Casserly, M., Uzzell, R., Palacios, M., Corcoran, A., & Spurgeon, L. (2015). Student Testing in America's Great City Schools: An Inventory and Preliminary Analysis. Council of the Great City Schools.
This paper summarizes recent evidence on what achievement tests measure; how achievement tests relate to other measures of "cognitive ability" like IQ and grades; the important skills that achievement tests miss or mismeasure, and how much these skills matter in life.
Heckman, J. J., & Kautz, T. (2012). Hard evidence on soft skills. Labour economics, 19(4), 451-464.
The report considers the appropriate uses and misuses of high stakes tests in making decisions for students. The fundamental question is whether test scores lead to consequences that are educationally beneficial.
Heubert, J. P., & Hauser, R. M. (1998). High stakes: Testing for tracking, promotion, and graduation. Retrieved from http://files.eric.ed.gov/fulltext/ED439151.pdf
This meta-analysis examines issues of reliability and validity of SAT tests and student grades on student performance in college.
Hezlett, S., Kuncel, N., Vey, A., Ones, D., Campbell, J., & Camara, W. (2001). “The effectiveness of the SAT in predicting success early and late in college: A comprehensive meta-analysis.” Paper presented at the annual meeting of the National Council of Measurement in Education, Seattle, WA.
The purpose of this study is to compare different statistical and methodological approaches to standard setting and determining cut scores using R-CBM and performance on high-stakes tests.
Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34(3), 372.
A meta-analysis on the relationship between the Implicit Association Test (IAT) and corresponding explicit self-report measures was conducted.
Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., & Schmitt, M. (2005). A meta-analysis on the correlation between the Implicit Association Test and explicit self-report measures. Personality and Social Psychology Bulletin, 31(10), 1369-1385.
In the last 20 years, international surveys assessing learning in reading, mathematics and science have been headline news because they put countries in rank order according to performance. The three most well-known surveys are TIMSS, PISA and PIRLS. The surveys offer information about international performance for others to use in order to drive up education standards everywhere. They also emphasise that their aim is to facilitate the dissemination of ideas about which features of education systems lead to the best performance.
International surveys TIMSS, PISA, PIRLS. (2017). Cambridge Assessment International Education. Retrieved from https://www.cambridgeinternational.org/Images/271193-international-surveys-pisa-timss-pirls.pdf
This study evaluated the effects of high-stakes testing on the achievement levels of students in Chicago Public Schools. The data suggest that even though scores went up on the high-stakes tests, scores on “low-stakes” achievement tests did not improve. This suggests that the increases in scores were a function of increases in test-specific skills rather than a general improvement in student learning. These findings give credence to the “teaching to the test” criticisms.
Jacob, B. A. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.401.6599&rep=rep1&type=pdf
This article reports a drop in ACT scores in 2016. ACT officials attribute the drop to the increasing percentage of high school seniors who have taken the test. Generally, when a larger share of students take a test (in some cases prompted by state requirements rather than by the students necessarily being college ready), scores go down.
Jaschik, S. (2016, August). ACT Scores Drop as More Take Test. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2016/08/24/average-act-scores-drop-more-people-take-test
Standardized testing has increasingly been used to hold educators accountable. Incentives are often offered as a way to improve student test performance. This study examines the impact of incentives for students, parents and tutors on standardized test results. The researchers provided incentives on specially designed tests that measure the same skills as the official state standardized tests; however, performance on the official tests was not incentivized. The study finds substantial improvement in performance on the incentivized tests, but the results did not generalize to the official tests. This calls into question how to use incentives effectively so that they actually produce the desired outcomes.
List, J. A., Livingston, J. A., & Neckermann, S. (2016). Do Students Show What They Know on Standardized Tests? Working paper. Retrieved from http://works.bepress.com/jeffrey_livingston/19/
This report presents an in-depth discussion of the analytical methods and findings from the Measures of Effective Teaching (MET) project’s analysis of classroom observations. A nontechnical companion report describes implications for policymakers and practitioners.
Kane, T. J., & Staiger, D. O. (2012). Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains. Research Paper. MET Project. Bill & Melinda Gates Foundation.
The Kansas State Board of Education's Quality Performance Accreditation system is described. Unlike past accreditation methods, which focused on the facilities or institutional characteristics, Quality Performance Accreditation accredits schools based on student performance.
Kansas State Board of Education (199). Kansas Quality Performance Accreditation. Topeka: Author.
This article discusses key issues in identifying evidence-based treatments for children and adolescents. Among the issues discussed are obstacles in transporting treatments from research to clinical services, the weak criteria for delineating whether a treatment is evidence based, and barriers to training therapists.
Kazdin, A. E. (2004). Evidence-based treatments: Challenges and priorities for practice and research. Child and Adolescent Psychiatric Clinics, 13(4), 923-940.
This study evaluated the generalization from high-stakes tests to other measures of achievement. The results show little generalization, suggesting that improvements in high-stakes test scores are the result of the emphasis placed on the tests and the time spent in test preparation rather than an actual increase in student learning.
Koretz, D. M. (1991). The Effects of High-Stakes Testing on Achievement: Preliminary Findings about Generalization across Tests. ERIC. Retrieved from http://files.eric.ed.gov/fulltext/ED340730.pdf
The research reported here investigated the effects of Maryland School Performance Assessment Program (MSPAP) by surveying teachers and principals in two of the three grades in which MSPAP is administered.
Koretz, D., Mitchell, K., Barron, S., & Keith, S. (1996). The perceived effects of the Maryland school performance assessment program. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Assessment (University of California at Los Angeles).
The purpose of this study was to examine the validity of teacher evaluation scores that are derived from an observation tool, adapted from Danielson's Framework for Teaching, designed to assess 22 teaching components from four teaching domains.
Lash, A., Tran, L., & Huang, M. (2016). Examining the Validity of Ratings from a Classroom Observation Instrument for Use in a District's Teacher Evaluation System. REL 2016-135. Regional Educational Laboratory West.
What does it mean to take a scientific approach to instructional productivity? This chapter hopes to contribute to that discussion by examining the role scientific assessment can play in enhancing educational productivity.
Layng, T. J., Stikeleather, G., & Twyman, J. S. (2006). Scientific formative evaluation: The role of individual learners in generating and predicting successful educational outcomes. The scientific basis of educational productivity, 29-44.
In undertaking this study, two goals were established: (1) to obtain a better understanding of how much time students spend taking tests; and (2) to identify the degree to which the tests are mandated by districts or states.
Lazarín, M. (2014). Testing Overload in America's Schools. Center for American Progress.
This study examines the relationship between two dominant measures of teacher quality, teacher qualification and teacher effectiveness (measured by value-added modeling), in terms of their influence on students’ short-term academic growth and long-term educational success (measured by bachelor’s degree attainment).
Lee, S. W. (2018). Pulling back the curtain: Revealing the cumulative importance of high-performing, highly qualified teachers on students’ educational outcome. Educational Evaluation and Policy Analysis, 40(3), 359–381.
This paper explores the power of behavioral economics to influence the level of effort exerted by students in a low-stakes testing environment. It finds a substantial impact on test scores from incentives when the rewards are delivered immediately. There is suggestive evidence that rewards framed as losses outperform those framed as gains.
Levitt, S. D., List, J. A., Neckermann, S., & Sadoff, S. (2016). The behavioralist goes to school: Leveraging behavioral economics to improve educational performance. American Economic Journal: Economic Policy, 8(4), 183-219.
Using a randomized control trial in 11 Chinese primary schools, we studied the effects of pay-for-grades programs on academic cheating. We randomly assigned 82 classrooms into treatment or control conditions, and used a statistical algorithm to determine the occurrence of cheating.
Li, T., & Zhou, Y. (2019). Do Pay-for-Grades Programs Encourage Student Academic Cheating? Evidence from a Randomized Experiment. Frontiers of Education in China, 14(1), 117-137.
The College Board recently released SAT scores for the high school graduating class of 2015. Both math and reading scores declined from 2014, continuing a steady downward trend that has been in place for the past decade. Pundits of contrasting political stripes seized on the scores to bolster their political agendas. Petrilli argued that falling SAT scores show that high schools need more reform. For Burris, the declining scores were evidence of the failure of policies her organization opposes. This article points out that the SAT was never meant to measure national achievement and explains why in detail.
Loveless, T. (2015). No, the sky is not falling: Interpreting the latest SAT scores. Brown Center Chalkboard. Retrieved from https://www.brookings.edu/blog/brown-center-chalkboard/2015/10/01/no-the-sky-is-not-falling-interpreting-the-latest-sat-scores/
This Brown Center Report (BCR) on American Education is the sixth and final edition in the third volume and the 16th issue overall. The series began in 2000. As in the past, the report comprises three studies. Also in keeping with tradition, the first section features recent results from state, national, or international assessments; the second section investigates a thematic topic in education, either by collecting new data or by analyzing existing empirical evidence in a novel way; and the third section looks at one or more education policies.
Loveless, T. (2017). How Well Are American Students Learning? With Sections on the Latest International Test Scores, Foreign Exchange Students, and School Suspensions. The 2017 Brown Center Report on American Education. Retrieved from https://www.brookings.edu/wp-content/uploads/2017/03/2017-brown-center-report-on-american-education.pdf
This paper examines the impact of high stakes testing on minority students. The outcomes suggest that high stakes testing does not have a positive impact on minority students and that in some instances it has negative effects.
Madaus, G. F., & Clarke, M. (2001). The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data. ERIC. Retrieved from http://files.eric.ed.gov/fulltext/ED450183.pdf
This study presents such an approach where the impact of regular and special education on 11 mildly handicapped children is studied by analyzing their slope of improvement on weekly curriculum-based measures (CBM) reading scores.
Marston, D. (1988). The effectiveness of special education: A time series analysis of reading performance in regular and special education settings. The Journal of Special Education, 21(4), 13-26.
There exists a serious need to examine alternative testing models for making educational decisions. In this chapter, this need is documented from the perspective that the traditional model has failed education in two major ways: at the technical level and at the social policy level. Curriculum-based measurement procedures are proposed to redress some of the issues in these domains.
Marston, D. B. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. R. Shinn (Ed.), The Guilford school practitioner series. Curriculum-based measurement: Assessing special children (pp. 18-78). New York, NY, US: Guilford Press.
In a series of two studies, the relative sensitivity of traditional standardized achievement tests and alternative curriculum-based measures was assessed.
Marston, D., Fuchs, L. S., & Deno, S. L. (1986). Measuring pupil progress: A comparison of standardized achievement tests and curriculum-related measures. Diagnostique, 11(2), 77-90.
This study examines the relationship between scores on the SAT and retention to second year of college using student level data from the freshman class of 2006 at 106 four-year institutions.
Mattern, K. D., & Patterson, B. F. (2009). Is performance on the SAT related to college retention?.
This study investigated communicative strategies for helping female students cope with “stereotype threat.” The results demonstrate that priming a positive achieved identity (e.g., private college student) can subdue stereotype threat associated with an ascribed identity (e.g., female).
McGlone, M. S., & Aronson, J. (2007). Forewarning and forearming stereotype-threatened students. Communication Education, 56(2), 119-133.
The goal of this guide is to provide useful information about standardized testing, or assessment, for practitioners and non-practitioners who care about public schools. It includes the nature of assessment, types of assessments and tests, and definitions.
Mitchell, R. (2006). A guide to standardized testing: The nature of assessment. Center for Public Education.
The Classroom Environment Scale (CES) helps create a positive school climate in which more students succeed. The instrument evaluates the effects of course content, teaching methods, teacher personality, class composition and characteristics of the overall classroom environment.
Moos, R. H., & Trickett, E. J. (1979). Classroom Environment Scale Manual (2nd ed.). Palo Alto, CA: Consulting Psychologists Press.
This study examines validity data for SAT scores and student grades for the enrolling classes of 1976 to 1985.
Morgan, R. (1989). “Analysis of the predictive validity of the SAT and high school grades from 1976 to 1983.” College Board Report No. 89-7. New York: College Board.
This article describes the drop in SAT scores in 2016.
Mulhere, K. (2016, September). SAT Scores Take a Dip. Money. Retrieved from http://money.com/money/4508286/average-sat-scores-class-2016/
The measurement unit disability-adjusted life years (DALYs), used in recent years to quantify the burden of diseases, injuries and risk factors on human populations, is grounded on cogent economic and ethical principles and can guide policies toward delivering more cost-effective and equitable health care.
Murray, C. J., & Acharya, A. K. (1997). Understanding DALYs. Journal of health economics, 16(6), 703-730.
This book looks at how testing affects critical decisions for American students. The text focuses on how testing is used in schools to make decisions about tracking and placement, promotion and retention, and awarding or withholding high school diplomas. This book examines the controversies that emerge when a test score can open or close gates on a student's educational pathway.
National Research Council. (1999). High Stakes: Testing for Tracking, Promotion, and Graduation. Washington, DC: National Academies Press.
The paper examines Campbell’s law: “the more any quantitative social indicator is used for social decision making, the more likely the measure will corrupt the social processes it is intended to monitor.” In education, high stakes testing has resulted in widespread cheating, exclusion of low-performing students from testing, encouragement of students to drop out, and a narrowing of the curriculum.
Nichols, S. L., & Berliner, D. C. (2005). The Inevitable Corruption of Indicators and Educators through High-Stakes Testing. Education Policy Research Unit. Retrieved from http://files.eric.ed.gov/fulltext/ED508483.pdf
This study evaluated the relationship between increased accountability via high stakes testing and progress on other measures of educational progress. The results suggest that high stakes testing does not have any positive impact on student achievement.
Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006). High-stakes testing and student achievement: Does accountability pressure increase student learning. education policy analysis …. Retrieved from http://epaa.asu.edu/ojs/index.php/epaa/article/download/72/198
The authors estimate racial/ethnic achievement gaps in several hundred metropolitan areas and several thousand school districts in the United States using the results of roughly 200 million standardized math and English language arts (ELA) tests administered to public school students from 2009 to 2013. They show that the strongest correlates of achievement gaps are local racial/ethnic differences in parental income and educational attainment, local average parental education levels, and patterns of racial/ethnic segregation, consistent with a theoretical model in which family socioeconomic factors affect educational opportunity partly through residential and school segregation patterns.
Reardon, S. F., Kalogrides, D., & Shores, K. (2019). The geography of racial/ethnic test score gaps. American Journal of Sociology, 124(4), 1164-1221.
In this paper, we analyze racial differences in the math section of the general SAT test, using publicly available College Board population data for all of the nearly 1.7 million college-bound seniors in 2015 who took the SAT. The evidence for a stubborn race gap on this test provides a snapshot of the extraordinary magnitude of racial inequality in contemporary American society. Standardized tests are often seen as mechanisms for meritocracy, ensuring fairness in terms of access. But test scores reflect accumulated advantages and disadvantages in each day of life up to the one on which the test is taken. Race gaps on the SAT hold up a mirror to racial inequities in society as a whole. Equalizing educational opportunities and human capital acquisition earlier is the only way to ensure fairer outcomes.
Reeves, R. V., & Halikias, D. (2017). Race gaps in SAT scores highlight inequality and hinder upward mobility. Brookings. Retrieved from https://www.brookings.edu/research/race-gaps-in-sat-scores-highlight-inequality-and-hinder-upward-mobility/
This article presents the evidence for a race gap in SAT math scores and discusses some of the big issues at stake, including the value of the SAT itself; the case for broader policies that take socioeconomic background into account in college admissions; the obsession with four-year college degrees; and the danger of college as a “bottleneck” in the American opportunity structure.
Reeves, R. V. (2017, February). Race gaps in SAT math scores are as big as ever. Brown Center Chalkboard. Retrieved from https://www.brookings.edu/blog/brown-center-chalkboard/2017/02/01/race-gaps-in-sat-math-scores-are-as-big-as-ever/
This review examined the overlap between state-created curriculum evaluation tools and The Hexagon Tool created by the National Implementation Research Network. The author followed systematic procedures while conducting a web search and visiting each state’s department of education website in search of curriculum evaluation tools.
Rolf, R. R. (2019). State Department of Education Support for Implementation Issues Faced by School Districts during the Curriculum Adoption Process. Oakland, CA: The Wing Institute. https://www.winginstitute.org/student-research-2019.
Amrein and Berliner (2002b) compared National Assessment of Educational Progress (NAEP) results in high-stakes states against the national average for NAEP scores. In this analysis, a comparison group was formed from states that did not attach consequences to their state-wide tests.
Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11, 24.
This research considers relationships between student achievement (knowledge and cognitive skill), teacher efficacy (Gibson & Dembo, 1984), and interactions with assigned coaches (self-report measures) in a sample of 18 grade 7 and 8 history teachers in 36 classes implementing a specific innovation with the help of 6 coaches.
Ross, J. A. (1992). Teacher efficacy and the effects of coaching on student achievement. Canadian Journal of Education, 17(1), 51–65.
This table allows you to compare a student’s SAT® scores with the performance of other 2012 college-bound seniors who took the test some time in high school. Please keep in mind that relationships between test scores and other factors are complex and interdependent. Other factors do not directly affect test performance; rather, they are associated with educational experiences both on tests and in schoolwork.
SAT® Percentile Ranks for 2012 College-Bound Seniors: Critical Reading, Mathematics and Writing Percentile Ranks by Gender and Ethnic Groups. (2012). The College Board. Retrieved from http://secure-media.collegeboard.org/digitalServices/pdf/research/SAT-Percentile-Ranks-by-Gender-Ethnicity-2012.pdf
Part of President Bush's strategy for the transformation of "American Schools" lies in an accountability system that would track progress toward the nation's education goals as well as provide the impetus for reform. Here we focus primarily on issues of accountability and student achievement.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Research news and comment: Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21(4), 22-27.
In today's political climate, standardized tests are inadequate and misleading as achievement measures. Educators should employ a variety of measures, improve standardized test content and format, and remove incentives for teaching to the test. Focusing on raising test scores distorts instruction and renders scores less credible. Includes 13 references.
Shepard, L. A. (1989). Why We Need Better Assessments. Educational leadership, 46(7), 4-9.
Shinn, M. R. (1995). Best practices in curriculum-based measurement and its use in a problem-solving model. Best practices in school psychology III, 547-567.
Curriculum-Based Measurement and Special Services for Children is a concise and convenient guide to CBM that demonstrates why it is a valuable assessment procedure, and how it can be effectively utilized by school professionals.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. Guilford Press.
Developed specifically to overcome problems with traditional standardized instruments--and widely used in both general and special education settings throughout the US--curriculum-based measurement (CBM) comprises brief assessment probes of reading, spelling, written expression, and mathematics that serve both to quantify student performance and to bolster academic achievement.
Shinn, M. R. (Ed.). (1998). Advanced applications of curriculum-based measurement. Guilford Press.
Effective ongoing assessment, referred to in the education literature as formative assessment or progress monitoring, is indispensable in promoting teacher and student success. Feedback through formative assessment is ranked at or near the top of practices known to significantly raise student achievement. For decades, formative assessment has been found to be effective in clinical settings and, more important, in typical classroom settings. Formative assessment produces substantial results at a cost significantly below that of other popular school reform initiatives such as smaller class size, charter schools, accountability, and school vouchers. It also serves as a practical diagnostic tool available to all teachers. A core component of formal and informal assessment procedures, formative assessment allows teachers to quickly determine if individual students are progressing at acceptable rates and provides insight into where and how to modify and adapt lessons, with the goal of making sure that students do not fall behind.
States, J., Detrich, R. & Keyworth, R. (2017). Overview of Formative Assessment. Oakland, CA: The Wing Institute. http://www.winginstitute.org/student-formative-assessment.
Summative assessment is an appraisal of learning at the end of an instructional unit or at a specific point in time. It compares student knowledge or skills against standards or benchmarks. Summative assessment includes midterm exams, final project, papers, teacher-designed tests, standardized tests, and high-stakes tests.
States, J., Detrich, R. & Keyworth, R. (2018). Overview of Summative Assessment. Oakland, CA: The Wing Institute. https://www.winginstitute.org/assessment-summative
This investigation contributed to previous research by separating the effects of simply making instructional changes, not based on student performance data, from the effects of making instructional changes in accordance with CBM data.
Stecker, P. M. (1995). Effects of instructional modifications with and without curriculum-based measurement on the mathematics achievement of students with mild disabilities.
The purpose of this document is to provide background information that will be useful in interpreting the 2007 results from the Trends in International Mathematics and Science Study (TIMSS) by comparing its design, features, framework, and items with those of the U.S. National Assessment of Educational Progress and another international assessment in which the United States participates, the Program for International Student Assessment (PISA). The report found that, because there are differences in the features, frameworks, and items of the national and international assessments, direct comparisons among the assessments are not useful. Rather, the results from different studies should be thought of as different lenses through which to view and better understand U.S. student performance.
Stephens, M., and Coleman, M. (2007). Comparing TIMSS with NAEP and PISA in Mathematics and Science. U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved from
This document provides background information that will be useful in interpreting the results from two key international assessments that are being released in November and December 2007 and in comparing these results with recent findings from the U.S. National Assessment of Educational Progress in similar subjects. In sum, there appears to be an advantage in capitalizing on the complementary information presented in national and international assessments. NAEP measures in detail the reading, mathematics and science knowledge of U.S. students as a whole, and can also provide trend information for individual states, different geographic regions, and demographic population groups. International assessments like PIRLS and PISA add value by providing a method for comparing our performance in the United States to the performance of students in other nations. However, their differences need to be recognized when interpreting results.
Stephens, M., & Coleman, M. (2007). Comparing PIRLS and PISA with NAEP in Reading, Mathematics, and Science (Working Paper). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved from https://nces.ed.gov/surveys/PISA/pdf/comppaper12082004.pdf
The classroom assessment procedures of 36 teachers in grades 2 to 12 were studied in depth to determine the extent to which they measure students’ higher order thinking skills in mathematics, science, social studies, and language arts.
Stiggins, R. J., Griswald, M., & Green, K. R. (1988). Measuring Thinking Skills Through Classroom Assessment. Paper presented at the 1988 annual meeting of the National Council on Measurement in Education, New Orleans, April.
This paper examines the use of high stakes testing such as end of course exams in American education. The conclusions are that the exams do not produce substantive changes in instructional practices and the information is useful to measure school and system progress but has limited utility for instructional guidance.
Supovitz, J. (2009). Can high stakes testing leverage educational improvement? Prospects from the last decade of testing and accountability reform. Journal of Educational Change, 10(2-3), 211-227.
The study design was a declarative exposition of potential fallacies in the theoretical underpinnings of cost-effectiveness analysis (CEA).
Ubel, P. A., Nord, E., Gold, M., Menzel, P., Prades, J. L. P., & Richardson, J. (2000). Improving value measurement in cost-effectiveness analysis. Medical Care, 38(9), 892-901.
This article describes the different approaches researchers have taken to answer questions about the social gradient in education across countries. Comparing some of these results highlights weak service delivery in many developing countries. Even where resources may be similar, social gradients are steep in some, indicating much worse educational outcomes for the poor. And public resources are often extremely poorly converted into learning. The differential ability of schools and school systems to convert resources into learning outcomes remains a major impediment to improving educational outcomes, and indeed life chances, for the poor.
Van Der Berg, S. (2015). How does the rich-poor learning gap vary across countries?. Brookings Institution. Retrieved from https://www.brookings.edu/blog/future-development/2015/03/09/how-does-the-rich-poor-learning-gap-vary-across-countries/
This investigation focused on the effects of two independent variables: (a) teacher-developed goals and monitoring systems versus a curriculum-based measurement (CBM) goal and monitoring system; and (b) individual expert versus group follow-up consultation.
Wesson, C. L. (1990). Curriculum-based measurement and two models of follow-up consultation. Exceptional Children, 57(3), 246-256.
This article will describe a CBM which is very efficient and provides the teacher with adequate information for grouping and monitoring progress throughout the school year.
Wesson, C. L., Vierthaler, J. M., & Haubrich, P. A. (1989). An efficient technique for establishing reading groups. The Reading Teacher, 42(7), 466-469.
This paper presents four studies that examine the time required to implement direct and frequent curriculum-based measurement (CBM) as well as strategies to improve the efficiency of CBM. Ten rural special education resource teachers were the subjects.
Wesson, C., Fuchs, L., Tindal, E., Mirkin, P., & Deno, S. L. (1986). Facilitating the efficiency of on-going curriculum-based measurement. Teacher Education and Special Education, 9(4), 166-172.
This study investigates the prediction of college success as defined by a student’s college GPA. We predict college GPA mid-way through and at the end of their college careers using high school GPA (HSGPA), college entrance exam scores (SAT/ACT) and an open-ended, performance-based assessment of critical thinking and writing skills (CLA). 3,137 college sophomores and 1,330 college seniors participated in this study.
Zahner, D., Ramsaran, L. M., & Steedle, J. T. (2012). Comparing alternatives in the prediction of college success. In Annual Meeting of the American Educational Research Association, Vancouver, Canada.
In this study, the reliability of the Motivation Assessment Scale (MAS) was reexamined with two independent groups of developmentally disabled individuals who exhibited self-injurious behavior (SIB) (N = 55).
Zarcone, J. R., Rodgers, T. A., Iwata, B. A., Rourke, D. A., & Dorsey, M. F. (1991). Reliability analysis of the Motivation Assessment Scale: A failure to replicate. Research in Developmental Disabilities, 12(4), 349-360.
The number of students taking Advanced Placement (AP) tests has grown to more than 2.5 million students annually. Overall test scores have remained relatively constant despite a 60% increase in the number of students taking AP exams since 2006. In school year 2015–16, 20% of students taking an AP test passed and were eligible for college credit. The College Board also reports a continuing trend in the significant increase in the number of low-income students participating in the program. Unfortunately, this trend may be negatively impacted by changes in funding. The federal grant program subsidizing AP tests for low-income students has been replaced by block grants in the Every Student Succeeds Act. These funds may still be applied to subsidize low-income populations but are not mandated for this purpose as in the past.
Zubrzycki, J. (2017). 1 in 5 Public School Students in the Class of 2016 Passed an AP Exam. Education Week.