Education DriversOverview
Decision Making- Overview
- Best Available Evidence- Continuum of Evidence
- Types Of Evidence
- Sources Of Evidence
- Professional Judgment- Sources Of Bias
- Improvement
- Practice Based Evidence
- Contextual Fit
- Client Values
- Treatment Integrity- Dimensions
- Strategies
- Problem Solving
Implementation- Overview
- Exploration/Adoption
- Installation
- Initial Implementation
- Full Implementation
- Social Influence
- Culture Change
- Values
Monitoring- Overview
- Student- Student Formative Assessment
- End of Course Exams
- Standardized Tests
- Grades
- Early Indicators
- Staff- Value Added
- Formal Evaluation
- Informal (Walk Throughs)
- Feedback (Coaching)
- Treatment Integrity
- Systems- Short Term (Formative)
- Long Term (Summative)
- Program Fidelity
External Influences- Overview
- Home- Parenting
- Home Schooling
- Homework
- Poverty- Impact
- Interventions
- Cultural Diversity- Issues
- Solutions
- Policy- Political Policy
- Governmental Agencies
- Special Interests
- Society- Communities
- Media
- Common Frames
- Standards- Societal Outcomes
- Academic Standards
- Evaluation
Quality Teachers- Overview
- Competencies- Formative Assessment
- Classroom Management
- Instructional Delivery
- Soft Skills
- Outreach- Projected Need
- Teacher Standards
- Outreach
- Selection
- Retention- Teacher Turnover Impact
- Teacher Turnover Analysis
- Retention Strategies
- Teacher Preparation- Curriculum Content
- Instructional Effectiveness
- Student Teaching
- Teacher Preparation: Models
- Program Accountability
- Professional Development- Induction
- Inservice
- CEUs / Advanced Degrees
- Certification / Licensing
- Coaching
- Evaluation- Formal Evaluation
- Performance Feedback
- Coaching
Quality Leadership- Overview
- Principal Impact
- Principal Competencies- Teacher Development
- Goals and Expectations
- Ensuring Quality Teaching
- Resourcing Strategically
- Orderly Safe Environment
- Principal Outreach- Needs Analysis
- Principal Standards
- Principal Outreach
- Principal Selection
- Principal Retention- Turnover Impact
- Turnover Analysis
- Retention Strategies
- Principal Preparation- Curriculum Content
- Instructional Effectiveness
- Clinical Practice
- Program Models
- Program Accountability
- Professional Development- Induction
- In-Service Professional Development
- CEUs / Advanced Degrees
- Certification / Licensing
- Coaching
- Evaluation- Formal Evaluation
- Performance Feedback
- Coaching
- School Metrics
- Leadership Models- Teams
- Distributed Leadership
- District/State Leadership- School Districts
- State Education Agencies
Effective Instruction- Overview
- Assessment- Formative Assessment
- Summative Assessment
- Instructional Delivery- Planning
- Active Student Responding
- Corrective Feedback
- Differential Reinforcement
- Mastery Learning
- Quantity Of Instruction
- Classroom Management- Appropriate Behaviors
- Inappropriate Behaviors
- Rules And Procedures
- Structured Environments
- Active Supervision
- Soft Skills/Personal Competencies- Teacher-Student Relationships
- Personal Organization
- Personal Problem Solving
- Evidence-Based Curriculum- Instructional Standards
- Generic Practice Elements
- Subject Practice Elements
- Systematic Reviews
- Treatment Integrity- Dimensions
- Strategies
- Problem Solving
- Remote Learning- Infrastructure
- Curriculum
- Training And Support
- School Programs- Multi-Tiered Systems / Support
- Early Childhood Education
- Vocational Education
- Special Education
- School Climate
Education Resources- Overview
- Staff- Staff Quality
- Staff Support
- Staff Equitable Distribution
- Funding- Amount
- Return On Investment
- Equitable Distribution
- Materials- Curriculum Supplies
- Facilities
- Computer / IT

Summative Assessment

Summative assessment is an appraisal of learning at the end of an instructional unit or at a specific point in time. It compares student knowledge or skills against standards or benchmarks. Summative assessment evaluates the mastery of learning whereas its counterpart, formative assessment, measures progress and functions as a diagnostic tool to help specific students. Generally, summative assessment gauges how a particular population responds to an intervention rather than focusing on an individual. It often aggregates data across students to act as an independent yardstick that allows teachers, administrators, and parents to judge the effectiveness of the materials, curriculum, and instruction used to meet national, state, or local standards. Summative assessment includes midterm exams, final project, papers, teacher-designed tests, standardized tests, and high-stakes tests. As a subset of summative assessment, standardized tests play a pivotal role in ensuring that schools are held to the same standards and that all students regardless of race or socio-economic background perform to expectations. Summative assessment provides educators with the metrics to know what’s working and what’s not.

OVERVIEW WING RESOURCES RESEARCH ORGANIZATIONS

Overview of Summative Assessment

Summative Assessment PDF

States, J., Detrich, R. & Keyworth, R. (2018). Overview of Summative Assessment. Oakland, CA: The Wing Institute. https://www.winginstitute.org/assessment-summative.

Research supports the power of assessment to amplify learning and skill acquisition (Başol & Johanson, 2009). Summative assessment is a form of appraisal that occurs at the end of an instructional unit or at a specific point in time, such as the end of the school year. It evaluates mastery of learning and offers information on what students know and do not know. Frequently, summative assessment consists of evaluation tools designed to measure student performance against predetermined criteria based on specific learning standards. Examples of commonly employed tools include Advanced Placement exams, National Assessment of Education Progress (NAEP), end-of-lesson tests, midterm exams, final project, and term papers. These assessments are routinely used for making high-stakes decisions; for this purpose, often student knowledge or skill acquisition is compared with standards or benchmarks (examples: Common Core Standards and High School Graduation Tests).

What makes summative assessment so invaluable is that each high-stakes test may result in educators using the data for decisions with significant long-term consequences affecting a student’s future. Passing bestows important benefits, such as receiving a high school diploma, a scholarship, or entry into college, and failure can affect a child’s future employment prospects and earning potential as an adult (Geiser & Santelices, 2007). Additionally, summative assessment plays a role in improving future instruction by providing educators with data on the effectiveness of curriculum and instruction. Knowing what methods worked for a lesson or semester may not help current students, but it can provide educators with the necessary insights into how and where to redesign instructional practices to elevate next year’s student scores (Moss, 2013).

Despite the important role of summative assessment in education, research finds little evidence to support it as a critical factor in improved student achievement (Rosenshine, 2003; Yeh, 2007). Figure 1 provides a comparison of the effect size of formative assessment and high-stakes testing (an instrument of summative assessment), gleaned from multiple studies conducted over more than 40 years.

Formative and Summative

Figure 1. Comparison of formative assessment and summative assessment impact on student achievement

Because summative assessment happens after instruction is over, it has little value as a diagnostic tool to guide teachers in making timely adjustments to instruction aimed at catching students who are falling behind. It does not provide teachers with vital information to use in crafting remedial instruction. Formative assessment is a much more effective instrument for adjusting instruction to assist students master material (Garrison & Ehringhaus, 2007; Harlen & James, 1997).

Despite these shortcomings, summative assessment plays a pivotal role in education by troubleshooting weaknesses in the system. It provides educators with valuable information to determine the effectiveness of instruction for a particular unit of study, to make high-stakes decisions, and to evaluate the effectiveness of schoolwide interventions. It works to improve overall instruction (1) by providing feedback on progress measured against benchmarks, (2) by helping teachers to improve, and (3) as an accountability instrument for continuous improvement of systems (Hart et al., 2015)

Types of Summative Assessment

Educators generally rely on two forms of summative assessment: teacher constructed (informal) and standardized (systematic). Teacher-constructed assessment is the most common form of assessment found in classrooms. It can provide objective data for appraising student performance, but it is vulnerable to bias. Standardized assessment is designed to overcome many of the biases that can taint teacher-constructed tools, but this form of assessment have their own limitations. Both types of summative assessment have a place in an effective education system, but for maximum positive effects they should be employed to meet the needs for which they were designed.

Teacher Constructed (Informal)

Teacher-constructed assessment, the most common and frequently applied type of summative assessment, is derived from teachers’ daily interactions and observations of how students behave and perform in school. Since schools began, teachers have depended predominantly on informal assessment, which today includes teacher-constructed tests and quizzes, grades, and portfolios, and relies heavily on a teacher’s professional judgment. Teachers inevitably form judgments, often accurate, about students and their performance (Barnett, 1988; Spencer, Detrich, & Slocum, 2012). Although many of these judgments help teachers understand where students stand in mastering a lesson, a meaningful percentage result in false understandings and conclusions. To be effective, a teacher-constructed assessment must deliver vital information needed for the teacher to make accurate conclusions about each student’s performance in a content area and to feel confident that performance is linked to instruction. Ensuring that a teacher-constructed instrument is reliable and valid is central to the assessment design process.

Research suggests that the main weaknesses of informal assessment relate to validity and reliability (AERA, 1999; Mertler, 1999). That is why it is crucial for teachers to adopt assessment procedures that are valid indicators of a student’s performance (appraise what the assessment claims to) and that the assessment is reliable (provides information that can be replicated).

Validity is a measure of how well an instrument gauges the relevant skills of a student. The research literature identifies three basic types of validity: construct, criterion, and content. Students are best served when the teacher focuses on content validity, that is, making sure the content being tested is actually the content that was taught (Popham, 2014). Content validity requires no statistical calculations whereas both construct validity and criterion validity require knowledge of statistics and thus are not well suited to classroom teachers (Allen & Yen, 2002).

Ultimately, speedy feedback of student performance after an assessment enhances the value of all forms of assessment. To maximize the positive impact, both student and teacher should be provided with detailed and specific information on a student’s achievement. Timely comments and explanations from teachers can clarify how a student performed and are essential components of quality instruction and performance improvement. This information tells students where they stand with regard to the teacher’s expectations. Timely feedback is also essential for teachers (Gibbs & Simpson, 2005). Otherwise, teachers remain in the dark about the effectiveness of their instructional strategies and methods. Research suggests that testing without feedback is likely to produce disappointing results, and the quantity and quality of the research supports including feedback as an integral part of assessment (Başol, 2003).

Designing Teacher-Constructed Assessments

The essential question to ask when developing an informal teacher-constructed assessment is this: Does the assessment consistently assess what the teacher intended to be evaluated based on the material being taught? Best practices in assessment suggest that teachers start answering this question by incorporating assessment design into the instructional design process. Assessments are best generated at the same time as lesson plans. Although teaching to the test has acquired negative overtones, it is precisely what all student assessment is meant to accomplish. Teachers cannot and should not assess every item they teach, but it is important that they identify and prioritize the critical lesson elements for inclusion in a summary assessment.

Instruction and assessment are meant to complement one another. When this occurs it helps teachers, policymakers, administrators, and parents know what students are capable of doing at specific stages in the education process. A good match of assessment with instruction leads to more effective scope and sequencing, enhancing the acquisition of knowledge and the mastery of skills required for success in subsequent grades as well as success after graduation from school (Reigeluth, 1999).

The following are guidelines that lead to increased effectiveness of teacher-constructed assessment (Reynolds, Livingston, Willson, & Willson, 2010; Shillingburg, 2016; Taylor & Nolen, 2005):

Clarify the purpose of the assessment and the intended use of its results.

Define the domain (content and skills) to be assessed.

Match instruction to standards required of each domain.

Identify the characteristics of the population to be assessed and consider how these data might influence the design of the assessment.

Ensure that all prerequisite skills required for the lesson have been taught to the students.

Ensure that the assessment evaluates skills compatible with and required for success in future lessons.

Review with the students the purpose of the assessment and the knowledge and skills to be assessed.

Consider possible task formats, timing, and response modes and whether they are compatible with the assessment as well as how the scores will be used.

Outline how validity will be evaluated and measured.
- Methods include matching test questions to lesson plans, lesson objectives, and standards, and obtaining student feedback after the assessment.
- Content-related evidence often consists of deciding whether the assessment methods are appropriate, whether the tasks or problems provide an adequate sample of the student’s performance, and whether the scoring system captures the performance.
- When possible, review test items with colleagues and students; revise as necessary.

Review issues of reliability.
- Make sure that the assessment includes enough items and tasks (examples of performance) to report a reliable score.
- Evaluate the relative weight allotted to each task, to each content category, and to each skill being assessed.

Pilot-test the assessment, then revise as necessary. Are the results consistent with formative assessments administered on the content being taught?

Standardized (Systematic)

Standardized testing is the second major category of summative assessment commonly used in schools. Students and teachers are very familiar with these standardized tests, which have become ubiquitous. Over the past 20 years, they have played an ever-increasing role in schools, especially since the passage in 2001 of the No Child Left Behind Act (NCLB, 2002). Standardized tests have increased not only in influence but also in quantity. Typically, students are engaged in taking standardized tests between 20 and 25 hours each year (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Hart et al., 2015). The average 8th grader spends between 1.6% and 2.3% of classroom time on standardized tests, not including test preparation (Bangert-Drowns et al., 1991; Lazarín, 2014). A student will be required to participate in approximately 112 mandatory standardized exams during his or her academic career (Hart et al., 2015).

Although research finds that student performance increases with the frequency of assessment, it also shows that improvement tapers off with excess testing (Bangert-Drowns et al., 1991). Regardless of where educators stand on the issue of standardized testing, most can agree that these assessments should be reduced to the minimum number required to obtain the critical information for which they were designed. The aim is to decrease the number of standardized tests to those indispensable in providing educators with the basic information to make high-stakes decisions and for schools to implement a continuous improvement process. Ultimately, everyone is best served by reducing redundancy in test taking in order to maximize instructional time (Wang, Haertel, & Walberg, 1990).

Standardized tests provide valuable data to be used by educators for school reform and continuous improvement purposes. Data from these tests can include early indicators that point to interventions for preventing potential future problems. The data can also reveal when the system has broken down or highlight exemplary performers that schools can emulate. Using such data can be invaluable as a systemwide tool (Celio, 2013). Despite the potential value of summative assessment as a tool to monitor and improve systems, research finds minimal positive impact on student performance when the tests are used for high-stakes purposes or to hold teachers and schools accountable (Carnoy & Loeb, 2002; Hanushek & Raymond, 2005). The increased use of incentives and other accountability measures, which have cost enormous sums, reduced instruction time, and added stress to teachers, can be linked to only an average effect size of 0.05 in improvement of student achievement (Yeh, 2007).

As previously noted, formative assessment has been shown to be a much more effective tool in helping individual students maintain progress toward meeting accepted performance standards, and the rigor and cost required to design valid and reliable standardized tests places them outside the realm of tools that teachers can personally design. In the end, it is important to understand what summative assessment is best suited to accomplish. When it comes to improving systems, standardized assessment is well suited for meeting a school’s needs. But for improving an individual student’s performance, formative assessment is more appropriate.

Summary

Summative assessment is a commonplace tool used by teachers and school administrators. It ranges from a simple teacher-constructed end-of-lesson exam to standardized tests that determine graduation from high school and entry into college. If used for the purposes for which it was designed, summative assessment plays an important role in education. When used appropriately, it can deliver objective data to support a teacher’s professional judgment, to make high-stakes decisions, and as a tool for acquiring the needed information for adjustments in curriculum and instruction that will ultimately improve the education process. When used incorrectly or for accountability purposes, summative assessment can take valuable instruction time away from students and increase teacher and student stress without producing notable results.

Citations

Allen, M. J., & Yen, W. M. (2002). Introduction to measurement theory. Long Grove, IL: Waveland Press.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association (AERA).

Bangert-Drowns, R. L., Kulik, C. L. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of educational research, 61(2), 213–238.

Barnett, D. W. (1988). Professional judgment: A critical appraisal. School Psychology Review, 17(4), 658–672

Başol, G. (2003). Effectiveness of frequent testing over achievement: a meta-analysis study. Unpublished doctorate dissertation, Ohio University, Athens, OH.

Başol, G., & Johanson, G. (2009). Effectiveness of frequent testing over achievement: A meta analysis study. International Journal of Human Sciences, 6(2), 99–121.

Belfield, C. R., & Crosta, P. M. (2012). Predicting success in college: The importance of placement tests and high school transcripts. CCRC Working Paper No. 42. New York, NY: Community College Research Center, Teachers College, Columbia University.

Brennan, R. L. (Ed.) (2006). Educational measurement (4th ed.). Westport, CT: Praeger Publishers.

Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305–331.

Celio, M. B. (2013). Seeking the magic metric: Using evidence to identify and track school system quality. In Performance Feedback: Using Data to Improve Educator Performance (Vol. 3, pp. 97–118). Oakland, CA: The Wing Institute.

Espenshade, T. J., & Chung, C. Y. (2010). Standardized admission tests, college performance, and campus diversity. Unpublished paper, Office of Population Research, Princeton University, Princeton, NJ.

Fuchs, L. S. & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199–208.

Garrison, C., & Ehringhaus, M. (2007). Formative and summative assessments in the classroom. Westerville, OH: Association for Middle Level Education. https://www.amle.org/portals/0/pdf/articles/Formative_Assessment_Article_Aug2013.pdf

Geiser, S., & Santelices, M. V. (2007). Validity of high-school grades in predicting student success beyond the freshman year: High-school record vs. standardized tests as indicators of four-year college outcomes. Research and Occasional Paper Series. Berkeley, CA: Center for Studies in Higher Education, University of California.

Gibbs, G., & Simpson, C. (2005). Conditions under which assessment supports students’ learning. Learning and Teaching in Higher Education, 1, 3–31.

Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297–327.

Harlen, W., & James, M. (1997). Assessment and learning: Differences and relationships between formative and summative assessment. Assessment in Education: Principles, Policy & Practice, 4(3), 365–379.

Hart, R., Casserly, M., Uzzell, R., Palacios, M., Corcoran, A., & Spurgeon, L. (2015). Student testing in America’s great city schools: An inventory and preliminary analysis. Washington, DC: Council of the Great City Schools.

Lazarín, M. (2014). Testing overload in America’s schools. Washington, DC: Center for American Progress.

McMillan, J. H., & Schumacher, S. (1997). Research in education: A conceptual approach (4th ed.). New York, NY: Longman.

Mertler, C. A. (1999). Teachers’ (mis)conceptions of classroom test validity and reliability. Paper presented at the annual meeting of the Mid-Western Educational Research Association, Chicago, IL.

Moss, C. M. (2013). Research on classroom summative assessment. In J. H. McMillan (Ed.), Handbook of research on classroom assessment (pp. 235–255). Los Angeles, CA: Sage.

No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, § 115, Stat. 1425. (2002).

Popham, W. J. (2014). Classroom assessment: What teachers need to know (7th ed.). Boston, MA: Pearson Education.

Reigeluth, C. M. (1999). The elaboration theory: Guidance for scope and sequence decisions. In C. M. Reigeluth (Ed.), Instructional design theories and models: A new paradigm of instructional theory (Vol. II, pp. 425–453). Mahwah, NJ: Lawrence Erlbaum.

Reynolds, C. R., Livingston, R. B., Willson, V., & Willson, V. (2010). Measurement and assessment in education. Upper Saddle River, NJ: Pearson Education.

Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11(24), 1–8.

Spencer, T. D., Detrich, R., & Slocum, T. A. (2012). Evidence-based practice: A framework for making effective decisions. Education and Treatment of Children, 35(2), 127–151.

Shillingburg. W. (2016). Understanding validity and reliability in classroom, school-wide, or district-wide assessments to be used in teacher/principal evaluations. Retrieved from https://cms.azed.gov/home/GetDocumentFile?id=57f6d9b3aadebf0a04b2691a

Taylor, C. S., & Nolen, S. B. (2005). Classroom assessment: Supporting teaching and learning in real classrooms (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Wang, M. C., Haertel, G. D., & Walberg, H. J. (1990). What influences learning? A content analysis of review literature. The Journal of Educational Research, 84(1), 30–43.

Yeh, S. S. (2007). The cost-effectiveness of five policies for improving student achievement. American Journal of Evaluation, 28(4), 416–436.

Publications

TITLE

SYNOPSIS

CITATION

LINK

Assessment Overview

Research recognizes the power of assessment to amplify learning and skill acquisition. This overview describes and compares two types of Assessments educators rely on: Formative Assessment and Summative Assessment.