Education Drivers

Teacher Formal Evaluation

Performance appraisals must begin by identifying the purpose of the evaluation. Two different goals are frequently ascribed to performance appraisals: to measure teacher competence and to help improve performance. These two goals are often in conflict. An appraisal that effectively promotes professional development requires trust between the evaluator and the teacher being mentored. Trust is almost impossible to achieve when an appraisal is used as a high-stakes measure that can determine future employment, compensation, or advancement. Research suggests that schools are better served by severing the link between these conflicting and necessary objectives. Separating the two goals increases the likelihood that both can be achieved. Annual performance appraisals are summative; they are snapshots that inform both principal and teacher of satisfactory performance and inform the teacher of any need to improve. For continuous improvement, however, formative assessment is required. Research suggests that effective formative assessment should include systematic classroom observation, student achievement gains, and student and peer performance surveys.

Overview: Formal Teacher Evaluation


Cleaver, S., Detrich, R. & States, J. (2018). Overview of Teacher Formal Evaluation. Oakland, CA: The Wing Institute.

Teacher evaluation has been a regular part of the teacher and principal experience since the early 20th century (Shinkfield & Stufflebeam, 1995). In recent years, teacher evaluation has gained focus as a way to improve teacher practice and student outcomes (Marzano, Frontier, & Livingston, 2011).

In general, teacher evaluation is the process of reviewing teacher performance in the classroom (Sawchuk, 2015). It is a combination of inputs such as teacher behaviors, outputs such as student test data, and methods of evaluation such as teacher observation rubrics (Goe, Bell, & Little, 2008).

Teacher evaluation can be formative or summative. Formative evaluation involves collecting information that can be used to shape instruction; it is often thought of as evaluation to be used for something, in this case improving teaching. Summative evaluation is the evaluation of teaching, often conducted at the end of a school year or other specified period. The information from summative evaluations is used to make decisions about teacher promotion or retention and other modifications.

Formal teacher evaluation, a summative measure, has two goals: (a) to assess teacher competency across a time period (often a school year), and (b) to provide feedback on teacher practice. This feedback may be in the form of communication with the teacher, for example, in an end-of-year review, or it may be in the form of a pay benefit. Either way, the feedback is unlikely to be used in real time as it would be in formative evaluation.

The purpose of this overview is to provide information about the role of formal teacher evaluation, the research that examines the practice, and its impact on student outcomes.

Why Is Formal Teacher Evaluation Important?

A high-quality teacher evaluation system can provide important information about education, work to ensure teacher quality, create a common language around quality instruction, and provide a structure for accountability (Danielson, 2010). First and foremost, it is important because what teachers and principals do each day has a direct impact on students.

Teachers and Principals Matter

When it comes to student achievement, teachers matter (Chetty, Friedman, & Rockoff, 2011) and so do principals (Marzano, Waters, & McNulty, 2005). Examining the impact of factors on student outcomes, Marzano et al. (2005) calculated that 33% of student achievement can be attributed to teachers, and 25% to principals.

Evaluation Drives Good Teaching and Decision Making

For teacher evaluation to be effective, the methods, or how the evaluation is conducted, and purpose must be explicit (Darling-Hammond, Wise, & Pease, 1983). The process of achieving effective evaluation forces conversations and conclusions about what defines good teaching and expected student outcomes (Danielson, 2010). These conversations and the resulting alignment can help orient a teacher evaluation system around best practices and an understanding of quality education.

Once the teacher evaluation system is in place, the results from teacher evaluations drive decisions that range from retention and bonuses to professional development (McDougald, Griffith, Pennington, & Mead, 2016). Originally, advocates of teacher evaluation hoped that a strong teacher evaluation system would force the removal of ineffective teachers, but this has not been the case (Griffith & McDougald, 2016). However, evaluation continues to shape the broader conversation about what is important in education and how it impacts students (McDougald et al., 2016).

Background: Formal Teacher Evaluation

Starting in the 20th century, evaluating teachers became an increasingly important function of the principal’s role (Shinkfield & Stufflebeam, 1995). Throughout the 20th century, teacher evaluation was seen either as a way to decisively evaluate teachers as effective or ineffective, or to support and shape teacher practice (Hazi & Arredondo Rucinski, 2009). For example, in the 1960s and 1970s, the emphasis shifted from evaluation to teacher support and improvement through clinical supervision models (Hazi & Arredondo Rucinski, 2009).

In the 21st century, collective bargaining agreements, legislation, and national reports (e.g., The Nation’s Report Card) have influenced the development of teacher evaluation (Shinkfield & Stufflebeam, 1995). The No Child Left Behind (NCLB) act focused on highly qualified teachers and evaluation as a way to improve instruction. In response to NCLB, Hazi and Arredondo Rucinski (2009) summarized how states were implementing the teacher evaluation aspect of the law. In general, state education leaders took steps to define teacher quality and created indicators for teacher quality. The level of control that states placed on evaluations varied, but the four general types of teacher evaluation activity included:

  • Adopting National Governors Association strategies (e.g., training evaluators)
  • Engaging in increased oversight and involvement in local evaluation practices
  • Decreasing the frequency of evaluation of veteran teachers
  • Increasing data used in evaluation

In the late 2000s, teacher evaluation came under scrutiny when researchers such as Toch and Rothman (2008) indicated that the majority of teachers were rated “above average” and suggested that teacher evaluations did not correlate with student outcomes, meaning that teachers rated “above average” did not produce strong student outcomes. In addition, feedback provided by principals may not be helpful or even accurate as many principals are not effective at identifying quality instruction (Fink & Markholt, 2011). These findings prompted increased scrutiny of how teachers are evaluated and how the information is used (Goe, Holdheide, & Miller, 2011).

Although teacher evaluation was embedded into state law and school district practice during NCLB, states adjusted benchmarks to align with the Common Core State Standards established in 2009 (Aragon, 2018). Then policy was adjusted again with the Every Student Succeeds Act (ESSA), passed in December 2015. Specifically, ESSA rolled back incentives for states and districts to change evaluation policies and altered funding streams (Pennington & Mead, 2016). Under ESSA, states are required to define teacher ineffectiveness but are not required to implement teacher evaluation systems. In response, in 2017, a total of 16 states enacted bills related to the purpose, design, authority over, or progress of teacher evaluation. These bills related to funding, the core focus of evaluation in the state, and the types of data used (Aragon, 2018). In addition, there has been a shift in the type of data incorporated into evaluations; the number of states requiring student growth measures in teacher evaluation decreased to 39 (Aragon, 2018).

In the current policy climate, districts and states continue to prioritize teacher evaluation, and relevant issues remain around the collection and use of teacher data.

Relevant Issues in Formal Teacher Evaluation

The core issues driving formal teacher evaluation are:

  • What should be measured?
  • How do we measure it?
  • What should principals and districts do with formal evaluation information once it has been collected?

What Should Be Measured?

The work that teachers do varies from instruction to engaging with parents, so an important consideration is the type of data that best reflects a teacher’s practice.

Student Assessment Data. One type of data incorporated into teacher evaluation is student test scores as a measure of achievement or mastery (RAND, 2012). Student test data can provide an objective measure linked to student achievement, in contrast to supervisor judgments, which can be subjective (Steele, Hamilton, & Stecher, 2010) or inaccurate (Fink & Markholt, 2011). When used, student test scores must (a) support valid and reliable inferences about how teachers contribute to student achievement and (b) attempt to include teachers that do not teach courses that are directly assessed (e.g., music, gym, and grade levels not included in state testing; Steele et al., 2010).

Alternative Measures. Alternatives to student test data, such as teacher evaluation and student learning objectives, can be used to evaluate teacher performance. These measures have benefits such as increased collaboration and perceived fairness, as well as drawbacks such as cost and implementation challenges (McCullough, English, Angus, & Gill, 2015). Alternative measures also require attention to validity (whether the test is an accurate tool for measuring what it claims to measure) and reliability (how consistent the test is when used multiple times by a variety of assessors) (McCullough et al., 2015). Still, evaluation systems that included alternative measures, particularly those that were able to identify student growth, demonstrated a wider range of teacher performance than systems without alternative measures (McCullough et al., 2015).
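One common way to put a number on the reliability described above is an agreement statistic computed from two assessors rating the same lessons. The sketch below uses Cohen's kappa, which adjusts raw agreement for the agreement expected by chance. It is offered only as an illustration: the function name `cohens_kappa`, the rating labels, and the sample ratings are hypothetical and are not drawn from McCullough et al. (2015).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Share of items on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every item
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four lessons by two observers.
observer_1 = ["basic", "basic", "proficient", "proficient"]
observer_2 = ["basic", "proficient", "basic", "proficient"]
kappa = cohens_kappa(observer_1, observer_2)
```

In this made-up case the observers agree on half the lessons, which is exactly what chance alone would produce given how often each label was used, so kappa is zero. Values closer to 1 indicate more consistent scoring across assessors.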

How Do We Measure?

Tools for collecting and using data for teacher evaluations range from value-added statistical models to observation rubrics.

Value-Added Measures. These measures evaluate teachers based on their impact on student test scores, or the value the teacher adds to student achievement (Hanushek, 1971; Rockoff, 2004). Value-added models estimate a teacher’s impact on student test performance using statistical techniques. Recent research found that value-added measures are a good way to demonstrate how teachers can raise student test scores and are significantly correlated with some long-term effects such as the probability of attending college and higher earnings by age 28 (Chetty et al., 2011). Important questions related to value-added measures include:

  • Do the differences across teachers capture the impact of teachers or differences among students? That is, do value-added measures capture the right information?
  • What are the lasting impacts of being taught by a teacher with a high value-added score on student outcomes?
  • How valid are the measures used in value-added analysis? If invalid measures are used, then the analysis is untrustworthy.

Supporters of value-added measures argue that when decisions are made based on value-added measures, students benefit (Gordon, Kane, & Staiger, 2006; Hanushek, 2009). On the other hand, critics argue that value-added measures do not capture teacher quality (Baker et al., 2010; Corcoran, 2010) and that bias may limit the usefulness of value-added measures (Kane & Staiger, 2008; Rothstein, 2010). Finally, there are concerns about the stability and trustworthiness of value-added measures (David, 2010; Goldhaber & Hansen, 2008).
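As a rough illustration of the idea behind value-added estimation, the sketch below uses a deliberately simplified residual-gain approach: it predicts each student's current score from the prior-year score with a one-variable least-squares fit, then averages the prediction errors for each teacher's students. This is a toy under stated assumptions, not the model used in any of the studies cited here. The function name `value_added`, the record layout, and the scores are all hypothetical, and operational models typically add student background controls and multiple years of data.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Estimate a simple value-added score per teacher.

    records: iterable of (teacher, prior_score, current_score) tuples.
    Returns {teacher: mean residual from a fit of current on prior score}.
    """
    records = list(records)
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    p_bar, c_bar = mean(priors), mean(currents)
    # Ordinary least squares with a single predictor (the prior score).
    sxx = sum((p - p_bar) ** 2 for p in priors)
    if sxx == 0:
        raise ValueError("Prior scores must vary to fit the model.")
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = c_bar - slope * p_bar
    # A student's residual is the actual score minus the predicted score.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Hypothetical data: students with the same prior scores, two teachers.
sample = [
    ("A", 50, 58), ("A", 60, 68), ("A", 70, 78),
    ("B", 50, 52), ("B", 60, 62), ("B", 70, 72),
]
estimates = value_added(sample)
```

With this made-up data, teacher A's students score 3 points above what their prior scores predict and teacher B's students score 3 points below. The sketch also makes the critics' concern concrete: anything that differs between the two classrooms and is not in the model ends up attributed to the teacher.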

Teacher Observations. Classroom observation provides a measure of what is happening during instruction and aligns individual classroom practice with broader quality teaching practices (Danielson, 2010; RAND, 2012). To be effective, observation frameworks must be subject-specific, must be created in collaboration with content experts, and must provide accurate and useful information (Hill & Grossman, 2013). The Danielson Framework for Teaching (FFT) is a commonly used framework for teacher evaluation (Danielson, 1996, 2007).

Danielson Framework for Teaching. FFT includes an extensive rubric over four domains: planning and preparation, classroom environment, instruction, and professional responsibilities. Across these four domains are 76 elements of teaching broken into four levels: unsatisfactory, basic, proficient, and distinguished. Over time and two iterations (1996 and 2007), FFT has become a widely used tool to capture teaching and learning (Marzano et al., 2011).

Research indicates that FFT has acceptable reliability and validity (Lash, Tran, & Huang, 2016). Specifically, there is a positive correlation between teacher evaluation scores with FFT and value-added measures at the classroom level; the range for average validity across three years is -0.6 to 0.35 (Milanowski, 2011). When there is variability in scores, it is attributable to the teacher and not other variables (Kane & Staiger, 2012; Kane, Taylor, Tyler, & Wooten, 2011). In short, FFT has proved to be a tool that has established reliability and validity as a way to capture teacher practice.
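Validity evidence of this kind usually comes from correlating each teacher's observation rating with a value-added estimate for the same teacher. A minimal sketch of that calculation, using the Pearson correlation coefficient, is shown below. The function name `pearson_r` and the sample figures are hypothetical and do not reproduce any data from the studies cited.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two sequences of equal length (at least 2).")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined when a sequence is constant.")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: average observation rating and value-added estimate
# for four teachers.
observation_scores = [2.0, 2.5, 3.0, 3.5]
value_added_estimates = [-0.1, 0.0, 0.05, 0.2]
r = pearson_r(observation_scores, value_added_estimates)
```

A coefficient near 1 would mean that teachers rated highly by observers also tend to have high value-added estimates. A coefficient near 0 would mean the two measures are capturing different things.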

What Should Principals and Districts Do With Formal Evaluation Results?

Districts have attempted to tie teacher evaluation results to job security and bonuses. For example, incentive programs have been implemented in districts ranging from Houston to Memphis with mixed results (Atkinson et al., 2008; Blumenthal, 2016; Springer et al., 2010). School leaders should be mindful of how teacher evaluation results are used. Studies have found that merit pay was not connected to improvements in student outcomes or instruction (Fryer, 2011). In addition, creating an atmosphere of competition by connecting evaluation results to sanctions and punishments had a negative effect on workers (Pink, 2011). Instead, school leaders should use results from formal teacher evaluations to bolster student learning by focusing on how teacher actions impact student results within each building or district, and incorporating teacher evaluation findings into a culture of collaboration (DuFour & Mattos, 2013).

Continuum of Research: Does Formal Teacher Evaluation Have a Positive Impact on Student Outcomes?

Currently, much of the research on teacher evaluation focuses on helping us understand what goes into and results from teacher evaluation; less research directly ties formal teacher evaluation to student outcomes. Studies that have addressed the specific impact of teacher evaluation systems on student outcomes have found mixed results.

For example, a study of the mid-career elementary and middle school teachers in the Cincinnati Public Schools Teacher Evaluation System (TES), which used FFT across seven consecutive years, found that teachers were more effective in advancing math achievement during the year in which they were evaluated. The study did not draw conclusions about what in teacher practice influenced the differences in student achievement (Taylor & Tyler, 2012a, 2012b).

The Gates Foundation studied the effects of teacher evaluation systems across three districts (Hillsborough County Public Schools in Florida, Memphis City Schools, and Pittsburgh Public Schools) and four charter management organizations. The information collected from teacher evaluation systems was used to make decisions about staffing, areas of development, and teacher advancement and compensation. The researchers hypothesized that when the right teacher evaluation system was in place, teaching quality would improve and lead to an increase in student achievement. The final report showed no impact on student achievement or graduation rates, particularly for low-income minority students. One possible explanation for this was teacher buy-in; across the sites, 50% of teachers agreed that the evaluation system would benefit students, a percentage that declined over the years of the initiative. Impacts on student achievement were mixed across the schools, perhaps because it takes longer than the time frame of the study to see a clear impact on student outcomes. Also, there were external changes (e.g., changes in state-level policy) that impacted the implementation (Stecher et al., 2018).

Implications of Research

Teacher evaluation is established as a function of education (Shinkfield & Stufflebeam, 1995). Recent studies have identified problems and offered recommendations for best practice in teacher evaluation.

The Widget Effect (Weisberg, Sexton, Mulhern, & Keeling, 2009) reported that teacher evaluation has been:

  • Infrequent, with teachers going for years without meaningful feedback
  • Not focused on classroom behaviors or practices that are directly tied to student learning
  • Limited in scope, with teachers identified as “unsatisfactory” or “satisfactory”
  • Unhelpful in the type of information provided to teachers
  • Insignificant, providing information that is not used to shape teachers’ work experience or opportunities

In response, the New Teacher Project (2010) proposed that teacher evaluations should:

  • Occur annually, to provide feedback over the course of a teacher’s career
  • Be conducted using clear, rigorous performance expectations based on student learning
  • Include multiple measures (e.g., value-added models, classroom observation data) and ratings to ensure that the range of a teacher’s work is represented
  • Provide a range of achievement levels, such as the four summative ratings of the Danielson Framework for Teaching
  • Provide information that can be incorporated into ongoing conversation and development throughout the year
  • Produce information relevant to teachers and with implications, both positive and negative, for the overall development of the system and individual classrooms

Best practices for using student achievement data in teacher evaluation are also being established. Steele, Hamilton, and Stecher (2010) determined that evaluation systems should:

  • Incorporate multiple measures of teacher effectiveness to increase validity, reduce measurement error, and capture the range of teaching roles beyond the ones regularly tested
  • Ensure that assessment data is reliable and valid, particularly in high-stakes contexts
  • Ensure consistency by providing clear parameters for the selection of measures and using the same measures across classrooms
  • Use multiple years of student data for value-added estimates to increase accuracy and precision of the estimates
  • Find ways to incorporate all students, including those who are not with the teacher for the full year, and teachers who are not easily incorporated into value-added models


The cost-benefit of formal teacher evaluation will vary from district to district, depending on such considerations as type of information gathered, tools used, and staffing involved (Peterson, 2000). One study found that the cost to start a teacher evaluation system across three districts ranged from $8 to $115 per student, which amounted to 0.4% to 0.5% of total district spending (Chambers, Brodziak de los Reyes, & O’Neil, 2013).


Formal teacher evaluation is integrated into many state and district policies, and, even with shifts in federal focus under ESSA, is likely to remain common practice. The goal of formal teacher evaluation is to collect data that accurately represents teacher practice and the connection to student achievement in a valid and reliable way, and use that information to improve the system for teaching and learning. Although conclusions about the impact of teacher evaluation on student achievement are mixed (Stecher et al., 2018; Taylor & Tyler, 2012a, 2012b), ideally collecting and using information about teacher practice can advance the conversation about quality instruction and teaching potential.


Aragon, S. (2018). Teacher evaluations: What is the issue and why does it matter? Policy snapshot. Denver, CO: Education Commission of the States. Retrieved from

Atkinson, A., Burgess, S., Croxon, B., Gregg, P., Propper, C., Slater, H., & Wilson, D. (2008). Evaluating the impact of performance-related pay for teachers in England. Labour Economics, 16(3), 251–261.

Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., Ravitch, D., … Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers (Briefing Paper 278). Washington, DC: Economic Policy Institute.

Blumenthal, R. (2016, January 13). Houston ties teachers’ pay to test scores. The New York Times. Retrieved from

Chambers, J., Brodziak de los Reyes, I., & O’Neil, C. (2013). How much are districts spending to implement teacher evaluation systems: Case studies of Hillsborough County Public Schools, Memphis City Schools, and Pittsburgh Public Schools. Santa Monica, CA: RAND Corporation. Retrieved from

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood (Working Paper 17699). Cambridge, MA: National Bureau of Economic Research. Retrieved from

Corcoran, S. P. (2010). Can teachers be evaluated by their students’ test scores? Should they be? The use of value-added measures for teacher effectiveness in policy and practice. Providence, RI: Annenberg Institute for School Reform at Brown University.

Danielson, C. (1996, 2007). Enhancing professional practice: A framework for teaching (1st and 2nd eds.). Alexandria, VA: ASCD.

Danielson, C. (2010). Evaluations that help teachers learn. Educational Leadership, 68(4), 35–39.

Darling-Hammond, L., Wise, A. E., & Pease, S. R. (1983). Teacher evaluation in the organizational context: A review of the literature. Review of Educational Research, 53(3), 285–328. doi: 10.3102/00346543053003285

David, J. L. (2010). What research says about using value-added measures to evaluate teachers. Educational Leadership, 67(8), 81–82. Retrieved from

DuFour, R., & Mattos, M. (2013). How do principals really improve schools? Educational Leadership, 70(7), 34–40.

Fink, S., & Markholt, A. (2011). Leading for instructional improvement: How successful leaders develop teaching and learning expertise. Hoboken, NJ: John Wiley & Sons.

Fryer, R. G. (2011). Teacher incentives and student achievement: Evidence from New York City schools (Working Paper 16850). Cambridge, MA: National Bureau of Economic Research. Retrieved from

Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved from

Goe, L., Holdheide, L., & Miller, T. (2011). A practical guide to designing comprehensive teacher evaluation systems: A tool to assist in the development of teacher evaluation systems. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved from

Goldhaber, D., & Hansen, M. (2008). Is this just a bad class? Assessing the stability of measured teacher performance (Working Paper 2008-5). Seattle, WA: Center on Reinventing Public Education, University of Washington.

Gordon, R., Kane, T. J., & Staiger, D. O. (2006). Identifying effective teachers using performance on the job (Hamilton Project Discussion Paper). Washington, DC: The Brookings Institution.

Griffith, D., & McDougald, V. (2016). Undue process: Why bad teachers in twenty-five diverse districts rarely get fired. Washington, DC: Thomas B. Fordham Institute. Retrieved from

Hanushek, E. A. (1971). Teacher characteristics and gains in student achievement: Estimation using micro-data. American Economic Review, 61(2), 280–288.

Hanushek, E. A. (2009). Teacher deselection. In D. Goldhaber & J. Hannaway (Eds.), Creating a new teacher profession (pp. 165–180). Washington, DC: Urban Institute Press.

Hazi, H. M., & Arredondo Rucinski, D. (2009). Teacher evaluation as a policy target for improved student learning: A fifty-state review of statute and regulatory action since NCLB. Education Policy Analysis Archives, 17(5).

Hill, H., & Grossman, P. (2013). Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems. Harvard Educational Review, 83(2), 371–384.

Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: An experimental evaluation (Working Paper 14607). Cambridge, MA: National Bureau of Economic Research.

Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Seattle, WA: Bill and Melinda Gates Foundation.

Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2011). Identifying effective classroom practices using achievement data. Journal of Human Resources, 46(3), 587–613.

Lash, A., Tran, L., & Huang, M. (2016). Examining the validity of ratings from a classroom observation instrument for use in a district’s teacher evaluation system (REL 2016-135). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West.

Marzano, R. J., Frontier, T., & Livingston, D. (2011). Effective supervision: Supporting the art and science of teaching. Alexandria, VA: ASCD.

Marzano, R. J., Waters, T., & McNulty, B. A. (2005). School leadership that works: From research to results. Alexandria, VA: ASCD.

McCullough, M., English, B., Angus, M. H., & Gill, B. (2015). Alternative student growth measures for teacher evaluation: Implementation experiences of early-adopting districts (REL 2015-093). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Mid-Atlantic.

McDougald, V., Griffith, D., Pennington, K., & Mead, S. (2016). What is the purpose of teacher evaluation today? A conversation between Bellwether and Fordham. Retrieved from

Milanowski, A. T. (2011, April). Validity research on teacher evaluation systems based on the framework for teaching. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. Retrieved from

The New Teacher Project. (2010). Teacher Evaluation 2.0. New York, NY: Author. Retrieved from

Pennington, K., & Mead, S. (2016). For good measure? Teacher evaluation policy in the ESSA era. Washington, DC: Bellwether Education Partners. Retrieved from

Peterson, K. D. (2000). Teacher evaluation: A comprehensive guide to new directions and practices (2nd ed.). Thousand Oaks, CA: Corwin Press.

Pink, D. H. (2011). Drive: The surprising truth about what motivates us. New York, NY: Riverhead Books.

RAND Education. (2012). Teachers matter: Understanding teachers’ impact on student achievement. Santa Monica, CA: Author. Retrieved from

Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.

Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.

Sawchuk, S. (2015, September 3). Teacher evaluation: An issue overview. Education Week. Retrieved from

Shinkfield, A. J., & Stufflebeam, D. L. (1995). Teacher evaluation: Guide to professional practice. New York, NY: Springer.

Springer, M. G., Ballou, D., Hamilton, L., Le, V., Lockwood, J. R., McCaffrey, D. F., … Stecher, B. M. (2010). Teacher pay for performance: Experimental evidence from the project on incentives in teaching (POINT). Nashville, TN: National Center on Performance Incentives at Vanderbilt University.

Stecher, B. M., Holtzman, D. J., Garet, M. S., Hamilton, L. S., Engberg, J., Steiner, E. D., … Chambers, J. (2018). Improving teaching effectiveness: Final report: The intensive partnerships for effective teaching through 2015–2016. Santa Monica, CA: RAND Corporation.

Steele, J. L., Hamilton, L. S., & Stecher, B. M. (2010). Incorporating student performance measures into teacher evaluation systems. Santa Monica, CA: RAND Corporation. Retrieved from

Taylor, E. S., & Tyler, J. H. (2012a). Can teacher evaluation improve teaching? Evidence of systematic growth in the effectiveness of mid-career teachers. Education Next, 12(4), 79–84. Retrieved from

Taylor, E. S., & Tyler, J. H. (2012b). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628–3651.

Toch, T., & Rothman, R. (2008). Rush to judgment: Teacher evaluation in public education. Washington, DC: Education Sector. Retrieved from

Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on the difference in teacher effectiveness. New York, NY: The New Teacher Project. Retrieved from



Overview of Teacher Evaluation. Oakland, CA: The Wing Institute

Teachers contribute to student achievement. As a practice, teacher evaluation has developed over time. Today, the focus of teacher evaluation is to determine the impact of teaching on student outcomes and for use as professional development. Research on teacher evaluation has produced mixed results. This overview provides information about teacher evaluation as it relates to collecting information about teacher practice and using it to improve student outcomes. The history of teacher evaluation and current research findings and implications are included.

Cleaver, S., Detrich, R. & States, J. (2018). Overview of Teacher Evaluation. Oakland, CA: The Wing Institute.


Data Mining

What is the relationship between teacher working conditions and school performance?

This inquiry looks at the effect of time on the job and the quality of a teacher's skills.

Keyworth, R. (2010). What is the relationship between teacher working conditions and school performance? Retrieved from what-is-relationship-between882.

Enhancing Adherence to a Problem Solving Model for Middle-School Pre-Referral Teams: A Performance Feedback and Checklist Approach

This study looks at the use of performance feedback and checklists to improve middle-school teams' problem solving.

Bartels, S. M., & Mortenson, B. P. (2006). Enhancing adherence to a problem-solving model for middle-school pre-referral teams: A performance feedback and checklist approach. Journal of Applied School Psychology, 22(1), 109-123.

Do Principals Know Good Teaching When They See It?

This article examines the effectiveness and related issues of current methods of principal evaluation of teachers.

Burns, M. (2011). Do Principals Know Good Teaching When They See It? Educational Policy, 19(1), 155-180.

Effective Teaching: What Is It and How Is It Measured?

Research supports a significant difference in the performance of teachers in the top quartile versus the bottom. Teachers working with the most challenging students are often not afforded the status given to teachers working with gifted or advanced placement students. Students in the bottom quartile are frequently assigned less effective teachers, along with new teachers, whom research finds are less effective than experienced peers. Research finds current evaluation systems are unable to effectively assess the ability of teachers. This results in teachers not receiving the feedback that would enable them to improve. These measures infrequently inform teacher assignments, professional development, or career advancement. Teachers are left on their own to determine their strengths and weaknesses. This paper examines how to measure teacher performance and the practices necessary for increasing teacher trust in systems designed to effectively measure performance.

Cantrell, S., & Scantlebury, J. (2011). Effective Teaching: What Is It and How Is It Measured? Effective Teaching as a Civil Right, 28.

The Long-Term Impacts Of Teachers: Teacher Value-Added And Student Outcomes In Adulthood

This paper examines the issue of efficacy of value-added measures in evaluating teachers. This question is important in understanding whether value-added analysis provides unbiased estimates of teachers’ impact on student achievement and whether these teachers improve long-term student outcomes.

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood (No. w17699). National Bureau of Economic Research.

Effects of immediate performance feedback on implementation of behavior support plans, 2005

The purpose of this study is to examine the effects of feedback on treatment integrity for implementing behavior support plans.

Codding, R. S., Feinberg, A. B., Dunn, E. K., & Pace, G. M. (2005). Effects of immediate performance feedback on implementation of behavior support plans. Journal of Applied Behavior Analysis, 38(2), 205-219.

Selecting growth measures for school and teacher evaluations: Should proportionality matter?

In this paper we take up the question of model choice and examine three competing approaches. The first approach, the student growth percentiles (SGPs) framework, eschews all controls for student covariates and schooling environments. The second approach, value-added models (VAMs), controls for student background characteristics and under some conditions can be used to identify the causal effects of schools and teachers. The third approach, also VAM-based, fully levels the playing field so that the correlation between school- and teacher-level growth measures and student demographics is essentially zero. We argue that the third approach is the most desirable for use in educational evaluation systems.

Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2013). Selecting growth measures for school and teacher evaluations: Should proportionality matter? National Center for Analysis of Longitudinal Data in Education Research, 21.

Leading for Instructional Improvement: How Successful Leaders Develop Teaching and Learning Expertise

This book shows how principals and other school leaders can develop the skills necessary for teachers to deliver high quality instruction by introducing principals to a five-part model of effective instruction.

Fink, S., & Markholt, A. (2011). Leading for instructional improvement: How successful leaders develop teaching and learning expertise. John Wiley & Sons.

The Power of Feedback

This paper provides a conceptual analysis of feedback and reviews the evidence related to its impact on learning and achievement.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of educational research, 77(1), 81-112.

Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems

This article discusses the current focus on using teacher observation instruments as part of new teacher evaluation systems being considered and implemented by states and districts. The authors argue that if these teacher observation instruments are to achieve the goal of supporting teachers in improving instructional practice, they must be subject-specific, involve content experts in the process of observation, and provide information that is both accurate and useful for teachers. They discuss the instruments themselves, raters and system design, and the timing of and feedback from the observations.

Hill, H., & Grossman, P. (2013). Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems. Harvard Educational Review, 83(2), 371-384.

Can Principals Identify Effective Teachers? Evidence on Subjective Performance Evaluation in Education

This paper examines how well principals can distinguish between more and less effective teachers. To put principal evaluations in context, we compare them with the traditional determinants of teacher compensation (education and experience) as well as value-added measures of teacher effectiveness.

Jacob, B. A., & Lefgren, L. (2008). Can principals identify effective teachers? Evidence on subjective performance evaluation in education. Journal of Labor Economics, 26(1), 101-136.

Alternative student growth measures for teacher evaluation: Implementation experiences of early-adopting districts

This study examines implementation of alternative student growth measures in a sample of eight school districts that were early adopters of the measures. It builds on an earlier Regional Educational Laboratory Mid-Atlantic report that described the two types of alternative student growth measures (alternative assessment-based value-added models and student learning objectives) in the early-adopting districts.

McCullough, M., English, B., Angus, M. H., & Gill, B. (2015). Alternative student growth measures for teacher evaluation: Implementation experiences of early-adopting districts (No. 8a9dfcb1bc6143608448114ea9b69d06). Mathematica Policy Research.

Teacher Merit Pay and Student Test Scores: A Meta-Analysis

Teacher merit pay has garnered significant attention as a promising reform method for improving teacher performance and, more importantly, student achievement scores. This meta-analysis, which examined findings from 44 studies of teacher merit pay, found that merit pay is associated with a modest, statistically significant, positive effect on student test scores. The research also found that not all merit pay programs are equal. The best results depend on designing programs that incorporate sound, evidence-based practices.

Pham, L., Nguyen, T., & Springer, M. (2017). Teacher Merit Pay and Student Test Scores: A Meta-Analysis. Nashville, TN: Vanderbilt University.

Houston ties teachers’ pay to test scores
This report is a look at Houston’s teacher performance pay system.
Blumenthal, R. (2006). Houston ties teachers’ pay to test scores. New York Times, 13.
Who leaves? Teacher attrition and student achievement
The purpose of this paper is to examine the relationship between student achievement and teacher attrition using value-added modeling for teachers in New York City.
Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2007). Who leaves? Teacher attrition and student achievement (Research Report). Albany, NY: Teacher Policy Research.
Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood
This paper examines the efficacy of value-added measures in evaluating the effectiveness of teachers and their long-term impact on students' lives.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (in press II). Measuring the impact of teachers II: Evaluating bias in teacher value-added estimates. American Economic Review.
Pay for Performance: What Are the Issues?
The purpose of this paper is to look at the impact and value of merit pay, performance pay, and knowledge- and skill-based pay.
Delisio, E. R. (2014). Pay for Performance: What Are the Issues? Education World.
An Evaluation of the Teacher Advancement Program (TAP) in Chicago: Year Two Impact Report
Mathematica researchers review 2008-09 data from Chicago schools participating in the Teacher Advancement Program's annual bonus system, which is based on student achievement.
Glazerman, S., & Seifullah, A. (2010). An Evaluation of the Teacher Advancement Program (TAP) in Chicago: Year Two Impact Report. Mathematica Policy Research, Inc.
Are public schools really losing their “best”?: Assessing the career transitions of teachers and their implication for the quality of the teacher workforce
The purpose of this paper is to examine attrition and mobility of teachers using teacher value-added measures for early-career teachers in North Carolina public schools from 1996 to 2002. The results suggest the best teachers remain in teaching and stay in high-socioeconomic-status and high-performing schools.
Goldhaber, D., Gross, B., & Player, D. (2007). Are public schools really losing their “best”?: Assessing the career transitions of teachers and their implication for the quality of the teacher workforce. Center for Analysis of Longitudinal Data in Education Research (Working Paper 12). Washington, DC: Urban Institute.
Supporting Principals in Implementing Teacher Evaluation Systems
With so much emphasis being placed on improving teacher performance, the National Association of Elementary School Principals and the National Association of Secondary School Principals have developed recommendations to help principals evaluate teachers more effectively.
Grissom, J. A., Loeb, S., & Master, B. (2013). Effective Instructional Time Use for School Leaders: Longitudinal Evidence from Observations of Principals. Educational Researcher, 42(8), 433-444.
Why public schools lose teachers
This paper examines the issue of teacher attrition and the factors that motivate teachers to leave schools. The results indicate that teacher mobility is much more strongly related to characteristics of the student population (race and lower socioeconomic status) and student achievement than to salary. The study finds salary plays a much smaller role in these decisions.
Hanushek, E., Kain, J., & Rivkin, S. (2004). Why public schools lose teachers. Journal of Human Resources, 39(2), 326-354.
Performance Contracts for Administrators
This paper examines the issues surrounding performance-based compensation for school principals and other administrators.
Hertling, E. (1999). Performance contracts for administrators.
Teacher turnover and teacher shortages: An organizational analysis
This paper investigates organizational characteristics and conditions in schools that drive staffing problems and teacher turnover.
Ingersoll, R. (2001). Teacher turnover and teacher shortages: An organizational analysis. American Educational Research Journal, 38(3), 499-534.
Why Schools Have Difficulty Staffing Their Classrooms with Qualified Teachers
This is taken from the testimony of Richard Ingersoll before the Pennsylvania legislature on the issue of turnover in schools.
Ingersoll, R. M. (2013). Why Schools Have Difficulty Staffing Their Classrooms with Qualified Teachers. Retrieved October 3, 2014
American Statistical Association’s Recent Position Statement on Value-Added Models (VAMs): Five Points of Contention
These commentaries critique the work that links teacher value-added models to students’ long-run outcomes.
Interpretation, T. M. Q. Chetty et al. on the American Statistical Association’s Recent Position Statement on Value-Added Models (VAMs): Five Points of Contention.
Do Principals Fire the Worst Teachers?
This paper examines how principals make decisions regarding teacher dismissal. In 2004, the Chicago Public Schools (CPS) and Chicago Teachers Union (CTU) gave principals great flexibility to dismiss probationary teachers for any reason. The study estimates the relative weight that school administrators place on a variety of teacher characteristics and finds evidence that principals do consider teacher absences and value-added measures, along with several demographic characteristics, in determining which teachers to dismiss.
Jacob, B. A. (2010). Do principals fire the worst teachers? (No. w15715). National Bureau of Economic Research.
National Council on Teacher Quality (NCTQ)

The National Council on Teacher Quality works to achieve fundamental changes in the policy and practices of teacher preparation programs, school districts, state governments, and teachers unions.

New Teacher Center
The New Teacher Center provides research, policy analyses, training and support for improving new teacher support and induction.