Teacher Evaluation

Teachers contribute to student achievement. As a practice, teacher evaluation has developed over time. Today, teacher evaluation focuses on determining the impact of teaching on student outcomes and on supporting professional development. Research on teacher evaluation has produced mixed results. This overview provides information about teacher evaluation as it relates to collecting information about teacher practice and using it to improve student outcomes. The history of teacher evaluation and current research findings and implications are included.

Cleaver, S., Detrich, R. & States, J. (2018). Overview of Teacher Evaluation. Oakland, CA: The Wing Institute. https://www.winginstitute.org/assessment-summative.

As students progress through school, many elements (home experiences, classroom instruction, and internal factors) influence their eventual outcomes. In the school environment, a teacher’s skills, strengths, and abilities have as much of an influence on student learning as student background (Wenglinsky, 2002). Put another way, teachers matter; teachers who are effective contribute to positive student outcomes and achievement (Johnson & Zwick, 1990; Nye, Konstantopoulos, & Hedges, 2004; Sanders, Wright, & Horn, 1997), so it is important to understand what effective teachers do that influences student outcomes. Equally important is to provide teachers with information and feedback they can use to become better practitioners. That’s where teacher evaluation comes in.

Teacher Evaluation

Teacher evaluation is conducted to ensure teacher quality and to promote professional learning with the goal of improving future performance (Danielson, 2010). A basic definition of teacher evaluation is the formal process used to review teacher performance and effectiveness in the classroom (Sawchuk, 2015). However, this definition is an oversimplification. In practice, teacher evaluation involves understanding and agreeing on the inputs (e.g., the practices that define quality teaching), outputs (e.g., student achievement measures), and methods of evaluation (e.g., student assessment data, teacher observation rubrics). The elements of evaluation are rarely agreed on (Goe, Bell, & Little, 2008). This overview provides information about teacher evaluation as it relates to collecting information about teacher practice and using it to improve student outcomes.

Teacher Evaluation for Improvement and Accountability

Teacher evaluation serves two purposes: improvement and accountability. Evaluation provides teachers with information that can improve their practice and serve as a starting point for professional development; for example, using information from teacher evaluations to set a plan of study for professional learning community (PLC) meetings. Evaluation provides accountability when information gained from the evaluation is used to guide decisions regarding bonuses, firing, and other human resource decisions (Santiago & Benavides, 2009).

There is an inherent tension between these two purposes. On one hand, when teachers feel they are focused on improvement, accountability can feel incongruent and teachers may not want to provide accurate information because of the risk of revealing weaknesses. On the other hand, when the focus is on accountability, teachers may feel insecure about their work (Santiago & Benavides, 2009). Goals around improvement may hinder the ability to use evaluation for accountability decisions, while goals around accountability may prevent or obfuscate improvement efforts. If the teacher evaluation process becomes too cumbersome or aversive for either the teacher or evaluator, the process will be in jeopardy.

Summative and Formative Evaluation

Teacher evaluation can serve a summative or formative purpose. Summative evaluation provides conclusive evaluation of a teacher’s performance to determine how well that individual has done his or her work (Marzano, 2012). In this type of evaluation, a supervisor evaluates a teacher using a combination of measures that may include student test scores, lesson plans and artifacts, and rating scales or rubrics. Teachers are not involved and the results are used for accountability decisions such as pay awards or dismissal (Marzano, 2012).

Formative evaluation provides ongoing information about teacher practice with the goal of providing feedback that helps teachers improve. Teachers are often involved in the process through self-reflection or self-assessment. The results of the evaluation may be used to give teachers feedback and to make decisions regarding the professional development or coaching support that teachers receive (Sayavedra, 2014).

History and Current State of Teacher Evaluation

In the early 20th century, the framework of scientific management, or the idea that every task can be broken down into its best and most efficient method, was applied to education (Marzano, Frontier, & Livingston, 2011). This started a focus on examining teacher behavior, providing suggestions for feedback, and evaluating effectiveness in the classroom (Marzano et al., 2011). Since World War II, the role of evaluation has evolved. Clinical supervision, popular in the 1960s and 1970s, was the first major trend. It involved a pre-observation conference, teacher observation, reflection, and analysis with a focus on classroom behaviors that directly impacted learning. In the 1980s, the Hunter lesson design, also called mastery teaching, was incorporated into observation and evaluation so that administrators observed a specific lesson sequence: anticipatory set, objective and purpose, input, model, checking for understanding, guided practice, and independent practice (Hunter, 1984).

In the mid-1980s, alternatives to clinical supervision and mastery teaching were proposed. In these alternatives, the teacher became a core element in evaluation and principals were expected to differentiate observation and evaluation depending on teachers’ needs and experience (Marzano et al., 2011). Throughout the 1980s and 1990s, there was a shift away from structured observation, along with a move toward formal teacher evaluation (Marzano et al., 2011).

One of these shifts was prompted by a RAND group study of 32 districts across the United States (Wise, Darling-Hammond, McLaughlin, & Bernstein, 1984). The RAND study concluded that there were four primary concerns regarding then-current evaluation: (a) Principals were not committed or able to provide accurate evaluations, (b) teachers were not open to receiving feedback, (c) evaluation practices were not uniform, and (d) evaluators were not trained (Wise et al., 1984). The RAND study also outlined the following recommendations for evaluation:

  • Evaluation systems should align with goals without being overly prescriptive.
  • Principals need time, training, and oversight to implement evaluations effectively.
  • An evaluation system should align with the overarching purpose (and a district may need multiple evaluations to align with multiple goals).
  • Resources need to be provided and allocated effectively.
  • Teachers need to be involved in the design, monitoring, and implementation of evaluation systems.

Throughout the 20th century, teacher evaluation was a district-level initiative, more focused on teacher behavior and administrative supervision. In the 21st century, teacher evaluation has become a focus of national policy, and the emphasis has shifted to evaluation of teacher quality and student achievement (Marzano et al., 2011).

In the late 2000s, two reports critiqued the teacher evaluation system and set the stage for the current conversation. First, Toch and Rothman’s report Rush to Judgment critiqued teacher evaluation as “superficial and capricious” (2008, p. 1) and asserted that it did not measure student learning. Despite No Child Left Behind requirements, Toch and Rothman found only 14 states that required annual teacher evaluations. Similarly, Weisberg, Sexton, Mulhern, and Keeling (2009), in The Widget Effect, found that fewer than 1% of 15,000 teachers in 12 districts and four states were rated “unsatisfactory” and that little action was taken based on results from teacher evaluations. The authors argued that districts were treating teachers as widgets, or interchangeable parts in a system, not as individual professionals with the potential to have an important impact on instructional effectiveness and student outcomes.

This increased concern about how teacher evaluations were being conducted and used, along with legislation around teacher quality, focused state legislature attention on teacher evaluation (Goe, Holdheide, & Miller, 2011). The current conversation still focuses on how teacher evaluations are conducted; the impact of teacher evaluation on teacher effectiveness and student outcomes; and how results are used, for example, in professional development (Sawchuk, 2015).

Relevant Issues in Teacher Evaluation

Current issues in teacher evaluation revolve around core questions on how to design and implement an evaluation, including what framework to use, what to measure, and how to collect data.


A framework outlines the guiding principles for a teacher evaluation. It provides credibility in the system, and assurance that evaluators can confidently ascertain the quality of teachers (Danielson, 2010). That framework should include:

  • A clear definition of good teaching that is agreed on by everyone involved (Danielson, 2010).
  • An understanding of the purpose of the evaluation, which may be information gathering, accountability, or improvement, or any combination of the three (Goe et al., 2008).
  • A clear purpose that provides information about whether the evaluation is formative or summative, and how the results will be used (Goe et al., 2008).
  • An understanding of who is involved and how, the tools that will be used, and the stakeholders involved (Santiago & Benavides, 2009).


Teacher quality is measured both quantitatively (e.g., student test scores) and qualitatively (e.g., notes on teacher professionalism). An analysis of 120 studies (Goe et al., 2008) identified qualitative elements of effective teachers:

  • Positive contribution to academic, attitudinal, and social outcomes for students
  • Comprehensive lesson planning, progress monitoring, and instruction adaption and evaluation capacity
  • Diversity and civic-mindedness
  • Collaboration with stakeholders (e.g., parents, administrators), particularly for students who are at risk (e.g., those with individualized education programs, or IEPs)

Once the elements that will be measured are clear, how to measure each aspect must be considered. While summative evaluations should include a comprehensive variety of measures that can provide a full picture of a teacher’s effectiveness, formative evaluations may include any range of measures used to collect enough information to serve the purpose of the evaluation. The measures used in formative evaluation may also be more teacher focused, including self-assessment, observation, peer mentoring, and coaching. When coaching and peer mentoring are used, it is important to consider training evaluators in how to deliver feedback that leads to improved teacher performance.

Another consideration for measurement is the reliability and validity of tools. Reliability is how well a tool produces consistent and stable results; validity is how well it measures what it is intended to measure. Tools that are used to measure teacher practices must be reliable and valid: they must provide information that is consistent across multiple evaluators, and they must measure teacher practice without measuring any other factors at the same time. Likewise, tools used to gauge student outcomes must be valid, meaning that the scores must accurately measure the outcome without measuring anything else (Goe et al., 2008).
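One way to make "consistent across multiple evaluators" concrete is to compare the ratings two evaluators assign to the same lessons. The sketch below uses Cohen's kappa, a common agreement statistic that is not named in the sources cited here; it is offered only as an illustration. The function name, the two evaluators, and all ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of lessons")
    n = len(rater_a)
    # Share of lessons on which the two raters gave the same rubric level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own level frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical level for every lesson
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same ten lessons on a four-level rubric.
principal = ["basic", "proficient", "proficient", "distinguished", "basic",
             "proficient", "unsatisfactory", "proficient", "basic", "distinguished"]
peer = ["basic", "proficient", "basic", "distinguished", "basic",
        "proficient", "basic", "proficient", "proficient", "distinguished"]

kappa = cohens_kappa(principal, peer)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the evaluators agree on 7 of 10 lessons, but because some agreement would occur by chance, kappa is lower than 0.70. What counts as acceptable agreement is a judgment that depends on how the evaluation results will be used.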

Blanton et al. (2003) outlined additional criteria that inform the usefulness of a measurement tool:

  • The ability to capture all aspects of a teacher’s effectiveness
  • The ability to capture the range of activities in a teacher’s work
  • Usefulness of the scores to be used for a specific purpose
  • Feasibility, including the cost, training required, and other considerations
  • Credibility or the trust that the stakeholders have in the measure

Charlotte Danielson Framework for Teaching

A common measure used for teacher evaluation is the Charlotte Danielson Framework for Teaching (Danielson, 1996, 2007), which includes an extensive rubric over four domains: planning and preparation, classroom environment, instruction, and professional responsibilities. Across these four domains, the rubric incorporates 76 elements of teaching broken into four levels of performance (unsatisfactory, basic, proficient, and distinguished). Over time and two iterations (1996 and 2007), the Danielson framework has become the primary tool for capturing teaching and learning (Marzano et al., 2011). The Danielson Framework for Teaching (Danielson, 1996) was intended to do three things:

  • Acknowledge the difficulty and complexity of teaching as a profession.
  • Create a language for professional engagement.
  • Provide a structure for teacher assessment and reflection.

Research conducted on the Danielson framework indicates acceptable reliability and validity (Lash, Tran, & Huang, 2016). When there is score variance, it is attributable to the teacher, not other variables (Kane & Staiger, 2012; Kane, Taylor, Tyler, & Wooten, 2011). This means that when a score differs from one evaluation to the next, such as when a teacher advances in the area of planning and preparation from fall to winter, the difference between the two scores occurs because the teacher changed his or her practice, not because the tool was unclear. The reliability of achievement growth scores varies (Kane & Staiger, 2012; Lash et al., 2016). One study that used evaluations from 156 teachers across 18 high-poverty charter schools in the mid-Atlantic concluded that using multiple measures across a school year (in this case, three separate observations using the Danielson framework) provided a reliable measure (Kettler & Reddy, 2017).

Value-Added Measures

Value-added measures are a way to take into account the various conditions and factors that contribute to student achievement, across multiple years of teaching, and in comparison with other teachers. This way of calculating a teacher’s effectiveness was developed in the 2000s using statistical models that could determine how much one teacher contributed to student learning (Goe et al., 2008).
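The general idea can be illustrated with a deliberately simplified sketch: predict each student's current score from the prior-year score, then average the prediction errors for each teacher's students. This is not the model used in any study cited here; operational value-added models adjust for many more factors and draw on multiple years of data. The teachers, scores, and function names below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, current-year score).
records = [
    ("A", 50, 58), ("A", 60, 67), ("A", 70, 78),
    ("B", 50, 52), ("B", 60, 61), ("B", 70, 72),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def value_added(rows):
    """Average gap between actual and predicted scores, per teacher."""
    slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
    residuals = defaultdict(list)
    for teacher, prior, current in rows:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    return {teacher: mean(values) for teacher, values in residuals.items()}


estimates = value_added(records)
for teacher, estimate in sorted(estimates.items()):
    print(f"Teacher {teacher}: {estimate:+.1f} points relative to prediction")
```

In this toy data set, Teacher A's students score about 3 points above what their prior scores predict and Teacher B's about 3 points below. The estimate is relative to the other teachers in the data, which is why value-added results are described as comparative.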

Because they are removed from the immediate classroom experience and seem disconnected from what happens in classrooms, value-added measures are controversial (Goe et al., 2008). However, these measures do have reliability. A study by the Bill and Melinda Gates Foundation (2010) found that teachers whose students showed gains in one assessment were likely to show gains in related assessments that measured conceptual understanding. For example, a math teacher whose students scored high on the state math assessment was likely to have students who also demonstrated a deep knowledge of the core principles of math. The correlations between teacher value-added measures on state tests and deeper understanding were higher for math (0.54) than for reading (0.37). These findings suggest that teachers who produce strong value-added scores on state tests may also develop students’ overarching skills and depth of knowledge about the subject.

As a summative measure, value-added measures provide an overarching look at a teacher’s impact over time. Yet, as a formative tool, value-added measures do not provide information about what high-performing teachers do that makes a difference in student learning (Goe et al., 2008). While value-added models are useful for identifying trends that can be used to make system improvements, multiple reports have recommended against using them for individual personnel decisions (American Statistical Association, 2014; Darling-Hammond et al., 2012; Polikoff & Porter, 2014). Specifically, the American Statistical Association cautioned against using value-added measures because, among other reasons, they are based on only one measure (standardized test scores), and the models may not capture all the factors that contribute to the effect a teacher may have on student outcomes.

Continuum of Research and Impact on Student Outcomes

Teacher evaluation is an established practice directed by state and federal law. However, we do not know the exact or full impact of teacher evaluation practices on student outcomes (e.g., Stecher et al., 2018). Some research has attempted to connect the practice of teacher evaluation with changes in student outcomes. In three notable large-scale studies, teacher evaluation was the practice of assessing teachers using a valid and reliable tool and providing feedback. These studies produced mixed results on student or school-level outcomes.

A quasi-experimental study of mid-career elementary and middle school teachers in the Cincinnati Public Schools Teacher Evaluation System (TES) examined teachers before, during, and after a year-long evaluation. The 105 teachers involved in the study taught fourth- through eighth-grade math. Evaluations, which relied on multiple structured classroom observations by trained peers and administrators, were conducted between the 2003–2004 and 2009–2010 school years. The observations were conducted using a rubric based on the Danielson Framework for Teaching (Danielson, 1996, 2007). Student achievement was compared before, during, and after the teacher’s evaluation year. Teachers were more effective in advancing student achievement in math the year they were evaluated and the years afterward. Specifically, a student who was taught by a teacher who had been through TES scored 11% of a standard deviation (4.5 percentile points for a median student) higher in math compared with a student taught by the same teacher before the evaluation. The study did not identify what about teacher practice accounted for the difference in student achievement. This study supports the use of teacher evaluation to encourage continued growth in mid-career teachers’ performance and a connection to student achievement. Also, performance improvement was greatest for teachers who were weakest at the start of the evaluation (those who received low initial scores or who were ineffective in improving student test scores the year prior to evaluation). Teacher evaluation was a way for teachers who needed the most support, those who scored the lowest on initial evaluations and likely received the most critical feedback, to receive development (Taylor & Tyler, 2012a, 2012b).
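The relationship between an effect expressed in standard deviations and a gain in percentile points can be checked with a short calculation. The sketch below assumes test scores follow a normal distribution, which is an assumption made here for illustration and not a detail taken from the study; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_gain(effect_size_sd, start_percentile=50.0):
    """Percentile points gained when a score rises by effect_size_sd
    standard deviations, assuming normally distributed scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100)
    end_percentile = std_normal.cdf(start_z + effect_size_sd) * 100
    return end_percentile - start_percentile


gain = percentile_gain(0.11)  # 11% of a standard deviation, median student
print(f"{gain:.1f} percentile points")
```

Under this assumption the gain for a median student comes out at roughly 4.4 percentile points, in line with the approximately 4.5 points reported; the small gap may reflect rounding of the effect size or the actual score distribution. The same effect translates into fewer percentile points for students who start far from the median, because fewer students are clustered in the tails.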

In another large-scale study, the Chicago Public Schools’ Excellence in Teaching Project was a teacher evaluation program focused on increasing student learning through principal-teacher conversation. A pilot study included 44 elementary schools in 2008–2009 and an additional 48 schools in 2009–2010. Principals in the first cohort received a total of 50 hours of support across the school year, with training and development in the Danielson framework, best practices in teacher observation and evidence collection, coaching, and implementation. Principals who joined the project in the second year received significantly less support. This difference in support across the two cohorts may have impacted the results. Short-term positive effects on reading performance were found in high-achieving, low-poverty schools, and schools that were in the first cohort performed higher in reading and math than schools in the second cohort. This study suggests that teacher evaluation systems produce different effects at different schools, and that teacher observation can have an impact on school performance (Steinberg & Sartain, 2015).

The Gates Foundation has been extensively involved in teacher evaluation as it relates to student achievement outcomes (Barnum, 2018). In 2018, the Gates Foundation released a cumulative study that reflected its work in three districts (Stecher et al., 2018). The Intensive Partnerships for Effective Teaching initiative was focused on increasing student performance by improving teaching effectiveness. The project started in 2009–2010 in three school districts (Hillsborough County Public Schools in Florida, Memphis City Schools, and Pittsburgh Public Schools) and four charter management organizations. Across multiple years, teaching effectiveness measures collected using a rubric were used to improve staffing, identify areas of development, strengthen professional development, and structure teacher advancement and compensation. The researchers hypothesized that with a strong teaching effectiveness evaluation system in place, teaching quality would increase and lead to greater academic outcomes for students in low-income, minority schools. The final report (Stecher et al., 2018) noted that school sites had implemented the teacher effectiveness practices (evaluation using an observation rubric and subsequent decision-making), but the advancement in student achievement or graduation rates was not realized, particularly for low-income minority students. At the end of the project (2014–2015), student achievement, access to effective teaching, and graduation rates in sites that had participated in the initiative did not differ from those in sites that had not participated. The reason why there was no difference was unclear, although the researchers hypothesized that a focus exclusively on teacher effectiveness may not be enough to improve student outcomes and that other factors may need to be addressed to produce dramatic improvements in student outcomes.


Teacher evaluation is a best practice that can be used to inform decisions when implemented with transparent processes and strong measures. The process of teacher evaluation produces some change in teacher practice that can impact student outcomes during and after the evaluation period (Taylor & Tyler, 2012a, 2012b). However, teacher evaluation may have different impacts on schools with varying demographics and baseline achievement levels (Steinberg & Sartain, 2015). Finally, formative evaluation can provide clear, objective feedback and a structure for collecting and using data to show teachers how they are changing performance, and, in that way, serve as professional development to support low-performing teachers (Taylor & Tyler, 2012a, 2012b).

Cost-Benefit of Teacher Evaluation

The cost-benefit of teacher evaluation encompasses many considerations including student learning outcomes, information gathered, and the ability to make decisions with the information (Peterson, 2000). It is likely that the benefits and costs will be specific to a school or district.  

For example, one study of the cost to start a teacher evaluation system across three districts found that it ranged from $8 to $115 per student, which equated to between 0.4% and 0.5% of total district spending, and between 1% and 1.3% of teacher compensation (Chambers, Brodziak de los Reyes, & O’Neil, 2013). The researchers concluded that their figures did not reflect all potential costs and that the cost of actual implementation might be higher.


Currently, teacher evaluation is understood as a form of professional development. The goal is to establish a rigorous and fair system that can be used to make decisions related to hiring, firing, and promotion, and that can improve teacher practice and student learning (Bill and Melinda Gates Foundation, 2012). This is no easy task as evidenced by the mixed results for large-scale studies that have examined the impact of teacher evaluation on student achievement (Stecher et al., 2018; Steinberg & Sartain, 2015; Taylor & Tyler, 2012a, 2012b).

As a practice, teacher evaluation is an established way to gather information about how teachers are performing in the classroom and is already incorporated into the expectations and day-to-day work of school administrators. With current measures (e.g., the Danielson Framework for Teaching), it is possible to collect reliable and valid data related to teacher performance and use that data to design professional development targeted at teacher needs. With rigorous measures and quality implementation, teacher evaluation, especially formative evaluation, is a tool that, ideally, can be used to improve teacher quality over time.




American Statistical Association. (2014, April 8). ASA statement on using value-added models for educational assessment. Retrieved from https://www.scribd.com/document/217916454/ASA-VAM-Statement-1 

Barnum, M. (2018, June 21). The Gates Foundation bet big on teacher evaluation. The report it commissioned explains how those efforts fell short. Chalkbeat. Retrieved from https://www.chalkbeat.org/posts/us/2018/06/21/the-gates-foundation-bet-big-on-teacher-evaluation-the-report-it-commissioned-explains-how-those-efforts-fell-short/

Bill and Melinda Gates Foundation. (2010). Learning about teaching: Initial findings from the measures of effective teaching project. Retrieved from https://docs.gatesfoundation.org/documents/preliminary-findings-research-paper.pdf

Bill and Melinda Gates Foundation. (2012). Gathering feedback on teaching: Combining high-quality observation with student surveys and achievement gains. Retrieved from http://k12education.gatesfoundation.org/resource/gathering-feedback-on-teaching-combining-high-quality-observations-with-student-surveys-and-achievement-gains-2/

Blanton, L. P., Sindelar, P. T., Correa, V., Harman, M., McDonnell, J., & Kuhel, K. (2003). Conceptions of beginning teacher quality: Models for conducting research (COPSSE Doc. No. RS-6). Gainesville, FL: Center on Personnel Studies in Special Education (COPSSE), University of Florida. Retrieved from http://copsse.education.ufl.edu//docs/RS-6/1/RS-6.pdf

Chambers, J., Brodziak de los Reyes, I., & O’Neil, C. (2013). How much are districts spending to implement teacher evaluation systems? Case studies of Hillsborough County Public Schools, Memphis City Schools, and Pittsburgh Public Schools. Santa Monica, CA: RAND Corporation. Retrieved from: https://www.rand.org/content/dam/rand/pubs/working_papers/WR900/WR989/RAND_WR989.pdf

Danielson, C. (1996, 2007). Enhancing professional practice: A framework for teaching (1st and 2nd eds.). Alexandria, VA: ASCD.

Danielson, C. (2010). Evaluations that help teachers learn. Educational Leadership, 68(4), 35–39. Retrieved from http://www.ascd.org/publications/educational-leadership/dec10/vol68/num04/Evaluations-That-Help-Teachers-Learn.aspx

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012). Evaluating teacher evaluation: Popular modes of evaluating teachers are fraught with inaccuracies and inconsistencies, but the field has identified better approaches. Phi Delta Kappan, 93(6), 8–15. Retrieved from https://www.edweek.org/ew/articles/2012/03/01/kappan_hammond.html

Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved from https://eric.ed.gov/?id=ED521228

Goe, L., Holdheide, L., & Miller, T. (2011). A practical guide to designing comprehensive teacher evaluation systems: A tool to assist in the development of teacher evaluation systems. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved from https://files.eric.ed.gov/fulltext/ED520828.pdf

Hunter, M. (1984). Knowing, teaching, and supervising. In P. Hosford (Ed.), Using what we know about teaching (pp. 169–192). Alexandria, VA: ASCD.

Johnson, E. G., & Zwick, R. (1990). Focusing the new design: The NAEP 1988 technical report. Journal of Educational and Behavioral Studies, 17, 95–109.

Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Seattle, WA: Bill and Melinda Gates Foundation.

Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2011). Identifying effective classroom practices using achievement data. Journal of Human Resources, 46(3), 587–613.

Kettler, R. J., & Reddy, L. A. (2017). Using observational assessment to inform professional development decisions: Alternative scoring for the Danielson Framework for Teaching. Assessment for Effective Intervention, 1–12.

Lash, A., Tran, L., & Huang, M. (2016). Examining the validity of ratings from a classroom observation instrument for use in a district’s teacher evaluation system (REL 2016-135). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West.

Marzano, R. J. (2012). Teacher Evaluation: What’s fair? What’s effective? The two purposes of teacher evaluation. Educational Leadership, 70(3), 14–19. Alexandria, VA: ASCD. Retrieved from http://www.ascd.org/publications/educational-leadership/nov12/vol70/num03/The-Two-Purposes-of-Teacher-Evaluation.aspx

Marzano, R., Frontier, T., & Livingston, D. (2011). Effective supervision: Supporting the art and science of teaching. Alexandria, VA: ASCD.

Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257.

Peterson, K. D. (2000). Teacher evaluation: A comprehensive guide to new directions and practices (2nd ed.). Thousand Oaks, CA: Corwin Press.

Polikoff, M. S., & Porter, A. C. (2014). Instructional alignment as a measure of teacher quality. Educational Evaluation and Policy Analysis, 64(3), 212–225. Retrieved from http://www.aera.net/Newsroom/Recent-AERA-Research/Instructional-Alignment-as-a-Measure-of-Teaching-Quality

Sanders, W. L., Wright, S. P., & Horn, S. P. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11(1), 57–67.

Santiago, P., & Benavides, F. (2009). Teacher evaluation: A conceptual framework and examples of country practices. Organisation for Economic Cooperation and Development (OECD). Retrieved from http://www.oecd.org/education/school/44568106.pdf

Sawchuk, S. (2015, September 3). Teacher Evaluation: An issue overview. Education Week. Retrieved from www.edweek.org/ew/section/multimedia/teacher-performance-evaluation-issue-overview.html

Sayavedra, M. (2014). Teacher evaluation. ORTESOL Journal, 31, 1–9.

Stecher, B. M., Holtzman, D. J., Garet, M. S., Hamilton, L. S., Engberg, J., Steiner, E. D., … Chambers, J. (2018). Improving teaching effectiveness: Final report: The intensive partnerships for effective teaching through 2015–2016. Santa Monica, CA: RAND Corporation. Retrieved from https://www.rand.org/pubs/research_reports/RR2242.html

Steinberg, M. P., & Sartain, L. (2015). Does teacher evaluation improve school performance? Experimental evidence from Chicago’s Excellence in Teaching project. Education Finance and Policy, 10(4), 535–572.

Taylor, E. S., & Tyler, J. H. (2012a). Can teacher evaluation improve teaching? Evidence of systematic growth in the effectiveness of midcareer teachers. Education Next, 12(4). Retrieved from http://educationnext.org/can-teacher-evaluation-improve-teaching/

Taylor, E. S., & Tyler, J. H. (2012b). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628–3651.

Toch, T., & Rothman, R. (2008). Rush to judgment: Teacher evaluation in public education. Washington, DC: Education Sector. Retrieved from https://eric.ed.gov/?id=ED502120

Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. New York, NY: The New Teacher Project. Retrieved from https://tntp.org/publications/view/the-widget-effect-failure-to-act-on-differences-in-teacher-effectiveness

Wenglinsky, H. (2002). The link between teacher classroom practices and student academic performance. Education Policy Analysis Archives, 10(12).

Wise, A. E., Darling-Hammond, L., Tyson-Bernstein, H., & McLaughlin, M. W. (1984). Teacher evaluation: A study of effective practices. Santa Monica, CA: RAND Corporation. Retrieved from https://www.rand.org/pubs/reports/R3139.html




ASA statement on using value-added models for educational assessment

Value-added models (VAMs) have been embraced by many states and school districts as part of educational accountability systems. Value-added assessment (VAA) models attempt to estimate the effects of individual teachers or schools on student achievement while accounting for differences in student background. This paper summarizes the American Statistical Association's analysis of the efficacy of value-added modeling in education.

American Statistical Association. (2014). ASA statement on using value-added models for educational assessment. Alexandria, VA.

The Impact of High-Stakes Tests on Student Academic Performance

The purpose of this study is to assess whether academic achievement in fact increases after the introduction of high-stakes tests. The first objective of this study is to assess whether academic achievement has improved since the introduction of high-stakes testing policies in the 27 states with the highest stakes written into their grade 1-8 testing policies.

Amrein-Beardsley, A., & Berliner, D. C. (2002). The Impact of High-Stakes Tests on Student Academic Performance.

Do Principals Know Good Teaching When They See It?

This article examines the effectiveness and related issues of current methods of principal evaluation of teachers.

Burns, M. (2011). Do Principals Know Good Teaching When They See It? Educational Policy, 19(1), 155-180.

Effective Teaching: What Is It and How Is It Measured?

Research supports a significant difference between the performance of teachers in the top quartile and those in the bottom. Teachers working with the most challenging students are often not afforded the status given to teachers working with gifted or advanced placement students. Students in the bottom quartile are frequently assigned less effective teachers, along with new teachers, whom research finds are less effective than their experienced peers. Research also finds that current evaluation systems are unable to effectively assess the ability of teachers. As a result, teachers do not receive the feedback they need to improve. These measures infrequently inform teacher assignments, professional development, or career advancement, so teachers are left to determine their own strengths and weaknesses. This paper examines how to measure teacher performance and the practices necessary for increasing teacher trust in systems designed to effectively measure performance.

Cantrell, S., & Scantlebury, J. (2011). Effective Teaching: What Is It and How Is It Measured? Effective Teaching as a Civil Right, 28.

Overview: Formal Teacher Evaluation

The purpose of this overview is to provide information about the role of formal teacher evaluation, the research that examines the practice, and its impact on student outcomes.

Cleaver, S., Detrich, R. & States, J. (2018). Overview of Teacher Formal Evaluation. Oakland, CA: The Wing Institute. https://www.winginstitute.org/teacher-evaluation-formal.

Performance Feedback Overview

This overview examines the current understanding of research on performance feedback as a way to improve teacher performance and student outcomes. 

Cleaver, S., Detrich, R. & States, J. (2019). Overview of Performance Feedback. Oakland, CA: The Wing Institute. https://www.winginstitute.org/teacher-evaluation-feedback.

Selecting growth measures for school and teacher evaluations: Should proportionality matter?

In this paper we take up the question of model choice and examine three competing approaches. The first approach, the student growth percentiles (SGPs) framework, eschews all controls for student covariates and schooling environments. The second approach, value-added models (VAMs), controls for student background characteristics and under some conditions can be used to identify the causal effects of schools and teachers. The third approach, also VAM-based, fully levels the playing field so that the correlation between school- and teacher-level growth measures and student demographics is essentially zero. We argue that the third approach is the most desirable for use in educational evaluation systems.

Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2013). Selecting growth measures for school and teacher evaluations: Should proportionality matter? National Center for Analysis of Longitudinal Data in Education Research, 21.

Effective Instructional Time Use for School Leaders: Longitudinal Evidence from Observations of Principals

This study examines principals’ time spent on instructional functions. The results show that the traditional walk-through has little impact, but principals who provide coaching and evaluation and who focus on educational programs can make a difference.

Grissom, J. A., Loeb, S., & Master, B. (2013). Effective Instructional Time Use for School Leaders: Longitudinal Evidence from Observations of Principals. Educational Researcher, 42(8), 433-444.

Reliability and Validity of Inferences about Teachers Based on Student Scores

Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores. This paper examines the use of value-added modeling as a tool for distinguishing effective teachers from ineffective ones.

Haertel, E. H. (2013). Reliability and Validity of Inferences about Teachers Based on Student Scores. William H. Angoff Memorial Lecture Series. Educational Testing Service.

Impact of performance feedback delivered via electronic mail on preschool teachers’ use of descriptive praise.

This paper examined the effects of a professional development intervention that included data-based performance feedback delivered via electronic mail (e-mail) on preschool teachers’ use of descriptive praise and whether increased use of descriptive praise was associated with changes in classroom-wide measures of child engagement and challenging behavior. 

Hemmeter, M. L., Snyder, P., Kinder, K., & Artman, K. (2011). Impact of performance feedback delivered via electronic mail on preschool teachers’ use of descriptive praise. Early Childhood Research Quarterly, 26(1), 96-109.

Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems

This article discusses the current focus on using teacher observation instruments as part of new teacher evaluation systems being considered and implemented by states and districts. The authors argue that if these instruments are to achieve the goal of supporting teachers in improving instructional practice, they must be subject-specific, involve content experts in the process of observation, and provide information that is both accurate and useful for teachers. The authors discuss the instruments themselves, raters and system design, and the timing of and feedback from the observations.

Hill, H., & Grossman, P. (2013). Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems. Harvard Educational Review, 83(2), 371-384.

Can Principals Identify Effective Teachers? Evidence on Subjective Performance Evaluation in Education

This paper examines how well principals can distinguish between more and less effective teachers. To put principal evaluations in context, we compare them with the traditional determinants of teacher compensation (education and experience) as well as value-added measures of teacher effectiveness.

Jacob, B. A., & Lefgren, L. (2008). Can principals identify effective teachers? Evidence on subjective performance evaluation in education. Journal of Labor Economics, 26(1), 101-136.

Have We Identified Effective Teachers? Validating Measures of Effective Teaching Using Random Assignment.

In this study the authors designed the Measures of Effective Teaching (MET) project to test replicable methods for identifying effective teachers. In past reports, the authors described three approaches to measuring different aspects of teaching: student surveys, classroom observations, and a teacher's track record of student achievement gains on state tests.

Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have We Identified Effective Teachers? Validating Measures of Effective Teaching Using Random Assignment. Research Paper. MET Project. Bill & Melinda Gates Foundation.

The impact of feedback frequency on learning and task performance: Challenging the “more is better” assumption.

This paper challenges the “more is better” assumption and proposes that frequent feedback can overwhelm an individual’s cognitive resource capacity, thus reducing task effort and producing an inverted-U relationship with learning and performance over time.

Lam, C. F., DeRue, D. S., Karam, E. P., & Hollenbeck, J. R. (2011). The impact of feedback frequency on learning and task performance: Challenging the “more is better” assumption. Organizational Behavior and Human Decision Processes, 116(2), 217-228.

Alternative student growth measures for teacher evaluation: Implementation experiences of early-adopting districts

This study examines implementation of alternative student growth measures in a sample of eight school districts that were early adopters of the measures. It builds on an earlier Regional Educational Laboratory Mid-Atlantic report that described the two types of alternative student growth measures (alternative assessment–based value-added models and student learning objectives) in the early-adopting districts.

McCullough, M., English, B., Angus, M. H., & Gill, B. (2015). Alternative student growth measures for teacher evaluation: Implementation experiences of early-adopting districts. Mathematica Policy Research.

Providing Teachers with Performance Feedback on Praise to Reduce Student Problem Behavior

This study examined the effect of a visual performance feedback intervention (i.e., a simple, computer-generated line graph) on teachers' rate of praise for students' academic and behavioral performance and subsequent changes in students' rates of problem behavior.

Mesa, J., Lewis-Palmer, T., & Reinke, W. (2005). Providing Teachers with Performance Feedback on Praise to Reduce Student Problem Behavior. Beyond Behavior, 15(1), 3-7.

Teacher job satisfaction and motivation to leave the teaching profession: Relations with school context, feeling of belonging, and emotional exhaustion.

This study examines the relations between school context variables and teachers’ feeling of belonging, emotional exhaustion, job satisfaction, and motivation to leave the teaching profession. Six aspects of the school context were measured: value consonance, supervisory support, relations with colleagues, relations with parents, time pressure, and discipline problems.

Skaalvik, E. M., & Skaalvik, S. (2011). Teacher job satisfaction and motivation to leave the teaching profession: Relations with school context, feeling of belonging, and emotional exhaustion. Teaching and Teacher Education, 27(6), 1029-1038.

Bellwether Education Partners
Bellwether Education Partners is a nonprofit dedicated to helping education organizations in the public, private, and nonprofit sectors.