Evaluating Teacher Performance and Teaching Effectiveness: Conceptual and Methodological Considerations

Educational theory inextricably links teachers to student learning: teachers are the key factor mediating between educational policies and students’ experiences in the classroom, and research consistently shows that a range of teacher and classroom variables exert an important influence on student outcomes. This chapter highlights the key conceptual and methodological issues involved in the evaluation of teaching and teachers, with particular focus on the distinction between the concepts of performance and effectiveness. It considers the implications of assumptions and choices about why the evaluation is conducted, what is evaluated, and how it is evaluated, and presents a range of methods for collecting data on performance and effectiveness. It also analyzes issues related to the reliability and validity of the resulting inferences about teacher performance or effectiveness, along with their implications for policy and practice. Finally, different models of teacher evaluation are presented to illustrate the distinctions and commonalities in evaluating performance and effectiveness in practice.

Notes

Subject knowledge; commitment to student learning; monitoring and managing student learning; reflecting on and learning about their own practice; and membership in learning communities.

Learner development; learning differences; learning environments; content knowledge; application of content; assessment; planning for instruction; instructional strategies; professional learning and ethical practice; and leadership and collaboration.

In 2020, guidelines for remote teaching were issued for the FFT, which focus on components that are thought to be most relevant for online learning and remote instruction (The Danielson Group, 2020).

The area of emotional support encompasses the dimensions of classroom climate, teacher sensitivity, and regard for student perspectives, while classroom organization includes behavior management, productivity, and instructional learning format. Finally, instructional support is operationalized into concept development, quality of feedback, and language modeling.

In these models, teachers who have been identified for their excellence in teaching and mentoring are chosen as coaches to provide support to new teachers as well as experienced colleagues who may require help. Coaches are also responsible for the teachers’ formal personnel evaluations. Typically, coaches do not work in a single school, but are matched with teachers from different schools according to grade level or subject area.

Adequate yearly progress (AYP) targets were defined as the specific amount of progress in standardized test scores that a school, district, or state was expected to make in a year.

Schools can adopt commercially available tests or develop their own, provided these are “rigorous, aligned to content standards, and appropriate for the teacher’s classes and students” (District of Columbia Public Schools, 2011, p. 2; Gitomer & Joyce, 2015).

References

AERA, APA, NCME. (2014). Standards for educational and psychological testing. American Educational Research Association.
Amrein-Beardsley, A. (2008). Methodological concerns about the education value-added assessment system. Educational Researcher, 37(2), 65–75.
Anderson, J. (2013, March 30). Curious grade for teachers: Nearly all pass. New York Times.
Apple, M. W. (2007). Ideological success, educational failure? On the politics of No Child Left Behind. Journal of Teacher Education, 58(2), 108–116.
Australian Institute for Teaching and School Leadership. (2018). Australian Professional Standards for Teachers. AITSL.
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., Shepard, L. A., et al. (2010). Problems with the use of student test scores to evaluate teachers. EPI Briefing Paper (278).
Ball, D. L., & Rowan, B. (2004). Introduction: Measuring instruction. The Elementary School Journal, 5(1), 3–10.
Ball, D. L., Thames, M. H., & Phelps, G. (2008). Content knowledge for teaching: What makes it special? Journal of Teacher Education, 59(5), 389–407.
Bell, C. A., Dobbelaer, M. J., Klette, K., & Visscher, A. (2019). Qualities of classroom observation systems. School Effectiveness and School Improvement, 30(1), 3–29.
Bell, C. A., Klieme, E., & Praetorius, A.-K. (2020). Conceptualising teaching quality into six domains for the Study. In OECD, global teaching insights technical report (pp. 1–24). OECD Publishing.
Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4), 42–51.
Betebenner, D. W. (2011). A technical overview of the student growth percentile methodology: Student growth percentiles and percentile growth projections/trajectories. The National Center for the Improvement of Educational Assessment.
Bill & Melinda Gates Foundation. (2010). Learning about teaching: Initial findings from the measures of effective teaching project. Bill & Melinda Gates Foundation.
Bill & Melinda Gates Foundation. (2012a). Gathering feedback for teaching. Research Paper. Bill & Melinda Gates Foundation.
Bill & Melinda Gates Foundation. (2012b). Asking students about teaching. Policy and practice brief.
Bransford, J., Darling-Hammond, L., & LePage, P. (2005). Introduction. In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world: What teachers should learn and be able to do (pp. 1–39). Jossey-Bass.
Brennan, R. L. (2001). Some problems, pitfalls, and paradoxes in educational measurement. Educational Measurement: Issues and Practice, 20(4), 6–18.
Brookhart, S. M. (2009). The many meanings of multiple measures. Education Leadership, 67(3), 6–12.
Brophy, J., & Goode, T. L. (1986). Teacher behavior and student achievement. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 328–375). MacMillan.
California Commission on Teacher Credentialing. (2009). California standards for the teaching profession (CSTP).
CASEL. (2020). CASEL’S SEL framework: What are the core competence areas and where are they promoted? CASEL.
Close, K., Amrein-Beardsley, A., & Collins, C. (2020). Putting teacher evaluation systems on the map: An overview of state’s teacher evaluation systems post–Every Student Succeeds Act. Education Policy Analysis Archives, 28(58), 1–26.
Cohen, D. K. (1995). Rewarding teachers for student performance. In S. Fuhrman & J. O’Day (Eds.), Rewards and reforms: Creating educational incentives that work. Jossey-Bass.
Cole, M. S., Bedeian, A. G., Hirschfeld, R. R., & Vogel, B. (2011). Dispersion-composition models in multilevel research: A data-analytic framework. Organizational Research Methods, 14(4), 718–734.
Connecticut State Department of Education. (2010). Common core of teaching: Foundational skills. CSDE.
Corcoran, S. P. (2010). Can teachers be evaluated by their students’ test scores? Should they be? The use of value-added measures of teacher effectiveness in policy and practice. Annenberg Institute for School Reform.
Council of Chief State School Officers. (2013). InTASC model core teaching standards and learning progressions for teachers 1.0. CCSO.
Danielson, C. (2013). The framework for teaching evaluation instrument, 2013 edition. Danielson Group.
Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence. Education Policy Analysis Archives, 8(1), 1–44.
Darling-Hammond, L. (2006). Constructing 21st-century teacher education. Journal of Teacher Education, 57(3), 300–314.
Darling-Hammond, L. (2008). Reshaping teaching policy, preparation, and practice: Influences of the National Board for Professional Teaching Standards. In R. Stake, S. Kushner, L. Ingvarson, & J. Hattie (Eds.), Assessing teachers for professional certification: The first decade of the National Board for Professional Teaching Standards (Advances in Program Evaluation) (Vol. 11, pp. 25–53). Emerald Group Publishing Limited.
Darling-Hammond, L. (2015). Can value added add value to teacher evaluation? Educational Researcher, 44(2), 132–137.
Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012). Evaluating teacher evaluation. Phi Delta Kappan, 93(6), 8–15.
De Corte, W., Lievens, F., & Sackett, P. R. (2007). Combining predictors to achieve optimal tradeoffs between selection quality and adverse impact. Journal of Applied Psychology, 92, 1380–1393.
De Pascale, C. (2012). Managing multiple measures. Principal, 91(5), 6–10.
Department for Education, England. (2013). Teachers’ standards: Guidance for school leaders, school staff and governing bodies. DFE.
District of Columbia Public Schools. (2011). Teacher-assessed student achievement data (TAS) guidance. DCPS.
Doss, C. J. (2019). Student growth percentiles 101: Using relative ranks in student test scores to help measure teaching effectiveness. RAND Corporation.
Duncan, A. (2012, August 22). Change is hard. Retrieved from US Department of Education: https://www.ed.gov/news/speeches/change-hard
Dynarski, M. (2016). Teacher observations have been a waste of time and money. Brookings Institution.
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. J. (2014). The sensitivity of value-added estimates to specification adjustments: Evidence from school- and teacher-level models in Missouri. Statistics and Public Policy, 1(1), 19–27.
Elmore, R. F. (1996). Getting to scale with good educational practice. Harvard Educational Review, 66(1), 1–26.
Every Student Succeeds Act, Title I Section 1111(2)(B)(III)(vi) (2015).
Ferguson, R. F. (2012). Can student surveys measure teaching quality? Phi Delta Kappan, 94(3), 24–28.
Florida Department of Education. (2018). 2017–18 District educator evaluation ratings. Retrieved from Archived Statewide District Evaluation Results: http://www.fldoe.org/teaching/performance-evaluation/archive.stml
Gitomer, D. H., & Joyce, J. (2015). A review of the DC IMPACT teacher evaluation system. National Research Council.
Gitomer, D. H., & Zisk, R. C. (2015). Knowing what teachers know. Review of Research in Education, 39, 1–53.
Gitomer, D. H., Martinez, J. F., Battey, D., & Hyland, N. E. (2019). Assessing the assessment: Evidence of reliability and validity in the edTPA. American Educational Research Journal, 58(1), 3–31.
Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D. O., & Whitehurst, G. J. (2011). Passing muster: Evaluating evaluation systems. Brown Center on Education Policy at Brookings.
Glazerman, S., Loeb, S., Goldhaber, D., Staiger, D., Raudenbush, S., & Whitehurst, G. (2010). Evaluating teachers: The important role of value-added. Brookings Institution.
Goe, L. (2007). The link between teacher quality and student outcomes: A research synthesis. National Comprehensive Center for Teacher Quality.
Goe, L., & Croft, A. (2009). Methods of evaluating teacher effectiveness. National Comprehensive Center for Teacher Quality.
Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. National Comprehensive Center for Teacher Quality.
Goldhaber, D., & Anthony, E. (2007). Can teacher quality be effectively assessed? National board certification as a signal of effective teaching. The Review of Economics and Statistics, 89(1), 134–150.
Goldhaber, D., Walch, J., & Gabele, B. (2014). Does the model matter? Exploring the relationship between different student achievement-based teacher assessments. Statistics and Public Policy, 1(1), 28–39.
Goldstein, J., & Noguera, P. A. (2006). A thoughtful approach to teacher evaluation. Educational Leadership, 63(6), 31–37.
Good, T. L. (2014). What do we know about how teachers influence student performance on standardized tests: And why do we know so little about other student outcomes? Teachers College Record, 116, 1–41.
Goodman, S. F., & Turner, L. J. (2013). The design of teacher incentive pay and educational outcomes: Evidence from the New York City bonus program. Journal of Labor Economics, 31(2), 409–420.
Grossman, P., Loeb, S., Cohen, J., & Wyckoff, J. (2013). Measure for measure: The relationship between measures of instructional practice in middle school English language arts and teachers’ value-added scores. American Journal of Education, 119, 445–470.
Guarino, C. M., Reckase, M. D., & Wooldridge, J. M. (2012). Can value-added measures of teacher performance be trusted? Education Policy Center at Michigan State University.
Guarino, C. M., Reckase, M. D., Stacy, B., & Wooldridge, J. M. (2015). A comparison of student growth percentile and value-added models of teacher performance. Statistics and Public Policy, 2(1), 1–11.
Guerriero, S. (2018). Teachers’ pedagogical knowledge and the teaching profession: Background report and project objectives. OECD Publishing.
Hallinger, P., Heck, R. H., & Murphy, J. (2014). Teacher evaluation and school improvement: An analysis of the evidence. Educational Assessment, Evaluation and Accountability, 26(1), 5–28.
Hamilton, L. (2005). Lessons from performance measurement in education. In R. Klitgaard & P. C. Light (Eds.), High-performance government (pp. 381–405). RAND Corporation.
Hamre, B. K., & Pianta, R. C. (2007). Learning opportunities in preschool and early elementary classrooms. In R. C. Pianta, M. J. Cox, & K. L. Snow (Eds.), School readiness & the transition to kindergarten in the era of accountability (pp. 49–84). Paul H. Brookes Publishing Co.
Hanushek, E. A., & Rivkin, S. G. (2010). Using value-added measures of teacher quality. CALDER - Urban Institute.
Hanushek, E. A., Kain, J. F., & Rivkin, S. G. (1999). Do higher salaries buy better teachers? NBER Working Paper No. 7082.
Harris, D. N., & Sass, T. R. (2009). What makes for a good teacher and who can tell? CALDER working paper.
Harris, D. N., Ingle, W. K., & Rutledge, S. A. (2014). How teacher evaluation methods matter for accountability: A comparative analysis of teacher effectiveness ratings by principals and teacher value-added measures. American Educational Research Journal, 51(1), 73–112.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.
Hill, H. C., Blunk, M., Charalambous, C., Lewis, J., Phelps, G. C., Sleep, L., & Ball, D. L. (2008). Mathematical knowledge for teaching and the mathematical quality of instruction: An exploratory study. Cognition and Instruction, 26, 430–511.
Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64.
Jackson, C. K. (2016). What do test scores miss? The importance of teacher effects on non-test score outcomes. NBER.
Johnson, S. M., & Fiarman, S. E. (2012). The potential of peer review. Educational Leadership, 70(3), 20–25.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). American Council on Education.
Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. NBER Working Paper 14607.
Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching. Bill & Melinda Gates Foundation. Retrieved from http://k12education.gatesfoundation.org/download/?Num=2680&filename=MET_Gathering_Feedback_Research_Paper1.pdf
Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2011, Summer). Evaluating teacher effectiveness: Can classroom observations identify practices that raise achievement? Education Next (pp. 55–60).
Kane, T., McCaffrey, D., Miller, T., & Staiger, D. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Research Paper. Bill & Melinda Gates Foundation.
Kennedy, M. M. (2008). Sorting out teacher quality. Phi Delta Kappan, 90(1), 59–63.
Kloser, M. (2014). Identifying a core set of science teaching practices: A Delphi expert panel approach. Journal of Research in Science Teaching, 51(9), 1185–1217.
Kloser, M., Edelman, A., Floyd, C., Martinez, J. F., Stecher, B., Srinivasan, J., & Lavin, E. (2021). Interrogating practice or show and tell? Using a digital portfolio to anchor a professional learning community of science teachers. Journal of Science Teacher Education, 32(2), 210–241.
Kuhfeld, M. (2017). When students grade their teachers: A validity analysis of the Tripod student survey. Educational Assessment, 22(4), 253–274.
Kurtz, M. D. (2018). Value-added and student growth percentile models: What drives differences in estimated classroom effects? Statistics and Public Policy, 5(1), 1–8.
Lachlan-Haché, L., Cushing, E., & Bivona, L. (2012a). Student learning objectives as measures of educator effectiveness: The basics. American Institutes for Research.
Lachlan-Haché, L., Cushing, E., & Bivona, L. (2012b). Student learning objectives: Benefits, challenges, and solutions. American Institutes for Research.
LAUSD. (2021a, April 3). History of EDST. Retrieved from https://achieve.lausd.net/Page/11782#spn-content
LAUSD. (2021b). Teaching and learning framework. LAUSD.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Lockwood, J. R., McCaffrey, D. F., Hamilton, L. S., Stecher, B., Le, V.-N., & Martinez, J. F. (2007). The sensitivity of value-added teacher effect estimates to different mathematics achievement measures. Journal of Educational Measurement, 44(1), 47–67.
Los Angeles Unified School District. (2019). 2018–2019 EDS final evaluation report for teachers and non-classroom teachers: Administrator handbook. LAUSD.
Maine Department of Education. (2012). Common core teaching standards. MDE.
Martínez Rizo, F. (2015). La evaluación del desempeño docente. Una propuesta para la educación básica en México. In G. Guevara Niebla, M. T. Melendez Irigoyen, F. E. Ramon Castaño, H. Sanchez Perez, & F. Tirado Segura (Eds.), La evaluación docente en México (pp. 64–95). INEE-Fondo de Cultura Económica.
Martinez, J. F. (2012). Consequences of omitting the classroom in multilevel models of schooling: An illustration using opportunity to learn and reading achievement. School Effectiveness and School Improvement, 23(3), 305–326.
Martinez, J. F., & Fernandez, M. P. (2019). Evaluación docente con indicadores múltiples: Consideraciones conceptuales y metodológicas en torno a la validez. In J. Manzi, M. R. Garcia, & S. Taut (Eds.), Validez de Evaluaciones Educacionales en Chile y Latinoamérica (pp. 531–562). Ediciones UC.
Martinez, J. F., Borko, H., & Stecher, B. (2012). Measuring instructional practices in middle school science using classroom artifacts. Journal for Research in Science Teaching, 49, 38–67.
Martinez, J. F., Schweig, J., & Goldschmidt, P. (2016a). Approaches for combining multiple measures of teacher performance: Reliability, validity, and implications for evaluation policy. Educational Evaluation and Policy Analysis, 38(4), 738–756.
Martinez, J. F., Taut, S., & Schaaf, K. (2016b). Classroom observation for evaluating and improving teaching: An international perspective. Studies in Educational Evaluation, 49, 15–29.
Marzano, R. J., & Toth, M. D. (2013). Teacher evaluation that makes a difference: A new model for teacher growth and student achievement. ASCD.
Matsumura, L. C., Garnier, H. E., Slater, S. C., & Boston, M. D. (2008). Toward measuring instructional interactions “at-scale.” Educational Assessment, 13, 267–300.
Medley, D. M., & Coker, H. (1987). The accuracy of principals’ judgments of teacher performance. The Journal of Educational Research, 80(4), 242–247.
Meyer, R. H. (1996). Value-added indicators of school performance. In E. A. Hanushek & D. W. Jorgenson (Eds.), Improving America’s schools: The role of incentives (pp. 197–223). The National Academies Press.
Meyer, R., Pier, L., Mader, J., Christian, M., Rice, A., Loeb, S., Hough, H., et al. (2019). Can we measure classroom supports for social-emotional learning? Applying value-added models to student surveys in the CORE districts. PACE.
Mihaly, K., McCaffrey, D., Staiger, D., & Lockwood, J. R. (2013). A composite estimator of effective teaching (MET Project). The RAND Corporation.
Millman, J. (1981). Student achievement as a measure of teacher competence. In Handbook of teacher evaluation (pp. 146–166). Sage.
Ministry of Education, Chile. (2008). Marco para la Buena Enseñanza. MINEDUC.
Muijs, D. (2006). Measuring teacher effectiveness: Some methodological reflections. Educational Research and Evaluation, 12(1), 53–74.
Mullens, J. E. (1995). Classroom instructional processes: A review of existing measurement approaches and their applicability for the teacher followup survey. U.S. Department of Education.
Mullis, I. V., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 international results in mathematics and science. TIMSS & PIRLS International Study Center.
National Board for Professional Teaching Standards. (2016). What teachers should know and be able to do (2nd ed.). NBPTS.
National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. U.S. Department of Education.
National Council of Teachers in Mathematics. (2000). Principles and standards for school mathematics. NCTM.
National Research Council. (2010). Preparing teachers: Building evidence for sound policy. National Academy of Sciences.
NCTQ. (2015). State teacher policy yearbook: National summary. National Council on Teacher Quality (NCTQ).
NCTQ. (2017). Running in place: How new teacher evaluations fail to live up to promises. NCTQ.
NCTQ. (2019). State of the states 2019: Teacher & principal evaluation policy. National Council on Teacher Quality (NCTQ).
New York City Department of Education. (2019). Advance guide for educators 2019–2020. NYCDE.
OECD. (2013). Teachers for the 21st century: Using evaluation to improve teaching. OECD Publishing.
OECD. (2019). TALIS 2018 results: Teachers and school leaders as lifelong learners (Vol. 1). OECD Publishing.
OECD. (2020). Global teaching insights: A video study of teaching. OECD Publishing.
Paige, M. (2020). Moving forward while looking back: How can VAM lawsuits guide teacher evaluation policy in the age of ESSA? Education Policy Analysis Archives, 28(64), 1–18.
Papay, J. (2012). Refocusing the debate: Assessing the purposes and tools of teacher evaluation. Harvard Educational Review, 82(1), 123–141.
Pecheone, R. L., Shear, B., Whittaker, A., & Darling-Hammond, L. (2013). 2013 edTPA field test: Summary report. SCALE.
Peterson, K. D. (1995). Teacher evaluation: A comprehensive guide to new directions and practices. Corwin.
Pianta, R. C., & Hamre, B. K. (2009). Conceptualization, measurement, and improvement of classroom processes: Standardized observation can leverage capacity. Educational Researcher, 38(2), 109–119.
Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2007). Classroom assessment scoring system. Paul H. Brookes.
Popham, W. J. (1971). Performance tests of teaching proficiency: Rationale, development, and validation. American Educational Research Journal, 8(1), 105–117.
Popham, W. J. (2007). Instructional insensitivity of tests: Accountability’s dire drawback. Phi Delta Kappan, 146–155.
Porter, A., Youngs, P., & Odden, A. (2001). Advances in teacher assessments and their uses. In V. Richardson (Ed.), Handbook of research on teaching (4th ed., pp. 259–297). AERA.
Reynolds, A. (1992). Getting to the core of the apple: A theoretical view of the knowledge base of teaching. Journal of Personnel Evaluation in Education, 6, 41–55.
Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417–458.
Rothstein, J. (2016). Can value-added models identify teachers’ impacts? IRLE - UC Berkeley.
Rowan, B., & Correnti, R. (2009). Studying reading instruction with teacher logs: Lessons from the study of instructional improvement. Educational Researcher, 38(2), 120–131.
Rubin, D. B., Stuart, E. A., & Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103–116.
S.B. 736, Student Success Act. (2010). St. FL.
S.B. 736, Student Success Act, Section 1012.343(3)(a)1 (2010).
Sass, T. R. (2008). The stability of value-added measures of teacher quality and implications for teacher compensation policy. CALDER - Urban Institute.
Sato, M. (2014). What is the underlying conception of teaching of the edTPA? Journal of Teacher Education, 65(5), 421–434.
Sawada, D., Piburn, M. D., Judson, E., Turley, J., Falconer, K., Benford, R., & Bloom, I. (2002). Measuring reform practices in science and mathematics classrooms: The reformed teaching observation protocol. School Science and Mathematics, 102(6), 245–253.
Schweig, J. D. (2016). Moving beyond means: Revealing features of the learning environment by investigating the agreement of student ratings. Learning Environments Research, 19(3), 441–462.
Schweig, J., Baker, G., Hamilton, L. S., & Stecher, B. M. (2018). Building a repository of assessments of interpersonal, intrapersonal, and higher-order cognitive competencies. RAND Corporation.
Shulman, L. (1998). Teacher portfolios: A theoretical activity. In N. Lyons (Ed.), With portfolio in hand (pp. 23–37). Teachers College Press.
Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1–22.
Stecher, B. M., Wood, A. C., Gilbert, M., Borko, H., Kuffner, K. L., Arnold, S. C., & Dorman, E. H. (2005). Using classroom artifacts to measure instructional practices in middle school mathematics: A two-state field test (CSE Report 662). CRESST.
Stecher, B., & Kirby, S. N. (2004). Organizational improvement and accountability: Lessons for education from other sectors. RAND Corporation.
Steele, J., Hamilton, L. S., & Stecher, B. M. (2010). Incorporating student performance measures into teacher evaluation systems. The RAND Corporation.
Stodolsky, S. S. (1990). Classroom observation. In J. Millman & L. Darling-Hammond (Eds.), The new handbook of teacher evaluation: Assessing elementary and secondary school teachers (pp. 175–190). Corwin Press.
Taut, S., & Sun, Y. (2014). The development and implementation of a national, standards-based, multi-method teacher performance assessment system in Chile. Education Policy Analysis Archives, 22(71).
The Danielson Group. (2020). The framework for remote teaching. The Danielson Group.
Tucker, P. D., & Stronge, J. H. (2005). Linking teacher evaluation and student learning. Association for Supervision and Curriculum Development.
U.S. Department of Education. (2001). No Child Left Behind Act (Executive Summary). U.S. Department of Education.
Walkington, C., & Marder, M. (2018). Using the UTeach observation protocol (UTOP) to understand the quality of mathematics instruction. ZDM Mathematics Education, 50, 507–519.
Walsh, E., & Isenberg, E. (2013). How does a value-added model compare to the Colorado growth model? Mathematica Policy Research.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. The New Teacher Project.
West, M. R. (2016). Should non-cognitive skills be included in school accountability systems? Preliminary evidence from California’s CORE districts. Evidence Speaks Reports, 1(13), 1–7.
Windschitl, M., Thompson, J., & Braaten, M. (2018). Ambitious science teaching. Harvard Education Press.
Wise, A. E., Darling-Hammond, L., McLaughlin, M. W., & Bernstein, H. T. (1985). Teacher evaluation: A study of effective practices. The Elementary School Journal, 86(1), 60–121.
Wragg, E. C. (1999). An introduction to classroom observation. Routledge.
Yuan, K., Le, V., McCaffrey, D. F., Marsh, J. A., Hamilton, L. S., Stecher, B. M., & Springer, M. G. (2013). Incentive pay programs do not affect teacher motivation or reported practices: Results from three randomized studies. Educational Evaluation and Policy Analysis, 35(1), 3–22.

Author information

Authors and Affiliations

UCLA, Los Angeles, CA, USA: María Paz Fernández & José Felipe Martínez