Link to Resource: Making Inferences about Teacher Observation Scores over Time

Authors: Derek C. Briggs and Jessica L. Alzen

Citation: Pre-print of Briggs, D. C. & Alzen, J. L. (2019). Making inferences about teacher observation scores over time. Educational and Psychological Measurement.


Observation protocol scores are commonly used as status measures to support inferences about teacher practices. When multiple observations are collected for the same teacher over the course of a year, some portion of a teacher’s score on each occasion may be attributable to the rater, the lesson, and the time of year of the observation. All three are facets that can threaten the generalizability of teacher scores, but the role of time is the easiest to overlook. This study uses a generalizability theory framework to illustrate the concept of a hidden facet of measurement. When there are many temporally spaced observation occasions, it may be possible to support inferences about growth in teaching practices over time as an alternative (or complement) to inferences about status at a single point in time. The study uses longitudinal observation scores from the Measures of Effective Teaching project to estimate the reliability of teacher-level growth parameters for designs that vary in the number and spacing of observation occasions over a two-year span. On the basis of a subsample of teachers scored using the Danielson Framework for Teaching, we show that at least 8 observations over two years are needed before it would be possible to make distinctions in growth with a reliability coefficient of .38.