Institute of Behavioral Science, Room 1B73
University of Colorado at Boulder
Boulder, CO 80309
Derek Briggs is a professor in the Research and Evaluation Methodology program where he also directs the Center for Assessment Design Research and Evaluation. Dr. Briggs’s research focuses upon advancing methods for the measurement and evaluation of student learning. His daily agenda is to challenge conventional wisdom and methodological chicanery as they manifest themselves in educational research, policy and practice. As a psychometrician, Dr. Briggs works with states and other entities to provide technical advice on the design and use of large-scale student assessments. He has a special interest in the use of learning progressions as a method for facilitating student-level inferences about growth, and helping to bridge the use of test scores for formative and summative purposes. Other interests include the use and analysis of statistical models to support causal inferences about the effects of educational interventions on student achievement.
Dr. Briggs teaches graduate-level courses on quantitative research methodology with a focus on psychometrics. These include Quantitative Methods in Educational Research I (EDUC 8230), Measurement in Survey Research (EDUC 8710), Advanced Topics in Measurement (EDUC 8720) and Categorical Data Analysis (EDUC 7396).
Dr. Briggs’s investigations into the relationship between developmental (i.e., vertical) score scales and inferences about student growth and teacher/school value-added were recognized with an award from the National Council on Measurement in Education and with a Provost’s Achievement Award at the University of Colorado. He is widely recognized for his evaluations of the effects of test preparation on college admissions exam performance.
Dr. Briggs is the past president of the National Council on Measurement in Education (2021-22), past editor of the journal Educational Measurement: Issues and Practice, and author of the book Historical and Conceptual Foundations of Measurement in the Human Sciences: Credos and Controversies (Routledge).
PhD Education, University of California, Berkeley, 2002
MA Education, University of California, Berkeley, 1998
BA Economics, Carleton College, 1993
In the current educational policy environment, there is great interest in both holding schools accountable for student learning, and knowing “what works” in terms of interventions that will increase student learning. I see two related methodological obstacles to each of these policy foci. First, standardized tests used as the outcome measures to hold schools accountable are often weak proxies for student learning, proxies that lack validity evidence to support their interpretation. Second, even when these outcomes can be considered valid representations of the constructs being tested (e.g., understanding of mathematics), it is very difficult to isolate the effect of any one potential cause on changes in these outcomes—let alone to generalize the presumed effect beyond the units, treatments, observations and setting of a given study. Because the policies associated with accountability programs and effective educational interventions are high stakes in nature, it is important to scrutinize existing methodological approaches used to measure and evaluate growth in student achievement. The goal is to find ways to improve these approaches, or develop better alternatives.
Measuring Student Achievement
A precondition for the measurement of student learning is typically the administration and scoring of a standardized test instrument. In much of the educational research literature it is implicitly assumed that differences in test scores among students (i.e., differences in student “achievement”) are suggestive of different levels of what students have learned in a given domain. It follows from this that subsequent changes in these scores over time for any given student can be interpreted as a quantification of learning. The aim of my research is to increase the plausibility of such assumptions by making tighter linkages between what students learn and the way that this is measured. This link depends greatly on the strength of the argument that can be established with respect to test validity. To this end, much of my research focuses on the importance of instrument design as a means of developing compelling validity arguments.
A good example of research I have conducted along these lines can be found in the paper “Diagnostic Assessment with Ordered Multiple-Choice Items” (Briggs, Alonzo, Schwab & Wilson, 2006), published in the journal Educational Assessment, and the paper “Using learning progressions to design vertical scales that support coherent inferences about student growth” (Briggs & Peck, 2015) published in the journal Measurement: Interdisciplinary Research and Perspectives. In the 2006 paper, my colleagues and I developed a novel item format we describe as ordered multiple-choice (OMC). OMC items differ from traditional multiple-choice items in that each OMC response option has been linked to an underlying developmental continuum for how student understanding should be expected to progress. Because of this linkage, it becomes possible to make diagnostic interpretations about student understanding from student OMC item responses. In the 2015 paper, we argue that there is a disconnect between the criterion-referenced intuitions that parents and teachers have for what it means for students to demonstrate growth and the primarily norm-referenced metrics that are typically used to infer growth. We propose and illustrate the use of a learning-progression approach to the conceptualization of growth and the subsequent design of a vertical score scale in the context of the Common Core State Standards for Mathematics.
Much of my research focuses upon both technical and theoretical issues in the development of measurement models. My emphasis has been on models within the framework of what is known as item response theory (IRT). Even my most technical research endeavors have a clear relevance to the bigger picture of developing valid measures of student learning. I have been exploring three issues in particular: (1) IRT as a framework for modeling growth in student achievement, (2) IRT models that can account for the fact that many tests measure multidimensional constructs, and (3) IRT models that can provide information to test developers about the extent to which estimates of student achievement are generalizable. From a more theoretical standpoint, I am interested in foundational issues that are easy to overlook in psychometrics. For instance, is it appropriate to apply the terminology of measurement, which has its roots in the physical sciences, to describe the process of quantification in the social, psychological and behavioral sciences? Psychometricians often talk past each other because we do not share a commonly understood definition or vocabulary for educational measurement. This can lead to serious confusions and controversies that show little sign of resolution, as I argue in the paper “Measuring Growth with Vertical Scales” (Briggs, 2013), published in the Journal of Educational Measurement.
Evaluating Interventions Intended to Have an Effect on Student Achievement
Making causal inferences about the effects of educational interventions on student learning is difficult because educational research involves tremendous heterogeneity in the quality of study designs. In much of my research I critically examine the methodological obstacles to quantifying the effect of test preparation programs on standardized test performance. For example, in “Causal Inference and the Heckman Model” (Briggs, 2004), published in the Journal of Educational and Behavioral Statistics, I examine the Heckman Model, a methodological approach typically employed by economists in situations where causal effects must be estimated for a quasi-experimental study design. In “Meta-Analysis: A Case Study” (Briggs, 2005), published in the journal Evaluation Review, I examine the extent to which a meta-analytic modeling approach allows for generalizable inferences as to the effectiveness of coaching programs on test performance. And in the NEPC Report “Due Diligence and the Evaluation of Teachers” (Briggs & Domingue, 2011), we examine the value-added models that were used to make causal inferences about the efficacy of teachers in the Los Angeles Unified School District. These articles are representative of the scrutiny I apply to statistical modeling in which causal inference is the fundamental objective. In each article my approach was to carefully test the sensitivity of methodological approaches to their underlying assumptions. When I find a great deal of sensitivity present, I caution against a mechanical use of statistical modeling approaches such as the Heckman model, meta-analysis and value-added modeling unless they are accompanied by a deep understanding of the empirical research context.
Examples of Research Projects
My portfolio of research projects is constantly changing, but it almost always involves some combination of psychometric investigations using empirical and/or simulated data and more evaluative projects that involve statistical modeling and analysis. Most of these projects are part of ongoing work through the Center for Assessment Design Research and Evaluation (CADRE). Please visit the CADRE website for summaries of current projects as well as past project reports, research briefs and working papers.
- “An Economist, a Psychometrician and the Father of a Special Needs Child Walk into a School”, Invited Womer Lecture, University of Michigan, February 19, 2013.
- “Standardized Testing and Special Needs”, Chautauqua Education Series, March 18, 2015.
- “Measuring Student Learning: Assessment 101,” Aspen Institute Education Summit, September 26, 2015.
- “Introduction to Causal Inference”
- “Turning the Page to the Next Chapter in Education Measurement”, NCME 2022 Presidential Address, San Diego, April 23, 2022.
In the various courses I teach, my goals are (1) to show my students how quantitative methods can help them as they make and critique educational research arguments, (2) to impress upon them that the validity of educational research conclusions depends not upon the specific methodological approach being taken, but upon how well the approach fits the research question that was posed, (3) to help them learn how different quantitative methods fit together and how they can be used effectively, and (4) to motivate them to deepen their understanding of different methodological approaches.
Courses Frequently Taught:
EDUC 8230: Quantitative Methods in Educational Research
In this class I attempt to convince students to think of quantitative methods as “the art of making numerical conjectures about puzzling questions.” I also attempt to teach them how to be critical consumers of quantitative methods that are often used to obfuscate more than they are used to enlighten. My classroom sessions are usually a mix of lecture, demonstration, whole-class discussion and small-group activities. Since I believe that the best way to learn statistical methods is to apply these methods in context, my class involves the frequent use of problem sets that combine conceptual and analytical practice questions. Conceptual questions are typically drawn from the course textbook, while analytical questions require the use of statistical software with empirical data sets. To accomplish the latter I expose my students to primary data sets taken from actual examples of published educational research. I find that using real data sets from published research provides students with a grounded context within which to interpret the results of their analyses. On each problem set I emphasize to students that the reasoning and evidence they provide to support their answers are just as important to me as the correctness of their answers. This is the basis for the formative comments I provide as feedback on all completed problem sets. Other structural elements of this course have included midterm and final exams, and a research project in which students are expected to critique the presentation of a statistical analysis from a newspaper report.
Note: The definition of quantitative methods quoted in the EDUC 8230 description comes from the required textbook for the course: Statistics, 3rd Edition, by Freedman, Pisani & Purves.
EDUC 8710: Measurement in Survey Research
EDUC 8720: Advanced Topics in Measurement
In these specialty courses I emphasize the idea that measurement is both art and science, and that while much of the science can be learned from textbooks and articles, the art can be learned only from experience. The aim of the first course is to give students an introduction to fundamental concepts of measurement through a semester-long project in which students are expected to develop, pilot test, analyze and evaluate their own survey instruments. The concept of validity serves as a unifying theme that motivates the development and evaluation of these instruments. The focus of the second course is on obtaining a deeper understanding of specific psychometric models for measurement and their applications in educational and psychological research. Consistent with the approach taken in all my courses, students are expected to apply and compare different psychometric models in the context of empirical data.
(For a complete list of publications, please see the faculty member's curriculum vitae.)
Articles, Book Chapters and Reports
Briggs, D. C. & Peck, F. A. (2015). Using learning progressions to design vertical scales that support coherent inferences about student growth. Measurement: Interdisciplinary Research and Perspectives, 13, 75-99.
Briggs, D. C. & Dadey, N. (2015). Making sense of common test items that do not get easier over time: Implications for vertical scale designs. Educational Assessment, 20(1), 1-22.
Briggs, D. C. & Domingue, B. (2014). Value-added to what? The paradox of multidimensionality. In R. Lissitz (Ed.), Value-added Modeling and Growth Modeling with Particular Application to Teacher and School Effectiveness. Charlotte, NC: Information Age Publishing.
Briggs, D. C., & Domingue, B. (2013). The gains from vertical scaling. Journal of Educational and Behavioral Statistics, 38(6), 551-576.
Briggs, D. C. (2013). Measuring growth with vertical scales. Journal of Educational Measurement, 50(2), 204-226.
Briggs, D. C. (2013). Teacher evaluation as Trojan horse: The case for teacher-developed assessments. Measurement: Interdisciplinary Research and Perspectives, 11(1-2), 24-29.
Briggs, D. C., Ruiz-Primo, M. A., Furtak, E., Shepard, L. & Yin, Y. (2012). Meta-analytic methodology and conclusions about the efficacy of formative assessment. Educational Measurement: Issues and Practice, 13-17.
Briggs, D. C. (2012). Making value-added inferences from large-scale assessments. In M. Simon, K. Ercikan, & M. Rousseau (Eds.), Improving Large-Scale Assessment in Education: Theory, Issues and Practice. London: Routledge.
Briggs, D. C. (2012). Making progress in the modeling of learning progressions. In A. Alonzo & A. Gotwals (Eds.) Learning Progressions In Science (pp. 293-316). Sense Publishers.
Briggs, D. C. & Alonzo, A. C. (2012). The psychometric modeling of ordered multiple-choice item responses for diagnostic assessment with a learning progression. In A. Alonzo & A. Gotwals (Eds.), Learning Progressions In Science (pp. 345-355). Sense Publishers.
Dadey, N. & Briggs, D. C. (2012). A meta-analysis of growth trends from vertically scaled assessments. Practical Assessment, Research & Evaluation, 17(14). Available online: http://pareonline.net/pdf/v17n14.pdf
Briggs, D. C. & Weeks, J. P. (2011). The persistence of value-added school effects. Journal of Educational and Behavioral Statistics, 36(5), 616-637.
Briggs, D. C. (2011). Cause or Effect? Validating the use of tests for high-stakes inferences in education. In N. J. Dorans & S. Sinharay (Eds.), Looking Back: Proceedings of a Conference in Honor of Paul W. Holland. New York: Springer.
Briggs, D. C. & Domingue, B. D. (2011). Due diligence and the evaluation of teachers: A review of the value-added analysis underlying the effectiveness rankings of Los Angeles Unified School District Teachers by the Los Angeles Times. National Education Policy Center. http://nepc.colorado.edu/publication/due-diligence.
Briggs, D. C. & Weeks, J. P. (2009). The sensitivity of value-added modeling to the creation of a vertical scale. Education Finance & Policy, 4(4), 384-414.
Domingue, B. W. & Briggs, D. C. (2009). Using linear regression and propensity score matching to estimate the effect of coaching on the SAT. Multiple Linear Regression Viewpoints, 35(1), 12-29.
Briggs, D. C. (2008). Using explanatory item response models to analyze group differences in science achievement. Applied Measurement in Education, 21(2), 89-118.
Briggs, D. C. (2008). Synthesizing causal inferences. Educational Researcher, 37(1), 15-22.
Briggs, D. C. & Wilson, M. (2007). Generalizability in item response modeling. Journal of Educational Measurement, 44(2), 131-155.
Briggs, D. C., Alonzo, A., Schwab, C. & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33-64.
Briggs, D. C. (2005). Meta-analysis: A case study. Evaluation Review, 29(2), 87-127.
Briggs, D. C. (2004). Causal inference and the Heckman model. Journal of Educational and Behavioral Statistics, 29(4), 397-420.
Briggs, D. C. & Wilson, M. (2003). An introduction to multidimensional measurement using Rasch models. Journal of Applied Measurement, 4(1), 87-100.
Briggs, D. C. (2002). SAT coaching, bias and causal inference. Dissertation Abstracts International. DAI-A 64/12, p. 4433. (UMI No. 3115515)
Briggs, D. C. (2001). The effect of admissions test preparation: Evidence from NELS-88. Chance, 14(1), 10-18.