Derek Briggs

Professor and Program Chair, Research & Evaluation Methodology (REM)
Areas of Expertise
Assessment, Causal Inference, Evaluation, Growth and Value-Added Modeling, Learning Progressions, Measurement, Policy, Psychometrics, Statistical Analysis, Survey Design and Research, Test Development and Validation


Derek Briggs is a professor of quantitative methods and policy analysis and chair of the Research and Evaluation Methodology program at the University of Colorado Boulder.

Dr. Briggs’s long-term research agenda focuses upon building sound methodological approaches for the measurement and evaluation of growth in student learning. His daily agenda is to challenge conventional wisdom and methodological chicanery as they manifest themselves in educational research, policy and practice. He has a special interest in the use of learning progressions as a method for facilitating student-level inferences about growth, and helping to bridge the use of test scores for formative and summative purposes. Other interests include critical analyses of the statistical models used to make causal inferences about the effects of teachers, schools and other educational interventions on student achievement. 

Dr. Briggs teaches graduate-level courses on research methodology. These include Quantitative Methods in Educational Research I (EDUC 8230), Measurement in Survey Research (EDUC 8710), Advanced Topics in Measurement (EDUC 8720) and Educational Evaluation (EDUC 7386).

Dr. Briggs’s investigations into the relationship between developmental (i.e., vertical) score scales and inferences about student growth and teacher/school value-added were recognized with an award from the National Council on Measurement in Education and the Provost’s Achievement Award at the University of Colorado. He is widely recognized for his evaluations of the effects of test preparation on college admissions exam performance.

Some of his notable publications include “Measuring growth with vertical scales” (Journal of Educational Measurement); “Due diligence and the evaluation of teachers” (NEPC Report); “Preparation for college admissions exams” (Report Commissioned by the National Association of College Admissions Counselors); “The impact of vertical scaling decisions on growth interpretations” (Educational Measurement: Issues and Practice); “Diagnostic Assessment with Ordered Multiple-Choice Items” (Educational Assessment); and “Meta-Analysis: A Case Study” (Evaluation Review).

Dr. Briggs is the current editor of the journal Educational Measurement: Issues and Practice and serves on numerous state- and national-level technical advisory committees on the topic of educational assessment and accountability. He is a member of the American Educational Research Association, the National Council on Measurement in Education, the Society for Research on Educational Effectiveness, and the Psychometric Society.


PhD Education, University of California, Berkeley, 2002
MA Education, University of California, Berkeley, 1998
BA Economics, Carleton College, 1993


Summary of Research Agenda

In the current educational policy environment, there is great interest in both holding schools accountable for student learning, and knowing “what works” in terms of interventions that will increase student learning. I see two related methodological obstacles to each of these policy foci. First, standardized tests used as the outcome measures to hold schools accountable are often weak proxies for student learning, proxies that lack validity evidence to support their interpretation. Second, even when these outcomes can be considered valid representations of the constructs being tested (e.g., understanding of mathematics), it is very difficult to isolate the effect of any one potential cause on changes in these outcomes—let alone to generalize the presumed effect beyond the units, treatments, observations and setting of a given study. Because the policies associated with accountability programs and effective educational interventions are high stakes in nature, it is important to scrutinize existing methodological approaches used to measure and evaluate growth in student achievement. My research activities are intended to accomplish the latter as a first step in improving these approaches, or developing new ones altogether. At this early stage in my career, my research is primarily useful to educational policy and practice in pointing out when inferences about learning and causation may be either equivocal or misleading. 

Measuring Student Achievement

A precondition for the measurement of student learning is typically the administration and scoring of a standardized test instrument. In much of the educational research literature it is implicitly assumed that differences in test scores among students (i.e., differences in student “achievement”) are suggestive of different levels of what students have learned in a given domain. It follows from this that subsequent changes in these scores over time for any given student can be interpreted as a quantification of learning. The aim of my research is to increase the plausibility of such assumptions by making tighter linkages between what students learn and the way that this is measured. This link depends greatly on the strength of the argument that can be established with respect to test validity. To this end, much of my research focuses on the importance of instrument design as a means of developing compelling validity arguments. In this I have been strongly influenced by the work of Mark Wilson, my graduate advisor, whose philosophy about measurement and its relationship to the broader notion of assessment is captured within the concept of the “assessment triangle” in the National Research Council book Knowing What Students Know: The Science and Design of Educational Assessment (2001). 

In both my research and teaching I build upon Wilson’s work and make the case that the validity argument in support of any standardized test depends to a large degree upon the interrelationship between the three corners of the assessment triangle: (a) a theory of developmental progression for the construct being measured, (b) an observation model that links this theory to the items contained in the test instrument, and (c) a measurement model that links observed item responses back to the theory of developmental progression. The concept that valid measurement requires all three of these elements may not seem controversial, but in practice the specification of measurement models by psychometricians often seems to be primarily a statistical exercise, decoupled from any underlying theory of development and observation. As part of a commentary published in the journal Measurement (Briggs, 2004), I have suggested that the development of a validity argument in support of test score inferences might be fruitfully split into two stages: design validity and interpretive validity. The design validity of a standardized test might be established through the use of something like the assessment triangle before the test is operationally administered, while interpretive validity would be established after the test has been administered and would necessitate an ongoing program of evaluation. A good example of research I have conducted that adheres to this idea of establishing design validity as part of instrument development can be found in the paper “Diagnostic Assessment with Ordered Multiple-Choice Items” (Briggs, Alonzo, Schwab & Wilson, 2006), published in the journal Educational Assessment. In this study, my colleagues and I developed a novel item format we describe as ordered multiple-choice (OMC). 
OMC items differ from traditional multiple-choice items in that each OMC response option has been linked to an underlying developmental continuum for how student understanding should be expected to progress. Because of this linkage, it becomes possible to make diagnostic interpretations about student understanding from student OMC item responses. A hypothesis from this research that I am continuing to explore is that when a measurement model is specified within this sort of a design context, it becomes more plausible to advance the argument that differences in estimated values of student achievement can be validly interpreted as differences in student learning.

Much of my research specifically focuses upon technical issues in the development of measurement models. My emphasis has been on models within the framework of what is known as item response theory (IRT). However, it is important to note that even my most technical research endeavors have a clear relevance to the bigger picture of developing valid measures of student learning. I have been exploring three issues in particular: (1) IRT as a framework for modeling growth in student achievement, (2) IRT models that can account for the fact that many tests measure multidimensional constructs, and (3) IRT models that can provide information to test developers about the extent to which estimates of student achievement are generalizable. My interest in using IRT to model both growth and multidimensionality stems in part from my experience as a contributor to the edited textbook Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach (Wilson & De Boeck, 2004). It is established in this textbook that IRT models can be cast within a broader statistical framework as a multivariate, multilevel model in which item responses are the lowest level of analysis, and repeated measures of these item responses over time constitute a second level of analysis. In addition, I have written two articles and one book chapter on the advantages of, and approaches to, using IRT models for the purpose of multidimensional measurement. One article (co-authored with Mark Wilson) was published in the Journal of Applied Measurement; a second article, of which I am the sole author, is in review at the journal Applied Measurement in Education. Finally, over the past three years I have been developing a method that would allow the user of an IRT model to speculate about the generalizability of score inferences that derive from the administration of any given test instrument. 
The paper presenting this approach, “Generalizability in Item Response Modeling” (co-authored with Mark Wilson), is currently in press at the Journal of Educational Measurement, one of the most technically demanding and selective journals in my field.

Evaluating Growth in Student Achievement

Making generalizable causal inferences about the effectiveness of educational interventions on student learning is difficult, in large part for the reasons I have outlined above, and also because the nature of educational research involves tremendous heterogeneity in the quality of study designs. I have published two articles in which I critically examine the methodological obstacles to quantifying the effect of test preparation programs on standardized test performance. The first article, “Causal Inference and the Heckman Model,” was based upon my doctoral dissertation and published in the Journal of Educational and Behavioral Statistics. In this article I examine the Heckman Model, a methodological approach typically employed by economists in situations where causal effects must be estimated for a quasi-experimental study design. In the second article, “Meta-Analysis: A Case Study,” published in the journal Evaluation Review, I examine the extent to which a meta-analytic modeling approach allows for generalizable inferences as to the effectiveness of coaching programs on test performance. These articles are representative of the scrutiny I apply to statistical (and psychometric) modeling in which causal inference is the fundamental objective. In each article my approach was to carefully test the sensitivity of methodological approaches to their underlying assumptions. After finding a great deal of sensitivity present, I caution against a mechanical use of statistical modeling approaches such as the Heckman model and meta-analysis unless they are accompanied by a deep understanding of the empirical research context. 

Titles of Research Projects

  • Validation Theory and Practice in the Context of High Stakes Test Use
  • Vertical Scaling in Value-Added Models for Student Learning
  • Multidimensional Growth Modeling: Estimating Value-Added School Effects with a Multidimensional Vertical Scale
  • The Effectiveness of Admissions Test Preparation: New Evidence from ELS:02
  • Undergraduate Science Course Innovations and their Impact on Student Learning
  • The Flexible Application of Student-Centered Instruction (FASCI) instrument


Briggs, D. C., Weeks, J. P. & Wiley, E. (2008) Vertical Scaling in Value-Added Models for Student Learning. Presentation at the National Conference for Value-Added Modeling, April 23, 2008, Madison, WI.

Briggs, D. C. & Weeks, J. P. (2008) The Persistence of Value-Added School Effects. Presentation at the 2008 Annual Meeting of the American Educational Research Association, Division D, March 27, 2008, New York, NY.

Briggs, D. C., Weeks, J. P. & Wiley, E. (2008) The Impact of Vertical Scaling Decisions on Growth Projections. Presentation at the annual meeting of the National Council on Measurement in Education, March 26, 2008, New York, NY.


In the various courses I teach, my goals are (1) to show my students how quantitative methods can help them as they make and critique educational research arguments, (2) to impress upon them that the validity of educational research conclusions depends not upon the specific methodological approach being taken, but upon how well the approach fits the research question that was posed, (3) to help them learn how different quantitative methods fit together and how they can be used effectively, and (4) to motivate them to deepen their understanding of different methodological approaches. 

Courses Frequently Taught:

EDUC 8230: Quantitative Methods in Educational Research

In this class I attempt to convince students to think of quantitative methods as “the art of making numerical conjectures about puzzling questions.” [1] I also attempt to teach them how to be critical consumers of quantitative methods that are often used to obfuscate more than they are used to enlighten. My classroom sessions are usually a mix of lecture, demonstration, whole-class discussion and small-group activities. Since I believe that the best way to learn statistical methods is to apply these methods in context, my class involves the frequent use of problem sets that combine conceptual and analytical practice questions. Conceptual questions are typically drawn from the course textbook, while analytical questions require the use of statistical software with empirical data sets. To accomplish the latter I expose my students to primary data sets taken from actual examples of published educational research. I find that using real data sets from published research provides students with a grounded context within which to interpret the results of their analyses. On each problem set I emphasize to students that the reasoning and evidence they provide to support their answers are just as important to me as the correctness of their answers. This is the basis for the formative comments I provide as feedback on all completed problem sets. Other structural elements of this course have included midterm and final exams, and a research project in which students are expected to critique the presentation of a statistical analysis from a newspaper report.


[1] This definition comes from the required textbook for the course: Statistics, 3rd Edition, by Freedman, Pisani & Purves.

EDUC 8710: Measurement in Survey Research

EDUC 8720: Advanced Topics in Measurement

In these specialty courses I emphasize the idea that measurement is both art and science, and that while much of the science can be learned from textbooks and articles, the art can be learned only from experience. The aim of the first course is to give students an introduction to fundamental concepts of measurement through a semester-long project in which students are expected to develop, pilot test, analyze and evaluate their own survey instruments. The concept of validity serves as a unifying theme that motivates the development and evaluation of these instruments. The focus of the second course is on obtaining a deeper understanding of specific psychometric models for measurement and their applications in educational and psychological research. Consistent with the approach taken in all my courses, students are expected to apply and compare different psychometric models in the context of empirical data.

Syllabus for Measurement in Survey Research

Syllabus for Advanced Topics in Measurement

Selected Publications

(For complete list of publications, please see the faculty member's curriculum vitae.)


Domingue, B. W. & Briggs, D. C. (2009) Using linear regression and propensity score matching to estimate the effect of coaching on the SAT. Multiple Linear Regression Viewpoints, 35(1), 12-29.

Briggs, D. C. (2008) Using explanatory item response models to analyze group differences in science achievement. Applied Measurement in Education, 21(2), 89-118.

Briggs, D. C. (2008) Synthesizing causal inferences. Educational Researcher, 37(1), 15-22.

Briggs, D. C. & Wilson, M. (2007) Generalizability in item response modeling. Journal of Educational Measurement, 44(2), 131-155.

Briggs, D., Alonzo, A., Schwab, C. & Wilson, M. (2006) Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33-64.

Briggs, D. C. (2005) Meta-analysis: a case study. Evaluation Review, 29(2), 87-127.

Briggs, D. C. (2004) Causal inference and the Heckman model. Journal of Educational and Behavioral Statistics, 29(4), 397-420.

Briggs, D. C. & Wilson, M. (2003) An introduction to multidimensional measurement using Rasch models. Journal of Applied Measurement, 4(1), 87-100.

Briggs, D. C. (2002) SAT coaching, bias and causal inference. Dissertation Abstracts International. DAI-A 64/12, p. 4433. (UMI No. 3115515)

Briggs, D. C. (2002) Test preparation programs: impact. Encyclopedia of Education. 2nd Edition.

Briggs, D. C. (2001) The effect of admissions test preparation: evidence from NELS-88. Chance, 14(1), 10-18.

Stern, D. & Briggs, D. (2001) Does paid employment help or hinder performance in secondary school? Insights from US high school students. Journal of Education and Work, 14(3), 355-372.

Stern, D. & Briggs, D. (2001) Changing admissions policies: mounting pressures, new developments, more questions. Change, 33(1), 34-41.

Commentary and Reviews

Wiley, E. & Briggs, D. C. (2007) Can value-added assessment improve accountability? Education Views. University of Colorado at Boulder, School of Education, Winter 2007.

Briggs, D. C. (2006) Review of “Getting farther ahead by staying behind: a second-year evaluation of Florida’s policy to end social promotion” by Jay Greene and Marcus Winters. Education Policy Studies Laboratory. Available online at

Briggs, D. C. (2006) Book Review: The SAGE Handbook of Quantitative Methods in the Social Sciences. Applied Psychological Methods, 30(5), 447-451.

Briggs, D. C. (2004) Comment: making an argument for design validity before interpretive validity. Measurement: Interdisciplinary Research and Perspectives, 2(3), 171-174.

Briggs, D. C. (2002) Comment: Jack Kaplan’s new study of SAT coaching. Chance, 15(1), 7-8.

Works in Progress, Commissioned Reports and Pre-Prints


Briggs, D. C. (2013). Measuring growth with vertical scales. Journal of Educational Measurement, 50(2), 204-226.

Briggs, D. C. (2013). Teacher evaluation as Trojan horse: the case for teacher-developed assessments. Measurement: Interdisciplinary Research and Perspectives, 11(1-2), 24-29.

Briggs, D. C., Ruiz-Primo, M. A., Furtak, E., Shepard, L. & Yin, Y. (2012). Meta-analytic methodology and conclusions about the efficacy of formative assessment. Educational Measurement: Issues and Practice, 13-17.

Briggs, D. C. (2012). Making value-added inferences from large-scale assessments.  In M. Simon, K. Ercikan, & M. Rousseau (Eds.), Improving Large-Scale Assessment in Education: Theory, Issues and Practice. London: Routledge.

Briggs, D. C. (2012). Making progress in the modeling of learning progressions. In A. Alonzo & A. Gotwals (Eds.) Learning Progressions In Science (pp. 293-316). Sense Publishers.

Briggs, D. C. & Alonzo, A. C. (2012) The psychometric modeling of ordered multiple-choice item responses for diagnostic assessment with a learning progression.  In A. Alonzo & A. Gotwals (Eds.), Learning Progressions In Science (pp. 345-355). Sense Publishers.

Dadey, N. & Briggs, D. C. (2012). A meta-analysis of growth trends from vertically scaled assessments. Practical Assessment, Research & Evaluation, 17(14). Available online:  

Briggs, D. C. & Weeks, J. P. (2011) The persistence of value-added school effects. Journal of Educational and Behavioral Statistics, 36(5), 616-637.

Briggs, D. C. (2011) Cause or Effect? Validating the use of tests for high-stakes inferences in education.  In N. J. Dorans & S. Sinharay (Eds.), Looking Back: Proceedings of a Conference in Honor of Paul W. Holland. New York: Springer.

Briggs, D. C. & Weeks, J. P. (2009) The sensitivity of value-added modeling to the creation of a vertical scale.  Education Finance & Policy, 4(4), 384-414.

Working Papers

Briggs, D. C. & Domingue, B. (in review) The gains from vertical scaling.  Journal of Educational and Behavioral Statistics.

Briggs, D. C. & Domingue, B. (in review) Value-added to what? The paradox of multidimensionality. 

Briggs, D. C. & Dadey, N. (in review).  Vertical scales that imply students are not learning: artifact or reality?

Briggs, D. C. & Alzen, J. (in review). Does taking an online version of a course have a negative effect on student learning?