University of Colorado at Boulder  
School of Education

Derek C. Briggs, PhD

Research

Summary of Research Agenda

In the current educational policy environment, there is great interest both in holding schools accountable for student learning and in knowing “what works” in terms of interventions that will increase student learning. I see two related methodological obstacles to each of these policy foci. First, standardized tests used as the outcome measures to hold schools accountable are often weak proxies for student learning, proxies that lack validity evidence to support their interpretation. Second, even when these outcomes can be considered valid representations of the constructs being tested (e.g., understanding of mathematics), it is very difficult to isolate the effect of any one potential cause on changes in these outcomes, let alone to generalize the presumed effect beyond the units, treatments, observations and setting of a given study. Because the policies associated with accountability programs and effective educational interventions are high stakes in nature, it is important to scrutinize existing methodological approaches used to measure and evaluate growth in student achievement. My research activities are intended to provide this scrutiny as a first step in improving these approaches, or developing new ones altogether. At this early stage in my career, my research is primarily useful to educational policy and practice in pointing out when inferences about learning and causation may be either equivocal or misleading.

Measuring Student Achievement

A precondition for the measurement of student learning is typically the administration and scoring of a standardized test instrument. In much of the educational research literature it is implicitly assumed that differences in test scores among students (i.e., differences in student “achievement”) are suggestive of different levels of what students have learned in a given domain. It follows from this that subsequent changes in these scores over time for any given student can be interpreted as a quantification of learning. The aim of my research is to increase the plausibility of such assumptions by making tighter linkages between what students learn and the way that this is measured. This link depends greatly on the strength of the argument that can be established with respect to test validity. To this end, much of my research focuses on the importance of instrument design as a means of developing compelling validity arguments. In this I have been strongly influenced by the work of Mark Wilson, my graduate advisor, whose philosophy about measurement and its relationship to the broader notion of assessment is captured within the concept of the “assessment triangle” in the National Research Council book Knowing What Students Know: The Science and Design of Educational Assessment (2001).

In both my research and teaching I build upon Wilson’s work and make the case that the validity argument in support of any standardized test depends to a large degree upon the interrelationship between the three corners of the assessment triangle: (a) a theory of developmental progression for the construct being measured, (b) an observation model that links this theory to the items contained in the test instrument, and (c) a measurement model that links observed item responses back to the theory of developmental progression. The concept that valid measurement requires all three of these elements may not seem controversial, but in practice the specification of measurement models by psychometricians often seems to be primarily a statistical exercise, decoupled from any underlying theory of development and observation.

As part of a commentary published in the journal Measurement (Briggs, 2004), I have suggested that the development of a validity argument in support of test score inferences might be fruitfully split into two stages: design validity and interpretive validity. The design validity of a standardized test might be established through the use of something like the assessment triangle before the test is operationally administered, while interpretive validity would be established after the test has been administered and would necessitate an ongoing program of evaluation. A good example of research I have conducted that adheres to this idea of establishing design validity as part of instrument development can be found in the paper “Diagnostic Assessment with Ordered Multiple-Choice Items” (Briggs, Alonzo, Schwab & Wilson, 2006), published in the journal Educational Assessment. In this study, my colleagues and I developed a novel item format we describe as ordered multiple-choice (OMC). OMC items differ from traditional multiple-choice items in that each OMC response option has been linked to an underlying developmental continuum for how student understanding should be expected to progress. Because of this linkage, it becomes possible to make diagnostic interpretations about student understanding from student OMC item responses. A hypothesis from this research that I am continuing to explore is that when a measurement model is specified within this sort of a design context, it becomes more plausible to advance the argument that differences in estimated values of student achievement can be validly interpreted as differences in student learning.
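
The linkage between OMC response options and a developmental continuum can be illustrated with a minimal sketch. It is not the scoring procedure from the cited paper: the item names, the option-to-level mappings, the four-level continuum, and the modal-level summary are all hypothetical choices made only to show how a selected option can carry diagnostic information.

```python
from collections import Counter

# Hypothetical OMC items: each response option maps to a level on an assumed
# four-level developmental continuum (1 = least sophisticated understanding).
# These mappings are invented for illustration only.
OMC_ITEMS = {
    "item1": {"A": 1, "B": 3, "C": 2, "D": 4},
    "item2": {"A": 2, "B": 4, "C": 1, "D": 3},
    "item3": {"A": 4, "B": 1, "C": 3, "D": 2},
}


def response_levels(responses):
    """Translate a student's selected options into continuum levels."""
    return [OMC_ITEMS[item][choice] for item, choice in responses.items()]


def modal_level(levels):
    """Return the most frequently indicated level; ties go to the lower level."""
    counts = Counter(levels)
    top = max(counts.values())
    return min(level for level, n in counts.items() if n == top)


# A hypothetical student's answers to the three items.
student = {"item1": "C", "item2": "A", "item3": "C"}
levels = response_levels(student)
print(levels, modal_level(levels))
```

A simple modal summary like this ignores measurement error; a measurement model would be needed to say how confidently a student can be placed at a given level.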

Much of my research specifically focuses upon technical issues in the development of measurement models. My emphasis has been on models within the framework of what is known as item response theory (IRT). However, it is important to note that even my most technical research endeavors have a clear relevance to the bigger picture of developing valid measures of student learning. I have been exploring three issues in particular: (1) IRT as a framework for modeling growth in student achievement, (2) IRT models that can account for the fact that many tests measure multidimensional constructs, and (3) IRT models that can provide information to test developers about the extent to which estimates of student achievement are generalizable. My interest in using IRT to model both growth and multidimensionality stems in part from my experience as a contributor to the edited textbook Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach (Wilson & De Boeck, 2004). It is established in this textbook that IRT models can be cast within a broader statistical framework as a multivariate, multilevel model in which item responses are the lowest level of analysis, and repeated measures of these item responses over time constitute a second level of analysis. In addition, I have written two articles and one book chapter on the advantages of, and approaches to, using IRT models for the purpose of multidimensional measurement. One article (co-authored with Mark Wilson) was published in the Journal of Applied Measurement; a second article, of which I am the sole author, is in review at the journal Applied Measurement in Education. Finally, over the past three years I have been developing a method that would allow the user of an IRT model to speculate about the generalizability of score inferences that derive from the administration of any given test instrument. The paper presenting this approach, “Generalizability in Item Response Modeling” (co-authored with Mark Wilson), is currently in press at the Journal of Educational Measurement, one of the most technically demanding and selective journals in my field.
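
For readers unfamiliar with IRT, the simplest member of the family, the one-parameter (Rasch) model, gives a sense of what a measurement model does. The sketch below is an assumption chosen for brevity, not one of the multidimensional or growth models discussed above; the function names and the example values are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model.

    theta is the student's position on the latent scale and difficulty is
    the item's location on that same scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def log_likelihood(theta, difficulties, responses):
    """Log-likelihood of a 0/1 response pattern for a given theta."""
    total = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_probability(theta, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


# Hypothetical item difficulties and one student's scored responses.
difficulties = [-1.0, 0.0, 1.5]
responses = [1, 1, 0]
print(log_likelihood(0.5, difficulties, responses))
```

Estimating a student's achievement then amounts to finding the value of theta that makes the observed response pattern most likely, which is why the interpretation of theta depends so heavily on how the items were designed.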

Evaluating Growth in Student Achievement

Making generalizable causal inferences about the effectiveness of educational interventions on student learning is difficult, in large part for the reasons I have outlined above, and also because the nature of educational research involves tremendous heterogeneity in the quality of study designs. I have published two articles in which I critically examine the methodological obstacles to quantifying the effect of test preparation programs on standardized test performance. The first article, “Causal Inference and the Heckman Model,” was based upon my doctoral dissertation and published in the Journal of Educational and Behavioral Statistics. In this article I examine the Heckman model, a methodological approach typically employed by economists in situations where causal effects must be estimated for a quasi-experimental study design. In the second article, “Meta-Analysis: A Case Study,” published in the journal Evaluation Review, I examine the extent to which a meta-analytic modeling approach allows for generalizable inferences as to the effectiveness of coaching programs on test performance. These articles are representative of the scrutiny I apply to statistical (and psychometric) modeling in which causal inference is the fundamental objective. In each article my approach was to carefully test the sensitivity of methodological approaches to their underlying assumptions. After finding a great deal of sensitivity present, I caution against a mechanical use of statistical modeling approaches such as the Heckman model and meta-analysis unless they are accompanied by a deep understanding of the empirical research context.
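
The kind of sensitivity check described above can be illustrated with a small sketch. It uses a standard fixed-effect, inverse-variance pooled estimate with a leave-one-out comparison, which is a common textbook technique and not necessarily the model examined in the Evaluation Review article. The effect sizes and variances are made-up numbers, not results from any coaching study.

```python
def pooled_effect(effects, variances):
    """Fixed-effect pooled estimate: the inverse-variance weighted mean."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Pooled estimate recomputed with each study dropped in turn."""
    results = []
    for i in range(len(effects)):
        remaining_effects = effects[:i] + effects[i + 1:]
        remaining_variances = variances[:i] + variances[i + 1:]
        results.append(pooled_effect(remaining_effects, remaining_variances))
    return results


# Invented effect estimates and sampling variances for three hypothetical studies.
effects = [10.0, 25.0, 15.0]
variances = [4.0, 16.0, 9.0]
print(pooled_effect(effects, variances))
print(leave_one_out(effects, variances))
```

If the pooled estimate moves substantially when a single study is removed, the overall conclusion rests heavily on that study's design and context, which is one reason a mechanical reading of a pooled effect can mislead.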

Titles of Research Projects

  • Validation Theory and Practice in the Context of High Stakes Test Use
  • Vertical Scaling in Value-Added Models for Student Learning
  • Multidimensional Growth Modeling: Estimating Value-Added School Effects with a Multidimensional Vertical Scale
  • The Effectiveness of Admissions Test Preparation: New Evidence from ELS:02
  • Undergraduate Science Course Innovations and their Impact on Student Learning
  • The Flexible Application of Student-Centered Instruction (FASCI) instrument