CADRE: Center for Assessment, Design, Research and Evaluation


The Center for Assessment, Design, Research and Evaluation (CADRE) is housed in the School of Education at the University of Colorado Boulder. The mission of CADRE is to produce generalizable knowledge that improves the ability to assess student learning and to evaluate programs and methods that may have an effect on this learning.

Projects undertaken by CADRE staff represent a collaboration with the ongoing activities in the School of Education, the University, and the broader national and international community of scholars and stakeholders involved in educational assessment and evaluation.

CADRE has the following purposes:

  1. To promote and support the design of assessments capable of providing insights about what and how students learn.
  2. To evaluate programs and interventions intended to have effects on student learning.
  3. To disseminate the results of its research to a wide audience of individual educators, educational institutions and organizations, government representatives and units, the media, and the general public.
  4. To collaborate with local, state and national groups, organizations, and individuals on projects germane to its mission.
  5. To enhance the School of Education’s graduate program in Research and Evaluation Methodology.

CADRE meets these purposes by conducting research in assessment and psychometrics, taking a lead role in small and large-scale educational evaluation projects, and training graduate students to have expertise in these areas. CADRE leverages the widely recognized strength of faculty throughout the School of Education in assessment and evaluation. This strength is rooted in expertise in both disciplinary content and research methodology. For example, faculty members in the School’s curriculum and instruction program area bring disciplinary expertise relevant to mathematics, science and literacy to their research on classroom assessment practices. Faculty members in the School’s Educational Equity and Cultural Diversity program bring expertise in English language acquisition and special education, fields that concern two important student subgroups for which fairness is a driving principle for assessment and evaluation practices. Faculty members in the School’s Educational Psychology and Learning Science and Educational Foundations Policy and Practice programs bring expertise in sociocultural theories of learning, program evaluation and qualitative methodology. The primary home of CADRE is the School’s Research and Evaluation Methodology program, where faculty specialize in psychometrics, statistics, experimental design and program evaluation.

Motivation for CADRE

In the current educational policy environment, there is great interest both in holding schools and teachers accountable for student learning, and in knowing “what works” in terms of interventions that are likely to have a positive effect on student learning.  As one prominent example, Colorado’s SB-191 legislation, passed in 2010, mandates the annual evaluation of all K-12 public school teachers with respect to growth in student learning outcomes.  As another example (one that is close to home here in Boulder), a number of departments in the University of Colorado’s College of Arts & Sciences have hired talented undergraduates to serve as “learning assistants” in large-enrollment STEM courses with two purposes in mind.  The first is to help instructors implement new pedagogical strategies that make students in large-enrollment courses more active participants in their learning experiences.  The second is to inspire the learning assistants to consider a career in teaching, ideally in a K-12 setting. To evaluate progress toward either of these goals, there is great need for expertise in assessment design, research and evaluation. CADRE, based in the University of Colorado’s School of Education, will serve as a central resource where such expertise can be located and engaged.

There are challenging methodological obstacles that must be faced in the assessment and evaluation of student learning. To begin with, one must be able to design assessments capable of quantifying the extent to which learning has taken place. A precondition for assessing student learning is typically the administration and scoring of a standardized test instrument. It is conventionally assumed that individual differences in test scores can be interpreted as evidence of differences in what students understand in a given domain of instruction. Yet such interpretations hinge upon carefully defining the domain of interest, writing test items that are representative of this domain, and demonstrating that test scores are suitably reliable to support distinctions among test-takers. Furthermore, even the construction of an ideal test is not by itself sufficient for the quantification of what and how much students have learned, since learning implies change over time. Measuring a student’s growth across two or more points in time is not straightforward. For example, if the exact same test is given twice and the curriculum to which students are exposed in between is novel, then the “pre-test” is likely to be too hard to make precise distinctions among students. If completely different tests are given on each occasion, then differences in scores are often impossible to disentangle from differences in the difficulty of the questions being posed on each test. Finally, there are important tensions and tradeoffs between the attributes of students one wishes to make inferences about, and the time and cost necessary to elicit and gather the relevant information. Multiple-choice items are often criticized because they do not elicit evidence of so-called higher-order cognitive skills and abilities such as multistep reasoning, the ability to justify answers with evidence, and the ability to synthesize. But open-ended “performance tasks” are typically time-consuming, harder to score objectively, and may under-represent the target domain of interest because fewer such tasks can be administered.
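To make the idea of "suitably reliable" test scores concrete, the sketch below computes Cronbach's alpha, one common index of internal-consistency reliability. It is offered only as an illustration: the function name `cronbach_alpha` and the item-score layout are assumptions made for this example, not a description of any procedure used by CADRE.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores (illustrative helper).

    scores: list of rows, one row per test-taker, each row a list of
    item scores of equal length. Uses sample variances throughout.
    """
    if len(scores) < 2:
        raise ValueError("need at least two test-takers")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")

    # Variance of each item across test-takers.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total (summed) score across test-takers.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Values closer to 1 indicate that the items rank test-takers consistently. Alpha says nothing about whether the items represent the intended domain, which is a separate question from reliability.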

All of the challenges noted above constitute threats to the validity of evaluation efforts that seek to establish whether interventions have had effects on students or classrooms of students. Such threats are known as threats to construct validity. That is, if one is not actually measuring student learning, the conclusions of an evaluation establishing the efficacy of an intervention on some locally defined test outcome will be impossible to generalize. One must be equally vigilant about threats to the internal validity of a study’s design: is the change in some outcome of interest confounded by the presence of a third variable? For example, if a professor observes a large score gain over the span of three months after administering the same test in a pre-post design, how much of this gain can be attributed to the activities of the professor rather than to the natural maturation of the student or the quality of the professor’s teaching assistant? A good recourse is to compare the score gains of students in a treatment condition to those of students in a control condition. But unless students have been randomly assigned to the two conditions, it can be hard to rule out alternative explanations for differences in outcomes by group (e.g., the treatment group was more motivated, the control group was lower achieving at the outset of the study, etc.).
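The comparison of score gains across conditions can be sketched as a simple difference in mean gains. The function names and the paired pre/post score lists below are hypothetical and chosen only for illustration. The calculation does nothing to address the confounding concerns raised above when groups are not randomly assigned.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-minus-pre gain for one group of students.

    pre and post are equal-length lists; position i in each list
    refers to the same student.
    """
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and the same length")
    return mean([after - before for before, after in zip(pre, post)])


def difference_in_gains(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Mean gain in the treatment group minus mean gain in the control group."""
    return mean_gain(treat_pre, treat_post) - mean_gain(ctrl_pre, ctrl_post)
```

A positive result means the treatment group gained more on average than the control group. Whether that difference can be attributed to the intervention depends on the study design, not on the arithmetic.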