

Making Comparisons Across Time and Across Sections

To compare one section to another (or to compare ratings on different questions for one section), follow these general guidelines:
  • Because the FCQ score scale and items changed significantly in fall 2006, we do not recommend making comparisons between former (prior to fall 2006) and current FCQ results.
  • Are the two sections similar enough to warrant comparison? Differences in discipline, activity type (lecture, lab, seminar), level (especially graduate vs. undergraduate), class size, and other factors can make comparisons tricky.
  • Is the return rate sufficiently high? Be cautious if it's under 70%.
  • Do fewer than 90% of returned forms contain ratings on the question you're looking at? This is rare except for the diversity items, but warrants caution when it does occur.
  • Check the section standard deviations (SDs). If these are unusually high (over 1.10 on the former FCQ), then the section average probably does not accurately reflect a "typical" or "consensus" response. In this case comparing averages is not legitimate.
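
The checks above can be sketched as a small screening function. This is only an illustration: the function name, the parameter names, and the choice to express rates as fractions between 0 and 1 are assumptions, and the 1.10 SD cutoff is the one quoted above for the former FCQ, so it may not suit the current form.

```python
def comparison_cautions(return_rate, item_response_rate, section_sd,
                        min_return_rate=0.70, min_item_response=0.90,
                        max_sd=1.10):
    """Return a list of caution messages for one section and one question.

    Rates are fractions (0.85 means 85%). The default thresholds come from
    the guidelines above; max_sd is the former-FCQ cutoff (an assumption
    for any other scale).
    """
    cautions = []
    if return_rate < min_return_rate:
        cautions.append("return rate is under 70%")
    if item_response_rate < min_item_response:
        cautions.append("fewer than 90% of returned forms rated this question")
    if section_sd > max_sd:
        cautions.append("section SD is unusually high; the average may not "
                        "reflect a consensus response")
    return cautions


# Hypothetical section: 85% return rate, 95% item response, SD of 0.80.
print(comparison_cautions(0.85, 0.95, 0.80))
```

An empty list means none of the three numeric checks raised a flag. The similarity of the two sections (discipline, activity type, level, class size) still has to be judged by hand.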

    If all checks are OK:

  • A good way to estimate the meaningfulness of a difference is to convert the difference to standard deviation units. To compare the average rating of Section 2 to the average of Section 1, for example, subtract the Section 1 average from the Section 2 average and divide the difference by the standard deviation of Section 1. If the result is .8 or greater (or -.8 or less), the difference is typically considered rather large. See effect size for more information.
  • A statistical significance test can also provide information about a difference between two mean scores: not how large it is, but how readily chance alone could account for it. One appropriate statistic for the difference between mean scores of two sections is called a "t test," which gives a "t value" and a "p value." The t value is typically of little interest in and of itself; the associated p value ("p" stands for "probability") is the important number in evaluating statistical significance.
  • Below we'll explain what a p value is, and how to interpret it. There's also a link to an Excel file you can download that will calculate t and p values when you enter section averages, standard deviations, and Ns. First, however, it is important to emphasize what a statistical significance test does not tell you. The most important caveat is this: Whether or not a difference is statistically significant has absolutely no bearing on whether it has any practical or educational or evaluative significance; such questions are matters of informed judgment, not statistical significance tests.
  • With that caveat in mind, here's what a p value, which is the direct measure of statistical significance, does mean. Imagine a situation in which you know that the difference between two average scores is due to chance alone. This would be the case if, for example, you had a set of ratings from a single section of 100 students and you randomly divided them into two samples of 50 ratings each and compared the averages (or, for that matter, if you had a jar full of 100 balls with the numbers 0-4 on them and drew two random samples of 50 balls each from the jar). It's likely that the averages wouldn't be exactly the same (one set of 50 scores might average 3.52, for example, while the other averaged 3.73), but since they were randomly drawn from the same section, we know that the difference is due to chance alone. The p value tells you the probability that a difference in averages at least as large as a given one would be obtained in that situation, in which there is no "real" difference.
  • So let's say you have actual averages from two different sections you want to compare (say they're 3.52 and 3.73, for a difference of 0.21), and you find that when you perform a t test, the associated p value is .17. That means that given the standard deviations and Ns of these two distributions of scores, a difference between the averages as large as or larger than 0.21 would occur 17% of the time even if you were simply drawing two samples of scores randomly from a single section.
  • Knowing this p value can help you interpret the statistical significance of the difference in scores; by convention, a difference associated with a p value of less than .05 is considered statistically significant. In other words, a difference is considered statistically significant if it would occur less than 5% of the time under circumstances such as those described above, in which random chance is known to be the only explanation for the difference. If the p value is greater than .05, the difference is considered not statistically significant. Again, this is a matter of convention, and the .05 dividing line is somewhat arbitrary. And to emphasize again the point made above, a p value does not tell you whether the difference between two section averages has any educational or evaluative significance.
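
The two ideas above, a difference expressed in standard deviation units and a p value as the share of purely random splits that produce a difference at least as large, can be sketched in Python. This is a rough illustration rather than the procedure behind the FCQ reports: the function names, the Section 1 SD of 0.75, and the list of 100 ratings are all hypothetical, and the simulation mimics the "jar of balls" thought experiment rather than computing a t test.

```python
import random
import statistics


def effect_size_sd_units(mean1, sd1, mean2):
    """Difference (Section 2 minus Section 1) in Section 1 SD units."""
    return (mean2 - mean1) / sd1


def random_split_p(ratings, observed_diff, trials=10_000, seed=0):
    """Share of random half/half splits of one section's ratings whose
    averages differ by at least |observed_diff|."""
    rng = random.Random(seed)      # fixed seed so reruns give the same answer
    pool = list(ratings)
    half = len(pool) // 2
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        diff = statistics.fmean(pool[:half]) - statistics.fmean(pool[half:])
        if abs(diff) >= abs(observed_diff):
            hits += 1
    return hits / trials


# Averages from the example above; the SD of 0.75 is a made-up value.
d = effect_size_sd_units(3.52, 0.75, 3.73)
print(f"difference in SD units: {d:.2f}")
print("large by the .8 rule of thumb:", abs(d) >= 0.8)

# Hypothetical single section of 100 ratings on a 0-4 scale.
ratings = [4] * 70 + [3] * 20 + [2] * 6 + [1] * 3 + [0] * 1
print("share of random splits differing by 0.21 or more:",
      random_split_p(ratings, 0.21))
```

The simulated share depends on how spread out the ratings are, which is why a t test asks for the standard deviations and Ns as well as the averages.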

If, given that caveat, you still want to do a statistical significance test, download the "t calculator." This is an Excel file that will calculate a t value and then tell you whether the difference between section averages is statistically significant when you enter the mean, standard deviation, and N for each of the two sections.
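
For readers who prefer a script to a spreadsheet, a comparable calculation from the same three inputs per section can be sketched in Python. The method used inside the Excel t calculator is not described here, so this should not be read as a reimplementation of it: the sketch assumes Welch's unequal-variance t test, gets the p value by numerically integrating the t density with the standard library only, and uses hypothetical SDs (0.75) and Ns (50). All function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p value: 1 minus the area under the density on [-|t|, |t|].

    The area is found with Simpson's rule, so steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return max(0.0, 1.0 - 2.0 * area_0_to_t)


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, p) for Section 2 minus Section 1 (Welch's t test)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)


# Averages from the example above; SDs and Ns are made-up values.
t, df, p = welch_t_test(3.52, 0.75, 50, 3.73, 0.75, 50)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.3f}")
print("statistically significant at .05:", p < 0.05)
```

As with the spreadsheet, the result says nothing about whether the difference matters educationally; that remains a matter of informed judgment.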

Last revision 05/18/16

15 UCB, University of Colorado Boulder, Boulder, CO 80309-0015, (303)492-8631
  © Regents of the University of Colorado