

Possible Alternatives for a New Response Format on the Faculty Course Questionnaire (FCQ)

Perry Sailor
Office of Data Analytics
February 8, 1999

Currently, the FCQ has 5 response options on 11 of the 12 items (9 options on the workload item), plus a "Not applicable to this course" option on 8 of these 11. The response bubbles are labeled A, B, C, D, F. Instructions on the meaning of the labels are printed at the top of the form, but not on or near the bubbles themselves, except for the workload item. The instructions state that the scale is "very good = A B C D F = very poor…." For the workload item, there are 9 bubbles numbered 1 through 9, with "too light" printed next to bubble 1, "OK" next to bubble 5, and "too heavy" next to bubble 9. This proposal does not refer to the workload item, only to the remaining 11.

There are concerns that the present A-F scale is too closely connected in students' minds to the standard classroom grading scale, and that this connection may in some way contaminate ratings, for example by artificially inflating the relationship between a discipline's grades and that discipline's ratings. Also, distributions of FCQ ratings tend to have a very pronounced skew; that is, ratings tend to "ceiling," or bunch up at the positive end of the scale, meaning that the ratings may not do a good job of discriminating among degrees of excellence.
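The "ceiling" pattern described above can be made concrete with a small sketch. The counts below are hypothetical, not actual FCQ data; they simply show how ratings bunched at the positive end of a 1-5 scale (1 = very good) produce a strongly skewed distribution with little room to distinguish among highly rated courses.

```python
# Hypothetical illustration (assumed counts, not actual FCQ data):
# ratings on a 1-5 scale where 1 = very good, bunched at the positive end.
counts = {1: 480, 2: 310, 3: 140, 4: 50, 5: 20}

n = sum(counts.values())
mean = sum(r * c for r, c in counts.items()) / n
var = sum(c * (r - mean) ** 2 for r, c in counts.items()) / n
# Standardized third moment: positive skew = long tail toward the "poor" end.
skew = sum(c * (r - mean) ** 3 for r, c in counts.items()) / n / var ** 1.5

print(f"mean = {mean:.2f}, skewness = {skew:.2f}")
```

With nearly half the ratings at the top point, the mean sits well below the scale midpoint and the skewness is strongly positive, which is exactly the situation in which a scale stops discriminating among degrees of excellence.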

We have been unable to find sufficient information on ratings distributions obtained and response alternatives used elsewhere to enable us to adopt a model from another institution. However, we have developed several possible alternatives for changes in the FCQ response format that may be effective in breaking the connection with the standard grade scale and/or helping with discrimination at the high end of the scale.

  1. Simply substitute 1 2 3 4 5 for A B C D F in the response bubbles and in the instructions; also add "Very good" and "Very poor" above the columns of "1" and "5" bubbles, respectively, to help ensure that students know which is the good end and which is the bad end of the scale, since they will no longer have the A-F grade-scale connection to guide them. This should be beneficial in terms of breaking any possible connection in students' minds between grades and FCQ ratings, while being relatively easy to implement logistically. It may not do anything to help high-end discrimination, however, unless the distribution of scores shifts downward simply owing to the breaking of the grade connection - which it may.
  2. Change the scale to 1-5, as described above, and also write labels for each point of the scale rather than just the endpoints, e.g., 1=very good, 2=good, 3=fair, 4=poor, 5=very poor. A variant of this option would be to drop the numbers entirely and instead put the labels VG, G, F, P, and VP in the bubbles. With respect to either of these options, however, empirical research by Lam and Klockars (1982) has demonstrated that scales with only the endpoints labeled produce results similar to scales with equally spaced labels (Poor, Need Improvement, Satisfactory, Quite Good, Excellent), apparently because in the absence of intermediate labels, respondents mentally construct an equal-interval scale anyway. Consequently, if the aim is a balanced scale, labeling the intermediate points is probably unnecessary and may be unwise - respondents may construct a more truly equal-interval, balanced scale in their heads than we could construct through explicit labels.
  3. If more resolution at the positive end of the scale is desired, a variant and extension of (2) would be to skew the labels to get more spread on the good end, for example, 1=excellent, 2=very good, 3=good, 4=fair, 5=poor, or 1=excellent, 2=very good, 3=good, 4=poor, 5=very poor. (Or again, skip the numbers and simply put the initials of the verbal labels in the bubbles, e.g., E, VG, G, F, P.) The Lam and Klockars study found that the distribution of scores is "directly and predictably influenced" by the particular labels used for the intermediate responses. Making this change in the scale would possibly be met with suspicion by some students who would interpret it as an attempt to skew the ratings in a positive way; we would have to carefully explain the rationale, and explain that the effect on ratings will likely be to push the average down numerically so as to obtain greater discrimination among high scorers. Making this change would also have an impact on the ability to make historical comparisons with past years' data. That is true of any change, of course, but purposely packing the scale with response options from the high end of the underlying continuum would virtually guarantee a rather large change in the distribution; in fact, that's precisely the reason for doing it.
  4. A change that might give more discrimination among positive scores, without unbalancing the scale by packing the response options in a positive direction, would be to use a balanced scale with 7 options rather than 5. An example might be 1=truly exceptional and 7=totally unacceptable, with the intermediate points unlabeled in light of Lam and Klockars' findings. Again, this would wipe out any historical comparisons to prior years' data; but it would possibly give more discrimination, without requiring the amount of explanation and "selling" to stakeholders that a clearly unbalanced set of response options would.
  5. Another, even more radical option is a combination of (3) and (4), that is, increasing the number of response options to seven and simultaneously packing the scale. An example would be: 1=truly exceptional, 2=excellent, 3=very good, 4=good, 5=poor, 6=very poor, 7=unacceptable (or again, labeling with the options' initials only, and not using numbers). This would very much increase the possibility of greater discrimination at the positive end; in fact, it might be overkill. Also, it brings a public relations problem similar to (3), in that it might be thought that we are trying to bias the results. It might be seen as more acceptable on that score than (3), however, because it would give more choices on both the positive and negative ends than does the current system. All in all, this might be more change than is needed.
  6. Reword stems as statements (e.g., "Presentation of course material was good"), and use a true Likert scale, with five response options corresponding to Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree. Bubbles would be labeled SA, A, N, D, and SD. This option would be very straightforward in terms of response options; the Strongly Agree to Strongly Disagree scale has a long and extensive history in attitude and opinion measurement going back 60 years or more, and is familiar and relatively free of ambiguity. However, the problems and need for careful attention to wording don't disappear; they are merely shifted from the response options to the stems, all of which would need to be very carefully rewritten. Moreover, because the FCQ does not attempt to measure a unitary concept with multiple items, there would really be no way, even with a balance between negatively- and positively-worded items, to control for possible response set - the psychological tendency for some respondents to agree (or disagree) with all statements, regardless of their content. On balance, it is hard to see a real advantage for this option, while it has many possible disadvantages.
  7. Make no changes in the response scale. This obviously maintains everything as is, including the good (historical continuity) and the bad (little high-end discrimination, possible psychological connection with, and influence by, the grade scale).


In my judgment, it is best to make the smallest changes necessary to get the job done; this includes keeping a balanced scale, in the sense of not trying to pack the scale with positive response labels. With those goals in mind, the first choices for alternatives to the present form are options 1 (simply substituting 1-5 for A-F, and, to ensure that students know which end of the scale is which, putting "very good" and "very poor" above the 1 and 5 columns) and 4 (the same as option 1, only using a 7-point numbered scale rather than a 5-point scale). I propose that we do an experiment in a few -- say, six -- large sections (those with 100 or more students, or perhaps smaller in some disciplines). We would solicit volunteer instructors for the study from among tenured faculty.

We would randomly distribute forms using option 1, option 4, and option 7 (the present scale), each to approximately 1/3 of the students in each section. This would enable us to learn if the smallest, simplest change - simply replacing letters with numbers, and breaking the prima facie connection with the grade scale - would have any impact on score distributions when compared to the present scale. It would also enable a comparison between a 5-point numbered scale and a 7-point numbered scale, to see if increasing the number of response options alone gives more spread in scores.
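The within-section randomization described above can be sketched as follows. The form labels and section size are assumptions for illustration only; the point is that each student receives one of the three forms at random, in as close to equal proportions as possible.

```python
# Sketch of the proposed within-section randomization (labels and
# section size are hypothetical, for illustration only).
import random
from collections import Counter

FORMS = ["option1_5pt", "option4_7pt", "option7_current"]

def assign_forms(n_students, seed=0):
    """Assign one of the three forms to each student, balanced to
    within one form per condition, in a random order of receipt."""
    rng = random.Random(seed)
    pool = [FORMS[i % 3] for i in range(n_students)]  # near-equal counts
    rng.shuffle(pool)                                 # randomize who gets which
    return pool

assignments = assign_forms(120)
print(Counter(assignments))  # each form goes to ~1/3 of the section
```

Shuffling a pre-balanced pool, rather than drawing each student's form independently, guarantees the roughly equal group sizes needed for comparing the three score distributions.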

We think we can do this experiment fairly readily by using alternative forms printed with software recently obtained by ITS' scanning services. Results from the experimental forms would not be mixed with results from the usual forms in any public reporting.

    Lam, T. C. M., & Klockars, A. J. (1982). Anchor point effects on the equivalence of questionnaire items. Journal of Educational Measurement, 19(4), 317-322.


Last revision 05/19/16
