Testing Two New Response Formats on the Faculty Course Questionnaire (FCQ)
Currently, the FCQ has 5 response options on 11 of its 12 items (the workload item has 9), plus a "Not applicable to this course" option on 8 of those 11. The response bubbles are labeled A, B, C, D, F. Instructions explaining the labels are printed at the top of the form, but not on or near the bubbles themselves, except for the workload item. The instructions state that the scale is "very good = A B C D F = very poor…." The workload item has 9 bubbles numbered 1 through 9, with "too light" printed next to bubble 1, "OK" next to bubble 5, and "too heavy" next to bubble 9. The experiment with alternate response options described in this report covered only the other 11 items, not the workload item.
The experiment was done to address two concerns. One is that the present A-F scale is too closely connected in students’ minds to the standard classroom grading scale, and that this connection may in some way contaminate ratings, for example by artificially inflating the relationship between grades and ratings across a class, a department, or an entire discipline. The second concern addressed by this experiment is that distributions of FCQ ratings tend to have a very pronounced skew; that is, ratings tend to "ceiling," or bunch up at the positive end of the scale, meaning that the ratings may not do a good job of discriminating among degrees of excellence.
We were unable to find sufficient information on ratings distributions obtained and response alternatives used elsewhere to enable us to adopt a model from another institution. Consequently, we developed several possible alternatives for changes in the FCQ response format that may be effective in breaking the connection with the standard grade scale and/or helping with discrimination at the high end of the scale. These alternatives involved various combinations of numbers of response options and labels for both end points and intermediate points of the scales. After considering the pros and cons of the various alternatives, we chose two for experimental study. (The proposal for this study gives more detail about the options and the reasons we chose the two we did.)
We decided that the soundest design would be to administer all three forms (the current one and the two new ones) in each of several different classes, with distribution being random within each class. We thought large classes would be best, so that we would have sufficient numbers both overall and within each class to make statistical conclusions viable. And because FCQ results carry weight in tenure and promotion decisions, we decided to solicit volunteers for the experiment from among full professors.
We identified 46 classes being taught in Fall semester 1999 by full professors and having enrollments of 150 or more. Michael Grant, Associate Vice Chancellor for Undergraduate Education, contacted the professors via email, explained the nature of the study, and asked for volunteers. From among the volunteers, PBA selected 11 classes for participation, striving for broad representation on such dimensions as size (enrollments ranged from 150 to more than 400), level (1000, 2000, 3000), subject area (humanities, social, biological, and physical sciences, engineering, etc.), time of day, days of week, professor’s gender, and professor’s recent FCQ ratings. The author then contacted the professors of the 11 classes selected, to answer any remaining questions and to arrange administration times.
All administration was done by the author, with help from another PBA staff member in the largest class. In a couple of other very large classes, TAs helped distribute the forms but left the room immediately afterward. The nature of the study was fully disclosed to students; in addition to the regular FCQ instructions, students were read the following:
Some of you may notice that the "A" side of the FCQ form looks a little different than the one you’ve seen before in other classes. That’s because we’re trying out some different forms to see if they work better. We asked a few instructors to volunteer to help us try out the new forms, and this is one of the classes we picked from among the volunteers.
We’re going to pass out forms that have been randomly mixed up. About 1/3 of you will receive the same forms that have been used for the last several years. These have response bubbles that are marked A, B, C, D, F, or NA. Another third have the same number of bubbles, but they’re marked 5, 4, 3, 2, 1 and NA. And another third have forms with two additional response bubbles, marked 7, 6, 5, 4, 3, 2, 1 and NA. None of the questions are any different - just the response bubbles. And as always, they’re completely anonymous.
(For two classes with optional questions on the B side only:)
So, just fill out the form as you would normally. Any questions?
In the results reported below, the form currently in use, which has a 5-point A-F lettered scale, is labeled "L5." The new form with the 5-point numbered scale is labeled "N5," and the form with the 7-point numbered scale is labeled "N7." The results focus on Item 11 ("This course, compared to all your other university courses") and Item 12 ("This instructor, compared to all your other university instructors").
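Comparing means on the L5 and N5 forms requires coding letter responses as numbers. The report does not state its coding; the sketch below assumes the mapping implied by treating "A" and "5" as nominally equivalent (A through F become 5 through 1) and drops NA and blank responses. The names `code_response` and `coded` are hypothetical.

```python
# Assumed coding (not stated in the report): letters map to the nominally
# equivalent points of the 5-point numbered scale.
LETTER_TO_POINT = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}
# Highest point on each numbered form.
SCALE_TOP = {"N5": 5, "N7": 7}


def code_response(form, mark):
    """Convert one raw bubble mark to a numeric rating, or None for NA/blank."""
    if mark is None or mark == "NA":
        return None
    if form == "L5":
        return LETTER_TO_POINT[mark]
    top = SCALE_TOP[form]  # raises KeyError for an unknown form label
    value = int(mark)
    if not 1 <= value <= top:
        raise ValueError(f"{mark!r} is out of range for form {form}")
    return value


def coded(form, marks):
    """Code a list of marks from one form, dropping NA and blank responses."""
    return [v for v in (code_response(form, m) for m in marks) if v is not None]
```

For example, `coded("L5", ["A", "B", "NA", "F"])` yields `[5, 4, 1]`.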
The unit of analysis. Although responses within a class are not fully independent of one another, forms were randomized within each class, so class effects are balanced when forms are compared (i.e., any class effect is expected to be equal across forms). Therefore, the individual student is the unit of analysis in all results reported below.
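Within-class randomization can be sketched as follows. This is a minimal illustration, assuming each class's stack was made up of as-equal-as-possible thirds and then shuffled; the report says only that the forms were "randomly mixed up" with about a third of each, and the function name `shuffled_stack` is hypothetical. Because every class receives the same mix, whatever is distinctive about a class falls on all three forms alike.

```python
import random

FORMS = ("L5", "N5", "N7")


def shuffled_stack(enrollment, seed=None):
    """Return a randomly ordered stack of forms for one class.

    Forms are allotted in as-equal-as-possible thirds, then shuffled, so
    which form a given student receives is random within the class.
    """
    rng = random.Random(seed)
    stack = [FORMS[i % len(FORMS)] for i in range(enrollment)]
    rng.shuffle(stack)
    return stack
```

For a class of 152, this produces 51 L5, 51 N5, and 50 N7 forms in random order.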
Differences in means and distributions. The mean for the L5 form was higher than for N5 on both Items 11 and 12. The difference was 0.08 standard deviation units on Item 11 and 0.13 s.d. units on Item 12. Neither difference was statistically significant at the .05 level. The direction of the difference was quite consistent across classes: for both items, in 9 of the 11 classes the form using letters A-F had a higher mean than the numbered 5-point form, and in 8 of the 11 classes students using the L5 form gave more "A" ratings than students using the N5 form gave "5" ratings, even though the verbal labels were the same on both forms.
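Expressing a mean difference in standard deviation units means dividing it by a standard deviation. The report does not say which one was used; the sketch below assumes the pooled SD of the two forms (as in Cohen's d). It also includes an exact sign test, one common way to ask whether a direction such as "L5 higher in 9 of 11 classes" is more consistent than chance would produce; the report does not say such a test was run, and both function names are hypothetical.

```python
from math import comb, sqrt
from statistics import mean, stdev


def standardized_difference(a, b):
    """Difference in means (a - b) divided by the pooled sample SD.

    Assumption: pooling the two forms' sample variances, weighted by
    degrees of freedom, as in Cohen's d.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def sign_test_p(wins, n):
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(n, 0.5).

    `wins` is the number of classes (out of n) in which one form's mean
    exceeded the other's; ties are assumed to have been excluded.
    """
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
```

Applied to coded L5 and N5 ratings, `standardized_difference(l5, n5)` gives the L5-minus-N5 difference in pooled-SD units.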
As expected, the standard deviation on the 7-point form was higher than on either 5-point form, and a still smaller percentage of students gave the highest rating. (Because the scales have different numbers of points, comparing raw means on the 7-point form with those on either 5-point form is meaningless.) There was little or no difference among the three forms at the low end of the scale.
The scales were also compared by splitting responses into three categories: below the midpoint, at the midpoint, and above the midpoint. On both items there was a slight tendency for the 7-point scale to have a higher percentage of ratings both above and below the midpoint and fewer at the midpoint. Chance alone would predict this: the midpoint is one of seven points on the 7-point scale but one of only five on the 5-point scale, so if responses were spread evenly across the points, about 14% (1/7) would fall at the midpoint on N7 versus 20% (1/5) on N5. If the "at midpoint" responses are set aside, differences in the percentages of responses above and below the midpoint are small and not statistically significant.
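The three-way split can be computed directly from coded ratings. The helper below is a hypothetical sketch; besides the below/at/above shares, it reports the share above the midpoint with at-midpoint responses set aside, the comparison the paragraph describes.

```python
from collections import Counter


def midpoint_split(ratings, points):
    """Shares of ratings below, at, and above the midpoint of a 1..points scale.

    Also returns "above_excluding_mid": the share above the midpoint among
    ratings that are not at the midpoint.
    """
    mid = (points + 1) / 2  # 3 on a 1-5 scale, 4 on a 1-7 scale
    counts = Counter(
        "below" if r < mid else "at" if r == mid else "above" for r in ratings
    )
    n = len(ratings)
    shares = {k: counts[k] / n for k in ("below", "at", "above")}
    off_mid = counts["below"] + counts["above"]
    shares["above_excluding_mid"] = counts["above"] / off_mid if off_mid else float("nan")
    return shares
```

With evenly spread responses, the "at" share would approach `1 / points`, which is the chance baseline noted above.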
Figures 1 and 2 below show bar chart frequency distributions for the three forms on Items 11 and 12, respectively. Although the N7 distribution might appear to approximate the classic normal (i.e., Gaussian) distribution more closely than the L5 or N5 distributions do, objective measures of normality do not bear this out. The three distributions do not differ appreciably in measured skewness or kurtosis, or in their values of the Shapiro-Wilk statistic W, a test statistic for normality.
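Skewness and kurtosis can be computed from the central moments of the coded ratings. The sketch below uses the population-moment (biased) formulas, which are also the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis` (the latter reporting excess kurtosis); the report does not say which estimators it used, and `moments_shape` is a hypothetical name. For the Shapiro-Wilk W, `scipy.stats.shapiro(ratings)` returns the statistic and its p-value.

```python
from statistics import fmean


def moments_shape(ratings):
    """Return (skewness, excess kurtosis) from population central moments.

    g1 = m3 / m2**1.5 and g2 = m4 / m2**2 - 3. With ratings coded so that
    higher is more favorable, ratings bunched at the top give g1 < 0.
    """
    mu = fmean(ratings)
    n = len(ratings)
    m2 = sum((x - mu) ** 2 for x in ratings) / n
    if m2 == 0:
        raise ValueError("all ratings are identical; shape is undefined")
    m3 = sum((x - mu) ** 3 for x in ratings) / n
    m4 = sum((x - mu) ** 4 for x in ratings) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
```

Computing these for each form on the same item gives the kind of side-by-side comparison the paragraph describes.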
Alternate scales for the FCQ were devised and tested in order to address two perceived problems with the present scale: (1) it uses the same A-F scale as classroom grading of students, thus enabling a possibly spurious correlation between grades earned and ratings given; (2) it has only five response options, which, combined with typically skewed distributions, results in low discrimination between instructors at the high end of the scale. The first issue was addressed in the present experiment by comparing results on the L5 scale to those on the N5 scale; the second by comparing the N5 scale to the N7 scale.
The FCQ ratings in this experiment, like all FCQ ratings at CU, were anonymous, so it is impossible to measure the correlation between individual students' grades and their ratings on any of the scales. All we can conclude from comparing results from the current form (L5) to the 5-point numerical form (N5) is that (1) the prima facie rating/grade relationship is broken, by definition, and (2) the distribution of ratings on the two forms was slightly different, mostly because there were fewer "5" ratings than (nominally equivalent) "A" ratings. In other words, changing the response options from letters to numbers may have made a difference, although a very small one, and that difference may be due to breaking the grade-rating connection.
As for the variability issue, addressed by comparing results on the 5-point and 7-point numerical scales, responses on the 7-point scale, as expected, showed more variation. The proportions of above-midpoint and below-midpoint ratings were essentially the same on the N5 and N7 scales, but on N7 each was spread across more discrete points.
In terms of practical use of FCQ results by students, faculty, and administrators, the differences this study found among the three forms probably do not warrant a change from the present L5 form at this time, especially because such a change would carry a cost: it would break the continuity of FCQ data, preventing longitudinal comparisons between terms in which the old form was used and terms in which the new form was used. However, if the wording of FCQ items is ever changed (which would break continuity anyway), we recommend a concurrent change to the 7-point numerical scale.
Acknowledgments: Thanks to Joan Puryear of Information Services for writing programs to optically scan the new forms, and for doing the scanning itself; to Gary Pfeifer of PBA for technical assistance in handling the data; to Meg Rowland, PBA’s FCQ Coordinator, for helping in many ways throughout the project, from conception to completion; and to Michael Grant, Associate Vice Chancellor for Undergraduate Education, for recruiting study volunteers from among professors and for overall support of the project.
Last revision 10/31/03
15 UCB, University of Colorado Boulder, Boulder, CO 80309-0015, (303)492-8631
© 2001, The Regents of the University of Colorado