Possible Alternatives for a New Response Format on the Faculty Course Questionnaire (FCQ)
Perry Sailor
Office of Planning, Budget, and Analysis
February 8, 1999
Currently, the FCQ has 5 response options on 11 of the 12 items (9 options on the workload item),
plus a "Not applicable to this course" option on 8 of these 11. The response bubbles are labeled
A, B, C, D, F. Instructions on the meaning of the labels are printed at the top of the form, but
not on or near the bubbles themselves, except for the workload item. The instructions read that
the scale is "very good = A B C D F = very poor…." For the workload item, there are 9 bubbles
containing a number 1 through 9, with "too light" printed next to bubble 1, "OK" next to bubble 5,
and "too heavy" next to bubble 9. This proposal does not refer to the workload item, only to the
remaining 11.
There are concerns that the present A-F scale is too closely connected in students' minds to the
standard classroom grading scale, and that this connection may in some way contaminate ratings,
for example by artificially inflating the relationship between a discipline's grades and that
discipline's ratings. Also, distributions of FCQ ratings tend to have a very pronounced skew;
that is, ratings tend to "ceiling," or bunch up at the positive end of the scale, meaning that
the ratings may not do a good job of discriminating among degrees of excellence.
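One way to put a number on the "ceiling" problem is to look at the skewness of a section's ratings together with the share of responses in the top category. The sketch below is illustrative only: the function name `ceiling_summary`, the coding of A=5 through F=1, and the ratings themselves are assumptions made for the example, not FCQ data or an existing PBA procedure.

```python
from statistics import mean, pstdev

def ceiling_summary(ratings, top=5):
    """Summarize how strongly ratings bunch up at the favorable end.

    ratings: numeric codes where larger = more favorable
             (assumed coding: A=5, B=4, C=3, D=2, F=1).
    """
    n = len(ratings)
    m = mean(ratings)
    sd = pstdev(ratings)
    # Population skewness: the average cubed standardized deviation.
    # A negative value means a long tail toward the unfavorable end.
    skew = sum(((x - m) / sd) ** 3 for x in ratings) / n if sd > 0 else 0.0
    top_share = sum(1 for x in ratings if x == top) / n
    return {"mean": m, "skewness": skew, "top_share": top_share}

# Made-up ratings for one hypothetical section (not real FCQ results).
example = [5] * 12 + [4] * 5 + [3] * 2 + [1] * 1
summary = ceiling_summary(example)
print(summary)
```

A large top-category share combined with strongly negative skewness would indicate that most instructors are being rated in the same one or two bubbles, which is the discrimination problem described above.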
We have been unable to find sufficient information on ratings distributions obtained and response
alternatives used elsewhere to enable us to adopt a model from another institution. However, we
have developed several possible alternatives for changes in the FCQ response format that may be
effective in breaking the connection with the standard grade scale and/or helping with
discrimination at the high end of the scale.
1. Simply substitute 1 2 3 4 5 for A B C D F in the response bubbles and in the instructions; also
add "Very good" above the column of "1" bubbles and "Very poor" above the column of "5" bubbles,
to help ensure that students know which is the good end and which is the bad end of the scale,
since they will no longer have the A-F grade-scale connection to guide them. This should be
beneficial in terms of breaking any possible
connection in students' minds between grades and FCQ ratings, while being relatively easy to
implement logistically. It may not do anything to help high-end discrimination, however,
unless the distribution of scores shifts downward simply owing to the breaking of the grade
connection - which it may.
2. Change the scale to 1-5, as described above, and also write labels for each point of the scale
rather than just the endpoints, e.g., 1=very good, 2=good, 3=fair, 4=poor, 5=very poor. A
variant of this option would be to drop the numbers entirely and instead put the labels VG, G,
F, P, and VP in the bubbles. With respect to either of these options, however, empirical
research by Lam and Klockars (1982) has demonstrated that scales with only the endpoints
labeled produce results similar to scales with equally spaced labels (Poor, Need Improvement,
Satisfactory, Quite Good, Excellent), apparently because in the absence of intermediate labels,
respondents mentally construct an equal-interval scale anyway. Consequently, if the aim is a
balanced scale, labeling the intermediate points is probably unnecessary and may be unwise -
respondents may construct a more truly equal-interval, balanced scale in their heads than we
could construct through explicit labels.
3. If more resolution at the positive end of the scale is desired, a variant and extension of (2)
would be to skew the labels to get more spread on the good end, for example, 1=excellent,
2=very good, 3=good, 4=fair, 5=poor, or 1=excellent, 2=very good, 3=good, 4=poor, 5=very
poor. (Or again, skip the numbers and simply put the initials of the verbal labels in the
bubbles, e.g., E, VG, G, F, P.) The Lam and Klockars study found that the distribution of
scores is "directly and predictably influenced" by the particular labels used for the
intermediate responses. Making this change in the scale would possibly be met with suspicion
by some students who would interpret it as an attempt to skew the ratings in a positive way;
we would have to carefully explain the rationale, and explain that the effect on ratings will
likely be to push the average down numerically so as to obtain greater discrimination among
high scorers. Making this change would also have an impact on the ability to make historical
comparisons with past years' data. That is true of any change, of course, but purposely
packing the scale with response options from the high end of the underlying continuum would
virtually guarantee a rather large change in the distribution; in fact, that's precisely the
reason for doing it.
4. A change that might give more discrimination among positive scores, without unbalancing the
scale by packing the response options in a positive direction, would be to use a balanced scale
with 7 options rather than 5. An example might be 1=truly exceptional and 7=totally unacceptable,
with the intermediate points unlabeled in light of Lam and Klockars' findings. Again, this
would totally wipe out any historical comparisons to prior years' data; but it would possibly
give more discrimination, without requiring the amount of explanation and "selling" to
stakeholders that a clearly unbalanced set of response options would.
5. Another, even more radical option is a combination of (3) and (4), that is, increasing the
number of response options to seven and simultaneously packing the scale. An example would
be: 1=truly exceptional, 2=excellent, 3=very good, 4=good, 5=poor, 6=very poor, 7=unacceptable
(or again, labeling with the options' initials only, and not using numbers). This would very
much increase the possibility of greater discrimination at the positive end; in fact, it might
be overkill. Also, it brings a public relations problem similar to that of (3), in that it might be
thought that we are trying to bias the results. It might be seen as more acceptable on that
score than (3), however, because it would give more choices on both the positive and negative
end than does the current system. All in all, this might be more change than is needed.
6. Reword stems as statements (e.g., "Presentation of course material was good"), and use a true
Likert scale, with five response options corresponding to Strongly Agree, Agree, Neutral,
Disagree, Strongly Disagree. Bubbles would be labeled SA, A, N, D, and SD. This option would
be very straightforward in terms of response options; the Strongly Agree to Strongly Disagree
scale has a long and extensive history in attitude and opinion measurement going back 60 years
or more, and is familiar and relatively free of ambiguity. However, the problems of wording, and
the need for careful attention to it, don't disappear; they are merely shifted from the response
options to the stems, all of which would need to be very carefully rewritten. Moreover,
because the FCQ does not attempt to measure a unitary concept with multiple items, there would
really be no way, even with a balance between negatively- and positively-worded items, to
control for possible response set - the psychological tendency for some respondents to agree
(or disagree) with all statements, regardless of their content. On balance, it is hard to see
a real advantage for this option, while it has many possible disadvantages.
7. Make no changes in the response scale. This obviously maintains everything as is, including
the good (historical continuity) and the bad (little high-end discrimination, possible
psychological connection with, and influence by, the grade scale).
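The claim under option 3, that packing the scale with labels from the high end would shift the distribution, can be illustrated with a simple threshold sketch. This is an assumed model, not something taken from Lam and Klockars: each student is imagined to hold an opinion somewhere on an underlying 0-to-1 favorability continuum, and each label covers a stretch of that continuum. The opinion values and cut points below are invented for the example.

```python
from bisect import bisect_right
from collections import Counter

def categorize(latent, cuts):
    """Map a latent favorability in [0, 1] to a category index.

    Index 0 is the least favorable category; len(cuts) is the most favorable.
    """
    return bisect_right(cuts, latent)

# Hypothetical opinions, bunched toward the favorable end of the continuum.
latent = [0.55, 0.65, 0.70, 0.75, 0.82, 0.85, 0.88, 0.90, 0.93, 0.97]

# Assumed cut points: equal-width categories versus categories packed near the top.
balanced_cuts = [0.2, 0.4, 0.6, 0.8]
packed_cuts = [0.3, 0.6, 0.78, 0.92]

balanced = Counter(categorize(x, balanced_cuts) for x in latent)
packed = Counter(categorize(x, packed_cuts) for x in latent)

print("balanced:", sorted(balanced.items()))
print("packed:  ", sorted(packed.items()))
```

Under these assumptions the same ten opinions put six responses in the top category of the balanced scale but only two in the top category of the packed scale, which is the kind of added spread at the high end that option 3 is meant to produce.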
Recommendation:
In my judgment, it is best to make the smallest changes necessary to get the job done; this
includes keeping a balanced scale, in the sense of not trying to pack the scale with positive
response labels. With those goals in mind, the first choices for alternatives to the present
form are options 1 (simply substituting 1-5 for A-F, and to ensure that students know which
end of the scale is which, putting "very good" and "very poor" above the 1 and 5 columns) and
4 (the same as option 1, only using a 7-point numbered scale rather than a 5-point scale). I
propose that we do an experiment in a few (say, six) large sections (those with 100 or more
students, or perhaps smaller in some disciplines). We would solicit volunteer instructors for
the study from among tenured faculty. Within each section, we would randomly distribute the
three forms (option 1, option 4, and option 7, the present scale) so that approximately 1/3 of
the students receive each form. This would enable us to learn if the smallest,
simplest change, simply replacing letters with numbers, and breaking the prima facie connection
with the grade scale, would have any impact on score distributions when compared to the present
scale. It would also enable a comparison between a 5-point numbered scale and a 7-point numbered
scale, to see if increasing the number of response options alone gives more spread in scores.
We think we can do this experiment fairly readily by using alternative forms printed with software
recently obtained by ITS' scanning services. Results from the experimental forms would not be mixed
with results from the usual forms in any public reporting.
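The within-section random distribution of forms could be sketched as follows. This is a minimal illustration under stated assumptions: the function name `assign_forms`, the form labels, the student identifiers, and the seed are all hypothetical, and in practice the same effect might be achieved simply by shuffling the stack of printed forms before handing them out.

```python
import random

# Hypothetical labels for the three form versions in the experiment.
FORMS = ["option 1 (1-5)", "option 4 (1-7)", "option 7 (A-F, present scale)"]

def assign_forms(student_ids, forms=FORMS, seed=None):
    """Randomly assign one form version to each student in a section.

    Students are shuffled, then forms are dealt out in rotation, so the
    group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: forms[i % len(forms)] for i, sid in enumerate(shuffled)}

# Example: one made-up section of 100 students.
section = [f"student_{i:03d}" for i in range(100)]
assignment = assign_forms(section, seed=1999)
counts = {form: list(assignment.values()).count(form) for form in FORMS}
print(counts)
```

Dealing the forms in rotation after shuffling keeps the three groups as close to 1/3 each as the section size allows, while leaving which students get which form to chance.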
Reference:
Lam, T. C. M., & Klockars, A. J. (1982). Anchor point effects on the
equivalence of questionnaire items. Journal of Educational Measurement, 19(4), 317-322.