Revised October 24, 1999. Several fairly major corrections were made at 11:35 A.M. on October 25, 1999. If you copied these notes before that time, you may wish to make a new copy. This lecture is based on 1998 Lecture 6.

Lecture 24, MCDB 2150, Fall 1999

Chi-Square Analysis

Textbook assignment: Chapter 12, pages 369-371. This is the last assignment from Chapter 12. End-of-chapter materials for the entire chapter should also be read. The detailed table of chi-square values referred to on page 369 is inside the back cover of the textbook, and not in Appendix I, as stated.

Major concepts

Substitute symbols: Once again, we are encountering the limited ability of HTML to support mathematical symbols. The classical test for goodness of fit is the chi-square test, normally written as the Greek letter chi followed by the exponent 2. In these notes, chi is replaced with "X" and chi-squared is written as X2. In addition, we will continue the previously introduced practice of writing "SUM" to replace upper case sigma as the symbol for a summation.

During the spring semester when there is a concurrent MCDB 1151 laboratory, discussion of goodness of fit and the X2 test is limited entirely to the laboratory, where the X2 test is used to evaluate experimental data. Because there is no laboratory this semester, we will briefly discuss the X2 test and its applications without going into extended detail. Before examining the mathematical calculations involved in the chi-square test, we will first briefly examine what it does and does not do.

Goodness of fit: The chi-square test examines the extent to which observed data differ from expected values that are predicted from theory, such as a 3:1 phenotypic ratio. This relationship is sometimes referred to as "goodness of fit". The basic problem that must always be dealt with in statistical analysis is that data from any finite number of observations will be perturbed by random chance events. For example, tossing four heads in a row has a 1/16 probability of happening by pure chance, and does not prove that the same coin will not yield statistically equal numbers of heads and tails if tossed repeatedly.

Null hypothesis: The starting point for chi-square analysis is the null hypothesis, which proposes that any difference between observed and expected values is due entirely to random chance events. The observed data are considered to be consistent with the expectation only if the null hypothesis is not rejected by the chi-square test. Rejection of the null hypothesis does not prove that the expectation was in error. Instead, it only says that the observed data differ from the expected sufficiently so that the probability that the deviation could have arisen by pure chance is relatively small.

Criteria for rejecting the null hypothesis: The chi-square value, whose calculation will be described below, becomes larger as the probability that the observed deviation from the expected value could be due to random chance becomes smaller. After the calculation has been completed, the chi-square value is converted to a probability (p value) that a deviation of the observed data from expected values at least this large could have arisen by random chance. The usual standard for rejecting the null hypothesis is a p value of less than 0.05, which means that, if the expectation were valid, there would be less than a 5% chance (1 in 20) that random chance events alone would produce a deviation this large. Thus, all that rejection of the null hypothesis actually demonstrates is that random chance events are unlikely to have caused the observed amount of deviation. In other words, a p value of 0.05 says that if the expectation is valid and the experiment were repeated over and over, 5% of the tests would yield a deviation this large or larger, whereas 95% of the tests would yield a smaller deviation.

No proof of validity: The chi-square test neither proves nor disproves the validity of the expectation. Statistical analysis cannot do that. A large chi-square value, which corresponds to a very low probability that the observed difference between expected and observed occurred by chance, tells us that it is highly unlikely that the expectation was valid. However, statistically unlikely events, such as winning the lottery, do occur. Thus, a rare statistical event can on rare occasions either cause the null hypothesis for a valid expectation to be rejected or cause the null hypothesis for an incorrect expectation to fail to be rejected. However, in practice, a chi-square test in which the null hypothesis is not rejected is considered to be relatively strong evidence that the expectation was probably valid. In cases where the p value comes out close to the cutoff value of 0.05, it is usually desirable to repeat the test with a larger sample size, irrespective of whether the initial test rejects the null hypothesis or not.

Calculation of chi-square: The chi-square test begins with the difference between the observed number (O) and the expected number (E), based on the prediction being tested. Consider, for example, an F2 population of 1064 individuals that is being examined to determine whether the dominant phenotype has been expressed in the expected 3:1 ratio. In this example from Mendel's experiment with tall and dwarf peas, the expected number of dominant phenotypes is 798, whereas the observed number is only 787. The comparable values for recessive phenotypes are 266 expected and 277 observed.

Squaring the difference: The next step is to square the (O - E) value for each of the observations. This has two effects. It increases the weighting of large differences relative to that of small ones and it also makes all values positive, such that too few is no different from too many. For the dominant phenotype in our example, (O-E) = -11 and (O-E)2 = 121.

Dividing by expected value and summation: Each squared value is then divided by the expected value and the resulting fractions are added together for all observations, according to the following formula.

X2 = SUM {(O-E)2 / E}.

Be sure to remember that the expected values will not be the same for the dominant and recessive phenotypes. For the current example,

X2 = 121/798 + 121/266 = 0.1516 + 0.4549 = 0.6065.
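The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name chi_square and the variable names are illustrative choices for these notes, not something taken from the textbook.

```python
def chi_square(observed, expected):
    """Return SUM{(O-E)^2 / E} over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Mendel's tall and dwarf peas: 1064 F2 plants, 3:1 expectation.
observed = [787, 277]  # dominant, recessive
expected = [798, 266]  # 3/4 and 1/4 of 1064

x2 = chi_square(observed, expected)
print(round(x2, 4))  # 0.6065
```

Note that each squared difference is divided by its own expected value, which is why the recessive class contributes more to the total than the dominant class even though both differ from expectation by 11 individuals.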

Degrees of freedom: Because chi-square is a summed value, it increases in magnitude as the number of independent values that are examined increases. Each independent value is referred to as a degree of freedom. Tables that relate chi-square values to the probability that the observed deviation could have arisen by chance are organized in terms of the degrees of freedom of the chi-square values.

Determining degrees of freedom: The number of degrees of freedom for a chi-square value is always one less than the number of observations (that is, the number of classes being counted). The reason is that the binomial expansion for n independent events yields n+1 terms, as we saw earlier, with the last term being the difference between all of the rest and 1.0, and thus not an independent observation. For example, if we look at the example presented above, the number of recessive phenotype individuals is simply the total number minus the number of dominant phenotype individuals. Therefore, these two observations provide only one degree of freedom, since the second one is only what is left of the total after subtracting the first. Thus, in evaluating the significance of chi-square values, it is always necessary to use the line in the table for a number of degrees of freedom that is one less than the total number of observations being evaluated.

Determining the p value: Mathematical calculation of p values from X2 values and degrees of freedom is complex enough that the p values are almost always obtained from a table or chart. Table 12.5 on page 369 of the textbook presents chi-square values corresponding to probabilities of 0.05 and 0.01 for 1 to 5 degrees of freedom. A far more detailed table can be found inside the back cover of the textbook. (Note that page 369 incorrectly states that this table is in Appendix 1, which does not exist, at least in the instructor's edition of the textbook.)

Values that support the null hypothesis: If the chi-square value is less than that for a probability of 0.05, the probability that the observed deviation could have arisen by chance is greater than 5% and the null hypothesis is not rejected. In some cases a more stringent standard may be used such that the null hypothesis is not completely rejected unless there is less than a 1% probability that the results arose by random chance. Note that for a given number of degrees of freedom, a larger chi-square value, which is indicative of greater deviation from expected values, corresponds to a smaller p value. In cases where the chi-square test indicates a p value between 0.01 and 0.05, it is often desirable to repeat the experiment with a larger population to obtain a less ambiguous result.

Rejecting the null hypothesis: Beyond certain values of chi-square, the probability that the deviation from expected values is caused by random chance events becomes quite small. Thus, for a chi-square value that yields a probability of less than 0.05, there is less than a 1 in 20 chance that a deviation this large could have arisen by pure chance. In other words, if the expectation were valid, more than 95% of repeated experiments would show a smaller deviation, which suggests that the expectation is invalid (or that there is some other problem with the data or their interpretation). This leads to rejection of the null hypothesis (that the observed deviation is due only to chance events). Although rejection of the null hypothesis does not PROVE that the expectation was wrong, it does demonstrate that the current data do not adequately support that expectation.

X2 values that lead to rejection of the null hypothesis: For one degree of freedom, the X2 value that corresponds to p = 0.05 is 3.84. This is a number that you should memorize. In the example above, the X2 value of 0.6 indicates a probability of about 44% that the observed deviation was due entirely to chance. This value does not even come close to rejecting the null hypothesis. For 2 degrees of freedom, the X2 value corresponding to p = 0.05 is 5.99. For 3 degrees of freedom it is 7.82.
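Although a table is the usual route, the p values for 1, 2, and 3 degrees of freedom can be computed directly. The sketch below relies on standard closed-form expressions for the upper tail of the chi-square distribution, which are not derived in these notes or in the textbook assignment; the function name p_value is an illustrative choice, and the sketch deliberately does not handle more than 3 degrees of freedom.

```python
import math


def p_value(x2, df):
    """Probability of a chi-square value at least as large as x2 (df = 1, 2, or 3)."""
    if df == 1:
        return math.erfc(math.sqrt(x2 / 2))
    if df == 2:
        return math.exp(-x2 / 2)
    if df == 3:
        tail = math.erfc(math.sqrt(x2 / 2))
        return tail + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)
    raise ValueError("this sketch only handles 1 to 3 degrees of freedom")


# The three cutoff values quoted above should each give p close to 0.05.
for df, cutoff in [(1, 3.84), (2, 5.99), (3, 7.82)]:
    print(df, cutoff, round(p_value(cutoff, df), 3))

# p value for the tall and dwarf pea example (one degree of freedom).
print(round(p_value(0.6065, 1), 2))
```

Running the last line is a quick way to check a p value read from the table for the pea example.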

Does chi-square really tell us anything? On first examination, it might seem like the chi-square test does not provide much useful information. How much do we benefit from knowing that there is less than a 5% probability that the deviation from expectation that we observed could have arisen by chance? To examine that question, we need to ask how likely it is that the observed data could actually represent chance deviation from an alternative expectation.

In the example presented above, we can ask what sort of p value would have been obtained for the same data if our expectation was a 1:1 ratio. In that case, E = 532, (O-E) = 255, and SUM{(O-E)2/E} = 244.45. A value of X2 that large is completely off scale for the chart and table in Fig 3.12 of last year's textbook, and shows that the probability is vanishingly small that the observed data could be a chance deviation from an expected 1:1 ratio.
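The comparison between the two expectations can be reproduced with a short Python sketch. The helper names expected_counts and chi_square are illustrative, not from the textbook; the only inputs are the observed counts and the ratio being tested.

```python
def expected_counts(total, ratio):
    """Split a total into expected counts according to a ratio such as (3, 1)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square(observed, expected):
    """Return SUM{(O-E)^2 / E} over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [787, 277]  # tall, dwarf

for ratio in [(3, 1), (1, 1)]:
    expected = expected_counts(sum(observed), ratio)
    print(ratio, expected, round(chi_square(observed, expected), 2))
# (3, 1) [798.0, 266.0] 0.61
# (1, 1) [532.0, 532.0] 244.45
```

The same data that sit comfortably within a 3:1 expectation are wildly inconsistent with a 1:1 expectation, which is the sense in which the test is informative.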

How much deviation triggers rejection of the null hypothesis? A probability of 0.05 (5%) that the observed deviation was a random chance event corresponds to an X2 value of 3.84. In the example presented above, a deviation (O-E) of 27 corresponds to an X2 value of 3.65, still within the limit, whereas a deviation of 28 corresponds to 3.93, just outside the cutoff point for rejection of the null hypothesis. This means that for the current sample size, a phenotypic ratio of 770:294 (2.62:1) or 826:238 (3.47:1) would be rejected as not matching a 3:1 expectation, whereas anything between those values would be accepted as "close enough" to satisfy the chi-square test at p = 0.05.
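The cutoff deviation for this sample size can be found by stepping through whole-number deviations until the X2 value passes 3.84. The following is a minimal sketch with illustrative names; it assumes the same 1064 plants and 3:1 expectation used throughout this lecture.

```python
CUTOFF_1DF = 3.84       # X2 value for p = 0.05 with one degree of freedom
EXPECTED = (798, 266)   # 3:1 expectation for 1064 individuals


def chi_square_for_deviation(d):
    """X2 when the dominant class is off by d and the recessive class by -d."""
    return sum(d ** 2 / e for e in EXPECTED)


def smallest_rejected_deviation():
    """Smallest whole-number deviation whose X2 exceeds the cutoff."""
    d = 0
    while chi_square_for_deviation(d) <= CUTOFF_1DF:
        d += 1
    return d


d = smallest_rejected_deviation()
print(d, round(chi_square_for_deviation(d), 2))
# Counts at the low and high edges of the rejected region.
print(EXPECTED[0] - d, EXPECTED[1] + d)
print(EXPECTED[0] + d, EXPECTED[1] - d)
```

Because the two classes must add up to the same total, a shortfall in one class is always matched by an equal excess in the other, so a single deviation d describes both.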

"Highly significant" rejection of the null hypothesis: Boxed example 12.8 applies chi-square analysis to the example of linkage described in boxed example 12.4. The chi-square value that is obtained indicates that the probability that the results came from chance deviation is far less than 0.01. This supports the conclusion that the rejection of the null hypothesis is highly significant. In this particular example, the null hypothesis of independent assortment is not valid, as demonstrated by subsequent studies showing linkage of the two genetic loci. Be sure that you understand that a p value of 0.01 derived form a chi-square test is highly significant ONLY for rejecting a null hypothesis, and NOT for proving an alternative hypothesis.

The need for caution in interpreting chi-square results: Last year's textbook cautioned about reading too much into rejection of the null hypothesis (page 69). Many different aspects of the experimental design or the biology of the test organisms can cause perturbations in phenotypic ratios. One that was cited is reduced viability of certain types of homozygous recessive organisms. There can also be differences in fertilization rates. Sperm carrying certain alleles may not be as efficient. Eggs from mothers of certain genotypes may not develop as well. Even in random human births, a very careful and detailed study will reject the null hypothesis for a 1:1 sex ratio. For reasons that are still not entirely clear, slightly more boys than girls are born.

Sample size: Another very important consideration is sample size. When sample size is small, the E values used in the calculation of X2 may become quite small, particularly in cases where the expected numbers are a small fraction of the total sample size, such as 1/16 double recessives in the F2 generation. Because (O-E)2 is divided by E for each observation prior to summation, a small E value can greatly magnify the impact of a modest amount of random chance deviation (O-E) on the final X2 value. Thus, in cases where the null hypothesis is rejected, it is important to examine the overall experimental design to be certain that total sample size and the numbers expected in each phenotypic class were large enough to provide reliable statistics. This is particularly important in cases where the results of a relatively small experiment look like they fit the expectation reasonably well, but fail to generate a sufficiently small X2 value. Often in such cases, analysis of a larger experiment will yield a more acceptable X2 value.
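The magnifying effect of a small E value is easy to see numerically. In the sketch below, the sample sizes and the deviation of 3 individuals are hypothetical numbers chosen only for illustration; the function name contribution is likewise an illustrative choice.

```python
def contribution(deviation, expected):
    """Contribution of one phenotypic class to X2: (O-E)^2 / E."""
    return deviation ** 2 / expected


DEVIATION = 3  # hypothetical: the double recessive class is off by 3 individuals

for total in (32, 320, 3200):  # hypothetical F2 sample sizes
    expected = total / 16      # 1/16 double recessives expected
    print(total, expected, round(contribution(DEVIATION, expected), 3))
# 32 2.0 4.5
# 320 20.0 0.45
# 3200 200.0 0.045
```

With only 2 double recessives expected, being off by 3 individuals adds 4.5 to X2 from that one class alone, whereas the same absolute deviation is negligible when 200 are expected.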