Text assignment: Chapter 12, Pages 364 - 369. These notes provide some additional mathematical details that the textbook assumes to be understood by its readers, particularly with regard to the binomial distribution.
Major concepts and terms
Substitute symbols: The notes for this and the next lecture require mathematical symbols that are not directly available in HTML (the language in which web pages are written). The following replacements will be used:
Nomenclature used for probability: Please note that the symbols used for probability can be almost as varied as those used to identify specific genetic loci. Our textbook and these notes use upper case P to designate probability. However, many books, including the one used last year in this course, use lower case p for probability. You should be prepared to encounter either notation.
Probability: The probability (P) that an event will occur is the number of occurrences of that event divided by the number of opportunities for it to occur.
Note that probability can be either observed (based on past experience) or expected (predicted from a theory or hypothesis).
Sum rule: The sum rule is used for mutually exclusive events: The probability that either one event or an alternative event will occur is the sum of their individual probabilities.
The sum of probabilities for a complete set of mutually exclusive events must always equal exactly 1.0.
Product rule: The product rule is used for independent events. The probability of either event occurring is independent of the other. Among the cases in which the first event has occurred, the probability that the second event will also occur is the same as the probability that the second event will occur within the entire population. Thus, the probability that both events will occur is the product of their two individual probabilities.
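The sum and product rules can be checked with exact fractions. The following is a minimal sketch, not taken from the textbook: it assumes the usual F2 genotype probabilities for a single gene (1/4, 1/2, 1/4) and, purely for illustration, a hypothetical second, independently assorting gene whose dominant phenotype (called "yellow" here) also has a probability of 3/4.

```python
from fractions import Fraction

# Assumed F2 genotype probabilities for one gene (Rr x Rr cross).
genotype = {"RR": Fraction(1, 4), "Rr": Fraction(1, 2), "rr": Fraction(1, 4)}

# A complete set of mutually exclusive events sums to exactly 1.
total = sum(genotype.values())
print(total)  # 1

# Sum rule: RR and Rr are mutually exclusive, so
# P(round) = P(RR) + P(Rr).
p_round = genotype["RR"] + genotype["Rr"]
print(p_round)  # 3/4

# Product rule: for a hypothetical second, independent gene with
# P(yellow) = 3/4, P(round and yellow) = P(round) * P(yellow).
p_yellow = Fraction(3, 4)
p_round_and_yellow = p_round * p_yellow
print(p_round_and_yellow)  # 9/16
```

Using `Fraction` rather than floating-point numbers keeps every result exact, so the check that the probabilities sum to 1 is not affected by rounding.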
Combined probabilities: The sum rule is used for combined or joint probabilities:
In this case, the sum is not 1.0 because there is also another possibility that is not included:
Conditional probability: The probability of a particular event occurring within a limited subset of possible alternatives is referred to as conditional probability, which is indicated by a vertical line ( | ) that is read as "given" or "among".
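Conditional probability is the probability that the event occurs within the conditional subset divided by the probability of the conditional subset itself. Writing A for the event and B for the subset:
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$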
For F2 hybrids, the probability that a round pea is heterozygous is
$$P(\text{heterozygous} \mid \text{round}) = \frac{1/2}{3/4} = \frac{2}{3}$$
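A similar approach is used to calculate the probability that a healthy person who has a sibling with a recessive genetic disease is a heterozygous "carrier" of that disease. Assuming that both parents are heterozygous:
$$P(\text{carrier} \mid \text{healthy}) = \frac{1/2}{3/4} = \frac{2}{3}$$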
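The conditional probability calculation can be sketched by listing the offspring of a heterozygous cross and restricting attention to the conditional subset. This is an illustrative sketch only; the allele symbols R and r and the helper name `conditional_probability` are assumptions introduced here, not notation from the textbook.

```python
from fractions import Fraction
from itertools import product

def conditional_probability(outcomes, event, condition):
    """Return P(event | condition) = P(event and condition) / P(condition)."""
    p_condition = sum(p for o, p in outcomes.items() if condition(o))
    p_both = sum(p for o, p in outcomes.items() if condition(o) and event(o))
    return p_both / p_condition

# Offspring of Rr x Rr: each parent passes on R or r with probability 1/2,
# so each of the four allele pairings has probability 1/4.
offspring = {}
for a, b in product("Rr", repeat=2):
    g = "".join(sorted(a + b))  # 'Rr' and 'rR' are the same genotype
    offspring[g] = offspring.get(g, 0) + Fraction(1, 4)

def is_round(g):
    return "R" in g       # dominant phenotype

def is_heterozygous(g):
    return g == "Rr"

p_het_given_round = conditional_probability(offspring, is_heterozygous, is_round)
print(p_het_given_round)  # 2/3
```

The same function covers the carrier question: if R is read as the normal allele and r as the disease allele, "round" corresponds to "healthy" and "heterozygous" to "carrier", provided both parents are heterozygous.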
Binomial probability: In many cases, we are interested only in the probability of obtaining a particular final result, such as two heads in three tosses of a coin, and not in the order of the individual events that yield the final result. Thus, for unordered events, we only need to know the probability that a particular event will occur a specific number of times (x) in a total number of trials (n).
Direct examination of possible combinations: Calculations of probability for specific combinations of mutually exclusive events (heads or tails when tossing coins, boys or girls in a family, dominant or recessive phenotypes in the F2 generation) are usually done with the binomial expansion (described below). However, before introducing the formal mathematical calculation, we will first analyze a relatively simple case in which it is feasible to list all of the possibilities and then count those that fall into the group of interest. Thus, to determine the probability of obtaining heads twice in three tosses of a coin, we can write out all eight possibilities
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
and then identify the three (HHT, HTH, THH) with two heads and one tail, thus obtaining a 3/8 probability of two heads in three tosses. Although direct analysis works well in this simple case, it would be far more cumbersome for 8 heads in 12 tosses, which has 4,096 possible sequences.
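The direct listing approach can be sketched in a few lines. This is a minimal illustration that assumes a fair coin, so that every ordered sequence is equally likely; the function name `probability_by_enumeration` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def probability_by_enumeration(n_tosses, n_heads):
    """List every ordered sequence of tosses and count those with n_heads heads.

    Assumes a fair coin, so all sequences are equally likely.
    """
    sequences = ["".join(s) for s in product("HT", repeat=n_tosses)]
    matching = [s for s in sequences if s.count("H") == n_heads]
    return Fraction(len(matching), len(sequences)), matching

prob, matching = probability_by_enumeration(3, 2)
print(matching)  # ['HHT', 'HTH', 'THH']
print(prob)      # 3/8
```

Calling `probability_by_enumeration(12, 8)` still works, but it has to walk through all 4,096 sequences, which illustrates why a formula is preferable for larger cases.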
Binomial probability has two components: Whether done manually or with the binomial expansion, calculations of binomial probability involve two separate elements. The first is the probability that the desired result will be obtained as an ordered event (in a particular sequence, such as HHT). In the example above, that probability is 1/2 x 1/2 x 1/2 = 1/8. The second is the number of different ordered sequences that achieve the desired result. This value is referred to in our text as the "binomial coefficient" (it is sometimes called the numerical coefficient or the C value in other books). In the example above, there are three different ways to achieve the desired result of two heads and one tail (HHT, HTH, and THH). Thus, the binomial coefficient is 3.
Combining the parts: Multiplication of the ordered probability (1/8) by the number of combinations that yield the desired result (binomial coefficient = 3) yields an unordered probability of 3/8 for two heads in three tries. (Note that this multiplication is equivalent to adding together the ordered probabilities of all of the mutually exclusive sequences that yield the desired result (HHT, HTH, THH) to obtain their combined probability by the sum rule: 1/8 + 1/8 + 1/8 = 3/8.) Both of these values, the ordered probability and the number of combinations, can be obtained from a binomial expansion, as described below.
Binomial nomenclature: Most algebra books use the letters "a" and "b" to describe the binomial expansion, $(a + b)^n$. However, in the mathematical calculation of probabilities, lower case "p" is often used to describe the expected frequency of occurrence of an event and "q" to define the alternative probability (q = 1 - p). Thus, binomial expansions are often expressed in terms of (p + q) raised to the power n, as is done in our current textbook. The notes that follow use "p" and "q", but a number of the formulas that are derived are also expressed in terms of "a" and "b" so you can relate them to basic algebra more readily (and also because some textbooks, including the one used last year, employ that notation). You should be prepared to work with formulas presented in either notation, as you are likely to encounter both in more advanced studies and in real-life situations.
Binomial expansion: The key feature of a binomial expansion is that it always deals with exactly two mutually exclusive events whose combined probabilities must equal exactly 1.0. Thus, if p is the probability of the event under study, and q is the probability of its only possible alternative, there is a strict understanding that (p + q) = 1.0 and q = (1.0 - p).
We will begin with coins. For each coin, P(H) = p and P(T) = q. For an unbiased coin p = q = 1/2, but we will continue to use p and q in order to be able to deal with unequal probabilities, such as F2 phenotypic frequencies (p = 3/4; q = 1/4), at a later time. If we flip only one coin, the result can be only H or T. Because H and T are mutually exclusive and the only possibilities, the sum rule tells us that
$$p + q = 1$$
If we flip two coins, the possible combinations are HH, HT, TH, TT. Since the two coins behave independently, the product rule tells us that the probabilities of these four combinations are pp, pq, qp, qq, respectively. Since these are mutually exclusive possibilities and the only ones, their sum must be exactly 1.0. Thus, $pp + pq + qp + qq = 1$.
This is mathematically equivalent to taking the square of both sides of the original equation
$$(p + q)^2 = 1^2 = 1$$
If we are only interested in the unordered probability distribution of the total number of heads (x) relative to the total number of tosses (n), and not in the order of the events, we can combine the two equivalent middle terms (pq and qp) and rearrange the probability expression into the more familiar binomial form
$$p^2 + 2pq + q^2 = 1$$
If we add another coin, we multiply the previous expression by (p + q) for a total of 8 terms: ppp, ppq, pqp, pqq, qpp, qpq, qqp, qqq. In the binomial form, this simplifies to
$$p^3 + 3p^2q + 3pq^2 + q^3 = 1$$
which is equivalent to $(p + q)^3$. Each term in this expansion is the product of an ordered probability and a C value (binomial coefficient). The probability of 3 heads is $p^3$ and there is only one way to obtain 3 heads. The probability of two heads and one tail in any particular order is $p^2q$ and there are three different combinations that will yield two heads and one tail.
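The grouping of ordered terms into binomial form can be sketched by generating every ordered product of p's and q's and collecting those with the same number of p's. This is an illustrative sketch only; the function name `expand` is hypothetical, and p = 3/4 is chosen to match the F2 phenotypic frequencies mentioned earlier.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def expand(n, p):
    """Group the 2**n ordered terms of (p + q)**n by the number of p factors.

    Returns {x: (coefficient, unordered probability)} with q = 1 - p.
    """
    q = 1 - p
    coefficients = Counter()
    for term in product("pq", repeat=n):  # e.g. ('p', 'p', 'q') is the term ppq
        coefficients[term.count("p")] += 1
    return {x: (c, c * p**x * q**(n - x)) for x, c in coefficients.items()}

terms = expand(3, Fraction(3, 4))
for x in sorted(terms, reverse=True):
    coefficient, probability = terms[x]
    print(x, coefficient, probability)
# 3 1 27/64
# 2 3 27/64
# 1 3 9/64
# 0 1 1/64
```

The coefficients 1, 3, 3, 1 match the expansion of the cube, and the four unordered probabilities add up to 64/64 = 1, as the sum rule requires.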
This binomial expansion can be continued indefinitely to any desired power of (p + q). For any power n (which represents the number of independently assorting events), the expansion contains n+1 terms. If we use x to designate the number of times a particular event (such as heads) occurs in each of the terms, the range of x will be from n to 0. In mathematical notation, n >= x >= 0.
If one of the mutually exclusive events occurs x times in n tries, the alternative event must occur n-x times. In order to keep the equations arising out of the binomial expansion as simple as possible, the symbol y is frequently used to replace n-x as the number of occurrences of the alternative event. (Please note that in many textbooks, s and t are used instead of x and y to denote the number of occurrences of the event of interest and its alternative, respectively).
As described above, each term of the binomial expansion consists of an ordered probability ($p^x q^y$) multiplied by a binomial coefficient (C) reflecting the number of different combinations that will generate x events in n tries. Without going into the mathematics of it, the binomial coefficient C can be calculated by using the following formula, which can be verified easily by testing it for expansions small enough to determine C manually or by use of Pascal's triangle as described on page 367 of the textbook:
$$C = \frac{n!}{x!\,y!}$$
Be sure to remember the mathematical rule that 0! = 1 when applying this formula.
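The factorial formula for the binomial coefficient can be checked against Pascal's triangle, as suggested above. The sketch below is one way to do so; the function names `binomial_coefficient` and `pascal_row` are hypothetical, and the triangle is built by the usual rule that each entry is the sum of the two entries above it.

```python
from math import factorial

def binomial_coefficient(n, x):
    """C = n! / (x! * y!) with y = n - x. Note that factorial(0) == 1."""
    y = n - x
    return factorial(n) // (factorial(x) * factorial(y))

def pascal_row(n):
    """Row n of Pascal's triangle, built by adding adjacent entries."""
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

n = 5
from_formula = [binomial_coefficient(n, x) for x in range(n + 1)]
print(from_formula)   # [1, 5, 10, 10, 5, 1]
print(pascal_row(n))  # [1, 5, 10, 10, 5, 1]
```

The first and last entries (x = 0 and x = n) depend on 0! = 1; without that rule the formula would fail for the terms in which only one kind of event occurs.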
Thus, the unordered probability that an event with a probability of p will occur x times in n tries can be expressed as follows:
$$P = \frac{n!}{x!\,y!}\,p^x q^y$$
This is a key formula that you need to commit to memory. Once you have it memorized, you can generally modify it rather easily as needed to deal with most of the probability calculations that you will encounter in this class. Be sure to remember that y = (n - x), q = (1 - p), and n! means n factorial.
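The binomial formula (coefficient multiplied by the ordered probability) can be written as a short function. This is a minimal sketch, not the textbook's own procedure; the function name `binomial_probability` is hypothetical, and the example of 3 dominant phenotypes among 4 F2 offspring (p = 3/4) is chosen here for illustration.

```python
from fractions import Fraction
from math import factorial

def binomial_probability(x, n, p):
    """Unordered probability of x occurrences in n tries.

    p is the probability of the event in a single try (pass a Fraction
    to keep the result exact); q = 1 - p and y = n - x.
    """
    y = n - x
    q = 1 - p
    coefficient = factorial(n) // (factorial(x) * factorial(y))
    return coefficient * p**x * q**y

# Illustrative case: 3 dominant phenotypes among 4 F2 offspring.
p_dominant = Fraction(3, 4)
result = binomial_probability(3, 4, p_dominant)
print(result)  # 27/64

# The probabilities for x = 0..n cover every possibility, so they sum to 1.
total = sum(binomial_probability(x, 4, p_dominant) for x in range(5))
print(total)  # 1
```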
For cases where p = q = 1/2, such as heads and tails or a first approximation of boys and girls in a family, the ordered probability expression $p^x q^y$ simplifies (because x + y = n) to
$$\left(\frac{1}{2}\right)^n$$
This in turn simplifies the unordered probability that an event with a probability of 1/2 per trial will occur x times in n trials to:
$$P = \frac{n!}{x!\,y!}\left(\frac{1}{2}\right)^n$$
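For the special case p = q = 1/2, the calculation reduces to the binomial coefficient divided by 2 raised to the power n. A brief sketch follows; the function name `binomial_probability_half` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def binomial_probability_half(x, n):
    """Unordered probability of x occurrences in n trials when p = q = 1/2."""
    y = n - x
    return Fraction(factorial(n), factorial(x) * factorial(y) * 2**n)

two_in_three = binomial_probability_half(2, 3)
eight_in_twelve = binomial_probability_half(8, 12)
print(two_in_three)     # 3/8
print(eight_in_twelve)  # 495/4096
```

The first result reproduces the 3/8 obtained by listing the eight possible sequences of three tosses. The second gives the answer for 8 heads in 12 tosses without listing any sequences.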
Forked line method: The forked line approach to the calculation of probability has already been illustrated in the notes for previous lectures. It is simply a binomial expansion laid out diagrammatically from left to right. Each branch is equivalent to multiplying by another binomial. The example in the lecture 22 notes used phenotypes with a 3/4 + 1/4 probability, rather than coin tosses with 1/2 + 1/2 probability.
Mixed probabilities: It is also possible to mix probabilities, as in the cross of AaBbCc x AabbCc, where the phenotypic ratios for genes A and C are 3/4 to 1/4 and the ratio for gene B is 1/2 to 1/2. In such a case, the appropriate multiplications are done at each fork. Thus, a phenotype of A dominant, B dominant, C recessive would have a probability of 3/4 x 1/2 x 1/4 = 3/32.
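The forked line calculation with mixed probabilities can be sketched as a product over one fork per gene. This is an illustrative sketch: the phenotype labels and the function name `forked_line` are assumptions introduced here, while the branch probabilities are the ones given above for the cross AaBbCc x AabbCc.

```python
from fractions import Fraction
from itertools import product

# One fork per gene; each fork lists its alternatives and their probabilities.
forks = [
    {"A dominant": Fraction(3, 4), "a recessive": Fraction(1, 4)},
    {"B dominant": Fraction(1, 2), "b recessive": Fraction(1, 2)},
    {"C dominant": Fraction(3, 4), "c recessive": Fraction(1, 4)},
]

def forked_line(forks):
    """Follow every path through the forks, multiplying probabilities along it."""
    results = {}
    for path in product(*(fork.items() for fork in forks)):
        labels = tuple(label for label, _ in path)
        probability = Fraction(1)
        for _, p in path:
            probability *= p
        results[labels] = probability
    return results

phenotypes = forked_line(forks)
target = phenotypes[("A dominant", "B dominant", "c recessive")]
print(target)                    # 3/32
print(len(phenotypes))           # 8
print(sum(phenotypes.values()))  # 1
```

Each additional independently assorting gene adds one more dictionary to `forks`, which mirrors adding one more set of branches to the diagram.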
For genetic combinations of moderate complexity, the forked line approach is often the simplest way to arrive at a result. However, an additional fork is needed for each independently assorting event. Thus, for complex calculations, such as probability of 6 girls in a family of ten children, it becomes more desirable to use the binomial formula.
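$$P(6 \mid 10) = \frac{10!}{6!\,4!}\left(\frac{1}{2}\right)^{10} = \frac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 1 \times 1024} = \frac{210}{1024} \approx 0.205$$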
The forked line approach can also be used for multinomial expansions, such as genotypic frequencies, where each branch point involves 3 alternatives whose overall probability equals 1.0 (AA = 1/4, Aa = 1/2, aa = 1/4). In the cross used in the example given above, the probability of being homozygous for all three recessive alleles would be 1/4 x 1/2 x 1/4 = 1/32.
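The genotypic branch probabilities can themselves be derived from the parental genotypes rather than taken as given. The sketch below assumes each parent passes on either of its two alleles with equal probability; the function name `genotype_frequencies` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def genotype_frequencies(parent1, parent2):
    """Offspring genotype frequencies for one gene, e.g. parents 'Aa' and 'Aa'.

    Assumes each parent transmits either of its two alleles with probability 1/2.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}

# The cross AaBbCc x AabbCc, handled one gene at a time.
cross = [("Aa", "Aa"), ("Bb", "bb"), ("Cc", "Cc")]
per_gene = [genotype_frequencies(p1, p2) for p1, p2 in cross]
print(per_gene[0])  # {'AA': Fraction(1, 4), 'Aa': Fraction(1, 2), 'aa': Fraction(1, 4)}
print(per_gene[1])  # {'Bb': Fraction(1, 2), 'bb': Fraction(1, 2)}

# Multiply along the branch aa, bb, cc.
p_triple_recessive = per_gene[0]["aa"] * per_gene[1]["bb"] * per_gene[2]["cc"]
print(p_triple_recessive)  # 1/32
```

Note that the fork for gene B has only two branches in this cross (Bb and bb), because the bb parent can contribute only the recessive allele.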