Textbook Assignment: Chapter 19, Pages 581 - 591 (to start of section 19.6).
Major concepts
Overview of population genetics: Population genetics is concerned with allelic frequencies within a population and the selective forces that cause changes in those frequencies. It is a quantitative science that must ultimately be analyzed in terms of mathematical formulations. However, it is important not to let the mathematics obscure the relative simplicity of the basic concepts that are involved.
Basic concepts: Population genetics deals with three basic concepts: 1) the quantitative distribution of alleles within a population, 2) the quantitative distribution of genotypes within a population, and 3) the quantitative distribution of phenotypes within a population. In each case, the total population = 1.0, with its constituent parts expressed as decimal fractions whose sum is 1.0.
Hardy-Weinberg Equilibrium: The basic calculations that are used as the foundation of population genetics begin with two alleles at a single genetic locus that are assumed to have frequencies p and q, whose sum is 1.0.
Genotypic frequencies: When random mating occurs within a population containing those alleles, the probabilities of obtaining homozygous offspring are $p^2$ and $q^2$, respectively, and the probability of heterozygotes is 2pq, since either allele can come from either parent. The distribution of genotypic frequencies in the progeny is simply the product of the allelic frequencies for the parental generation.
Binomial relationships: Thus, the relationship between allelic frequency and genotypic frequency is the relationship between a simple binomial and that binomial squared. This relationship can be depicted visually as a unit square with each of the sides subdivided linearly to depict the values of p and q. The homozygous genotypic frequencies then become the smaller squares depicting p x p and q x q, with the two rectangles p x q together representing the heterozygous population. This relationship is shown in figure 19.1.
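A simplified representation of the relationship in figure 19.1, in which p = 0.7 and q = 0.3, is presented below (pp = homozygous p, qq = homozygous q, and pq = heterozygous).
pp pp pp pp pp pp pp pq pq pq
pp pp pp pp pp pp pp pq pq pq
pp pp pp pp pp pp pp pq pq pq
pp pp pp pp pp pp pp pq pq pq
pp pp pp pp pp pp pp pq pq pq
pp pp pp pp pp pp pp pq pq pq
pp pp pp pp pp pp pp pq pq pq
pq pq pq pq pq pq pq qq qq qq
pq pq pq pq pq pq pq qq qq qq
pq pq pq pq pq pq pq qq qq qq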
If the 100 letter pairs in this matrix reflect the entire population = 1.0, it is clear that 0.7 x 0.7 yields a value of 0.49 for pp, and 0.3 x 0.3 yields a value of 0.09 for qq. The heterozygous population, pq, is represented by two separate groups of 0.7 x 0.3 = 0.21, for a total heterozygous population of 0.42. This verifies the relationship for genotypes derived from two alternative alleles:
$$(p + q)^2 = p^2 + 2pq + q^2 = 1$$
Predictions: The Hardy-Weinberg equilibrium predicts that in the absence of distorting forces (discussed later in this lecture and continued into the next lecture), allelic and genotypic frequencies will both remain constant in a closed population. In addition, if that equilibrium is perturbed, a new equilibrium will be reached within one generation based on the allelic frequencies of the remaining population.
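The one-generation return to equilibrium can be checked numerically. The following is a minimal Python sketch, not taken from the textbook; the function names and the starting genotype frequencies are illustrative assumptions, and the model assumes random mating with no selection, mutation, or migration.

```python
def allele_frequencies(f_pp, f_pq, f_qq):
    """Return (p, q) from genotype frequencies that sum to 1.0."""
    p = f_pp + f_pq / 2  # each heterozygote carries one p allele
    q = f_qq + f_pq / 2  # and one q allele
    return p, q


def random_mating(p, q):
    """Return genotype frequencies (pp, pq, qq) after one round of random mating."""
    return p * p, 2 * p * q, q * q


# Hypothetical starting population that is NOT in equilibrium:
# only the two homozygous classes are present.
start = (0.7, 0.0, 0.3)
p, q = allele_frequencies(*start)

gen1 = random_mating(p, q)                        # first generation of random mating
gen2 = random_mating(*allele_frequencies(*gen1))  # second generation

print("allele frequencies:", p, q)
print("generation 1:", gen1)
print("generation 2:", gen2)
```

Under these assumptions the first generation already shows the 0.49 : 0.42 : 0.09 distribution, and the second generation is unchanged.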
Calculation of allelic frequencies: Observed frequencies of specific genotypes often serve as the starting point for calculations of allelic frequencies, p and q. For these calculations, we will use f(pp) as the frequency of homozygous p individuals, f(qq) as the frequency of homozygous q, and either f(pq) or H as the frequency of heterozygotes. Since each homozygous p individual carries two p alleles and each heterozygote carries one p allele,
$$p = f(pp) + \tfrac{1}{2} f(pq) = f(pp) + \tfrac{1}{2} H$$
Similarly,
$$q = f(qq) + \tfrac{1}{2} f(pq) = f(qq) + \tfrac{1}{2} H$$
Note that some textbooks (but not ours) use P and Q to designate the frequencies of the two classes of homozygotes. In such notation, p = (2P + H)/2 = P + H/2, and q = (2Q + H)/2 = Q + H/2. Also, please note that in our current textbook, these calculations were done in boxed example 19.1, without a separate description of the mathematical equations that are involved.
Genetic Variation: Most normal populations contain substantial genetic variation, much of which appears to be neutral in its effects on fitness under current environmental conditions. An example is the MN blood type distribution (boxed example 19.1). The allelic frequencies of the M and N alleles can vary widely from one population to another. In the boxed example, a population of 1,747 individuals (whose diploid genomes contain a total of 3,494 alleles) was found to have the following genotypic distribution: MM, 1,409; NN, 28; MN, 310. Using the nomenclature introduced above, allelic frequencies can be calculated as follows:
$$p = f(M) = \frac{2(1409) + 310}{3494} = \frac{3128}{3494} \approx 0.895$$
$$q = f(N) = \frac{2(28) + 310}{3494} = \frac{366}{3494} \approx 0.105$$
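The allele-counting arithmetic for the MN example can be sketched in Python as follows. The function name is an illustrative assumption; the genotype counts are the ones quoted above from boxed example 19.1.

```python
def allele_frequencies_from_counts(n_mm, n_mn, n_nn):
    """Return (p, q) for alleles M and N from observed genotype counts."""
    total_alleles = 2 * (n_mm + n_mn + n_nn)  # each diploid individual carries two alleles
    p = (2 * n_mm + n_mn) / total_alleles     # M alleles: two per MM, one per MN
    q = (2 * n_nn + n_mn) / total_alleles     # N alleles: two per NN, one per MN
    return p, q


p, q = allele_frequencies_from_counts(n_mm=1409, n_mn=310, n_nn=28)
print(f"p = f(M) = {p:.4f}")
print(f"q = f(N) = {q:.4f}")
```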
Calculation of allelic and carrier frequencies for rare genetic diseases: For recessive diseases, it is usually necessary to begin with the frequency of afflicted homozygous individuals, f(qq). The square root of f(qq) is the allelic frequency q. The "normal" allele has a frequency p = (1-q). Heterozygous carriers can be calculated as H = 2pq = 2q(1-q), using the square root of f(qq) as the value for q. These calculations yield carrier frequencies (H) that are far larger than most non-geneticists would expect.
Example: As an example, phenylketonuria (PKU, page 366) occurs with an incidence of about 1/11,000 (= 0.0000909). The allelic frequency, q, is thus about 0.0095, which means that p is about 0.9905. The calculated frequency of heterozygous carriers, 2pq, is therefore about 0.019, or almost 2% of the general human population.
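The PKU calculation can be reproduced with a short Python sketch. The function name is an assumption made for illustration, and the calculation assumes that the population is in Hardy-Weinberg equilibrium, so that the incidence of affected individuals equals $q^2$.

```python
import math


def carrier_frequency(incidence):
    """Return (q, p, H) for a recessive disease with the given incidence f(qq)."""
    q = math.sqrt(incidence)  # f(qq) = q^2 under Hardy-Weinberg equilibrium
    p = 1 - q
    h = 2 * p * q             # frequency of heterozygous carriers
    return q, p, h


incidence = 1 / 11000
q, p, h = carrier_frequency(incidence)
print(f"q = {q:.4f}, p = {p:.4f}, carriers = {h:.4f}")
print(f"carriers per affected individual: {h / incidence:.0f}")
```

The last line makes the point of the paragraph above concrete: for a rare allele, carriers outnumber affected individuals by a factor of roughly 2/q.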
Hardy-Weinberg assumptions: The conditions that must be met for the predictions of the Hardy-Weinberg equilibrium to be valid are described below:
Effects of multiple alleles: Hardy-Weinberg calculations can be done with three or more alleles. However, the mathematical manipulations rapidly become more complex as binomials are replaced with trinomials or higher orders. In addition, it is necessary to have a precise understanding of the dominance (and codominance) relationships among the alleles when three or more alternatives are possible. Boxed example 19.2 in the textbook examines such relationships for the ABO blood group locus in which types A and B are codominant and both are dominant over type O. You are encouraged to read this material as an example of how to do such calculations. However, you will not be required to be able to do calculations involving more than two alleles at any given locus.
Sex linkage: Unlike autosomal alleles, alleles carried on the homogametic sex chromosome (X or Z) do not reach equilibrium within one generation after a perturbation (figure 19.2). The total number of alleles in the population is immediately in equilibrium, but several generations are required to achieve equilibrium between males and females because of the manner in which the alleles are passed back and forth between the two sexes, with the male pattern always reflecting the pattern in the previous generation of females in XX/XY systems.
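The gradual approach to equilibrium for an X-linked allele can be illustrated with a small Python sketch. It is not taken from the textbook: the function names and the starting frequencies are assumptions, and the model assumes an XX/XY system with equal numbers of males and females, random mating, and no selection. Sons receive their single X from their mothers, so the male frequency equals the female frequency of the previous generation; daughters receive one X from each parent, so the female frequency is the average of the two parental frequencies.

```python
def x_linked_generations(female_freq, male_freq, generations):
    """Return a list of (female_freq, male_freq) pairs, one per generation."""
    history = [(female_freq, male_freq)]
    for _ in range(generations):
        # Daughters: average of both parents. Sons: copy of the mothers.
        female_freq, male_freq = (female_freq + male_freq) / 2, female_freq
        history.append((female_freq, male_freq))
    return history


def overall_frequency(female_freq, male_freq):
    """Population-wide frequency: females carry two X chromosomes, males one."""
    return (2 * female_freq + male_freq) / 3


# Hypothetical perturbed start: allele fixed in females, absent in males.
history = x_linked_generations(female_freq=1.0, male_freq=0.0, generations=10)
for generation, (f, m) in enumerate(history):
    print(f"gen {generation}: females={f:.4f} males={m:.4f} "
          f"overall={overall_frequency(f, m):.4f}")
```

In this sketch the overall frequency stays at 2/3 in every generation, while the difference between the sexes is halved and reverses sign each generation.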
Effects of mutation: Mutation rates are typically low enough so that they do not significantly affect allelic ratios in a large population on a relatively short-term basis. In a small population, a single mutation can be significant, particularly if it results in positive selection. In addition, negative selection against deleterious mutations normally keeps them at a relatively low equilibrium level, as will be discussed in the next lecture. Also, over longer periods of time, imbalances between forward and backward mutation rates or between mutation rate and selection can lead to significant alterations of allelic frequencies, even in large populations.
Effects of migration: Migration in or out of a population only has a significant effect when the population entering or leaving has an allelic distribution that is different from that of the population under study. As an example, consider a population of Drosophila with equal numbers of wingless and wild type alleles. The wingless phenotype (qq) would constitute 25% of the total population. If the entire population were kept under conditions where some of the flies could escape by flying out of their container, the frequency of the wingless allele in the population would increase over a period of time because homozygous wingless flies could not escape, whereas flies with at least one wild-type allele could escape.
Effects of non-random mating: The Hardy-Weinberg equilibrium assumes a completely random pattern of pairing of all of the alleles in the population during each round of reproduction. Any lack of randomness in mating patterns can alter the outcome. There are two basic patterns that can disrupt the equilibrium: assortative mating, in which mating is more likely to occur between genetically similar individuals than among dissimilar individuals, and disassortative mating, in which mating is more likely to be with a genetically dissimilar individual than with a genetically similar individual. A number of examples of each pattern can be visualized easily. Assortative mating occurs whenever there is any degree of inbreeding, including self-fertilization, or in human terms, a tendency of individuals to marry others from the same race, rather than from other races. Disassortative mating occurs in many plants that have barriers to self-fertilization, and also in human populations due to societal prohibitions against marrying close relatives.
Loss of heterozygosity due to inbreeding: Inbreeding is defined as the occurrence of matings between individuals who share a common ancestry. As illustrated earlier in the semester (figure 12.14), individuals who share a common ancestor have an increased chance of being homozygous for an allele carried by that ancestor. This effect becomes particularly evident in the increased probability of recessive disease resulting from inbreeding (figure 19.6). Inbreeding always increases homozygosity and reduces heterozygosity. This is most clearly evident in species that reproduce primarily by self-fertilization, including many plants, such as the peas used in Mendel's original studies, and the small nematode, C. elegans, which is widely used in studies on developmental genetics, including research being conducted in this department.
Self fertilization as an extreme case of inbreeding: In self fertilization, individuals that are homozygous will produce only homozygous progeny (with the exception of rare mutations). Individuals that are heterozygous will produce progeny that are 50% heterozygous and 50% homozygous. Thus, in each successive generation, the fraction of heterozygous individuals is reduced by half, with corresponding increases in both classes of homozygotes (figure 19.3). The net result is that after a few generations, heterozygotes are rare and virtually all of the population is homozygous for one or the other of the two alleles from the original heterozygous individual. As an aside, note that this property makes it possible to isolate homozygous recessive mutations in C. elegans or in self-fertilizing plants simply by allowing them to reproduce by self-fertilization for two or more generations after mutagenesis.
Inbreeding does not change allelic frequency: In the example in figure 19.3, the F1 generation was obtained by crossing true-breeding AA and aa individuals. Thus p = q = 0.5. The F1 population was not in a Hardy-Weinberg equilibrium, in that it was all heterozygous, rather than in a 1:2:1 ratio of AA:Aa:aa. After one generation of self-fertilization (which in this case was equivalent to random mating since only one genotype was present), the F2 population was temporarily in Hardy-Weinberg equilibrium. However, from there onward, self fertilization caused the two types of alleles to become increasingly segregated into homozygous populations, with the remaining fraction of heterozygotes reduced by half in each successive round of self-fertilization. The allelic frequency remained unchanged in this closed population, but it moved steadily further and further away from Hardy-Weinberg equilibrium. In time, the remaining fraction of heterozygous individuals would asymptotically approach zero.
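The halving of heterozygosity under self-fertilization, with no change in allelic frequency, can be traced with a short Python sketch. The function names are illustrative assumptions; the starting point is the all-heterozygous F1 population described above, and mutation and selection are ignored.

```python
from fractions import Fraction


def self_fertilize(f_AA, f_Aa, f_aa):
    """Apply one round of self-fertilization to genotype frequencies."""
    # Homozygotes breed true; each heterozygote yields 1/4 AA, 1/2 Aa, 1/4 aa.
    return f_AA + f_Aa / 4, f_Aa / 2, f_aa + f_Aa / 4


def allele_frequency_A(f_AA, f_Aa, f_aa):
    """Frequency of allele A: all of the AA class plus half of the Aa class."""
    return f_AA + f_Aa / 2


genotypes = (Fraction(0), Fraction(1), Fraction(0))  # F1: all Aa
history = [genotypes]
for _ in range(5):
    genotypes = self_fertilize(*genotypes)
    history.append(genotypes)

for generation, (f_AA, f_Aa, f_aa) in enumerate(history, start=1):
    p = allele_frequency_A(f_AA, f_Aa, f_aa)
    print(f"F{generation}: AA={f_AA} Aa={f_Aa} aa={f_aa} p={p}")
```

Exact fractions are used so that the 1/2, 1/4, 1/8 progression of the heterozygous class is visible without rounding.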
Outbreeding: The opposite of inbreeding is outbreeding (disassortative mating), in which mating between close relatives does not occur. Outbreeding tends to reduce the frequency of homozygosity and increase the frequency of heterozygosity to a greater degree than predicted on the basis of truly random mating. Because close inbreeding is culturally forbidden and outbreeding is the societal norm for human reproduction, human populations do not conform exactly to Hardy-Weinberg predictions. However, the total pool of potential mates is so large that exclusion of close relatives as possible mates can generally be ignored except when specifically analyzing the effects of inbreeding at the individual level or in small isolated populations. Another example cited in the textbook occurs in plant species that have a block against self-pollination (which may extend to other individuals of the same strain). This also results in a level of heterozygosity that is greater than predicted from the Hardy-Weinberg equilibrium.
Selective mating: Positive assortative mating, in which individuals with similar phenotypes mate, also tends to increase homozygosity, whereas negative assortative mating, in which individuals with different phenotypes mate (opposites attract), increases heterozygosity. Examples of positive assortative mating include maintenance of "pure-bred" lines of pets and other domestic animals, and a tendency for many human populations to select mates primarily from within their own ethnic or racial groups.
Inbreeding coefficient: An inbreeding coefficient (F) is used for quantitative calculation of the effects of inbreeding. F reflects the extent to which heterozygosity is reduced relative to the Hardy-Weinberg expectation derived from allelic frequencies.
$$F = \frac{2pq - H}{2pq} = 1 - \frac{H}{2pq}$$
where H is the actual observed frequency of heterozygous individuals in the population. If there are no heterozygotes, F = 1.0, whereas if the number of heterozygotes equals the Hardy-Weinberg predicted value of 2pq, then there is no effect of inbreeding and F = 0.
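As a minimal sketch, F can be computed from an observed heterozygote frequency and an allelic frequency using the relation F = 1 - H/(2pq), which matches the two limiting cases just described. The function name and the example values are assumptions chosen for illustration.

```python
def inbreeding_coefficient(h_observed, p):
    """Return F = 1 - H / (2pq) for observed heterozygosity H and allele frequency p."""
    q = 1 - p
    expected = 2 * p * q  # Hardy-Weinberg expectation for heterozygotes
    return 1 - h_observed / expected


# Hypothetical populations, all with p = q = 0.5 (expected H = 0.5):
print(inbreeding_coefficient(0.50, 0.5))  # heterozygotes at the expected level
print(inbreeding_coefficient(0.25, 0.5))  # half of the expected heterozygotes
print(inbreeding_coefficient(0.00, 0.5))  # no heterozygotes at all
```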
Two different ways to calculate F: Two very different methods can be used to calculate the value of the inbreeding coefficient F. They both arrive at the same number, but the methods that are used are so different that careful examination is needed to realize that they are simply two different ways of achieving the same goal. Our current textbook uses a procedure based on summation of pathway diagrams (figure 19.4 and equation 19.11) to arrive at a final value for F. Although this method is effective, it involves a procedure that is not intuitively self-explanatory and that must be basically memorized. These notes will therefore begin with a more direct calculation, and then relate that calculation to the use of pathway diagrams.
Significance of inbreeding coefficient: The inbreeding coefficient (F) is the probability that the two alleles at a genetic locus in an individual are identical by descent (derived from one particular allele in their ancestry). This is referred to as autozygosity and is generally understood to be limited to recent ancestry, since broad populations also share common alleles (referred to as allozygosity). Inbreeding causes a population to become completely homozygous over a number of generations, whereas random mating maintains heterozygosity at a level of 2pq. The balance between the two depends on the F value, with increased homozygosity as F becomes larger and increased heterozygosity as F becomes smaller.
Consanguineous marriage: For calculations concerning the consequences of inbreeding in consanguineous marriages, the only genes that are of special interest are those that end up homozygous by descent (autozygous) because of the consanguineous marriage. As an example, we can consider the offspring of a marriage of first cousins (grandchildren of the same couple). In this situation, the parental generation can be considered to have a total of four unique alleles (2 from the male and 2 from the female) at each genetic locus. Note that each allele is considered "unique" for this purpose even if some of them are indistinguishable except for their ancestral origins. Also note that the situation becomes more complex if there is already autozygosity in the parental generation (a situation that we will not examine here). The inbreeding coefficient (F) that we are seeking to derive is the probability that the children from the marriage between first cousins will be homozygous by descent for any one of the four original alleles from their great-grandparents.
Homozygosity by descent: If we look at the four alleles one at a time, each child of the original couple will have a 1/2 probability of carrying any one of the four original alleles. Each grandchild will have a 1/4 probability, assuming the other parent is unrelated in each case. Gametes produced by the cousins (grandchildren of the original couple) will each have a probability of 1/8 of carrying any one of the alleles from the original couple. Thus, the probability that a great-grandchild of the original couple born as a result of the first cousin marriage will be homozygous by descent for any one of the four original alleles is 1/8 x 1/8 = 1/64. However, since there are four different alleles derived from the original couple, the combined probability is 4 x 1/64 = 1/16 that the children of the first cousin marriage will be homozygous by descent for one of the four alleles from the original couple. Thus, the F value for the children of first cousins is 1/16. For children of second cousins (individuals who share common great-grandparents), the F value is 1/64. For the offspring of brother-sister matings, which are common in genetic studies on laboratory organisms, the F value is 1/4.
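The step-by-step reasoning above can be written as a short Python sketch. The function name and its parameter are assumptions introduced for illustration: generations_from_couple is the number of generations separating the ancestral couple from each of the two mates (1 for siblings, 2 for first cousins, 3 for second cousins). The sketch assumes that the ancestral couple is not itself inbred and that all other parents in the pedigree are unrelated.

```python
from fractions import Fraction


def f_by_descent(generations_from_couple):
    """Return F for a child of two mates descended equally from one ancestral couple."""
    half = Fraction(1, 2)
    # Probability that a mate carries one particular ancestral allele.
    carries = half ** generations_from_couple
    # Probability that a gamete from that mate carries it (one more halving).
    gamete = carries * half
    # Both gametes must carry the same allele; there are four ancestral alleles.
    return 4 * gamete * gamete


for label, generations in [("siblings", 1), ("first cousins", 2), ("second cousins", 3)]:
    print(f"{label}: F = {f_by_descent(generations)}")
```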
Use of path diagrams to calculate inbreeding coefficient F: The use of path diagrams is discussed on pages 588-589 and illustrated in figure 19.4 for a first-cousin marriage. To do a calculation for an individual, start with one parent of that individual, construct a path back to an ancestor shared in common by both parents, and continue the path to the second parent. Count the number of individuals on that path. For the first cousin marriage illustrated in figure 19.4, the number is 5 (XVTWY in figure 19.4). Then construct a similar path for each other ancestor that is shared in common, each time going from one parent to the ancestor, and then to the other parent. In figure 19.4, the second possible pathway is XVUWY, again resulting in a path length of 5. The next step is to separately raise 1/2 to a power equal to the number of individuals in each of the paths, and then add all of the individual results together to obtain the F value. For the example in figure 19.4, the F value is
$$F = \left(\tfrac{1}{2}\right)^5 + \left(\tfrac{1}{2}\right)^5 = \tfrac{1}{32} + \tfrac{1}{32} = \tfrac{1}{16}$$
In this exercise, pathway XVTWY yields a value of 1/32, which is the probability that the subject Z is homozygous by descent for either of the two alleles at any particular locus carried by great-grandparent T. When added to the same probability for alleles from great-grandparent U, the final result is 1/16, which is in full agreement with the direct step-by-step calculations for a first cousin marriage discussed above. You may use either method as long as you understand what you are doing and do it correctly. (I personally prefer the step-by-step method because the genetic relationships that are involved at each step are more immediately obvious.)
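The path-counting procedure can also be sketched in Python. This is an illustration rather than the textbook's own formulation: the function name is an assumption, each path is written as a string with one letter per individual, and the common ancestors are assumed not to be inbred themselves. The sibling path labels (P, Q, A, B) are hypothetical.

```python
from fractions import Fraction


def f_from_paths(paths):
    """Return F as the sum of (1/2)^n over all paths, n = individuals on the path."""
    half = Fraction(1, 2)
    return sum((half ** len(path) for path in paths), Fraction(0))


# First cousin marriage, path labels as used for figure 19.4.
first_cousin_paths = ["XVTWY", "XVUWY"]
print("first cousins:", f_from_paths(first_cousin_paths))

# Brother-sister mating: parents P and Q share both of their parents, A and B.
sibling_paths = ["PAQ", "PBQ"]
print("siblings:", f_from_paths(sibling_paths))
```

Both results agree with the values obtained by the step-by-step method, which is the point made in the paragraph above.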
Increased probability of homozygous recessive states: One of the greatest hazards from inbreeding is an increased risk of recessive genetic diseases. We have already seen that the frequency of heterozygous carriers of rare recessive diseases is much higher than the frequency of homozygous patients. For the child of a consanguineous marriage, the risk of being homozygous by descent for a recessive allele is the F value times the allelic frequency of the disease allele, Fq. For an allele with a q value of 0.001 and a first cousin marriage with an F value of 1/16 (0.0625), the risk factor from inbreeding is 0.0625 x 0.001 = 0.0000625. This is substantially greater than the risk from random chance homozygosity (0.001 x 0.001 = 0.000001) that is applicable to the remainder of that individual's heredity (1-F) that is not subject to homozygosity by descent. Thus, without going into a full mathematical development of the equation, one can conclude that the probability of homozygosity for a rare allele with a frequency q in the general population coupled with an inbreeding coefficient of F is:
$$f(qq) = Fq + (1 - F)q^2$$
Recessive genetic diseases: To illustrate the effect of consanguineous marriage on genetic diseases more specifically, consider a rare autosomal recessive disease allele with an allelic frequency q of 0.001. The probability of a homozygous afflicted child from random mating is
$$q^2 = (0.001)^2 = 0.000001$$
However, for a first cousin marriage with an F value of 1/16, the probability is
$$Fq + (1 - F)q^2 = (0.0625)(0.001) + (0.9375)(0.000001) \approx 0.0000634$$
Thus, the probability of the disease is increased about 63-fold in the first cousin marriage. There would also be a similar increased risk for other recessive diseases, with the greatest relative increase in risk occurring for those diseases with the lowest allelic frequencies.
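The comparison between random mating and a first cousin marriage can be checked with a short Python sketch. It uses the relation described above, in which the risk is the inbreeding term Fq plus the random-mating term (1 - F)q^2; the function name and the second allele frequency (0.0001) are assumptions added for illustration.

```python
def homozygous_recessive_probability(q, f=0.0):
    """Return F*q + (1 - F)*q^2, the chance of being homozygous for a rare allele."""
    return f * q + (1 - f) * q * q


F_FIRST_COUSINS = 1 / 16

for q in (0.001, 0.0001):
    random_risk = homozygous_recessive_probability(q)
    cousin_risk = homozygous_recessive_probability(q, F_FIRST_COUSINS)
    print(f"q = {q}: random = {random_risk:.3g}, first cousins = {cousin_risk:.3g}, "
          f"ratio = {cousin_risk / random_risk:.1f}")
```

The ratio grows as q shrinks, which is why the relative increase in risk is greatest for the rarest alleles.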
Inbreeding coefficient for a population: A first approximation of the degree of inbreeding in a population can be obtained from the equation
$$F = \frac{2pq - H}{2pq}$$
If the amount of observed heterozygosity (H) is less than the Hardy-Weinberg value of 2pq predicted from allelic frequencies, this indicates that inbreeding is occurring within the population.
Appendix: Error in last year's textbook
Last year's textbook, Klug and Cummings, Concepts of Genetics, 5th Edition, attempted to demonstrate a step-by-step calculation of the inbreeding coefficient, F, based on probability of inheritance of an ancestral allele at each generation, as described in these notes. However, there was a serious error in their presentation. If anyone attempts to use that book as a supplement to this year's book, please be sure to read and understand the following paragraph from last year's notes.
Textbook error: The textbook attempts to diagram homozygosity by descent in figure 24.17, but it contains serious mistakes that have arisen from a failed attempt to make simultaneous calculations of F values (homozygosity by descent for any one of the four original alleles) and the probability of homozygosity by descent for a single recessive allele a. In both marriages (first cousin and second cousin), the probabilities stated in the figure that the cousins are carriers of the recessive allele a are correct. However, even if both are heterozygous for a, only half of the gametes produced by each of them will carry the recessive allele, such that only 1/4 of their children will be homozygous recessive. Thus, for the first cousin marriage, the probability that one of their children will be homozygous by descent for a recessive allele derived from one grandparent will be only 1/64. The F value for all four of the alleles carried by the original couple is 4 times that amount = 1/16. Similarly, for the second cousin marriage, the F value is 1/64, but the probability of a child who is homozygous for the recessive allele a is only 1/256.