Revised December 5, 1998

MCDB 2150 Lecture 39

Population Genetics I

Textbook Assignment: Chapter 24, pages 659-680, with emphasis on the topics covered by these notes (basic concepts, 659-663; heterozygosity, 665-666; inbreeding, 673-676). Also please note that most of the parts of Chapter 24 that are not included in this lecture are covered in the next lecture. Thus, you may want to read straight through the chapter.

Textbook Error: Please note that there is a serious error in Figure 24.17, which is discussed in detail in the section on inbreeding later in these notes. In brief, the authors have incorrectly blended two separate concepts. The values for F, the coefficient of inbreeding, are correctly calculated for the first and second cousin marriages. However, these values represent the probability that the offspring of the consanguineous marriages will be homozygous by descent for any one of the four alleles present at a particular locus in the parental generation at the top of the pedigree. The error occurred when the authors attempted to calculate the probability of homozygosity for a recessive allele a that was present in one copy in one member of the parental generation. In these calculations, they failed to take into account the fact that only one fourth of the progeny of a couple who are both heterozygous for a recessive allele will be homozygous recessive. Therefore, the correct values for homozygous aa offspring are 1/64 for the first cousin marriage and 1/256 for the second cousin marriage. The F values of 1/16 for a first cousin marriage and 1/64 for a second cousin marriage are correct, but these reflect the probability of the children being homozygous by descent for any one of the four original alleles, and are not applicable to the homozygous recessive state aa.

Major concepts

Overview of population genetics: Population genetics is concerned with allelic frequencies within a population and the selective forces that cause changes in those frequencies. It is a quantitative science that must ultimately be analyzed in terms of mathematical formulations. However, it is important not to let the mathematics obscure the relative simplicity of the basic concepts that are involved.

Basic concepts: Population genetics deals with three basic concepts, 1) the quantitative distribution of alleles within a population, 2) the quantitative distribution of genotypes within a population, and 3) the quantitative distribution of phenotypes within a population. In each case, the total population = 1.0, with its constituent parts expressed as decimal fractions whose sum is 1.0.

Hardy-Weinberg Equilibrium: The basic calculations that are used as the foundation of population genetics begin with two alleles at a single genetic locus that are assumed to have frequencies p and q, whose sum is 1.0.

p + q = 1.0

Genotypic frequencies: When random mating occurs within a population containing these alleles, the probabilities of obtaining the two classes of homozygous offspring are $p^2$ and $q^2$, respectively, and the probability of heterozygotes is $2pq$, since the p allele can come from either the mother or the father (with the q allele coming from the other parent). The distribution of genotypic frequencies in the progeny is therefore simply the product of the allelic frequencies of the parental generation with themselves.

$$(p + q)(p + q) = (p + q)^2 = p^2 + 2pq + q^2 = 1.0$$

Binomial relationships: Thus, the relationship between allelic frequency and genotypic frequency is the relationship between a simple binomial and that binomial squared. This relationship can be depicted visually as a unit square with each side subdivided linearly to depict the values of p and q. The homozygous genotypic frequencies then become the smaller squares depicting p x p and q x q, with the two rectangles p x q together representing the heterozygous population. This relationship is shown in Figures 24.1 and 24.2, except that the areas are not drawn to scale. A crudely scaled depiction of the relationship in Figure 24.2, in which p = 0.7 and q = 0.3, is presented below, where pp = homozygous p, qq = homozygous q, and pq = heterozygotes.

pp pp pp pp pp pp pp   pq pq pq
pp pp pp pp pp pp pp   pq pq pq
pp pp pp pp pp pp pp   pq pq pq
pp pp pp pp pp pp pp   pq pq pq
pp pp pp pp pp pp pp   pq pq pq
pp pp pp pp pp pp pp   pq pq pq
pp pp pp pp pp pp pp   pq pq pq

pq pq pq pq pq pq pq   qq qq qq
pq pq pq pq pq pq pq   qq qq qq
pq pq pq pq pq pq pq   qq qq qq

If the 100 letter pairs in this matrix represent the entire population (= 1.0), it is clear that 0.7 x 0.7 yields a value of 0.49 for pp, and 0.3 x 0.3 yields a value of 0.09 for qq. The heterozygous population, pq, is represented by two separate groups of 0.7 x 0.3 = 0.21, for a total heterozygous population of 0.42. This verifies the relationship for genotypes derived from two alternative alleles:

$$p^2 + 2pq + q^2 = 1.0$$

$$0.49 + 2 \times 0.21 + 0.09 = 1.0$$
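The unit-square arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes only two alleles at the locus (so q = 1 - p); the function name `hardy_weinberg_genotypes` is hypothetical and not taken from the textbook.

```python
def hardy_weinberg_genotypes(p: float) -> tuple[float, float, float]:
    """Return expected (f(pp), f(pq), f(qq)) under random mating for allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p  # two alleles only, so p + q = 1
    # Expand (p + q)^2 = p^2 + 2pq + q^2
    return p * p, 2 * p * q, q * q


pp, pq, qq = hardy_weinberg_genotypes(0.7)
print(f"pp = {pp:.2f}, pq = {pq:.2f}, qq = {qq:.2f}, total = {pp + pq + qq:.2f}")
# pp = 0.49, pq = 0.42, qq = 0.09, total = 1.00
```

The printed values are rounded to two places; the raw floating-point results differ from 0.49, 0.42 and 0.09 only in the last few binary digits.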

Predictions: The Hardy-Weinberg equilibrium predicts that in the absence of distorting forces (discussed below), both allelic and genotypic frequencies will remain constant in a population and that, if the equilibrium is perturbed, a new equilibrium will be reached within one generation based on the allelic frequencies of the remaining population. The conditions that must be met for the predictions of the Hardy-Weinberg equilibrium to be valid are described below:

  1. Random mating. Mating patterns must randomly reflect the entire breeding population, with no dependence on genotype or closeness of relationship (either positive or negative). The effects of inbreeding are discussed later in this lecture.
  2. No sex bias in allelic frequencies. The distribution of alleles must be the same in both sexes.
  3. All genotypes equally viable and fertile. There must not be any selective advantages or disadvantages. This is seldom true in a real population, and often must be taken into account in terms of evolutionary pressures.
  4. Mutation rate too low to alter ratios. The basic assumption is that alleles are stable through many generations and are not altered or degraded significantly by mutation. In practice this is generally not a serious problem.
  5. Closed population (no in or out migration). The "population" that is being considered must be a constant one. Introduction of new genes into the breeding pool or loss of genes from the breeding pool by migration between "populations" can distort trends.
  6. Population must be large. The population must be large enough so that there are no confounding effects due to genetic drift (random events altering allelic frequencies by pure chance) or due to "founder" effects, where a recessive allele becomes unusually common (or even fixed) in a population because too many of its members are descendants of a single individual.
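The prediction that a new equilibrium is reached within one generation can be illustrated with a small simulation. This is a sketch assuming all six conditions above hold exactly (an infinite, randomly mating population with no selection, mutation, or migration); the function name `next_generation` and the starting frequencies are hypothetical.

```python
def next_generation(f_pp: float, f_pq: float, f_qq: float) -> tuple[float, float, float]:
    """Genotype frequencies after one round of random mating at a two-allele locus."""
    if abs(f_pp + f_pq + f_qq - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1.0")
    # Allele frequency in the gamete pool: homozygotes contribute only p,
    # heterozygotes contribute p to half of their gametes
    p = f_pp + f_pq / 2
    q = 1.0 - p
    # Random union of gametes gives the binomial square
    return p * p, 2 * p * q, q * q


# Hypothetical starting population containing no heterozygotes at all
freqs = (0.5, 0.0, 0.5)
for generation in range(1, 4):
    freqs = next_generation(*freqs)
    print(generation, freqs)
```

Because p stays at 0.5, every generation prints (0.25, 0.5, 0.25): the heterozygote deficit disappears after a single round of random mating and the proportions then stay fixed.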

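Calculation of allelic frequencies: Observed frequencies of specific genotypes often serve as the starting point for calculations of allelic frequencies, p and q. For these calculations, we will use f(pp) as the frequency of homozygous p individuals, f(qq) as the frequency of homozygous q individuals, and either f(pq) or H as the frequency of heterozygotes. Since every individual carries two alleles at the locus, each homozygous p individual carries two p alleles, and each heterozygote carries one p allele,

$$p = \frac{2f(pp) + f(pq)}{2} = f(pp) + \tfrac{1}{2}f(pq).$$

Similarly,

$$q = \frac{2f(qq) + f(pq)}{2} = f(qq) + \tfrac{1}{2}f(pq).$$

When working from numbers of individuals rather than frequencies, the same logic gives p = (2 x number of pp individuals + number of pq individuals) / (total number of alleles), which is the form used in the MN example below.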

Note that some textbooks (but not ours) use P and Q to designate the frequencies of the two classes of homozygotes. In such notation, p = P + H/2, and q = Q + H/2.

Genetic Variation: Most normal populations contain substantial genetic variation, much of which appears to be neutral in its effects on fitness under current environmental conditions. An example is the MN blood type distribution, which our textbook draws upon heavily for examples. As shown in Table 24.3, the allelic frequencies of the M and N alleles can vary widely from one population to another. In the following example from last year's textbook, a random sample of 200 individuals (whose diploid genomes contain a total of 400 alleles) was found to have the following genotypic distribution: MM, 114; NN, 10; MN, 76. Using the nomenclature introduced above, allelic frequencies can be calculated as follows:

f(M) = p = (2x114 + 76)/400 = 304/400 = 0.76.

f(N) = q = (2x10 + 76)/400 = 96/400 = 0.24.

p + q = 0.76 + 0.24 = 1.0
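The same counting rule can be written as a short function that works directly from genotype counts. This is a sketch for a single two-allele locus in a diploid organism; the function name `allele_frequencies` is hypothetical.

```python
def allele_frequencies(n_pp: int, n_pq: int, n_qq: int) -> tuple[float, float]:
    """Estimate allele frequencies p and q from genotype counts at a two-allele locus."""
    n_alleles = 2 * (n_pp + n_pq + n_qq)  # diploid: two alleles per individual
    if n_alleles == 0:
        raise ValueError("need at least one individual")
    p = (2 * n_pp + n_pq) / n_alleles  # each pp carries two p alleles, each pq carries one
    q = (2 * n_qq + n_pq) / n_alleles
    return p, q


# MN blood type sample from the text: MM = 114, MN = 76, NN = 10
p_M, q_N = allele_frequencies(114, 76, 10)
print(p_M, q_N)  # 0.76 0.24
```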

Calculation of allelic and carrier frequencies for rare genetic diseases: For recessive diseases, it is usually necessary to begin with the frequency of afflicted homozygous individuals, f(qq), because heterozygous carriers are not phenotypically distinguishable. Assuming Hardy-Weinberg proportions, the square root of f(qq) is the allelic frequency q, and the "normal" allele has a frequency p = 1 - q. The frequency of heterozygous carriers can then be calculated as H = 2pq = 2q(1 - q), using the square root of f(qq) as the value for q. These calculations yield carrier frequencies (H) that are far larger than most non-geneticists would expect.

Example: As an example, phenylketonuria (PKU, page 366) occurs with an incidence of about 1/11,000 (= 0.0000909). The allelic frequency, q, is thus about 0.0095 (the square root of 0.0000909), which means that p is about 0.9905. The calculated frequency of heterozygous carriers, 2pq, is therefore 0.0188, or almost 2% of the random human population.
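The PKU calculation can be reproduced as follows. This is a sketch that, like the text, assumes the population is in Hardy-Weinberg equilibrium for the disease locus; the function name `recessive_carrier_frequency` is hypothetical.

```python
import math


def recessive_carrier_frequency(incidence: float) -> tuple[float, float, float]:
    """From the incidence of affected homozygotes f(qq), return (q, p, H).

    Assumes Hardy-Weinberg proportions, so q = sqrt(f(qq)) and H = 2pq.
    """
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must lie between 0 and 1")
    q = math.sqrt(incidence)
    p = 1.0 - q
    return q, p, 2 * p * q


q, p, carriers = recessive_carrier_frequency(1 / 11_000)  # PKU incidence from the text
print(f"q = {q:.4f}, p = {p:.4f}, carriers = {carriers:.4f} (about 1 in {1 / carriers:.0f})")
```

Without intermediate rounding this gives a carrier frequency of about 0.0189 (roughly 1 person in 53); the slightly lower 0.0188 in the text comes from rounding q to 0.0095 before computing 2pq.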

Inbreeding: The relationship between allelic frequencies and genotypic frequencies postulated by the Hardy-Weinberg principle is strongly based on the assumption of random mating among all members of the population. That assumption often has to be altered for real populations due to deviations from true random mating.

Loss of heterozygosity due to inbreeding: The text (pages 673-677) describes deviations from Hardy-Weinberg expectations that result from inbreeding, which can be defined as the occurrence of matings between individuals who share a common ancestry. The net result of inbreeding is to increase the frequency of homozygosity and decrease the frequency of heterozygosity. This is most clearly evident in species that reproduce primarily by self-fertilization, including many plants, such as the peas used in Mendel's original studies, and the small nematode, C. elegans, which is widely used in studies in developmental genetics, including studies in this department.

Self-fertilization as an extreme case of inbreeding: In self-fertilization, individuals that are homozygous will produce only homozygous progeny (with the exception of rare mutations). Individuals that are heterozygous will produce progeny that are 50% heterozygous and 50% homozygous (25% of each homozygous class). Thus, in each successive generation, the fraction of heterozygous individuals is reduced by half, with corresponding increases in both classes of homozygotes (Figure 24.16). The net result is that after a few generations, heterozygotes are rare and virtually all of the population is homozygous for one or the other of the two alleles from the original heterozygous individual. As an aside, note that this property makes it possible to isolate homozygous recessive mutations in C. elegans or in self-fertilizing plants simply by allowing them to reproduce by self-fertilization for two or more generations after mutagenesis.
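The halving of heterozygosity under self-fertilization can be followed generation by generation with a few lines of Python. This is a sketch for one two-allele locus under strict selfing with no selection; the function name `self_fertilize` is hypothetical.

```python
def self_fertilize(f_pp: float, f_pq: float, f_qq: float) -> tuple[float, float, float]:
    """One generation of strict selfing at a single two-allele locus.

    Homozygotes breed true; a heterozygote gives 1/4 pp, 1/2 pq, 1/4 qq.
    """
    return f_pp + f_pq / 4, f_pq / 2, f_qq + f_pq / 4


# Start from the progeny line of a single heterozygous individual (f(pq) = 1.0)
freqs = (0.0, 1.0, 0.0)
for generation in range(1, 6):
    freqs = self_fertilize(*freqs)
    print(f"generation {generation}: pp = {freqs[0]:.4f}, pq = {freqs[1]:.4f}, qq = {freqs[2]:.4f}")
```

After five generations the heterozygotes have fallen to 1/32 of the population, with the remainder split equally between the two homozygous classes.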

Outbreeding: The opposite of inbreeding is outbreeding, in which mating between close relatives does not occur. Outbreeding tends to reduce the frequency of homozygosity and increase the frequency of heterozygosity to a greater degree than predicted on the basis of truly random mating. Because close inbreeding is culturally forbidden and outbreeding is the societal norm for human reproduction, human populations do not conform exactly to Hardy-Weinberg predictions. However, the total pool of potential mates is so large that exclusion of close relatives as possible mates can generally be ignored, except when specifically analyzing the effects of inbreeding at the individual level or in small isolated populations.

Selective mating: Positive assortative mating, in which individuals with similar phenotypes mate, also tends to increase homozygosity, whereas negative assortative mating, in which individuals with different phenotypes mate ("opposites attract"), increases heterozygosity.

Inbreeding coefficient: An inbreeding coefficient (F) is used for quantitative calculation of the effects of inbreeding. F reflects the extent to which heterozygosity is reduced relative to the Hardy-Weinberg expectation derived from allelic frequencies.

$$F = 1 - \frac{H}{2pq},$$

where H is the actual observed frequency of heterozygous individuals in the population. If there are no heterozygotes, F = 1.0, whereas if the number of heterozygotes equals the Hardy-Weinberg predicted value of 2pq, then there is no effect of inbreeding and F = 0.
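The formula and its two limiting cases can be checked directly. This is a minimal sketch for a two-allele locus; the function name `inbreeding_coefficient` is hypothetical, and it simply refuses monomorphic loci, where 2pq = 0 and F is undefined.

```python
def inbreeding_coefficient(h_observed: float, p: float) -> float:
    """F = 1 - H / (2pq): shortfall of heterozygotes relative to Hardy-Weinberg."""
    q = 1.0 - p
    expected = 2 * p * q
    if expected == 0:
        raise ValueError("locus is monomorphic; 2pq is zero and F is undefined")
    return 1.0 - h_observed / expected


print(inbreeding_coefficient(0.0, 0.5))   # no heterozygotes: 1.0
print(inbreeding_coefficient(0.5, 0.5))   # H equals 2pq: 0.0
print(inbreeding_coefficient(0.25, 0.5))  # half the expected heterozygotes: 0.5
```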

Significance of inbreeding coefficient: The inbreeding coefficient (F) is the probability that the two alleles at a genetic locus in an individual are identical by descent (derived from one particular allele in their ancestry). This is referred to as autozygosity. It is generally understood to be limited to recent ancestry within a specified pedigree, because all members of a broad population ultimately share remote common ancestors; alleles that are alike but are not traced to a common ancestor within the pedigree under consideration are referred to as allozygous. Continued inbreeding drives a population toward complete homozygosity over a number of generations, whereas random mating maintains heterozygosity at a level of 2pq. The balance between the two depends on the F value, with increased homozygosity as F becomes larger and increased heterozygosity as F becomes smaller.

Consanguineous marriage: For calculations concerning the consequences of inbreeding in consanguineous marriages, the only genes that are of special interest are those that end up homozygous by descent (autozygous) because of the consanguineous marriage. As an example, we can consider the offspring of a marriage of first cousins (grandchildren of the same couple). In this situation, the parental generation can be considered to have a total of four unique alleles (2 from the male and 2 from the female) at each genetic locus. Note that each allele is considered "unique" for this purpose even if some of them are indistinguishable except for their ancestral origins. Also note that the situation becomes more complex if there is already autozygosity in the parental generation (a situation that we will not examine here). The inbreeding coefficient (F) that we are seeking to derive is the probability that the children from the marriage between first cousins will be homozygous by descent for any one of the four original alleles from their great-grandparents.

Homozygosity by descent: If we look at the four alleles one at a time, each child of the original couple will have a 1/2 probability of carrying any particular one of the four original alleles. Each grandchild will have a 1/4 probability, assuming the other parent is unrelated in each case. Gametes produced by the cousins (grandchildren of the original couple) will each have a probability of 1/8 of carrying that particular allele. Thus, the probability that a great-grandchild of the original couple born as a result of the first cousin marriage will be homozygous by descent for a particular one of the four original alleles is 1/8 x 1/8 = 1/64. However, since there are four different alleles derived from the original couple, and a child cannot be homozygous by descent for two different alleles at the same locus (so the four probabilities can simply be added), the combined probability is 4 x 1/64 = 1/16 that the children of the first cousin marriage will be homozygous by descent for one of the four alleles from the original couple. Thus, the F value for the children of first cousins is 1/16. For children of second cousins (individuals who share common great-grandparents), the F value is 1/64. For the offspring of brother-sister matings, which are common in genetic studies on laboratory organisms, the F value is 1/4.
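The step-by-step halving argument above generalizes to any degree of cousinship, and exact fractions make the results easy to compare with the values in the text. This is a sketch under the same assumptions as the text (the shared ancestral couple is not itself inbred and every other spouse is unrelated); the function name is hypothetical, and treating full siblings as "degree 0" is only a labeling convention used here, not standard terminology.

```python
from fractions import Fraction


def inbreeding_coefficient_for_cousins(degree: int) -> Fraction:
    """F for the child of two relatives who share one ancestral couple.

    degree = 1 for first cousins, 2 for second cousins, 0 for full siblings.
    """
    if degree < 0:
        raise ValueError("degree must be 0 or greater")
    generations = degree + 1                 # generations from the shared couple to each mate
    carrier = Fraction(1, 2) ** generations  # mate carries one particular ancestral allele
    gamete = carrier / 2                     # mate's gamete carries that allele
    per_allele = gamete * gamete             # child receives it from both mates
    return 4 * per_allele                    # four distinct ancestral alleles, mutually exclusive


for degree, label in [(0, "full siblings"), (1, "first cousins"), (2, "second cousins")]:
    print(label, inbreeding_coefficient_for_cousins(degree))
# full siblings 1/4
# first cousins 1/16
# second cousins 1/64
```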

Textbook error: The textbook attempts to diagram homozygosity by descent in Figure 24.17, but it contains serious mistakes that have arisen from a failed attempt to make simultaneous calculations of F values (homozygosity by descent for any one of the four original alleles) and the probability of homozygosity by descent for a single recessive allele a. In both marriages (first cousin and second cousin), the probabilities stated in the figure that the cousins are carriers of the recessive allele a are correct. However, even if both are heterozygous for a, only half of the gametes produced by each of them will carry the recessive allele, so that only 1/4 of their children will be homozygous recessive. Thus, for the first cousin marriage, the probability that a given child will be homozygous by descent for a recessive allele derived from one of the cousins' shared grandparents is only 1/64. The F value for all four of the alleles carried by the original couple is 4 times that amount = 1/16. Similarly, for the second cousin marriage, the F value is 1/64, but the probability of a child who is homozygous for the recessive allele a is only 1/256.
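The corrected aa probabilities can be computed the same way: the chance that both cousins are carriers, times the 1/4 of Aa x Aa progeny that are aa. This is a sketch assuming a single copy of a was present in one member of the shared ancestral couple and that all other spouses are unrelated; the function name `prob_affected_child` is hypothetical.

```python
from fractions import Fraction


def prob_affected_child(degree: int) -> Fraction:
    """Probability that a child of cousins (degree 1 = first, 2 = second) is aa by descent,
    when one copy of recessive allele a was carried by one member of the shared couple."""
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    generations = degree + 1
    carrier = Fraction(1, 2) ** generations  # each cousin is Aa with this probability
    both_carriers = carrier * carrier        # the two lines of descent are independent
    return both_carriers * Fraction(1, 4)    # Aa x Aa gives 1/4 aa


print(prob_affected_child(1))  # first cousins: 1/64
print(prob_affected_child(2))  # second cousins: 1/256
```

Multiplying each result by 4 (one term for each of the four ancestral alleles) recovers the F values of 1/16 and 1/64, which is exactly the distinction the figure blurs.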

Increased probability of homozygous recessive states: One of the greatest hazards from inbreeding is an increased risk of recessive genetic diseases. We have already seen that the frequency of heterozygous carriers of rare recessive diseases is much higher than the frequency of homozygous patients. For the child of a consanguineous marriage, the risk of being homozygous by descent for a recessive allele is the F value times the frequency of the disease allele, Fq. For an allele with a q value of 0.001 and a first cousin marriage with an F value of 1/16 (0.0625), the risk from inbreeding is 0.0625 x 0.001 = 0.0000625. This is substantially greater than the risk from random chance homozygosity (0.001 x 0.001 = 0.000001), which applies to the remaining fraction (1 - F) of that individual's genome that is not homozygous by descent. Thus, without going into a full mathematical development of the equation, one can conclude that the probability of homozygosity for a rare allele with a frequency q in the general population, coupled with an inbreeding coefficient of F, is:

$$f(qq) = q^2(1 - F) + Fq$$

Recessive genetic diseases: To illustrate the effect of consanguineous marriage on genetic diseases more specifically, consider a rare autosomal recessive disease allele with an allelic frequency q of 0.001. The probability of a homozygous afflicted child from random mating is

$$f(qq) = q^2 = 0.000001.$$

However, for a first cousin marriage with an F value of 1/16, the probability is

$$f(qq) = q^2(1 - F) + Fq$$

$$= 0.000001 \times \tfrac{15}{16} + 0.001 \times \tfrac{1}{16}$$

$$= 0.00000094 + 0.0000625 = 0.0000634.$$

Thus, the probability of the disease is increased about 63-fold in the first cousin marriage. There would also be a similar increased risk for other recessive diseases, with the greatest relative increase in risk occurring for those diseases with the lowest allelic frequencies.
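The formula can be evaluated for a range of allele frequencies to show why the relative increase is greatest for the rarest alleles. This is a sketch; the function name `homozygote_frequency` is hypothetical, and the three q values are illustrative choices rather than data for specific diseases.

```python
def homozygote_frequency(q: float, inbreeding_f: float = 0.0) -> float:
    """f(qq) = q^2 (1 - F) + F q for a recessive allele of frequency q."""
    return q * q * (1.0 - inbreeding_f) + inbreeding_f * q


F_FIRST_COUSINS = 1 / 16
for q in (0.01, 0.001, 0.0001):  # illustrative allele frequencies
    random_risk = homozygote_frequency(q)
    cousin_risk = homozygote_frequency(q, F_FIRST_COUSINS)
    print(f"q = {q}: random = {random_risk:.3g}, first cousins = {cousin_risk:.3g}, "
          f"ratio = {cousin_risk / random_risk:.1f}")
```

The ratio works out to (1 - F) + F/q, about 7, 63 and 626 for the three values of q, so each tenfold drop in allele frequency raises the relative risk roughly tenfold.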

Inbreeding coefficient for a population: A first approximation of the degree of inbreeding in a population can be obtained from the equation

$$F = 1 - \frac{H}{2pq}.$$

If the observed heterozygosity (H) is less than the Hardy-Weinberg value of 2pq predicted from the allelic frequencies, F is positive, indicating that inbreeding occurs within the population; the larger the shortfall of heterozygotes, the larger the value of F.
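As a worked case, the first-approximation F can be estimated directly from genotype counts, here using the MN sample from earlier in these notes. This is a sketch for a two-allele locus; the function name `estimate_population_f` is hypothetical, and the estimate ignores sampling error.

```python
def estimate_population_f(n_pp: int, n_pq: int, n_qq: int) -> float:
    """First-approximation F = 1 - H / (2pq) from genotype counts in a sample."""
    n = n_pp + n_pq + n_qq
    if n == 0:
        raise ValueError("need at least one individual")
    p = (2 * n_pp + n_pq) / (2 * n)  # allele frequency from counts
    q = 1.0 - p
    h_observed = n_pq / n            # observed heterozygote frequency
    return 1.0 - h_observed / (2 * p * q)


# MN sample from the text (MM = 114, MN = 76, NN = 10)
print(round(estimate_population_f(114, 76, 10), 3))
```

Here H = 0.38 while 2pq = 0.3648, so F comes out slightly negative (about -0.04): a few more heterozygotes than expected rather than fewer. In a sample of only 200 individuals, a deviation this small is too slight to suggest any departure from random mating.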

Extensions of the Hardy-Weinberg Law: The textbook discusses modifications that must be made to the Hardy-Weinberg law when there are more than two alleles at a given locus, and when dealing with genes carried on sex chromosomes (pages 663-665). The mathematics quickly becomes more complex, but the basic principles are similar. Because of lack of time, we will not explore the details of these variations on the basic theme of population genetics.