Reading assignments:
Major concepts
Introduction: These notes are organized primarily in terms of the Human Genome Project. The last three sections of chapter 15 of our textbook deal with various aspects of large scale genomic analysis, including some limited discussion of human genomic analysis. Additional material on the Human Genome Project is presented on pages 744-746, and genetic testing is described on pages 746-749. However, these sections of the textbook do not fully describe all of the material that will be covered in this lecture. You should therefore view these notes and the web page referenced above as supplemental text material. Also, since these notes combine material from two different parts of the textbook, no attempt will be made to present the material in the same sequence as in the textbook. Instead, it will be organized primarily in terms of progressive stages of the Human Genome Project.
Genomic analysis: Until recently, the large size of a typical genome and the vast number of different genes that it contains has made it impossible to think seriously about gaining a full understanding of the entire sequence. However, a number of advances in molecular biology, including automated sequencing and the ability to use recombinant DNA technology to work with overlapping clones that cover a wide range of sizes, together with advances in computer technology that make it feasible to work with the amount of data that must be handled, are now making it realistically possible to obtain complete genomic sequences for a variety of species, including humans.
Human genome project: A major international project is currently seeking to sequence the entire human genome. This is being done in a coordinated set of steps in many laboratories worldwide. The overall strategy has been organized into five stages, which in practice are somewhat overlapping:
MCDB involvement in Human Genome Project: Because of the magnitude of the overall task, various parts of the Human Genome Project have been parceled out to many different laboratories. Dr. Kenneth Krauter, a member of the MCDB faculty, is one of the participants in the Human Genome Project. His laboratory has in the past participated in high resolution physical mapping and assembly of contigs (contiguous sets of overlapping cloned sequences) for human chromosome 12, and is currently working on a similar project for human chromosome 18.
Human linkage markers: Studies of genetic linkage require the availability of polymorphic loci whose alternative forms occur with a sufficient frequency so that a substantial portion of the population is heterozygous. Until quite recently, it has been difficult to find suitable markers for such studies in humans. The total number of identified human protein-coding genes is still rather small (although it is now growing rapidly because of recent technological advances).
Identification of linkage groups: The phenomenon of sex linkage made it relatively easy to identify human genes that are carried on the X-chromosome. In addition, examination of the male progeny of women who are known to be heterozygous for two different X-linked genes can provide at least a crude estimate of crossover frequency and map distance. Because humans have 22 autosomal linkage groups and because many linked genes are too far apart to be seen as linked in simple pedigrees, it was initially much more difficult to verify autosomal linkage. In addition, many of the genetic loci have been identified only in terms of relatively rare disease-causing alleles, with the vast majority of the population carrying wild-type alleles that do not differ significantly from one individual to another.
Polymorphic DNA markers: The first goal of the Human Genome Project was to generate high resolution genetic linkage maps for each human chromosome, with markers spaced no further than about 1 - 2 Mb (megabase pair = 1 million base pairs) apart. In order to accomplish this goal, it was necessary to identify additional polymorphic markers whose linkages to each other and to known protein-coding genes could be analyzed. Fortunately, a variety of polymorphic DNA markers are now available that can be used for linkage studies (described on pages 404-410 of our textbook). These include RFLPs, VNTRs (minisatellites), and STRPs (microsatellites).
Lod scores: Statistically valid confirmation of linkage in humans usually requires the combining of data from numerous pedigrees. This is done by calculating a lod score (log of odds). This calculation determines the likelihood of obtaining the observed results based on an assumption of linkage, as compared to the likelihood of obtaining the same results by pure chance (that is, with no linkage). The result is expressed as a logarithm to the base 10 to accommodate the wide range of values that are encountered. A lod score greater than 3.0 (1000:1 odds) is considered strong evidence for linkage. Details of lod score calculations are presented in appendix 1 at the end of these notes.
Completion of initial genetic linkage map: A low resolution genetic map was completed in 1992, and resolution was rapidly increased after that time, such that the first phase was considered to be successfully completed by about 1994. A timeline of progress in the Human Genome Project through 1997 can be viewed online.
Physical map: The second goal of the Human Genome Project was to construct a "physical map" of the genome. Unlike a genetic (linkage) map, which uses recombination frequency as a basis for map distance in centimorgans, the physical map is based on the actual location of genes in the chromosome, referenced either to the banding pattern of condensed mitotic chromosomes or the position of the marker relative to the entire length of DNA in the chromosome. The initial goal of the physical mapping program was to identify markers spaced about every 100,000 base pairs along all of the chromosomes. This goal, which required 30,000 markers to cover the entire 3 x 10^9 base pairs in the human genome, was completed in 1996.
Additional types of markers: Since physical mapping is based on position in the chromosome, rather than crossover frequencies, the markers that are used do not have to be polymorphic. As described below, a variety of techniques that allow specific genes to be associated with specific chromosomes have been employed, including deletion mapping, somatic cell hybrids containing only one or a few human chromosomes, and in situ hybridization. A variety of non-polymorphic DNA markers, such as sequence-tagged sites (STS) and expressed sequence tags (EST), have also been used. In addition, collections of large cloned segments of DNA (in vectors such as YACs and BACs) have been prepared and assembled into overlapping sequences called contigs, as described in the section on cloning, below. These clones and subclones derived from them, which are the starting points for the actual sequencing of the genome, have also been incorporated into the physical maps of the individual chromosomes.
Deletion mapping: Small deletions that do not remove enough of a chromosome to be lethal in heterozygotes behave much like recessive mutations. For example, if a female Drosophila carries the white-eyed mutation on one of her X-chromosomes and a deletion of the immediate region that includes the white-eyed locus on her other X-chromosome, she will have white eyes. Functionally, she is hemizygous for the chromosomal region that corresponds to the deletion. In some cases, small deletions in human chromosomes can be associated with loss of function of certain genetic loci, thus allowing those genes to be physically mapped to the location of the deletion.
Somatic cell hybrids: Mammalian cells from various species can be fused in ways that generate viable hybrid cells whose genomes are essentially the summation of the two parental genotypes. When normal human cells are fused with rapidly growing mouse or Chinese hamster cell lines, human chromosomes are preferentially lost from the hybrids, often leaving lines with only one or just a few human chromosomes. In cases where these cells can be shown to produce a specific human protein, the gene coding for that protein can be assigned to one of the human chromosomes in the hybrid cell. In many cases, it is necessary to examine several lines that each contain a few human chromosomes to establish a correlation between a specific human protein and a specific human chromosome. (Table 15.2 and boxed example 15.9). The human chromosomes in the hybrid cells can be identified by their banding patterns.
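The correlation step described above can be sketched in a few lines of Python. This is only an illustration: the panel of hybrid lines below is invented (it is not the data in Table 15.2), and the function name is hypothetical. The idea is that a gene is assigned to a chromosome only if that chromosome is present in every line that makes the human protein and absent from every line that does not.

```python
def concordant_chromosomes(hybrid_lines):
    """Return human chromosomes whose presence matches protein expression
    in every hybrid line.

    hybrid_lines: list of (set_of_human_chromosomes, protein_detected) pairs.
    """
    all_chromosomes = set()
    for chromosomes, _ in hybrid_lines:
        all_chromosomes |= set(chromosomes)

    matches = []
    for chrom in sorted(all_chromosomes):
        # Concordant means: chromosome present exactly when protein is detected.
        if all((chrom in chromosomes) == detected
               for chromosomes, detected in hybrid_lines):
            matches.append(chrom)
    return matches


# Invented example panel: each line retains a few human chromosomes.
panel = [
    ({2, 7, 11}, True),
    ({7, 15}, True),
    ({2, 11, 15}, False),
    ({7}, True),
    ({11, 21}, False),
]
print(concordant_chromosomes(panel))  # [7]
```

With too few lines, more than one chromosome may remain concordant, which is why several lines usually have to be examined.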
In situ hybridization: Fluorescence in situ hybridization was described in the chapter on chromosomes and eukaryotic genomic organization (see figure 10.22). Any sequence that has been cloned can be tagged with a fluorescent dye and hybridized to a preparation of metaphase chromosomes. This allows physical mapping to a specific segment of a specific chromosome. Newer techniques are available that allow the location of specific sequences to be localized within a YAC or BAC. To see an example of this, go to Highlights of Research Progress from the 1997 Human Genome Program Report. There is a "thumbnail" picture of FISH mapping on DNA fibers that can be clicked on to generate a larger picture. The legend claims it can achieve a resolution of 3 to 5 kilobases.
Sequence tagged sites (STS): An STS is a short randomly cloned sequence that can be used in a manner similar to an anonymous probe for an RFLP, except that no polymorphism is needed. PCR primers have been identified that uniquely amplify specific STS sequences. This allows PCR to be used as an alternative to hybridization to determine whether a specific STS is contained in a large clone, such as a YAC or BAC.
Expressed sequence tags (EST): An EST is simply a cDNA prepared from an mRNA that is expressed somewhere in the body. Localization of ESTs on the physical map is particularly valuable because each EST represents a protein coding gene that is actually expressed somewhere in the body. Large collections of ESTs have been accumulated and are now frequently used to identify the human form of genes originally identified in other species, as described in boxed example 26.1.
Relationship between genetic and physical maps: Although there is an overall correlation between genetic linkage maps and maps based on physical position on the chromosome, the relationship is not strictly linear. Thus, it is not possible to use linkage data to predict the exact positions of genes on the physical map. In addition, there are major species differences. On the average, one centimorgan in the human genome corresponds to about one million nucleotide pairs, whereas in yeast, one centimorgan corresponds to about three thousand nucleotide pairs.
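As a rough illustration of the species difference, the average figures quoted above can be turned into an order-of-magnitude conversion. The function and dictionary names are hypothetical, and because the relationship is not linear the result should be read only as an approximate scale, never as a predicted position.

```python
# Average base pairs per centimorgan, taken from the figures quoted in these notes.
AVERAGE_BP_PER_CM = {"human": 1_000_000, "yeast": 3_000}


def approx_physical_distance_bp(map_distance_cm, species):
    """Very rough physical distance implied by a genetic map distance."""
    return map_distance_cm * AVERAGE_BP_PER_CM[species]


print(approx_physical_distance_bp(2, "human"))  # 2000000
print(approx_physical_distance_bp(2, "yeast"))  # 6000
```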
Cloning the entire genome: As a prelude to actually determining the nucleotide sequence of the entire human genome, a major effort is being made to identify cloned genomic DNA sequences that collectively contain the complete sequence. Two types of approaches, designated bottom-up and top-down, are used for genomic cloning, depending on the size of the entire genome.
Bottom-up approach: In the bottom-up approach, which until recently has only been used with relatively small genomes, one starts with a series of overlapping clones, obtained by incomplete digestion, or with several different restriction endonucleases. Restriction maps (boxed example 9.1, pages 257-259) are prepared for the overlapping clones, and the overlapping clones are assembled together to generate a longer contiguous region, known as a contig. Additional clones are isolated to link the contigs into larger contigs, until an entire chromosome is represented in one large contig. This provides a physical map, which can then be converted to a complete nucleotide sequence by sequencing each of the individual clones that make up the large contig. Although this approach was initially developed for small genomes, more advanced computer analysis, together with automated sequencing techniques, is beginning to make it far more feasible for much larger genomes.
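The assembly logic can be sketched as follows. This is a simplified illustration, not the procedure used by any particular laboratory: it assumes each clone has already been placed on a common coordinate scale (in practice the overlaps are inferred from shared restriction fragments), and the clone names and coordinates are invented.

```python
def assemble_contigs(clones, min_overlap=1):
    """Group clones into contigs of mutually overlapping clones.

    clones: list of (name, start, end) tuples on a shared coordinate scale.
    Returns a list of dicts with keys "start", "end" and "clones".
    """
    contigs = []
    for name, start, end in sorted(clones, key=lambda clone: clone[1]):
        # Extend the current contig if this clone overlaps its right-hand end.
        if contigs and start <= contigs[-1]["end"] - min_overlap:
            contigs[-1]["clones"].append(name)
            contigs[-1]["end"] = max(contigs[-1]["end"], end)
        else:
            contigs.append({"start": start, "end": end, "clones": [name]})
    return contigs


# Invented clones (coordinates in kb): these form two separate contigs.
clones = [("c1", 0, 40), ("c2", 30, 75), ("c3", 70, 110),
          ("c4", 150, 190), ("c5", 180, 230)]
print(len(assemble_contigs(clones)))  # 2

# Isolating one additional clone that spans the gap links them into one contig.
linked = assemble_contigs(clones + [("c6", 100, 160)])
print(len(linked), linked[0]["start"], linked[0]["end"])  # 1 0 230
```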
Top-down approach: For larger genomes, the top-down approach has generally proven to be more effective. In this case, the first step is to construct a library of very large overlapping fragments, cloned in YACs, BACs and other vectors capable of carrying large inserts. These large clones are then assembled into contigs that cover the entire region to be analyzed. Each large clone is then cut with various restriction endonucleases and subcloned into smaller pieces, which are further restriction mapped until pieces small enough for convenient sequencing are generated.
Repetitive DNA patterns: As discussed briefly in Chapter 10, there are numerous types of highly repetitive DNA sequences scattered through the human genome. In some cases, the pattern of distribution of these sequences can be used in the process of aligning large clones into contigs (figure 26.1).
Chromosome walking: It is often desirable to determine what lies next to a genomic sequence that has been cloned (for example, to identify an adjacent regulatory sequence, or to move toward a disease gene that has been shown to be closely linked to an identified marker). This can often be done by a process called chromosome walking (figure 15.21). Large clones that overlap the known sequence are identified. Additional clones that partially overlap the first overlapping clones are then identified by hybridization. A series of partial overlaps is used to literally "walk" along the chromosome until the desired location is found.
Chromosome jumping: If a substantial distance needs to be covered, it is possible to use a technique called chromosome jumping to get there faster (figure 15.22). Very large clones are isolated by various techniques such as cutting with restriction endonucleases with 8 nucleotide cut sites. These large segments are then circularized in a manner that allows the two ends to be identified. When the location where one of the ends hybridizes is identified, it becomes possible to jump directly to the location where the other end hybridizes. The textbook illustrates a 65 kb jump in figure 15.22. However, it is possible to use the ends of a BAC to jump as much as 300 kb.
Chromosome-length contigs: At this point in the Human Genome Project, contigs have been assembled for essentially all parts of each chromosome and large scale sequencing is currently being undertaken. The existence of a rough draft version of the complete human genomic sequence has been announced, although not all of the details are yet in place. A high quality sequence is expected within 2 - 3 years.
High speed automated sequencing: As contigs are completed, the major task remaining is to generate subclones that are small enough for sequencing and then to do the actual sequencing. Highly automated sequencing procedures have greatly increased both the speed at which sequence can be read and the size of the DNA segments that can be easily sequenced. Because of the many new technologies that have been developed, the Human Genome Project is far ahead of its originally projected schedule. In addition, a number of companies have realized the potential commercial opportunities associated with a knowledge of the human genome and are proceeding to do as much sequencing and filing of patents on the sequences they obtain as they can. In particular, Celera Genomics is applying a bottom-up approach seeking to obtain as much sequence information as quickly as possible and then link it all together through the use of high powered computerized analysis. Unfortunately, they are only releasing their findings to paid clients at this time.
Portions of genome that cannot be sequenced with current technology: There are some highly repetitive parts of the human genome that are currently resistant to all known sequencing techniques. These are generally considered not to be genetically functional and are ignored during estimates of overall progress in sequencing the genome.
Matching coding sequences with functions: Recent estimates suggest that the human genome may contain between 80,000 and 120,000 protein-coding genes. Initially, many of these will be known only as ESTs. The task of trying to relate all of these apparently functional genes to specific biological functions will continue for a long time after the genomic sequence itself is completely understood. Procedures that have been used to identify and clone genes associated with several human genetic diseases are summarized in Appendix 2, below.
Prenatal diagnosis: Techniques such as chorionic villus sampling or amniocentesis (withdrawal of amniotic fluid that contains viable fetal cells) allow small amounts of fetal DNA to be obtained for prenatal diagnosis of genetic defects. Numerous types of assays are now possible, ranging from gross karyotypic analysis to detect conditions such as Down syndrome to RFLP analysis and allele-specific oligonucleotide screening, which can detect single base pair changes in coding sequences. One example presented in the text is RFLP analysis to detect sickle cell anemia. In this case, the sickle cell mutation results in loss of a restriction site (figure 13.28). Allele specific hybridization of a relatively short oligonucleotide is used to detect the most common mutation in cystic fibrosis (pages 748-749). We do not have time to analyze these procedures in detail, but the relevant portion of the textbook (pages 746-749) should be read. Additional details can be found on pages 466-469 of Klug and Cummings, Concepts of genetics, 5th Edition (Norlin reserve).
Web sites: A number of web sites have been established for sharing the data from the human genome project, which is so vast that it must be managed in large computer databases. Most of these sites are quite technical in nature. However, you may want to look at some of the following:
U. S. Department of Energy Human Genome Project. This site is closely related to the various Oak Ridge National Laboratory sites that are linked to various parts of these lecture notes.
The Science Behind the Human Genome Project. This is a general information page that attempts to explain in relatively non-technical terms the scientific principles that are involved in the Human Genome Project.
Genethon in France (click on English Version after connecting)
Online Mendelian Inheritance in Man. This is the definitive site to visit for information on specific human genetic loci and human genetic diseases.
Other species: Genomic mapping projects are also underway for a variety of other species used in research in various MCDB laboratories, including E. coli, yeast, C. elegans, Drosophila, and mice. The massive amount of data that will be gathered in these projects will greatly expand our understanding of many different aspects of molecular biology, including gene regulatory mechanisms and patterns of evolution.
APPENDIX 1: LOD SCORES
Determining linkage and map distance from human pedigrees: Because of small numbers of progeny and lack of ability to do controlled matings, alternative methods must be used to construct human linkage maps. Our textbook simply says that "more complicated" statistical methods are used. The use of lod scores (log of odds) is summarized briefly in this appendix to the notes. You should work through this procedure to the point where you understand the general principles involved. However, you will not be required to do calculations of lod scores. The example described below is based on a description of the procedure on pages 125-126 of Tamarin, Principles of Genetics, 5th Edition, W.C. Brown, 1996, on reserve in Norlin.
Initial estimate of recombination frequency: In order to provide statistical evidence that two genetic loci appear to be linked, one must demonstrate the apparent absence of independent assortment. However, the analysis must also take into account the expected degree of recombination of the two loci, based on their proposed map distance. As described below, the calculation of lod scores begins with an estimate of recombination frequency based on a preliminary examination of the available data.
Combining data from multiple pedigrees: It is usually necessary to combine data from several separate pedigrees to obtain a large enough sample size to conclude that linkage is highly likely. However, for purposes of illustration, an example can be based on a single large family, as is done below. The first step is to identify a pair of genetic loci that superficially appear to be linked, based on an extended family pedigree or several independent pedigrees. The available data are then used to obtain an initial estimate of the extent of recombination (map distance) between the two loci. In the example used here, the genes that are suspected of being linked produce phenotypes (dominant Nail-Patella syndrome, and codominant A and B blood types) that make it possible to see which alleles are present at both of the loci in each child studied. In the pedigree that this example is based on, there are eight children, of which only one exhibits apparent recombination (by having a phenotype that is inconsistent with the parental genotypes). This yields an initial estimate of 12.5 map units between the two loci. Please note that the procedures described below would be exactly the same if data from several pedigrees had been combined to show that recombination had occurred in one case out of eight possibilities that had been examined.
Ordered probability of observed births based on assumed linkage: Each recombination event generates two recombinant gametes. Since the probability of recombination is 0.125, the probability of a child receiving either of the recombinant chromosomes is 1/2 of the recombination frequency (0.0625). Similarly, the probability that recombination will not occur (based on the initial estimate) is 0.875, making the probability of a child receiving either of the non-recombinant chromosomes 0.4375. Since each birth is an independent event, the product rule applies. The ordered probability for all of the births, based on the assumed degree of linkage, is calculated by multiplying all of the individual probabilities together.
Ordered probability based on assumption of no linkage: To determine whether the proposed linkage is a better fit to the observed data than random probability, the ordered probability of the observed births is also calculated, based on the assumption that there is no linkage. Independent assortment in a dihybrid cross yields a probability of 0.25 for each of the four possible types of gametes (0.5 probability of receiving a particular allele at each locus and thus 0.25 probability for any particular combination of alleles at the two loci).
Calculation of lod score: The ordered probability of obtaining the observed births based on the assumption of linkage is then divided by the ordered probability of obtaining those births based on the assumption that there is no linkage. This provides a measurement of how much greater (if at all) the probability is with linkage than without it. Because the numbers that are obtained sometimes become very large, the results are usually reported as the logarithm to the base 10 of the ratio, commonly referred to as the "lod" (log of odds) score. In the example we have been analyzing, with seven non-recombinant children and one recombinant child, the lod score (Z) is calculated as follows:
Z = log10 [ (0.4375)^7 x (0.0625) / (0.25)^8 ] = log10 (12.57) = 1.10
Confirmation of linkage: A positive lod score indicates the odds are greater than 1:1 that there is linkage. A lod score of 3.0 (1000:1 odds) or more is considered to be strong confirmation of linkage, while lower positive values are considered suggestive of linkage. Negative values suggest that the hypothesis being tested is wrong. In the example above, a lod score of just over 1.0 suggests slightly greater than 10:1 odds that there is linkage. This is not adequate for a firm conclusion that linkage exists, since there is close to a 1 in 10 chance that the results observed could have resulted from random chance.
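For those who want to check the arithmetic, the lod calculation for this example can be sketched in Python. The function name and argument names are hypothetical; the probabilities are the ones given in the preceding paragraphs (theta/2 for each recombinant child, (1 - theta)/2 for each non-recombinant child, and 0.25 for every child if there is no linkage).

```python
import math


def lod_score(n_recombinant, n_nonrecombinant, theta):
    """Log10 of the odds of the observed births with linkage vs. no linkage."""
    # Ordered probability assuming linkage with recombination frequency theta.
    p_linked = ((theta / 2) ** n_recombinant
                * ((1 - theta) / 2) ** n_nonrecombinant)
    # Ordered probability assuming independent assortment (0.25 per child).
    p_unlinked = 0.25 ** (n_recombinant + n_nonrecombinant)
    return math.log10(p_linked / p_unlinked)


# Eight children, one apparent recombinant, initial estimate theta = 0.125.
z = lod_score(1, 7, 0.125)
print(round(z, 2))  # 1.1
```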
Most likely recombination frequency: In a more sophisticated computerized analysis, it is possible to vary the proposed recombination frequency (theta) over a wide range of values. The value of theta that yields the highest lod score is considered to be the most likely recombination frequency. Note that an assumption of complete linkage (theta = 0) will give a lod score of minus infinity if a single recombination occurs. (If recombination is given a zero probability and a recombination occurs, that zero probability will be multiplied together with the other birth probabilities to generate an overall probability of zero that the observed pattern of births could have occurred without recombination. This in turn will cause the value assuming no recombination divided by the value assuming independent assortment to be zero. The log of zero is minus infinity).
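The scan over theta can be sketched as a simple grid search. This is a minimal illustration, not the method used by any real linkage program; the function names and the grid of 1001 values between 0 and 0.5 are assumptions made for the example.

```python
import math


def lod(theta, n_rec, n_nonrec):
    """Lod score for a proposed recombination frequency theta."""
    p_linked = (theta / 2) ** n_rec * ((1 - theta) / 2) ** n_nonrec
    p_unlinked = 0.25 ** (n_rec + n_nonrec)
    if p_linked == 0:
        # The log of zero is treated as minus infinity.
        return float("-inf")
    return math.log10(p_linked / p_unlinked)


def best_theta(n_rec, n_nonrec, steps=1000):
    """Value of theta between 0 and 0.5 that gives the highest lod score."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    return max(thetas, key=lambda theta: lod(theta, n_rec, n_nonrec))


print(best_theta(1, 7))  # 0.125
print(lod(0.0, 1, 7))    # -inf
```

For one recombinant among eight children, the maximum falls at theta = 0.125, the same value as the initial estimate, and theta = 0 gives minus infinity as described above.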
APPENDIX 2: IDENTIFICATION OF HUMAN DISEASE GENES
Human disease genes: The material that follows is left over from before this course started to use the current textbook. Six different examples of identification and cloning of genes responsible for inherited human diseases are presented below, each of which employed a somewhat different experimental approach. With the exception of cystic fibrosis, these are not well covered in our current textbook, although a number of them are mentioned briefly in various parts of the book. We will not have time to deal with them in detail, but they illustrate well the various techniques that have been employed. All references to page, figure and "textbook" are to Klug and Cummings, Concepts of Genetics, 5th Edition, available at Norlin Reserve desk, unless explicitly stated otherwise.
Neurofibromatosis: Type 1 neurofibromatosis is an autosomal dominant condition associated with a wide range of nervous system defects, including benign tumors and learning disabilities. As described in the textbook (pages 464-465), a search for linkage to specific RFLPs localized the candidate gene to a region near the centromere of human chromosome 17. After the gene was localized as much as possible, chromosome walking was undertaken until a candidate gene was encountered. Its involvement in the disease was verified by sequencing studies that showed mutations in individuals afflicted with the disease. The overall process that led to the discovery of the NF1 gene is called positional cloning. The wild-type gene appears to function in intracellular signal transduction, and more specifically in down-regulating cellular reproduction.
Marfan syndrome: A rather different approach was taken to identify the gene that is defective in Marfan syndrome, an autosomal dominant condition that causes alterations in connective tissue. Particular attention was given to genes coding for proteins known to function in various types of connective tissue. A protein known as fibrillin, which is found in tissues known to be affected by Marfan syndrome was identified as a likely candidate. The gene for fibrillin had already been cloned and mapped to the long arm of human chromosome 15. RFLP studies verified a linkage between the inheritance of Marfan syndrome and markers on chromosome 15. Cloning of the fibrillin gene from individuals with Marfan syndrome then verified the substitution of a proline for arginine at position 239 in the protein. The textbook describes this as the candidate gene approach.
Huntington disease: The search for the gene responsible for Huntington disease (also known as Huntington's chorea) was described in a previous textbook as an example of the use of RFLPs (Weaver and Hedrick, Basic Genetics, 2nd Edition, pages 393-399, on reserve in Norlin Library). Huntington disease is an autosomal dominant degenerative brain disease that usually does not exhibit any obvious symptoms prior to middle age. There is then a progressive loss of motor coordination, accompanied by uncontrolled spontaneous movements, ultimately resulting in death, but only after a prolonged period of increasingly severe symptoms.
Extended pedigrees and anonymous probes: Two key elements in the search for the HD gene were the existence of a large family in Venezuela with seven generations of documented HD and a rather long (about 15 kb) anonymous probe known as G8. The pedigree of the family covered seven generations, with the disease apparently traced back to a settler of European origin. The G8 probe identifies a complex RFLP pattern in DNA cut with Hind III. Because two polymorphic Hind III cut sites are involved, a total of four different G8 haplotypes are possible. The actual polymorphisms involve a 15.0 kb fragment vs. a 17.5 kb fragment (determined by presence or absence of cut site 1, which is beyond the 5'-end of the probe) and a 4.9 kb fragment vs. fragments of 1.2 and 3.7 kb (determined by cut site 2, which is located within the region that hybridizes with the probe). Because each individual is diploid, anyone who is heterozygous will exhibit two different haplotype patterns superimposed. In the Venezuelan family, HD was strongly associated with the C haplotype (possessing both polymorphic cut sites). In other pedigrees, different G8 haplotypes may be associated with HD. Do not make the mistake of assuming that a specific G8 haplotype is associated with HD outside of the family group in which the association has been demonstrated.
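The four possible haplotypes and the superimposed pattern seen in a heterozygote can be enumerated with a short sketch. It assumes that the presence of a cut site gives the shorter fragment or fragments (15.0 kb for site 1; 1.2 and 3.7 kb for site 2), and it lists only the polymorphic fragments named above. The function names are hypothetical.

```python
from itertools import product


def hindiii_fragments_kb(site1_present, site2_present):
    """Polymorphic Hind III fragments (kb) detected by G8 for one haplotype."""
    upstream = [15.0] if site1_present else [17.5]
    internal = [1.2, 3.7] if site2_present else [4.9]
    return upstream + internal


def heterozygote_pattern(haplotype_1, haplotype_2):
    """Superimposed fragment sizes for a diploid individual, largest first."""
    fragments = (set(hindiii_fragments_kb(*haplotype_1))
                 | set(hindiii_fragments_kb(*haplotype_2)))
    return sorted(fragments, reverse=True)


# Two sites, each present or absent, give four haplotypes.
for site1, site2 in product([False, True], repeat=2):
    print(site1, site2, hindiii_fragments_kb(site1, site2))

# Both sites present (the C haplotype) paired with neither site present.
print(heterozygote_pattern((True, True), (False, False)))
# [17.5, 15.0, 4.9, 3.7, 1.2]
```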
Chromosomal localization: Studies on cultured mouse cell lines that also contained a single human chromosome revealed that the G8 probe was associated with chromosome 4, and further studies placed the HD gene near one of the ends of chromosome 4. Ultimately, it was tentatively associated with a region of about 500 kb on chromosome 4.
Identification of the HD gene: Within the 500 kb region, a process known as exon-trapping was used to specifically examine sequences that were bordered by splicing signals that marked intron/exon boundaries. This procedure clones random fragments from the region of interest into a special vector that is engineered so that a splicing reaction will occur if the cloned fragment contains an intron/exon boundary. The splicing reaction in turn changes the selective pattern of the vector, such that it becomes possible to select for those that contain cloned splice sites. Sequences that had been identified as presumptive exons were then used to identify cDNA clones that contained the complete coding sequences of the corresponding genes.
Triplet repeats: One of these coded for a large protein (3,144 amino acids) that did not closely resemble any known proteins. In addition, the protein had a very unusual region of sequence with 23 glutamine residues in a row, coded by a repeated triplet CAG (with one CAA codon also). When the same gene was isolated from known HD patients, the number of CAG repeats was found to be greatly expanded, ranging from 42 to about 100 in initial studies. A further study of unaffected people revealed a range of 11 to 34 CAG repeats, with 98% of the unaffected individuals under 24 repeats. In a few rare cases of afflicted individuals with two normal parents, new mutations had expanded the number of CAG repeats in the patients.
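Counting the repeat length in a sequence can be sketched as follows. The sequences are invented for illustration, the function names are hypothetical, and the cut-offs are simply the ranges quoted above from the initial studies; this is not a diagnostic procedure.

```python
import re


def longest_glutamine_codon_run(dna):
    """Length, in codons, of the longest uninterrupted run of CAG/CAA codons."""
    runs = re.findall(r"(?:CA[AG])+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def describe_repeat_count(n_repeats):
    """Compare a repeat count with the ranges reported in the initial studies."""
    if n_repeats >= 42:
        return "within the range reported for HD patients"
    if n_repeats <= 34:
        return "within the range reported for unaffected individuals"
    return "between the two reported ranges"


# Invented sequence: 21 CAG codons, one CAA, one more CAG (23 glutamine codons).
normal_like = "ATG" + "CAG" * 21 + "CAA" + "CAG" + "CCG"
expanded = "ATG" + "CAG" * 45 + "CCG"

print(longest_glutamine_codon_run(normal_like))  # 23
print(describe_repeat_count(longest_glutamine_codon_run(expanded)))
```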
Cystic fibrosis: Cystic fibrosis is a complex disease with a variety of phenotypic manifestations. Thickened mucus that leads to respiratory problems and susceptibility to one particular type of pneumonia are the usual causes of death, but there are also numerous other problems, including digestive difficulties, abnormal composition of sweat, etc. The disease was found in linkage studies to be associated with a region on the long arm of human chromosome 7. Investigators then took advantage of the fact that active genes have CCGG sequences in which the second C is not methylated, while inactive genes often have that site methylated. The restriction endonuclease Hpa II cuts only when there is no methylation. This allowed identification of "islands" of DNA that could be cut in a background "sea" of DNA that could not be cut within the general region where the CF gene was thought to reside.
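The methylation-sensitive cutting can be illustrated with a small sketch. The representation is an assumption made for the example: the DNA is a plain string, and methylation is given as a set of 0-based positions of methylated cytosines. The sequence and the function name are invented.

```python
def hpaii_cut_sites(dna, methylated_positions=frozenset()):
    """Start positions of CCGG sites whose second C is not methylated."""
    sites = []
    start = dna.find("CCGG")
    while start != -1:
        # The second C of the site is at start + 1; methylation there blocks cutting.
        if (start + 1) not in methylated_positions:
            sites.append(start)
        start = dna.find("CCGG", start + 1)
    return sites


sequence = "AACCGGTTCCGGAA"  # CCGG sites begin at positions 2 and 8
print(hpaii_cut_sites(sequence))       # [2, 8]
print(hpaii_cut_sites(sequence, {9}))  # [2]
```

A cluster of sites that remain cuttable corresponds to an "island" in the sense used above.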
Brute force sequencing: This pinned down a region of 1.5 megabases. Sequencing through about 250 kb yielded a candidate gene that proved to be altered in CF patients. The gene codes for a membrane-spanning chloride transport channel whose malfunction explains the diverse phenotypic properties associated with the disease (this is explored in extended detail in MCDB 3120). The most common molecular defect is loss of function of a key part of the protein that binds ATP to provide the energy needed for the transport. The localized nature of the most common defects in this gene provides a basis for genetic screening to detect the defects in carriers.
Duchenne muscular dystrophy: Duchenne muscular dystrophy (DMD) is a sex linked disease that causes skeletal muscle cells to be very fragile, such that they are continually breaking down and being replaced by regeneration. The actual disease phenotype usually does not seriously manifest itself until 3-5 years of age when the victim uses up all of the replicative potential of skeletal muscle satellite cells, such that regeneration can no longer occur effectively. There is also an unusually high rate of spontaneous mutation to generate new cases of DMD, which appears to be related to the huge size of the gene (2 megabases of DNA, coding for a protein with a molecular weight of about 400,000).
Deletion analysis: Females do not normally exhibit the disease, because male victims do not live long enough to reproduce. Thus, rare female cases usually result from inheritance from a heterozygous mother plus a new mutation in the paternally-derived X-chromosome. Karyotypic analysis of several female patients revealed small deletions in one of their X-chromosomes at a particular location, which together with a mutation in the other X appeared to be responsible for their DMD. Analysis of the genetic region in which these deletions occurred led to identification of the DMD gene.
Hereditary breast cancer: The search for the gene defect responsible for hereditary breast cancer involved a variety of competing laboratories. The gene was traced to the long arm of chromosome 17, and through a series of linkage studies localized within about 600 kilobases. Techniques that enrich for cDNAs that hybridize within a specific chromosomal region were used to identify candidate genes, including the gene now designated BRCA1, which was found to be mutated in the familial breast cancer patients. Gene BRCA1 codes for a protein of 1863 amino acids that has DNA binding properties (zinc fingers) and an acidic C-terminal domain that together make it appear to be a transcription control factor. A wide variety of mutations in various parts of this protein have been found in breast cancer pedigrees, making it unlikely that genetic screening can easily be established, since most are point mutations.
Abnormal localization of BRCA1 protein in breast cancer cells: In breast cancer cells, the BRCA1 gene product is found abnormally located in the cytoplasm, rather than in the nucleus where it needs to be to act as a transcription factor. Unexpectedly, this was found to be true even when the cancer was not related to BRCA1 mutations, suggesting that interactions with other proteins are needed for normal function of the BRCA1 protein, and that its failure to enter the nucleus is closely related to the cancerous state.
Genetics of cancer: For those who may be interested, Klug and Cummings has an entire chapter on genetics and cancer that we do not have time to cover in this course (Chapter 21, pages 591-613). Our current textbook (Fairbanks and Andersen) also has a chapter on Genes and Cancer (Chapter 24, pages 701-719).