Textbook Assignment: Chapter 16, pages 461-469. These notes also contain material not covered in the textbook.
Major concepts
Review of basic techniques: The use of restriction fragment length polymorphisms (RFLPs) and variable number tandem repeats (VNTRs) as genetic markers and tools in DNA fingerprinting is heavily dependent on the use of Southern blotting with appropriate restriction endonucleases and carefully selected probes. Therefore, these notes begin with a brief summary and review of these essential techniques, which have already been discussed in previous lectures.
Southern blotting: Genomic DNA or DNA from a specific source, such as a lambda phage or cosmid clone, is digested, usually to completion, with a restriction endonuclease. Electrophoresis is then used to separate the fragments by size. The fragments are then blotted from the electrophoretic gel onto a sheet of nitrocellulose or similar support material, and fixed onto it by heating or other treatments. The attached DNA fragments are denatured to separate the strands and annealed with a radioactive probe that is single stranded or also denatured. The nitrocellulose sheet is then washed, removing all unbound probe, and leaving radioactivity only where the probe has hybridized to the original DNA bound to the membrane. A sheet of X-ray film is then laid over the nitrocellulose for a time period long enough for the radioactivity to "expose" the film. When the film is developed, dark bands appear wherever there were DNA fragments capable of hybridizing with the radioactive probe. Size standards run on the same electrophoretic gel allow the sizes of the fragments identified by the probe to be determined.
Cloned DNA probes: Any DNA (or RNA) that can be prepared as a single uniform sequence can be used as a probe. Cloned DNA is particularly useful, because it consists of multiple copies of a single sequence, and is generally carried in a vector that is sufficiently "foreign" so that it will not react with any of the DNA in the Southern blot that is being analyzed. If a probe hybridizes with only a single band, one can conclude that only one size class of fragments contains the probe sequence. However, if two or more bands hybridize, two very different interpretations are possible: 1) that there is a restriction endonuclease cut site within the sequence that hybridizes to the probe, causing the hybridizing sequence to be cleaved into two different restriction fragments that can both hybridize with parts of the probe; or 2) that more than one copy of the target sequence was present in the original DNA sample, with each copy emerging in a different sized restriction fragment. Because DNA fingerprinting (discussed in the next lecture) is often concerned with finding repeated sequences, a relatively small probe is generally used to minimize the chance of a single target sequence being cleaved into two halves during digestion of the original sample.
Restriction mapping: Digestion to completion with a restriction enzyme generates a series of fragments of discrete sizes from genomic DNA. In many cases, a single probe, such as a cDNA, is long enough to hybridize with more than one such fragment in a Southern blot. Alternatively, a large cloned sequence can be cut into a series of smaller fragments that can be identified by size without the need for a specific hybridization probe. This can be done through the use of a fluorescent dye such as ethidium bromide, which intercalates into the DNA and causes all bands to fluoresce as illustrated in figure 15.24 (page 445). As a first step toward reconstructing the overall sequence in either of these cases, it is often useful to construct a restriction map. Several techniques can be used to do so.
Partial digests: Partial digestion will yield a series of fragments of various lengths, some of which are the sum of two or three or more shorter fragments. It is possible to end-label the shorter fragments and use them as probes to determine which of the longer fragments contain the same sequences. This allows nested sets of fragments that contain the same sequences to be identified, and when overlapping fragments are obtained, it is often possible to fit them all together to reconstruct the original larger fragment, or the entire area that hybridizes with a probe.
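The relationship between partial-digest and complete-digest fragments can be sketched in Python. This is only an illustration, assuming a linear molecule whose complete-digest fragments are already known in their left-to-right order; the fragment sizes and the function name `partial_digest_sizes` are invented for the example. Every partial-digest product is a run of one or more adjacent complete-digest fragments.

```python
def partial_digest_sizes(ordered_fragments):
    """Return every fragment size that a partial digest could produce.

    `ordered_fragments` lists the complete-digest fragment sizes in their
    physical order along a linear molecule. Each partial-digest product is
    the sum of a run of adjacent fragments.
    """
    sizes = set()
    for start in range(len(ordered_fragments)):
        total = 0
        for size in ordered_fragments[start:]:
            total += size
            sizes.add(total)
    return sorted(sizes)

# Hypothetical complete-digest fragments (bp), in map order.
ORDERED = [2000, 1500, 3500, 3000]
possible = partial_digest_sizes(ORDERED)
print(possible)
```

In this made-up example a 3,500 bp band can arise in two ways (the single 3,500 bp fragment, or 2,000 + 1,500), which is one reason labeled probes are needed to decide which sequences a given band actually contains.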
Double digests: Digestion with two different restriction endonucleases separately and with the two together yields a larger total number of fragments that can be fitted together to generate a more detailed restriction map of the area of interest (figure 15.25). Restriction mapping with two or more enzymes yields relatively short segments of DNA of known position that provide a physical map of the genomic region upon which known mutations and other markers of interest can be localized. It also provides an ordered series of relatively small fragments for sequencing, which collectively can be used to sequence a relatively long stretch of DNA.
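The logic of comparing single and double digests can be sketched as follows. This is a minimal illustration, assuming a linear DNA molecule and treating each enzyme simply as a list of cut positions; the 10,000 bp length and all cut positions are hypothetical and are not taken from figure 15.25.

```python
def digest(length, cut_positions):
    """Fragment sizes (largest first) after complete digestion of a linear molecule."""
    cuts = sorted({p for p in cut_positions if 0 < p < length})
    edges = [0] + cuts + [length]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)

# Hypothetical clone and made-up cut positions for two enzymes.
LENGTH = 10_000
ENZYME_A = [2_000, 7_000]
ENZYME_B = [3_500]

single_a = digest(LENGTH, ENZYME_A)            # enzyme A alone
single_b = digest(LENGTH, ENZYME_B)            # enzyme B alone
double = digest(LENGTH, ENZYME_A + ENZYME_B)   # both enzymes together

print(single_a, single_b, double)
```

In a real mapping experiment the reasoning runs in the opposite direction: the fragment sizes are observed on the gel, and the map is the arrangement of cut sites that accounts for all three digests at once. Here, for instance, the 5,000 bp fragment from enzyme A disappears in the double digest and is replaced by 1,500 and 3,500 bp fragments, which places the enzyme B site inside it.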
Use of RFLPs as genetic markers: When a specific cloned DNA probe is used to analyze a Southern blot of human (or other) DNA, a limited number of restriction fragments of specific and characteristic lengths will be identified. Because single base mutations can either create additional restriction sites or destroy pre-existing sites, DNA preparations from different individuals frequently exhibit different patterns of size distribution of restriction fragments that hybridize with a particular probe. These differences are called restriction fragment length polymorphisms (RFLPs). In many cases, the genetic polymorphisms that generate RFLPs will have no obvious genetic effect because they are located in introns or involve "silent" mutations that convert a codon to a different codon specifying the same amino acid. However, they are inherited as codominant Mendelian markers and are extremely useful in studies of human genetic linkage.
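How a single base change produces an RFLP can be shown with a toy example. The sketch below assumes two invented 100 bp alleles and uses the EcoRI recognition sequence GAATTC as the restriction site; for simplicity the cut is placed at the start of the site rather than at the enzyme's true cleavage position within it.

```python
import re

def fragment_sizes(seq, site):
    """Fragment lengths, in map order, after cutting at every occurrence of `site`.

    Simplification: the cut is placed at the first base of the recognition site.
    """
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]

SITE = "GAATTC"  # EcoRI recognition sequence

# Invented 100 bp allele carrying two sites (at positions 20 and 50).
allele_1 = "A" * 20 + SITE + "C" * 24 + SITE + "T" * 44

# A single base change (GAATTC -> GAGTTC) destroys the second site.
allele_2 = allele_1[:50] + allele_1[50:].replace("GAATTC", "GAGTTC", 1)

print(fragment_sizes(allele_1, SITE))
print(fragment_sizes(allele_2, SITE))
```

The two alleles differ at one base out of 100, yet the 30 and 50 bp fragments of the first allele are replaced by a single 80 bp fragment in the second. A probe that hybridizes to this region would therefore show different band patterns for the two alleles, and a heterozygote would show both.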
Anonymous probes: The special advantage of RFLPs as genetic markers is that they do not need to have any special properties other than the existence of the restriction endonuclease that responds to the presence or absence of a particular cut site and the availability of a probe that can be used to visualize the fragments. Any random clone, including sequences located in introns or between genes, that happens to emerge during "shotgun" cloning can potentially be used as a probe. Probes of this sort that do not correspond to any known genes are referred to as anonymous probes. Many useful RFLPs are identified with anonymous probes.
Human linkage markers: It is difficult to find suitable linkage markers for human genetic linkage studies. The total number of known genes is still rather small (although it is now growing rapidly because of the human genome project). In addition, many of the genetic loci have been identified only in terms of relatively rare alleles that cause disease phenotypes, with the vast majority of the population carrying the wild-type alleles that do not differ from one individual to another.
Codominant expression: RFLP haplotypes (RFLPs carried on single chromosomes in a genome) are stable genetic markers that are inherited in a codominant manner, often with a relatively high frequency of alternative alleles in healthy individuals. This allows them to be used in all types of genetic studies, including analysis of their linkage to the genes responsible for human genetic diseases. Because of their usefulness, large numbers of human RFLPs have been studied in detail, including the chromosomal locations of the DNA sequences responsible for the polymorphisms.
Linkage to RFLP haplotypes: Because most human genetic diseases are initially identified only by the disease phenotype, demonstration of linkage to a specific RFLP haplotype is frequently the first step toward identifying the chromosome that carries the disease gene. In addition, a close linkage (identified by a high lod score) can localize the disease gene to a specific region of the chromosome. This in turn provides the starting point for studies leading to the isolation and cloning of the specific gene that is responsible for the disease. Six different examples of identification and cloning of genes responsible for inherited human diseases are presented below, each of which employed a somewhat different experimental approach.
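These notes do not define the lod score, so the following sketch uses the standard definition as a reminder: the lod score is the base-10 logarithm of the odds that the observed pattern of inheritance arose with a recombination fraction theta between marker and disease gene, rather than with no linkage (theta = 0.5). The function below assumes the simplest case, in which every meiosis is fully informative and the phase is known, so that recombinants can be counted directly; the function name and the example counts are hypothetical.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Lod score for phase-known, fully informative meioses.

    lod = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    where r = recombinants and n = total meioses scored.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    nonrecombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

# Hypothetical family data: 1 recombinant among 20 scored meioses.
score = lod_score(recombinants=1, meioses=20, theta=0.05)
print(round(score, 2))
```

By convention, a lod score of 3 or more (odds of at least 1000 to 1 in favor of linkage) is usually taken as evidence for linkage. Real pedigree analysis is more involved, because phase is often unknown and scores are summed across families and evaluated over a range of theta values.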
Neurofibromatosis: Type 1 neurofibromatosis is an autosomal dominant condition associated with a wide range of nervous system defects, including benign tumors and learning disabilities. As described in the textbook (pages 464-465), a search for linkage to specific RFLPs localized the candidate gene to a region near the centromere of human chromosome 17. After the gene was localized as much as possible, chromosome walking was undertaken until a candidate gene was encountered. Its involvement in the disease was verified by sequencing studies that showed mutations in individuals afflicted with the disease. The overall process that led to the discovery of the NF1 gene is called positional cloning. The wild-type gene appears to function in intracellular signal transduction, and more specifically in down-regulating cellular reproduction.
Marfan syndrome: A rather different approach was taken to identify the gene that is defective in Marfan syndrome, an autosomal dominant condition that causes alterations in connective tissue. Particular attention was given to genes coding for proteins known to function in various types of connective tissue. A protein known as fibrillin, which is found in tissues known to be affected by Marfan syndrome, was identified as a likely candidate. The gene for fibrillin had already been cloned and mapped to the long arm of human chromosome 15. RFLP studies verified a linkage between the inheritance of Marfan syndrome and markers on chromosome 15. Cloning of the fibrillin gene from individuals with Marfan syndrome then verified the substitution of a proline for arginine at position 239 in the protein. The textbook describes this as the candidate gene approach.
Huntington disease: The search for the gene responsible for Huntington disease (also known as Huntington's chorea) was described in a previous textbook as an example of the use of RFLPs (Weaver and Hedrick, Basic Genetics, 2nd Edition, pages 393-399 -- on reserve in Norlin Library). Huntington disease is an autosomal dominant degenerative brain disease that usually does not exhibit any obvious symptoms prior to middle age. There is then a progressive loss of motor coordination, accompanied by uncontrolled spontaneous movements, ultimately resulting in death, but only after a prolonged period of increasingly severe symptoms.
Extended pedigrees and anonymous probes: Two key elements in the search for the HD gene were the existence of a large family in Venezuela with documented HD and a rather long (about 15 kb) anonymous probe known as G8. The pedigree of the family covered seven generations, with the disease apparently traced back to a settler of European origin. The G8 probe identifies a complex RFLP pattern in DNA cut with Hind III. Because two polymorphic Hind III cut sites are involved, a total of four different G8 haplotypes are possible. The actual polymorphisms involve a 15.0 kb fragment vs. a 17.5 kb fragment (determined by presence or absence of cut site 1, which is beyond the 5'-end of the probe) and a 4.9 kb fragment vs. fragments of 1.2 and 3.7 kb (determined by cut site 2, which is located within the region that hybridizes with the probe). Because each individual is diploid, anyone who is heterozygous will exhibit two different haplotype patterns superimposed. In the Venezuelan family, HD was strongly associated with the C haplotype (possessing both polymorphic cut sites). In other pedigrees, different G8 haplotypes may be associated with HD. Do not make the mistake of assuming that a specific G8 haplotype is associated with HD outside of the family group in which the association has been demonstrated.
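The four G8 haplotypes and the superimposed pattern seen in a diploid individual can be enumerated with a short sketch. It uses only the polymorphic fragment sizes quoted above and assumes that the presence of a cut site gives the shorter fragment or fragments (15.0 kb for site 1; 3.7 and 1.2 kb for site 2). Any non-polymorphic bands the probe may also detect are ignored, and the function names are invented for the example.

```python
from itertools import product

def haplotype_bands(site1_present, site2_present):
    """Polymorphic Hind III bands (kb) detected by G8 on one chromosome.

    Assumption: cutting at a site yields the shorter fragment(s).
    """
    bands = [15.0] if site1_present else [17.5]
    bands += [3.7, 1.2] if site2_present else [4.9]
    return bands

def genotype_bands(hap_a, hap_b):
    """Bands seen on the blot for a diploid individual (two haplotypes superimposed)."""
    combined = set(haplotype_bands(*hap_a)) | set(haplotype_bands(*hap_b))
    return sorted(combined, reverse=True)

# Two polymorphic sites, each present or absent: 2 x 2 = 4 haplotypes.
haplotypes = list(product([False, True], repeat=2))
for site1, site2 in haplotypes:
    print(f"site 1 {'+' if site1 else '-'}, site 2 {'+' if site2 else '-'}:",
          haplotype_bands(site1, site2))

# A heterozygote carrying one chromosome with both sites and one with neither.
print(genotype_bands((True, True), (False, False)))
```

Under these assumptions the haplotype with both sites present (the one described above as the C haplotype) shows bands of 15.0, 3.7, and 1.2 kb. Note that a heterozygote's band pattern alone does not always reveal which bands belong to the same chromosome; that assignment comes from following the bands through the pedigree.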
Chromosomal localization: Studies on cultured mouse cell lines that also contained a single human chromosome revealed that the G8 probe was associated with chromosome 4, and further studies placed the HD gene near one of the ends of chromosome 4. Ultimately, it was tentatively associated with a region of about 500 kb on chromosome 4.
Identification of the HD gene: Within the 500 kb region, a process known as exon-trapping was used to specifically examine sequences that were bordered by splicing signals that marked intron/exon boundaries. This procedure clones random fragments from the region of interest into a special vector that is engineered so that a splicing reaction will occur if the cloned fragment contains an intron/exon boundary. The splicing reaction in turn changes the selective pattern of the vector, such that it becomes possible to select for those that contain cloned splice sites. Sequences that had been identified as presumptive exons were then used to identify cDNA clones that contained the complete coding sequences of the corresponding genes.
Triplet repeats: One of these coded for a large protein (3,144 amino acids) that did not closely resemble any known proteins. In addition, the protein had a very unusual region of sequence with 23 glutamine residues in a row, coded by a repeated triplet CAG (with one CAA codon also). When the same gene was isolated from known HD patients, the number of CAG repeats was found to be greatly expanded, ranging from 42 to about 100 in initial studies. A further study of unaffected people revealed a range of 11 to 34 CAG repeats, with 98% of the unaffected individuals under 24 repeats. In a few rare cases of afflicted individuals with two normal parents, new mutations had expanded the number of CAG repeats in the patients.
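Counting the repeats in a sequence is simple to sketch. The example below finds the longest uninterrupted run of CAG codons in a DNA string and compares the count with the ranges quoted above (11 to 34 in unaffected individuals, 42 to about 100 in HD patients in the initial studies). The test sequences are invented, the function names are hypothetical, and the comparison is purely illustrative and not a diagnostic rule. The single CAA codon mentioned above would interrupt a run, so this simple count covers pure CAG stretches only.

```python
import re

def longest_cag_run(seq):
    """Number of CAG codons in the longest uninterrupted CAG run in `seq`."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def compare_with_notes(repeat_count):
    """Place a repeat count relative to the ranges quoted in these notes."""
    if repeat_count >= 42:
        return "within the range reported for HD patients"
    if 11 <= repeat_count <= 34:
        return "within the range reported for unaffected individuals"
    return "outside the ranges quoted in these notes"

# Invented sequences: flanking bases around 23 and 45 CAG repeats.
typical = "ATG" + "CAG" * 23 + "CCGCCA"
expanded = "ATG" + "CAG" * 45 + "CCGCCA"

for name, seq in [("typical", typical), ("expanded", expanded)]:
    n = longest_cag_run(seq)
    print(name, n, compare_with_notes(n))
```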
Cystic fibrosis: Cystic fibrosis is a complex disease with a variety of phenotypic manifestations. Thickened mucus that leads to respiratory problems and susceptibility to one particular type of pneumonia are the usual causes of death, but there are also numerous other problems, including digestive difficulties, abnormal composition of sweat, etc. The disease was found in linkage studies to be associated with a region on the long arm of human chromosome 7. Investigators then took advantage of the fact that active genes have CCGG sequences in which the second C is not methylated, while inactive genes often have that site methylated. The restriction endonuclease Hpa II cuts only when there is no methylation. This allowed identification of "islands" of DNA that could be cut in a background "sea" of DNA that could not be cut within the general region where the CF gene was thought to reside.
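The methylation sensitivity of Hpa II can be modeled in a few lines. In this sketch a DNA strand is a plain string and methylation is represented as a set of 0-based positions of methylated cytosines; both the sequence and the representation are invented for illustration. A CCGG site is reported as cuttable only when its second C is unmethylated.

```python
def hpaii_cut_sites(seq, methylated_positions):
    """Start positions of CCGG sites that Hpa II can cut.

    A site is cuttable only if its second C (index start + 1) is not in
    `methylated_positions`, a set of 0-based positions of methylated C's.
    """
    sites = []
    start = seq.find("CCGG")
    while start != -1:
        if start + 1 not in methylated_positions:
            sites.append(start)
        start = seq.find("CCGG", start + 1)
    return sites

# Invented sequence with CCGG sites at positions 2 and 10.
SEQ = "AACCGGTTTTCCGGAA"

print(hpaii_cut_sites(SEQ, methylated_positions=set()))  # both sites unmethylated
print(hpaii_cut_sites(SEQ, methylated_positions={11}))   # second site methylated
```

Applied across a long genomic region, clusters of cuttable sites would correspond to the "islands" described above, and the stretches between them, where sites exist but are methylated, to the surrounding "sea".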
Brute force sequencing: This pinned down a region of 1.5 megabases. Sequencing through about 250 kb yielded a candidate gene that proved to be altered in CF patients. The gene codes for a membrane-spanning chloride transport channel whose malfunction explains the diverse phenotypic properties associated with the disease (this is explored in extended detail in MCDB 3120). The most common molecular defect is loss of function of a key part of the protein that binds ATP to provide the energy needed for the transport. The localized nature of the most common defects in this gene provides a basis for genetic screening to detect the defects in carriers.
Duchenne muscular dystrophy: Duchenne muscular dystrophy (DMD) is a sex-linked disease that causes skeletal muscle cells to be very fragile, such that they are continually breaking down and being replaced by regeneration. The actual disease phenotype usually does not seriously manifest itself until 3-5 years of age, when the victim uses up all of the replicative potential of skeletal muscle satellite cells, such that regeneration can no longer occur effectively. There is also an unusually high rate of spontaneous mutation to generate new cases of DMD, which appears to be related to the huge size of the gene (2 megabases of DNA, coding for a protein with a molecular weight of about 400,000).
Deletion analysis: Females do not normally exhibit the disease, because male victims do not live long enough to reproduce. Thus, rare female cases usually result from inheritance from a heterozygous mother plus a new mutation in the paternally-derived X-chromosome. Karyotypic analysis of several female patients revealed small deletions in one of their X-chromosomes at a particular location, which together with a mutation in the other X appeared to be responsible for their DMD. Analysis of the genetic region in which these deletions occurred led to identification of the DMD gene.
Hereditary breast cancer: The search for the gene defect responsible for hereditary breast cancer involved a variety of competing laboratories. The gene was traced to the long arm of chromosome 17, and through a series of linkage studies localized within about 600 kilobases. Techniques that enrich for cDNAs that hybridize within a specific chromosomal region were used to identify candidate genes, including the gene now designated BRCA1, which was found to be mutated in the familial breast cancer patients. Gene BRCA1 codes for a protein of 1863 amino acids that has DNA binding properties (zinc fingers) and an acidic C-terminal domain that together make it appear to be a transcription control factor. A wide variety of mutations in various parts of this protein have been found in breast cancer pedigrees, making it unlikely that genetic screening can easily be established, since most are point mutations.
Abnormal localization of BRCA1 protein in breast cancer cells: In breast cancer cells, the BRCA1 gene product is found abnormally located in the cytoplasm, rather than in the nucleus where it needs to be to act as a transcription factor. Unexpectedly, this was found to be true even when the cancer was not related to BRCA1 mutations, suggesting that interactions with other proteins are needed for normal function of the BRCA1 protein, and that its failure to enter the nucleus is closely related to the cancerous state.
Genetics of cancer: For those who may be interested, our textbook has an entire chapter on genetics and cancer that we do not have time to cover in this course (Chapter 21, pages 591-613).
Prenatal diagnosis: Techniques such as chorionic villus sampling or amniocentesis (withdrawal of amniotic fluid that contains viable fetal cells, figure 16.6) allow small amounts of fetal DNA to be obtained for prenatal diagnosis of genetic defects. Numerous types of assays are now possible, ranging from gross karyotypic analysis to detect conditions such as Down syndrome to RFLP analysis and allele-specific oligonucleotide screening, which can detect single base pair changes in coding sequences. One example presented in the text is RFLP analysis to detect sickle cell anemia. In this case, the sickle cell mutation results in loss of a restriction site (Figure 16.8). Allele-specific hybridization of a relatively short oligonucleotide is used to detect the most common mutation in cystic fibrosis. We do not have time to analyze these procedures in detail, but the relevant portion of the textbook (pages 466-469) should be read.