Revised November 19, 1998

MCDB 2150 Lecture 32

DNA fingerprinting, Human genome project

Textbook Assignment: Chapter 16, pages 473-478, Boxed article on pages 482-483, Chapter 17, pages 512-514.

Major concepts:

Variable number tandem repeats (VNTR): In addition to single copy genes, the genomes of humans and other complex organisms contain DNA sequences that are present in variable numbers of multiple copies, including some that are highly repetitive and others that are repeated only a few times. Some of the sequences that are repeated only a few times are clustered together as tandem repeats. Such clusters are called minisatellites. When Southern blots were probed for the presence of the repeated sequences, it was found that the size distributions of the restriction fragments carrying those sequences varied greatly from one individual to another. Further analysis revealed that the reason for the variation was that the numbers of tandem repeats were different in different individuals. Such sequences are now commonly referred to as variable number tandem repeats (VNTR).
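The relationship between repeat number and fragment size described above can be sketched in a few lines of Python. This is only an illustration: the flanking length, repeat unit length, and repeat counts are made-up values, and the function name fragment_length is a hypothetical helper, not something taken from the textbook.

```python
def fragment_length(flank_bp: int, repeat_unit_bp: int, n_repeats: int) -> int:
    """Size of a restriction fragment that spans a tandem repeat cluster.

    The enzyme cuts outside the cluster, so the fragment consists of the
    flanking DNA plus n_repeats copies of the repeat unit.
    """
    return flank_bp + repeat_unit_bp * n_repeats


# Hypothetical numbers chosen only for illustration.
FLANK_BP = 500        # DNA between the cut sites and the repeat cluster
REPEAT_UNIT_BP = 30   # length of one copy of the repeated sequence

# Three hypothetical alleles that differ only in the number of repeats.
alleles = {"allele A": 10, "allele B": 25, "allele C": 40}

for name, n_repeats in alleles.items():
    size = fragment_length(FLANK_BP, REPEAT_UNIT_BP, n_repeats)
    print(f"{name}: {n_repeats} repeats -> {size} bp fragment")
```

Because the cut sites are the same in every allele, the only thing that changes the band position on the blot is the number of repeats.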

DNA fingerprinting: With appropriate probes, together with restriction endonucleases that do not cut within the repetitive sequences, it is possible to generate detailed patterns of bands on Southern blots that are highly individualistic. This has given rise to the technique of DNA fingerprinting, which is now widely used in criminal investigations to match blood, hair or semen left at a crime scene to that of suspects. When done properly, such techniques can identify a specific individual with virtual certainty (although defense lawyers are still finding clever ways to challenge the evidence). Non-human applications are also possible. A boxed article on pages 482-483 describes a case in which DNA fingerprinting was used to identify a particular tree as the source of a seed pod in a criminal investigation.

Codominant Mendelian inheritance: VNTRs show up on Southern blots as fragments of different sizes that are inherited as Mendelian markers with strictly codominant expression. In most cases, each individual will exhibit only two bands, one from each parent. Although usually discussed separately because they are caused by the presence of repeated sequences, VNTRs are actually a specialized type of restriction fragment length polymorphism (RFLP) in which the variation in fragment length reflects the number of repeats, rather than the presence or absence of cut sites.
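The codominant pattern described above, with one band inherited from each parent, can be checked with a short Python sketch. The band sizes are invented for illustration, and the function names possible_child_patterns and consistent_with_parents are hypothetical helpers. A homozygous individual is written here with the same fragment size twice.

```python
from itertools import product


def possible_child_patterns(mother, father):
    """All two-band patterns a child could show: one band from each parent."""
    return {tuple(sorted(pair)) for pair in product(mother, father)}


def consistent_with_parents(child, mother, father):
    """True if the child's two bands can be explained by these parents."""
    return tuple(sorted(child)) in possible_child_patterns(mother, father)


# Hypothetical fragment sizes in base pairs at a single VNTR locus.
mother = (800, 1250)
father = (1700, 2000)

print(sorted(possible_child_patterns(mother, father)))
print(consistent_with_parents((1700, 800), mother, father))   # one band from each parent
print(consistent_with_parents((800, 1250), mother, father))   # both bands from the mother
```

Because expression is codominant, both bands are always visible, so a child's pattern can be compared directly with those of the putative parents.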

Genomic analysis: Until recently, the large size of a typical genome and the vast number of different genes that it contains have made it impossible to think seriously about gaining a full understanding of the entire sequence. However, a number of advances in molecular biology, including automated sequencing and the ability to use recombinant DNA technology to work with overlapping clones that cover a wide range of sizes, together with advances in computer technology that make it feasible to work with the amount of data that must be handled, are now making it realistically possible to obtain complete genomic sequences for a variety of species, including humans.

Overlapping clones: The textbook illustrates two different techniques for working with overlapping clones. In the bottom-up approach, which works best with relatively small genomes, one starts with a series of overlapping clones, obtained by incomplete digestion, or with several different restriction endonucleases (Figure 16.15). Restriction maps are prepared for the overlapping clones (Figure 15.25), and the overlapping clones are assembled together to generate a longer contiguous region, known as a contig. Additional clones are isolated to link the contigs into larger contigs, until an entire chromosome is represented in one large contig (Figure 16.15). This provides a physical map, which can then be converted to a complete nucleotide sequence by sequencing each of the individual clones that make up the large contig.
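The idea of joining overlapping clones into contigs can be sketched as a simple interval merge in Python. This is a deliberate simplification: it assumes each clone's start and end position on the chromosome is already known, whereas in practice the overlaps are inferred from shared features of the restriction maps. The coordinates and the function name assemble_contigs are hypothetical.

```python
def assemble_contigs(clones):
    """Merge overlapping (start, end) clones into contiguous regions."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start < contigs[-1][1]:
            # This clone overlaps the current contig, so extend the contig.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            # No overlap with the current contig, so start a new one.
            contigs.append([start, end])
    return [tuple(contig) for contig in contigs]


# Hypothetical clone positions in kilobases along one chromosome.
clones = [(0, 40), (30, 75), (70, 120), (200, 250), (240, 300)]
print(assemble_contigs(clones))        # two separate contigs

# Isolating one additional clone that bridges the gap links the contigs.
linking_clone = (110, 210)
print(assemble_contigs(clones + [linking_clone]))
```

With the starting set the clones fall into two contigs, (0, 120) and (200, 300); adding the linking clone joins them into a single contig, (0, 300).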

Top-down approach: For larger genomes, the top-down approach is generally more effective (Figure 16.16). In this case, the first step is to construct a library of very large overlapping fragments, cloned as YACs. The YACs are then assembled into contigs that cover the entire region to be analyzed. Each YAC is then cut with various restriction endonucleases and subcloned into smaller pieces, which are further restriction mapped until pieces small enough for convenient sequencing are generated.

Linkage groups and chromosomes: For some species, such as yeast, it is possible to physically separate nearly all of the chromosomes as a starting point for genomic analysis (Figure 15.18). Even in the case of human chromosomes, substantial separations of specific chromosomes can be achieved by flow cytometry (Figure 15.17).

Human genome project: A major international project is currently seeking to sequence the entire human genome. This is being done in a coordinated set of steps in many laboratories worldwide. The description of this project in the textbook is substantially oversimplified, but we do not have time to explore it in greater detail. Similar projects are also under way for several other species.

Genetic map: The first step was to generate high resolution genetic maps for each human chromosome, with markers spaced no further than about one Mb (megabase pair = 1 million base pairs) apart. The genetic markers that have been used include restriction fragment length polymorphisms (RFLPs, including VNTRs), sequence tagged sites (STS, sites where randomly cloned sequences are known to hybridize uniquely), and expressed sequence tags (EST, sites where cDNA sequences corresponding to actual expressed genes are known to hybridize uniquely). A genetic map containing about 1,500 markers was completed in 1995. Note that the textbook contains a serious error. The correct number is 1,500, and not 15,000. A collection of 15,000 genetic markers spaced an average of 2 million base pairs apart would define 3 x 10^10 base pairs of DNA, whereas the human haploid genome is "only" 3 x 10^9 base pairs.
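The arithmetic behind the correction above is easy to verify. The short Python sketch below uses only the numbers given in these notes; the function name span_bp is a hypothetical helper.

```python
def span_bp(n_markers: int, avg_spacing_bp: int) -> int:
    """Approximate length of DNA defined by evenly spaced markers."""
    return n_markers * avg_spacing_bp


HAPLOID_GENOME_BP = 3_000_000_000   # 3 x 10^9 base pairs
AVG_SPACING_BP = 2_000_000          # 2 million base pairs between markers

for n_markers in (1_500, 15_000):
    covered = span_bp(n_markers, AVG_SPACING_BP)
    ratio = covered / HAPLOID_GENOME_BP
    print(f"{n_markers} markers -> {covered:.1e} bp ({ratio:.0f}x the haploid genome)")
```

With 1,500 markers the span equals the haploid genome, whereas 15,000 markers at the same spacing would imply ten times as much DNA as the genome contains.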

Physical map: The second step was to construct a "physical map" of each chromosome with some 30,000 markers spaced about every 100,000 base pairs. This phase is now essentially completed. More-or-less concurrently with this phase, collections of overlapping YACs that contain these markers have been assembled, and organized into contigs that now cover essentially all of the human genome. The major effort that remains is to subclone these YACs into more manageably sized units, and to obtain the complete sequences for each of the subclones, which will then be assembled to generate the complete human genome sequence of about 3 x 10^9 base pairs. This ambitious project is moving forward rapidly.

MCDB involvement: Dr. Kenneth Krauter, a member of the MCDB faculty, is directly involved in the human genome project. His laboratory has in the past participated in high resolution physical mapping and assembly of contigs for human chromosome 12, and is currently working on a similar project for human chromosome 18.

Web sites: A number of web sites have been established for sharing the data from the human genome project, which is so vast that it must be managed in large computer databases. All of these sites are quite technical in nature. However, you may want to look in on one of the following:

MIT Whitehead Institute

Genethon in France (click on English Version after connecting)

Online Mendelian Inheritance in Man.

Other species: Genomic mapping projects are also underway for a variety of other species used in research in various MCDB laboratories, including E. coli, yeast, C. elegans, Drosophila, and mice. The massive amount of data that will be gathered in these projects will greatly expand our understanding of many different aspects of molecular biology, including gene regulatory mechanisms and patterns of evolution.