Textbook Assignment: Chapter 9, Pages 287 - 293 (includes chapter-end material for entire chapter). Some of the material in these notes is not described in the textbook. Please consider these notes to be part of the text for this lecture.
Symbol substitution: The symbol ø is being used in these notes to represent the lower case Greek letter phi.
Major concepts
DNA sequencing methods: Modern procedures for determining DNA sequences employ two specialized types of technology:
Overview of Sanger dideoxy method: The dideoxy chain termination procedure developed by Frederick Sanger begins with a primer of precisely defined length that has a radioactive 32P-phosphate label at its 5' end. The primer is designed to hybridize at the 3'-end of the template strand whose sequence is to be determined. Synthesis of a new strand of DNA is initiated from the primer, and a chain termination process (incorporation of a 2',3'-dideoxynucleotide, which leaves no 3'-OH group for further chain growth) is used to stop synthesis selectively at any one of the four DNA nucleotides. This generates a series of fragments whose relative lengths can then be used to read the nucleotide sequence. The Sanger procedure is described in detail below, together with a brief review of the basic molecular biology on which it is based.
5' to 3' synthesis of new DNA: The backbone structure of DNA consists of alternating deoxyribose and phosphate molecules linked together through a series of 3' to 5'-phosphodiester bonds, with one of the four possible DNA bases attached at the 1'-position of each deoxyribose. During DNA synthesis, deoxyribonucleotide triphosphates are guided into position at the 3'-end of the growing strand by base pairing with the template strand, and then form phosphodiester bonds with the free 3'-hydroxyl group by splitting off pyrophosphate residues. The new strand is thus assembled one nucleotide at a time in a 5'-to-3' direction, with the last nucleotide added always providing a new 3'-hydroxyl group for addition of the next nucleotide.
Primers: You will recall from our discussion of DNA synthesis that the DNA polymerase enzymes are unable to start the synthesis of new chains without a primer because they must have a free 3'-hydroxyl group at the end of a pre-existing polynucleotide chain as a site for addition of the next nucleotide. During DNA replication, this problem is solved by use of a primase enzyme that synthesizes a short RNA primer whose free 3'-hydroxyl group provides a starting point for growth of the new DNA strand. In sequencing reactions, a purified DNA polymerase with no primase activity is used, thus preventing any possible spontaneous initiation of synthesis at unwanted locations. Priming is done with a synthetic oligonucleotide of precisely known length that is complementary to a specific sequence on the template strand.
Vector-specific primers: In many cases, the primer is complementary to a portion of the multiple cloning sequence of the vector adjacent to the cloned DNA insert that is to be sequenced. In such cases, the sequence that is read will begin with the portion of the vector between the end of the primer and the beginning of the insert. The sequence of the cut site for the restriction endonuclease used to clone the insert identifies the boundary between the vector and the insert.
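The boundary search described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_at_cloning_site` and the example read are made up for these notes, and EcoRI (recognition sequence GAATTC) is assumed as the cloning enzyme.

```python
def split_at_cloning_site(read, site):
    """Split a sequencing read at the first occurrence of the cloning site.

    read: sequence read from the gel, 5'->3', starting just past the primer.
    site: recognition sequence of the restriction enzyme used for cloning.
    Returns (vector_part, site, remainder), or None if the site is absent.
    """
    read = read.upper()
    site = site.upper()
    index = read.find(site)  # first occurrence is the one nearest the primer
    if index == -1:
        return None
    return read[:index], site, read[index + len(site):]


# Hypothetical read: 4 bases of vector, an EcoRI site, then insert sequence.
example_read = "ACGT" + "GAATTC" + "TCCATGG"
print(split_at_cloning_site(example_read, "GAATTC"))
```

Because the same recognition sequence may also occur inside the insert, the sketch deliberately takes the first occurrence, which is the one closest to the primer.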
DNA synthesis: For sequencing, four identical reaction mixtures are set up in four separate containers, each with the same template and same labeled primer. Synthesis is usually done with the Klenow fragment of DNA polymerase I, which supports normal 5' to 3' synthesis of DNA and retains the 3' to 5' exonuclease activity that is used in DNA proofreading, but lacks the 5' to 3' exonuclease activity that is normally used for removal of RNA primers during DNA synthesis. (Our textbook simply says that DNA polymerase is used.) Absence of the 5' to 3' exonuclease ensures that there will be no degradation of the 5'-ends of the newly synthesized fragments, whose relative lengths must be accurately read to the nearest nucleotide in order to determine the DNA sequence.
Chain termination: Chain termination is achieved in the Sanger method by incorporating a nucleotide analog at the end of the growing DNA strand that lacks a 3'-hydroxyl group for attachment of the next nucleotide. This is done by individually introducing a small amount of the 2',3'-dideoxyribonucleotide-5'-triphosphate for one of the four DNA bases (figure 9.35) into each of the four DNA synthesis reaction mixtures described above. The dideoxy analogs base pair and are incorporated onto the 3'-ends of the growing chains like normal deoxynucleotides, but they lack the 3'-hydroxyl group needed for addition of the next nucleotide. Thus, whenever one of them is randomly incorporated, growth of the chain is irreversibly terminated at that point. The relative frequency of termination is determined by the ratio of dideoxy nucleotide triphosphates to normal dNTPs. A different dideoxyribonucleotide triphosphate is added to each of the four reaction mixtures. The amount is adjusted relative to the normal deoxyribonucleotide triphosphate so that the probability of chain termination is low enough to generate a wide range of fragment sizes and yet high enough so that all of the fragment sizes can be detected.
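The trade-off in choosing the ddNTP:dNTP ratio can be made concrete with a small calculation. The sketch below is a simplification, not a description of real reaction kinetics: it assumes that at every occurrence of the chosen base the dideoxy form is incorporated with the same fixed probability `p` (a hypothetical parameter), so the fraction of chains that stop at the k-th occurrence is p(1 - p)^(k - 1). The example strand is the one read from figure 9.34 later in these notes.

```python
def termination_fractions(new_strand, base, p):
    """Expected fraction of chains that terminate at each occurrence of `base`.

    new_strand: sequence that would be synthesized from the primer, 5'->3'.
    base: the base supplied as a ddNTP in this reaction mixture.
    p: assumed probability (same at every site) that the dideoxy form is
       incorporated instead of the normal dNTP.
    Returns {fragment length in added nucleotides: fraction of all chains}.
    """
    fractions = {}
    surviving = 1.0  # fraction of chains still growing
    for position, nucleotide in enumerate(new_strand, start=1):
        if nucleotide == base:
            fractions[position] = surviving * p
            surviving *= 1.0 - p
    return fractions


strand = "TCCATGGACCAGAGA"
for p in (0.05, 0.5):
    print(p, termination_fractions(strand, "A", p))
```

With a large `p` nearly all chains stop at the first few occurrences and long fragments become too rare to detect; with a very small `p` every band is faint. This is the balance the paragraph above describes.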
Electrophoresis and autoradiography: After the synthesis step has been completed, the next step is to denature the DNA to separate the newly synthesized radioactive fragments from their templates. Each reaction mixture now contains a mixture of labeled single-stranded DNA fragments whose relative lengths reflect every occurrence of the DNA base corresponding to the dideoxynucleotide triphosphate used in that particular reaction mixture. The denaturation step is followed by electrophoresis on elongated sequencing gels. The fragments produced by the four chain-terminating dideoxyribonucleotides are loaded into four side-by-side lanes on the sequencing gel. After electrophoresis is completed, the radioactive fragments are blotted onto nitrocellulose and autoradiography is used to identify the positions occupied by the fragments.
Reading the sequence: The sequence of the newly synthesized strand (reflected in the family of chain-terminated fragments) is read from the developed autoradiographic film, starting from the shortest fragment, which will consist of the labeled primer with one added dideoxynucleotide. The dideoxynucleotide that stopped the reaction serves to identify the first nucleotide in the sequence. In figure 9.34, ddTTP caused the first stop, showing that the first base in the sequence was T. Growth of the next longer fragment was stopped by ddCTP, showing that the second base was C, etc. Reading up the gel (from shortest to longer fragments) yields the sequence of the newly synthesized strand in figure 9.34 as TCCATGGACCAGAGA.
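Reading a gel from the shortest fragment upward amounts to sorting all bands by length and noting which lane each came from. The sketch below illustrates this. The band positions are not taken from figure 9.34 itself; they are reconstructed from the sequence quoted above, and the function name `read_gel` is made up for these notes.

```python
def read_gel(lanes):
    """Read the newly synthesized strand (5'->3') from four gel lanes.

    lanes: maps each ddNTP base to the lengths of the fragments seen in
           that lane, counted as nucleotides added beyond the primer.
    """
    bands = []
    for base, lengths in lanes.items():
        for length in lengths:
            bands.append((length, base))
    bands.sort()  # shortest fragment first, i.e. reading up the gel
    seen = [length for length, _ in bands]
    if len(set(seen)) != len(seen):
        raise ValueError("two bands share the same length; the read is ambiguous")
    return "".join(base for _, base in bands)


# Band lengths reconstructed from the sequence given in the text.
lanes = {
    "A": [4, 8, 11, 13, 15],
    "C": [2, 3, 9, 10],
    "G": [6, 7, 12, 14],
    "T": [1, 5],
}
print(read_gel(lanes))  # TCCATGGACCAGAGA
```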
Reverse complement: The sequence that is read from the sequencing gel is the reverse complement of the template strand because of complementary base pairing and the antiparallel nature of the double helix. Thus, the first base added to the newly synthesized strand (corresponding to the shortest fragment on the sequencing gel) is complementary to the 3'-most position of the template region being sequenced, immediately adjacent to the site that the primer hybridizes to. The sequence of the template strand can be read down the gel (from longest to shorter fragments) by substituting the base-pairing complement for each of the bases identified by the dideoxynucleotides that were used to terminate the chain growth. This is referred to as reading the reverse complement of the newly synthesized strand. The newly synthesized strand itself is the reverse complement of the template strand. Genetic coding sequences may be found in either of the strands in a sequencing reaction.
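As a quick check on the idea, the reverse complement can be computed mechanically: complement every base, then reverse the order. The sketch below applies this to the strand read from figure 9.34; the function name is an arbitrary choice for these notes.

```python
# Translation table mapping each base to its base-pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


new_strand = "TCCATGGACCAGAGA"          # read up the gel
template = reverse_complement(new_strand)
print(template)                          # TCTCTGGTCCATGGA, template 5'->3'
print(reverse_complement(template))      # back to the new strand
```

Applying the operation twice returns the original sequence, which reflects the statement above that each strand is the reverse complement of the other.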
Single-stranded template: Sequencing is frequently done with a single-stranded clone in a viral vector such as a modified M13 virus that contains a multiple cloning site. The viral DNA has a double-stranded replicative form, which makes cloning into it relatively easy, and a single-stranded form in the virion (virus particle), which is convenient for use as the template in a sequencing reaction. The primer that is normally used is complementary to a portion of the multiple cloning site just beyond the 3'-end of the cloned insert.
Alternative methods: It is also possible to do sequencing directly from denatured double-stranded DNA, as long as a primer can be found that anneals uniquely to just one of the strands. In practice, a wide variety of alternative methods are used, with primers now commercially available for many of the widely used cloning vectors. In addition, sequencing is often done from both ends of a clone, such that part of the sequence is read directly while the rest is obtained initially as the reverse complement.
Sequential priming: It is usually possible to read 200-300 nucleotides of sequence from a good sequencing gel. One trick that can be used to read further is to make a synthetic oligonucleotide primer whose sequence is the same as the last portion of the sequence that can be read unambiguously from the original gel. Priming from there allows additional sequence to be read. This process can be repeated as many times as needed to read longer DNA sequences.
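Choosing the next primer in this scheme is simply a matter of copying the end of the reliable read. The sketch below shows the idea; the default primer length of 20 is an assumption chosen for illustration (the notes do not specify one), and the short example uses the 15-base sequence from figure 9.34 rather than a realistic 200-300 base read.

```python
def next_primer(read, primer_length=20):
    """Return the last `primer_length` bases of the reliably read sequence.

    Because the read has the same sequence as the newly synthesized strand,
    an oligonucleotide copied from its end will anneal to the template and
    prime synthesis into the region that has not yet been read.
    """
    if len(read) < primer_length:
        raise ValueError("read is shorter than the requested primer")
    return read[-primer_length:]


first_read = "TCCATGGACCAGAGA"
print(next_primer(first_read, primer_length=8))  # ACCAGAGA
```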
Non-radioactive automated sequencing: Large scale sequencing is now usually done with non-radioactive primers and with dideoxynucleotide triphosphates that have fluorescent dyes attached to them. Four different colors of fluorescence are used to distinguish the four dideoxynucleotides. Each chain termination event leaves its DNA fragment labeled with the fluorescent dye corresponding to the nucleotide responsible for the chain termination. A single reaction mixture containing all four labeled ddNTPs is used and the resulting mixture of chain-terminated fragments is subjected to polyacrylamide gel electrophoresis. As each fragment reaches the end of the gel, it travels through a fluorescence detector, starting with the smallest fragments, which move through the gel the most rapidly. Each band of fluorescence is read automatically and recorded (figure 9.37). The DNA sequence is also printed out automatically based on the sequence of colors of the fluorescent peaks that are detected.
Open reading frames: As sequence data become available, it is desirable to identify potential genes. This is usually done by using a computer program to search for open reading frames. Because there are 3 stop codons out of a total of 64 possibilities, there is a 3/64 probability that any randomly selected group of three bases will be a stop codon (assuming that all four bases are equally frequent). This means that, on average, a stop codon is encountered about once in every 21 codons (64/3 ≈ 21.3) of random DNA, a stretch much shorter than a typical protein coding sequence. An open reading frame (ORF) in a prokaryotic genome is defined as a stretch of DNA that starts with an ATG (or GTG) initiation codon and contains no stop codons over a distance large enough to code for a protein. An open reading frame is viewed as a potential gene, although the relationship cannot be considered to be proven until transcription and translation of the sequence have been verified. The definition is somewhat more complex in eukaryotic genomic DNA because it must also take into account the presence of introns in virtually all protein coding sequences. However, the principle is the same -- to identify potential protein coding sequences contained within a much larger amount of non-coding DNA. We will return to the techniques that are being used when we examine the human genome project and other large-scale sequencing projects.
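A very simple ORF search for the prokaryotic case can be sketched as follows. This is a minimal illustration, not the program used in any actual genome project: it scans only the three reading frames of the strand it is given (a fuller search would repeat the scan on the reverse complement), and the `min_codons` cutoff of 100 is an arbitrary assumption standing in for "large enough to code for a protein".

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}

# Expected spacing of stop codons in random DNA with equal base frequencies.
EXPECTED_CODONS_PER_STOP = 64 / 3  # about 21.3


def find_orfs(sequence, min_codons=100):
    """Return (start, end) index pairs of ORFs in the three forward frames.

    start is the index of the first base of the initiation codon; end is the
    index just past the stop codon. min_codons counts the initiation codon
    but not the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first initiation codon since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Toy example with a deliberately tiny cutoff.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))  # [(2, 23)]
```

The cutoff matters because short stop-free stretches arise by chance about every 21 codons, so only runs much longer than that are worth treating as candidate genes.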
Overlapping genes: One of the interesting observations that came from sequencing of the genome of bacteriophage øX174 was that this very small single-stranded DNA virus uses overlapping regions of its genome to code for proteins in different reading frames. Although unusual, this phenomenon is not strictly unique to øX174. If you wish to pursue this topic further, see page 335 of last year's textbook, Klug and Cummings, Concepts of Genetics, 5th Edition (Norlin Reserve).