Textbook Assignment: Chapter 15, pages 449-450; Appendix A, pages A6-A7. This topic is not covered as well as it could be in our current textbook. Please study the material on the Sanger dideoxynucleotide chain termination method carefully, including the description of the procedure in Appendix A. The Maxam and Gilbert selective cleavage method is mostly of historical interest and does not need to be studied in comparable detail. Please regard these notes as a supplement to the textbook and study them in detail.
Symbol substitution: The symbol ø is being used in these notes to represent the lower case Greek letter phi.
Major concepts
DNA sequencing methods: Modern methods for determining DNA sequences are based on two specialized types of technology: 1) methods for cleaving DNA strands or selectively terminating their growth at specific nucleotides (A, C, G, or T) so that single-stranded DNA fragment length can be used to identify nucleotide positions in the DNA sequence, and 2) high resolution electrophoresis that is sensitive enough to separate single-stranded DNA fragments that differ in length by one nucleotide. Our textbook rather briefly describes two quite different techniques for determining the nucleotide sequence of DNA, both of which were published in 1977.
Maxam and Gilbert method: The first method to be published was developed by Alan Maxam and Walter Gilbert. It uses specialized chemical treatments to cleave single-stranded DNA selectively at specific nucleotides, generating a series of fragments whose relative lengths can be used to read the nucleotide sequence. Although it was an important breakthrough when it was first published, the Maxam and Gilbert method is more complex and less convenient to use than the dideoxy chain termination method described below, and is now mostly of historical interest. You only need to have a general idea of how this method works.
Sanger dideoxy method: The chain termination procedure developed by Frederick Sanger is now the most widely used. It begins with a template strand of DNA and a precisely defined primer that has a radioactive 32P-phosphate label at its 5' end. Synthesis of a new strand of DNA is initiated from the primer, and a chain termination process (described below) is used to stop synthesis selectively at any one of the four DNA nucleotides. This generates a series of fragments whose relative lengths can then be used to read the nucleotide sequence, as described below. Because the Sanger method is easier to understand and far more widely used, it is the only one we will examine in detail in this course. It is important for you to understand this method well, despite the rather limited discussion of it in our current textbook. The clearest description of the method in the textbook starts at the very bottom of page A6 and continues onto page A7.
5' to 3' growth of newly synthesized strands of DNA: The backbone structure of DNA consists of alternating deoxyribose and phosphate molecules linked together through a series of 3' to 5'-phosphodiester bonds, with one of the four possible DNA bases attached at the 1'-position of each deoxyribose. During DNA synthesis, deoxyribonucleotide triphosphates are guided into position at the 3'-end of the growing strand by base pairing with the template strand, and then form phosphodiester bonds with the free 3'-hydroxyl group by splitting off pyrophosphate residues. The new strand is thus assembled one base at a time, with the last nucleotide added always providing a new 3'-hydroxyl group for addition of the next nucleotide.
Primers: You will recall from our discussion of DNA synthesis that the DNA polymerase enzymes are unable to start the synthesis of new chains without a primer because they must have a free 3'-hydroxyl group to add the next nucleotide to. During DNA replication, this problem is solved by use of a primase enzyme that synthesizes a short RNA primer whose free 3'-hydroxyl group provides a starting point for growth of the new DNA strand. In sequencing reactions, a purified DNA polymerase with no primase activity is used, thus preventing any possible spontaneous initiation of synthesis at unwanted locations. Priming is done with a synthetic oligonucleotide that is complementary to a specific sequence on the template strand.
Vector-specific primers: In many cases, the primer is complementary to a portion of the multiple cloning sequence of the vector just upstream from the cloned DNA insert that is to be sequenced. In such cases, the sequence that is read will begin with the portion of the vector between the end of the primer and the beginning of the insert. The sequence of the cut site for the restriction endonuclease used to clone the insert signals the boundary between vector and insert.
DNA synthesis: For sequencing, four identical reaction mixtures are set up in separate containers, each with the same template and same labeled primer. Synthesis is usually done with the Klenow fragment of DNA polymerase I, which supports normal 5' to 3' synthesis of DNA and retains the 3' to 5' exonuclease activity that is used in DNA proofreading, but lacks the 5' to 3' exonuclease activity that is normally used for removal of RNA primers during DNA synthesis. (The Klenow fragment is described in the glossary, page B11, but is not indexed elsewhere in the book.) Absence of the 5' to 3' exonuclease assures that there will be no degradation of the 5'-ends of the newly synthesized fragments, whose relative lengths must be accurately read to the nearest nucleotide in order to be able to determine the DNA sequence.
Chain termination: Chain termination is achieved in the Sanger method by incorporating a nucleotide analog at the end of the growing DNA strand that lacks a 3'-hydroxyl group for attachment of the next nucleotide. This is done by individually introducing a small amount of the 2',3'-dideoxyribonucleotide-5'-triphosphate for one of the four DNA bases into each of the four DNA synthesis reaction mixtures described above. The dideoxy analogs base pair and are incorporated onto the 3'-ends of the growing chains like normal deoxynucleotides, but they lack the 3'-hydroxyl group needed for addition of the next nucleotide. Thus, whenever one of them is randomly incorporated, growth of the chain is irreversibly terminated at that point. The relative frequency of termination is determined by the ratio of dideoxy nucleotide triphosphates to normal dNTPs. A different dideoxyribonucleotide triphosphate is added to each of the four reaction mixtures. The amount is adjusted relative to the normal deoxyribonucleotide triphosphate so that the probability of chain termination is low enough to generate a wide range of fragment sizes and yet high enough so that all of the fragment sizes can be detected.
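Worked example (dideoxy ratio): The trade-off described above can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not in the textbook: every occurrence of the relevant base is assumed to terminate the chain with the same independent probability p, which stands in for the ratio of dideoxy to normal nucleotide. The function name and the values of p are hypothetical and chosen only for illustration.

```python
def termination_fractions(p, n_sites):
    """Fraction of chains that stop at each of n_sites successive occurrences
    of the base, ASSUMING a fixed, independent per-site termination
    probability p (a simplification used only for illustration)."""
    # A chain stops at the k-th site if it passed k earlier sites and then terminated.
    return [p * (1 - p) ** k for k in range(n_sites)]


# Hypothetical ratios: a low and a high dideoxy concentration.
low = termination_fractions(0.02, 100)
high = termination_fractions(0.5, 100)

print(f"p = 0.02: first site {low[0]:.4f}, 100th site {low[-1]:.4f}")
print(f"p = 0.50: first site {high[0]:.4f}, 100th site {high[-1]:.2e}")
```

With the high value of p, almost no chains survive to reach distant sites, so long fragments would be undetectable. With the low value, chains stopping at the 100th site are still a measurable fraction of the total, but each individual band is fainter.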
Electrophoresis and autoradiography: After the synthesis step has been completed, the next step is to denature the DNA to separate the newly synthesized radioactive fragments from their templates. Each reaction mixture now contains a mixture of labeled single-stranded DNA fragments whose relative lengths reflect every occurrence of the DNA base corresponding to the dideoxynucleotide triphosphate used in that particular reaction mixture. The denaturation step is followed by electrophoresis on elongated sequencing gels. The fragments produced by the four chain-terminating dideoxyribonucleotides are loaded into four side-by-side lanes on the sequencing gel. After electrophoresis is completed, the radioactive fragments are blotted onto nitrocellulose and autoradiography is used to identify the positions occupied by the fragments.
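Worked example (fragments in each lane): The relationship between base positions and fragment lengths can be sketched in a few lines of Python. The function name and the 17-nucleotide primer length are assumptions made only for illustration; the example sequence TGCAATCG is the one drawn in figure A.12.

```python
def lane_fragment_lengths(new_strand, primer_length):
    """Return, for each dideoxynucleotide lane, the lengths of the labeled
    fragments expected when new_strand is synthesized from a primer of
    primer_length nucleotides (hypothetical helper for illustration)."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        # A chain that terminates at this base contains the primer plus
        # every nucleotide added up to and including this position.
        lanes[base].append(primer_length + position)
    return lanes


# Assumed 17-nucleotide primer; sequence taken from figure A.12.
lanes = lane_fragment_lengths("TGCAATCG", primer_length=17)
for base in "ACGT":
    print(f"dd{base}TP lane: {lanes[base]}")
```

Every fragment length from 18 to 25 appears in exactly one lane, which is why the four lanes together allow the complete sequence to be read.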
Reading the sequence: The sequence of the newly synthesized strand (reflected in the family of chain-terminated fragments) is read from the developed autoradiographic film, starting from the shortest fragment, which will consist of the labeled primer with one added dideoxynucleotide. The dideoxynucleotide that stopped the reaction serves to identify the first nucleotide in the sequence. In figure A.12 in Appendix A, ddTTP caused the first stop, showing that the first base in the sequence was T. Growth of the next longer fragment was stopped by ddGTP, showing that the second base was G, etc. Reading up the gel (from shortest to longer fragments) yields the sequence of the newly synthesized strand. This is shown in a drawing for a sequence starting TGCAATCG... in figure A.12 in Appendix A.
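Worked example (reading a gel): Reading up the gel amounts to sorting the bands from all four lanes by fragment length and noting which lane each band is in. The sketch below assumes hypothetical band lengths for the sequence drawn in figure A.12 (a 17-nucleotide primer is assumed, so the shortest fragment is 18 nucleotides long); the function name is also an assumption.

```python
def read_gel(lanes):
    """Read the newly synthesized strand from the shortest fragment to the
    longest. lanes maps each base (A, C, G, T) to the fragment lengths
    observed in the lane for the corresponding dideoxynucleotide."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _length, base in bands)


# Hypothetical band lengths, assuming a 17-nucleotide primer.
example_gel = {"A": [21, 22], "C": [20, 24], "G": [19, 25], "T": [18, 23]}
print(read_gel(example_gel))  # prints TGCAATCG
```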
Textbook error: There is also a dubious photograph of a portion of a sequencing gel in Chapter 15 (Fig. 15.17). The legend claims that it shows the sequence TTAACCCGG..., but the bottom of the photograph was cropped during production of the book, such that the TTAA segment is totally missing. The sequence that is visible reads CCCGGCACGGC.... Unfortunately, this photograph is not of a very good sequencing gel. The bands are fuzzy and too close together, such that it is hard to determine how many times the same letter is repeated in some of them.
Reverse complement: The sequence that is read from the sequencing gel is the reverse complement of the template strand because of complementary base pairing and the antiparallel nature of the double helix. Thus, the first base added to the newly synthesized strand (corresponding to the shortest fragment on the sequencing gel) is complementary to the extreme 3'-position of the template strand, immediately adjacent to the site that the primer hybridizes to. The sequence of the template strand can be read down the gel (from longest to shorter fragments) by substituting the base-pairing complement for each of the bases identified by the dideoxynucleotides that were used to terminate the chain growth. This is referred to as reading the reverse complement of the newly synthesized strand. The newly synthesized strand itself is the reverse complement of the template strand. Genetic coding sequences may be found in either of the strands in a sequencing reaction.
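Worked example (reverse complement): The operation can be written as a short Python function. This is a minimal sketch; the function name is an assumption, and the input is the newly synthesized strand from figure A.12.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    # Reversing corresponds to reading down the gel (longest to shortest);
    # the lookup substitutes the base-pairing partner for each base.
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


new_strand = "TGCAATCG"                # read up the gel in figure A.12
print(reverse_complement(new_strand))  # prints CGATTGCA
```

The result is the template strand written 5' to 3'. Applying the function twice returns the original sequence, which reflects the fact that each strand is the reverse complement of the other.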
Sequential priming: Sequencing is frequently done with a single-stranded clone in a viral vector such as a modified M13 virus that contains a multiple cloning site. The viral DNA has a double-stranded replicative form, which makes cloning into it relatively easy, and a single-stranded form in the virion (virus particle), which is convenient for use as the template in a sequencing reaction. The primer that is normally used is complementary to a portion of the multiple cloning site just upstream from where the cloned insert starts. It is usually possible to read 200-300 nucleotides of sequence from a good sequencing gel. One trick that can be used to read further is to make a synthetic oligonucleotide primer that matches part of the last stretch of sequence that can be read unambiguously from the original gel. Priming from there allows additional sequence to be read, and this process can be repeated until the entire sequence has been identified.
Alternative methods: It is also possible to do sequencing directly from denatured double-stranded DNA, as long as a primer can be found that anneals uniquely to just one of the strands. In practice, a wide variety of alternative methods are used, with primers now commercially available for many of the widely used cloning vectors. In addition, sequencing is often done from both ends of a clone, such that part of the sequence is read directly while the rest is obtained initially as the reverse complement.
Non-radioactive automated sequencing: Large scale sequencing is usually now done with non-radioactive primers and with dideoxynucleotide triphosphates that have fluorescent dyes attached to them. Four different colors of fluorescence are used to distinguish the four dideoxynucleotides. Each chain termination event leaves its DNA fragment labeled with the fluorescent dye corresponding to the nucleotide responsible for the chain termination. Because the color identifies the terminating base, a single reaction mixture containing all four labeled ddNTPs is used and the resulting mixture of chain-terminated fragments is run together in the same lane of a polyacrylamide gel. The gel is then scanned with a fluorescence reader, or the fragments are electrophoresed off the end of the gel and pass through a liquid fluorescence scanner. In either case, the fluorescence detector automatically records the color peaks and converts the sequence of colors into a nucleotide sequence as shown in figure 15.28.
Open reading frames: As sequence data become available, it is desirable to identify potential genes. This is usually done by searching for open reading frames. Because there are 3 stop codons out of a total of 64 possibilities, there is a 3/64 probability that any randomly selected group of three bases will be a stop codon. This means that on the average, a stop codon is encountered about once in every 21 codons of random DNA, which is much shorter than a typical protein. An open reading frame (ORF) is defined as a stretch of DNA that starts with an AUG (or GUG) initiation codon (ATG or GTG in the DNA sequence) and contains no stop codons over a distance large enough to code for a protein. An open reading frame is viewed as a potential gene, although the relationship cannot be considered to be proven until transcription and translation of the sequence have been verified.
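Worked example (searching for open reading frames): The arithmetic and the search can be sketched in Python. This is a simplified illustration, not a description of any particular gene-finding program: it scans only the three reading frames of the strand it is given, accepts only ATG as a start codon, and uses a hypothetical function name, toy sequence, and length threshold. A real search would also scan the reverse complement and use a much larger minimum length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA


def find_orfs(dna, min_codons=100):
    """Return (start, end) index pairs for stretches that begin with ATG and
    run to the first in-frame stop codon, in the three forward frames only.
    min_codons counts the codons before the stop (illustrative threshold)."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this stretch
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return orfs


# Average spacing of stop codons in random DNA: 64 / 3 codons.
print(f"Expected codons per stop codon: {64 / 3:.1f}")

# Toy sequence with a deliberately small threshold so the example is readable.
toy = "CCATGAAACCCGGGTTTTAGCC"
print(find_orfs(toy, min_codons=5))  # prints [(2, 20)]
```

Note that the toy sequence contains a TGA in the first reading frame (positions 3-5) that is ignored because no start codon precedes it in that frame.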
Overlapping genes: One of the interesting observations that came from sequencing of the genome of bacteriophage øX174 was that this very small single-stranded DNA virus uses overlapping regions of its genome to code for proteins in different reading frames. Although unusual, this phenomenon is not strictly unique to øX174 (see page 335 of our textbook).