Revised November 3, 1998; minor additional editing Nov. 4, 1998 (paragraph on blunt-end ligation, addition of notation that terminal transferase adds poly-T and poly-A sequences to 3'-ends of each strand in blunt-ended DNA).

MCDB 2150 Lecture 27

Restriction Endonucleases, Vectors, DNA cloning

Textbook Assignment: Chapter 15, pages 428-437. Some of the material in these notes is not covered well in the textbook. You will be held responsible for the material in the notes.

Major concepts

Recombinant DNA technology: This is the first of a series of lectures on gene cloning and other aspects of recombinant DNA technology. Recombinant DNA in this context refers to the creation of a new combination of DNA segments that are not found together naturally. Such technology is now widely used in many practical applications ranging from basic research on control of gene expression to forensic medicine to biotechnology. This lecture focuses on a group of highly specialized enzymes, the restriction endonucleases, which allow DNA to be cut at specific sites in a manner that permits the rejoining of cut ends to create new combinations. This opens the way for gene cloning by inserting the gene of interest into a self-replicating bacterial plasmid (a small circular DNA molecule with its own origin of replication). Later parts of the lecture discuss the various types of vectors that are available for DNA cloning and some of the techniques used to identify vectors containing cloned DNA sequences.

Restriction endonucleases: An endonuclease is an enzyme that can cleave the phosphodiester bonds of a nucleic acid at an internal site (as opposed to cleavage by an exonuclease, which can only remove nucleotides from one of the ends of a nucleic acid). Some endonucleases cut internal bonds of DNAs or RNAs randomly. However, restriction endonucleases cut both strands of a double stranded DNA only at specific restriction sites. There are many different restriction endonucleases, and each is highly specific for a restriction site, which usually consists of 4, 6, or 8 base pairs, with a few exceptions that will be discussed later.

Restriction sites: Restriction sites are normally palindromes (they have the same DNA base sequence in both directions). The widely used Eco RI enzyme, for example, recognizes the sequence GAATTC (read 5' to 3'). When read 5' to 3', the complementary strand also has the same sequence, GAATTC. Bacteria use restriction endonucleases as defense mechanisms, for example, against viral invasion. Each type of bacterium tends to have its own restriction enzymes and specific recognition sites. Foreign DNA is effectively destroyed by being cut at the recognition sites. Bacteria protect themselves by modifying bases in their own restriction sites by processes such as methylation. The name "restriction" endonuclease was originally given to these enzymes because they "restrict" invasion by foreign DNAs, such as those of bacterial viruses. Strictly speaking, the palindrome-specific restriction endonucleases that we will be dealing with should be called type II restriction endonucleases, but the other types are not very useful for cloning and are generally ignored.
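
The palindrome property described above can be checked mechanically. The following is a minimal Python sketch; the function names are illustrative choices and do not come from the textbook.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(seq: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic_site("GAATTC"))  # the Eco RI site
print(is_palindromic_site("GAATTA"))  # a non-palindromic sequence for contrast
```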

Frequency of cutting: Because of their restriction site specificity, the restriction endonucleases cut DNA into fragments whose average length is determined by the number of base pairs in the restriction site (and to a lesser extent by the ratio of bases in the DNA). For DNA that has equal amounts of all four bases, each base has a probability of 1/4 at any particular position in the DNA sense strand. For a restriction site of 4 base pairs, the probability of random occurrence of that sequence is (1/4)(1/4)(1/4)(1/4) = 1/256. For 6 base pairs, the probability is 1/4,096, and for 8 base pairs it is 1/65,536. Thus, a restriction endonuclease with a 6 base pair restriction site would generate fragments whose average length is 4,096 base pairs. Such fragments are large enough to contain a complete gene (provided that there are no restriction sites within the gene for the enzyme that is used).
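
The arithmetic above can be reproduced with exact fractions. This is a small sketch that assumes equal amounts of all four bases and random sequence; the function name is illustrative.

```python
from fractions import Fraction


def uniform_site_probability(site_length: int) -> Fraction:
    """Probability of a specific site when each base has probability 1/4."""
    return Fraction(1, 4) ** site_length


for n in (4, 6, 8):
    p = uniform_site_probability(n)
    # The expected spacing between sites is the reciprocal of the probability.
    print(f"{n} bp site: probability {p}, average fragment about {1 / p} bp")
```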

Effect of base composition: For DNA whose base composition differs from 50% GC (which is equivalent to equal numbers of all four bases), it is necessary to calculate the probability of a site as the product of the probabilities of each of its components. For example, if a DNA is 66.7% GC (2/3 of its base pairs are GC) and one assumes random orientation of the base pairs, A and T will each have probabilities of 1/6 and G and C will have probabilities of 2/6. Thus the probability of GAATTC would be (1/3)(1/6)(1/6)(1/6)(1/6)(1/3) = 1/11,664, as opposed to 1/4,096 when all four bases are present in equal amounts. Thus, the average fragment length generated by EcoRI would be longer in a DNA with a higher GC content.
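
The same product rule can be written out for any GC content. The sketch below assumes, as the paragraph does, that base pairs are randomly oriented so that G and C are equally likely, as are A and T. The function names are illustrative.

```python
from fractions import Fraction


def base_probabilities(gc_fraction: Fraction) -> dict:
    """Per-base probabilities for a given GC fraction (assumes G = C and A = T)."""
    gc = Fraction(gc_fraction)
    at = 1 - gc
    return {"G": gc / 2, "C": gc / 2, "A": at / 2, "T": at / 2}


def site_probability(site: str, gc_fraction: Fraction) -> Fraction:
    """Multiply the probabilities of the individual bases in the site."""
    probs = base_probabilities(gc_fraction)
    result = Fraction(1)
    for base in site.upper():
        result *= probs[base]
    return result


print(site_probability("GAATTC", Fraction(1, 2)))  # equal amounts of all four bases
print(site_probability("GAATTC", Fraction(2, 3)))  # the 66.7% GC example
```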

Sticky ends: In many (but not all) cases, the DNA strand is not cut at the center of the restriction site. Thus, for example, Eco RI cuts its GAATTC recognition site between the G and the first A on each of the DNA strands (G|AATTC). This staggered cutting pattern leaves an overhanging single-stranded segment of 4 bases (AATT) attached to the new 5' ends created by the cuts on each of the strands of all DNA fragments generated by EcoRI. These short unpaired segments are capable of forming transient double helical structures that can hold cut ends together long enough for DNA ligase to reseal them. If the cut products from two different DNA molecules are mixed, some of the ligation will generate new combinations, for example between an isolated gene with sticky ends and the ends of a circular plasmid that has been opened by a single cut with the same enzyme.
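
The staggered cut can be illustrated with a short simulation. This is a simplified sketch that tracks only the top strand of a linear DNA; the test sequence and function names are made up for illustration.

```python
import re


def digest_top_strand(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut the top strand cut_offset bases into every occurrence of the site."""
    seq = seq.upper()
    # A lookahead pattern finds every occurrence, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    fragments, start = [], 0
    for cut in cuts:
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


def five_prime_overhang(site: str = "GAATTC", cut_offset: int = 1) -> str:
    """Unpaired bases left on each 5' end when both strands are cut
    cut_offset bases from the 5' end of a palindromic site."""
    return site[cut_offset:len(site) - cut_offset]


print(digest_top_strand("CCGAATTCGGTTGAATTCAA"))
print(five_prime_overhang())  # the Eco RI sticky end
```

Note that every internal fragment begins with AATTC and ends with G, which is why any two Eco RI ends can pair with each other.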

Matching sticky ends: Because different restriction enzymes create different types of sticky ends, it is important to use the same enzyme (or two enzymes that cut at identical sites) to cut both of the molecules that are to be joined. Alternatively, cutting with different enzymes can sometimes be used to avoid joining the wrong combinations together.

Isoschizomers: In certain cases, two or more different enzymes recognize identical sites but have different cutting patterns. Enzymes from different sources that recognize the same site and cut it either the same way or differently are called isoschizomers. Sma I and Xma I, below, are an example of isoschizomers that cut the same site in different ways.
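
The difference between two isoschizomers can be seen by classifying the ends they leave. The sketch below uses the standard cut positions for Sma I (CCC|GGG) and Xma I (C|CCGGG), which are not spelled out in this section; the function name is illustrative.

```python
def end_type(site: str, cut_offset: int) -> tuple:
    """Classify the ends left when both strands of a palindromic site are cut
    cut_offset bases from the 5' end of the site on each strand."""
    n = len(site)
    if 2 * cut_offset == n:
        return ("blunt", "")
    if 2 * cut_offset < n:
        return ("5' overhang", site[cut_offset:n - cut_offset])
    return ("3' overhang", site[n - cut_offset:cut_offset])


print("Sma I:", end_type("CCCGGG", 3))
print("Xma I:", end_type("CCCGGG", 1))
```

Both enzymes recognize CCCGGG, but only the Xma I product has sticky ends.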

Rejoining sticky ends: It is possible to have an overhang at the 5'-end or the 3'-end, or to cut straight across the middle of the recognition site, leaving blunt ends. In order to use sticky ends for joining recombinant DNA molecules, it is necessary to have the same type of overhang on both fragments. This is usually done by cutting them with the same enzyme. Note that the bond that is formed can be cut again by the same restriction endonuclease, making it relatively easy to retrieve cloned DNA molecules.

Naming of restriction endonucleases: Restriction endonucleases are named for the species and strain of bacteria they are derived from. The first letter is for the genus, the next two for the species designation, the fourth for the strain, and the Roman numerals that follow designate which enzyme from that strain. In addition, the first three letters, standing for genus and species are normally italicized. Thus, Eco RI is the first restriction endonuclease derived from E. coli strain RY13.
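
The naming convention can be expressed as a simple pattern. This is only a sketch: the regular expression is an assumption that works for common names such as those used in these notes, and it is not an official nomenclature parser.

```python
import re

# genus (1 capital), species (2 lowercase), optional strain, Roman numeral
NAME_PATTERN = re.compile(r"([A-Z])([a-z]{2})\s?([A-Za-z0-9]*?)([IVX]+)")


def parse_enzyme_name(name: str) -> tuple:
    """Split a name such as 'EcoRI' into (genus, species, strain, number)."""
    match = NAME_PATTERN.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"unrecognized enzyme name: {name!r}")
    return match.groups()


print(parse_enzyme_name("EcoRI"))
print(parse_enzyme_name("Eco RI"))  # on-screen form with the extra space
print(parse_enzyme_name("HindII"))
```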

Problems with html portrayal of restriction endonuclease names: Because of problems with the on-screen display, we have inserted a space after the italics so that the italic and non-italic portions of restriction endonuclease names will not overlap. Although this is needed for a clear on-screen presentation, it results in the presence of a gap in the printed version of the notes that is not a correct portrayal of the preferred style, and should not be used when not needed for clarity. The printed format used in the textbook is the correct way to write the names of restriction endonucleases. Thus, Eco RI looks better on screen, but EcoRI is the correct printed form.

List of selected restriction endonucleases: The problem sets that accompany this and subsequent lectures will require you to work with a larger set of restriction endonucleases than those listed in our current textbook. In the following list, the sequence is given for only one strand of the palindrome (the other is its reverse complement and is identical when read 5' to 3'). The cut site is shown by a vertical line (|) placed between the bases that are separated by the cut. Sticky ends will result whenever the cut is not at the exact center of the sequence. Pu means any purine (A or G), Py means any pyrimidine (C or T). (A/T) means A or T (an AT base pair in either orientation). Note that Eco RII is unusual in that its recognition sequence contains an odd number of bases.
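
The Pu, Py, and (A/T) notation can be translated into a pattern for searching sequences. The sketch below uses the standard recognition sequences for Eco RII, CC(A/T)GG, and HindII, GTPyPuAC, which are assumed here rather than taken from the list; the function names are illustrative.

```python
import re

# Each notation token maps to the set of bases it allows.
TOKEN_TO_CLASS = {"Pu": "[AG]", "Py": "[CT]", "(A/T)": "[AT]",
                  "A": "A", "C": "C", "G": "G", "T": "T"}


def site_to_regex(notation: str) -> str:
    """Convert a site written with Pu, Py and (A/T) into a regular expression."""
    tokens = re.findall(r"Pu|Py|\(A/T\)|[ACGT]", notation)
    return "".join(TOKEN_TO_CLASS[token] for token in tokens)


def matches_site(notation: str, seq: str) -> bool:
    """True if seq is one of the sequences covered by the notation."""
    return re.fullmatch(site_to_regex(notation), seq.upper()) is not None


print(site_to_regex("CC(A/T)GG"))          # Eco RII, five bases long
print(matches_site("GTPyPuAC", "GTCGAC"))  # one sequence cut by HindII
print(matches_site("GTPyPuAC", "GTGAAC"))  # G is not a pyrimidine
```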

Blunt-end ligation: As an alternative to restriction endonuclease cloning, which may cut within sequences that one wishes to preserve, it is possible to do blunt-ended cloning of randomly sheared DNA (whose random breaks should sometimes be located outside of the desired sequences). Deoxynucleotide terminal transferase can be used to add short poly-T sequences at the 3'-ends of an opened vector and to add short poly-A sequences to the 3'-ends of the DNA to be cloned (figure 15.3). Repair enzymes and ligase can be used to seal the ends. Alternatively, random blunt end ligation can be done with the phage enzyme T4 DNA ligase. Blunt ends can be generated by shearing the DNA, or in the case of the vector by cutting with blunt end restriction endonucleases, such as HindII.

Vectors: In order to achieve replication of cloned genes, it is necessary to insert them into self-replicating genomes that are referred to as vectors. The most useful vectors possess 3 important properties. First, the vector must contain an origin of replication (ori +), which allows the DNA to replicate itself and the DNA it carries independent of the host DNA. Second, most vectors code for some kind of selectable marker so that the presence of the vector is required for the host organism's survival under selective conditions. Third, to be useful for cloning, vectors must be capable of being opened by restriction endonucleases without any loss of vector DNA. This requires the presence of "unique" restriction sites that are present only once in the entire circular vector DNA. Cutting with a restriction endonuclease that is specific for such a site opens the circle and "linearizes" the vector without any loss of vector DNA. This allows the complete plasmid to be reconstituted simply by rejoining the cut ends. It also allows a foreign DNA sequence to be inserted between the cut ends to create a new larger circular plasmid containing an inserted sequence, such as a cloned coding sequence.
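
Whether a site is "unique" in a circular vector can be checked by counting occurrences around the whole circle, including any site that spans the arbitrary start of the written sequence. This is a minimal sketch with a made-up 18 bp "plasmid"; the function names are illustrative.

```python
def count_sites_circular(plasmid: str, site: str) -> int:
    """Count occurrences of site in a circular sequence (top strand only)."""
    plasmid, site = plasmid.upper(), site.upper()
    # Append the first few bases so that sites spanning the origin are seen.
    extended = plasmid + plasmid[:len(site) - 1]
    return sum(extended[i:i + len(site)] == site for i in range(len(plasmid)))


def linearize(plasmid: str, site: str, cut_offset: int) -> str:
    """Open a circular plasmid at its unique site; no bases are lost."""
    plasmid, site = plasmid.upper(), site.upper()
    if count_sites_circular(plasmid, site) != 1:
        raise ValueError("site is not unique in this plasmid")
    extended = plasmid + plasmid[:len(site) - 1]
    position = extended.find(site)
    cut = (position + cut_offset) % len(plasmid)
    return plasmid[cut:] + plasmid[:cut]


toy_plasmid = "TTCAAACCCGGGTTTGAA"  # the GAATTC site spans the origin
print(count_sites_circular(toy_plasmid, "GAATTC"))
print(linearize(toy_plasmid, "GAATTC", 1))
```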

Multiple cloning sites: Most sophisticated modern vectors contain multiple cloning sites, which consist of short stretches of artificially synthesized DNA containing cut sites for a number of different restriction endonucleases located side by side (see figures 15.7 and 15.13 for examples). This allows selection of a restriction endonuclease for cloning that does not cause internal cuts in the gene being cloned. It also allows different cuts to be made at the two ends of the gene in order to force it into the vector in a particular orientation. In many cases, expression of the gene is driven by a promoter in the vector, and it is necessary to insert the gene so that it is read in a forward direction. One additional advantage of using different cuts at the two ends is that the vector cannot religate without an insert because its two ends no longer have matching sticky ends.
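
The reason a doubly cut vector cannot close on itself can be shown by comparing overhangs. The sketch below assumes 5' overhangs written 5' to 3' and uses the standard Eco RI (AATT) and Bam HI (GATC) sticky ends; the Bam HI overhang is an assumption not stated in these notes, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """Two overhangs of the same polarity can pair if one is the reverse
    complement of the other."""
    return overhang_a.upper() == reverse_complement(overhang_b)


ECO_RI_END = "AATT"
BAM_HI_END = "GATC"

# A vector cut with both enzymes has one end of each kind.
print(ends_compatible(ECO_RI_END, BAM_HI_END))  # vector closing on itself
print(ends_compatible(ECO_RI_END, ECO_RI_END))  # vector end with matching insert end
print(ends_compatible(BAM_HI_END, BAM_HI_END))  # other vector end with other insert end
```

Because each insert end matches only one vector end, the insert can enter in only one orientation.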

Plasmids: Bacterial plasmids are small circular DNAs that have their own origins of replication and are capable of autonomous replication within bacterial cells. Plasmids that carry appropriate genes are capable of making bacteria resistant to antibiotics, which makes it possible to select for bacteria that have taken up specific plasmids. Because of their small size, plasmids often have only a single cut site for a particular restriction endonuclease, which allows the circles to be opened for adding a foreign DNA without risk of losing parts of the plasmid. They can also be modified by the addition of multiple cloning sites. Another trick that is sometimes used is to eliminate naturally-occurring cut sites by modifying the third positions of codons in ways that eliminate restriction endonuclease cut sites without altering the amino acid coding of genes carried in the vector.

The pBR322 plasmid (textbook figure 15.6) carries genes conferring resistance to ampicillin and tetracycline. If a cut is made with Pst I in the middle of the ampicillin resistance gene, a foreign gene that has been cut out of its genome with Pst I can be inserted into the plasmid because it has the same sticky ends as the opened plasmid. This makes a larger circle, but if the insert is not too large, the plasmid is still capable of replication, thus providing a means for replication of the cloned gene.

Screening: To screen for recombinant pBR322 plasmids, rejoined plasmids with a cloned DNA in the middle of the ampicillin resistance gene are initially introduced into bacteria by transformation, and the bacteria are plated on nutrient agar containing tetracycline. Only those bacteria that have taken up plasmids with intact tetracycline resistance genes can form colonies. Replica colonies are transferred to ampicillin plates. Bacteria containing religated plasmids without inserts will multiply, but those whose plasmids have inserts disrupting the ampicillin resistance gene will not. Comparing the growth patterns allows one to pick colonies from the tetracycline plates that lack ampicillin resistance and thus contain plasmids carrying cloned foreign DNA. Alternatively, this scheme can be reversed by cloning into the middle of the tetracycline resistance gene with Bam HI and then selecting for bacteria that are resistant to ampicillin and sensitive to tetracycline, as shown in textbook figure 15.15.
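
The logic of the replica-plating comparison for inserts in the ampicillin resistance gene can be summarized as a small decision function. This is only a sketch of the reasoning; the function name and the wording of the results are illustrative.

```python
def classify_colony(grows_on_tetracycline: bool, grows_on_ampicillin: bool) -> str:
    """Interpret growth on the two plates for a Pst I insert in the amp gene."""
    if not grows_on_tetracycline:
        # No intact tetracycline resistance gene, so no colony on the first plate.
        return "no plasmid taken up"
    if grows_on_ampicillin:
        # Both resistance genes are intact, so nothing was inserted.
        return "religated plasmid without insert"
    # Tetracycline resistant but ampicillin sensitive: the amp gene is disrupted.
    return "recombinant plasmid with insert"


for tet, amp in [(True, True), (True, False), (False, False)]:
    print(tet, amp, "->", classify_colony(tet, amp))
```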

Blue-white screening is a more sophisticated screen that does not require replica plating and is now more commonly used. This system takes advantage of the fact that a functional beta-galactosidase enzyme can be generated from separate N-terminal and C-terminal portions of the enzyme protein (alpha complementation). A bacterial strain that synthesizes only the C-terminal part of the enzyme is used. The plasmid contains an antibiotic-resistance gene plus an engineered gene that has an AUG start codon followed by a complex multiple cloning site (described above), followed in frame by the coding sequence for the N-terminal part of beta-galactosidase. Together the bacteria and the vector generate functional enzyme, which is capable of hydrolyzing a complex synthetic galactoside called X-gal to generate a blue color. A cloned insert disrupts production of the N-terminal part of the enzyme, causing colonies to stay white in the presence of X-gal. Colonies that are resistant to antibiotic and remain white with X-gal are likely to contain a cloned gene insert in their plasmids (see figures 15.7 and 15.8).

Selection: Selection for the presence of a vector is usually performed by adding a specific antibiotic to the media on which the host grows. Selection differs from screening in that screening relies on the investigator to choose which colony contains the desired vector (e.g., a white colony vs. a blue one in blue-white screening). In selection, cells that lack the vector (which carries a gene for resistance to an antibiotic) will die, so the only colonies present must contain the vector.

Lambda phage vectors: The life cycle of bacteriophage lambda is described in detail in chapter 18 (pages 534-536). Because this complex system is studied in detail in MCDB 3500, it is only briefly covered in this course. In brief summary, lambda phage can either function as a normal bacterial virus that lyses its host cell and releases a burst of new phage particles (lytic phase) or else it can integrate into the DNA of the host cell and be carried for long periods without doing cellular damage (lysogenic phase), with occasional release due to conditions such as UV light damage.

The genetic map of lambda is circular, but the DNA is linearized by a single sticky-ended cut before packing into the mature phage head. In the linear form, the lambda phage genome has head and tail protein genes at the left, genes related to DNA synthesis and lytic cycle on the right, and genes related to the lysogenic cycle in the middle. When used as a vector, the genes related to lysogeny are removed and replaced with an insert of 12-20 kb (kilobase pairs) of DNA. The two arms and the insert are simply ligated together and packed into phage heads. Packaging does not work if there is not enough DNA. Thus the insert must be at least 12 kb. If it is over 20 kb, not all of the DNA will fit into the head. Thus, lambda phage vectors are particularly useful for relatively large DNA inserts in the 12-20 kb range. They are often used for genomic libraries (collections of phage containing all of a genome in 12-20 kb fragments). Lambda phage vectors of this sort are sometimes called Charon vectors.
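
The size constraint can be stated as a simple filter on candidate fragments. The sketch below uses the 12-20 kb limits given above; the fragment sizes are made-up example values and the function name is illustrative.

```python
MIN_INSERT_KB = 12  # below this there is not enough DNA for packaging
MAX_INSERT_KB = 20  # above this the DNA does not fit into the phage head


def fits_lambda_vector(insert_kb: float) -> bool:
    """True if an insert of this size can be packaged in a lambda vector."""
    return MIN_INSERT_KB <= insert_kb <= MAX_INSERT_KB


example_fragments_kb = [4.1, 12.0, 15.5, 20.0, 23.7]
print([size for size in example_fragments_kb if fits_lambda_vector(size)])
```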

M13 Vectors: Another bacteriophage vector, M13, has a single-stranded DNA genome. After M13 infection, a double-stranded replicative form (RF) of the DNA is made. The complementary strand that was synthesized to form the double stranded RF DNA serves as a template for synthesis of new single-stranded viral DNA, which is then extruded from the cell (see Figure 15.11). The single-stranded DNA is useful in sequencing analysis of foreign DNA cloned into M13. In addition, M13 clones can easily be subjected to site-directed mutagenesis, as described below.

Site-directed mutagenesis: One of the more interesting examples of how cloned genes can be manipulated is site-directed mutagenesis, which our textbook chose to describe on pages 414-415, before gene cloning was discussed. This technique permits specific mutations to be introduced into a cloned gene maintained in single-stranded form in bacteriophage M13. A synthetic oligonucleotide is prepared that is complementary to the region that is to be mutated and contains the desired mutation. This is used as a primer to make a complete complementary strand, similar to the complementary strand that is made naturally during replication of the phage, but containing the desired mutation.

The double stranded form is then introduced into E. coli, where the altered complementary strand serves as a template for production of mutated single-stranded phage. If desired, the mutant gene can be made double-stranded, cut out of the M13 phage vector, and put into any other convenient double-stranded DNA vector. The net result of this procedure is to permit selective change of one base pair in the coding sequence and thus, one amino acid in the coded protein (see figure 14.19). This is an extremely powerful tool for detailed analysis of the roles of individual amino acids in the overall function of a protein.
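
The design of the mutagenic oligonucleotide can be sketched in a few lines. The example below is illustrative only: the template sequence, the position, and the flank length are made-up values, and real primer design also considers factors such as melting temperature that are ignored here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mutagenic_primer(template: str, position: int, new_base: str, flank: int = 8) -> str:
    """Primer complementary to the single-stranded template around position
    (0-based), except that it pairs with new_base instead of the original base."""
    template = template.upper()
    mutated = template[:position] + new_base.upper() + template[position + 1:]
    start = max(0, position - flank)
    end = min(len(template), position + flank + 1)
    return reverse_complement(mutated[start:end])


template = "ATGGCTAGCAAAGGTGAAGAACTG"  # hypothetical single-stranded insert
primer = mutagenic_primer(template, position=10, new_base="C", flank=5)
wild_type = reverse_complement(template[5:16])
mismatches = sum(a != b for a, b in zip(primer, wild_type))
print(primer, "mismatches with wild type:", mismatches)
```

The primer differs from a perfectly complementary oligonucleotide at a single position, which is the mutation that ends up in the new complementary strand.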

Cosmids: Linearized lambda-phage genomes contain cos sites at their ends (cohesive ends). It is possible to create an artificial plasmid consisting of two cos sites plus a plasmid origin of replication and up to 50 kb of foreign DNA. This will replicate as a plasmid and then can be packaged into lambda-phage heads for infection into bacteria. These highly engineered vectors are a sort of cross between a plasmid and a lambda phage and are capable of carrying 40-50 kb of foreign genes with only a little genetic material of their own.

Shuttle Vectors: Shuttle vectors contain origins of replication and selectable markers for more than one species, allowing the vector to be moved from one to another, such as from E. coli to yeast.