Revised November 5, 1998

MCDB 2150 Lecture 28

cDNA, Libraries, Probes, Restriction maps, Southern and Northern Blots.

Textbook assignment: Chapter 15, pages 437-449. Note that this includes pages 442-449, which were originally listed in the syllabus as part of Lecture 29. Some of the topics in these notes are not covered well in the textbook and must therefore be studied from the notes or other sources. Please also be aware that there are major errors in this particular section of the textbook (listed on our errors in textbook web page).

Major concepts.

Cloning in prokaryotic hosts: After an appropriate recombinant DNA vector has been assembled, it is introduced into a bacterial cell (transformation), where it replicates along with its host. Because typical procedures generate large numbers of viable vectors, only some of which carry the desired sequence, selection and screening procedures are used, as described in the previous lecture notes. An example is selection for ampicillin resistance (the vector makes the host cells resistant to the antibiotic) combined with screening for tetracycline sensitivity (a sequence has been cloned into the middle of the tetracycline resistance gene, inactivating it), as described in Figure 15.15 and the accompanying text.
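The decision logic of the ampicillin/tetracycline scheme can be written out as a small table. The sketch below is hypothetical: it assumes each colony's growth on an ampicillin plate and on a tetracycline replica plate has already been scored as yes/no, and the function names are invented for illustration.

```python
def classify_colony(grows_on_amp: bool, grows_on_tet: bool) -> str:
    """Classify one colony from its growth on amp and tet plates (hypothetical helper)."""
    if not grows_on_amp:
        # No functional ampicillin resistance gene, so the cell lacks the vector.
        return "no vector"
    if grows_on_tet:
        # Tetracycline resistance gene still intact: vector without an insert.
        return "vector without insert"
    # Amp-resistant but tet-sensitive: the insert has disrupted the tet gene.
    return "recombinant"


def pick_recombinants(replica_results: dict[str, tuple[bool, bool]]) -> list[str]:
    """Return ids of colonies that are amp-resistant and tet-sensitive."""
    return sorted(
        colony
        for colony, (amp, tet) in replica_results.items()
        if classify_colony(amp, tet) == "recombinant"
    )


results = {"c1": (True, True), "c2": (True, False), "c3": (False, False)}  # hypothetical data
print(pick_recombinants(results))  # ['c2']
```

Only the amp-resistant, tet-sensitive colony is kept; the recombinant is found by the absence of growth on tetracycline, which is why a replica plate is needed to recover it.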

Cloning in eukaryotic hosts: The principles of recombinant DNA cloning in eukaryotic cells, such as yeast, are rather similar to those for prokaryotic cells, although many details differ. In some cases, naturally occurring yeast plasmids are employed. However, the yeast artificial chromosome (YAC) is more popular because it allows very large pieces of DNA, as large as one megabase (1 million base pairs), to be cloned. The key elements that a yeast artificial chromosome must possess include an origin of replication (ARS), a centromere (CEN), telomeres (TEL) at both ends, and appropriate restriction sites for cloning, as depicted in Figure 15.16. Much of the work on the human genome project uses YACs so that large blocks of contiguous genes can be handled together. For recombinant DNA cloning in cultured mammalian, insect, or plant cells, the vectors are often derived from naturally occurring viruses that have been modified so that they infect their host cells and multiply without killing them.

Libraries: As an alternative to selective cloning of specific DNA sequences, it is possible to construct a library of cloned sequences that is complex enough to be statistically expected to include all of the sequences in the starting DNA. This can be the entire genome of an organism, all of the genes carried on one particular chromosome, or all of the mRNA sequences in a particular tissue after they have been converted to cDNA as described below. Libraries are typically constructed in vectors that accept only a limited range of insert sizes. To avoid cutting some genes into fragments that are too small to be cloned in such vectors, the digestion is often stopped before it has gone to completion. This generates cloned sequences that still contain some cut sites for the enzyme used in the cloning procedure, and usually results in overlapping clones, which are very useful for locating adjacent sequences. Alternatively, it is sometimes possible to use enzymes with less frequent cut sites if the sizes of their digestion fragments match the range that can be cloned in the vector being used.
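The phrase "statistically expected to include all of the sequences" can be made quantitative. A commonly used estimate, due to Clarke and Carbon, is N = ln(1 - P) / ln(1 - f), where N is the number of independent clones needed, P is the desired probability that any given sequence is represented, and f is the fraction of the starting DNA carried by one clone. The sketch below assumes equal-sized inserts sampled uniformly at random, which real partial digests only approximate; the genome and insert sizes in the example are hypothetical.

```python
import math


def clones_needed(probability: float, insert_size: int, genome_size: int) -> int:
    """Clarke-Carbon estimate of the number of random clones needed so that a
    given sequence is present in the library with the stated probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    f = insert_size / genome_size  # fraction of the genome carried by one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


# Hypothetical example: a 4.6 Mb genome cloned as 20 kb inserts, 99% coverage.
print(clones_needed(0.99, 20_000, 4_600_000))  # 1057
```

Because ln(1 - P) grows slowly as P approaches 1, raising the target probability from 99% to 99.9% increases N by only about half again, whereas larger inserts (as in YACs) reduce N in proportion to insert size.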

cDNA cloning: An alternative cloning procedure called complementary DNA (cDNA) cloning makes it possible to use mRNA as a starting point for cloning a coding sequence. The first step is to make a DNA copy of the RNA. The first strand of DNA is copied from the RNA template by a viral enzyme known as reverse transcriptase, starting from a primer such as oligo(dT), which anneals to the poly(A) tail of eukaryotic mRNA. The RNA template is then digested away with ribonuclease H (RNase H), which specifically digests the RNA strand of an RNA:DNA hybrid. Alternatively, the RNA can be removed by selective alkaline hydrolysis with NaOH, which leaves the more resistant DNA intact. The 3' end of the single-stranded DNA tends to fold back on itself and find enough complementary base pairing to form a hairpin loop, which serves as a primer for second-strand synthesis. The Klenow fragment of DNA polymerase I is then used to synthesize the second DNA strand, and the loop is cut with S1 nuclease.
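The base-pairing bookkeeping of the two synthesis steps can be sketched as follows. This is a simplification under stated assumptions: it ignores the primer, the hairpin, and S1 nuclease trimming, and simply shows that the first strand is the antiparallel complement of the mRNA while the second strand reproduces the mRNA sequence in DNA form. The message sequence is hypothetical.

```python
# Pairing partners when copying an RNA template into DNA (U in RNA pairs with A).
RNA_TEMPLATE_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Pairing partners when copying a DNA template into DNA.
DNA_TEMPLATE_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand(mrna: str) -> str:
    """Reverse transcriptase product, written 5'->3' (antiparallel to the mRNA)."""
    return "".join(RNA_TEMPLATE_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand(first: str) -> str:
    """DNA polymerase product copied from the first strand, written 5'->3'."""
    return "".join(DNA_TEMPLATE_TO_DNA[base] for base in reversed(first.upper()))


mrna = "AUGGCCAAGUAA"  # hypothetical message
cdna_1 = first_strand(mrna)
cdna_2 = second_strand(cdna_1)
print(cdna_1)  # TTACTTGGCCAT
print(cdna_2)  # ATGGCCAAGTAA (the mRNA sequence with T in place of U)
```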

The double-stranded cDNA is then ligated into a vector, either by blunt-end ligation or after sticky ends have been added artificially. cDNA cloning is often done in an expression vector that contains a promoter allowing synthesis of mRNA and expression of the encoded protein in the host cell, as discussed later in this lecture.

Hybridization probes: Complementary strands of DNA, RNA, or DNA plus RNA hybridize readily to form double-stranded helical structures when placed under suitable annealing conditions. This property is used extensively in molecular genetics to identify specific nucleic acid sequences. A probe consisting of radioactively labeled DNA (or RNA) is hybridized to denatured DNA (or naturally single-stranded RNA) immobilized on a support, such as a nitrocellulose membrane. Hybridization is normally done at a temperature about 25°C below the melting temperature of the DNA. Probe molecules that do not hybridize, because no complementary immobilized strand is present, are washed off. The sites that contain sequences capable of hybridization are now radioactive and can be detected by autoradiography or by direct counting with scanning counters. When combined with electrophoretic separation of DNA (or RNA) fragments by size, hybridization probes play major roles in many molecular biology procedures, including Northern and Southern blotting (discussed below), DNA fingerprinting, and DNA sequencing.
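The sequence logic of hybridization can be sketched in a few lines: a probe pairs antiparallel with a target strand wherever that strand carries the probe's reverse complement. In this sketch, stringency is crudely modeled as a maximum number of mismatched bases (the `max_mismatches` parameter is an assumption of the sketch); real stringency depends on temperature, salt, GC content, and where the mismatches fall. All sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridization_sites(probe: str, target: str, max_mismatches: int = 0):
    """(position, mismatches) for every window of the target strand that can pair
    antiparallel with the probe, allowing up to max_mismatches mismatched bases."""
    site = reverse_complement(probe)  # sequence the target strand must carry
    hits = []
    for i in range(len(target) - len(site) + 1):
        window = target[i:i + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


probe = "ACGTTG"              # hypothetical labeled probe
perfect = "GGGCAACGTGGG"      # carries CAACGT, the exact complement site
mismatched = "GGGCATCGTGGG"   # same site with one mismatched base
print(hybridization_sites(probe, perfect))                        # [(3, 0)]
print(hybridization_sites(probe, mismatched))                     # []
print(hybridization_sites(probe, mismatched, max_mismatches=1))   # [(3, 1)]
```

Raising `max_mismatches` plays the role of lowering stringency: the mismatched target is missed at high stringency but detected once one mismatch is tolerated.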

Reverse genetics with degenerate oligonucleotide probes: In certain cases, it is possible to generate a hybridization probe for a coding sequence based on the amino acid sequence of the encoded protein. This depends on finding a stretch of at least six amino acids whose codons have relatively low redundancy. The procedure is to synthesize artificially a mixture of all of the possible nucleotide sequences that could code for the amino acid sequence in question, and to use that mixture as a hybridization probe, as illustrated in figure 15.21. Please note that there is a serious error in figure 15.21. As explained on our errors in textbook page, a total of 32 different nucleotide sequences are capable of coding for the peptide segment used in the figure, rather than 8 as stated. One member of the mixture is expected to match the exact coding sequence and thus to hybridize stringently with the message sequence. In addition, members with a single-base mismatch may also hybridize well enough to be detected if the stringency of the hybridization reaction is reduced somewhat, by adjusting the temperature and/or salt concentration so that slightly mismatched probes are not washed off their immobilized target sequences.
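Counting and listing the members of a degenerate probe mixture is a direct application of the genetic code: the number of sequences is the product of the number of codons for each amino acid. The sketch below uses the standard genetic code and a hypothetical peptide (not the one in figure 15.21). Because the probe must pair with the mRNA, it returns the antisense (reverse-complement) oligonucleotides.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Reverse lookup: one-letter amino acid code -> all codons for it.
CODONS_FOR: dict[str, list[str]] = {}
for codon, aa in zip(CODONS, AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append(codon)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for the peptide."""
    return prod(len(CODONS_FOR[aa]) for aa in peptide)


def probe_mixture(peptide: str) -> list[str]:
    """All antisense oligonucleotides (5'->3') complementary to a possible coding sequence."""
    coding = ("".join(parts) for parts in product(*(CODONS_FOR[aa] for aa in peptide)))
    return sorted(seq.translate(COMPLEMENT)[::-1] for seq in coding)


peptide = "MWKHYD"                  # hypothetical: Met-Trp-Lys-His-Tyr-Asp
print(degeneracy(peptide))          # 16
print(len(probe_mixture(peptide)))  # 16
print(degeneracy("MLS"))            # 36 (Leu and Ser each have six codons)
```

The last line shows why stretches rich in Met and Trp (one codon each) are preferred and stretches containing Leu, Ser, or Arg (six codons each) are avoided.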

Screening a library: Replicas of bacterial colonies containing cloned vectors can be transferred to a nitrocellulose membrane, after which the cells are lysed and their DNA is fixed onto the membrane. Hybridization with a radioactive probe, followed by washing off of unhybridized probe, then reveals which colonies contain the desired cloned DNA. The radioactivity is detected by laying an X-ray film over the membrane and developing it after an appropriate exposure period. From the positions of the radioactive spots, it is possible to go back to the original plate and pick the colony (or colonies) containing plasmids with the desired cloned DNA inserts. Bacteriophage from plaques can likewise be transferred onto nitrocellulose membranes, and a similar hybridization process identifies the plaques that contain specific cloned DNA sequences.
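The step of going "back to the original plate" is a coordinate lookup. The sketch below assumes the film, membrane, and master plate have already been aligned (for example with orientation marks) so that all positions are in one coordinate system in millimeters; the colony ids, coordinates, and the `tolerance` value are hypothetical.

```python
import math


def pick_positive_colonies(spots, colonies, tolerance=2.0):
    """Map each film spot (x, y) to the nearest master-plate colony within tolerance.

    spots: list of (x, y) positions of radioactive spots on the film.
    colonies: dict of colony id -> (x, y) position on the master plate.
    Spots with no colony within the tolerance are skipped.
    """
    picks = []
    for sx, sy in spots:
        best_id, best_distance = None, tolerance
        for colony_id, (cx, cy) in colonies.items():
            distance = math.hypot(sx - cx, sy - cy)
            if distance <= best_distance:
                best_id, best_distance = colony_id, distance
        if best_id is not None:
            picks.append(best_id)
    return picks


colonies = {"c1": (10.0, 10.0), "c2": (30.0, 12.0), "c3": (50.0, 40.0)}  # hypothetical
spots = [(29.5, 12.4), (80.0, 80.0)]
print(pick_positive_colonies(spots, colonies))  # ['c2']
```

A spot lying between two closely spaced colonies is ambiguous in practice; such colonies are usually re-streaked and screened again.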

Electrophoresis: When suspended in a suitably porous support matrix, such as a polyacrylamide or agarose gel, nucleic acids, which are negatively charged, migrate in an electric field toward the positive electrode. In the absence of confounding effects such as double-stranded vs. single-stranded, linear vs. circular, and relaxed vs. supercoiled forms, the rate of migration decreases with fragment size: over a useful range, the distance migrated is approximately a linear, decreasing function of the logarithm of fragment length. Small pieces move rapidly and larger pieces more slowly (see pages 291-292 and Lecture 21 notes).
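Sizing an unknown band from a lane of size standards is commonly done by fitting a straight line to log10(size) versus distance migrated and interpolating. The sketch below assumes that relationship holds over the range of the markers (it breaks down for very large or very small fragments), and the marker positions are hypothetical.

```python
import math


def fit_standard_curve(standards):
    """Least-squares line log10(size) = slope * distance + intercept.

    standards: list of (distance_migrated, size_in_bp) for the marker lane.
    """
    xs = [distance for distance, _ in standards]
    ys = [math.log10(size) for _, size in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance, slope, intercept):
    """Interpolated size (bp) of a band at the given migration distance."""
    return 10 ** (slope * distance + intercept)


markers = [(10.0, 10_000), (30.0, 1_000), (50.0, 100)]  # hypothetical (mm, bp)
slope, intercept = fit_standard_curve(markers)
print(round(estimate_size(20.0, slope, intercept)))  # 3162
```

Interpolation between markers is reasonably reliable; extrapolating beyond the largest or smallest marker is not.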

Southern blotting: The Southern blot (named for its inventor, Edwin Southern) uses gel electrophoresis together with hybridization probes to characterize restriction fragments of genomic DNA (or DNA from other sources, such as plasmids). A probe is prepared that will hybridize with a particular sequence, which might be the cDNA coding for one protein or a repetitive sequence that occurs more than once in a genome. The DNA to be analyzed is digested to completion with a restriction endonuclease (and sometimes sequentially with two or more restriction enzymes). It is applied to an appropriate gel, and electrophoresis is performed in a manner that will maximally separate restriction fragments in the expected size range. A set of standards of known size is run in one lane of the gel. The DNA in the gel is denatured into single strands, and the fragments are then "blotted" onto a nitrocellulose membrane and hybridized with the probe.
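Complete digestion with one or more enzymes can be simulated on a known sequence to predict the fragment sizes a blot should show. This sketch handles only linear DNA and counts fragment lengths along the top strand (ignoring sticky-end overhangs); it uses the real recognition sequences of EcoRI (G^AATTC) and BamHI (G^GATCC), which are palindromic, so searching one strand finds every site. The 26 bp test sequence is hypothetical.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions (0-based) for every occurrence of a recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, enzymes: list[tuple[str, int]]) -> list[int]:
    """Fragment lengths, in order along the sequence, after complete digestion."""
    cuts = sorted({pos for site, offset in enzymes for pos in cut_positions(seq, site, offset)})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ECORI = ("GAATTC", 1)  # cuts G^AATTC
BAMHI = ("GGATCC", 1)  # cuts G^GATCC

dna = "AAAAGAATTCAAAAAAGGATCCAAAA"  # hypothetical 26 bp sequence
print(digest(dna, [ECORI]))         # [5, 21]
print(digest(dna, [ECORI, BAMHI]))  # [5, 12, 9]
```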

Matching the positions of the radioactive bands with those of the size standards identifies the sizes of the digestion fragments that hybridize with the probe. For example, a cDNA probe for a gene that contains two internal cut sites for the restriction enzyme will detect three fragments (which will usually differ enough in size that all three can be seen). More complex patterns generated by repetitive sequences form the basis for DNA fingerprinting, which will be discussed in a future lecture. Note that the entire length of the probe does not need to hybridize with the entire length of the DNA fragment. A relatively short complementary sequence (less than 100 bp) is usually enough to obtain a strong hybridization signal. In addition, modifying the annealing conditions can alter the stringency of hybridization (the precision of base-pair matching needed for hybridization). By using reduced stringency, it is often possible to obtain hybridization between the coding sequences for the same protein from different species.
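Which bands light up can be predicted from a restriction map: every fragment that overlaps the region covered by the probe is detected. The sketch below treats the probe region as one continuous interval, an assumption that ignores introns (a fragment containing only intron sequence would not hybridize to a cDNA probe). The cut sites and gene coordinates are hypothetical.

```python
def detected_fragments(cuts, total_length, probe_start, probe_end):
    """Fragments (start, end, length) of a linear DNA that overlap the probe
    region [probe_start, probe_end), i.e. the bands expected on the blot."""
    bounds = [0] + sorted(cuts) + [total_length]
    return [
        (start, end, end - start)
        for start, end in zip(bounds, bounds[1:])
        if start < probe_end and probe_start < end
    ]


# Hypothetical 20 kb region with cut sites at 3.0, 8.2, 9.1 and 15.0 kb;
# the gene (probe region) spans 7.5-9.8 kb, so two cut sites fall inside it.
bands = detected_fragments([3000, 8200, 9100, 15000], 20000, 7500, 9800)
print(bands)  # [(3000, 8200, 5200), (8200, 9100, 900), (9100, 15000, 5900)]
```

Two internal cut sites give three detected bands, and the two outer bands extend into flanking DNA beyond the gene, so their sizes depend on sites outside the probe region.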

Restriction mapping: By cutting an initially rather large fragment of DNA with a variety of restriction endonucleases, both singly and in mixtures, followed by Southern blotting or direct electrophoretic size analysis, it is possible to obtain a nested set of fragments defined by the positions of the cut sites for the various enzymes within the original larger fragment. By analyzing the relative sizes and patterns of overlap of such fragments, it is often possible to arrange them in a definite order, so that one can tell which fragments lie next to each other in the intact larger fragment or in intact chromosomal DNA (see Fig. 15.25 for details and our errors in textbook web page for a correction of the discussion of figure 15.25).
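The reasoning used for single and double digests can be sketched as a brute-force search: try every ordering of the fragments from enzyme A and from enzyme B, and keep the orderings whose combined cut sites reproduce the double-digest sizes. This sketch assumes a linear fragment and exact integer sizes (real gel measurements carry error), and it becomes impractical as the number of fragments grows. The digest sizes in the example are hypothetical.

```python
from itertools import permutations


def cut_sites(order):
    """Cut positions when fragments of the given lengths lie end to end."""
    sites, position = [], 0
    for length in order[:-1]:
        position += length
        sites.append(position)
    return sites


def fragment_sizes(sites, total_length):
    """Sorted fragment lengths of a linear molecule cut at the given sites."""
    bounds = [0] + sorted(set(sites)) + [total_length]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def consistent_maps(a_sizes, b_sizes, ab_sizes):
    """All (A order, B order) pairs whose combined cuts reproduce the double digest."""
    total = sum(a_sizes)
    if sum(b_sizes) != total or sum(ab_sizes) != total:
        raise ValueError("digests must add up to the same total length")
    target = sorted(ab_sizes)
    maps = []
    for a_order in sorted(set(permutations(a_sizes))):
        for b_order in sorted(set(permutations(b_sizes))):
            if fragment_sizes(cut_sites(a_order) + cut_sites(b_order), total) == target:
                maps.append((a_order, b_order))
    return maps


# Hypothetical 10 kb fragment: A gives 3 + 7 kb, B gives 5 + 5 kb, A+B gives 2 + 3 + 5 kb.
for a_order, b_order in consistent_maps([3, 7], [5, 5], [2, 3, 5]):
    print("A:", a_order, "B:", b_order)
# A: (3, 7) B: (5, 5)
# A: (7, 3) B: (5, 5)
```

The two solutions are mirror images of each other: fragment sizes alone cannot tell which end of a linear fragment is which, so one end must be fixed by other information (for example, a probe for a known end).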

Northern blotting: In a Northern blot (named by analogy with the Southern blot, because it is applied to RNA rather than DNA), RNA molecules of varying lengths (often naturally occurring mRNAs) are separated by size and blotted onto nitrocellulose. A DNA probe (often a cDNA) is then used to identify bands that contain particular sequences. Northern blots are particularly useful for determining the conditions under which specific genes are expressed, including which tissues in a complex organism express which of its genes at the mRNA level.

Dot blotting: In cases where the goal is simply to determine whether a particular gene has been cloned, size separation can be bypassed altogether and a bit of DNA from each putative clone can be transferred to a nitrocellulose membrane as a "dot", followed by hybridization to a probe and autoradiography. Only those dots that contain the desired sequence will hybridize and become radioactive. This procedure is similar to the colony and plaque hybridization techniques discussed earlier in this lecture.

Western blotting: In a Western blot, proteins are separated by electrophoresis and blotted onto an appropriate support matrix. The matrix is then exposed to an antibody to the desired protein and all unbound antibody is washed off. The bands (or spots in a dot blot) where the antibody has bound are then detected by various means, such as binding of a second antibody that is radioactively labeled and specific for the first antibody.

Non-radioactive techniques: Although we do not have time to discuss them in detail, a variety of alternative techniques are beginning to replace the use of radioactive labels. These include fluorescent labels and a variety of reactions that produce colored end products.

Expression vectors: (This topic is not well covered in the textbook.) It is often desirable to obtain expression of the protein encoded by a cloned gene or cDNA. Expression vectors are useful for producing the encoded protein in various types of host organisms, including commercial production of proteins that are difficult to obtain in adequate quantities from natural sources. They also allow the proteins they produce to be used to identify the cloned genes that code for those proteins, and they can be used to produce fusion proteins that are useful in the isolation of previously unknown protein products.

Expression vectors of many different types have been designed for use in various types of host cells. Typically they contain either strong constitutive promoters or promoter constructs that are capable of regulated expression. They also usually contain a ribosome-binding site, such as the bacterial Shine-Dalgarno sequence, to ensure vigorous translation of the transcripts produced from them. In many cases, they also contain an ATG start codon, followed by the codons for a few amino acids from a host protein. In such cases, the cloned gene must be inserted in the correct orientation and in the same reading frame as that start codon. The resulting fusion protein can usually be detected with antibodies, either by Western blotting or by an antibody dot blot.
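The in-frame requirement is simple arithmetic: translation starts at the vector's ATG and reads in triplets, so the insert's own reading frame is preserved only if the number of bases between the A of the ATG and the insert's first codon is a multiple of three. The sketch below checks this and translates the fused sequence with the standard genetic code; the leader and insert sequences are hypothetical, and orientation (a reversed insert would be read as its reverse complement) is not modeled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = GENETIC_CODE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def in_frame(leader: str, orf_offset: int) -> bool:
    """True if an insert whose reading frame starts orf_offset bases into it
    continues the frame begun at the ATG at the start of the vector leader."""
    return (len(leader) + orf_offset) % 3 == 0


leader = "ATGGCTAGC"                 # hypothetical: ATG plus two vector-derived codons
good_insert = "AAAGAATGG"            # insert reading frame starts at its first base
shifted_insert = "C" + good_insert   # one extra base shifts the frame

print(in_frame(leader, 0), translate(leader + good_insert))     # True MASKEW
print(in_frame(leader, 1), translate(leader + shifted_insert))  # False MASQRM
```

With the extra base, the vector-derived amino acids (MAS) are unchanged, but everything downstream is read in the wrong frame and no longer encodes the intended protein.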

Use of antibodies to identify cloned genes: Cloned genes (or cDNAs) coding for proteins that can be recognized by antibodies are frequently detected with an expression vector derived from bacteriophage lambda, called lambda-gt11 (usually written λgt11). This vector produces a fusion protein with E. coli β-galactosidase under control of the inducible promoter of the E. coli lactose (lac) operon. The vector lyses the bacterial cells, forming plaques on a lawn of E. coli. When expression is induced with allolactose or an appropriate analogue (such as IPTG) at a critical time during plaque formation, the fusion protein is released into the plaques and can be detected by blotting onto nitrocellulose, followed by binding of antibody and then binding of radioactive Protein A, which attaches to the antibody. Plaques expressing proteins that bind the antibody can then be located by autoradiography.

Inducible expression: Production of large amounts of a foreign protein can be toxic to a host cell. Many expression vectors have therefore been designed for inducible expression, so that the vector can be grown up to a high level in the cell without expression, followed by a burst of intense expression that generates as much of the product as possible before the host cell ceases to function. The host cell and the vector are often engineered to work together. One example is the use of a bacterial strain that produces large amounts of the lac repressor protein together with a vector carrying the cloned gene under control of the lac promoter/operator system. The repressor inhibits expression while the vector population is being expanded. Expression is activated at the appropriate time by adding IPTG, a synthetic analog of allolactose that is not metabolized by the cells and thus provides a strong, stable induction signal. Promoters that respond to hormones or to heavy metals are often used for inducible expression in eukaryotic cells. The regulatory signals involved in control of the lac operon will be discussed in detail in a future lecture (pages 522-529).

Chromosome walking: It is often desirable to determine what lies next to a genomic sequence that has been cloned (for example, to identify adjacent regulatory sequences, to find the rest of a gene that is only partially represented in the clone, or to find a gene known to be closely linked to the one that has been cloned). This can often be done by chromosome walking. The cloned sequence is broken into smaller fragments that are used as probes to identify DNA fragments that partially overlap the original clone. Incomplete digestion, or cutting separately with several different restriction enzymes, will often generate overlapping fragments. As each new fragment is identified, its distal end is used to make more probes to identify further partial overlaps. This allows one to "walk" along the chromosome, sometimes for substantial distances.
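The iterative logic of a walk can be sketched on sequences: use the distal end of the current clone as a probe, find the library clones that contain it, step to the one that extends furthest, and repeat. This is an idealized, hypothetical model: it assumes exact matching, that every clone is stored in the same orientation, and that each probe sequence occurs only once in the chromosome (repeated sequences are a real hazard for walking). The `probe_length` parameter and the clone names are inventions of the sketch.

```python
def chromosome_walk(library, start, probe_length=8):
    """Greedy rightward walk through a library of overlapping clones.

    At each step the distal probe_length bases of the current clone serve as a
    probe; among clones containing the probe, the one extending furthest
    beyond it becomes the next step. Returns the clone names visited and the
    assembled sequence.
    """
    path = [start]
    contig = library[start]
    current = library[start]
    while True:
        probe = current[-probe_length:]
        best_name, best_extension = None, 0
        for name, seq in library.items():
            if name in path:
                continue
            hit = seq.find(probe)
            if hit == -1:
                continue
            extension = len(seq) - (hit + probe_length)  # bases beyond the probe
            if extension > best_extension:
                best_name, best_extension = name, extension
        if best_name is None:
            return path, contig
        path.append(best_name)
        current = library[best_name]
        contig += current[-best_extension:]


# Hypothetical 40 bp "chromosome" and four overlapping clones cut from it.
chromosome = "GATTACACCGTAGGCTTAACGGATCCTGAAGTCCATGCAA"
library = {
    "A": chromosome[0:16],
    "B": chromosome[6:28],
    "C": chromosome[18:40],
    "D": chromosome[7:20],
}
path, contig = chromosome_walk(library, "A")
print(path)                  # ['A', 'B', 'C']
print(contig == chromosome)  # True
```

Clone D also hybridizes to the first probe but extends only a few bases further, so the walk skips it in favor of B, mirroring the practical choice of the overlapping clone that reaches furthest.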

Heteroduplex analysis: When two nucleic acids containing a mixture of complementary and non-complementary sequences are hybridized, the non-complementary regions form a series of single-stranded loops that are similar in principle to the insertion and deletion loops seen in Drosophila polytene chromosomes. Such structures can be visualized quite readily with an electron microscope. For example, if a vector carrying a cloned DNA sequence is denatured and hybridized with the same vector lacking the insert, the insert appears as a single-stranded loop. This technique is also valuable for visualizing introns in eukaryotic genomic sequences hybridized with mRNA or cDNA sequences (see figure 12.11).