Text Assignment: Chapter 3, Pages 56-87
Important concepts
INTRODUCTION:
Working copies of genetic information: This is the third of four lectures reviewing basic concepts of molecular biology and the central dogma that are covered in MCDB 1150. This lecture focuses on transcription of genetic information from double-stranded DNA to single-stranded messenger RNA, as well as on the DNA-templated synthesis of RNAs that are not subsequently translated, such as ribosomal, transfer, and small nuclear RNAs. The RNAs produced by transcription can be viewed as working copies of the information stored in archival form in the DNA.
Transcription: major concepts. The "central dogma" of molecular biology describes the transcription of genetic information from a DNA nucleotide triplet code to an RNA triplet code, followed by translation to specific amino acid sequences in protein. DNA-templated RNA synthesis is achieved by a process that is superficially quite similar to leading strand synthesis in DNA replication. The antisense strand of the double-stranded DNA serves as the template for assembly of an RNA sequence that is the reverse complement of the antisense sequence. This results in formation of an RNA molecule that is identical in its base sequence to the DNA sense strand except that uracil replaces thymine and the sugar-phosphate backbone contains ribose instead of deoxyribose.
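The template relationship described above can be checked with a few lines of code. This is a minimal sketch, assuming both DNA strands are written 5'->3' as plain strings; the function names and the example sequences are hypothetical. It builds the RNA as the reverse complement of the antisense (template) strand and confirms that the result equals the sense strand with U in place of T.

```python
# Base placed in the RNA opposite each base of the DNA template strand.
RNA_BASE_OPPOSITE = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand):
    """Return the RNA (5'->3') made from a DNA template strand given 5'->3'.

    The polymerase reads the template 3'->5' while building RNA 5'->3', so
    the RNA is the reverse complement of the template, with U in place of T.
    """
    return "".join(RNA_BASE_OPPOSITE[base] for base in reversed(template_strand.upper()))


def rna_from_sense(sense_strand):
    """The same RNA, read directly off the sense strand by swapping T for U."""
    return sense_strand.upper().replace("T", "U")


sense = "ATGGCTTACGGA"      # hypothetical sense strand, 5'->3'
antisense = "TCCGTAAGCCAT"  # its complementary antisense (template) strand, 5'->3'
print(transcribe(antisense))                            # AUGGCUUACGGA
print(transcribe(antisense) == rna_from_sense(sense))   # True
```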
Promoters: Only specific regions of the DNA are transcribed: those that encode proteins and those that encode the various cellular RNAs that function without being translated. Site-specific transcription is initiated by the interaction of RNA polymerases and additional proteins called transcription factors with specific upstream sequences known as promoters. Once initiated, transcription proceeds in a 5'-to-3' direction by the addition of ribonucleoside triphosphates to the free 3'-hydroxyl group at the end of the growing chain. Energy for bond formation is derived from splitting off a pyrophosphate as each phosphodiester bond in the backbone of the RNA is formed (Figure 3.12). Transcription continues until a specific termination site is reached, or until a sequence-specific cleavage of the growing chain takes place. Details of transcription, including the nature of the RNA polymerases and of the promoters they interact with, differ substantially between prokaryotic and eukaryotic cells.
Sense can be on either strand: It is important to be aware that genes on an extended double helical DNA molecule (which is what a chromosome is) can have their sense strands on either strand of the double helix. Because of the antiparallel nature of the double helix, coding sequences on the two strands are transcribed in opposite directions relative to the overall chromosomal structure. In some unusual cases, transcription can even be initiated in both directions from a centrally located bidirectional promoter region.
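Because a gene's sense strand can be either strand of the helix, any search of a chromosome sequence for a gene or signal has to look at both strands. The sketch below illustrates this under simple assumptions: only the top strand is stored (5'->3'), and a motif on the bottom strand is found by looking for its reverse complement on the top strand. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_on_both_strands(top_strand, motif):
    """Return (strand, start) for every copy of motif on either strand.

    `start` is a 0-based index on the top strand. A bottom-strand hit appears
    on the top strand as the reverse complement of the motif, because the
    bottom strand runs 5'->3' in the opposite direction.
    """
    top = top_strand.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(top) - len(motif) + 1):
        window = top[i:i + len(motif)]
        if window == motif:
            hits.append(("top", i))
        if window == rc_motif:
            hits.append(("bottom", i))
    return hits


print(find_on_both_strands("ATGAAACCCTTTCAT", "ATGAAA"))
# [('top', 0), ('bottom', 9)]
```

A motif that is its own reverse complement (a palindrome) will be reported once for each strand at the same position, which is the expected behavior for a sequence that reads the same on both strands.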
PROKARYOTIC TRANSCRIPTION
Numbering of promoter sequences: The numbering of nucleotides in a transcript (and in the corresponding DNA sense strand) normally starts at the 5'-end, with the first transcribed nucleotide designated +1, and proceeds toward the 3'-end. Upstream nucleotides in the DNA sense strand carry negative numbers, starting with -1 immediately adjacent to the start of transcription; there is no position 0. Typical prokaryotic promoters have a consensus sequence of TATAAT at about -10 and a consensus sequence of TTGACA at about -35. As shown in Table 3.3, the actual sequences of individual genes can vary substantially from these consensus sequences.
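The numbering convention and the idea of a consensus sequence can be made concrete with a short sketch. It assumes the sense strand is stored as a string whose 0-based index of the +1 nucleotide is known, and it scores candidate boxes by simple mismatch counting against the consensus; real promoter-prediction tools use more refined scoring. The function names, search windows, and the example sequence are hypothetical.

```python
from typing import Optional, Tuple


def promoter_position(index, tss_index):
    """Convert a 0-based string index on the sense strand to promoter numbering.

    The first transcribed nucleotide (at tss_index) is +1, the base just
    upstream is -1, and there is no position 0.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_box(seq, consensus, first, last) -> Optional[Tuple[int, int]]:
    """Closest match to `consensus` starting at an index in [first, last).

    Returns (start_index, mismatch_count), or None if the window is empty.
    """
    best = None
    for i in range(max(first, 0), min(last, len(seq) - len(consensus) + 1)):
        score = mismatches(seq[i:i + len(consensus)], consensus)
        if best is None or score < best[1]:
            best = (i, score)
    return best


# Hypothetical sense strand: 40 upstream bases, then the start site at index 40.
seq = "GCGCG" + "TTGACA" + "GCCGCGGCCGCGGCCGC" + "TATAAT" + "GCCGCC" + "ACGGCG"
tss = 40
i35, mm35 = best_box(seq, "TTGACA", 0, 12)
i10, mm10 = best_box(seq, "TATAAT", 25, 35)
print(promoter_position(i35, tss), mm35)   # -35 0
print(promoter_position(i10, tss), mm10)   # -12 0
```

In this made-up example the TATAAT box occupies -12 to -7, within a few bases of -10, which is why the text describes the boxes as lying "at about" their nominal positions.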
Prokaryotic RNA polymerase: A typical bacterial RNA polymerase core enzyme contains four major subunits: two identical alpha subunits, one beta subunit, and one beta-prime subunit (a small fifth subunit, omega, is also usually present). The core polymerase can elongate transcripts that have already been initiated, and it can also carry out a low level of nonspecific initiation. Addition of a sigma subunit converts the core polymerase to a polymerase holoenzyme, which is capable of a high level of promoter-specific initiation. The sigma subunit drops off after successful initiation, leaving the core enzyme to complete the elongation process.
Selective Initiation of transcription: As we will see later in the semester, the use of alternative sigma factors allows initiation from different sets of promoters in special situations such as bacterial sporulation and some types of bacterial virus infection. In addition, as we will see in lectures 10 and 11, there are a variety of mechanisms for control of expression of genes that are only needed under specific circumstances, such as for the utilization of lactose as an energy source, or synthesis of the amino acid tryptophan when the environment does not provide an adequate supply.
Intrinsic terminator: For many prokaryotic genes, termination of transcription occurs when the polymerase transcribes an intrinsic terminator signal. The termination signal has two parts. The first is a sequence that can base-pair with itself to form a stem-loop (hairpin) structure. The stem-loop is immediately followed by the second part, a consensus sequence of UUUUUUA. Formation of the hairpin causes transcription to pause temporarily. As the stem-loop forms, its component nucleotides are pulled away from the template DNA somewhat prematurely, leaving the transcript attached to the template only by a string of relatively weak U:A base pairs. Without adjacent G:C base pairs, this weak attachment may not be strong enough to hold the transcript on the template, so the transcript is released and transcription terminates. We will see an example of metabolically regulated detachment when we study the tryptophan attenuator in lecture 11.
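The two-part structure of an intrinsic terminator (a self-complementary stem-loop immediately followed by a run of U's) can be searched for directly in an RNA sequence. The sketch below is a simplified illustration, not a published terminator-prediction algorithm: it accepts only perfect Watson-Crick stems (no G:U pairs, bulges, or mismatches), and the stem, loop, and U-run length limits are hypothetical parameters. It also reports the G+C fraction of the stem, since a GC-rich hairpin followed by weak U:A pairs is the combination the text describes.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def rna_reverse_complement(rna):
    """Sequence that would pair with `rna` in an antiparallel helix."""
    return "".join(RNA_PAIR[b] for b in reversed(rna))


def find_intrinsic_terminators(rna, min_stem=5, max_stem=10,
                               min_loop=3, max_loop=8, min_u=4):
    """Flag stem-loops that are immediately followed by a run of U's.

    A candidate is left arm + loop + right arm + U-run, where the right arm is
    the exact reverse complement of the left arm (Watson-Crick pairs only).
    """
    rna = rna.upper()
    hits = []
    for u_start in range(len(rna) - min_u + 1):
        if rna[u_start:u_start + min_u] != "U" * min_u:
            continue
        for stem in range(max_stem, min_stem - 1, -1):   # try longest stems first
            right = rna[u_start - stem:u_start]
            match = None
            for loop in range(min_loop, max_loop + 1):
                start = u_start - 2 * stem - loop
                if start < 0:
                    break                                 # longer loops start even earlier
                if rna_reverse_complement(rna[start:start + stem]) == right:
                    match = loop
                    break
            if match is not None:
                left = rna[start:start + stem]
                hits.append({"hairpin_start": start, "stem": stem, "loop": match,
                             "u_run_start": u_start,
                             "stem_gc": sum(b in "GC" for b in left) / stem})
                break
    return hits


# Hypothetical transcript end: GC-rich stem, 4-nt loop, then UUUUUUA.
rna = "AUGAC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUA"
print(find_intrinsic_terminators(rna))
```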
Rho-dependent termination: A second form of termination is referred to as rho-dependent because it requires the protein factor rho. This system may serve to stop transcription soon after the end of the last coding sequence on the mRNA is reached. Rho appears to cause termination at poorly defined sites where the RNA is rich in C. Another factor, called NusA, associates with the core RNA polymerase, possibly at the sigma-binding site, and periodically slows transcription, so that the first of the ribosomes translating the newly synthesized message stays close behind the RNA polymerase. When translation stops, the ribosome falls off and no longer blocks the access of rho to C-rich sites that have just emerged from the polymerase (see figure 3.14).
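Because rho sites are described only as "rich in C", the closest simple computational analogue is a sliding-window scan of C content. The sketch below is purely illustrative: the window size and the threshold are hypothetical values chosen for the example, not measured properties of real rho-dependent termination sites, and a window that passes this test is only a candidate region, not a predicted terminator.

```python
def c_rich_windows(rna, window=20, min_c_fraction=0.35):
    """Start indices of windows whose fraction of C meets a chosen threshold.

    The window size and threshold are illustrative guesses, not measured
    properties of real rho-dependent termination sites.
    """
    rna = rna.upper()
    return [i for i in range(len(rna) - window + 1)
            if rna[i:i + window].count("C") / window >= min_c_fraction]


# A block of ten C's flanked by A's: windows overlapping at least 8 C's pass.
print(c_rich_windows("A" * 10 + "C" * 10 + "A" * 10, window=10, min_c_fraction=0.8))
# [8, 9, 10, 11, 12]
```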
EUKARYOTIC TRANSCRIPTION
Separation of transcription and translation: Eukaryotic cells are characterized by the presence of a membrane-bound nucleus. This results in segregation of the genetic DNA and the enzymatic machinery for transcription and message processing into a separate subcellular compartment. The nucleus has only limited communication with the cytoplasm, which contains the systems needed for translation of the mRNA and post-translational modification of the proteins, as well as for targeting proteins to appropriate subcellular or extracellular locations (including sending some to the nucleus).
Selective gene expression: Eukaryotic cells are capable of selective gene expression. In "simple" unicellular eukaryotes, such as yeast, alternative mating types are displayed, genes needed to metabolize specific substrates, such as lactose, are turned on only when needed, and proteins associated with progression around the mitotic cell cycle are expressed at different times. More complex multicellular organisms, such as ourselves, are composed of diverse types of "differentiated" cells that display widely different biochemical and structural properties. This cellular diversity is achieved primarily through highly selective expression of different genes within the overall genome that almost all of the cells share in common. We will examine some of the mechanisms responsible for such controls in lecture 12.
Eukaryotic RNA polymerases: Eukaryotic cells contain three different types of RNA polymerases in their nuclei, each of which has a distinctly different role.
RNA polymerase I transcribes only ribosomal RNAs (18S, 28S, and 5.8S). The initial transcript of RNA polymerase I is a large precursor of all three of these rRNAs, which is then processed to yield the final rRNAs.
RNA polymerase II transcribes all nuclear protein-coding genes in eukaryotic cells. RNA polymerase II is a complex molecular machine whose structural details are not yet fully understood. It is capable of initiating transcription selectively from a variety of types of promoters, working in conjunction with complex sets of transcription factors and influenced by regulatory sequences that may be either adjacent to the promoter or relatively distant from it. RNA polymerase II is the only one of the three polymerases that we will have time to analyze in any detail. However, it should be noted that the other two polymerases also have complex regulatory interactions with transcription factors that are similar in principle to those we will be studying for RNA polymerase II.
RNA polymerase III transcribes a number of small RNA species that do not have protein coding functions, including transfer RNAs, 5S ribosomal RNA (distinct from 5.8S), and a number of small RNAs that are involved in nuclear functions, such as splicing of mRNA.
Eukaryotic promoter sites: Eukaryotic promoters are so diverse that it is not yet possible to draw any clear generalizations about them. Many, but not all, have a consensus sequence called the TATA box located about 30 bp upstream from the transcriptional start site (a position commonly referred to as -30). The consensus sequence is TATAAA, but there is a lot of variation. The TATA box resembles the prokaryotic -10 box (TATAAT) in sequence, but it is located substantially further upstream. Many promoters have no recognizable TATA box; in such cases, the transcriptional initiation site is usually less precisely defined, with transcription starting at several different locations. A second upstream site that is often encountered in eukaryotic promoters is the CCAAT box, commonly referred to as the "cat box". It has a consensus sequence of GGCCAATCT, again with substantial variation, particularly at the ends. When present, it is typically located near the -75 position, relative to the transcriptional start site.
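The statement that a promoter "has a TATA box near -30" or "a CCAAT box near -75" combines two criteria: similarity to a consensus and position relative to the start site. The sketch below applies both, assuming the input is the sense-strand sequence immediately upstream of the start site (so its last base is -1). The mismatch limit, the positional tolerances, and the function names are hypothetical choices for illustration; as the text notes, real promoters vary too much for a fixed rule to classify them reliably.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_upstream(upstream, consensus, expected, tolerance, max_mismatches=1):
    """Near-matches to `consensus` whose first base lies near `expected`.

    `upstream` is the sense strand 5'->3' ending at -1, so index i corresponds
    to position i - len(upstream). Returns a list of (position, mismatches).
    """
    n = len(upstream)
    hits = []
    for i in range(n - len(consensus) + 1):
        pos = i - n
        if abs(pos - expected) > tolerance:
            continue
        mm = mismatches(upstream[i:i + len(consensus)], consensus)
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


# Hypothetical 80-bp upstream region (positions -80 to -1).
upstream = "GCG" + "GGCCAATCT" + "G" * 37 + "TATAAA" + "G" * 25
tata = scan_upstream(upstream, "TATAAA", expected=-30, tolerance=5)
ccaat = scan_upstream(upstream, "GGCCAATCT", expected=-75, tolerance=10)
print(tata, ccaat)   # [(-31, 0)] [(-77, 0)]
print("TATA-containing promoter" if tata else "no recognizable TATA box")
```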
Binding of transcription factors to cis-acting sequences: The TATA and CCAAT box sequences are "cis-acting", meaning that in order to have an effect they must be part of the same extended DNA double helix as the sequence whose transcription they promote. There are also many other cis-acting consensus sequences that occur in eukaryotic promoters or just upstream from them, as well as in enhancers and silencers (described below). The cis-acting sequences serve as binding sites for a wide variety of specialized proteins called "transcription factors". These proteins, which are described in greater detail later in these notes, have the ability to bind selectively to specific types of cis-acting DNA sequences and also to interact with the RNA polymerase in ways that affect the frequency of initiation of transcription, often in a tissue-specific manner.
GC box: One frequently encountered cis-acting sequence is the GC box, which has a consensus sequence of GGGCGG. Although GC boxes are often found upstream from the CCAAT box in promoters that contain a TATA box, they also frequently appear as the most recognizable sequence in promoters that lack a TATA box. Many other consensus sequences have been identified as specific binding sites for transcription factors. Much of the fine tuning of transcriptional control in eukaryotic cells is believed to be achieved in a combinatorial manner by complex interactions of multiple transcription factors with multiple consensus binding sequences for those factors.
Analysis of promoter sequences: Figure 3.9 demonstrates one of several methods for analyzing the functions of specific sequences within the general promoter region, in this case for the mouse beta-globin gene. In this example, mutations have been introduced at specific sites and their effect on transcriptional initiation has been measured, probably with a reporter gene attached to the modified promoter. Point mutations in the TATA box and in the CCAAT box both reduce transcriptional initiation substantially but do not abolish it, emphasizing that these are consensus sequences whose function is weakened, but not eliminated, by individual base changes. Mutations at most points between the two boxes have little effect. However, in this example, mutations in a third sequence, GCCACACCC, whose function is not defined in the text, also seriously impair transcription. Note also that mutations just to the left of (upstream from) the CCAAT box substantially increase transcriptional initiation, suggesting the presence of a down-regulatory element in the wild-type promoter.
Enhancers and Silencers: In addition to the cis-acting sequences that are generally considered part of the promoter itself, a number of other cis-acting sequences can also influence the extent of transcription of a particular gene. Members of one interesting subclass are referred to as enhancers. Although they may in some cases be located immediately adjacent to promoter sequences, enhancers have two additional properties that distinguish them from true components of the promoter. First, they can also function from more remote locations, up to thousands of base pairs upstream or downstream from the promoter, including within the introns of the gene whose transcription is being enhanced. Second, they retain their activity when inserted in the reverse orientation. Enhancers appear to function as binding sites for gene-specific transcription factors that are capable of interacting with the overall transcriptional complex, probably through bending of the DNA (Figure 3.11), which can happen even when the enhancer is quite distant from the promoter. Silencers are very similar, except that their function is to reduce or stop transcription of a particular gene rather than to activate it. There are also upstream activator sequences (UAS) in yeast that are very similar in function to enhancers, except that they are inactive when placed downstream from the transcriptional start site.
Eukaryotic transcription factors: Transcription factors are proteins that interact with specific consensus sequences in promoters, enhancers, and silencers to facilitate or modify transcription. The transcription factors for RNA polymerase II fall into two overall categories, commonly referred to as general transcription factors and gene-specific transcription factors. The general transcription factors interact with the promoter to form a preinitiation complex, which allows the RNA polymerase to attach to the DNA and initiate transcription. However, the amount of transcriptional initiation is quite low when only the preinitiation complex and RNA polymerase II are present. Higher levels of transcription require the presence of gene-specific transcription factors in addition to the general transcription factors. The gene-specific factors are believed to function primarily through interaction with upstream promoter and enhancer sequences, although in many cases silencers or other down-regulatory elements are also involved in achieving the final level of transcription that is characteristic of the expression of a particular gene in a particular type of cell.
Assembling the transcriptional initiation complex: The textbook briefly summarizes the steps that are involved in the assembly of a transcriptional initiation complex at the TATA box of a promoter (Page 66 and Figure 3.10).
Elongation: At least one transcription factor, TFIIS, contributes to the process of transcript elongation in eukaryotic cells. There are also some unusual systems in which transcription is increased in part by reversing a transcriptional stalling phenomenon (not described in our text).
Termination: Termination of transcription by RNA polymerase II appears to be poorly defined in eukaryotic cells. Most protein-coding transcripts contain a consensus sequence, AAUAAA, that causes the transcript to be cleaved about 11-30 nucleotides further downstream (figure 3.15). The cut then becomes the starting point for polyadenylation of the messenger RNA (described below under Message Processing). However, transcription sometimes continues for a substantial distance beyond the cut site.
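Locating the AAUAAA signal and the stretch of RNA where cleavage would be expected is a simple string search. This sketch assumes the 11-30 nucleotide distance is counted from the last base of the signal (the text does not specify the reference point) and uses 0-based string indices; the function name and example transcript are hypothetical. Signals too close to the end of the sequence to have a complete downstream window are skipped.

```python
def polyadenylation_windows(rna, signal="AAUAAA", min_dist=11, max_dist=30):
    """For each copy of `signal`, return (signal_start, first, last).

    first..last are indices of candidate cleavage positions lying min_dist to
    max_dist nucleotides downstream of the signal's last base (an assumed
    reference point), clipped to the end of the sequence.
    """
    rna = rna.upper()
    windows = []
    start = rna.find(signal)
    while start != -1:
        last_base = start + len(signal) - 1
        first = last_base + min_dist
        last = min(last_base + max_dist, len(rna) - 1)
        if first <= last:
            windows.append((start, first, last))
        start = rna.find(signal, start + 1)   # also catches overlapping copies
    return windows


# Hypothetical 3' region: signal at index 10, followed by 40 G's.
print(polyadenylation_windows("GC" * 5 + "AAUAAA" + "G" * 40))
# [(10, 26, 45)]
```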
More details later in the course: We will return to transcriptional control in lectures 10-12. At that time, we will examine mechanisms responsible for selective control over transcription of specific prokaryotic and eukaryotic genes.
EUKARYOTIC MESSAGE PROCESSING
Molecular organization of eukaryotic genes: We have already seen that a typical eukaryotic gene has an upstream promoter sequence, as well as various enhancer and silencer sequences associated with it. The region that is transcribed is also quite complex, consisting of far more than a simple protein coding sequence. The messenger RNA that is derived from the transcript always begins with a 5'-untranslated sequence, which, among other things, provides sequences needed for ribosomal attachment in preparation for translation. This is followed by the actual coding sequence and a relatively long 3'-untranslated region, which may contain specific signal sequences related to such things as message stability. The initial transcript and the DNA that it is transcribed from also contain numerous intervening sequences, called introns, that interrupt not only the coding sequence but also the 5'- and 3'-untranslated sequences. The segments of the transcript located between the introns, which are retained in the mature mRNA, are called exons. The term hnRNA (heterogeneous nuclear RNA) is sometimes used to describe the original products of transcription of eukaryotic genes before they have been processed into messenger RNA. You may want to look ahead to "The Anatomy of a Gene" (pages 158 to 161 and Figure 6.2) for a more complete overview of the structure of a typical eukaryotic gene.
Processing of the mRNA precursor: The initial transcript of a typical eukaryotic gene is not capable of functioning as a messenger RNA. In most cases, both ends of the transcript must be modified (5'-capping and 3'-polyadenylation), and introns must be "spliced out" during the process of converting the initial transcript (hnRNA) into a functional mRNA.
Capping: A "cap" structure is added to the 5'-end of the message transcript soon after transcription has been initiated, when the growing RNA chain is only about 50 nucleotides long. The reaction occurs in two main phases. First, the 5'-end of the transcript, which still carries the 5'-triphosphate of the first NTP that initiated growth of the chain, loses its terminal phosphate, and a GMP unit derived from GTP is attached to it. This forms an unusual linkage in which the guanosine is joined to the first nucleotide of the RNA by a 5'-to-5' triphosphate bond (see figure 3.19). The second phase is methylation of the added guanine at its N7 position to generate the mature 7-methylguanosine-5',5'-triphosphate cap structure. In addition to protecting the mRNA from degradation, the cap structure is also needed to initiate ribosomal binding onto the message.
Polyadenylation: Most (but not all) eukaryotic mRNAs have a polyadenylic acid tail of 150 to 250 nucleotides added to their 3'-ends before they are exported to the cytoplasm. This is done by an enzyme complex that clips off the 3' end of the original transcript slightly downstream from a specific recognition signal (AAUAAA) and initiates polymerization of ATP to generate the poly(A) tail. Although polyadenylation is a very important phenomenon, please be aware that a subset of mRNAs, including those for the histones, do not have poly(A) tails. Also, please be fully aware that the poly (A) tail is generated by a simple polymerization process that does not require a template.
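The key point that the poly(A) tail is made without a template can be expressed in code: after the cut, the tail is simply a string of A's whose content does not depend on any downstream DNA or RNA sequence. This is a minimal sketch; the function name is hypothetical, the cleavage index is assumed to be known already, and the default tail length of 200 is an arbitrary value chosen from the 150-250 range given in the text.

```python
def cleave_and_polyadenylate(pre_mrna, cleavage_index, tail_length=200):
    """Keep the transcript through `cleavage_index`, then append a poly(A) tail.

    No template is consulted: the tail is tail_length A's regardless of the
    sequence that followed the cut in the original transcript.
    """
    if not 0 <= cleavage_index < len(pre_mrna):
        raise ValueError("cleavage_index lies outside the transcript")
    return pre_mrna[:cleavage_index + 1] + "A" * tail_length


print(cleave_and_polyadenylate("AUGCCCAAUAAAGGGUUU", 14, tail_length=5))
# AUGCCCAAUAAAGGGAAAAA
```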
Removal of introns: Nearly all eukaryotic protein-coding genes contain introns that must be removed to generate a functional messenger RNA that is capable of being translated. There are specific recognition signals at the beginning and end of each intron that allow an enzyme complex in the nucleus to recognize the presence of an intron and to remove it, coupling the exons on either side with sufficient precision so that the codons (nucleotide triplets coding for individual amino acids) continue to be read correctly. Note that in many cases, parts of a single codon may be in two exons.
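The remark that a single codon may be split between two exons can be illustrated by joining exons and checking which codons of the spliced message contain bases from both sides of an exon-exon junction. This sketch assumes the reading frame starts at the first base of the first exon and uses hypothetical exon sequences; the function name is also hypothetical.

```python
def junction_spanning_codons(exons, frame_start=0):
    """Codons of the spliced mRNA that contain bases from two different exons.

    Returns (mRNA_index, codon) pairs, reading triplets from frame_start.
    """
    boundaries = []
    pos = 0
    for exon in exons[:-1]:
        pos += len(exon)
        boundaries.append(pos)          # index where the next exon begins
    mrna = "".join(exons)               # exons joined with no loss of nucleotides
    result = []
    for i in range(frame_start, len(mrna) - 2, 3):
        if any(i < b < i + 3 for b in boundaries):
            result.append((i, mrna[i:i + 3]))
    return result


# Exon 1 ends two bases into the second codon: AUG GC|U UUC UAA
print(junction_spanning_codons(["AUGGC", "UUUCUAA"]))
# [(3, 'GCU')]
```

Here the junction falls inside the GCU codon, so a splicing error of even one nucleotide at that junction would shift the reading frame of everything downstream.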
Signals that identify introns: Removal of introns is done in the nucleus before the mRNA is exported to the cytoplasm for translation. The mechanisms that are involved are described in considerable detail in the textbook (pages 76-79 and figure 3.21). The 5'-end of the intron is marked with a consensus sequence of GUAAGU, with the first two nucleotides (GU) being invariant. The 3'-end has a sequence consisting of 6 pyrimidines in a row followed by an unspecified nucleotide and then CAG (6PyNCAG), with the AG being invariant. A third sequence, 18-40 nucleotides upstream from the 3'-end, also plays a major role. It has a consensus sequence of PyNPyPyPuAPy (Py = pyrimidine, Pu = purine, N = any nucleotide).
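The three intron signals can be written as regular expressions in which a pyrimidine (Py) becomes the character class [CU], a purine (Pu) becomes [AG], and N becomes any character. The sketch below checks a candidate intron for the invariant GU and AG ends, the pyrimidine-rich 3' region, and a branch-point sequence at the stated distance from the 3'-end. It is a consensus check only, not a splice-site predictor; the function name, the dictionary keys, and the choice to measure the 18-40 distance as the number of nucleotides after the branch-point A are assumptions made for illustration.

```python
import re

# PyNPyPyPuAPy; the lookahead lets overlapping candidates all be found.
BRANCH_SITE = re.compile(r"(?=([CU].[CU][CU][AG]A[CU]))")
# Six pyrimidines, any nucleotide, then CAG at the very end of the intron.
THREE_PRIME_END = re.compile(r"[CU]{6}.CAG$")


def check_intron_signals(intron, min_dist=18, max_dist=40):
    """Report which consensus splicing signals a candidate intron contains."""
    intron = intron.upper()
    branch_points = []
    for m in BRANCH_SITE.finditer(intron):
        a_index = m.start() + 5                   # the A is the 6th base of the motif
        dist = len(intron) - 1 - a_index          # nucleotides after the A (assumed measure)
        if min_dist <= dist <= max_dist:
            branch_points.append(a_index)
    return {
        "starts_with_GU": intron.startswith("GU"),
        "ends_with_AG": intron.endswith("AG"),
        "pyrimidine_tract": THREE_PRIME_END.search(intron) is not None,
        "branch_points": branch_points,
    }


# Hypothetical intron: GUAAGU ... CUCUGAC ... UUUCCU A CAG
intron = "GUAAGU" + "G" * 30 + "CUCUGAC" + "G" * 15 + "UUUCCU" + "A" + "CAG"
print(check_intron_signals(intron))
```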
Mechanism of splicing: A multicomponent "spliceosome" composed of several small nuclear ribonucleoproteins (snRNPs) carries out the actual message splicing operations. The RNAs in the snRNPs range in size from about 100 to 250 nucleotides; most of them (U1, U2, U4, and U5) are transcribed by RNA polymerase II, while U6 is transcribed by RNA polymerase III. If one regards the hnRNA as reading left to right from 5' to 3', the 5'-end of the intron is first detached from the exon to the left and bent into a lariat loop, with attachment to the A in the internal consensus site. This involves the formation of an unusual 2',5'-phosphodiester bond between the guanine nucleotide at the 5'-end of the intron and the adenine nucleotide in the internal consensus sequence. The exon on the left side is held by the complex and joined to the exon on the right side after the 3'-end of the intron has been detached. This joining produces a continuous mRNA sequence with the intron removed and no loss of nucleotides from either exon. This entire process is referred to as "splicing out" the intron (see figure 3.21 for details).
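The bookkeeping of the two splicing steps (release of the left exon with lariat formation at the branch-point A, then cleavage at the intron's 3'-end and joining of the exons) can be sketched on sequence strings. This is a sequence-level model only, assuming the intron boundaries and branch-point position are already known; it does not represent the chemistry or the spliceosome, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lariat:
    sequence: str       # excised intron, 5'->3'
    branch_index: int   # index within the intron of the branch-point A
    # In the real lariat, the intron's 5'-terminal G is joined to the 2'-OH
    # of this A by a 2',5'-phosphodiester bond; here it is only recorded.


def splice_one_intron(pre_mrna, intron_start, intron_end, branch_index):
    """Remove one intron (inclusive 0-based bounds); return (mRNA, Lariat)."""
    intron = pre_mrna[intron_start:intron_end + 1]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron lacks the invariant GU...AG ends")
    local_branch = branch_index - intron_start
    if intron[local_branch] != "A":
        raise ValueError("the branch point must be an A")
    # Step 1: the left exon is released; the intron's 5' G links to the branch A.
    left_exon = pre_mrna[:intron_start]
    # Step 2: the intron's 3' end is cut and the two exons are joined directly.
    right_exon = pre_mrna[intron_end + 1:]
    return left_exon + right_exon, Lariat(intron, local_branch)


intron = "GUAAGU" + "CUCUGAC" + "UUUCCUACAG"        # hypothetical 23-nt intron
pre_mrna = "AUGGC" + intron + "UUUCUAA"
mrna, lariat = splice_one_intron(pre_mrna, 5, 5 + len(intron) - 1, 5 + 11)
print(mrna)                   # AUGGCUUUCUAA
print(lariat.branch_index)    # 11
```

Because the joined product is just the left exon followed by the right exon, no nucleotides are lost from either exon, matching the outcome described in the text.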
Electron microscopy of introns: If double stranded DNA is dissociated to single strands and allowed to reform double strands in the presence of mRNA, the mRNA hybridizes readily with the antisense strand, leaving the DNA sense strand in a single stranded form with no hybridization partner (figure 3.22a). In cases where introns have been spliced out of the mRNA, the mRNA exons hybridize with the DNA antisense strand, leaving the sense strand as single stranded DNA. However, in places where the DNA codes for introns, there is no RNA sequence to hybridize to the DNA antisense strand because the corresponding RNA has been spliced out. In those areas, the DNA sense and antisense strands form a double stranded loop that is clearly visible in electron micrographs (Figure 3.22b).
Alternative splicing and RNA editing: In many genes, splicing can sometimes remove larger segments of the transcript that contain potential exons. This allows different combinations of exons to be assembled into alternative forms of the mRNA that can give rise to proteins with alternative amino acid sequences (described briefly on page 244 and illustrated in figure 8.29). In some cases, alternative splicing occurs under tissue-specific regulatory control. There are also rare cases in which messenger RNA is "edited" after its transcription (briefly described on page 107). In these unusual situations, insertions, deletions, or base changes may occur. RNA editing is an important but highly specialized process, which will not be analyzed in detail in this class.
Transport to cytoplasm: After it is fully modified, the mRNA must be transported to the cytoplasm before it can be translated. Our textbook does not appear to have much to say about this phenomenon, at least in the current chapters (if anyone finds a description of it, please let me know). Overall, the transport process is still not well understood. However, it is important to recognize that the mRNA must pass through relatively small nuclear pores and that this process is somehow facilitated by the poly(A) tail and various special proteins, including a poly(A)-binding protein.
Processing of other types of RNA: Introns are also found in ribosomal and transfer RNAs, as well as RNAs transcribed from mitochondrial and chloroplast genomes. A variety of splicing mechanisms are involved in removal of these introns. In some cases, the RNA is capable of self-splicing, with no requirement for the involvement of protein enzymes. Dr. Thomas Cech of the Department of Chemistry and Biochemistry was awarded a Nobel Prize for discovery of this unexpected phenomenon. In addition to the removal of introns, the processing of ribosomal RNAs involves cleaving a very large precursor into the smaller pieces that become the final functional RNAs in the ribosomes (figures 3.23 and 3.24). Transfer RNAs undergo extensive modification of their nucleotide bases (fig. 3.26), and in some cases also have a CCA sequence added to their 3'-ends.
Return to MCDB 2150 home page