Text Assignment: Chapter 3, pages 56-75 (up to the start of mRNA processing).
Important concepts
This is the third of five lectures reviewing basic concepts of molecular biology and the central dogma that are covered in MCDB 1150. This lecture focuses on transcription of genetic information from double-stranded DNA to single-stranded messenger RNA, as well as on the DNA-templated synthesis of RNA sequences that are not subsequently translated, such as ribosomal, transfer, and small nuclear RNAs.
Transcription: major concepts. The "central dogma" of molecular biology describes the transcription of genetic information from a DNA nucleotide triplet code to an RNA triplet code, followed by translation into specific amino acid sequences in proteins. DNA-templated RNA synthesis is quite similar to DNA synthesis, except that only one DNA strand serves as a template, so a single strand of RNA is made. The antisense (template) strand of the double-stranded DNA directs assembly of an RNA sequence that is the reverse complement of the antisense sequence. The resulting RNA molecule is identical in base sequence to the DNA sense strand, except that uracil replaces thymine, and its sugar-phosphate backbone contains ribose instead of deoxyribose.
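The base-pairing rule above can be checked with a short Python sketch. It assumes both DNA strands are written 5'-to-3' (the usual convention), and the helper names (`reverse_complement_dna`, `transcribe`, `sense_to_rna`) are hypothetical, not from the text. Because the polymerase reads the template 3'-to-5', the RNA is the reverse complement of the template, which works out to the sense strand with U in place of T.

```python
# Watson-Crick partners: DNA base -> complementary DNA base.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# DNA template base -> ribonucleotide placed opposite it in the RNA.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(strand_5to3: str) -> str:
    """Return the other DNA strand, also written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand_5to3.upper()))


def transcribe(template_5to3: str) -> str:
    """Build the RNA (5'->3') from a template (antisense) strand written 5'->3'.

    The polymerase reads the template 3'->5', so we walk it in reverse and
    pair each base, putting U opposite A.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5to3.upper()))


def sense_to_rna(sense_5to3: str) -> str:
    """The RNA has the same sequence as the sense strand, with U replacing T."""
    return sense_5to3.upper().replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCATTC"                       # hypothetical sense-strand fragment
    template = reverse_complement_dna(sense)  # "GAATGCCAT"
    rna = transcribe(template)
    print(template, rna)                      # GAATGCCAT AUGGCAUUC
    assert rna == sense_to_rna(sense)
```

Both routes give the same RNA, which is the point of the paragraph: the transcript "reads like" the sense strand even though it is built against the antisense strand.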
Promoters: Only specific parts of the DNA are transcribed: those that correspond to protein-coding sequences or to the various types of cellular RNA that function without being translated. Site-specific transcription is initiated by the interaction of RNA polymerases and additional proteins called transcription factors with specific upstream sequences known as promoters. Once initiated, transcription proceeds in a 5'-to-3' direction by the addition of ribonucleoside triphosphates to the free 3'-hydroxyl group at the end of the growing chain. Energy for bond formation is derived from splitting off a pyrophosphate as each phosphodiester bond in the RNA backbone is formed (Figure 3.12). Transcription continues until a specific termination site is reached, or until sequence-specific cleavage of the growing chain takes place. Details of transcription, including the nature of the RNA polymerases and of the promoters they interact with, differ substantially between prokaryotic and eukaryotic cells.
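The chain-extension step described in the paragraph above can be summarized as a single reaction. In this sketch, NTP stands for any of the four ribonucleoside triphosphates (ATP, GTP, CTP, UTP) and PP_i for the inorganic pyrophosphate that is split off; the notation is illustrative rather than taken from the text.

```latex
\mathrm{RNA}_{n}\text{-}3'\text{-OH} \;+\; \mathrm{NTP}
\;\longrightarrow\;
\mathrm{RNA}_{n+1}\text{-}3'\text{-OH} \;+\; \mathrm{PP_i}
```

Each cycle adds one nucleotide to the 3' end and leaves a new free 3'-hydroxyl, which is why the chain can only grow in the 5'-to-3' direction.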
Prokaryotic transcription
Promoters: Nucleotides in a transcript (and in the corresponding DNA sense strand) are normally numbered from the 5' end toward the 3' end, with the first transcribed nucleotide designated +1. Upstream nucleotides in the DNA sense strand carry negative numbers, starting with -1 immediately upstream of the transcriptional start site; there is no position 0. Typical prokaryotic promoters have a consensus sequence of TATAAT at about -10 and a consensus sequence of TTGACA at about -35. As shown in Table 3.3, the actual sequences of individual genes can vary substantially from these consensus sequences.
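The numbering convention and the idea of a consensus sequence that tolerates variation can be sketched in Python. This is illustrative only, under stated assumptions: the promoter sequence is invented, a candidate site is scored simply by counting mismatches against the consensus, and the search windows (first base of the -35 box between -40 and -28, first base of the -10 box between -15 and -5) are arbitrary choices, not values from the text. The function names are hypothetical.

```python
def coordinate(index: int, tss_index: int) -> int:
    """Convert a 0-based index in the sense strand to promoter numbering.

    The transcriptional start site (tss_index) is +1, the base just upstream
    is -1, and there is no position 0.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def mismatches(site: str, consensus: str) -> int:
    """Count positions where a candidate site differs from the consensus."""
    return sum(1 for a, b in zip(site, consensus) if a != b)


def best_site(sense: str, tss_index: int, consensus: str, lo: int, hi: int):
    """Return (coordinate, site, mismatches) for the closest match to
    `consensus` whose first base lies between coordinates lo and hi,
    or None if no window fits.
    """
    best = None
    for start in range(len(sense) - len(consensus) + 1):
        if not lo <= coordinate(start, tss_index) <= hi:
            continue
        site = sense[start:start + len(consensus)]
        score = mismatches(site, consensus)
        if best is None or score < best[2]:
            best = (coordinate(start, tss_index), site, score)
    return best


if __name__ == "__main__":
    # Hypothetical 40-base upstream region followed by the +1 base.
    upstream = "G" * 5 + "TTGACA" + "G" * 17 + "TATAAT" + "G" * 6
    sense = upstream + "ACGCGC"
    tss = len(upstream)  # index of +1
    print(best_site(sense, tss, "TTGACA", -40, -28))  # (-35, 'TTGACA', 0)
    print(best_site(sense, tss, "TATAAT", -15, -5))   # (-12, 'TATAAT', 0)
```

A real gene would usually return a nonzero mismatch count (for example, TATGTT differs from TATAAT at two positions), which is the kind of variation Table 3.3 illustrates.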
Prokaryotic RNA polymerase: A typical bacterial RNA polymerase core enzyme contains four major subunits: two identical alpha subunits, one beta subunit, and one beta-prime subunit; a small omega subunit is usually also present. The core polymerase can elongate transcripts that have already been initiated, and it can also initiate transcription nonspecifically at a low level. Addition of a sigma subunit converts the core polymerase to a polymerase holoenzyme, which is capable of a high level of promoter-specific initiation. The sigma subunit drops off after successful initiation, leaving the core enzyme to complete elongation. As we will see later in the semester, the use of alternative sigma factors allows initiation from different sets of promoters in special situations, such as bacterial sporulation and some types of bacterial virus infection.
Intrinsic terminator: For many prokaryotic genes, transcription terminates when the polymerase transcribes an intrinsic terminator signal. The termination signal has two parts. The first is a sequence that base-pairs with itself to form a stem-loop (hairpin) structure. The stem-loop is immediately followed by a consensus sequence of UUUUUUA. Formation of the hairpin causes transcription to pause temporarily. The base pairing within the stem loop is believed to reduce base pairing between the newly synthesized RNA and the template DNA. Once the stem loop has formed, the RNA strand remains attached to the template DNA only through A:U base pairs in the U-rich region. Because these are weaker than G:C base pairs, they may not be strong enough to keep the RNA strand from detaching, which terminates transcription.
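The two-part structure of an intrinsic terminator can be expressed as a simple pattern search. The Python sketch below is illustrative only: the stem length, loop length, and minimum U run are hypothetical fixed parameters, only Watson-Crick pairs are allowed in the stem (G:U wobble pairs are ignored), and the folding energetics that actually govern hairpin formation are not modeled.

```python
# Watson-Crick pairs allowed in the stem (G:U wobble pairs are ignored here).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def find_intrinsic_terminator(rna: str, stem: int = 6, loop: int = 4, min_u: int = 4) -> int:
    """Return the index where a candidate terminator hairpin starts, or -1.

    A candidate is `stem` bases that can pair, antiparallel, with `stem`
    bases located `loop` bases further on, immediately followed by at
    least `min_u` U's.
    """
    hairpin_len = 2 * stem + loop
    for i in range(len(rna) - hairpin_len - min_u + 1):
        left = rna[i:i + stem]
        right = rna[i + stem + loop:i + hairpin_len]
        # The stem is antiparallel: left[k] pairs with right[stem - 1 - k].
        if all((left[k], right[stem - 1 - k]) in PAIRS for k in range(stem)):
            tail = rna[i + hairpin_len:i + hairpin_len + min_u]
            if tail == "U" * min_u:
                return i
    return -1


if __name__ == "__main__":
    # Hypothetical transcript: GC-rich hairpin (GCCGCC...GGCGGC), then a U run.
    rna = "AUAC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUUA" + "CAC"
    print(find_intrinsic_terminator(rna))  # 4
```

Requiring both parts mirrors the text: a hairpin alone (without the U run) is not reported, because the sketch checks the tail only after a stem has been found.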
Rho-dependent termination: A second form of termination is called rho-dependent because it requires the protein factor rho. This system may serve to stop transcription soon after the polymerase has passed the end of the last coding sequence on the mRNA. Rho appears to cause termination at poorly defined sites where the RNA is rich in C. Another factor, NusA, associates with the core RNA polymerase, possibly at the sigma-binding site, and periodically slows transcription so that the first ribosome translating the newly synthesized message stays close behind the RNA polymerase. When translation stops, the ribosome falls off and no longer blocks rho's access to C-rich sites that have just emerged from the polymerase (see Figure 3.14).
Eukaryotic transcription
Separation of transcription and translation: Eukaryotic cells are characterized by a membrane-bound nucleus, which segregates the genomic DNA and the enzymatic machinery for transcription and message processing into a separate subcellular compartment. The nucleus has only limited communication with the cytoplasm, which contains the systems needed for translation of mRNA, post-translational modification of proteins, and targeting of proteins to appropriate subcellular or extracellular locations. As a result, eukaryotic transcription and translation take place in different compartments, unlike in prokaryotes, where ribosomes can begin translating an mRNA while it is still being transcribed.
Selective gene expression: Eukaryotic cells are capable of selective gene expression. In "simple" unicellular eukaryotes such as yeast, alternative mating types are displayed, and proteins associated with progression through the mitotic cell cycle are expressed at different times. More complex multicellular organisms, such as ourselves, are composed of diverse types of "differentiated" cells with widely different biochemical and structural properties. This cellular diversity is achieved primarily through highly selective expression of different genes within the genome that all of the cells share. We will examine some of the mechanisms responsible for such control in lecture 13.
Eukaryotic RNA polymerases: Eukaryotic cells contain three different types of RNA polymerases in their nuclei, each of which has a distinctly different role.
RNA polymerase I transcribes only ribosomal RNAs (18S, 28S, and 5.8S). Its initial transcript is a large precursor containing all three of these rRNAs, which is then processed to yield the mature rRNAs.
RNA polymerase II transcribes all protein-coding sequences in eukaryotic cells. RNA polymerase II is a complex molecular machine whose structural details are not yet fully understood. It is capable of initiating transcription selectively from a variety of types of promoters, working in conjunction with complex sets of transcription factors and influenced by regulatory sequences that may be either adjacent to the promoter or relatively distant from it. RNA polymerase II is the only one of the three polymerases that we will have time to analyze in any detail. However, it should be noted that the other two polymerases also have complex regulatory interactions with transcription factors that are similar in principle to those we will be studying for RNA polymerase II.
RNA polymerase III transcribes a number of small RNA species that do not code for proteins, including transfer RNAs, 5S ribosomal RNA (distinct from 5.8S rRNA), and some small RNAs involved in nuclear functions, such as U6 snRNA, which participates in mRNA splicing. (Most of the other small nuclear RNAs involved in splicing are transcribed by RNA polymerase II.)
Eukaryotic promoter sites: Eukaryotic promoters are so diverse that few generalizations apply to all of them. Many, but not all, have a consensus sequence called the TATA box, located about 30 bp upstream from the transcriptional start site (commonly referred to as -30). The consensus sequence is TATAAA, but there is considerable variation. This sequence resembles the prokaryotic -10 box (TATAAT), but the eukaryotic TATA box is located substantially further upstream. Many promoters have no recognizable TATA box; in such cases, the transcriptional initiation site is usually less precisely defined, and transcription starts at several different locations. A second upstream element often found in eukaryotic promoters is the CCAAT box, commonly called the "cat box". It has a consensus sequence of GGCCAATCT, again with substantial variation, particularly at the ends. When present, it is typically located near -75, relative to the transcriptional start site.
Binding of transcription factors to cis-acting sequences: The TATA and CCAAT box sequences are "cis-acting", meaning that to have an effect they must be part of the same continuous DNA double helix as the sequence whose transcription they promote. Many other cis-acting consensus sequences also occur in eukaryotic promoters or just upstream from them, as well as in enhancers and silencers (described below). The cis-acting sequences serve as binding sites for a wide variety of specialized proteins called "transcription factors". These proteins, which are described in greater detail later in these notes, can bind selectively to specific types of cis-acting DNA sequences and also interact with the RNA polymerase in ways that affect the frequency of transcriptional initiation, often in a tissue-specific manner.
GC box: One frequently encountered cis-acting sequence is the GC box, which has a consensus sequence of GGGCGG. Although GC boxes often occur upstream from the CCAAT box in TATA-box promoters, they also frequently appear as the most recognizable sequence in promoters that lack a TATA box. Many other consensus sequences have been identified as specific binding sites for transcription factors. Much of the fine-tuning of transcriptional control in eukaryotic cells is believed to be achieved combinatorially, through complex interactions of multiple transcription factors with multiple binding sequences for those factors.
Analysis of promoter sequences: Figure 3.9 illustrates one of several methods for analyzing the functions of specific sequences within the general promoter region, in this case for the mouse beta-globin gene. Mutations were introduced at specific sites and their effect on transcriptional initiation was measured, probably with a reporter gene attached to the modified promoter. Point mutations in the TATA box and in the CCAAT box both reduce transcriptional initiation substantially but do not totally abolish it (emphasizing that these are consensus sequences whose function is weakened, but not eliminated, by individual base changes). Mutations at most points between the two have little effect. However, mutations in a third sequence, GCCACACCC, whose function is not defined in the text, also seriously impair transcription. Note also that mutations just to the left of (upstream from) the CCAAT box substantially increase transcriptional initiation, suggesting the presence of a down-regulatory element in the wild-type promoter.
Enhancers and Silencers: In addition to the cis-acting sequences that are generally considered part of the promoter itself, a number of other cis-acting sequences can influence how much a particular gene is transcribed. Members of one interesting subclass are called enhancers. Although enhancers may in some cases be located immediately adjacent to promoter sequences, they have two additional properties that distinguish them from true components of the promoter. First, they can also function from more remote locations, up to thousands of base pairs upstream or downstream from the promoter, including within the introns of the gene whose transcription they enhance. Second, they retain their activity when inserted in the reverse orientation. Enhancers appear to function as binding sites for gene-specific transcription factors that can interact with the overall transcriptional complex, probably through bending of the DNA (Figure 3.11), which can happen even when the enhancer is quite distant from the promoter. Silencers are very similar, except that they reduce or stop transcription of a particular gene rather than activating it. Yeast also have upstream activator sequences (UAS), which function much like enhancers except that they are inactive when placed downstream from the transcriptional start site.
Eukaryotic transcription factors: Transcription factors are proteins that interact with specific consensus sequences in promoters, enhancers, and silencers to facilitate or modify transcription. The transcription factors for RNA polymerase II fall into two overall categories, commonly referred to as general transcription factors and gene-specific transcription factors. The general transcription factors interact with the promoter to form a preinitiation complex, which allows the RNA polymerase to attach to the DNA and initiate transcription. However, the amount of transcriptional initiation is quite low when only the preinitiation complex and RNA polymerase II are present. Higher levels of transcription require the presence of gene-specific transcription factors in addition to the general transcription factors. The gene-specific factors are believed to function primarily through interaction with upstream promoter and enhancer sequences, although in many cases silencers or other down-regulatory elements are also involved in achieving the final level of transcription that is characteristic of the expression of a particular gene in a particular type of cell.
Assembling the transcriptional initiation complex: The textbook briefly summarizes the steps involved in assembling a transcriptional initiation complex at the TATA box of a promoter (page 66 and Figure 3.10). The process begins with TATA-box binding protein (TBP) and a group of interacting proteins known as TBP-associated factors (TAFs). This initial complex of proteins is known historically as TFIID (transcription factor D for RNA polymerase II). The actual interaction with the TATA box sequence of the DNA is through the TBP component (Figure 3.10). This initial binding event creates binding sites for TFIIA and TFIIB. Binding of those factors opens the way for addition of TFIIF and RNA polymerase II, which is itself a complex of many subunits. The further addition of TFIIE, TFIIH, and TFIIJ completes the initiation complex and allows a minimal level of transcription to be initiated at a precisely defined site downstream from the TATA box. Further enhancement (or repression) of that transcription is achieved by interactions with additional transcription factors that bind to upstream promoter elements, enhancers, and silencers (Figure 3.11).
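The ordered assembly described in the paragraph above can be represented as a list of stages, where each stage becomes possible only after all earlier stages are complete. This Python sketch is a deliberate simplification that encodes only the order given in these notes; the stage grouping and the `next_to_bind` function are hypothetical, and the model ignores the dynamics, partial complexes, and alternative pathways that occur in real cells.

```python
# Stages of assembly at a TATA-box promoter, in the order given in these notes.
# Each stage's factors can bind only after every earlier stage is complete.
ASSEMBLY_ORDER = [
    {"TFIID"},                      # TBP (with its TAFs) binds the TATA box
    {"TFIIA", "TFIIB"},             # binding sites created by TFIID binding
    {"TFIIF", "RNA polymerase II"},
    {"TFIIE", "TFIIH", "TFIIJ"},    # completes the initiation complex
]


def next_to_bind(bound: set) -> set:
    """Return the factors that can bind next, given those already bound.

    An empty set means the initiation complex is complete.
    """
    for stage in ASSEMBLY_ORDER:
        missing = stage - bound
        if missing:
            return missing
    return set()


if __name__ == "__main__":
    bound = set()
    while True:
        step = next_to_bind(bound)
        if not step:
            break
        print(sorted(step))  # one line per assembly stage
        bound |= step
```

Representing each stage as a set captures the notes' point that TFIIA and TFIIB (and likewise TFIIE, TFIIH, and TFIIJ) are added at the same step, without implying an order within a step.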
Elongation: At least one specific transcription factor, TFIIS, contributes to transcript elongation in eukaryotic cells. There are also some unusual systems in which transcription is increased partly by reversing a transcriptional stalling phenomenon (not described in our text).
Termination: Termination of transcription by RNA polymerase II appears to be poorly defined in eukaryotic cells. Most protein-coding transcripts contain a consensus sequence, AAUAAA, that signals cleavage of the transcript about 11-30 nucleotides further downstream (Figure 3.15). The cleavage site then becomes the starting point for polyadenylation of the messenger RNA (described in the next lecture). However, transcription often continues for a substantial distance beyond the cleavage site.
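The relationship between the AAUAAA signal and the downstream cleavage region can be sketched as a simple search. This Python example is illustrative only: it assumes the 11-30 nucleotide distance is counted from the last base of the AAUAAA hexamer (a hypothetical convention for this sketch), uses 0-based indices, and ignores the other sequence elements that help select the actual cleavage site. The function name is hypothetical.

```python
def cleavage_windows(rna: str, signal: str = "AAUAAA", near: int = 11, far: int = 30):
    """For each copy of the polyadenylation signal, return
    (signal_start, window_start, window_end) as 0-based indices, where the
    window spans `near` to `far` nucleotides past the signal's last base.
    Windows are clipped to the end of the sequence.
    """
    windows = []
    start = rna.find(signal)
    while start != -1:
        last = start + len(signal) - 1        # index of the signal's last base
        lo = last + near
        hi = min(last + far, len(rna) - 1)
        if lo <= hi:                          # skip windows beyond the sequence
            windows.append((start, lo, hi))
        start = rna.find(signal, start + 1)   # allow overlapping copies
    return windows


if __name__ == "__main__":
    rna = "G" * 10 + "AAUAAA" + "C" * 40      # hypothetical 3' region
    print(cleavage_windows(rna))              # [(10, 26, 45)]
```

The output is a range rather than a single position, reflecting the text's point that cleavage happens somewhere about 11-30 nucleotides downstream rather than at a fixed distance.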
More details later in the course: We will return to transcriptional control in lectures 11-13. At that time, we will examine mechanisms responsible for selective control over transcription of specific genes.