Revised December 2, 1998

MCDB 2150 Lecture 37

Eukaryotic gene expression

Textbook Assignment: Chapter 19, pages 544-556 (to start of gene amplification). Also review chapter 12, pages 341-345 on transcription and processing of the transcript in eukaryotic cells, and chapter 17, pages 507-509 on the structural organization of eukaryotic genes. In addition, the portion of the lecture 23 notes on eukaryotic transcription and processing of the transcript needs to be reviewed at this time.

Major concepts (some of which were presented at an introductory level in lecture 23).

Introduction: The first part of this lecture summarizes concepts introduced briefly in lecture 23. (Because of time constraints both last year and this year at this point in the semester, some of this background material has been directly repeated -- there has simply not been enough time available to edit it to a shorter summary.) The remainder of the lecture expands on these concepts and presents a number of aspects of eukaryotic transcriptional control in greater detail. Those of you who take MCDB 3500 will explore many of these concepts in far greater depth than is possible in this initial survey course.

Basic properties of eukaryotic cells: Eukaryotic cells are characterized by the presence of a membrane-bound nucleus. This results in segregation of the genetic DNA and the enzymatic machinery for transcription and message processing into a separate subcellular compartment that has only limited communication to the cytoplasm, which contains the systems needed for translation of the mRNA and post-translational modification of the proteins, as well as targeting proteins to appropriate subcellular or extracellular locations.

Limited levels of control over gene expression in prokaryotic cells: In a prokaryotic cell translation normally starts immediately after transcription, and relatively little modification of the proteins is needed to make them functional. Thus, with the exception of various types of allosteric modification of protein function, nearly all control over functional gene expression is at the transcriptional level. Allosteric controls that we have observed during the semester include modification of function of operon repressor proteins by substrates (e.g. allolactose) or end products (e.g. tryptophan), and feedback inhibition, in which the first enzyme of a pathway is allosterically inhibited by the end product of that pathway, such as inhibition of conversion of chorismate to anthranilate by tryptophan or methyl-tryptophan.

Selective gene expression in eukaryotic cells: Individual cells in multicellular organisms exhibit highly selective gene expression. Differential gene expression allows a complex organism to contain diverse types of differentiated cells, which all have identical genomes, but nevertheless display widely different biochemical and structural properties. Thus, the genes expressed as functional proteins are very different in a muscle cell than in a liver cell or a red blood cell or a hormone-secreting cell of the anterior pituitary gland or the pancreas. In addition, there are many genes that are expressed only during specific parts of the life cycle, such as embryonic development. There are also numerous gene families whose members are expressed differentially during development and maturation, such as the globin gene family (Chapter 17, pages 509-511). To achieve this level of sophistication in control of gene expression, it is necessary to have far more complex controls over gene expression in eukaryotic cells than in prokaryotic cells.

Multiple levels of control over gene expression in eukaryotic cells: Chapter 19 of our textbook begins with a brief summary of the many different levels at which control over gene expression can be achieved in eukaryotic cells (see Figure 19.1). In addition to transcriptional controls, which will be the primary subject of this lecture, the controls include regulation of splicing, transport to the cytoplasm, stability of the mRNA, translation, and a variety of posttranslational modifications, as well as interactions with other gene products that modify the biological activity of the final protein products. We will briefly examine a number of these controls in the next lecture.

Eukaryotic RNA polymerases: Eukaryotic cells contain three different types of RNA polymerases in their nuclei, each of which has a distinctly different role. RNA polymerase I transcribes only ribosomal RNAs (18S, 28S, and 5.8S). The initial transcript of RNA polymerase I is a large precursor of all three of these rRNAs, which is then processed to yield the final rRNAs. RNA polymerase III transcribes a number of small RNA species that do not have protein coding functions, including transfer RNAs and 5S ribosomal RNA (distinct from 5.8S), as well as a number of small nuclear RNAs (snRNAs) that are involved in functions such as splicing of mRNA. RNA polymerase II transcribes all protein-coding sequences in eukaryotic cells, and is the only one of the three polymerases that we will analyze in any detail in this course. However, it should be noted that the other two polymerases also have complex regulatory interactions with transcription factors that are similar in principle to those we will be studying for RNA polymerase II.

Eukaryotic promoter sites: As described in lecture 23, eukaryotic promoters have highly diverse structures, such that it has not yet become possible to define generalized sequences that all promoters must have in order to function. Many promoters contain two consensus sequences: 1) a TATA box located about 30 bp upstream from the transcriptional start site, with a generalized consensus sequence of TATAAA that is rather variable from one promoter to another; and 2) a CCAAT box located somewhere around -75, with a consensus sequence of GGCCAATCT, again with substantial variation. However, as noted in lecture 23, there are numerous promoters that work perfectly well without either a TATA box or a CCAAT box.

Additional consensus sites: There are also a number of other consensus sequences that frequently occur in eukaryotic promoters or just upstream from them, as well as in enhancers (described below). These serve as binding sites for a wide variety of protein transcription factors that increase or decrease the extent of transcription, often in a tissue-specific manner. One such sequence is the GC box, which has a consensus sequence of GGGCGG. Although GC boxes are often encountered upstream from the CCAAT box of TATA box promoters, they also frequently appear as the most recognizable sequence in promoters that lack a TATA box.
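Illustration (not from the textbook): because the TATA, CCAAT, and GC boxes are consensus sequences rather than exact signatures, one simple way to look for them is to slide each consensus along a sequence and tolerate a small number of mismatches. The sketch below does only that. The function name find_motifs, the mismatch threshold, and the toy sequence are all hypothetical, and the scan ignores position relative to the start site, so a hit here is only a candidate, not evidence of a functional promoter element.

```python
# Consensus sequences quoted in the notes; real promoters vary around these.
CONSENSUS = {
    "TATA box": "TATAAA",
    "CCAAT box": "GGCCAATCT",
    "GC box": "GGGCGG",
}


def find_motifs(sequence, max_mismatches=1):
    """Return (name, start, matched_text, mismatches) for every window of
    the sequence that differs from a consensus by at most max_mismatches."""
    sequence = sequence.upper()
    hits = []
    for name, motif in CONSENSUS.items():
        for start in range(len(sequence) - len(motif) + 1):
            window = sequence[start:start + len(motif)]
            mismatches = sum(a != b for a, b in zip(window, motif))
            if mismatches <= max_mismatches:
                hits.append((name, start, window, mismatches))
    return hits


# Hypothetical toy sequence, invented for illustration only.
toy_promoter = "ACGGGCGGTTAGGCCAATCTACGTACGTTATAAAGGCTCA"

for hit in find_motifs(toy_promoter, max_mismatches=0):
    print(hit)
```

Raising max_mismatches makes the scan more permissive, which mirrors the point made above that these sequences are rather variable from one promoter to another.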

Transcription factor binding sites: Many other consensus sequences have been identified as specific binding sites for transcription factors. Much of the fine tuning of transcriptional control in eukaryotic cells is believed to be achieved in a combinatorial manner by complex interactions of multiple transcription factors with multiple consensus binding sequences for those factors.

Analysis of promoter sequences: Figure 19.4 demonstrates one of several methods of analysis of the functions of specific sequences within the general promoter region, in this case for the beta-globin gene. In this example, mutations have been introduced at specific sites and the effect on transcriptional initiation has been determined, probably with a reporter gene attached to the modified promoter. Point mutations in the TATA box and in the CCAAT box both reduce transcriptional initiation substantially but do not totally abolish it (emphasizing that these are consensus sequences whose function is weakened, but not totally abolished, by individual base changes). Mutations at most points between the two have little effect. However, in this example, mutations in a third sequence, GCCACACCC, whose function is not defined in the text, also seriously impair transcription. Note also that mutations just to the left of the CCAAT box substantially increase transcriptional initiation, suggesting the presence of a down-regulatory element in the wild-type promoter.

Enhancers and Silencers: In addition to the cis-acting sequences that are generally considered to be part of the promoter itself, a number of other cis-acting sequences can also influence the extent of transcription of a particular gene. Enhancers may sometimes be located immediately adjacent to promoter sequences, but they also function from more remote locations, up to thousands of base pairs upstream or downstream from the promoter, including the possibility of being located within the introns of the gene whose transcription is being enhanced. They also have the property of retaining their activities when inserted in a reversed direction. Enhancers appear to function as binding sites for gene-specific transcription factors that are capable of interaction with the overall transcriptional complex, probably through a bending of the DNA (Figure 19.13), which can happen even when the enhancer is quite distant from the promoter. Silencers are very similar, except that their function is to reduce or stop transcription of a particular gene, rather than to activate it. There are also upstream activator sequences (UAS) in yeast that are very similar in function to enhancers except that they are inactive when placed downstream from the transcriptional start site.
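Illustration (not from the textbook): one way to picture why an enhancer can still work when inserted in a reversed direction is that flipping a double-stranded segment simply places the same binding site on the opposite strand. A search for a binding site therefore has to check both the site and its reverse complement. The sketch below shows that check. The function names and the example element are hypothetical, and the GC box sequence is used only as a convenient stand-in for a factor binding site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_present_either_orientation(dna, site):
    """True if the site occurs on either strand of the DNA."""
    dna = dna.upper()
    site = site.upper()
    return site in dna or reverse_complement(site) in dna


# Hypothetical element containing one GGGCGG site, and the same element
# as it would read after being inserted in the reversed direction.
element = "AAGGGCGGTT"
flipped = reverse_complement(element)

print(site_present_either_orientation(element, "GGGCGG"))
print(site_present_either_orientation(flipped, "GGGCGG"))
```

Both calls report the site as present, even though the literal text GGGCGG no longer appears in the top strand of the flipped element.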

Eukaryotic transcription factors: The transcription factors for RNA polymerase II fall into two overall categories, commonly referred to as general transcription factors and gene-specific transcription factors. The general transcription factors interact to form a preinitiation complex, which is needed for the RNA polymerase to attach to the DNA and initiate transcription. In the case of RNA polymerase II, the amount of transcriptional initiation is quite low when only the preinitiation complex and RNA polymerase II are present. Gene-specific transcription factors are also needed to achieve a high level of transcription. The gene-specific factors are believed to function primarily through interaction with upstream promoter and enhancer sequences, although in many cases silencers or other down-regulatory elements are also involved in achieving the final level of transcription that is characteristic of the expression of a particular gene in a particular type of cell.

Properties of transcription factors: As a minimum, a transcription factor must possess two distinctly different domains. The first is a sequence-specific DNA binding domain. The factor must be able to recognize a specific DNA sequence and bind to it sufficiently strongly to be able to interact with the transcriptional initiation complex, which often requires bending the DNA (figure 19.13). In addition, the transcription factor must have a specific transcriptional activation domain that will interact with the transcriptional initiation complex in a manner that increases the rate of transcription. The textbook emphasizes the role of negative charges in the activation domain, but the overall story is far more complex, with several quite different classes of activation domains that superficially do not appear to share much in common structurally. The DNA binding domains are also of several different types, but these are much better defined. Three of these are described in the text and discussed briefly below.

Helix-turn-helix (HTH) DNA-binding domain: Variations of the HTH domain are found both in prokaryotic and eukaryotic cells. The lac and trp repressors that we have studied in previous lectures both belong to the HTH group. The configuration consists of two or more alpha-helical segments of protein, separated by sharp turns, such that the helices tend to lie across one another (Fig 19.8). This arrangement allows one of the helices to fit into the wide groove of the DNA, where its amino acid side chains can interact with the edges of stacked base pairs. A similar structure in eukaryotic cells is often referred to as a homeodomain. This name is based on a number of developmental regulatory proteins originally discovered in Drosophila as the products of genes whose mutations cause homeotic developmental changes, which alter the fates of particular tissues. The Antennapedia mutation that converts the antenna structures on the head of a Drosophila to legs (see figure 20.22, page 582 of the textbook) is an example of a homeotic mutation.

Zinc fingers: Another common DNA binding domain is the so-called zinc finger. The zinc finger contains a cluster of histidine and/or cysteine residues that form a coordination complex with zinc ion to fold the protein into a short finger-like loop. The amino acids in the loop interact with base pairs in the wide groove of the DNA in a sequence-specific manner. The textbook attempts to illustrate this interaction in Figure 19.9, but less than successfully.

Leucine zippers: The third DNA binding motif described in the textbook is the leucine zipper. In this case, two molecules of the transcription factor are bound together as a dimer by a series of leucine residues spaced every second turn of an alpha helix. Hydrophobic interactions between the leucine side chains cause a strong dimerization, which aligns adjacent basic sequences in the two proteins such that they can fit into the wide groove of the DNA in a sequence-specific manner (figure 19.10).
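Illustration (not from the textbook): an alpha helix has about 3.6 residues per turn, so "every second turn" corresponds to roughly every seventh residue. A leucine zipper can therefore be pictured as a run of leucines (L in the one-letter code) at positions i, i+7, i+14, and so on. The sketch below looks for such a run. The function name, the default of four repeats, and the toy peptide are hypothetical choices made for illustration, and a match is only suggestive, since this simple check says nothing about whether the segment actually forms a helix.

```python
def find_leucine_heptads(protein, min_repeats=4):
    """Return start positions where leucine (L) occurs at every seventh
    residue for at least min_repeats consecutive repeats."""
    protein = protein.upper()
    span = 7 * (min_repeats - 1)  # distance from first to last leucine
    starts = []
    for i in range(len(protein) - span):
        if all(protein[i + 7 * k] == "L" for k in range(min_repeats)):
            starts.append(i)
    return starts


# Hypothetical toy peptide: a leucine followed by six alanines, four times.
toy_peptide = "LAAAAAA" * 4

print(find_leucine_heptads(toy_peptide))
```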

Assembling the transcriptional initiation complex: The textbook briefly summarizes the steps that are involved in the assembly of a transcriptional initiation complex at the TATA box of a promoter (Figure 19.12). The process begins with a TATA box binding protein (TBP) and a group of interacting proteins known as TBP-associated factors (TAFs). This initial complex of proteins is known historically as TFIID (transcription factor D for RNA polymerase II). The actual interaction with the TATA box sequence of the DNA is through the TBP component (figure 19.11). This initial binding event causes binding sites for TFIIA and TFIIB to be formed. Binding of those factors opens the way for addition of TFIIF and RNA polymerase II, which itself is a complex of a number of molecular species. The further addition of TFIIE, TFIIH, and TFIIJ completes the initiation complex and allows a minimal level of transcription to be initiated at a precisely defined site downstream from the TATA box. Further enhancement (or repression) of that transcription is achieved by interactions with additional transcription factors that bind to upstream promoter elements, enhancers, and silencers. See figures 19.5 and 19.6 for examples of the levels of complexity that can be encountered in these sites.
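Illustration (not from the textbook): the assembly just described can be thought of as a series of steps, each of which becomes possible only after the previous step is complete. The sketch below encodes the order given in these notes and checks whether a particular factor could bind given what is already on the promoter. Treating the order as strictly stepwise, and treating the factors within one step as interchangeable, are simplifying assumptions made here. The names ASSEMBLY_STEPS, can_bind, and is_complete are hypothetical.

```python
# Order of addition as summarized in the notes. Factors listed together
# are treated as one step (an assumption made for this sketch).
ASSEMBLY_STEPS = [
    {"TFIID"},                       # TBP plus its associated factors
    {"TFIIA", "TFIIB"},
    {"TFIIF", "RNA polymerase II"},
    {"TFIIE", "TFIIH", "TFIIJ"},
]


def can_bind(factor, bound):
    """True if every factor from all earlier steps is already bound."""
    required = set()
    for step in ASSEMBLY_STEPS:
        if factor in step:
            return required <= set(bound)
        required |= step
    raise ValueError(f"unknown factor: {factor}")


def is_complete(bound):
    """True if every factor of the initiation complex is bound."""
    return set().union(*ASSEMBLY_STEPS) <= set(bound)


print(can_bind("TFIIB", set()))        # nothing bound yet
print(can_bind("TFIIB", {"TFIID"}))    # TFIID already on the TATA box
```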

Regulation of transcription factor activity: In many cases, the activity of transcription factors can be modified by environmental signals. The example cited in the textbook is the steroid hormone receptors, which function as ligand-activated transcription factors. In addition to DNA-binding sites and transcriptional activator sites, these receptors also contain hormone-binding sites. Allosteric changes that occur when the hormones are bound to their receptors allow the DNA binding sites to interact with specific DNA sequences known as hormone-responsive elements (HRE)(table 19.1). The interaction in this case involves zinc fingers. Binding of the activated receptor protein to the appropriate HRE allows its transcriptional activating domain to interact with an initiation complex and greatly increase the transcription of specific hormone-responsive genes.

Other control systems: Although not discussed in the textbook, there are also many other ways to modify the activity of specific transcription factors. One of the most common is phosphorylation by specific protein kinases, which in turn are often activated through extended signal transduction pathways, which transmit signals from cell surface receptors for peptide hormones or growth factors to the kinases. Yet another way of activating a transcription factor is to dissociate an inhibitor from it. The whole story is far more complex than can be presented in a first level course. You will hear a lot more about these topics in MCDB 3120, MCDB 3500, and MCDB 4650.

Methylation of inactive genes: On a gross chromosomal level, regions that carry genes that are inactive in a particular tissue are often heavily methylated on the inner cytosine residue of CCGG sequences (which are palindromic). This phenomenon can be studied with a pair of restriction endonucleases, Hpa II, which cuts at CCGG sequences only when they are not methylated, and Msp I, which cuts all CCGG sites, whether or not they are methylated. You will recall the use of Hpa II to detect regions of active gene function in the search for the cystic fibrosis gene.
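Illustration (not from the textbook): the logic of the Hpa II / Msp I comparison can be sketched as a pair of simulated digests of the same DNA. Any CCGG site that Msp I cuts but Hpa II does not must be methylated, which shows up as a larger Hpa II fragment. In the sketch below, the function names, the toy sequence, and the choice to record methylation as a set of site positions are all hypothetical. The cut is placed after the first C of CCGG, which is where both enzymes are generally reported to cut.

```python
def find_ccgg_sites(dna):
    """Return the start position of every CCGG in the sequence."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 3) if dna[i:i + 4] == "CCGG"]


def digest(dna, enzyme, methylated_sites=()):
    """Return fragment lengths after digestion with 'HpaII' or 'MspI'.

    methylated_sites holds the start positions of CCGG sites whose inner
    cytosine is methylated. HpaII skips those sites; MspI cuts them all.
    """
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError(f"unsupported enzyme: {enzyme}")
    methylated = set(methylated_sites)
    cuts = []
    for site in find_ccgg_sites(dna):
        if enzyme == "HpaII" and site in methylated:
            continue  # methylation blocks HpaII at this site
        cuts.append(site + 1)  # cut between the two C residues
    edges = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(edges, edges[1:])]


# Hypothetical 20-base toy sequence with CCGG sites at positions 4 and 14.
toy_dna = "AAAACCGGTTTTTTCCGGAA"

print(digest(toy_dna, "MspI", methylated_sites={14}))
print(digest(toy_dna, "HpaII", methylated_sites={14}))
```

With the second site methylated, the Msp I digest still yields three fragments while the Hpa II digest yields only two, and it is this difference in fragment pattern that reveals the methylated site.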

Induced demethylation: Incorporation of the base analog 5-azacytosine into DNA in the place of cytosine makes methylation impossible. Methylation is maintained by an enzyme that detects a methyl in the parent strand of the CCGG palindrome and then places a methyl in the comparable position in the newly synthesized strand. Thus, incorporation of 5-azacytosine during a single cycle of DNA synthesis is enough to permanently remove a methylated site by removing the template needed for methylation of subsequent generations. (Note that free pyrimidines are not readily taken up by the cells of most types of multicellular organisms. Because of this, the analog is normally presented as the nucleoside, 5-azacytidine.) This technique has been used to reactivate expression of genes from an inactive X-chromosome in cultured cells from female mammals, and in certain cases, to alter the differentiated state of cultured cells. Another interesting potential application that is in clinical trials is to use 5-azacytidine treatment in an attempt to reactivate fetal globin genes and thus alleviate the symptoms of sickle cell anemia by allowing epsilon and gamma globins to replace the defective beta-globin in diseased individuals.
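Illustration (not from the textbook): the template dependence of maintenance methylation can be sketched as a simple rule applied once per round of DNA synthesis. The sketch below follows a single strand lineage, in which each newly synthesized strand becomes the template for the next round, and is a deliberately simplified model: it does not track the other daughter duplex, which keeps the original methylated parent strand. The function names are hypothetical.

```python
def copy_methylation(template_methylated, analog_in_new_strand):
    """Maintenance rule: the new strand is methylated only if the template
    strand carries the methyl and the new strand contains normal cytosine
    (not 5-azacytosine) at the site."""
    return template_methylated and not analog_in_new_strand


def follow_new_strand_lineage(start_methylated, analog_by_generation):
    """Track methylation of one CCGG site along the lineage in which each
    newly made strand serves as the template for the next round.

    analog_by_generation lists, for each round of DNA synthesis, whether
    5-azacytosine was incorporated at the site.
    """
    state = start_methylated
    history = [state]
    for analog_present in analog_by_generation:
        state = copy_methylation(state, analog_present)
        history.append(state)
    return history


# Analog present only during the second round of synthesis.
print(follow_new_strand_lineage(True, [False, True, False, False]))
```

In this model the site stays unmethylated in every round after the analog is incorporated, even though the analog itself is no longer present, because each later strand is copied from an unmethylated template.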