Textbook Assignment: Chapter 8, Pages 237 - 244. You may also want to review those parts of pages 58 - 73 (lecture 4) that describe transcription by RNA polymerase II in eukaryotic cells.
Major concepts
Introduction: This lecture builds on the basic information about transcription of protein-coding genes by RNA polymerase II that was presented in lecture 4 (textbook pages 58-73). This lecture focuses on mechanisms that control the extent of transcription of specific genes in various types of cells and under various environmental conditions. Control over expression of genes coding for the enzymes needed for utilization of galactose by yeast cells is examined as an example of such controls. MCDB 3500 will explore eukaryotic gene regulatory mechanisms in far greater depth than is possible in this initial survey course.
Limited levels of control over gene expression in prokaryotic cells: In a prokaryotic cell translation normally starts immediately after transcription, and relatively little modification of the proteins is needed to make them functional. Thus, with the exception of various types of allosteric modification of protein function, nearly all control over functional gene expression is at the transcriptional level. Allosteric controls that we have already observed include modification of function of operon repressor proteins by substrates (e.g. allolactose) or end products (e.g. tryptophan). Additional examples will be examined in the next lecture.
Selective gene expression in eukaryotic cells: Individual cells in multicellular organisms exhibit highly selective gene expression. Differential gene expression allows a complex organism to contain diverse types of differentiated cells, nearly all of which have identical genomes, but nevertheless display widely different biochemical and structural properties. Thus, the genes expressed as functional proteins are very different in a muscle cell than in a liver cell or a skin cell or a hormone-secreting cell of the anterior pituitary gland or the pancreas. In addition, there are many genes that are expressed only during specific parts of the life cycle, such as embryonic development. There are also numerous gene families whose members are expressed differentially during development and maturation, such as the globin gene family (described on page 313 of our textbook -- see figure 10.27). To achieve this level of sophistication in control of gene expression, it is necessary to have far more complex controls over gene expression in eukaryotic cells than in prokaryotic cells. This lecture examines transcriptional controls. Controls at diverse other levels between the gene and the final functional gene product will be examined in the next lecture.
Eukaryotic promoter sites: As described briefly in lecture 4, eukaryotic promoters for RNA polymerase II have highly diverse structures, such that it has not yet become possible to define generalized sequences that all promoters must have in order to function. Many promoters contain two consensus sequences: 1) a TATA box located about 30 bp upstream from the transcriptional start site, with a generalized consensus sequence of TATAAA that is rather variable from one promoter to another; and 2) a CCAAT box located somewhere around -75, with a consensus sequence of GGCCAATCT, again with substantial variation. However, as noted in lecture 4, there are numerous promoters that work perfectly well without either a TATA box or a CCAAT box.
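The idea of a variable consensus sequence at an approximate position can be illustrated with a short Python sketch. It scans the region upstream of a start site for approximate matches to the two consensus sequences quoted above. The function name, the mismatch tolerance, and the 100 bp toy sequence are assumptions made for illustration only; they are not taken from the textbook or from any real gene.

```python
# Consensus sequences quoted in the lecture; real promoters vary around them.
TATA_CONSENSUS = "TATAAA"
CCAAT_CONSENSUS = "GGCCAATCT"


def find_motif(upstream, motif, max_mismatches=1):
    """Return (position, matched_text) pairs for approximate motif hits.

    `upstream` is the sequence immediately 5' of the start site, written
    5'->3', so its last base is position -1. Each hit is reported by the
    position of its first base relative to the start site.
    """
    hits = []
    for i in range(len(upstream) - len(motif) + 1):
        window = upstream[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i - len(upstream), window))
    return hits


# Hypothetical 100 bp upstream region (invented, not a real promoter):
# a CCAAT box starting at -75 and a TATA box starting at -30.
toy_upstream = "C" * 25 + CCAAT_CONSENSUS + "C" * 36 + TATA_CONSENSUS + "C" * 24

print(find_motif(toy_upstream, TATA_CONSENSUS, max_mismatches=0))
print(find_motif(toy_upstream, CCAAT_CONSENSUS, max_mismatches=0))
```

Raising max_mismatches loosens the search, which mirrors the point made above that these boxes differ from one promoter to another. A promoter that lacks both boxes simply returns no hits, which does not mean it is nonfunctional.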
Additional consensus sites: There are also numerous other consensus sequences that influence the expression of eukaryotic genes. These sequences are sometimes found as parts of the promoters for specific genes, and in other cases may be located quite far from the immediate promoters. Sequences that must remain reasonably close to the transcriptional start site and that are inactive when reversed in direction are usually considered to be part of the promoter. Sequences that can stimulate transcription from a greater distance either upstream or downstream from the transcriptional start site and that remain active when reversed in polarity are called enhancers. Sequences with similar properties that serve to reduce or stop transcription are called silencers. There are also upstream activator sequences (UAS) in yeast that are very similar in function to enhancers except that they are inactive when placed downstream from the transcriptional start site.
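The operational definitions in the preceding paragraph can be restated as a small decision rule. The Python sketch below is only a study aid: the function and its four yes/no parameters are hypothetical, and real regulatory elements do not always fall cleanly into these categories.

```python
def classify_element(works_at_distance, orientation_independent,
                     works_downstream, stimulates):
    """Classify a regulatory sequence using the definitions in these notes.

    All four arguments are booleans describing experimental behavior:
    does the element still work when moved far from the start site, when
    reversed in polarity, when placed downstream, and does it raise
    (True) or lower (False) transcription?
    """
    if not works_at_distance and not orientation_independent:
        return "promoter element"
    if works_at_distance and orientation_independent:
        if not stimulates:
            return "silencer"
        if works_downstream:
            return "enhancer"
        return "upstream activating sequence (UAS)"
    # Mixed behavior is not covered by the simple definitions above.
    return "unclassified"


print(classify_element(True, True, True, True))
print(classify_element(True, True, False, True))
```

The two calls differ only in whether the element works downstream of the start site, which is the single property that separates a classical enhancer from a yeast UAS in the definitions above.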
Binding sites for activators and repressors: Many of the consensus sequences associated with promoters, enhancers, upstream activating sequences, and silencers have been identified as specific binding sites for regulatory proteins known as activators (also called transcription factors) and repressors. Much of the fine tuning of transcriptional control in eukaryotic cells is believed to be achieved in a combinatorial manner by complex interactions of multiple activators and repressors with multiple consensus binding sequences for those factors. The nomenclature for these factors is in flux. Our textbook frequently uses the newer term "activators" to describe protein factors that are not part of the basal transcription initiation complex, but serve instead to regulate transcription from sites outside the immediate minimal promoter (figure 8.20). Many older textbooks refer to these as a subclass of transcription factors, and our textbook also tends to revert to that nomenclature when describing the details of these factors.
Eukaryotic transcription factors: The transcription factors for RNA polymerase II fall into two overall categories, commonly referred to as general transcription factors and gene-specific transcription factors. The general transcription factors interact to form a preinitiation complex, which is needed for the RNA polymerase to attach to the DNA and initiate transcription. In the case of RNA polymerase II, the amount of transcriptional initiation is quite low when only the preinitiation complex and RNA polymerase II are present. Gene-specific transcription factors (activators) are also needed to achieve a high level of transcription. The gene specific factors are believed to function primarily through interaction with upstream promoter and enhancer sequences, although in many cases silencers or other down-regulatory elements are also involved in achieving the final level of transcription that is characteristic of the expression of a particular gene in a particular type of cell (figure 8.20). Note that a variety of proteins known as "coactivators" form links between the activators and the basal transcription complex.
Properties of transcription factors: As a minimum, a transcription factor must possess two distinctly different domains. The first is a sequence-specific DNA binding domain. The factor must be able to recognize a specific DNA sequence and bind to it sufficiently strongly to be able to interact with the transcriptional initiation complex, which often requires bending the DNA (figure 8.20). In addition, the transcription factor must have a specific transcriptional activation domain that will interact with coactivators in the transcriptional initiation complex in a manner that increases the rate of transcription. The textbook emphasizes the role of a glutamine-rich region in the activation domain of transcription factors such as sp-1 (figure 8.26). There are also several other types of activation domains that superficially do not appear to share much in common structurally. One other type you should be aware of carries a strong negative charge, either because of the presence of acidic amino acids, or because it has been activated by phosphorylation of hydroxyl groups carried on serine, threonine or tyrosine.
Helix-turn-helix (HTH) DNA-binding domain: Variations of the HTH domain are found both in prokaryotic and eukaryotic cells. The lac and trp repressors that we have studied in previous lectures both belong to the HTH group. The configuration consists of two or more alpha-helical segments of protein, separated by sharp turns, such that the helices tend to lie across one another (Fig 8.23). This arrangement allows one of the helices to fit into the wide groove of the DNA, where its amino acid side chains can interact with the edges of stacked base pairs. A similar structure in eukaryotic cells is often referred to as a homeodomain. This name is based on a number of developmental regulatory proteins originally discovered in Drosophila as the products of genes whose mutations cause homeotic developmental changes, which alter the fates of particular tissues. The Antennapedia mutation, which converts the antenna structures on the head of a Drosophila to legs, is an example of a homeotic mutation.
Zinc fingers: Another common DNA binding domain is the so-called zinc finger. The zinc finger contains a cluster of histidine and/or cysteine residues that form a coordination complex with zinc ion to fold the protein into a short finger-like loop. The amino acids in the loop interact with base pairs in the wide groove of the DNA in a sequence-specific manner. The textbook illustrates this interaction in figure 8.22. The steroid hormone receptors are an example of transcription factors that contain zinc fingers.
Leucine zippers: The third DNA binding motif described in the textbook is the leucine zipper. In this case, two molecules of the transcription factor are bound together as a dimer by a series of leucine residues spaced every second turn of an alpha helix. Hydrophobic interactions between the leucine side chains cause a strong dimerization, which aligns adjacent basic sequences in the two proteins such that they can fit into the wide groove of the DNA in a sequence-specific manner (figure 8.24).
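Because an alpha helix has about 3.6 residues per turn, "every second turn" corresponds to a leucine at roughly every seventh position in the amino acid sequence. The Python sketch below looks for that spacing in a protein sequence written in one-letter code. The function name, the minimum of four repeats, and the toy peptide are assumptions chosen for illustration; the peptide is invented and is not a real transcription factor.

```python
def heptad_leucine_runs(protein, min_repeats=4):
    """Return (start_index, repeat_count) for runs of Leu at 7-residue spacing."""
    runs = []
    for start in range(len(protein)):
        if protein[start] != "L":
            continue
        # Skip leucines that merely continue a run begun 7 residues earlier.
        if start >= 7 and protein[start - 7] == "L":
            continue
        count = 0
        pos = start
        while pos < len(protein) and protein[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


# Invented peptide with leucine (L) at every seventh position.
toy_peptide = "MK" + "LAEKAQR" * 4 + "G"
print(heptad_leucine_runs(toy_peptide))
```

A hit of this kind only suggests a possible zipper. The sketch checks spacing alone and says nothing about whether the segment is actually helical or whether a basic DNA-binding region lies next to it.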
Regulation of transcription factor activity: In many cases, the activity of transcription factors can be modified by environmental signals. Steroid hormone receptors, which function as ligand-activated transcription factors, are a good example. In addition to DNA-binding sites and transcriptional activator sites, these receptors also contain hormone-binding sites. When steroid hormones bind to their receptors, allosteric changes occur that allow the DNA binding sites on the receptors, which form dimers, to interact with specific DNA sequences known as hormone-responsive elements (HRE). (Please note that the receptor for each type of steroid hormone binds to a different HRE. The HRE sequence presented in the textbook is one version of the glucocorticoid response element consensus sequence, and is not applicable to steroid hormones in general. The receptors for other steroid hormones, such as estrogens, progestins, and androgens, bind to different HREs.) The interaction between a steroid hormone receptor and its HRE involves zinc fingers. Binding of the hormone-activated receptor to the appropriate HRE allows the transcriptional activating domain of the receptor to interact with coactivators in an initiation complex and greatly increase the amount of transcription of specific hormone-responsive genes (figure 8.21). The Theoretical Biophysics Group at the University of Illinois Urbana-Champaign has a very nice web site that provides many details about members of the steroid receptor superfamily, also known as nuclear hormone receptors. You can see many additional details by following the links that are provided. In addition, many of the small figures can be enlarged by clicking on them.
Other control systems: Although not discussed in the textbook, there are also many other ways to modify the activity of specific transcription factors. One of the most common is phosphorylation by specific protein kinases, which in turn are often activated through extended signal transduction pathways, which transmit signals from cell surface receptors for peptide hormones or growth factors to the kinases. Yet another way of activating a transcription factor is to dissociate an inhibitor from it. The whole story is far more complex than can be presented in a first level course. You will hear a lot more about these topics in MCDB 3120, MCDB 3500, and MCDB 4650.
Methylation of inactive genes: On a gross chromosomal level, regions that carry genes that are inactive in a particular tissue often have CG doublets (which are palindromic) that are heavily methylated at the 5-position of both cytosines. There are enzymes that make it possible for methylation of palindromic sequences to be retained in a stable fashion as cells replicate their DNA. We have already seen an example in mismatch repair in bacteria, where a deoxyadenosine methylase detects a methylated adenosine in a palindromic GATC sequence in the parental strand and places a methyl group in the same position in the palindrome in the newly synthesized strand. A very similar phenomenon occurs in CG doublets, which are also palindromic. The textbook provides evidence in boxed example 8.4 that the inhibitory effect of methylation on gene expression is mediated indirectly, with a specific protein factor that binds to the methylated cytosines functioning as an inhibitor of expression of the methylated genes.
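The reason a palindromic site allows methylation to be copied after replication can be shown with a short Python sketch. A site is palindromic in this sense when it reads the same 5'->3' on both strands, so a methylated C in a CG doublet on the parental strand always has a partner C in the same doublet on the new strand. The function names and the toy sequence are assumptions for illustration, and the sketch greatly simplifies what the maintenance enzymes actually do.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def maintenance_methylate(top, top_methylated):
    """Copy CG methylation from a parental top strand to a new bottom strand.

    `top_methylated` holds top-strand indices of methylated cytosines.
    The result gives, in top-strand coordinates, the base pairs whose
    bottom-strand cytosine would be methylated. In a CG doublet at
    indices i and i+1, the bottom-strand C pairs with the top-strand G
    at index i+1.
    """
    return {i + 1 for i in top_methylated if top[i:i + 2] == "CG"}


print(is_palindromic("CG"), is_palindromic("GATC"), is_palindromic("GATT"))

# Invented sequence with two CG doublets; only the first is methylated.
toy_top = "ACGTTCGA"
print(maintenance_methylate(toy_top, {1}))
```

The unmethylated doublet in the toy sequence stays unmethylated on the new strand, which is how a pattern of methylated and unmethylated regions can be passed on through cell division.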
Galactose metabolism in yeast: The following description includes several details about the regulation of galactose metabolism in yeast that are not included in our current textbook. In order for the yeast, Saccharomyces cerevisiae, to utilize galactose, it must activate a set of four genes that remain inactive when galactose is not present. The genes and the functions of their enzymatic products are as follows:
GAL1: Converts galactose to galactose-1-phosphate
GAL2: Galactose permease, facilitates entry of galactose into the cells
GAL7: Converts galactose-1-phosphate to UDP-galactose
GAL10: Converts UDP-galactose to UDP-glucose.
UDP-glucose then feeds into a normal pathway of glucose utilization, with the first step being conversion to glucose-1-phosphate.
Upstream activating sequences: Three of the genes whose transcription is induced by the presence of galactose, GAL1, GAL7, and GAL10, are clustered close together on chromosome II. However, they do not form an operon. Each is transcribed separately as a monocistronic mRNA. A galactose-specific upstream activating sequence (UASG) that serves as a binding site for a galactose-specific transcription-enhancing factor (the GAL4 gene product) is associated with the promoter for each of these genes. The binding domain is a 17 bp palindrome, which is repeated four times in the UAS.
Bidirectional transcription: The GAL7 promoter and its UAS have no unusual features. However, the GAL1 and GAL10 genes are transcribed in opposite directions from divergent promoters and UAS sequences that are located between the transcriptional start sites for the two genes. In the usual depiction of the gene cluster, GAL1 is shown as transcribing to the right and GAL10 to the left. GAL7 is located further to the left and also transcribes to the left.
<--GAL7-promoter-UAS............<--GAL10-promoter-UAS...UAS-promoter-GAL1-->
Requirement for GAL4 protein: Transcription of all three of these genes (and also of the GAL2 gene, which is on a separate chromosome) requires the binding of GAL4 to the UAS to be activated above a very low basal level. The GAL4 protein has the expected properties of a transcription activating factor. These include a DNA-binding domain that is specific for UASG, and two separate transcriptional activation domains that must both be intact for full activity. However, the GAL4 protein does not interact directly with galactose. Instead, it has a binding site for the GAL80 protein, and is maintained in an inactive condition when galactose is not present by being bound to that protein.
Regulatory role of GAL80 protein: The GAL80 gene product is the only protein involved in induction of the galactose utilization genes that is capable of direct interaction with galactose. In a sense, the GAL80 protein functions as a repressor, but it does not do so by binding an operator site. Instead, in the absence of galactose, the GAL80 protein binds to the GAL4 transcriptional activator protein and blocks its ability to interact with the UASG and enhance transcription of the galactose utilization genes. When galactose enters the cell, it binds to the GAL80 protein and causes it to undergo an allosteric change, which in turn causes dissociation of the GAL80 protein from the GAL4 protein and allows the GAL4 protein to function as a positive-acting transcription factor for the galactose utilization genes (figure 8.21). The increase in transcription is presumed to be accomplished by an interaction between the activation domains of GAL4 and the RNA polymerase II initiation complex, as depicted in Figure 8.20 for classical enhancers.
Summary: In the absence of galactose, GAL80 binds to GAL4 and prevents it from acting as a transcription-enhancing factor for GAL1, GAL2, GAL7, and GAL10. When galactose is present, it binds to GAL80, causing it to dissociate from GAL4. GAL4 then binds to the UASG associated with each of GAL1, GAL2, GAL7, and GAL10 and stimulates transcription of all four of these genes.
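The logic of this summary can be written as a short Python sketch, which is useful for predicting what loss-of-function mutations should do. It models only the scheme described in these notes. The function name, its parameters, and the two output labels are assumptions for illustration, and the actual regulatory circuit in yeast has additional components not covered here.

```python
GAL_GENES = ("GAL1", "GAL2", "GAL7", "GAL10")


def gal_expression(galactose, gal4_ok=True, gal80_ok=True):
    """Predict expression of the GAL genes under the model in these notes.

    galactose: True if galactose is present in the cell.
    gal4_ok / gal80_ok: False models a loss-of-function mutation.
    Returns "induced" or "basal" (the very low uninduced level).
    """
    # GAL80 holds GAL4 inactive unless galactose has bound to GAL80.
    gal80_bound_to_gal4 = gal80_ok and not galactose
    # GAL4 activates transcription only when functional and free of GAL80.
    gal4_active = gal4_ok and not gal80_bound_to_gal4
    return "induced" if gal4_active else "basal"


for galactose in (False, True):
    level = gal_expression(galactose)
    print(f"galactose={galactose}: {', '.join(GAL_GENES)} are {level}")

# Predicted mutant phenotypes under this model.
print("gal80 mutant, no galactose:", gal_expression(False, gal80_ok=False))
print("gal4 mutant, with galactose:", gal_expression(True, gal4_ok=False))
```

Under this model a cell lacking functional GAL80 expresses the genes whether or not galactose is present, while a cell lacking functional GAL4 cannot induce them at all. That contrast is what distinguishes a negative regulator from a positive one.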