Revised September 26, 2000

Lecture 12, MCDB 2150, Fall 2000

Control of eukaryotic gene expression

Textbook Assignment: Chapter 8, Pages 237 - 244. You may also want to review those parts of pages 58 - 73 (lecture 4) that describe transcription by RNA polymerase II in eukaryotic cells.

Major concepts

Introduction: This lecture builds on the basic information about transcription of protein-coding genes by RNA polymerase II that was presented in lecture 4 (textbook pages 58-73). This lecture focuses on mechanisms that control the extent of transcription of specific genes in various types of cells and under various environmental conditions. Control over expression of genes coding for the enzymes needed for utilization of galactose by yeast cells is examined as an example of such controls. MCDB 3500 will explore eukaryotic gene regulatory mechanisms in far greater depth than is possible in this initial survey course.

Limited levels of control over gene expression in prokaryotic cells: In a prokaryotic cell, translation normally starts immediately after transcription, and relatively little modification of the proteins is needed to make them functional. Thus, with the exception of various types of allosteric modification of protein function, nearly all control over functional gene expression is at the transcriptional level. Allosteric controls that we have already observed include modification of function of operon repressor proteins by substrates (e.g. allolactose) or end products (e.g. tryptophan). Additional examples will be examined in the next lecture.

Selective gene expression in eukaryotic cells: Individual cells in multicellular organisms exhibit highly selective gene expression. Differential gene expression allows a complex organism to contain diverse types of differentiated cells, nearly all of which have identical genomes, but nevertheless display widely different biochemical and structural properties. Thus, the genes expressed as functional proteins are very different in a muscle cell than in a liver cell or a skin cell or a hormone-secreting cell of the anterior pituitary gland or the pancreas. In addition, there are many genes that are expressed only during specific parts of the life cycle, such as embryonic development. There are also numerous gene families whose members are expressed differentially during development and maturation, such as the globin gene family (described on page 313 of our textbook -- see figure 10.27). To achieve this level of sophistication in control of gene expression, it is necessary to have far more complex controls over gene expression in eukaryotic cells than in prokaryotic cells. This lecture examines transcriptional controls. Controls at diverse other levels between the gene and the final functional gene product will be examined in the next lecture.

Eukaryotic promoter sites: As described briefly in lecture 4, eukaryotic promoters for RNA polymerase II have highly diverse structures, such that it has not yet become possible to define generalized sequences that all promoters must have in order to function. Many promoters contain two consensus sequences: 1) a TATA box located about 30 bp upstream from the transcriptional start site, with a generalized consensus sequence of TATAAA that is rather variable from one promoter to another; and 2) a CCAAT box located somewhere around -75, with a consensus sequence of GGCCAATCT, again with substantial variation. However, as noted in lecture 4, there are numerous promoters that work perfectly well without either a TATA box or a CCAAT box.
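
Illustration (not from the textbook): The positions quoted above can be made concrete with a short Python sketch that scans a sequence for near-matches to a consensus and reports where each match lies relative to the transcriptional start site. The promoter used here is an invented toy sequence, and the function names and the mismatch threshold are assumptions chosen only for this example. Real promoters vary far more than a simple mismatch count can capture.

```python
def find_motif(seq, motif, max_mismatches=0):
    """Return (start_index, mismatches) for each window that matches the
    motif with at most max_mismatches differing bases."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


def promoter_coordinate(index, tss_index):
    """Convert a 0-based index into promoter numbering, where the first
    transcribed base is +1 and the base just upstream is -1 (no zero)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


# Hypothetical toy promoter, NOT a real gene: a CCAAT box and a TATA box
# embedded in GC filler so that they start at -75 and -30.
UPSTREAM = "GCGCG" + "GGCCAATCT" + "GC" * 18 + "TATAAA" + "GC" * 12
TOY_PROMOTER = UPSTREAM + "ACGTCGT"
TSS_INDEX = len(UPSTREAM)  # index of the first transcribed base (+1)

if __name__ == "__main__":
    for name, motif in (("CCAAT box", "GGCCAATCT"), ("TATA box", "TATAAA")):
        for index, mismatches in find_motif(TOY_PROMOTER, motif):
            print(name, promoter_coordinate(index, TSS_INDEX), mismatches)
```

Raising max_mismatches to 1 lets the scan pick up variant boxes such as TATATA, which is one simple way to model the statement that the consensus is "rather variable from one promoter to another".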

Additional consensus sites: There are also numerous other consensus sequences that influence the expression of eukaryotic genes. These sequences are sometimes found as parts of the promoters for specific genes, and in other cases may be located quite far from the immediate promoters. Sequences that must remain reasonably close to the transcriptional start site and that are inactive when reversed in direction are usually considered to be part of the promoter. Sequences that can stimulate transcription from a greater distance either upstream or downstream from the transcriptional start site and that remain active when reversed in polarity are called enhancers. Sequences with similar properties that serve to reduce or stop transcription are called silencers. There are also upstream activator sequences (UAS) in yeast that are very similar in function to enhancers except that they are inactive when placed downstream from the transcriptional start site.

Binding sites for activators and repressors: Many of the consensus sequences associated with promoters, enhancers, upstream activating sequences, and silencers have been identified as specific binding sites for regulatory proteins known as activators (also called transcription factors) and repressors. Much of the fine tuning of transcriptional control in eukaryotic cells is believed to be achieved in a combinatorial manner by complex interactions of multiple activators and repressors with multiple consensus binding sequences for those factors. The nomenclature for these factors is in flux. Our textbook frequently uses the newer term "activators" to describe protein factors that are not part of the basal transcription initiation complex, but serve instead to regulate transcription from sites outside the immediate minimal promoter (figure 8.20). Many older textbooks refer to these as a subclass of transcription factors, and our textbook also tends to revert to that nomenclature when describing the details of these factors.

Eukaryotic transcription factors: The transcription factors for RNA polymerase II fall into two overall categories, commonly referred to as general transcription factors and gene-specific transcription factors. The general transcription factors interact to form a preinitiation complex which is needed for the RNA polymerase to attach to the DNA and initiate transcription (as shown in figure 3.10). In the case of RNA polymerase II, the amount of transcriptional initiation is quite low when only the preinitiation complex and RNA polymerase II are present. Gene-specific transcription factors (activators) are also needed to achieve a high level of transcription (figures 3.11 and 8.20). The gene-specific factors are believed to function primarily through interaction with additional promoter-associated binding sites and with more remote enhancer sequences, although in many cases silencers or other down-regulatory elements are also involved in achieving the final level of transcription that is characteristic of the expression of a particular gene in a particular type of cell (figure 8.20). Note that a variety of proteins known as "coactivators" form links between the activators and the basal transcription complex.

Properties of transcription factors: As a minimum, activators and other transcription factors that interact with specific consensus sequences in DNA must possess two distinctly different domains. The first is a DNA binding domain, which must be capable of recognizing a specific DNA sequence and binding firmly to it. The second is a specific transcriptional activation domain that is capable of interacting with coactivators in the transcriptional initiation complex in a manner that increases the rate of transcription. These interactions often result in bending the DNA (figure 8.20), which is believed to help open the double helix and facilitate the initiation of transcription.

DNA binding domains: In order to read the sequence of a double helical DNA molecule, the DNA binding domain of a transcription factor must fit into the wide groove and interact with the exposed edges of the flat base pairs. This requires a projecting portion of the protein molecule that is small enough to fit into the wide groove and that possesses the right distribution of charges and hydrophobic groups to interact with the edges of a stack of base pairs arranged in a particular sequence. In many cases (but not all), the regulatory sequences on the DNA are at least partially palindromic or tandemly repeated. Such sequences are usually read by transcription factor dimers that have their two DNA binding domains positioned appropriately to read the two halves of the palindromic or repeated sequence (figures 8.24 and 8.25). Our textbook emphasizes three widely used types of DNA binding domains.

Helix-turn-helix (HTH): Variations of the helix-turn-helix domain are found both in prokaryotic cells and in eukaryotic cells. The lac and trp repressors that we have studied in previous lectures both have HTH DNA-binding domains. The configuration consists of two or more alpha-helical segments of protein, separated by sharp turns, such that the helices tend to lie across one another (figure 8.23). This arrangement allows one of the helices to fit into the wide groove of the DNA, where its amino acid side chains can interact with the edges of stacked base pairs. A similar structure in eukaryotic cells is often referred to as a homeodomain. This name is based on a number of developmental regulatory proteins originally discovered in Drosophila as the products of genes whose mutations cause homeotic developmental changes, which alter the fates of particular tissues. The Antennapedia mutation that converts the antenna structures on the head of a Drosophila to legs is an example of a homeotic mutation.

Zinc fingers: Another common DNA binding domain is the so-called zinc finger. The zinc finger contains a cluster of histidine and/or cysteine residues that form a coordination complex with zinc ion to fold the protein into a short finger-like loop. The amino acids in the loop interact with base pairs in the wide groove of the DNA in a sequence-specific manner. The textbook illustrates this interaction in figure 8.22. The steroid hormone receptors are an example of transcription factors that contain zinc fingers. The yeast GAL4 protein, discussed later in this lecture, also has zinc finger DNA-binding domains (see figure 8.25 for a space-filling model of the interaction).

Leucine zippers: The third DNA binding motif described in the textbook is the leucine zipper. In this case, two molecules of the transcription factor are bound together as a dimer by a series of leucine residues spaced every second turn of alpha helices on the surfaces of the two protein molecules. Hydrophobic interactions between the leucine side chains cause a strong dimerization, which aligns adjacent basic amino acid sequences such that they can fit into the wide groove of the DNA to read two adjacent and usually palindromic sequence segments (figure 8.24).

Activation domains: The activation domain must interact with the preinitiation complex to increase the frequency of transcriptional initiation. Our textbook emphasizes the role of glutamine-rich regions in the activation domains of transcription factors such as sp-1 (figure 8.26). There are also several other types of activation domains that at least superficially appear to be very different structurally from one another. These include acidic activation domains, whose activities are sometimes controlled by phosphorylation and dephosphorylation as discussed later in this lecture.

Regulation of transcription factor activity: Two examples of modification of the activity of transcription factors by environmental signals are discussed below. The first is allosteric activation by the binding of specific ligands, such as steroid hormones, and the second consists of switching acidic activation domains on and off by phosphorylation and dephosphorylation.

Steroid hormone receptors: Our textbook uses the steroid hormone receptors as an example of a broader class of ligand-activated transcription factors. In addition to DNA-binding sites and transcriptional activator sites, these receptors also contain hormone-binding sites. When steroid hormones bind to their receptors, allosteric changes occur that allow the DNA binding sites on the receptors, which form dimers, to interact with specific DNA sequences known as hormone-responsive elements (HRE). (Please note that the receptor for each type of steroid hormone binds to a different HRE. The HRE sequence presented in the textbook is one version of the glucocorticoid response element consensus sequence, and is not applicable to steroid hormones in general. The receptors for other steroid hormones, such as estrogens, progestins, and androgens, bind to different HREs, as described in the appendix to these notes). The DNA binding domains of steroid hormone receptors contain zinc fingers. Binding of the hormone-activated receptor to the appropriate HRE allows the transcriptional activating domain of the receptor to interact with coactivators in an initiation complex and greatly increase the amount of transcription of specific hormone-responsive genes (figure 8.21). The Theoretical Biophysics Group at the University of Illinois Urbana-Champaign has a very nice web site that provides many details about members of the steroid receptor superfamily, also known as nuclear hormone receptors. You can see many additional details by following the links that are provided. In addition, many of the small figures can be enlarged by clicking on them.

Acidic activation domains: (This topic is not covered in the textbook). Activation domains that carry a strong negative charge play important roles in a variety of gene regulatory mechanisms. In some of the best studied examples, there is a very specific spatial distribution of the negative charges on one side of a protein alpha helix. In some cases the negative charge is due to the presence of acidic amino acids. However, the more interesting cases involve selective phosphorylation of hydroxyl groups carried on serine, threonine or tyrosine. Transcription factors with this type of activation domain can be activated by kinases (enzymes that phosphorylate specific residues on proteins) and inactivated by phosphatases (enzymes that remove phosphates from proteins). This allows for regulated turning on and off of transcription in response to a variety of signals. Numerous intracellular signal transduction pathways that link cell surface receptors to transcriptional regulation operate in part by sequential activation of a series of kinases leading ultimately to the activation of specific transcription factors by phosphorylation.

Other control systems: There are also many other ways to modify the activity of specific transcription factors. One example is to dissociate an inhibitor from the transcription factor. The whole story is far more complex than can be presented in a first level course. You will hear a lot more about these topics in MCDB 3120, MCDB 3500, and MCDB 4650.

Galactose metabolism in yeast: The following description includes several details about the regulation of galactose metabolism in yeast that are not included in our current textbook. In order for the yeast Saccharomyces cerevisiae to utilize galactose, it must activate a set of four genes that remain inactive when galactose is not present. The genes and the functions of their enzymatic products are as follows:

GAL1: Converts galactose to galactose-1-phosphate

GAL2: Galactose permease, facilitates entry of galactose into the cells

GAL7: Converts galactose-1-phosphate to UDP-galactose

GAL10: Converts UDP-galactose to UDP-glucose.

UDP-glucose then feeds into a normal pathway of glucose utilization, with the first step being conversion to glucose-1-phosphate.

Upstream activating sequences: Three of the genes whose transcription is induced by the presence of galactose, GAL1, GAL7, and GAL10, are clustered close together on chromosome II. However, they do not form an operon. Each is transcribed separately as a monocistronic mRNA. A galactose-specific upstream activating sequence (UASG) that serves as a binding site for a galactose-specific transcription enhancing factor (the GAL4 gene product) is associated with the promoter for each of these genes. The binding site is a 17 bp palindrome, which is repeated four times in the UAS.

Bidirectional transcription: The GAL7 promoter and its UAS have no unusual features. However, the GAL1 and GAL10 genes are transcribed in opposite directions from divergent promoters and UAS sequences that are located between the transcriptional start sites for the two genes. In the usual depiction of the gene cluster, GAL1 is shown as transcribing to the right and GAL10 to the left. GAL7 is located further to the left and also transcribes to the left.

<--GAL7-promoter-UAS............<--GAL10-promoter-UAS...UAS-promoter-GAL1-->

Requirement for GAL4 protein: Transcription of all three of these genes (and also of the GAL2 gene, which is on a separate chromosome) requires the binding of GAL4 to the UAS to be activated above a very low basal level. The GAL4 protein has the expected properties of a transcription activating factor. These include a DNA-binding domain that is specific for UASG, and two separate transcriptional activation domains that must both be intact for full activity. However, the GAL4 protein does not interact directly with galactose. Instead, it has a binding site for the GAL80 protein, and is maintained in an inactive condition when galactose is not present by being bound to that protein.

Regulatory role of GAL80 protein: The GAL80 gene product is the only protein involved in induction of the galactose utilization genes that is capable of direct interaction with galactose. In a sense, the GAL80 protein functions as a repressor, but it does not do so by binding an operator site. Instead, in the absence of galactose, the GAL80 protein binds to the GAL4 transcriptional activator protein and blocks its ability to interact with the UASG and enhance transcription of the galactose utilization genes. When galactose enters the cell, it binds to the GAL80 protein and causes it to undergo an allosteric change, which in turn causes dissociation of the GAL80 protein from the GAL4 protein and allows the GAL4 protein to function as a positive-acting transcription factor for the galactose utilization genes (figure 8.27). The increase in transcription is presumed to be accomplished by an interaction between the activation domains of GAL4 and the RNA polymerase II initiation complex, as depicted in figure 8.20 for classical enhancers.

Summary: In the absence of galactose, GAL80 binds to GAL4 and prevents it from acting as a transcription-enhancing factor for GAL1, GAL2, GAL7, and GAL10. When galactose is present, it binds to GAL80, causing it to dissociate from GAL4. GAL4 then binds to the UASG associated with each of GAL1, GAL2, GAL7, and GAL10 and stimulates transcription of all four of these genes.
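
Illustration (not from the textbook): The summary above can be written as a small Boolean model in Python. This is a deliberate simplification of the scheme as presented in these notes. It ignores the very low basal level of transcription, and the two "functional" arguments are hypothetical switches added only to ask what the scheme would predict if GAL4 or GAL80 were missing.

```python
GAL_STRUCTURAL_GENES = ("GAL1", "GAL2", "GAL7", "GAL10")


def gal_expression(galactose_present, gal4_functional=True, gal80_functional=True):
    """Return {gene: induced?} under the model summarized in these notes.

    GAL80 holds GAL4 inactive unless galactose is bound to GAL80;
    free GAL4 binds each UASG and stimulates all four genes together.
    """
    # GAL80 can only sequester GAL4 if it exists and has no galactose bound.
    gal80_holds_gal4 = gal80_functional and not galactose_present
    # GAL4 activates only if it is present and not held by GAL80.
    gal4_active = gal4_functional and not gal80_holds_gal4
    return {gene: gal4_active for gene in GAL_STRUCTURAL_GENES}


if __name__ == "__main__":
    for galactose in (False, True):
        print("galactose present:", galactose, gal_expression(galactose))
```

Within this model, a cell lacking functional GAL80 would express the four genes whether or not galactose is present, and a cell lacking functional GAL4 could not induce them at all.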

Methylation of inactive genes: On a gross chromosomal level, regions that carry genes that are inactive in a particular tissue often have CG doublets (which are palindromic) that are heavily methylated at the 5-position of both cytosines. There are enzymes that make it possible for methylation of palindromic sequences to be retained in a stable fashion as cells replicate their DNA. We have already seen an example in mismatch repair in bacteria where a deoxyadenosine methylase detects a methylated adenosine in a palindromic GATC sequence in the parental strand and places a methyl in the same position in the palindrome in the newly synthesized strand. A very similar phenomenon occurs in CG doublets, which are also palindromic. The textbook provides evidence in boxed example 8.4 that the inhibitory effect of methylation on gene expression is mediated indirectly, with a specific protein factor that binds the methylated cytosines functioning as an inhibitor of expression of the methylated genes. In chapter 5 (page 136 and figure 5.8), we saw that 5-methylcytosine in DNA is frequently converted by deamination to thymine, which tends to be relatively invisible to repair systems. Over time this results in depletion of CG doublets that are not required to remain unmethylated as components of actively expressed genes.
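
Illustration (not from the textbook): Because a CG doublet reads the same on both strands, a methyl group on the parental strand marks exactly which cytosine on the new strand should receive one. The Python sketch below models that bookkeeping and the loss of a CG doublet by deamination. The toy sequence, the function names, and the convention of reporting everything in top-strand coordinates are assumptions made for this example only.

```python
def cg_doublets(top_strand):
    """Return the 0-based index of the C in every CG doublet on the top strand."""
    s = top_strand.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def maintenance_methylation(top_strand, methylated_top):
    """Given the indices of methylated cytosines on the parental (top) strand,
    return the top-strand indices whose base-paired cytosine on the newly made
    (bottom) strand would be methylated. In a CG doublet the bottom-strand C
    pairs with the G at index + 1. Methyl groups outside CG are not copied."""
    doublets = set(cg_doublets(top_strand))
    return {i + 1 for i in methylated_top if i in doublets}


def deaminate(top_strand, index):
    """Model deamination of a 5-methylcytosine at index: the C becomes a T."""
    if top_strand[index].upper() != "C":
        raise ValueError("deamination in this sketch applies only to cytosine")
    return top_strand[:index] + "T" + top_strand[index + 1:]


TOY_DNA = "ATCGGACGTCA"  # hypothetical sequence with CG doublets at indices 2 and 6

if __name__ == "__main__":
    print(cg_doublets(TOY_DNA))
    print(maintenance_methylation(TOY_DNA, {2}))
    print(cg_doublets(deaminate(TOY_DNA, 2)))
```

Each deamination event removes one CG doublet from the sequence, which is the sense in which methylated doublets are depleted over time.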

APPENDIX

Hormone response elements: In the ultimate analysis, the response elements for the various steroid hormone receptors are defined as those DNA sequences that the ligand-activated receptors bind to. A typical HRE is a consensus sequence derived from a variety of individual sequences that have differing strengths as functional HREs. Not all sources cite the same consensus sequence. The glucocorticoid response element (GRE) is a good example.

The Website from the Theoretical Biophysics Group at The University of Illinois at Urbana-Champaign describes a fully palindromic GRE with three spacer nucleotide pairs in the center:

5'-AGAACAnnnTGTTCT-3'
3'-TCTTGTnnnACAAGA-5'
A website from Colorado State University suggests that the GRE is an imperfect palindrome whose right side is identical to the University of Illinois sequence.
5'-GGTACAnnnTGTTCT-3'
3'-CCATGTnnnACAAGA-5'
Note that only the first and third base pairs on the left are different.
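
Illustration: The comparison of the two GRE versions can be checked mechanically. The Python sketch below writes out the complementary strand and lists the positions in the left half of an element that depart from a perfect palindrome. The function names are invented for this example, and the lowercase n is simply carried through as "any nucleotide pair", matching the notation used in these notes.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "n": "n"}


def complement_strand(top):
    """Return the bottom strand written 3'->5', aligned under the top strand."""
    return "".join(COMPLEMENT[base] for base in top)


def reverse_complement(top):
    """Return the bottom strand read in its own 5'->3' direction."""
    return complement_strand(top)[::-1]


def asymmetric_positions(top):
    """Return the 1-based positions in the left half of the element where the
    top strand differs from its reverse complement. An empty list means the
    element is a perfect palindrome."""
    rc = reverse_complement(top)
    half = len(top) // 2
    return [i + 1 for i in range(half) if top[i] != rc[i]]


ILLINOIS_GRE = "AGAACAnnnTGTTCT"
CSU_GRE = "GGTACAnnnTGTTCT"

if __name__ == "__main__":
    for name, seq in (("Illinois", ILLINOIS_GRE), ("CSU", CSU_GRE)):
        print(name, "5'-" + seq + "-3'")
        print(" " * len(name), "3'-" + complement_strand(seq) + "-5'")
        print(" " * len(name), "asymmetric positions:", asymmetric_positions(seq))
```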

Our textbook does not deal with the nature of HREs very thoroughly and is not entirely accurate in what it says. On page 238, it states that "Steroid hormone receptors contain a conserved sequence TGATACAAATGTTCT".

Written as a DNA double strand, that sequence is partially palindromic and somewhat similar to the CSU GRE sequence (the right side is identical and the left side partially matches if only two of the central As are considered to be represented by n instead of three in the other depictions).

5'-TGATACAnnTGTTCT-3'
3'-ACTATGTnnACAAGA-5'

Spacing between halves of palindrome: The spacing between the halves of a palindromic HRE appears to be as important as the sequences within the halves of the palindrome. From the Illinois site, we see that the estrogen response element (ERE) has a palindromic sequence of

5'-AGGTCAnnnTGACCT-3'
3'-TCCAGTnnnACTGGA-5'
whereas the thyroid hormone response element (TRE) has a sequence of
5'-AGGTCATGACCT-3'
3'-TCCAGTACTGGA-5'
which is identical except for absence of the three nucleotide "spacer".
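
Illustration: The point about spacing can be made explicit by splitting each element into a left half-site, a spacer, and a right half-site. The Python sketch below assumes half-sites of six base pairs, which fits the ERE and TRE sequences shown above but is an assumption of this example rather than a general rule. The function name is likewise invented.

```python
def split_response_element(top, half_site_length=6):
    """Split a top-strand response element into (left, spacer, right),
    taking half_site_length bases from each end as the half-sites."""
    if len(top) < 2 * half_site_length:
        raise ValueError("sequence is shorter than two half-sites")
    left = top[:half_site_length]
    right = top[len(top) - half_site_length:]
    spacer = top[half_site_length:len(top) - half_site_length]
    return left, spacer, right


ERE = "AGGTCAnnnTGACCT"
TRE = "AGGTCATGACCT"

if __name__ == "__main__":
    for name, seq in (("ERE", ERE), ("TRE", TRE)):
        left, spacer, right = split_response_element(seq)
        print(name, left, right, "spacer length:", len(spacer))
```

Seen this way, the ERE and TRE carry the same two half-sites, and the spacer length is the only feature of the sequence that differs between them.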

Tandem repeats: The response elements for some members of the steroid hormone receptor superfamily contain imperfect tandem repeats, rather than palindromic sequences. The retinoic acid receptor response element (RARE) is an example (from the Illinois web site).

5'-AGGTCAnnnnnAGACCA-3'
3'-TCCAGTnnnnnTCTGGT-5'