Revised October 9, 2000
Lecture date: Friday, October 13, 2000

Lecture 18, MCDB 2150, Fall 2000

Eukaryotic Genomic Organization, Chromosomes

Textbook Assignment: Chapter 10, Pages 295-319. There is an error in the numbering of chromosomes in Figure 10.31. Those labeled 13-18 should be 7-12 and those labeled 7-12 should be 13-18. Also, in the legend, this figure provides data for problems 8 and 10, not 8 and 11.

Major concepts:

Sizes of eukaryotic genomes: Eukaryotic genomes are consistently larger than those of prokaryotic cells, but within the eukaryotes there is massive variability in genome size. The genome of the budding yeast Saccharomyces cerevisiae has been completely sequenced and is composed of about 13 million nucleotide pairs, as compared to 4.6 million for E. coli. (The value for E. coli used in these notes is based on sequencing data; see table 7.5 and the web pages cited in the notes for lectures 9 and 10. The value on page 295 appears to be a pre-sequencing estimate.) The human genome is about 3 x 10^9 base pairs, and most other mammals that have been adequately studied seem to fall into the range of 2 - 3 x 10^9 base pairs. Some amphibians and some plants have genomes as large as 7.5 x 10^10 base pairs (25 times the size of the human genome).

C-value paradox: The total amount of DNA contained in a haploid genome is sometimes referred to as the C value for the species. The extremely large amount of DNA in the genomes of complex eukaryotes, which far exceeds the amount that appears to be needed for protein coding, is sometimes referred to as the C-value paradox. In humans, for example, only about 5% of the genomic DNA appears to consist of protein-coding and other functional genes (such as ribosomal and transfer RNAs).

Repetitive DNA sequences: Some of the excess DNA is present as highly repetitive sequences of various types, including a number of so-called transposable elements, which have (or apparently once had) the ability to move to different locations in the genome, in some cases duplicating themselves in the process. These transposable elements will be studied in detail in MCDB 3500. If you are curious about them, Chapter 22 of our textbook, which will not be covered in this course, provides an overview of the various types and how they move to new locations. Many of the repetitive DNAs have no known function and are often referred to as "junk DNA" or as "selfish DNA" that seems to have no purpose other than its own reproduction. The Alu element, which is unique to the human genome and has no known function, is present in about 500,000 copies, dispersed rather uniformly throughout the entire genome. (For molecular biologists, the widely dispersed human Alu element often "functions" as an identifying marker for human DNA.)

Some repetitive DNA is functional: Some of the repetitive DNA sequences in humans and other "higher" organisms do appear to have specific functions, such as those that make up the centromeres and telomeres of the chromosomes. In addition, the genes coding for ribosomal and transfer RNAs are present in multiple copies per genome. It should also be pointed out that a large fraction of the unique sequence DNA (sequences occurring only once per haploid genome) has no known genetic function.

DNA renaturation experiments: The textbook describes DNA renaturation experiments that permit the relative amounts of highly repetitive, moderately repetitive, and unique sequence DNA to be evaluated based on rates of renaturation. The basic principle is that after denaturation, there are more copies of highly repetitive sequences, such that they find their partners and anneal rapidly, whereas unique sequences require much longer to anneal and moderately repetitive sequences require an intermediate time. It is important to remember that we are talking about relative numbers of copies, because any sample of reasonable size contains many copies of all of the sequences. To provide some degree of standardization that allows for different sized DNA fragments, and different total concentrations of DNA, the renaturation experiments are described in terms of total initial concentration of nucleotides (not DNA molecules) multiplied by the amount of time that the renaturation requires (referred to as C0t values). The usual unit of measurement is the C0t value for renaturation of 1/2 of a particular frequency class of DNA. From such measurements, it becomes possible to calculate the complexity (the total number of nucleotides in non-repeated sequences) of a DNA preparation (figure 10.6). It will not be necessary to learn details of these procedures for this course.
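For those who want to see how C0t values separate the frequency classes, here is a minimal sketch. It assumes the idealized second-order form commonly used for renaturation, in which the fraction of DNA still single-stranded is 1 / (1 + C0t / C0t(1/2)); the textbook's own treatment may differ in detail. The three C0t(1/2) values are hypothetical numbers chosen only for illustration, not measurements, and the function names are ours.

```python
def c0t(c0_nucleotides_molar, seconds):
    """C0t: initial nucleotide concentration (mol/L) times elapsed time (s)."""
    return c0_nucleotides_molar * seconds


def fraction_single_stranded(c0t_value, c0t_half):
    """Idealized second-order kinetics: C/C0 = 1 / (1 + C0t / C0t_half).

    c0t_half is the C0t value at which half of that frequency class
    has renatured.
    """
    return 1.0 / (1.0 + c0t_value / c0t_half)


# Hypothetical C0t(1/2) values for the three frequency classes (illustration only).
C0T_HALF = {
    "highly repetitive": 1e-3,
    "moderately repetitive": 1.0,
    "unique": 1e3,
}

if __name__ == "__main__":
    # Same sample sampled at increasing times: 0.001 M nucleotides.
    for seconds in (1, 1_000, 1_000_000):
        value = c0t(0.001, seconds)
        for name, half in C0T_HALF.items():
            remaining = fraction_single_stranded(value, half)
            print(f"C0t={value:g}  {name}: {remaining:.3f} still single-stranded")
```

At any given C0t value the highly repetitive class is the most completely renatured and the unique class the least, which is the ordering described above.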

Chromatin: Eukaryotic DNA normally associates closely with a variety of proteins to form a complex called chromatin. A group of strongly basic proteins called histones bind very tightly to the acidic DNA, forming nucleosomes, as described below. In addition, chromatin contains a wide variety of non-histone chromatin proteins (NHCP), including all of the enzymes and regulatory proteins needed for selectively controlled transcription, as described in previous lectures.

Nucleosomes and DNA packing: Two copies each of four different types of histones (H2A, H2B, H3, and H4) form the core of a beadlike structure with 140 - 180 nucleotide pairs of DNA wrapped around it, known as a nucleosome (figure 10.8). Nucleosomes occur relatively close together on a long DNA molecule, typically separated by about 50 base pairs of linker DNA. DNA that is optimally packed into nucleosomes forms a fiber about 10 nm in diameter. That fiber in turn is coiled into a helix to form a 30 nm fiber. Higher orders of packing also occur, particularly during the condensation of chromatin into dense chromosomes that are visible with the light microscope during cell division (figure 10.8). Additional "linker" histones are believed to be involved in higher order packing, and may also play a role in transcriptional control. Nucleosome structure is temporarily disrupted during transcription, but not totally lost except in very intensely transcribed genes, such as those coding for ribosomal RNA. The promoter must be free of nucleosomes, however, in order for the transcriptional initiation complex to form. There also must be some degree of dissociation of nucleosomes during DNA replication, but experimental data have shown that the histones do not totally separate from the DNA during replication.
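To get a feel for the numbers, the short sketch below estimates how many nucleosomes would be needed to package one haploid human genome. It uses only values from these notes (140 - 180 base pairs per nucleosome, about 50 base pairs of linker, 3 x 10^9 base pairs per genome) and makes the simplifying assumption, ours and not the textbook's, that the entire genome is uniformly packaged. The function name is hypothetical.

```python
def estimated_nucleosomes(genome_bp, wrapped_bp, linker_bp=50):
    """Rough nucleosome count: genome size divided by the repeat length
    (DNA wrapped around one histone core plus one linker)."""
    return genome_bp / (wrapped_bp + linker_bp)


HUMAN_HAPLOID_BP = 3e9  # value used in the textbook

# The two ends of the 140 - 180 bp range bracket the estimate.
fewest = estimated_nucleosomes(HUMAN_HAPLOID_BP, wrapped_bp=180)
most = estimated_nucleosomes(HUMAN_HAPLOID_BP, wrapped_bp=140)

if __name__ == "__main__":
    print(f"about {fewest / 1e6:.0f} to {most / 1e6:.0f} million nucleosomes per haploid genome")
```

Under these assumptions the estimate is on the order of ten million or more nucleosomes per haploid genome, and twice that per diploid cell.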

Chromosomal structure: The text presents a rather detailed discussion of chromosomal structure as it is observed at the light and electron microscope levels. Most of this should be a review of material presented in introductory biology. Although we will not spend much time on this material in lecture, it is important to be sure that you understand it. In brief summary, each chromosome consists of one very long continuous DNA double helix, complexed with a variety of histone and non-histone proteins. For those who might wish to pursue this issue further, there is a web page that presents the exact size of each of the human chromosomes in megabases. The data from this web page yield a total length of all human chromosomes of 3227 million base pairs for a haploid genome with an X chromosome and 3122 million with a Y chromosome, slightly larger than the 3000 million estimate used in our textbook, but well within the range that various investigators are estimating. If the numbers on the web site are valid, human chromosome 1 contains 263 million base pairs, corresponding to an extended DNA double helix 8.9 cm in length. Even the smallest human chromosome, which contains about 50 million base pairs, would have a length of about 1.7 cm when fully extended as linear double helical DNA.

Packaging of eukaryotic DNA: The huge amount of DNA in eukaryotic cells makes very compact packaging necessary. Multiple higher orders of coiling and folding are needed to pack all of the DNA that is contained in a typical eukaryotic cell into the interphase nucleus. Even further condensation is needed to generate the compact chromosomes that are visible with a light microscope during cell division. If we accept values of 3 x 10^9 base pairs and 3.4 Å per base pair, the fully extended double helical DNA in a single copy of the human genome is just over one meter in length (even longer if the values from the web page are correct). Because human cells are diploid, each cell thus contains just over two meters of DNA in a nucleus that is on the order of 10 µm in diameter. Figure 10.7 shows the huge amount of DNA that spills out from a typical mammalian chromosome when the histone proteins that normally keep the DNA compacted are removed. You may want to look at this and related figures in the original journal article (Cell 12, 817-878, 1977), which can be found in the Norlin Science stacks. The original electron micrograph is far more dramatic than the reproduction in our textbook.
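The length figures quoted in these notes are simple multiplications, and the sketch below reproduces them so you can try other genome sizes. It assumes 3.4 Å per base pair, as above, and uses the chromosome sizes quoted in the notes; the variable and function names are ours. The last quantity, total DNA length divided by nuclear diameter, is only a rough indication of how much folding is required and is not a quantity defined in the textbook.

```python
RISE_PER_BP_M = 3.4e-10  # 3.4 angstroms per base pair, expressed in meters


def extended_length_m(base_pairs):
    """Length in meters of fully extended double-helical DNA."""
    return base_pairs * RISE_PER_BP_M


haploid_m = extended_length_m(3e9)          # one copy of the human genome
diploid_m = 2 * haploid_m                   # DNA in one diploid cell
chromosome_1_m = extended_length_m(263e6)   # chromosome 1, web page value
smallest_m = extended_length_m(50e6)        # smallest human chromosome

NUCLEUS_DIAMETER_M = 10e-6  # "on the order of 10 micrometers"
length_to_diameter = diploid_m / NUCLEUS_DIAMETER_M

if __name__ == "__main__":
    print(f"haploid genome: {haploid_m:.2f} m")
    print(f"diploid cell:   {diploid_m:.2f} m")
    print(f"chromosome 1:   {chromosome_1_m * 100:.1f} cm")
    print(f"smallest:       {smallest_m * 100:.1f} cm")
    print(f"DNA length / nuclear diameter: about {length_to_diameter:,.0f}")
```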

Centromeres: Each eukaryotic chromosome has a dense constricted area called a centromere that serves as a point of attachment for spindle fibers during cell division. Centromeres contain repeated DNA sequences, but those sequences vary substantially from one species to another. Human centromeres contain a tandemly repeated sequence of about 170 bp called an alphoid sequence. A small portion of that sequence, called a CENP-box, is conserved among other mammalian species. Prior to the development of modern banding techniques (figure 10.8), the position of the centromere was one of very few identifiable morphological characteristics that could be used to identify specific chromosomes (figure 10.31). Four classes of chromosomes are generally recognized with respect to centromere position (figure 10.15).

Kinetochores: Kinetochores are protein bodies that attach to the outside surfaces of the centromeres of condensed mitotic chromosomes. They are involved in the binding of spindle fibers to the chromosomes during mitosis, as described in the next lecture (figure 11.4). One should not confuse centromeres and kinetochores. Centromeres are integral parts of the chromosome, organized around specific repetitive DNA sequences, whereas kinetochores are attached protein bodies.

Chromatids: During the time period between replication of chromosomal DNA (which immediately adds proteins to form new chromatin) and the separation of daughter chromosomes during cell division (mitosis), there are two side-by-side copies of each chromosome, still attached to each other at their centromeres. The duplicated chromosome is still considered to be one chromosome until the centromeres separate and the two daughter chromosomes move toward opposite spindle poles during mitosis. Until this happens, each of the duplicated, but still attached, chromosomal structures is called a chromatid, and the attached pair are referred to as sister chromatids (figure 10.14). Because chromosomes are usually pictured as paired sister chromatids, based on their appearance during mitotic metaphase (figure 10.31), it is important to remember that prior to DNA replication and at all times in non-multiplying cells, each chromosome consists of a single chromatid. Because the single chromatids decondense rapidly after completion of cell division (see telophase in figure 11.4) and duplication of chromatids occurs prior to the beginning of condensation for the next round of cell division, chromosomes are seldom seen, photographed, or depicted as single chromatids. However, they exist in that state for a substantial part of the mitotic cell cycle, which we will discuss in the next lecture.

Telomeres: The DNA in eukaryotic chromosomes is linear, rather than circular. As discussed briefly in chapter 2, this creates special problems at the ends of the chromosome related to priming and primer removal in the lagging strand during DNA replication. These problems have been overcome by adding a repeated sequence to the ends of the chromosomes with an enzyme called telomerase. This enzyme has incorporated into its overall structure an RNA template for extension of the repetitive leading strand sequence at the end of the chromosome (figure 2.33 parts b through e). This is followed by primed synthesis of the final Okazaki fragment of the lagging strand (figure 2.33f) and a trimming of the end that removes the primer and a portion of the repetitive DNA template (figure 2.33g). The repeated sequences that this process generates at the ends of the chromosomes are called telomeres.

Chromosome banding: Through the use of appropriate staining techniques, it is possible to give each condensed mitotic chromosome a characteristic banded pattern, as shown for human chromosomes in figure 10.18 (and in negative contrast in figure 11.2). Banding makes it much easier to identify each chromosome and arrange individual chromosomes into numbered pairs. Such an arrangement is called a karyotype. A correctly labeled human karyotype is shown in figure 11.2. Figure 10.31 shows a karyotype prepared with a stain that does not produce banding (it also has the numbering of chromosomes 7-12 and 13-18 reversed). With chromosome banding, it is possible for an expert to identify a single human chromosome, or in some cases even a part of a human chromosome in a mitotic figure that otherwise contains only non-human chromosomes. As we will see in future lectures, this becomes very important when studying somatic cell hybrids that contain only a single human chromosome (pages 466-468 of our textbook).

Polytene chromosomes: Certain tissues in the larvae of Drosophila and certain other insect species contain giant polytene chromosomes that are particularly well suited for detailed examination with a light microscope. In such cases, homologous somatic chromosomes become paired and then undergo a process of repeated duplication of their DNA without strand separation or cell division. These giant polytene chromosomes often contain 1000 or more parallel strands of DNA (figure 10.19). They exhibit characteristic banding patterns as well as enlarged areas called "puffs" that reflect regions with particularly high levels of transcriptional activity (figure 10.20). As noted in lecture 13, puffs are also sometimes sites of gene amplification.

Cytogenetics: Each region of each Drosophila polytene chromosome has a characteristic pattern of banding. Relatively small insertions and deletions are easily detected as changes in the pattern of banding of such chromosomes. Because x-ray treatment was widely used as a mutagen in early studies of the genetics of Drosophila, many of the resulting mutations were associated with small deletions or other chromosomal rearrangements that could be used to map their chromosomal locations quite precisely.

In situ hybridization: Techniques have been developed for denaturation of the DNA in chromosomes, followed by hybridization of radioactive or fluorescently labeled probes onto the chromosomal structures. Under optimal conditions, this makes it possible to identify which chromosomes carry a particular gene, as well as which part of the chromosome it is located in (figures 10.21 and 10.22). The acronym "FISH" is often used to describe fluorescent in situ hybridization, particularly when applied to human or other mammalian chromosomes.

Euchromatin and heterochromatin: During interphase (the time period when cells do not have their chromosomes condensed for division), most of the chromatin is in a highly dispersed state, called euchromatin, which has no clearly discernible structure in light microscopy (figure 10.9) and is only lightly stained in electron microscopy (figure 10.23). However, some parts of the chromatin remain more highly condensed. Highly repetitive sequences, such as those in centromeric and telomeric regions, tend to remain always condensed as constitutive heterochromatin, whereas other chromosomal regions form facultative heterochromatin only under certain conditions, such as inactivation of one of the X-chromosomes in female mammals. Heterochromatin usually is not transcribed, although as illustrated in example 10.3, there are some unusual cases where certain genes are only transcribed when they are located in a heterochromatic region of a chromosome.

Yeast artificial chromosomes: One of the more convenient cloning vectors for large inserts is the yeast artificial chromosome (YAC). As a minimum, a yeast artificial chromosome must contain two telomeres, a yeast origin of replication, a yeast centromere, and a site for inserting a foreign DNA. In its usual form, the YAC is maintained as a bacterial plasmid prior to having a foreign DNA inserted into it. It therefore contains a bacterial origin of replication and a bacterial selectable marker such as ampicillin resistance. There is also typically a yeast selectable marker (for example the yeast URA3 gene, which permits a ura3 uracil auxotrophic mutant strain of yeast to grow on a medium without uracil). This makes it possible to select for yeast cells that have taken up YACs. A YAC vector is capable of carrying foreign DNA inserts up to 500 kb or more in length (some textbooks say up to 1.0 megabase).

Gene families: Clusters of closely related genes are observed quite frequently in eukaryotic genomes. Such clusters are thought to have resulted from gene duplication followed by evolutionary divergence. In some cases, this divergence has given rise to different functional forms of the gene, which are expressed in different tissues or at different stages of the life cycle. In other cases, the divergence has generated non-functional pseudogenes (identified by the Greek letter psi), which have sequences that are very similar to those of functional genes, but are not functional. Pseudogenes fall into two categories: processed pseudogenes, which lack introns and are sometimes followed by long stretches of adenines at the 3'-ends of their coding sequences, suggesting that they have arisen by reverse transcription of mRNA sequences, and unprocessed pseudogenes, which are similar to regular genes except that they are not expressed for various reasons. Boxed example 10.5 describes a pseudogene that has a stop codon early in its open reading frame. Many others have promoter defects, such that they are not transcribed. Processed pseudogenes may totally lack promoters.

Globin gene family: The example cited in our textbook is the human globin gene family, which consists of two separate clusters (figure 10.26). The alpha globin cluster on chromosome 16 consists of three genes that are known to be transcribed (alpha 1, alpha 2, and zeta), three non-functional pseudogenes (psi-alpha 1, psi-alpha 2, and psi-zeta), and one whose status remains uncertain (theta). The beta globin cluster on chromosome 11 consists of 5 functional genes (beta, gamma-A, gamma-G, delta, and epsilon) plus one pseudogene (psi-beta). Human hemoglobin is always a tetramer, consisting of two proteins coded by genes from the alpha cluster and two from the beta cluster. In early embryonic development the alpha cluster is represented initially by zeta, with an early switch to the two alpha genes, which then persist into adulthood. The beta group is initially represented by epsilon, with an early switch to the two gamma forms, which dominate most of the fetal period. Shortly before birth, beta begins to replace the two gammas, with gradual expression also of a small amount of delta. The relative expression of the various globin genes is illustrated in figure 10.27.

Genomic organization: Whole genome sequencing projects are beginning to provide extensive information about genomic organization on a larger scale. The textbook briefly discusses conclusions arising from sequencing the entire 13 million base pairs in the genome of the budding yeast Saccharomyces cerevisiae. There are many patterns suggesting that modern genomic sequences have arisen by duplication followed by divergent evolution, sometimes involving whole blocks of genes. In some cases duplication of a specific gene has been followed by functional divergence, including targeting the gene products to different intracellular organelles.