Revised October 12, 1999. This lecture is based on parts of 1998 lectures 2 and 33 plus a substantial amount of new material.

Lecture 19, MCDB 2150, Fall 1999

Eukaryotic Genomic Organization, Chromosomes

Textbook Assignment: Chapter 10, Pages 295-319. There is an error in the numbering of chromosomes in Figure 10.31. Those labeled 13-18 should be 7-12 and those labeled 7-12 should be 13-18. Also, in the legend, this figure provides data for problems 8 and 10, not 8 and 11.

Major concepts:

Sizes of eukaryotic genomes: Eukaryotic genomes are consistently larger than those of prokaryotic cells, but within the eukaryotes there is massive variability in genome size. The genome of the budding yeast Saccharomyces cerevisiae has been completely sequenced and is composed of about 13 million nucleotide pairs, as compared to 4.6 million for E. coli (the value for E. coli is based on table 7.5; the value on page 295 appears to be a pre-sequencing estimate). The human genome and that of most mammals is about 3 x 10^9 base pairs. Some amphibians and some plants have genomes as large as 7.5 x 10^10 base pairs (some 25 times as large as the human genome).
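
The size comparisons in this paragraph can be checked with a few lines of Python. This is only a sketch: the dictionary and function names are made up for illustration, and the numbers are the rounded values quoted above.

```python
# Genome sizes in base pairs, using the rounded values quoted in the notes.
GENOME_SIZES_BP = {
    "E. coli": 4.6e6,
    "S. cerevisiae": 13e6,
    "human": 3e9,
    "largest amphibians/plants": 7.5e10,
}


def fold_difference(larger, smaller):
    """Return how many times larger one genome is than another."""
    return GENOME_SIZES_BP[larger] / GENOME_SIZES_BP[smaller]


# Express every genome as a multiple of the E. coli genome.
for name in GENOME_SIZES_BP:
    print(f"{name}: {fold_difference(name, 'E. coli'):,.1f} x E. coli")
```

Calling fold_difference("largest amphibians/plants", "human") reproduces the 25-fold figure given in the text.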

C-value paradox: The total amount of DNA contained in a haploid genome is sometimes referred to as the C value for the species. The extremely large amount of DNA in the genomes of complex eukaryotes, which far exceeds the amount that appears to be needed for protein coding, is sometimes referred to as the C-value paradox. In humans, for example, only about 5% of the genomic DNA appears to consist of protein-coding and other functional genes (such as ribosomal and transfer RNAs).

Repetitive and unique sequence DNA: Some of the excess DNA is present as highly repetitive sequences of various types, including a number of so-called transposable elements, which have (or apparently once had) the ability to move to different locations in the genome, in some cases duplicating themselves in the process. These transposable elements will be studied in detail in MCDB 3500. If you are curious about them, Chapter 22 of our textbook, which will not be covered in this course, provides an overview of the various types and how they move to new locations. Many of the repetitive DNAs have no known function and are often referred to as "junk DNA" or as "selfish DNA" that seems to have no purpose other than its own reproduction. The Alu element, which is unique to the human genome and has no known function, is present in about 500,000 copies, dispersed rather uniformly throughout the entire genome. (For molecular biologists, it often "functions" as an identifying marker for human DNA). However, there are some repetitive sequences with apparent functions, such as those that make up the centromeres and telomeres of the chromosomes, as well as genes coding for ribosomal and transfer RNAs. It should also be pointed out that a large fraction of the unique sequence DNA (sequences occurring only once per haploid genome) has no known genetic function.
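
To get a feel for what 500,000 dispersed copies means, the following sketch estimates the average spacing between Alu elements. The genome size and copy number come from these notes. The Alu length of about 300 base pairs is an assumption that is not stated above, so treat the resulting genome fraction as a rough estimate only.

```python
HUMAN_GENOME_BP = 3e9   # haploid genome size used in these notes
ALU_COPIES = 500_000    # approximate copy number quoted above
ALU_LENGTH_BP = 300     # ASSUMPTION: typical Alu length, not given in the notes

# If the copies were spread perfectly evenly, one would occur every N bp.
average_spacing_bp = HUMAN_GENOME_BP / ALU_COPIES

# Fraction of the genome occupied by Alu sequence under the length assumption.
alu_fraction = ALU_COPIES * ALU_LENGTH_BP / HUMAN_GENOME_BP

print(f"About one Alu element every {average_spacing_bp:,.0f} bp")
print(f"Alu sequence makes up roughly {alu_fraction:.0%} of the genome")
```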

DNA renaturation experiments: The textbook describes DNA renaturation experiments that permit the relative amounts of highly repetitive, moderately repetitive, and unique sequence DNA to be evaluated based on rates of renaturation. The basic principle is that after denaturation, there are more copies of highly repetitive sequences, such that they find their partners and anneal rapidly, whereas unique sequences require much longer to anneal and moderately repetitive sequences require an intermediate time. It is important to remember that we are talking about relative numbers of copies, because any sample of reasonable size contains many copies of all of the sequences. To provide some degree of standardization that allows for different sized DNA fragments, and different total concentrations of DNA, the renaturation experiments are described in terms of total initial concentration of nucleotides (not DNA molecules) multiplied by the amount of time that the renaturation requires (referred to as C0t values). The usual unit of measurement is the C0t values for renaturation of 1/2 of a particular frequency class of DNA. From such measurements, it becomes possible to calculate the complexity (the total number of nucleotides in non-repeated sequences) of a DNA preparation (figure 10.6). It will not be necessary to learn details of these procedures for this course.
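
For those who want to see the principle in action, here is a minimal sketch of a C0t curve. It assumes ideal second-order renaturation kinetics, in which the fraction of DNA still single-stranded is 1 / (1 + C0t / C0t1/2). The three frequency classes and their C0t1/2 values are hypothetical and are chosen only to show the three-step shape of the curve; they are not data from the textbook.

```python
def fraction_single_stranded(c0t, c0t_half):
    """Ideal second-order kinetics: fraction of one class not yet renatured."""
    return 1.0 / (1.0 + c0t / c0t_half)


def mixture_fraction_single_stranded(c0t, components):
    """Sum over frequency classes, each given as (genome fraction, C0t1/2)."""
    return sum(
        fraction * fraction_single_stranded(c0t, c0t_half)
        for fraction, c0t_half in components
    )


# HYPOTHETICAL genome: (fraction of total DNA, C0t1/2 as seen in the mixture).
COMPONENTS = [
    (0.25, 1e-3),  # highly repetitive: anneals quickly
    (0.30, 1.0),   # moderately repetitive: intermediate
    (0.45, 1e3),   # unique sequence: anneals slowly
]

for exponent in range(-4, 5):
    c0t = 10.0 ** exponent
    remaining = mixture_fraction_single_stranded(c0t, COMPONENTS)
    print(f"C0t = 1e{exponent:+d}: {remaining:.3f} still single-stranded")
```

At C0t equal to a class's C0t1/2, exactly half of that class has renatured, which is why the half-renaturation value is the usual unit of measurement.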

Chromatin: Eukaryotic DNA normally associates closely with a variety of proteins to form a complex called chromatin. A group of strongly basic proteins called histones bind very tightly to the acidic DNA, forming nucleosomes, as described below. In addition, chromatin contains a wide variety of non-histone chromatin proteins (NCHP), including all of the enzymes and regulatory proteins needed for selectively controlled transcription, as described in previous lectures.

Nucleosomes: Two copies each of four different types of histones (H2A, H2B, H3, and H4) form the core of a beadlike structure with 140 - 180 nucleotide pairs of DNA wrapped around it, known as a nucleosome (figure 10.8). Nucleosomes occur relatively close together on a long DNA molecule, typically separated by about 50 base pairs of linker DNA. DNA that is optimally packed into nucleosomes forms a fiber about 10 nm in diameter. That fiber in turn is coiled into a helix to form a 30 nm fiber. Higher orders of packing also occur, particularly during the condensation of chromatin into dense chromosomes that are visible with the light microscope during cell division (figure 10.8). Additional "linker" histones are believed to be involved in higher order packing, and may also play a role in transcriptional control. Nucleosome structure is temporarily disrupted during transcription, but not totally lost except in very intensely transcribed genes, such as those coding for ribosomal RNA. The promoter must be free of nucleosomes, however, in order for the transcriptional initiation complex to form. There also must be some degree of dissociation of nucleosomes during DNA replication, but experimental data have shown that the histones do not totally separate from the DNA during replication.
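
The figures above allow a rough estimate of how many nucleosomes a genome requires. The sketch below assumes that every nucleosome occupies its wrapped DNA plus one linker of about 50 base pairs, and that the whole 3 x 10^9 base pair haploid genome is packed this way. Both are simplifications, and the function name is invented for illustration.

```python
def nucleosome_count(genome_bp, wrapped_bp, linker_bp=50):
    """Estimate nucleosomes as genome length / (wrapped DNA + linker DNA)."""
    return genome_bp / (wrapped_bp + linker_bp)


HUMAN_HAPLOID_BP = 3e9  # value used elsewhere in these notes

# Use both ends of the 140 - 180 bp range quoted for wrapped DNA.
fewest = nucleosome_count(HUMAN_HAPLOID_BP, wrapped_bp=180)
most = nucleosome_count(HUMAN_HAPLOID_BP, wrapped_bp=140)

print(f"Roughly {fewest / 1e6:.0f} to {most / 1e6:.0f} million nucleosomes")
# Each core contains two copies of each of four histones, i.e. 8 molecules.
print(f"Core histone molecules needed: at least {8 * fewest / 1e6:.0f} million")
```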

Chromosomal structure: The text presents a rather detailed discussion of chromosomal structure as it is observed at the light and electron microscope levels. Most of this should be a review of material presented in introductory biology. Although we will not spend much time on this material in lecture, it is important to be sure that you understand it. In brief summary, each chromosome consists of one very long continuous DNA double helix, complexed with a variety of histone and non-histone proteins. For those who might wish to pursue this issue further, there is a web page that presents the exact size of each of the human chromosomes in megabases. The data from this web page yield a total length of all human chromosomes of 3227 million base pairs for a haploid genome with an X chromosome and 3122 million with a Y chromosome, slightly larger than the 3000 million estimate used in our textbook, but well within the range that various investigators are estimating. If the numbers on the web site are valid, human chromosome 1 contains 263 million base pairs, corresponding to an extended DNA double helix 8.9 cm in length, and even the smallest human chromosome with 50 million base pairs contains about 1.7 cm of linear DNA double helix.
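
The lengths quoted for individual chromosomes follow directly from the base pair counts. As a quick check, this sketch converts base pairs to centimeters, assuming a spacing of 3.4 Å (0.34 nm) per base pair along the double helix. The function name is made up for illustration.

```python
NM_PER_BASE_PAIR = 0.34  # 3.4 Å rise per base pair
CM_PER_NM = 1e-7


def extended_length_cm(base_pairs):
    """Length in cm of a fully extended double helix of the given size."""
    return base_pairs * NM_PER_BASE_PAIR * CM_PER_NM


for label, size_bp in [("chromosome 1", 263e6), ("smallest chromosome", 50e6)]:
    print(f"{label}: {extended_length_cm(size_bp):.1f} cm")
```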

Packaging of eukaryotic DNA: The huge amount of DNA in eukaryotic cells makes very compact packaging necessary. Multiple higher orders of coiling and folding are needed to fit all of the DNA in a typical eukaryotic cell into the interphase nucleus, with even further condensation needed to form visible mitotic or meiotic chromosomes. If we accept values of 3 x 10^9 base pairs and 3.4 Å per base pair, the fully extended double helical DNA in a single copy of the human genome is just over one meter in length (even longer if the values from the web page are correct). Because human cells are diploid, each cell thus contains just over two meters of DNA in a nucleus that is on the order of 10 µm in diameter. Figure 10.7 shows the huge amount of DNA that spills out from a typical mammalian chromosome when the histone proteins that normally keep the DNA compacted are removed.
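
The scale of the packaging problem is easier to appreciate as a ratio. The sketch below uses only the values given in this paragraph and compares the total extended length of DNA in a diploid cell with the diameter of the nucleus. The variable names are invented for illustration.

```python
HAPLOID_GENOME_BP = 3e9          # base pairs per haploid human genome
METERS_PER_BASE_PAIR = 3.4e-10   # 3.4 Å per base pair
NUCLEUS_DIAMETER_M = 10e-6       # nucleus on the order of 10 µm across

haploid_length_m = HAPLOID_GENOME_BP * METERS_PER_BASE_PAIR
diploid_length_m = 2 * haploid_length_m

# How many nuclear diameters long the extended DNA would be.
length_to_diameter_ratio = diploid_length_m / NUCLEUS_DIAMETER_M

print(f"Haploid genome: {haploid_length_m:.2f} m of DNA")
print(f"Diploid cell:   {diploid_length_m:.2f} m of DNA")
print(f"About {length_to_diameter_ratio:,.0f} times the nuclear diameter")
```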

Centromeres: Each eukaryotic chromosome has a dense constricted area that serves as a point of attachment for spindle fibers during cell division. Centromeres contain repeated DNA sequences, but those sequences vary substantially from one species to another. Human centromeres contain a tandemly repeated sequence of about 170 bp called an alphoid sequence. A small portion of that sequence, called a CENP-box, is conserved among other mammalian species. Prior to the development of modern banding techniques (figure 10.8), the position of the centromere was one of very few identifiable morphological characteristics that could be used to identify specific chromosomes (figure 10.31). Four classes of chromosomes are generally recognized with respect to centromere position (figure 10.15).

Kinetochores: Kinetochores are protein bodies that attach to the outside surfaces of the centromeres of condensed mitotic chromosomes. They are involved in the binding of spindle fibers to the chromosomes during mitosis, as described in the next lecture (figure 11.4). One should not confuse centromeres and kinetochores. Centromeres are integral parts of the chromosome, organized around specific repetitive DNA sequences, whereas kinetochores are attached protein bodies.

Chromatids: During the time period between replication of chromosomal DNA (which immediately adds proteins to form new chromatin) and the separation of daughter chromosomes during cell division (mitosis), there are two side-by-side copies of each chromosome, still attached to a single centromere. The duplicated chromosome is still considered to be one chromosome until the centromere divides and the two daughter chromosomes move toward opposite spindle poles during mitosis. Until this happens, each of the duplicated, but still attached, chromosomal structures is called a chromatid, and the attached pair are referred to as sister chromatids (figure 10.14). Because chromosomes are usually pictured as paired sister chromatids, based on their appearance during mitotic metaphase (figure 10.31), it is important to remember that prior to DNA replication and at all times in non-multiplying cells, each chromosome consists of a single chromatid. Because the single chromatids decondense rapidly after completion of cell division (see anaphase in figure 11.4) and duplication of chromatids occurs prior to the beginning of condensation for the next round of cell division, chromosomes are seldom seen, photographed, or depicted as single chromatids.

Telomeres: The DNA in eukaryotic chromosomes is linear, rather than circular. As discussed briefly in chapter 2, this creates special problems at the ends of the chromosome related to priming and primer removal in the lagging strand during DNA replication. These problems have been overcome by sealing the ends of the chromosomes with a repeated sequence generated by an enzyme known as telomerase, which has incorporated in it an RNA template for synthesis of the repetitive leading strand (figure 2.23). The repeated sequences at the ends of the chromosomes are called telomeres.

Chromosome banding: Through the use of appropriate staining techniques, it is possible to give each condensed mitotic chromosome a characteristic banded pattern, as shown for human chromosomes in figure 10.18 (and in negative contrast in figure 11.2). Banding makes it much easier to identify each chromosome and arrange individual chromosomes into numbered pairs. Such an arrangement is called a karyotype. A correctly labeled human karyotype is shown in figure 11.2. Figure 10.31 shows a karyotype prepared with a stain that does not produce banding (it also has the numbering of chromosomes 7-12 and 13-18 reversed). With chromosome banding, it is possible for an expert to identify a single human chromosome, or in some cases even a part of a human chromosome in a mitotic figure that otherwise contains only non-human chromosomes. As we will see in future lectures, this becomes very important when studying somatic cell hybrids that contain only a single human chromosome (pages 466-468 of our textbook).

Polytene chromosomes: Certain tissues in the larvae of Drosophila and certain other insect species contain giant polytene chromosomes that are particularly well suited for detailed examination with a light microscope. In such cases, homologous somatic chromosomes become paired and then undergo a process of repeated duplication of their DNA without strand separation or cell division. These giant polytene chromosomes often contain 1000 or more parallel strands of DNA (figure 10.19). They exhibit characteristic banding patterns as well as enlarged areas called "puffs" that reflect regions with particularly high levels of transcriptional activity (figure 10.20). Changes in the pattern of banding allow relatively small insertions or deletions of chromosomal material to be detected quite readily, as we will see in a future lecture.

In situ hybridization: Techniques have been developed for denaturation of the DNA in chromosomes, followed by hybridization of radioactive or fluorescently labeled probes onto the chromosomal structures. Under optimal conditions, this makes it possible to identify which chromosomes carry a particular sequence, as well as what part of the chromosome it is located in (figures 10.21 and 10.22). The acronym "FISH" is often used to describe fluorescent in situ hybridization, particularly when applied to human or other mammalian chromosomes.

Euchromatin and heterochromatin: During interphase (the time period when cells do not have their chromosomes condensed for division), most of the chromatin is in a highly dispersed state, called euchromatin, which has no clearly discernible structure in light microscopy (figure 10.9) and is only lightly stained in electron microscopy (figure 10.23). However, some parts of the chromatin remain more highly condensed. Highly repetitive sequences, such as those in centromeric and telomeric regions, tend to remain condensed at all times as constitutive heterochromatin, whereas other chromosomal regions form facultative heterochromatin only under certain conditions, such as inactivation of one of the X-chromosomes in female mammals. Heterochromatin usually is not transcribed, although as illustrated in example 10.3, there are some unusual cases where certain genes are only transcribed when they are located in a heterochromatic region of a chromosome.

Yeast artificial chromosomes: One of the more convenient cloning vectors for large inserts is the yeast artificial chromosome (YAC). As a minimum, a yeast artificial chromosome must contain two telomeres, a yeast origin of replication, a yeast centromere, and a site for inserting a foreign DNA. In its usual form, the YAC is maintained as a bacterial plasmid prior to having a foreign DNA inserted into it. It therefore contains a bacterial origin of replication and a bacterial selectable marker such as ampicillin resistance. There is also typically a yeast selectable marker (for example the yeast URA3 gene, which permits a ura3 uracil auxotrophic mutant strain of yeast to grow on a medium without uracil). This makes it possible to select for yeast cells that have taken up YACs. A YAC vector is capable of carrying foreign DNA inserts up to 500 kb or more in length (some textbooks say up to 1.0 megabase).

Gene families: Clusters of closely related genes are observed quite frequently in eukaryotic genomes. Such clusters are thought to have resulted from gene duplication followed by evolutionary divergence. In some cases, this divergence has given rise to different functional forms of the gene, which are expressed in different tissues or at different stages of the life cycle. In other cases, the divergence has generated non-functional pseudogenes (identified by the Greek letter psi), which have sequences that are very similar to functional genes, but are not functional. Pseudogenes fall into two categories: processed pseudogenes, which lack introns and are sometimes followed by long stretches of adenines at the 3'-ends of their coding sequences, suggesting that they have arisen by reverse transcription of mRNA sequences, and unprocessed pseudogenes, which are similar to regular genes except that they are not expressed for various reasons. Boxed example 10.5 describes a pseudogene that has a stop codon early in its open reading frame. Many others have promoter defects, such that they are not transcribed. Processed pseudogenes may totally lack promoters.
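
One of the defects mentioned above, a stop codon early in the open reading frame, is easy to look for computationally. The sketch below scans a coding sequence codon by codon, starting at its first base, and reports the first in-frame stop codon of the standard genetic code. The two short sequences are invented for illustration and are not real globin sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA stop codons, standard genetic code


def first_stop_codon_index(coding_sequence):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    sequence = coding_sequence.upper()
    # Step through complete codons only, reading from the first base.
    for start in range(0, len(sequence) - 2, 3):
        if sequence[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


# HYPOTHETICAL sequences: the second differs only by a premature TAG codon.
functional = "ATGGTGCTGTCTCCTGCCGACAAGTAA"
pseudogene = "ATGGTGTAGTCTCCTGCCGACAAGTAA"

for name, seq in [("functional", functional), ("pseudogene", pseudogene)]:
    print(f"{name}: first stop at codon index {first_stop_codon_index(seq)}")
```

A stop codon that appears much earlier than in the functional relative is only one line of evidence. As noted above, many pseudogenes have intact reading frames but defective or missing promoters.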

Globin gene family: The example cited in our textbook is the human globin gene family, which consists of two separate clusters (figure 10.26). The alpha globin cluster on chromosome 16 consists of three genes that are known to be transcribed (alpha 1, alpha 2, and zeta), three non-functional pseudogenes (psi-alpha 1, psi-alpha 2, and psi-zeta), and one whose status remains uncertain (theta). The beta globin cluster on chromosome 11 consists of five functional genes (beta, gamma-A, gamma-G, delta, and epsilon) plus one pseudogene (psi-beta). Human hemoglobin is always a tetramer, consisting of two proteins coded by genes from the alpha cluster and two from the beta cluster. In early embryonic development the alpha cluster is represented initially by zeta, with an early switch to the two alpha genes, which then persist into adulthood. The beta group is initially represented by epsilon, with an early switch to the two gamma forms, which dominate most of the fetal period. Shortly before birth, beta begins to replace the two gammas, with gradual expression also of a small amount of delta. The relative expression of the various globin genes is illustrated in figure 10.27.

Genomic organization: Whole genome sequencing projects are beginning to provide extensive information about genomic organization on a larger scale. The textbook briefly discusses conclusions arising from sequencing the entire 13 million base pairs in the genome of the budding yeast Saccharomyces cerevisiae. There are many patterns suggesting that modern genomic sequences have arisen by duplication followed by divergent evolution, sometimes involving whole blocks of genes. In some cases duplication of a specific gene has been followed by functional divergence, including targeting the gene products to different intracellular organelles.