Revised September 7, 2000
Lecture date: Friday, September 8, 2000
This lecture combines parts of 1999 lectures 5 and 6.
The section on A and P sites was rearranged for clarity without any
change in content on September 11.
Please see class notices and updates page for details.
Correction: Under energy consumed during protein synthesis, fourth
bulleted item, the parenthetical statement in the second sentence should
read (ATP-->AMP and 2 GTP -->2 GDP). The product of GTP hydrolysis in both
cases is GDP and not GMP. (Corrected September 18, 2000)
Lecture 5, MCDB 2150 Fall 2000
Genetic Code, Ribosomes, tRNA, Translation, Protein Structure, Prions
Textbook Assignment: Chapter 4, Pages 89-123.
Important concepts
- Genetic code
- Nucleotide triplet code (codons)
- Code is read in sequence without punctuation.
- Redundant code, 61 codons for 20 amino acids
- Wobble hypothesis allows third codon mismatches
- Start codon is usually AUG, but can be GUG or UUG
- The stop codons are UAA, UAG, UGA
- Exceptions to the "standard" code
- Overlapping codes
- Ribosomal subunits
- Transfer RNA (tRNA)
- Prokaryotic translation
- Formation of aminoacyl-tRNAs
- Initiation complex
- 70S ribosome complex
- A and P sites
- Elongation
- Termination
- Eukaryotic translation
- Energy consumption during translation
- Post-translational modification of proteins
- Prions
This is the last lecture in a series of four reviewing basic concepts of
molecular biology and the central dogma that are covered in
MCDB 1150.
Overview of translation
- Coded information: Translation is a polymerization
process in which amino acids are joined together by means of peptide
bonds to form proteins. The amino acid sequence of each protein
is determined by the sequence of ribonucleotides in messenger
RNA (mRNA), which in most cases has previously been transcribed
from DNA. The amino acid sequence is specified by a nucleotide
triplet code in the mRNA.
- Matching codons to amino acids:
The mRNA code is read by anticodons on transfer
RNA (tRNA) molecules. Specific types of tRNA molecules are charged
with specific amino acids by aminoacyl-tRNA synthetase molecules,
which are the only "bilingual" component of the translation
process. If a tRNA is charged with the wrong amino acid, that
amino acid will be inserted into the protein in place of the amino
acid that should have been on the tRNA. The joining of tRNAs to
their respective amino acids is an energy-requiring process driven
by hydrolysis of ATP, with an aminoacyl-AMP intermediate.
- Assembly of peptide chains:
Translation occurs on ribosomes, which are assembled from subunits
each time a new translation event is initiated. Hydrolysis of
GTP is required for the binding of each charged aminoacyl-tRNA to
the ribosome, and hydrolysis of a second GTP is required for a
translocation process that prepares the ribosome to accept the
next aminoacyl-tRNA. Termination of peptide chain growth occurs
at specific stop codons in the message and also requires hydrolysis
of GTP.
Amino acids and proteins: This lecture deals
with the translation of coded information contained in a linear RNA
molecule into a linear sequence of amino acids in a protein molecule. This
requires converting a message that is written with only four different
characters, A, C, G, and U, into a sequence with 20 possible alternative
amino acids at each position. The protein amino acids all have amino groups
attached to their alpha carbon atoms (the ones next to their carboxyl groups).
The alpha carbon also carries an attached side chain, ranging from a single
hydrogen in glycine to a complex double ring structure in tryptophan
(see figure 4.1 for details). You do not need to memorize the exact structures
of the protein amino acids, but you should be familiar with the names of
all 20 and have a general understanding of their properties, including the
ways in which the chemical properties of their side groups influence the
properties of the proteins that contain them (positive and negative
charges, polar and non-polar properties, presence of hydroxyl groups
capable of phosphorylation, presence of -SH groups that can form disulfide
crosslinks, etc.).
Gene-protein colinearity: One of the earliest lines of
experimental evidence supporting the concept that genetic information
was in a linear array corresponding to the amino acid sequence
of a protein was provided by studies on the A subunit of tryptophan
synthetase from E. coli in the laboratory of Charles Yanofsky.
These studies verified that the relative map position of each
mutation that was analyzed corresponded accurately
to the relative position within
the protein of the resulting amino acid substitution (figures 4.5 and 4.6).
The genetic code: Our textbook provides a fairly extended
discussion of the history of how the genetic code was deciphered
on pages 103-106. For the current lecture, we will only deal with those
aspects of the code that are summarized in outline form below
(see pages 91 - 95 of the textbook for additional details).
- The code is read in units of three nucleotides, known as codons.
- The nucleotide triplet code allows 20 different amino acids
to be specified by appropriate combinations of four nucleotides
read three at a time.
- There is redundancy of the code with 61 of 64 possible codons
used to code for 20 amino acids.
- The number of codons per amino acid varies from one to six.
- The wobble hypothesis describes allowable mismatches in the
third nucleotide of some codon/anticodon pairs, such that fewer
than 61 different tRNAs can read all 61 codons. There are a number of
examples where the same amino acid is specified with any of the
four possible nucleotides in the third position of the codon.
- Codons are read in sequence with no punctuation.
- Mutations that add or subtract one nucleotide in a coding sequence
cause all of the RNA sequence downstream from the change to be read
in a different reading frame, resulting in a totally different amino acid
sequence. Such mutations are called frameshift mutations (see figure 5.2d
on page 127 for an example).
- There is a strict linear relationship between the position of the codon
in the mRNA and the position of the amino acid in the protein.
- The 5'-end of the coding sequence corresponds to the amino-terminus
of the protein that is coded.
- The start codon is normally AUG, coding for methionine (or
N-formylmethionine in prokaryotic systems), but in rare cases,
GUG and UUG are used as start codons in the "standard" code.
(As explained later in this lecture, when this happens, the alternative
codons are misread by the initiator tRNA, such that the first amino
acid in the protein is still methionine).
- Many proteins lose the initial methionine
soon after they are synthesized.
- The stop codons are UAA, UAG, UGA.
- Overlapping codes for two different peptides are sometimes
seen in viral DNAs.
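The effect of reading codons in sequence without punctuation can be sketched in a few lines of Python. The sequence below is a made-up example (not taken from the textbook), and the helper name `codons` is hypothetical. Deleting a single nucleotide leaves the first codon intact but changes every codon downstream of the deletion.

```python
def codons(rna, frame=0):
    """Split an RNA string into consecutive triplets, starting at `frame`.

    Any incomplete triplet at the 3' end is dropped.
    """
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


# Hypothetical coding sequence: AUG GCU GAA UUC CGU UAA
original = "AUGGCUGAAUUCCGUUAA"

# Frameshift mutation: delete the single nucleotide at index 4 (the C of GCU).
mutant = original[:4] + original[5:]

print(codons(original))
print(codons(mutant))
```

In this example the UAA stop codon also drops out of the reading frame, so translation of the mutant message would continue past the normal end of the protein.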
Code tables: Table 4.4 identifies the "standard"
coding functions of all
64 possible codons. A larger and easier-to-read version of that table
can be found inside the front cover of the textbook. It is accompanied
by a table showing the codons for all 20 amino acids. Notice that the number
of codons for individual amino acids varies from one to six.
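As a rough check on these numbers, the codons per amino acid can be tallied from the one-letter string for the standard code (64 letters, one per codon, with * marking stop codons; the same string appears in the compressed table later in these notes). The variable names are illustrative only.

```python
from collections import Counter

# One letter per codon of the standard code; "*" marks a stop codon.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons_per_amino_acid = Counter(AAS)
stop_codons = codons_per_amino_acid.pop("*")  # remove stops from the tally

sense_codons = sum(codons_per_amino_acid.values())
one_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)
six_codons = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 6)

print(len(codons_per_amino_acid), "amino acids,", sense_codons, "sense codons,",
      stop_codons, "stop codons")
print("one codon:", one_codon, " six codons:", six_codons)
```

The tally shows that only methionine (M) and tryptophan (W) have a single codon, while leucine (L), arginine (R), and serine (S) each have six.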
One letter code for amino acids: The tables of codons also
present the one letter codes for protein amino acids. You will need to become
familiar with these abbreviations. For the most part, they are the first
letters of the names of the amino acids. However, in cases where several
amino acids start with the same letter, they are sometimes the nearest
available letter of the alphabet (K for lysine). In other cases they
attempt to mimic the sound of the name (F for phenylalanine,
N for asparagine, R for arginine, Y for tyrosine,
D for aspartic ("asparDic") acid).
The use of W for tryptophan invokes images of Elmer Fudd ("tWyptophan").
The use of E for glutamic acid probably reflects its chemical similarity
to aspartic acid (D). The use of Q for glutamine is said to be related
to its sound, but that one is a bit of a stretch of the imagination.
Compressed representation of code: The format reproduced below is
probably the most compact way to represent the code (in this case the DNA
code). You will encounter it on some web sites, including the one
referenced below for alternative codes. This is the standard code. * = stop
codon. The symbol "M" in the starts row suggests that synthesis begins
with methionine (carried on the initiator tRNA) even when a different
start codon is used. (Note: if the vertical columns of type do not line up
typewriter style on your screen, forget about this table and use the one
in the book. Not all web browsers support the HTML "pre" tag.)
AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
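The compressed format is read column by column: the letters at the same position in Base1, Base2, and Base3 spell one codon, and the letters at that position in the AAs and Starts rows give its meaning. A minimal Python sketch of that decoding follows. The names `CODON_TABLE`, `START_CODONS`, and `translate` are hypothetical, and the short test sequence is made up for illustration.

```python
# The five rows of the compressed table, copied as strings of 64 characters.
AAS    = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M---------------M---------------M----------------------------"
BASE1  = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2  = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3  = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

CODON_TABLE = {}      # DNA codon -> one-letter amino acid ("*" = stop)
START_CODONS = set()  # codons marked "M" in the Starts row

# Read the table one column at a time.
for b1, b2, b3, amino_acid, start in zip(BASE1, BASE2, BASE3, AAS, STARTS):
    codon = b1 + b2 + b3
    CODON_TABLE[codon] = amino_acid
    if start == "M":
        START_CODONS.add(codon)


def translate(dna):
    """Translate a sense-strand DNA sequence, reading from its first base.

    Reading stops at the first stop codon; a trailing partial codon is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(sorted(START_CODONS))
print(translate("ATGTTTGGCTAA"))  # made-up sequence: ATG TTT GGC TAA
```

Note that the Starts row as reproduced here marks TTG, CTG, and ATG.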
Alternative codes: Minor variations from the
"standard" code (usually differing by just
a few codons) have been found in the mitochondrial genomes of many species,
including vertebrates, and also
in the primary genomes of some unicellular organisms such as
mycoplasma and ciliated protozoa. The most frequent change is use
of UGA (one of the stop codons in the "standard" code)
as a second codon for tryptophan, but there are also numerous
other differences. For a compilation of alternative
codes,
click here.
DNA and RNA codes:
You will need to be able to move back and forth freely
between the DNA code in the sense strands of genes and the RNA code in
messenger RNA molecules. Thus, you need to be equally comfortable with
ATG or AUG as the start codon and
TTA or UUA as a codon specifying leucine. Also, you need to remember when
to use T and when to use U. For example, if you were asked to write the
DNA code for the protein in figure 4.7, you would need to replace each U in
the code with T.
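Moving between the two codes is a simple substitution, as the sketch below illustrates. The function names are hypothetical. The third helper is an addition for orientation only: it returns the complementary (template) DNA strand, written 5' to 3', which is the strand actually copied during transcription.

```python
def dna_to_mrna(sense_dna):
    """Sense-strand DNA code -> mRNA code (replace each T with U)."""
    return sense_dna.upper().replace("T", "U")


def mrna_to_dna(mrna):
    """mRNA code -> sense-strand DNA code (replace each U with T)."""
    return mrna.upper().replace("U", "T")


def template_strand(sense_dna):
    """Reverse complement of the sense strand, written 5' to 3'."""
    complement = str.maketrans("ACGT", "TGCA")
    return sense_dna.upper().translate(complement)[::-1]


print(dna_to_mrna("ATGTTA"))      # start codon followed by a leucine codon
print(mrna_to_dna("AUGUUA"))
print(template_strand("ATGTTA"))
```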
Full length mRNA sequence: In addition to nicely demonstrating
how a coding sequence is organized, figure 4.7 also provides an example of
a full-length mRNA sequence that includes the 5'-cap, the 5'-untranslated
sequence, the coding sequence from start codon to stop codon, the
3'-untranslated sequence (including the AAUAAA polyadenylation signal), and
the poly(A) tail.
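The layout of such a message can be sketched in Python as follows. This is a simplified illustration, not the sequence from figure 4.7: the example message is made up, the 5'-cap is not represented because it is not part of the nucleotide string, and the function `split_mrna` (a hypothetical name) simply assumes that the first AUG is the start codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split an mRNA into 5'-untranslated, coding, and 3'-untranslated parts.

    Assumes the first AUG is the start codon and reads in that frame
    until the first stop codon.
    """
    start = mrna.find("AUG")
    if start < 0:
        raise ValueError("no AUG start codon found")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is counted as part of the coding sequence
            break
    else:
        raise ValueError("no in-frame stop codon found")
    return {"utr5": mrna[:start], "cds": mrna[start:end], "utr3": mrna[end:]}


# Made-up message: 5'-UTR + coding sequence + 3'-UTR with AAUAAA + poly(A) tail
message = "GGACC" + "AUGGCUUGGUAA" + "CCAAUAAAGG" + "AAAAAA"
parts = split_mrna(message)
print(parts)
print("polyadenylation signal present:", "AAUAAA" in parts["utr3"])
```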
Ribosomes: Ribosomes are complex molecular aggregates,
composed of large and small subunits, which assemble together only
for the purpose of protein synthesis and separate again as soon as
the process is completed. The ribosomes of
E. coli consist of a small subunit containing one 16S
RNA and 21 different proteins plus a large subunit containing
two RNAs (23S and 5S) plus 31 different proteins (figure 4.8).
Mammalian ribosomes consist of a small subunit containing
an 18S RNA and 30-35 proteins, plus a large subunit with 28S, 5.8S, and
5S RNAs and 45-50 different proteins (figure 4.9). The abbreviation "S"
refers to Svedberg units, a measurement of the rate of sedimentation during
high speed centrifugation, which reflects the relative sizes of the RNA
molecules, but not in a strictly linear relationship. As shown in figure 4.10,
ribosomes attach to messenger RNA and synthesize proteins as they progress
from one end of the coding sequence to the other. Many ribosomes can be
seen attached to a single actively translated mRNA, forming a complex
known as a polysome.
Major steps in translation:
Details of translation and the major molecular species
involved are summarized below in outline form, arranged in approximately
the same sequence as in the textbook (see figures 4.8, 4.14, 4.24,
4.25 and 4.26 for a visual summary of the main steps).
Except where specified otherwise, the descriptions are for E. coli,
which is representative of typical prokaryotic cells. Eukaryotic
translation is similar in principle, but differs in many of its
details.
Linking amino acids to the appropriate tRNAs
- Transfer RNA (tRNA) molecules are about 75 nucleotides in
length and fold by internal base pairing into a typical "cloverleaf"
configuration, which in itself is a flattened distortion of the
actual shape (see figures 4.11 and 4.12).
- There are about 50 different molecular species of tRNA in
E. coli, able to pair selectively with 61 different codons.
- tRNAs contain numerous modified ribonucleotide bases, which
have been altered post-transcriptionally (see figures 3.26 and 4.11).
- Each tRNA has an anticodon consisting of three nucleotides,
located at the end of its middle loop.
The anticodon forms base pairs with one or more codons for a particular
amino acid in the mRNA (figures 4.16 and 4.18).
- The wobble hypothesis explains how some tRNA anticodons can
pair with more than one codon (see table 4.2 and figure 4.17).
- Each aminoacyl-tRNA synthetase (tRNA charging enzyme) recognizes
a specific amino acid and also specific properties of the
tRNA molecules for that amino acid. The recognition sites on the tRNAs
appear to consist of more than just
their anticodons.
- Hydrolysis of ATP results in formation of an intermediate
aminoacyl-5'-AMP complex that is carried on the aminoacyl-tRNA
synthetase enzyme molecule (figure 4.14).
- All tRNAs have a ...CCA-OH sequence at their 3'-ends.
In some cases, the CCA sequence is added after transcription
has been completed (figure 3.27c)
- An ester bond is formed between the carboxyl group of the amino acid and
a free 2'- or 3'- hydroxyl group at the 3' end of the tRNA (figure 4.13).
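The wobble pairing mentioned in this list can be sketched in Python. The pairing rules below are those of Crick's original wobble hypothesis and are assumed here to match table 4.2 of the textbook; the function name is hypothetical. Anticodons and codons are both written 5' to 3', so the first (5') base of the anticodon is the one that pairs, with wobble, with the third base of the codon. "I" stands for inosine, one of the modified bases found in tRNA.

```python
# Standard base pairs for the first two codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble rules: 5' base of the anticodon -> allowed bases at codon position 3.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read_by(anticodon):
    """Return the codons (5' to 3') that an anticodon (5' to 3') can read."""
    first, second, third = anticodon
    codon_base1 = WATSON_CRICK[third]   # anticodon 3' base pairs with codon base 1
    codon_base2 = WATSON_CRICK[second]  # middle bases pair with each other
    return sorted(codon_base1 + codon_base2 + base3 for base3 in WOBBLE[first])


print(codons_read_by("GAA"))  # a phenylalanine anticodon
print(codons_read_by("IGC"))  # an alanine anticodon containing inosine
print(codons_read_by("CAU"))  # the methionine anticodon
```

Under these rules a single anticodon reads one, two, or three codons, which is why fewer than 61 tRNAs are sufficient.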
Formation of the prokaryotic initiation complex. (Figure 4.22)
- The small ribosomal subunit (30S) consists of a 16S rRNA plus
21 proteins.
- Initiation factor 3 (IF-3) binds to the 30S subunit.
- The mRNA initially attaches to the 30S ribosome plus IF-3.
IF-1 is also involved, but its exact role is not as clear.
- N-formylmethionine (fMet) binds to a special initiator tRNA,
whose recognition site is AUG (the normal methionine codon).
- Charged fMet-tRNA binds to initiation factor 2 (IF-2) and
GTP (note that our textbook says only that energy from GTP
hydrolysis is needed for initiation and does not specify in
figure 4.22 where GTP is bound).
- The fMet-tRNA/IF-2/GTP complex and the 30S/IF-3/mRNA complexes
join together to form the initiation complex. The fMet-tRNA recognizes
an AUG initiation codon on the mRNA, located just downstream from
a second recognition site known as the Shine-Dalgarno sequence
(consensus sequence AGGAGG), which is believed to be complementary
to the 3'-end of the 16S ribosomal RNA. IF-3 is released.
- In cases where a different start codon is used, the initiator tRNA
mispairs with it, such that the first amino acid is still methionine.
Formation of the 70S ribosome complex (Figure 4.25d)
- The large ribosomal subunit (50S) consists of 23S and 5S RNAs
plus at least 31 proteins. The 50S subunit joins onto the
initiation complex to form a 70S ribosome complex. This step is
driven by hydrolysis of the GTP attached to the initiation complex.
IF-1 and IF-2 also dissociate at this step. (Note that 50S + 30S combine
to generate 70S. The designation
"S" refers to a Svedberg unit, which is a measure of
rate of sedimentation in a centrifugal field. Although S values are
related to the overall weights of sedimenting particles, they are not linear
measures of weight.)
A and P sites, recruitment of aminoacyl-tRNAs, elongation (Figure 4.25)
- The 70S ribosome complex contains two sites for attachment
of tRNAs and the amino acids (or peptide chains) that they carry. The
P site is initially occupied by the fMet-tRNA, and becomes the
attachment site for the growing peptide chain. The A site is the
initial attachment site for tRNAs that bring new amino acids to
the growing peptide chain.
- Elongation factor EF-Tu forms a complex with GTP.
- fMet-tRNA initially occupies the P site.
- A new charged tRNA attaches to the A site, guided by the codon-anticodon
match between the mRNA and the tRNA, with the attachment driven
by hydrolysis of the GTP attached to EF-Tu when the correct match
is found.
- GDP is displaced from EF-Tu by elongation factor Ts (EF-Ts). GTP
then replaces EF-Ts, regenerating the EF-Tu/GTP complex.
- fMet (or the growing peptide chain in subsequent cycles) is
transferred from the 3' end of the tRNA in the P site to the amino
group of the amino acid in the A site, forming a new peptide bond.
The enzymatic activity that catalyzes this step is called peptidyl
transferase.
- The empty tRNA is released from the P site.
- The tRNA with the growing peptide chain is then transferred
from the A site to the P site. This is catalyzed by elongation
factor G (EF-G) and driven by hydrolysis of another GTP. In the
process, the position of the mRNA is shifted by one codon (3 bases),
such that the codon for the next amino acid is now in the A site.
- The charged tRNA bearing the next amino acid is recruited
and the cycle is repeated.
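The elongation cycle described in this list can be sketched as a small Python simulation. It is a bookkeeping model only: the names are hypothetical, the message is made up, initiation and release are not modeled, and the codon table is rebuilt from the one-letter string of the standard code (codons ordered with U, C, A, G at each position).

```python
from itertools import product

AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# product("UCAG", repeat=3) yields UUU, UUC, UUA, UUG, UCU, ... in table order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AAS)}


def elongate(mrna):
    """Simulate A-site/P-site cycling from the first AUG to a stop codon.

    Returns the peptide chain held in the P site and the number of GTPs
    hydrolyzed during elongation.
    """
    start = mrna.find("AUG")
    if start < 0:
        raise ValueError("no AUG start codon found")
    p_site = ["fMet"]     # fMet-tRNA initially occupies the P site
    gtp_hydrolyzed = 0
    position = start + 3  # codon currently positioned in the A site
    while position + 3 <= len(mrna):
        amino_acid = CODON_TABLE[mrna[position:position + 3]]
        if amino_acid == "*":
            break                       # stop codon has entered the A site
        a_site = amino_acid             # charged tRNA binds the A site
        gtp_hydrolyzed += 1             # GTP on EF-Tu is hydrolyzed
        chain_on_a_site = p_site + [a_site]  # peptidyl transfer onto A-site tRNA
        p_site = chain_on_a_site        # translocation moves the chain to the P site
        gtp_hydrolyzed += 1             # GTP hydrolyzed with EF-G
        position += 3                   # mRNA shifts by one codon
    return p_site, gtp_hydrolyzed


chain, gtp = elongate("GGAUGUUUGGCUAAGG")  # made-up message: AUG UUU GGC UAA
print(chain, gtp)
```

Each amino acid added after fMet costs two GTPs in this model, one for binding at the A site and one for translocation.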
Termination of translation
- New amino acids continue to be recruited and added to the
growing peptide chain until a termination codon (UAG, UAA, UGA)
enters the A position. Termination requires the participation
of at least three release factors, RF-1, RF-2, and RF-3. RF-1
recognizes stop codons UAA and UAG, and RF-2 recognizes UAA and
UGA. This leaves the finished peptide still attached at its carboxyl
end to the tRNA in the P position. The completed protein is then
released, catalyzed by RF-3 and driven by the hydrolysis of yet
another GTP. (Note that our textbook does not mention RF-3 or the
need for expenditure of energy to terminate translation.)
- IF-3 then binds the 30S subunit, causing dissociation of the
70S ribosome into 30S and 50S subunits. Thus, complete 70S ribosomes
are transient structures that are formed at the initiation of
translation and dissociated as soon as translation ends.
Eukaryotic translation
- Eukaryotic translation is similar in principle to prokaryotic, but
varies in many of its details. Some of the more important differences are
summarized briefly below.
- In eukaryotic cells, the initiation complex is formed with an
initiator tRNA that carries methionine (Met), rather than N-formylmethionine.
- The eukaryotic initiation complex forms initially at the mRNA
5'-cap, and then scans along the mRNA to find the right start sequence.
- Eukaryotic translation starts at an AUG codon that must be embedded
in a larger consensus initiation sequence (figure 4.24 and boxed example
4.5) to be recognized. Sometimes the first AUG in the mRNA is skipped
over in favor of a later one with the right consensus sequence.
- The large ribosomal subunit is not added to the rest of the complex
until the Met-initiator tRNA and the small subunit have arrived at the
AUG start codon.
- Eukaryotic termination is done with a single termination factor
for all codons.
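The scanning step can be sketched in Python. The consensus test used here is an assumption, not the sequence from figure 4.24: it uses a commonly quoted simplification in which a purine (A or G) three nucleotides before the AUG and a G immediately after it mark a favorable context. The function name and the example message are hypothetical.

```python
def scan_for_start(mrna):
    """Scan from the 5' end and return the index of the first AUG that lies
    in a favorable context (assumed rule: purine at -3 and G at +4).

    Returns None if no AUG passes the test.
    """
    for i in range(3, len(mrna) - 3):
        if (mrna[i:i + 3] == "AUG"
                and mrna[i - 3] in "AG"    # purine three bases upstream
                and mrna[i + 3] == "G"):   # G immediately after the AUG
            return i
    return None


# Made-up message: the first AUG is in a poor context, the second in a good one.
message = "GCCUUUAUGCCCGCCACCAUGGCGUAA"
print(message.find("AUG"), scan_for_start(message))
```

In this example the first AUG (index 6) is skipped and translation would begin at the second one (index 18), which is the behavior described above.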
Energy consumed during protein synthesis.
- Initiation of a peptide chain requires hydrolysis of ATP during
formation of fMet-tRNA as well as hydrolysis of GTP to GDP
during formation of the
initiation complex. Termination requires another hydrolysis of GTP.
However, the major expenditure of cellular
energy during protein synthesis is for elongation of the growing peptide
chain, as described below.
- Formation of each aminoacyl-tRNA complex requires hydrolysis of
ATP to AMP plus pyrophosphate. Subsequent hydrolysis of pyrophosphate
helps "pull" the reaction forward by mass action.
- For each amino acid added to an elongating peptide chain,
two GTPs are hydrolyzed to GDP, one during the attachment of
aminoacyl-tRNA to the A site, and one during the translocation
step.
- The "cost" to the cell of adding one amino acid to a
growing polypeptide chain is four phosphate bonds,
which directly describes the amount of work that the cell must
do to regenerate one ATP from AMP and two GTPs from GDPs. In terms
of the actual amount of energy put into the synthetic process,
three bonds are directly hydrolyzed (ATP --> AMP and 2 GTP
--> 2 GDP). However, because the ATP-driven reaction is also
pulled forward by hydrolysis of pyrophosphate that is derived
from the ATP, it is probably more correct to say that four high
energy bonds are hydrolyzed in order to add one amino acid to
a growing peptide chain. A previous text used in this course
a few years ago estimated that 90% of the total energy production
of an E. coli cell goes into protein synthesis.
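The arithmetic above can be written out as a short Python sketch. The function name and the 300-residue example are hypothetical, and the count covers only elongation (tRNA charging, A-site binding, and translocation), not the extra GTPs used for initiation and termination.

```python
def elongation_cost(n_amino_acids, count_pyrophosphate=True):
    """High-energy phosphate bonds used to add n amino acids to a chain.

    Per amino acid: ATP --> AMP for charging the tRNA (two bonds if the
    hydrolysis of pyrophosphate is counted, one if it is not), plus two
    GTP --> GDP (A-site binding and translocation).
    """
    atp_bonds = 2 if count_pyrophosphate else 1
    gtp_bonds = 2
    return n_amino_acids * (atp_bonds + gtp_bonds)


# Hypothetical protein of 300 amino acids
print(elongation_cost(300))                             # four bonds per residue
print(elongation_cost(300, count_pyrophosphate=False))  # three bonds per residue
```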
Protein structure: Section 4.9 on protein structure and function should
be read as additional background information that is essentially
a review of material from MCDB 1150. Four levels of structural
information are commonly recognized.
- Primary structure
refers to the genetically determined amino acid sequence of the
protein.
- Secondary structure refers to regularly repeated
configurations such as the alpha helix or beta-pleated sheet structures
that proteins can form.
- Tertiary structure refers to the
overall folded
three-dimensional configuration of the protein, which
is stabilized by disulfide bonds, by polar interactions with water,
and by hydrophobic interactions within the protein molecule itself.
- Quaternary structure refers to interactions among peptide
chains to generate oligomers consisting of more than one subunit.
Interactions with so-called chaperone proteins are sometimes required
for a protein to achieve its final properly folded configuration.
Post-translational modification: Proteins are subject to
a variety of post-translational modifications, including frequent
removal of N-terminal methionine, removal of other N- or C- terminal
sequences, removal of internal sequences, removal of signal or
targeting sequences, modification of specific amino acids (such
as conversion of proline to hydroxyproline), phosphorylation of
hydroxyl groups, addition of carbohydrate side chains (glycosylation),
complexing with metals or other prosthetic groups, and a long
list of other possibilities that are not discussed particularly
well in the textbook. Some of these modifications are illustrated
in a section on enzymes and enzyme activity at the end of the chapter.
Prions: A boxed section at the end of chapter 13 of Klug and Cummings,
Concepts of Genetics, 5th edition (the previous text for this course, available
at the Norlin reserve desk) discusses
an unusual pathogenic unit called a prion (proteinaceous infective
agent). Although the prion theory remains controversial, a very
large amount of evidence has accumulated showing that prion proteins
are coded by the host, and subsequently modified to function as
pathogens. The modified proteins accumulate in aggregates that
cause degenerative diseases of the brain. The best available evidence
seems to indicate that a conformational modification of the normal
host protein gives it pathogenic properties plus the ability to
catalyze similar modification of additional normal proteins, such
that the pathology is infectious. The prion disease that has received
the most attention recently
is mad cow disease, which apparently
got its start when proteins derived from sheep infected with a
similar disease, scrapie, were used in cattle feed.
Ordinary sterilization techniques do not inactivate the prion
infectivity. Similar diseases are known in humans, including Kuru
and Creutzfeldt-Jakob disease. There is also
considerable evidence suggesting that some "atypical" cases of
Creutzfeldt-Jakob disease may have been caused by animal to human
transmission of mad cow disease through meat from infected animals.
Stanley Prusiner was awarded the Nobel Prize in 1997 for his
extensive study of prions (which included coining the term "prion").
New material ahead! This lecture marks the end of the brief
"review" of material from MCDB 1150. Starting with the next lecture
we will begin to examine "new" material in greater detail than
has been possible in these "review" lectures.