Rob Knight
Advances in high-throughput sequencing and in computational techniques allow us to address large-scale questions about evolution that have never before been accessible. Our research combines computational and experimental techniques to ask questions about the evolution of the composition of biomolecules, genomes, and communities.
RNA composition: An experimental technique called SELEX, or in vitro selection, allows functional RNA molecules to be isolated from large pools of random RNA sequences. Typically, these pools are designed to have equal compositions of the four nucleotides. However, it is unclear whether this is the best region of the space of possible compositions to search for functional RNA molecules.
We are comparing RNA molecules isolated from SELEX to biological RNA molecules to test whether there are general rules that govern the nucleotide composition of specific RNA structural features. Several researchers, including Erik Schultes and Donald Forsdyke, have shown that biological RNAs are specifically biased towards purines. We are testing whether functional molecules of defined overall composition differ statistically from random molecules of the same composition, and whether there are rules that govern how many of the A's, C's, G's, and U's in a random sequence end up in different structural categories such as stems, loops, bulges and junctions [1]. We expect that these rules will help us improve our RNA secondary structure prediction software, BayesFold [2]. BayesFold uses the information contained in an alignment of sequences that share the same function, and therefore presumably share the same structure, to provide highly accurate secondary structure predictions for alignments of short RNA sequences. We also expect that we will find general rules that influence the assembly of particular RNA architectures.
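The following sketch shows one way to tally nucleotides by structural category. It takes a sequence and a dot-bracket secondary structure and counts A, C, G and U separately in paired (stem) and unpaired positions. It is a minimal illustration under stated assumptions, not the method used in [1]: the dot-bracket input format is assumed, pseudoknots are not handled, and unpaired positions are not split further into loops, bulges and junctions. The functions `pair_table` and `composition_by_category` are hypothetical names.

```python
from collections import Counter


def pair_table(structure: str) -> list:
    """Map each position to the index of its pairing partner (-1 if unpaired).

    Assumes simple dot-bracket notation: '(' and ')' mark paired bases and
    '.' marks unpaired bases. Pseudoknot brackets are not supported.
    """
    stack = []
    partners = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return partners


def composition_by_category(seq: str, structure: str) -> dict:
    """Count nucleotides separately in stem (paired) and unpaired positions."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    counts = {"stem": Counter(), "unpaired": Counter()}
    for base, partner in zip(seq.upper(), pair_table(structure)):
        category = "stem" if partner >= 0 else "unpaired"
        counts[category][base] += 1
    return counts


# Hypothetical hairpin: three G-C pairs closing an AAA loop.
counts = composition_by_category("GGGAAACCC", "(((...)))")
print(counts["stem"], counts["unpaired"])
```

Comparing these per-category counts for selected molecules with the same counts for random sequences of matching overall composition is one way to frame the statistical comparison described above.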
We are also testing whether the information contained in minimal functional RNA motifs is sufficient, as well as necessary, for function. SELEX experiments typically isolate short, degenerate sequences that are necessary for function from many different random-sequence backgrounds. Continuing work in Michael Yarus's lab has shown that the minimal motif that performs a particular task, such as binding or catalysis, can be found by "squeezing" the random region into shorter and shorter lengths. If these sequences and their specific secondary structure configuration are sufficient for activity, we should be able to obtain functional sequences by embedding them in longer, random sequences. We are currently determining whether this is the case, or whether additional identity elements are needed. Because we can accurately predict how many random sequences are required to obtain a specified sequence and secondary structure motif [3,4,5], this work is crucial for estimating the information required to perform different catalytic or binding functions.
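A greatly simplified version of this calculation asks how likely a random sequence is to contain a specified primary-sequence motif. This sketch makes several assumptions. Positions are independent and drawn from a given nucleotide composition. Each possible start site is treated as independent, which ignores overlaps. The requirement that the motif also fold into the correct secondary structure is ignored, although the cited work [3,4,5] accounts for it. The motif `GGAC`, the composition values and the function names are hypothetical.

```python
import math


def site_probability(motif: str, composition: dict) -> float:
    """Probability that the motif matches at one particular start site."""
    return math.prod(composition[base] for base in motif)


def prob_in_random_sequence(motif: str, length: int, composition: dict) -> float:
    """Approximate probability that a random sequence of this length contains the motif.

    Start sites are treated as independent, which ignores overlaps between them.
    """
    sites = length - len(motif) + 1
    if sites <= 0:
        return 0.0
    p = site_probability(motif, composition)
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**sites, computed stably for very small p.
    return -math.expm1(sites * math.log1p(-p))


def sequences_needed(motif: str, length: int, composition: dict,
                     confidence: float = 0.5) -> float:
    """Random sequences needed for P(at least one contains the motif) >= confidence."""
    q = prob_in_random_sequence(motif, length, composition)
    if q <= 0.0:
        return math.inf
    if q >= 1.0:
        return 1.0
    # log1p keeps precision when q is tiny, as it is for long motifs.
    return math.log1p(-confidence) / math.log1p(-q)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
purine_rich = {"A": 0.3, "C": 0.2, "G": 0.3, "U": 0.2}  # hypothetical composition
for name, comp in [("uniform", uniform), ("purine-rich", purine_rich)]:
    print(name, sequences_needed("GGAC", 50, comp))
```

Because the pool composition enters through `composition`, the same function shows how moving away from equal nucleotide frequencies changes the number of random sequences that must be searched.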
Collaborators on this project include Michael Yarus, Hans De Sterck, Meredith Betterton, and Manuel Lladser.
Genome composition: We are exploring genome composition at two levels: nucleotide composition and gene content. Differences in nucleotide composition among genes and species are interesting because they provide fundamental insights into mutational processes, and because composition can be a guide to horizontal gene transfer. Differences in gene content are interesting because they can help us understand the selective pressures that different organisms experience, and can help identify novel enzymes that fill "pathway holes" in completely sequenced genomes.
Different genomes vary widely in GC content, and this variation is driven by different patterns of mutation in different species. Variation in GC content is sufficient to explain most of the differences in codon and amino acid usage among species [6]. We are currently exploring measures based on GC content and codon usage as methods for detecting horizontal gene transfer. We are also developing new methods for detecting horizontal gene transfer based on the nucleotide substitution rate matrix, which summarizes the pattern of mutation in each sequence. We are applying these methods to study the molecular evolution of type III secretion in
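A simple composition-based screen of this kind can be sketched as follows. For each gene, compute GC content at third codon positions (GC3), then flag genes whose GC3 departs strongly from the genome-wide distribution. This is an illustrative heuristic, not the method under development here. The z-score cutoff of 2.0, the function names and the toy sequences are assumptions. Flagged genes are only candidates for transfer, not confirmed cases.

```python
import statistics


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    thirds = cds.upper()[2::3]  # index 3k+2 exists only when codon k is complete
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def flag_gc3_outliers(genes: dict, z_cutoff: float = 2.0) -> list:
    """Return names of genes whose GC3 lies more than z_cutoff SDs from the mean."""
    values = {name: gc3(seq) for name, seq in genes.items()}
    scores = list(values.values())
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        return []
    return sorted(name for name, v in values.items() if abs(v - mean) / sd > z_cutoff)


# Toy data: nine genes with GC3 = 0.5 and one hypothetical gene with GC3 = 1.0.
genes = {f"gene{i}": "AAGAAA" for i in range(9)}
genes["candidate"] = "AAGAAC"
print(flag_gc3_outliers(genes))
```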
Collaborators on this project include Noboru Sueoka, Meredith Betterton, Corrella Detweiler, Natalie Ahn, Katheryn Resing and Jeffrey Gordon. [7-8]
Community composition: We are developing new methods to test factors that make environments more or less similar in terms of the phylogenetic diversity of the organisms they contain. For example, for hot springs in Yellowstone, the driving factors might be temperature, pH, hydrogen sulfide, or any of a number of other physical and chemical factors.
We recently developed UniFrac [9], a distance metric that uses a phylogenetic tree to measure the evolutionary distance between each pair of environments represented in the tree. We can then use clustering methods, such as hierarchical clustering, and ordination methods, such as PCA, to identify environments that are more or less similar, and to correlate these differences with physical and biological properties of the environment. Recently, we found that microbial diversity in the mouse gut is primarily inherited through parent-offspring contact, but that the relative abundance of different taxa depends on the host genotype [10]. UniFrac allows all the information in a phylogeny to be brought to bear on the clustering problem, allowing new insights into the factors that govern community assembly. We expect UniFrac to have a wide impact in a range of environmental and medical applications.
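UniFrac, as described in [9], measures the fraction of branch length in a tree that leads to descendants from one environment or the other, but not both. Below is a minimal sketch of this unweighted form. It assumes a small hand-built tree whose tips are sequences labeled by name, with each environment given as the set of tip names observed there. The `Node` class and `unifrac` function are hypothetical illustrations, not the published implementation, and the root has no parent branch, so it contributes nothing.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    length: float = 0.0                       # length of the branch above this node
    children: List["Node"] = field(default_factory=list)
    name: Optional[str] = None                # tip label; None for internal nodes


def unifrac(root: Node, env_a: Set[str], env_b: Set[str]) -> float:
    """Unweighted UniFrac: unique branch length / branch length observed in either environment."""
    unique = 0.0
    observed = 0.0

    def visit(node: Node) -> Tuple[bool, bool]:
        """Return whether the subtree below this node contains tips from A and from B."""
        nonlocal unique, observed
        if not node.children:
            in_a, in_b = node.name in env_a, node.name in env_b
        else:
            in_a = in_b = False
            for child in node.children:
                child_a, child_b = visit(child)
                in_a, in_b = in_a or child_a, in_b or child_b
        if node is not root:                  # the root has no branch above it
            if in_a or in_b:
                observed += node.length
            if in_a != in_b:
                unique += node.length
        return in_a, in_b

    visit(root)
    return unique / observed if observed else 0.0


def tip(name: str, length: float) -> Node:
    return Node(length=length, name=name)


# Toy tree ((A:1,B:1):1,(C:1,D:1):1); tips stand for sequences.
tree = Node(children=[
    Node(1.0, [tip("A", 1.0), tip("B", 1.0)]),
    Node(1.0, [tip("C", 1.0), tip("D", 1.0)]),
])
print(unifrac(tree, {"A", "B"}, {"C", "D"}))  # lineages fully separated
print(unifrac(tree, {"A", "C"}, {"B", "D"}))  # only tip branches are unique
```

Computing `unifrac` for every pair of environments yields a distance matrix that can then be passed to hierarchical clustering or ordination, as the paragraph describes.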
Collaborators on this project include Norman Pace, Jeffrey Gordon, Frederick Bushman, Scott Kelley and Noah Fierer.
[1] Smit, S., Yarus, M., and Knight, R. (2006). "Natural selection is not required to explain universal compositional patterns in rRNA structural categories." RNA 12:1-14.
[2] Knight, R., Birmingham, A. E., and Yarus, M. (2004). "Bayesfold: Rational secondary folds that combine thermodynamic, covariation and chemical data for aligned RNA sequences". RNA 10(9):1323-36.
[3] Knight, R. and Yarus, M. (2003). "Finding specific RNA motifs: Function in a zeptomole world?" RNA 9:218-230.
[4] Knight, R., De Sterck, H., Markel, R., Smit, S., Oshmyansky, A., and Yarus, M. (2005). "Abundance of correctly folded RNA motifs in sequence space, calculated on computational grids." Nucleic Acids Research 33:5924-35.
[5] Legiewicz, M., Lozupone, C., Knight, R. and Yarus, M. (2005). "Size and constant sequences alter selection". RNA 11:1701-9.
[6] Knight, R. D., Landweber, L. F., and Yarus, M. (2001). "How mitochondria redefine the code." J. Mol. Evol. 53:299-313.
[7] Resing, K.A., Meyer-Arendt, K.E., Mendoza, A.M., Aveline-Wolf, L.D., Jonscher, K.R., Pierce, K.G., Old, W.M., Cheung, H.T., Russell, S., Wattawa, J.L., Goehle, G.R., Knight, R.D., and Ahn, N.G. (2004). "Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics." Anal Chem. 76(13):3556-68.
[8] Ruth, M.C., Old, W.M., Emrick, M.A., Meyer-Arendt, K., Aveline-Wolf, L.D., Pierce, K.G., Mendoza, A.M., Sevinsky, J.R., Hamady, M., Knight, R.D., Resing, K.A., and Ahn, N.G. (2006). "Analysis of Membrane Proteins from Human Chronic Myelogenous Leukemia Cells: Comparison of Extraction Methods for Multidimensional LC-MS/MS." Journal of Proteome Research 5:709-19.
[9] Lozupone, C.A. and Knight, R. (2005). "UniFrac: A New Phylogenetic Method for Comparing Microbial Communities." Appl Environ Microbiol 71:8228-35.
[10] Ley, R.E., Backhed, F., Turnbaugh, P., Lozupone, C.A., Knight, R.D., and Gordon, J.I. (2005). "Obesity alters gut microbial ecology." Proceedings of the National Academy of Sciences 102:11070-11075.