David Haussler (PhDCompSci’82) is well known for his work with the Human Genome Project – he and his team posted the first publicly available human genome sequence on the Internet in 2000.
And he says that interest in the genome started right here at CU Boulder, where he had come to do his doctoral work with Distinguished Professor Emeritus Andrzej Ehrenfeucht.
Haussler, who currently serves as a faculty member and director of the Genomics Institute at the University of California, Santa Cruz, was elected to the National Academy of Engineering (NAE) earlier this year. He was recognized “for developments in computational learning theory and bioinformatics, including first assembly of the human genome, its analysis and data sharing.”
“Andrzej had a seminar every week, where we would discuss wide-ranging topics,” Haussler said. “There’s no limit to his creativity. … When Andrzej would pause and look up, you knew there was some great new idea coming.”
In one of those sessions, he and his classmates – many of whom also went on to study at the intersection of biology, math and computer science – got interested in problems of interpreting DNA sequences. They wrote a number of papers together on theoretical topics related to reading signals in DNA, including some early work in machine learning and neural networks.
After earning his PhD, Haussler explored various applications of machine learning, but eventually came back to how it could be used to sequence and understand genes.
“Genes are kind of like words or sentences written in DNA,” he said. “Being able to read and understand them is a worthy challenge.”
To understand the language of genes, Haussler pioneered the use of hidden Markov models (HMMs), stochastic context-free grammars and discriminative kernel methods in molecular biology. That work helped to crack the code of how genes were written and put the pieces of the genome together, which he said was a very big moment in his career.
In 2000, a year after Haussler was invited to join the Human Genome Project, his lab, driven by graduate student Jim Kent, published the first computational assembly of the human genome on the Internet. The lab later developed the UCSC Genome Browser, which is used extensively in biomedical research.
Today, his team is working on new experimental questions. They have a paper coming out soon on genes that are specific to the human genome and seem to have contributed to making human brains larger, but that are also occasionally associated with autism and schizophrenia.
“We’ve been trying to read more carefully and understand more deeply this message that has been passed down to us from our ancestors over the eons,” he said. “In a struggle for survival for billions of years, the DNA messages carried from parent to offspring have contained the secret of success. I’m happy to spend all my time reading the wisdom in that script.”
Haussler, who believes strongly that more data sharing will aid genetic research, said he sees election to the NAE as a way to continue addressing the problem of “greed and fear” that prevents the sharing of information about people’s DNA.
“Being elected to the academy helps me shine a light on this problem,” he said. “I hope through the academy I can increase awareness of the importance of data sharing and start to change the culture of biomedicine to encourage it.”