Computers are generating massive amounts of information in the modern era, but how do we make sense of it all? Uncovering the scientific knowledge hidden in the sea of data requires careful sifting, powerful tools, and an eye for patterns.
“My work emphasizes a data-driven approach to developing new scientific ideas about how complex social and biological systems work,” says Aaron Clauset, an assistant professor of computer science who analyzes terabytes of information to understand topics ranging from biological evolution to human behavior.
Some would call this work mere statistics, but he doesn’t see it that way. To him, a computer is a virtual laboratory where he uses mathematical models to develop and test theories about the patterns hidden in complex and massive data sets. The approach has led to some surprising results.
Clauset, who is a member of CU’s BioFrontiers Institute, has applied novel statistical tools and algorithms to fossil and ecological data and developed a theory that explains with remarkable accuracy why most mammals are small and only a few grow to be giants like elephants or whales. The theory is based on a trade-off between the short-term benefits and the long-term risks of species evolving to larger sizes over millions of years.
He is also a leader in the new field of computational social science, which combines theories and mathematical models of human social dynamics with big data sets and clever algorithms to uncover hidden patterns. When governments or for-profit companies extract such information, it raises privacy concerns, but Clauset is interested in the data for what it says about social behavior within groups or entire populations.
Records showing the frequency and severity of terrorist attacks are another area of interest for Clauset. He began looking at the data after the U.S. invasion of Iraq in 2003, and has been studying it ever since. “While conflict studies often focus on the psychology that leads to wars and terrorism, I aim to understand the global patterns in what can be directly measured, like the number of casualties and how often these events occur,” he says.
In a paper published last November, Clauset and political scientist Kristian Skrede Gleditsch of the University of Essex in England studied the attacks of nearly 1,000 terrorist organizations worldwide over a 40-year period, 1968 to 2008. They found that terrorist organizations behave a lot like factories, increasing their “production rate” of attacks in a mathematically predictable way, even though the deadliness of their attacks does not increase with more experience.
In a new paper to be published this year, Clauset and Ryan Woodard at the Swiss Federal Institute of Technology Zurich develop a new method for robustly estimating the historical and future probabilities of large terrorist events on the scale of 9/11. Their research will be the subject of a dedicated session at the American Statistical Association’s flagship conference in August.
Although this approach to understanding social systems is somewhat unconventional, Clauset says, identifying and understanding human social processes on a global scale offers a complementary perspective to the traditional work of social scientists, and a new way to understand what regularities may exist.
Graduate students Abigail Jacobs and Sears Merritt are also working on a variety of computational projects, including studies of ecological data and efforts to ferret out social relationships from video game player behavior.
“We’re an interdisciplinary group with broad research interests,” says Merritt, adding that he enjoys the challenge of applying computer science to understand social systems rather than technological ones. “There’s a lot more we don’t know about social systems because we can’t engineer them like software. It’s the emergent properties that are interesting.”
“There are some nice carry-overs between the projects,” Jacobs adds, “and the cross-fertilization is useful in bringing new ideas to other fields.”
Harnessing the power of big data is a 21st-century challenge that requires multidisciplinary collaboration and advances in the technology for handling large amounts of data. To that end, associate chair Ken Anderson is creating a multi-departmental Center for Big Data and Applied Science.
“I aim to unify the big data work and research expertise here at CU and make it more visible here in Colorado, and at the national and international levels,” Anderson says. “Besides Aaron’s excellent analytical work, there is work at CU on the software engineering of big data systems, the use of cloud computing for scientific workflows, the design of operating systems for cluster computing, and the like.”
The center’s mission will be the democratization of big data: producing frameworks and tools that allow more organizations to make sense of large data sets, Anderson says. The center will pursue federal research funding, partner with local industry to tackle its big data problems, and develop educational programs that produce graduates with knowledge and skills relevant to big data domains.