How social media data could help predict the next COVID-19 surge

In the summer of 2021, as the third wave of the COVID-19 pandemic wore on in the United States, infectious disease forecasters began to call attention to a disturbing trend.

The previous January, as models warned that U.S. infections would continue to rise, cases plummeted instead. In July, as forecasts predicted infections would flatten, the Delta variant soared, leaving public health agencies scrambling to reinstate mask mandates and social distancing measures.

“Existing forecast models generally did not predict the big surges and peaks,” said geospatial data scientist Morteza Karimzadeh, an assistant professor of geography at CU Boulder. “They failed when we needed them most.”

New research from Karimzadeh and his colleagues suggests that a new approach, using artificial intelligence and vast, anonymized datasets from Facebook, could not only yield more accurate COVID-19 forecasts but also revolutionize the way we track other infectious diseases, including the flu.

Their findings, published in the International Journal of Data Science and Analytics, conclude this short-term forecasting method significantly outperforms conventional models for projecting COVID trends at the county level.

Karimzadeh’s team is now one of about a dozen, including those from Columbia University and the Massachusetts Institute of Technology (MIT), submitting weekly projections to the COVID-19 Forecast Hub, a repository that aggregates the best data possible to create an “ensemble forecast” for the Centers for Disease Control and Prevention. Their forecasts generally rank in the top two for accuracy each week.

“When it comes to forecasting at the county level, we are finding that our models perform, hands-down, better than most models out there,” Karimzadeh said.

Analyzing friendships to predict viral spread

Most COVID-forecasting techniques in use today hinge on what is known as a “compartmental model.” Simply put, modelers take the latest numbers they can get about infected and susceptible populations (based on weekly reports of infections, hospitalizations, deaths and vaccinations), plug them into a mathematical model and crunch the numbers to predict what happens next.
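To make the idea concrete, here is a minimal sketch of the simplest textbook compartmental model, known as SIR (susceptible, infected, recovered). The article does not say which compartmental variants forecasters use, so the function name, the daily time step and the `beta` (transmission) and `gamma` (recovery) rates below are illustrative assumptions, not the parameters of any real COVID-19 forecast.

```python
def simulate_sir(susceptible, infected, recovered, beta, gamma, days):
    """Step a basic SIR compartmental model forward one day at a time.

    beta and gamma are assumed, illustrative rates; real models estimate
    them from reported infections, hospitalizations, deaths and vaccinations.
    """
    population = susceptible + infected + recovered
    history = [(susceptible, infected, recovered)]
    for _ in range(days):
        # New infections depend on contact between the two groups.
        new_infections = beta * susceptible * infected / population
        # A fixed share of infected people recovers each day.
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
        history.append((susceptible, infected, recovered))
    return history


# Hypothetical county of 100,000 people with 100 current infections.
trajectory = simulate_sir(99_900.0, 100.0, 0.0, beta=0.3, gamma=0.1, days=28)
```

Each entry in `trajectory` is one day's count of susceptible, infected and recovered people. Nothing in this calculation knows where residents travel or whom they see, which is the kind of information the next paragraphs turn to.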

These methods have been used for decades with reasonable success, but they have fallen short when predicting local COVID surges, in part because they can’t easily take into account how people move around.

That’s where Facebook data comes in.

Karimzadeh’s team draws from data generated by Facebook and derived from mobile devices to get a sense of how much people travel from county to county and to what degree people in different counties are friends on social media. That matters because people behave differently around friends.

“People may mask up and social distance when they go to work or shop, but they may not adhere to social distancing or masking when spending time with friends,” Karimzadeh said.

All this could influence how much, for instance, an outbreak in Denver County might spread to Boulder County. Often, counties that are not next to each other can heavily influence each other.
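One way to picture this cross-county influence is as a weighted average: each county's case rate counts in proportion to how strongly it is socially connected to the county being forecast. The sketch below is only an illustration of that idea. The function name, the connectedness weights and the case rates are all made up, and the article does not describe how the team's model actually encodes friendship or movement data.

```python
def connected_case_pressure(case_rates, connectedness, target):
    """Average other counties' case rates, weighted by their social
    connectedness to the target county (hypothetical weights)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for county, rate in case_rates.items():
        if county == target:
            continue
        # Counties with no recorded tie to the target contribute nothing.
        weight = connectedness.get((target, county), 0.0)
        weighted_sum += weight * rate
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else 0.0


# Illustrative cases per 100,000 residents; not real data.
case_rates = {"Boulder": 10.0, "Denver": 40.0, "Weld": 20.0}
# Illustrative friendship strengths between county pairs; not real data.
connectedness = {("Boulder", "Denver"): 3.0, ("Boulder", "Weld"): 1.0}

pressure = connected_case_pressure(case_rates, connectedness, "Boulder")
print(pressure)  # 35.0
```

Because the weights come from friendship ties rather than a map, a distant but closely connected county can count for more than an adjacent one.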

In a previous paper in Nature Communications, the team found that social media data was a better tool for predicting viral spread than simply monitoring people’s movement via their cell phones. With 2 billion Facebook users worldwide, there is abundant data to draw from, even in remote regions of the world where cell phone data is not available.

Notably, the data is privacy-protected, stressed Karimzadeh.

“We are not individually tracking anyone.”

The promise of AI

The model itself is also novel, in that it builds on established machine-learning techniques to improve itself in real time, capturing shifting trends in the numbers that reflect things like new lockdowns, waning immunity or masking policies.

Over a four-week forecast horizon, the model was on average 50 cases per county more accurate than the ensemble forecast from the COVID-19 Forecast Hub.
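A comparison like this is commonly expressed as a gap in mean absolute error, the average distance between forecast and actual case counts across counties. The article does not name the metric the researchers used, so the sketch below assumes mean absolute error and uses invented numbers chosen only to show the arithmetic.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecast and observed case counts."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("forecasts and actuals must be non-empty and equal in length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)


# Invented case counts for three hypothetical counties; not real results.
actual_cases = [100, 200, 300]
model_forecast = [110, 190, 310]
ensemble_forecast = [160, 140, 360]

model_error = mean_absolute_error(model_forecast, actual_cases)
ensemble_error = mean_absolute_error(ensemble_forecast, actual_cases)
print(ensemble_error - model_error)  # 50.0
```

In this toy example the model misses by 10 cases per county on average and the ensemble by 60, a gap of 50 cases per county.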

“The model learns from past circumstances to forecast the future and it is constantly improving itself,” Karimzadeh said.

Thoai Ngo, vice president of social and behavioral science research for the nonprofit Population Council, which helped fund the research, said accurate forecasting is critical to engender public trust, ensure that communities have enough tests and hospital beds for surges, and enable policymakers to implement things like mask mandates before it’s too late.

“The world has been playing catch-up with COVID-19. We are always 10 steps behind,” Ngo said.

Ngo said that traditional models undoubtedly have their strengths, but, in the future, he’d like to see them combined with newer AI methods to reap the unique benefits of both.

He and Karimzadeh are now applying their novel forecast techniques to predicting hospitalization rates, which they say will be more useful to watch as the virus becomes endemic.

“AI has revolutionized everything, from the way we interact with our phones to the development of autonomous vehicles, but we really have not taken advantage of it all that much when it comes to disease forecasting,” said Karimzadeh. “There is a lot of untapped potential there.”

Other contributors to this research include Benjamin Lucas, postdoctoral research associate in the Department of Geography; Behzad Vahedi, PhD student in the Department of Geography; and Hamidreza Zoraghein, research associate with the Population Council.