Combination of Social Media and EHR Data Enables Real-Time Flu Tracking
Combined data from social media and internet sources as well as electronic health record information can accurately track and predict flu outbreaks in real time, according to a study published in PLOS Computational Biology.
For the study, researchers at Boston Children’s Hospital used ensemble modeling, which combines different sources of information with predictive analytics, to predict influenza activity in the U.S.
According to the study authors, flu outbreaks cause up to 500,000 deaths a year worldwide and an estimated 3,000 to 50,000 deaths a year in the U.S. The authors state that the severity of flu outbreaks frequently cannot be assessed in a timely manner, so systems capable of estimating influenza incidence are critical to allow health officials to properly prepare for and respond to outbreaks.
The researchers leveraged data from real-time hospital EHRs provided by athenahealth, Google searches, Twitter posts, Google Flu Trends and Flu Near You, a participatory surveillance system, to predict flu symptoms for particular populations. They then evaluated the predictive ability of their ensemble approach during the 2013-2014 and 2014-2015 flu seasons.
“Our results show that our real-time ensemble predictions outperform every real-time flu predictor constructed independently with each data source. This fact suggests that combining information from multiple independent flu predictors is advantageous over simply choosing the best performing predictor. This is the case not only for real-time predictions but also for the one, two and three week forecasts presented,” the study authors wrote.
The combined model correlated almost perfectly with the CDC’s reports of annual flu activity for real-time estimates, and reached a 90 percent correlation for a two-week horizon. The study’s methodology also produced predictions one week ahead of Google Flu Trends’ real-time estimates with comparable accuracy.
“Our findings suggest that the information from multiple data sources such as Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system, complement one another and produce the most accurate and robust set of flu predictions when combined optimally,” the study authors concluded.
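To give a rough sense of what combining independent predictors can look like in practice, here is a minimal Python sketch. It is not the study's method: it swaps in a simple inverse-error weighting, where each source's estimate is weighted by how accurate that source was in past weeks. The function names, source labels and all numbers are invented purely for illustration.

```python
from statistics import fmean


def inverse_mse_weights(past_predictions, past_truth):
    """Weight each source by the inverse of its mean squared error on past weeks.

    past_predictions maps a source name to its list of past weekly estimates;
    past_truth is the list of officially reported values for the same weeks.
    """
    raw = {}
    for source, preds in past_predictions.items():
        mse = fmean([(p - t) ** 2 for p, t in zip(preds, past_truth)])
        raw[source] = 1.0 / (mse + 1e-9)  # small constant avoids division by zero
    total = sum(raw.values())
    # Normalize so the weights sum to 1.
    return {source: w / total for source, w in raw.items()}


def ensemble_estimate(current_predictions, weights):
    """Combine this week's per-source estimates into one weighted average."""
    return sum(weights[s] * current_predictions[s] for s in weights)


# Invented illustrative numbers (percent of visits for flu-like illness).
past_truth = [2.0, 2.5, 3.0, 3.5]
past_predictions = {
    "search_queries": [2.2, 2.4, 3.3, 3.4],
    "ehr_visits": [2.1, 2.5, 3.1, 3.4],
    "twitter_posts": [1.5, 3.0, 2.5, 4.0],
}
this_week = {"search_queries": 4.1, "ehr_visits": 3.9, "twitter_posts": 4.6}

weights = inverse_mse_weights(past_predictions, past_truth)
estimate = ensemble_estimate(this_week, weights)
print(weights)
print(round(estimate, 2))
```

In this toy setup the source with the smallest past error receives the largest weight, so a noisy source still contributes but cannot dominate the combined estimate. The researchers' actual ensemble may weight and combine its inputs quite differently.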