Leaders at the Biocomplexity Institute of Virginia Polytechnic Institute and State University (Virginia Tech), in Blacksburg, Virginia, have been busy working on an exciting initiative: they have been developing an analytic modeling platform and simulation environment that make it possible to prepare for and prevent the spread of common infectious diseases such as flu, as well as rare diseases like Ebola and Zika. Among the questions being explored: how to address the challenges of tracking constantly changing data to identify patterns and stop the spread of infectious diseases; how to understand the spread of diseases based on where patients live and work, and on their behavioral patterns; and how to determine the advisability of mass inoculations in cases of disease outbreak. One technology partner in this work has been Persistent Systems, whose U.S. headquarters are in Santa Clara, California (with global headquarters in Pune, India).
Recently, Healthcare Informatics Editor-in-Chief Mark Hagland spoke with Chris Barrett, a Virginia Tech professor and the executive director of the university’s Biocomplexity Institute, about his and his colleagues’ initiative in this area. Below are excerpts from that interview with Professor Barrett.
Professor Barrett, could you walk me through some of your work on this initiative, and explain to me what is at the core of what you and your colleagues are doing?
We’re developing a system of tools that allows very, very granular access to entire populations, in the analysis of infectious disease, as well as other phenomena, such as environmental impacts around air quality, ozone, etc. In terms of infectious disease, the big-data world has allowed us a lot of individual access to psychosocial data—who you are, what you do, your demographics, etc. So we can build synthetic databases that provide for realistic patterns of movement and interactions in time and space. Say we’re talking about an aerosol-borne disease. We know where people are, and who’s close to whom for how long, for a variety of reasons, so we can analyze potential exposures in a very detailed way. So we’ve developed an infectious disease diffusion model, but built from the bottom up, with detailed people and places, so that we can understand what aspects of behavior, what components of activities in a population, can lead to spreading infection; and that can guide decision-making to stop or mitigate it.
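To make that concrete, here is a minimal sketch, in Python, of the bottom-up diffusion idea Barrett describes: synthetic people who co-occupy activity locations, with a per-contact chance of transmission. The population, locations, and parameters below are invented for illustration, not the Institute's actual model.

```python
# A toy bottom-up diffusion model over a synthetic population.
# All names, sizes, and rates here are assumptions for illustration.
import random

random.seed(42)

PEOPLE = range(1000)     # hypothetical synthetic population
LOCATIONS = range(120)   # hypothetical activity locations (home, work, school...)
# Each person visits a few locations per day.
visits = {p: random.sample(LOCATIONS, k=3) for p in PEOPLE}

P_TRANSMIT = 0.02        # assumed per-contact, per-day infection probability
INFECTIOUS_DAYS = 5      # assumed infectious period

state = {p: "S" for p in PEOPLE}   # S=susceptible, I=infectious, R=recovered
days_left = {}
for seed in random.sample(PEOPLE, 5):  # seed a few index cases
    state[seed], days_left[seed] = "I", INFECTIOUS_DAYS

for day in range(200):  # a ~200-day epidemic run, as in the interview
    # Group people by the locations they co-occupy today.
    occupants = {}
    for p, locs in visits.items():
        for loc in locs:
            occupants.setdefault(loc, []).append(p)
    # Susceptibles may be infected by co-located infectious people.
    newly_infected = set()
    for group in occupants.values():
        n_inf = sum(1 for p in group if state[p] == "I")
        for p in group:
            if state[p] == "S" and any(
                random.random() < P_TRANSMIT for _ in range(n_inf)
            ):
                newly_infected.add(p)
    # Advance disease progression, then apply today's new infections.
    for p in list(days_left):
        days_left[p] -= 1
        if days_left[p] == 0:
            state[p] = "R"
            del days_left[p]
    for p in newly_infected:
        state[p], days_left[p] = "I", INFECTIOUS_DAYS

print(sum(1 for s in state.values() if s != "S"), "people ever infected")
```

A real system, of course, replaces the random visit table with synthetic-population data about where people actually live, work, and congregate, which is what makes analysis of behavior-specific interventions possible.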
And you are drilling down on the levels of data you’re able to access, correct?
Yes, we have systems involving “avatars” for all 7.5 billion individuals on the planet, and a billion and a half activity locations. The United States can be pretty detailed, in terms of the data we can access; China is less detailed, but even in such places, it is more detailed than you might suppose. So, to use the example of the United States, it’s a very, very large country. Most infectious disease epidemics will run for a couple of hundred days, and we can run a simulation of the activities of individuals, moving around, spreading the disease (let’s say it’s an aerosol-borne disease), figuring out who’s going where, based on activity patterns. We can run a 200-day epidemic projection on a population the size of the entire United States in five to nine seconds. These things run on very, very fancy, high-end computers, supercomputers, and yet the analytics are delivered by web services. So we can provide you web access via your cell phone or laptop or tablet; and we’re designing interfaces so that you can ask a question, and the program will answer it. So if you’re Johnny’s mom and dad, you can ask, how many kids are likely to have flu today? And what’s the likelihood that he’ll get sick if he goes to school? Those kinds of apps are available to everybody. And by having people ask those questions, we can adjust the underlying data representation, using them like human sensors, and it becomes more of a data library than a traditional human calculation.
So the technology allows for predictive and explanatory analytics that can be delivered to researchers, public health officials, and regular people on the street, and can guide their behavior and decision-making, whether we’re talking about a public health official, a pharma manufacturer, or little Johnny’s mom and dad. And we can use this technology for a variety of purposes, including infectious disease work.
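The “Johnny’s mom and dad” example suggests the delivery pattern: heavy simulation runs on supercomputers, and a lightweight web service answers plain questions from the precomputed projections. A hypothetical sketch, with invented data structures and numbers:

```python
# Illustrative only: a tiny query layer over precomputed epidemic projections.
from dataclasses import dataclass

@dataclass
class DayProjection:
    day: int
    infected_by_age: dict  # age band -> projected number infected

# Pretend this came out of a 200-day supercomputer run like the one above.
projection = {
    d: DayProjection(d, {"5-17": 120 + 3 * d, "18-64": 300 + 5 * d})
    for d in range(200)
}

def kids_with_flu(day: int) -> int:
    """Answer 'how many kids are likely to have flu today?' for a given day."""
    return projection[day].infected_by_age["5-17"]

print(kids_with_flu(10))  # projected school-age infections on day 10
```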
What’s novel about this approach?
What’s different in this from traditional modeling is that we’re using a great deal of personal and social information, as well as enterprise information from businesses. It’s never really been possible to use data at this level of granularity before. We can use synthetic populations and models to give us the capacity to generate the phenomena at the population level. In the past, we’ve had aggregate-level data. But this way, we can calculate new dynamics to generate new models.
We were supporting the DoD [Department of Defense] in its deployment of intervention resources to Liberia during the Ebola epidemic. We also did a lot of work with DoD interagency partners, and on other interventions. The thing is that there was never a pharmaceutical or medical intervention in that epidemic; it was fundamentally a social intervention. And we were able to understand what people were doing that was most dangerous, around the spread of the disease. There were funerary practices involved, and hygiene practices at home and in medical environments; and when you’re placing tent clinics, you want to place them in a way such that you’re not causing a huge influx of people into one area, causing more infection; and yet they have to be placed in spots that don’t require people to strongly change their routes. So we had to do analysis, and develop human sensing tools around human movement. The roads were often not where the maps said they were. So you have to understand the background movements of these people in the first place. And you can’t put all the emergency treatment in place at the same time, as it will alter movement and behavior patterns. So you have to repeatedly optimize the analysis, in order to manage movements.
And all of these things together were instrumental in breaking down the transmission structures—between hygiene, funerary practices, patterns around food, around water, and around the placement of emergency interventions; that helped break down patterns of transmission and of social networks. That really required individualized behavioral models at the level of individual people, and pretty detailed habitat representations as well. And that’s the sort of thing that this kind of technology supports. Because it’s web-delivered, and because you can interact with it and access it anywhere in the world, and because we can develop it and focus it, this technology allowed us to intervene.
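The tent-clinic tradeoff Barrett mentions (convenient sites draw dangerous crowds; remote sites force people to change routes) can be pictured as a simple scoring problem. The sites, weights, and thresholds below are invented; the actual placement analysis is far more detailed and is re-run as conditions change.

```python
# A hedged sketch of clinic-site scoring: penalize both long detours
# from existing movement patterns and over-crowded sites.
# Candidate sites: (name, mean detour in km, expected crowd size) -- all invented.
candidates = [
    ("site_a", 0.5, 900),   # convenient, but would draw a large crowd
    ("site_b", 1.2, 300),
    ("site_c", 2.5, 150),   # remote, forces people off their routes
]

CROWD_CAP = 500      # assumed threshold above which mixing risk spikes
DETOUR_WEIGHT = 100  # assumed cost per km of route change

def score(site):
    """Lower is better: weigh route disruption against crowding risk."""
    _, detour_km, crowd = site
    crowd_penalty = max(0, crowd - CROWD_CAP) * 10  # steep crowding penalty
    return detour_km * DETOUR_WEIGHT + crowd_penalty

best = min(candidates, key=score)
print("place clinic at:", best[0])  # site_b balances access and crowding
```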
Broadly speaking, what has been the timeframe around this initiative?
Our work ran through the entire Ebola exercise, during 2015; and well into 2016, after it was over, we were helping them analyze things like reservoirs, so that they didn’t have a rebound of it. So the initial work evolved over a period of about 18 months.
Have you done any projects yet in the United States?
Yes, we’ve been involved in research around every major outbreak—H1N1, H5N1, Legionnaires’ disease, pertussis outbreaks, dozens over the past five years. There were smallpox scares and anthrax scares. Many cases.
When was this technology ready to use?
This very large-scale technology has been under development since 1989, in many labs. It started at Los Alamos National Laboratory, for a variety of reasons related to national security. And then it was used with artificially intelligent machines in the context of combat aircraft, to make them more effective. And that led to a project that the Department of Transportation funded. In all of these situations, such as using state estimation and decision support for war-fighting systems, you’re confronted with trying to gather data on non-cooperative entities. So you end up creating an embedded simulation that you manage, to maintain a reasonable facsimile of the situational assessment, that drives the rules structure you need. So, you need to refuel at a certain point, or focus your combat in a certain way. But it’s modeling the situation, and scaling that up to large numbers of items, in the 1980s and 1990s, was very challenging. So we were extremely happy with ourselves when we had scaled to, say, 10,000 items. You can easily exceed that number of items in a war-fighting environment, much less a global pandemic.
Meanwhile, under the Clean Air Act, with the EPA as the regulatory agency, the Department of Transportation implemented a piece of legislation called the Intermodal Surface Transportation Efficiency Act. They called it “ice tea,” because nobody could manage to say it. When they wrote this act, they thought somebody knew how to do what it required, but nobody did. The Clean Air Act said that if you were going to change a transportation mode, for example, by building more roads, you had to demonstrate that it wouldn’t increase air pollution. And they would define air pollution by ozone particulates, oxygen-to-nitrogen ratio, etc. And you had to demonstrate, before building, that you wouldn’t increase air pollution. And the Intermodal Surface Transportation Efficiency Act worked such that the Federal Highway Administration administered this around roadways. But there was no technical capacity to measure the potential impact.
The details around elements like vehicular traffic are dominated by factors like what type of automobile you have, its year and model, acceleration and deceleration, and many other elements. And all of a sudden, you look at one flow of traffic, and it’s incredibly complex. And it turns out that when you put a road in, it changes people’s movement and use patterns. So we developed what’s called activity-based modeling. This represented every single driver, car, passenger, railroad train, etc., in terms of their activities during the day—we worked on it between 1990 and 2002. And that was the first time that anybody had built an entire regional-scale, bottom-up model. And it was during the time period of this development that we had the sarin attacks in Tokyo, and so on. And so we ended up developing health-effects modeling as well as transportation-effects modeling. And more data became available. And then social media erupted, so there were all new things coming into this. There was a smallpox scare, and the anthrax scare. And there was a thing called MIDAS, an infectious disease modeling program under the NIGMS; we were one of the organizations involved in that state-of-the-art infectious disease modeling. We were part of it for 11 years, working with the DoD.
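A toy sketch of the activity-based idea, with invented schedules and routes: each traveler is represented by a daily sequence of activities, and link-level traffic emerges from routing the trips between them.

```python
# Activity-based modeling in miniature: traffic flows derived from
# individual daily schedules. Schedules, routes, and names are invented.
from collections import Counter

# Hypothetical daily schedules: (hour, location) pairs per person.
schedules = {
    "p1": [(7, "home"), (8, "work"), (17, "shop"), (18, "home")],
    "p2": [(7, "home"), (8, "school"), (15, "home")],
}

# Hypothetical routes between locations, as sequences of road links.
routes = {
    ("home", "work"): ["A", "B"],
    ("work", "shop"): ["B", "C"],
    ("shop", "home"): ["C", "A"],
    ("home", "school"): ["A", "D"],
    ("school", "home"): ["D", "A"],
}

link_load = Counter()
for person, sched in schedules.items():
    # Each consecutive pair of activities is a trip; route it over the links.
    for (_, origin), (_, dest) in zip(sched, sched[1:]):
        for link in routes[(origin, dest)]:
            link_load[link] += 1

print(link_load)  # per-link flows emerge bottom-up from the schedules
```

Put a new road in (change the routes table) and the loads shift, which is exactly why the impact question in the legislation was so hard to answer with aggregate models.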
So there was a great deal going on all at once, in a relatively short period of time, correct?
Yes. And, from our point of view, what was important was the scalability of the information systems that were being developed. We’ve also looked at the immune system in the human body, and at immuno-informatics at the cell level. And now we’re getting into medical informatics, creating personal health histories, and mapping potential interventions.
As you’re developing this, public health departments around the country could make use of this, correct?
Yes, and that’s why we’re making it web-available, for easy access and use.
It seems that there might be real applications of some of this research in gathering data on the social determinants of health, in the context of population health management in the U.S. healthcare system, correct?
Yes. There are very few people who can map social behaviors. And you can get a lot of insight into social patterns using this kind of data, for population health.
So this could be powerful, combining information on the social determinants of health with clinical and care information?
Yes, that’s right.
Who might your customers be for that?
It will be healthcare delivery systems, and the components of those delivery systems.
What would you say about the future capabilities of this to really manage the health of populations?
First, patients will be a good deal more empowered, because some of these applications will be in the patient or consumer realm. Second, physicians and other clinicians will be able to connect to vast ranges of information. And because this is web-delivered, and because the data is inherently anonymized, that is helpful. And the data keeps getting better and better. So this will provide a kind of Google-style relationship for end-users. We’re not going to be making clinical decisions, of course; we’ll just be providing clinicians with insights based on the huge amounts of information that could be made available to them. A lot of it will be shaped by the query itself, and a lot will be shaped by the focus of the efforts involved. And it will help the entire organization, not just individual clinicians. Any number of things could be developed.
How are you interacting with the people at Persistent Systems?
We’re not directly using Persistent Systems’ solutions; they don’t have a platform that we just design this stuff into. We’re a research lab; we develop technologies. We are scientists, not clinicians. So what we do is this: our research involves a lot of system-level, high-performance software development. And as we get to the place where the research-software level of development is not enough to guarantee productization for the medical space, we hand things to Persistent, and Persistent develops them further. They’re like an extension of our software systems. The capability they bring is in transitioning software from what you might call bench science to the first stage of productization. You take systems to where they’re hardened, they work, and they’re more reliable. But one of the things that’s really good about working with them is that they know, and keep up with, changes in mobile platforms. So if we want to make absolutely certain that these interfaces are compatible with a wide range of mobile platforms, on PCs and Apple devices, etc., Persistent Systems has an incredible toolkit and knowledge base. And those kinds of things are incredibly helpful.
So they’re a technology partner, then.
Yes, they’re part of our team, in that way. We just have a certain number of man-years on our team targeted towards the tasks and expertise involved. And they’re critical to making these things real. Bringing this kind of technology towards product-readiness is a huge undertaking, and requires a serious level of knowledge. They know a lot, and have a lot of people who are highly skilled in very specialized areas. It’s a remarkable relationship.
How do you see your work evolving forward in the next couple of years?
There’s no end to the work that’s possible. We’re working on lots of other things, a lot of translational work for many of our sponsors. And one thing that distinguishes our work is that we have a theory group working on the mathematics of things, focused on really improving these kinds of systems. So I see that continuing; I see the scope of what we work on expanding; I see a continuation of the commercialization of a lot of things we’ve worked on. But the basic research—we do things at the molecular level, in ecologies, in many areas. And I personally think that infectious disease, because of the scale of the global population and the magnitude of disease outbreaks—I really personally think that pandemic disease will continue to be a very, very large problem going forward. There’s a lot of stuff going on here. And this work has involved a couple of hundred people; right now, the team involves about 80 people. It’s a very large group of people working on this.