At the University of California Irvine, a Big-Data Revolution

Dec. 16, 2013
At the 350-bed University of California Irvine Medical Center, Charles Boicey is helping to lead a broad data strategy that applies big data analytics to clinical operations and care delivery, leveraging open-source Hadoop technology to make clinical data fully searchable and available.

Change is in the air at the 350-bed University of California Irvine Medical Center (UCI Medical Center) in Irvine, California. There, Charles Boicey, information solutions architect at UCI Medical Center and the University of California Irvine School of Medicine, is helping to lead a broad data strategy that applies big data analytics to clinical operations and care delivery and, in the process, leverages open-source Hadoop technology to make clinical data completely searchable and available.

The data initiative focuses on reducing avoidable readmissions, speeding new research projects, and tracking patient vital statistics in real time. Among other elements of the initiative, Boicey and his colleagues are using Hadoop technology (specifically the Hortonworks Data Platform) to access more than 20 years of legacy data covering 1.2 million patients and more than 9 million records. In that context, one of the major sub-initiatives has been a project to predict the likelihood of hospital readmission within 30 days of discharge for patients with congestive heart failure (CHF). Working with a medical device integration partner, the hospital has developed a program that sends CHF patients home with a scale to weigh themselves daily; the scale automatically and wirelessly transmits that weight data to Hadoop, where an algorithm determines which weight changes indicate risk of readmission and notifies clinicians about those cases.
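The article does not disclose the algorithm itself, but a minimal sketch of such a weight-trend rule in Python might look like the following. The thresholds (roughly 2 lb gained in a day or 5 lb in a week) reflect common CHF self-monitoring guidance and are used here purely as illustrative assumptions; `weight_alert` is a hypothetical helper, not UCI's production logic.

```python
# Illustrative sketch only: thresholds are common CHF guidance,
# not the hospital's actual readmission-risk algorithm.
from datetime import date

def weight_alert(readings: list[tuple[date, float]]) -> bool:
    """Return True if daily scale readings suggest fluid retention.

    readings: (date, weight_lb) pairs, oldest first, one per day.
    """
    if len(readings) < 2:
        return False
    weights = [w for _, w in readings]
    day_over_day = weights[-1] - weights[-2]        # change since yesterday
    week_window = weights[-7:]                      # up to the last week
    over_week = week_window[-1] - week_window[0]    # change across the window
    return day_over_day >= 2.0 or over_week >= 5.0

# Example: a patient trending upward over a week triggers the alert.
history = [(date(2013, 12, d), w) for d, w in
           [(1, 180.0), (2, 180.5), (3, 181.0), (4, 182.5),
            (5, 183.0), (6, 184.0), (7, 185.5)]]
if weight_alert(history):
    print("Notify care team: possible fluid retention")
```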

Boicey, who has been in his position for four years, spoke recently with HCI Editor-in-Chief Mark Hagland regarding the work that he and his colleagues have been doing at UCI Medical Center, and his perspectives on the current initiative. Below are excerpts from that interview.

Tell me about your organization’s big data strategy.

Let me articulate it in the context of the CCD [continuity of care document]. Back in 2010, I had a hypothesis that I could store CCD documents by the hundreds of thousands and make them available for clinicians to do simple queries against. I was looking at NoSQL technologies, and decided to store the documents in MongoDB, a NoSQL database solution. What we were able to do was to ingest these CCD documents in their native form. Usually, you break things up inside a database; instead, we stored the documents whole and built a query capability on top, so the physician could type in, for example, “my patients who haven’t had an A1c within six months.” The clinician can’t go into the EMR and scan it for analytics; this created that capability. And then on the research side, this allows a researcher to ask for, say, “45- to 50-year-old males with prostatectomy,” with identifiers removed. We were able to do that successfully in 2011, and we presented it at the Healthcare Data Warehousing Association meeting in the summer of 2011.
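As a rough illustration of that pattern, here is a minimal Python sketch assuming a local MongoDB instance: each CCD is stored whole, with a little extracted metadata alongside it so that queries like the A1c example become possible. The collection and field names (ccd_documents, patient_id, last_a1c) are hypothetical, not UCI's schema.

```python
# A minimal sketch: store CCD XML documents intact in MongoDB, plus a
# few queryable fields. Names and connection details are assumptions.
from datetime import datetime, timedelta
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
ccds = client["clinical"]["ccd_documents"]

def ingest_ccd(patient_id: str, xml_text: str, last_a1c):
    """Store the CCD in its native form, with extracted metadata."""
    ccds.insert_one({
        "patient_id": patient_id,
        "raw_ccd": xml_text,   # document kept whole, not shredded
        "last_a1c": last_a1c,  # datetime of most recent A1c, or None
    })

# "My patients who haven't had an A1c within six months":
cutoff = datetime.now() - timedelta(days=182)
overdue = ccds.find({"$or": [{"last_a1c": None},
                             {"last_a1c": {"$lt": cutoff}}]})
for doc in overdue:
    print(doc["patient_id"])
```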


Then I looked at other environments built on NoSQL technologies, including Twitter. The Twitter environment is a lot like a laboratory information system. And your LinkedIn profile has sections and subsections not unlike a radiology or pathology report. Then I looked at Facebook, which shares the same underlying architecture: your postings are stored temporally, by month or by year, so you can retrieve them temporally. And I found out that Apache Hadoop is the underlying technology for all of this. So I went to Yahoo, where the Apache Hadoop architecture originated back in 2006. In reaching out to them, I wanted to understand scalability, and I learned that Yahoo has over 60,000 servers, with over 160 petabytes of data; that was six to seven months ago.

I started all this work in January 2012. Yahoo created this architecture and put it out in open source. Some have commercialized it, but you can go to Apache and get it in its complete open-source form, so I was pretty happy with that. Then I had to find a use case for it, to get UCI to fund it. UCI has actually been on an EMR since 1988—the TDS system.

But the old legacy EMR was in view-only form. So I knew I could print that data to text and then ingest it into this Hadoop environment, which reads and stores the data in its native form. So I ingested 9 million records covering 1.2 million patients, from a 22-year period. That is now searchable and viewable: whatever information was available in the legacy system is now viewable within the current EMR. The key to using this Hadoop architecture is that it allows for complete viewability and searchability of data, while also allowing an organization to retain its legacy information in its entirety. The reality is that the complete backloading or migration of one system to another doesn’t usually work.
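A minimal Python sketch of that load step, assuming the legacy records have already been printed to flat text files, using the HdfsCLI package's WebHDFS client to land each file in Hadoop untransformed. The namenode URL, user, and directory paths are hypothetical placeholders, and the search layer on top of the landed files would be a separate component.

```python
# Sketch: load printed-to-text legacy EMR records into HDFS as-is.
# Host, user, and paths below are illustrative assumptions.
import os
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.org:50070", user="etl")

def load_legacy_records(local_dir: str, hdfs_dir: str) -> None:
    """Land each legacy record file in HDFS in its native text form."""
    for name in os.listdir(local_dir):
        client.upload(f"{hdfs_dir}/{name}",
                      os.path.join(local_dir, name),
                      overwrite=True)

load_legacy_records("/data/tds_export", "/clinical/legacy/tds")
```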

So when did this go live and become viewable and searchable?

June 2013.

Are you the very first organization to ever do this?

Yes. I’ve been talking about this for over a year, but yes, we were the first to go live; I’m one of the first to work within this environment. There are a couple of commercial vendors working within the Apache Hadoop world: one is Explorys, a spinoff from the Cleveland Clinic; the other is Apixio.

How many times has the legacy system been viewed or searched since June?

Any patient newer than October 2009 would not be involved. We’re an Allscripts Sunrise client, so newer patients would not have anything within the TDS environment.

What has physicians’ reaction been to this innovation?

For those physicians with a research need, it’s been great. But the real benefit has been not so much on the clinical care side as on the research side, making the data available for our readmission algorithms.

We also have all of the HL7 messages coming from the source systems—laboratory, radiology, molecular pathology, transcribed results. Any system that feeds the EMR, anything that passes through the interface, is now in this environment; so basically, all the EMR source systems are involved. All of the physiological monitoring data from the hospital is there: all the monitoring information at one-minute intervals, plus ventilator and device data. Every device in the hospital that puts out a signal, we bring into this environment. All of the data generated within the EMR goes into this environment. All of the home monitoring information goes in as well, and we have several projects going on in terms of home monitoring.
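A rough Python sketch of how such a feed might be tagged for later search while keeping each message in its native form. It assumes standard pipe-delimited HL7 v2 messages; the field positions used (MSH-9 for message type, MSH-3 for sending application, PID-3 for patient ID) follow the HL7 v2 standard, but the tagging scheme itself is an illustrative assumption, not UCI's actual pipeline.

```python
# Sketch: tag raw HL7 v2 messages with routing metadata for search,
# while storing the message itself untouched.
def parse_hl7_fields(message: str) -> dict:
    """Pull a few routing fields out of a raw HL7 v2 message."""
    segments = [s.split("|") for s in message.strip().split("\r") if s]
    msh = next(s for s in segments if s[0] == "MSH")
    pid = next((s for s in segments if s[0] == "PID"), None)
    return {
        "message_type": msh[8],                # MSH-9, e.g. ORU^R01
        "sending_app": msh[2],                 # MSH-3: lab, radiology, etc.
        "patient_id": pid[3] if pid else None, # PID-3 identifier
        "raw": message,                        # stored in native form
    }

sample = ("MSH|^~\\&|LAB|UCIMC|EMR|UCIMC|201312160800||ORU^R01|123|P|2.3\r"
          "PID|1||MRN0042||DOE^JANE\r"
          "OBX|1|NM|GLU^Glucose||98|mg/dL")
print(parse_hl7_fields(sample)["message_type"])  # prints: ORU^R01
```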

We all know that this is where we’re headed. But our primary providers don’t want to be overwhelmed or inundated by results early in the morning. So the monitoring of that data happens within this environment, and then we’ll message the EMR and the clinician if something is out of whack. If a CHF patient undergoes a sudden change, the system will send a message to the physician. All social media related to UCI Medical Center goes into this environment as well. Apache Hadoop can really store everything.
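As an illustration of that kind of "out of whack" check, here is a minimal Python sketch that flags one-minute vital-sign readings falling outside a reference range and hands each alert to a notification hook; the ranges and the notify_clinician function are hypothetical placeholders rather than UCI's production rules.

```python
# Sketch: flag out-of-range vitals from a stream of one-minute readings.
# Ranges and the notification hook are illustrative assumptions.
VITAL_RANGES = {
    "heart_rate": (40, 130),  # beats/min
    "spo2":       (90, 100),  # percent
    "resp_rate":  (8, 30),    # breaths/min
}

def check_vitals(patient_id: str, reading: dict) -> list[str]:
    """Return alert messages for any vital outside its reference range."""
    alerts = []
    for vital, (low, high) in VITAL_RANGES.items():
        value = reading.get(vital)
        if value is not None and not (low <= value <= high):
            alerts.append(f"{patient_id}: {vital}={value} "
                          f"outside [{low}, {high}]")
    return alerts

def notify_clinician(alert: str) -> None:
    # Placeholder: in the design described, this would message the EMR.
    print("EMR message:", alert)

for alert in check_vitals("MRN0042", {"heart_rate": 142, "spo2": 95}):
    notify_clinician(alert)
```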

What progress has been made in readmissions as a result of this?

Our way of going about readmissions work is a little different from most. We did our research and took in information about what others are doing around readmissions. But we also had 25 years of data on patients who had actually been readmitted. So we combined older and newer information, and our readmit algorithm is very UCI-centric. It’s not in production yet; we’re validating it with the other four UC medical centers before we go live with it. We’re hoping for a go-live probably in April 2014.

Will the other UC hospitals participate as well?

Possibly.

What lessons have you learned so far in this initiative?

One of the things I’ve really learned is that it’s really important which Hadoop distribution you pick. You need a good partner, and you need to understand how this stuff is priced, so that you don’t get the organization in trouble. I used Hortonworks; that’s a spinoff from the folks who developed this at Yahoo. They haven’t made it proprietary: I use their distribution when I need it, but I’m not encumbered by a large license fee, and they’re 100-percent open-source. Secondly, the skill set around this in the U.S. is somewhat limited. There are a lot of folks who say they know how to work within the Hadoop environment, but few really do. So I’m partnering with CMC Americas, an Indian company with whom we’re co-developing software. On the U.S. side, I put it out in an open-source format, and they in turn take this technology to India and put it out there. They’re part of the Tata Group, which is a philanthropic group. The platform they’re using is called Saritor. Soon, physicians will be able to monitor patients in real time using this technology.
