Managing the Data Explosion

As healthcare becomes more and more of a data-driven endeavor, it is opening up vast frontiers that promise to improve patient outcomes and extend research into whole new areas. That’s an exciting prospect in terms of patient care; yet underlying those potential gains is a nuts-and-bolts issue that cannot be ignored: how to manage and store all of this data, which is growing at a significant rate for provider organizations across the board.

That growth is coming from a variety of sources: organic growth from patient enrollment and the transition to electronic records, as well as technology advancements, particularly in imaging, where storage requirements are driven by both the number and the density of images. Add to this the advent of “Big Data,” which is driving major capacity expansions at large medical centers, and provider organizations of all kinds will have their work cut out for them for the foreseeable future.

Fortunately, the cost of storage media is coming down, which will help organizations keep pace with their expanding requirements. The availability of the cloud is another option that is getting serious attention, while storage technology advancements are providing better tools for provider organizations to manage data in their own data centers. How are they meeting their growing data storage requirements? A variety of leading organizations offered insights into their strategies.

The Push to Accommodate ‘Big Data’

This month marks one year since the University of Pittsburgh Medical Center (UPMC) health system launched its enterprise analytics initiative, a five-year plan that it says will foster personalized medicine. Part of that plan is to build an enterprise data warehouse for the 20-plus hospital system that will bring together various types of data that so far have been difficult to integrate and analyze.

UPMC's Chris Carmody

Chris Carmody, UPMC’s vice president of enterprise infrastructure services, says the overall volume growth of data storage requirements in the healthcare sector is doubling every 18 months. UPMC currently stores five petabytes of enterprise data, a figure he expects to grow to 20 petabytes by the year 2016. That growth encompasses all types of data, from structured data in the electronic medical record (EMR) to unstructured data and imaging data, he says.
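The two figures Carmody cites can be checked against each other with a little arithmetic. The short Python sketch below assumes a constant doubling period, which is a simplification, and asks how long it would take to grow from five to 20 petabytes; the function name is illustrative only.

```python
import math

def months_to_reach(current_pb: float, target_pb: float,
                    doubling_months: float = 18.0) -> float:
    """Months needed to grow from current_pb to target_pb,
    assuming volume doubles every doubling_months (a simplification)."""
    doublings = math.log2(target_pb / current_pb)
    return doublings * doubling_months

# Figures quoted in the article: 5 PB today, 20 PB expected.
months = months_to_reach(5.0, 20.0)
print(f"{months:.0f} months ({months / 12:.1f} years)")  # 36 months (3.0 years)
```

Going from five to 20 petabytes is two doublings, so an 18-month doubling period implies roughly three years, which is consistent with the 2016 target.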

“As a technologist supporting that environment, my focus is on the end-user—the doctor, the nurse, the researcher. That’s what we are preparing for and planning, to enable them as we have these new sets of applications and insights from our enterprise analytics program in our environment,” Carmody says.

Children's Hospital of Pittsburgh, part of the 20-hospital UPMC system. Photo: UPMC

To support its data analytics initiative, UPMC is building an enterprise data warehouse that will store data from many sources, including the EMR, laboratory systems and radiology systems. “We will pull that data in, and apply algorithms and analytics programs over that data to provide insights into what is happening with a specific patient or what’s happening with an entire population,” he says. The initiative will bring together data from sources that have never before been in one place, he says. The cloud is also part of UPMC’s strategy to meet its requirements, Carmody says, noting that there is organizational support for moving to the hybrid cloud model, which today UPMC uses only minimally.

A similarly ambitious data initiative is taking place at Memorial Sloan-Kettering Cancer Center (MSKCC) in New York, where vice president and CIO Pat Skarulis says the hospital is gearing up for the arrival of genomic data. “We are going to the next generation of sequencers, and they will put out a huge amount of data, which we need to save,” she says.

DNA sequence data will be processed by Memorial Sloan-Kettering in its laboratories and saved at every step of the way, says Skarulis, who notes that the sequencers themselves are getting faster and bigger. According to David Barry, MSKCC’s manager of storage, the processed genomic data are conservatively projected to be a terabyte a week.

Patrick Carragee, Memorial Sloan-Kettering’s director of infrastructure and data center, says the organization plans to store the data on tape in its own data center.

The prospect of housing such large genomic data sets has resulted in some changing strategies. One, according to Barry, is a return to using tape for this type of data, which is more economical than disk-based systems for long-term accessibility. While the data used in the high-speed computational work on sequencing will be held on higher-speed media, the derived data that comes out of the processing will be stored on archival disk or tape, he explains.

Coping With Steady Organic Growth

Even excluding genomic data, Skarulis says data volume at Memorial Sloan-Kettering has been growing at a healthy 30 percent a year, which she attributes to just normal business, such as adding patients to its databases. As a research institution, “all of the data is extremely valuable,” she says.

One driver behind that growth is pathology imaging, in which digital images of a pathology slide can go to very great depth, taking up to two gigabytes per slide, Skarulis says. Accounting for comparable growth are dermatology images, which have progressed from the use of a single single-lens reflex camera to a bank of cameras set up in an enclosure. “Resolutions are much higher, and you are storing many more images,” Barry says. He estimates that storage requirements have quadrupled for that modality, and the images will be kept for the life of the patient.

The good news, according to Carragee, is that, with the exception of solid-state disks that would be used only for selected applications, the cost of storage media generally has been declining. That has helped storage costs, while significant, to remain fairly steady, he says. Nonetheless, he adds that this year in particular, he has seen an escalation in storage costs, driven by specialty applications such as imaging and genomic data. Skarulis agrees, saying: “We are at a tipping point, where we are growing faster than the storage is improving in capacity.”

One ongoing challenge is migrating data from old media to new media, which is time-consuming and can be disruptive to the workflow. Barry says that virtualization can help in the storage environment, by using plug-in arrays on the back end of the virtualization appliances to transfer data from one system to another, thus avoiding disruptive downtime.

Using Multiple Strategies to Keep Pace

Another major hospital system that is managing very large volumes of data is Intermountain Healthcare, a 22-hospital healthcare system based in Salt Lake City, with care sites across the state of Utah. Don Franklin, Intermountain’s assistant vice president of infrastructure and operations, estimates the data volume now at 4.7 petabytes. He notes that the volume of data will grow at about 25 to 30 percent each year for the foreseeable future, and estimates that the health system will be responsible for 15 petabytes in another five years.
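Franklin's five-year estimate can be sanity-checked with a simple compound-growth calculation. The Python sketch below assumes the growth rate stays constant from year to year, which real workloads rarely do; the function name is illustrative only.

```python
def project_volume(start_pb: float, annual_rate: float, years: int) -> float:
    """Projected volume after compounding annual_rate for the given years."""
    return start_pb * (1.0 + annual_rate) ** years

# Figures quoted in the article: 4.7 PB today, 25 to 30 percent annual growth.
low = project_volume(4.7, 0.25, 5)
high = project_volume(4.7, 0.30, 5)
print(f"Five-year projection: {low:.1f} to {high:.1f} PB")
```

Under that assumption the projection lands between roughly 14 and 17.5 petabytes, so the 15-petabyte estimate sits toward the lower end of the quoted growth range.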

Franklin is optimistic that Intermountain will be able to meet those challenges, citing declining costs of some storage disks and technology innovations. While Intermountain has explored the possibility of using the cloud, it has not moved in that direction regarding storage, he says. With that said, however, the health system has embraced other technologies to help it manage its data storage effectively.

Data volumes are growing at an estimated 25 to 30 percent a year at Intermountain Healthcare. Photo: Intermountain Healthcare

Among them, Franklin says tiering has enabled Intermountain to make data available at the appropriate speeds. Tiering is currently done manually, in terms of looking at the characteristics of the data and storing it appropriately at the beginning. The health system is exploring the use of auto-tiering, which would automatically store data on the appropriate media according to its availability needs.
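To make the idea of auto-tiering concrete, here is a minimal Python sketch of a rule-based placement policy. It is not Intermountain's system or any vendor's product: the tier names, the idle-time thresholds and the choice of "days since last access" as the deciding attribute are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    days_since_last_access: int

# Hypothetical rules: (maximum idle days, tier name), fastest tier first.
TIER_RULES = [
    (30, "solid-state"),
    (365, "disk"),
]
ARCHIVE_TIER = "tape"  # fallback for data idle longer than every rule allows

def choose_tier(data_set: DataSet) -> str:
    """Return the first tier whose idle-time limit the data set satisfies."""
    for max_idle_days, tier in TIER_RULES:
        if data_set.days_since_last_access <= max_idle_days:
            return tier
    return ARCHIVE_TIER

for ds in [DataSet("active EMR tables", 1),
           DataSet("last year's imaging studies", 200),
           DataSet("closed research archive", 2000)]:
    print(f"{ds.name}: {choose_tier(ds)}")
```

A production auto-tiering engine would typically weigh more than one attribute and move data between tiers over time, but the core decision is the same: match each data set to the cheapest media that still meets its availability needs.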

Intermountain has used storage virtualization technology for the last several years, which Franklin says allows movement of data without downtime. Because the virtualization engine abstracts the server from the specific type of storage, it also eliminates concern about specific drivers, he adds. The result has been “a lot of efficiency and uptime; and if it serves the right need at the lowest appropriate cost, then that works for you,” he says.

Intermountain Healthcare's Don Franklin

In addition, Franklin says that de-duplication technology has proved to be a valuable way to save storage space, by providing the intelligence within the storage subsystem to detect duplicate data, and storing it only once. He says de-duplication has saved Intermountain about 40 percent of its storage space.
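The principle behind de-duplication can be shown in a few lines of Python. The sketch below is a toy in-memory store, not the storage subsystem Franklin describes: it assumes whole blocks are compared by their SHA-256 digest, and the class name and sample data are hypothetical.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store that keeps each distinct block once."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}  # digest -> block contents
        self.logical_bytes = 0               # total bytes callers asked to store

    def put(self, block: bytes) -> str:
        """Store a block and return its digest; duplicates take no new space."""
        digest = hashlib.sha256(block).hexdigest()
        self.logical_bytes += len(block)
        self._blocks.setdefault(digest, block)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blocks[digest]

    @property
    def physical_bytes(self) -> int:
        return sum(len(block) for block in self._blocks.values())

    def savings(self) -> float:
        """Fraction of logical space saved by storing duplicates once."""
        if self.logical_bytes == 0:
            return 0.0
        return 1.0 - self.physical_bytes / self.logical_bytes

store = DedupStore()
for block in [b"AAAA", b"BBBB", b"CCCC", b"AAAA", b"BBBB"]:
    store.put(block)
print(f"logical={store.logical_bytes} physical={store.physical_bytes} "
      f"saved={store.savings():.0%}")
```

In this made-up example, two of five equal-sized blocks are repeats, so the store holds 12 bytes instead of 20. The sample was chosen to mirror the 40 percent figure; it says nothing about how Intermountain's actual savings arise.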

Franklin says Intermountain’s IT department monitors its data use closely, and is aware of the characteristics of its users and applications. This has allowed it to apply thin provisioning technology, which allocates a smaller amount of physical storage in the storage subsystem than what is requested by a user. “It’s a bet, and the intelligence of the subsystem protects us. We’ve been doing this for a while and have never had an issue,” he says.
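The "bet" Franklin describes can be illustrated with a small Python model of a thinly provisioned pool. This is a sketch under stated assumptions, not a description of Intermountain's subsystem: the class, its method names and the capacity figures are all hypothetical.

```python
class ThinPool:
    """Toy model: volumes are promised more space than physically exists,
    and physical capacity is consumed only when data is actually written."""

    def __init__(self, physical_gb: int) -> None:
        self.physical_gb = physical_gb
        self.provisioned_gb: dict[str, int] = {}  # volume -> size promised
        self.used_gb: dict[str, int] = {}         # volume -> size written

    def create_volume(self, name: str, requested_gb: int) -> None:
        # Creating a volume consumes no physical space up front.
        self.provisioned_gb[name] = requested_gb
        self.used_gb[name] = 0

    def write(self, name: str, gb: int) -> None:
        if self.used_gb[name] + gb > self.provisioned_gb[name]:
            raise ValueError("write exceeds the volume's provisioned size")
        if sum(self.used_gb.values()) + gb > self.physical_gb:
            raise RuntimeError("pool is out of physical capacity")
        self.used_gb[name] += gb

    def overcommit_ratio(self) -> float:
        return sum(self.provisioned_gb.values()) / self.physical_gb

pool = ThinPool(physical_gb=100)
pool.create_volume("pacs", 80)
pool.create_volume("file-share", 80)
pool.write("pacs", 30)
pool.write("file-share", 40)
print(f"overcommit={pool.overcommit_ratio():.1f}x, "
      f"used={sum(pool.used_gb.values())} of {pool.physical_gb} GB")
```

Here 160 GB has been promised against 100 GB of physical capacity. The bet pays off as long as actual writes stay under the physical limit, which is why close monitoring of usage matters.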

Streamlining Data Storage

Samaritan Medical Center, Watertown, N.Y., a 294-bed community hospital, has seen its volume of data grow significantly in the last few years. That, in turn, has prompted the hospital to streamline the ways it manages and backs up its data from a business continuity perspective.

Jeff Woods, the hospital’s technical services manager, says the volume has grown from about 12 terabytes seven years ago to roughly 120 terabytes today, and that the volume of data has spiked during the last two years.

Woods notes that Samaritan uses a virtualized environment, so its clinicians can stay mobile, moving from computer to computer, having the desktop “follow” them throughout the facility. “We use solid-state disks to support that, because solid state disks are very fast,” Woods says. “We have 2 terabytes of data on just solid-state disks for our virtual desktops,” he says.

The large growth in the volume of data from various applications has led Samaritan to consolidate its backup operations. Since 2009 the hospital had been using the Integrated Serverless Backup system (supplied by Bridgehead Software) for its Meditech EHR. The backup system had been run alongside three other backup systems for non-Meditech applications, including enterprise services, Windows file servers and picture archiving and communication system (PACS) data.

In June, the hospital consolidated its four disparate backup systems into a single Healthcare Data Management platform (also supplied by Bridgehead). According to Woods, the task of monitoring and managing backups, which took two technicians three hours a day when there were four different systems, can now be handled in 15 or 20 minutes a day.

Austin Radiological Association (ARA), a privately owned radiology group with more than 85 physicians, has a different sort of challenge. Based in Austin, Texas, it operates 15 imaging centers in central Texas, and provides professional reading services to 19 area hospitals. In addition, the group hosts medical images for outside clients, orthopedic practices and other area hospitals. It operates two data centers to store its own images as well as those of its clients.

R. Todd Thomas, the physician group’s CIO, says that its data centers currently store about a half petabyte of imaging data, but the bigger challenge is managing the storage needs of its own radiologists as well as those of the group’s outside clients, which can be difficult. For example, it has just added a client that specializes in mammogram studies, which are big in terms of storage volume, and is in talks with two other clients that are interested in signing on as PACS clients. “Bringing in clients that we don’t anticipate, going into our fiscal year, adds to the surprise growth for us,” he says.

To help it adjust to that uncertainty, it uses a clustered storage system (Isilon, supplied by EMC Corp.), which allows the group to expand storage on an as-needed basis without having to buy large monolithic arrays to handle its storage needs, Thomas says. In addition, because its IT staff is relatively small, it has decided to consolidate around a single vendor rather than trying to maintain a heterogeneous storage environment, he says.

Separately, ARA plans to upgrade its PACS (supplied by Fujifilm Medical Systems U.S.A.) to allow the group to delete images that no longer have to be stored in the system. “It’s our hope that at some point we will get to a steady state, where the amount of cases I’m ingesting is equal to the amount of cases I’m legally allowed to delete,” Thomas says. “That will help us rein in the ever-expanding storage cluster we have for medical images.”