As a major academic medical center and cancer research institution, New York-based Memorial Sloan-Kettering Cancer Center (MSKCC) must contend with steadily growing volumes of electronic data that must be stored over the long term, both for patient care and for research. On the research front, vice president and CIO Patricia C. Skarulis says MSKCC is gearing up for the arrival of massive amounts of genomic data. As a research institution, “all of the data is extremely valuable,” she says.
According to Skarulis, “We are going to the next generation of sequencers, and they will put out a huge amount of data, which we need to save.” DNA sequence data will be processed by Memorial Sloan-Kettering in its laboratories and will be saved at every step of the way, says Skarulis, who notes that the sequencers themselves are getting faster and bigger.
David Barry, MSKCC’s manager of storage, says the processed genomic data are conservatively projected to amount to a terabyte a week. Patrick Carragee, Memorial Sloan-Kettering’s director of infrastructure and data center, says the organization plans to store the data on disk and tape in its own data center.
The prospect of housing such large genomic data sets has prompted some changes in strategy. One, according to Barry, is a return to tape for this type of data, because tape is more economical than disk-based systems for long-term accessibility. While the high-speed computational work that needs to be done on sequencing will run on higher-speed media, the derived data that come from the processing will be stored on archival disk or tape, he explains.
Even excluding genomic data, Skarulis says data volume at Memorial Sloan-Kettering has been growing at a healthy 30 percent a year, which she attributes to just normal business, such as adding patients to its databases.
One driver behind that growth is pathology imaging, in which digital images of a pathology slide can go to very great depth, taking up to two gigabytes per slide, Skarulis says. Dermatology imaging accounts for comparable growth, having progressed from a single single-lens reflex camera to a bank of cameras set up in an enclosure. “Resolutions are much higher, and you are storing many more images,” Barry says. He estimates that storage requirements have quadrupled for that modality, and the images will be kept for the life of the patient.
The good news, according to Carragee, is that, with the exception of solid-state disks that would be used only for selected applications, the cost of storage media generally has been declining. That has helped storage costs, while significant, to remain fairly steady, he says. Nonetheless, he adds that this year in particular he has seen an escalation in storage costs, driven by specialty applications such as imaging and genomic data. Skarulis agrees, saying: “We are at a tipping point, where we are growing faster than the storage is improving in capacity.”
One ongoing challenge is migrating data from old media to new media, which is time-consuming and can disrupt the workflow. Barry says that virtualization can help in the storage environment: plug-in arrays on the back end of the virtualization appliances can be used to transfer data from one system to another, thus avoiding disruptive downtime.
For more coverage of data storage, stay tuned for the October issue of Healthcare Informatics.