Researchers in pediatric medicine see huge opportunities for precision medicine, but because many pediatric cancers and conditions fall into the rare disease category, individual institutions struggle to accumulate enough data to constitute a “big data” landscape.
I recently interviewed Adam Resnick, Ph.D., director of the Center for Data Driven Discovery in Biomedicine at the Children’s Hospital of Philadelphia (CHOP), about the collaborative approach it is taking to this problem.
Resnick said pediatric researchers started thinking about an approach to the challenges they face in the context of what is taking shape at the National Institutes of Health (NIH).
The NIH has been thinking about big data, particularly in the context of cancer, for many years, Resnick explained. In 2006 it began large-scale efforts to sequence adult cancers under a project called The Cancer Genome Atlas, which is designed to generate large amounts of cancer data for public consumption. This has been a successful project that provides a very large platform for the adult enterprise to explore big data on behalf of patients. Last year a Genomic Data Commons was launched at the University of Chicago. “That platform highlights the power of big data in cancer,” Resnick said.
“The challenge in the pediatric and rare disease space is that there has been no real equivalent of that,” he said. “We end up faced with a decision about how to participate in these big data enterprises, where biospecimens are prevalent and data is being generated by NIH. How can we integrate into that space and also live up to that same promise?”
No single institution in the pediatric space can collect enough specimens to really drive big data discovery. For some adult cancers, one institution may have sufficient samples, but in the pediatric realm accumulating that many would take a very long time, he said. In addition, there are cultural constraints to collaboration that center on primacy of discovery and publication, which are the traditional routes to funding, promotion and individual career success.
“Here at CHOP we began thinking about this more critically. How do we overcome these challenges?” They decided they needed to create a national network of centralized biospecimens to drive collaboration. “We decided to begin modeling with brain tumors and created the Children’s Brain Tumor Tissue Consortium, which began with four institutions and has grown to 15.”
They realized they had to build an “open science” platform in a way that is radically transparent, so no single institution has any privilege to that information. “Anybody in the consortium can log on to the platform and see how much of a biospecimen is left, what project it is being used for, what were the decisions made about it, and the data being generated returns to the platform in a way that is accessible to all,” Resnick explained.
Another opportunity is to marshal the energy of patient advocacy groups. “That is a unique resource,” he said. “We can rapidly engage a community that is highly informed and motivated.” In addition to the researchers, the patient and family groups become key stakeholders. “We can fulfill a contract with them to maximize discovery from each biospecimen,” Resnick said. “Having them as an informed community provides the necessary tailwind to share data.”
Run out of the Children’s Hospital of Philadelphia, the Children’s Brain Tumor Tissue Consortium is partnering with the Pacific Pediatric Neuro-Oncology Consortium to share their data collections on rare pediatric disease and cancer using CAVATICA, a cloud-based biomedical data analysis platform.
One reason this collaboration can happen in the pediatric space is that the field is already struggling to retain NIH funding, Resnick said. Much of the funding that supports it comes from foundations and philanthropic sources, so there is an alignment of stakeholder support for the efforts in this arena. “When you create a consortium, you can do things that are more highly structured,” he said. “In a consortium setting, you are able to implement very standardized operating procedures for the collection of biospecimens, and what format they are collected in, what phenotypic data are collected, and we are able to enable longitudinal data collection. These are things that were very challenging for the NIH to implement initially,” he added. Resnick said that because pediatrics sits outside the mainstream of big data research, it can be more nimble in driving new practices than the established research community.
Still another challenge, he said, is that at a certain scale, it is difficult for many institutions to store, process or analyze data locally. In other research settings there have been efforts to centralize data in a cloud-based environment, where people no longer have to download data but can compute in a setting like Amazon or Google.
Because there was no such solution in pediatrics, they launched CAVATICA, which they said would enable researchers to access, share and rapidly analyze data collected about diseases impacting children, including pediatric cancers, congenital disorders and rare diseases such as epilepsy and autism.
“We decided to learn from the mistakes of the past,” Resnick said. “We have tended to create disease-specific portals or resources that often silo it from other disease research and data.” Siloing that data creates challenges for discovery. “We purposely intersected it, so adult data and pediatric data can be analyzed together in a cloud-based environment. Even though institutions may treat patients of different ages separately, there is sufficient evidence that we should be cross-comparing diseases (brain tumor vs. kidney tumor) but also across ages, adult vs. pediatric tumor.”
The consortium is grappling with data standards. “The ontology defining phenotypic data is one of the biggest issues,” he said. Although genomic and molecular data are challenging, phenotypic data are much more challenging. “EHR data is collected and stored in infrastructure that supports billing, not discovery,” Resnick noted. To empower research, you have to change how you collect the data to help answer the right questions. This takes effort by stakeholders and by disease experts to determine what data should be collected, and in what format. “We had to coordinate that effort in the consortium. There are automated practices for data abstraction from EHRs. But in our experience, these still don’t meet the standards for quality control, so there is still significant activity that has to take place at the human level to assure data quality is aligned with what you want to ask.”