The National Institutes of Health (NIH) has doled out 12 awards totaling $9 million to launch its Data Commons pilot phase, which will explore the feasibility of, and best practices for, making digital objects available through collaborative platforms.
A data commons is a shared virtual space where scientists can work with the digital objects of biomedical research, such as data and analytical tools. According to NIH officials, the four-year Data Commons pilot phase will be conducted on public clouds, which are virtual spaces where service providers make resources, such as applications and storage, available over the internet. The overall goal of the NIH Data Commons pilot phase is to accelerate biomedical discoveries by making biomedical research data findable, accessible, interoperable, and reusable (FAIR) for more researchers.
The recipients of the 12 awards will form the nucleus of an NIH Data Commons pilot phase consortium in which researchers will start developing the key capabilities needed to make an NIH Data Commons a reality. These key capabilities, which were identified by NIH, collectively represent the principles, policies, processes, and architectures of a data commons for biomedical research data. Key capabilities include making data transparent and interoperable, safeguarding patient data, and getting community buy-in for data standards.
“Harvesting the wealth of information in biomedical data will advance our understanding of human health and disease,” said NIH Director Francis S. Collins, M.D., Ph.D. “However, poor data accessibility is a major barrier to translating data into understanding. The NIH Data Commons pilot phase is an important effort to remove that barrier.”
Three NIH-funded data sets will serve as test cases for the NIH Data Commons pilot phase. The test cases include data sets from the Genotype-Tissue Expression and the Trans-Omics for Precision Medicine initiatives, as well as the Alliance of Genome Resources, a consortium of model organism databases established in late 2016. These data sets were chosen based on their value to users in the biomedical research community, the diversity of the data they contain, and their coverage of both basic and clinical research. While just three data sets will be used at the outset of the project, NIH Data Commons efforts are expected to expand to include other data resources once the pilot phase has achieved its primary objectives, according to officials.