The All of Us precision medicine initiative has opened its Researcher Workbench for beta testing. Researchers can begin using the initial dataset and tools in studies and provide feedback on what can be improved.
The program launched national enrollment two years ago and continues to enroll new participants each week, working toward its goal of one million.
In a blog post, Josh Denny, M.D., M.S., the recently named CEO of All of Us, noted that the early version of the Researcher Workbench includes data shared by nearly 225,000 participants, 75 percent of whom are from communities that are historically underrepresented in research, and more than 45 percent of whom are of diverse races and ethnicities.
Researchers will find information from electronic health records (leveraging the OMOP Common Data Model); six initial surveys covering demographics, lifestyle factors, and overall health; and baseline physical measurements taken by program staff.
The platform uses a Jupyter Notebook environment to power in-depth analyses, with tools to help researchers set up collaborative workspaces and build customized cohorts. Researchers (or their team members) will need experience with R or Python programming languages to conduct analyses on the platform. All of Us does not yet support integrations with other statistical programs or software, but it is working to expand analysis tools for future iterations of the Researcher Workbench.
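To give a sense of the kind of notebook analysis this involves, the following is a minimal sketch of building a cohort and estimating how common a condition is. It is not the Workbench's own tooling: an in-memory SQLite database stands in for the program's data store, the table and column names follow the OMOP Common Data Model (person, condition_occurrence), and the concept IDs, birth years, and function names are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with two OMOP-style tables (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, 1950), (2, 1980), (3, 1965), (4, 1990)],
    )
    # Concept IDs 111 and 222 are placeholders, not real OMOP concepts.
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
        [
            (1, 111, "2015-03-01"),
            (1, 111, "2016-01-10"),
            (3, 111, "2018-07-04"),
            (2, 222, "2019-02-02"),
        ],
    )
    return conn


def cohort_ids(conn, concept_id, born_before):
    """Return sorted IDs of people with the condition who were born before a year."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.person_id
        FROM person AS p
        JOIN condition_occurrence AS c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ? AND p.year_of_birth < ?
        ORDER BY p.person_id
        """,
        (concept_id, born_before),
    ).fetchall()
    return [row[0] for row in rows]


def prevalence(conn, concept_id):
    """Share of all persons with at least one record of the condition."""
    total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    cases = conn.execute(
        "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (concept_id,),
    ).fetchone()[0]
    return cases / total


if __name__ == "__main__":
    db = build_demo_db()
    print(cohort_ids(db, concept_id=111, born_before=1970))
    print(prevalence(db, concept_id=111))
```

Counting distinct person IDs rather than rows matters here, because one participant can have many records of the same condition.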
All of Us has adopted a “data passport” model to make the data broadly accessible. After researchers register with the program, agree to its rules, and complete training on the responsible conduct of research, the program will grant them permission to explore All of Us data for a wide range of studies, rather than determining access for all studies on a project-by-project basis.
The platform will grow more robust over time with additional data and tools, including genomics, wearable device data, and linkages to other data sets, with regular releases of new data, Denny said.
Denny noted that the current version has some key limitations. “Because participants take part in the program at different paces and we are still enrolling, we don’t have variables for all participants; in particular, survey completion rates vary, and the collection and harmonization of electronic health record data remain a work in progress,” he wrote. “We have done some preliminary testing on biological plausibility of the data; other curation efforts are still underway.”
Denny explained that the program has blurred some of the data to protect participant privacy. “While we already strip out names and other identifiers from participant data at the outset, we’ve made additional adjustments in the curation process,” he wrote. “These include shifting dates and hiding or grouping the records of small clusters of participants to further reduce the risks of re-identification. These modifications may pose challenges for epidemiological studies or research on specific subcategories of people.”
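The two adjustments Denny describes, date shifting and grouping small clusters, can be illustrated with a short sketch. This is not the program's actual curation pipeline: the shift range, the minimum group size, the secret-based offset, and every function and field name below are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter
from datetime import date, timedelta

MAX_SHIFT_DAYS = 365  # assumed range; the program's real range is not stated here
MIN_GROUP_SIZE = 20   # assumed threshold for a "small cluster"


def shift_for(participant_id, secret):
    """Derive one fixed backward offset per participant from a secret."""
    digest = hashlib.sha256(f"{secret}:{participant_id}".encode()).digest()
    days = int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1
    return timedelta(days=-days)


def shift_dates(records, secret):
    """Shift every record's date by its participant's offset.

    Using the same offset for all of a participant's records keeps the
    intervals between that person's events intact.
    """
    return [
        {**r, "date": r["date"] + shift_for(r["participant_id"], secret)}
        for r in records
    ]


def group_small_clusters(records, key, min_size=MIN_GROUP_SIZE, label="Other"):
    """Replace values of `key` shared by fewer than `min_size` records."""
    counts = Counter(r[key] for r in records)
    return [
        {**r, key: r[key] if counts[r[key]] >= min_size else label}
        for r in records
    ]


if __name__ == "__main__":
    visits = [
        {"participant_id": "p1", "date": date(2019, 1, 1), "state": "AZ"},
        {"participant_id": "p1", "date": date(2019, 1, 31), "state": "AZ"},
        {"participant_id": "p2", "date": date(2019, 6, 15), "state": "WY"},
    ]
    shifted = shift_dates(visits, secret="demo-secret")
    grouped = group_small_clusters(shifted, key="state", min_size=2)
    for row in grouped:
        print(row)
```

The sketch also shows why such changes can complicate research: calendar-based analyses such as seasonality lose precision once dates move, and rare subgroups disappear into a catch-all category. A production system would additionally need to check that the catch-all group is itself large enough.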
As another privacy measure, All of Us requires researchers to analyze data within the secure, cloud-based All of Us platform. Researchers may never download individual-level program data to local computers.
Currently, researchers with NIH eRA Commons accounts may apply for access if their institutions have signed a data use agreement with the program. Any U.S.-based academic, nonprofit, or healthcare organization can enter into the data use agreement.
Denny said that bioinformatics and health services researchers will likely find the most value in the initial dataset, particularly for studies that evaluate the frequency of certain diseases or conditions. Researchers with a focus on health disparities and underrepresented populations will also find the current dataset useful, he added, given its size and diversity.
After this initial stage of the beta phase, the program will add other means of identity verification beyond eRA Commons and open the platform more broadly.