Data from the 1000 Genomes Project, the largest set of data on human genetic variation, have been made publicly available on the Amazon Web Services (AWS) cloud, according to the National Institutes of Health. The project is a consortium of 75 companies and organizations formed to establish the most detailed catalogue of human genetic variation.
"The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation’s health and economy," NIH Director Francis S. Collins, M.D., Ph.D., said in a statement.
Thus far, the project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals. The project aims to include the genomes of more than 2,600 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the public data set this year.
“This process took a long time, and that’s assuming a lab had the bandwidth to download the data and sufficient storage and compute infrastructure to hold and analyze the data once they had it,” Lisa D. Brooks, Ph.D., program director for the Genetic Variation Program at the National Human Genome Research Institute, a part of NIH, said in a statement. “We are happy that the 1000 Genomes Project data are on AWS to give researchers anywhere in the world a simple way to access the data so they can put the data to work in their research.”