Managing and analyzing one genome’s worth of data is no trivial task; when you are working towards sequencing and analyzing 10,000 genomes and making that data broadly available, a whole new set of challenges arises. BioTeam has been working on just such a project in collaboration with Autism Speaks, The Centre for Applied Genomics at The Hospital for Sick Children (SickKids) in Toronto, and Google’s Genomics group in support of Autism Speaks’ MSSNG project.
MSSNG aims to make whole-genome data derived from over 10,000 families affected by autism broadly available. The scale of this sequencing effort, combined with MSSNG’s goal of fostering open science by sharing the data widely, required us to rethink and reengineer the annotation and data access approaches. This work has culminated in today’s initial release of the MSSNG Portal. Created by BioTeam, this web application integrates with Google Genomics, Google BigQuery, and other elements of the Google Cloud Platform to provide access to DNA variants and associated annotations from over 1,700 genomes, with data from another 1,800 genomes to be released shortly.
Designing and implementing a genomics project on this scale entirely within the cloud has been fun and very interesting. Now that the project has gone live, we look forward to writing up some of the observations we made while building the variant annotation pipeline and the main MSSNG Portal web application. In the meantime, many congratulations to Autism Speaks and the MSSNG team on the initial release. Here are some links that introduce the portal and the goals behind the project.