As a result of an ongoing collaboration with Intel and Google Cloud to accelerate genomic research, the Broad Institute of MIT and Harvard has optimized its workflows for fast, cost-effective Google N1 and N2 instances. Compared with the initial deployment of workloads on Google Cloud, the optimized workflows cut the cost of data processing by 85%.
“We knew the cloud would allow a whole new level of data federation and collaboration, and we could work with others to create a cloud-based data ecosystem, where researchers could combine their workflows on more than the data they generated with other datasets into richer, more powerful computational experiments,” says Geraldine Van der Auwera, Data Sciences Platform director of outreach and communications, the Broad Institute of MIT and Harvard.
To adapt to a dramatic increase in genomics data generation and computational research demand, the Broad Institute migrated its workloads to Google Cloud N2 instances. By modularizing its pipeline workflows, right-sizing cloud instances to the needs of each workload and optimizing for Intel® Xeon® Scalable processors, the Broad Institute enables its users to run genomics workflows on Google Cloud N2 instances about 25% faster and at 34% lower cost.
Intel has had a partnership with the Broad Institute since 2017, helping optimize the institute’s pipelines and Genome Analysis Toolkit (GATK) with Intel libraries, including the Intel® Genomics Kernel Library. Together, they also manage the Intel-Broad Center for Genomic Data Engineering, a project that enables researchers and software engineers worldwide to build, optimize and widely share new tools and infrastructure that will help scientists integrate and process genomic data.
Intel worked with the Broad Institute to help optimize its pipelines on Google Cloud. For example, specific kernels in GATK are optimized for vector operations with Intel® Advanced Vector Extensions 512 (Intel® AVX-512). Some optimized storage functions use the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L).
Seeking to realize a wider life sciences ecosystem vision, the Broad Institute, Microsoft and Verily co-developed the Terra platform, a scalable and secure platform for biomedical researchers worldwide to access data, run analysis tools and collaborate. Terra is built on cloud infrastructure, allowing the Broad Institute to scale easily and give the research community new capabilities that benefit research into solutions for human disease.
Genomics has changed how biological science is performed. With the help of Intel and Google Cloud, the Broad Institute is at the forefront of innovation, helping accelerate genomics research. By migrating to the cloud and optimizing workloads for Google Cloud instances, the Broad Institute solved its storage capacity and computational capability challenges in a scalable, forward-looking way. Co-building the Terra platform further allowed the Broad Institute to empower not only its own research teams but also life scientists around the world to take advantage of these optimized tools and pipelines, and to support a federated data ecosystem that opens many exciting new possibilities for biomedical research.