By Aaron Krol
February 16, 2016 | This morning, Seven Bridges Genomics announced the opening of its Cancer Genomics Cloud, an online platform for accessing and analyzing public data from The Cancer Genome Atlas (TCGA). The release comes fifteen months after Seven Bridges was chosen by the National Cancer Institute (NCI) as one of three developers of these pilot clouds, which are intended to make the vast tumor sequencing data in TCGA easier for scientists to access, without downloading reams of raw data or writing their own scripts to search through the Atlas.
“The TCGA project has lasted over a decade,” says Andrew Gruen, Product Marketing Director at Seven Bridges. “It’s a massive investment, with 11,000 patients and 33 different tumor types and subtypes. Making it usable, and efficiently usable, has been a high priority [for the NCI].”
Seven Bridges is a creator of cloud environments for managing genomic data at an industrial scale, primarily for pharma companies and large research collaborations. Its customers include Genomics England, a state-owned company working to sequence 100,000 whole human genomes within the UK’s National Health Service. Nevertheless, the scale of TCGA, which contains over a petabyte of data, made building the Cancer Genomics Cloud a major endeavor.
Brandi Davis-Dusenbery, Senior Scientist at Seven Bridges, led the project, collaborating regularly with representatives from the NCI. The platform can be accessed at www.cancergenomicscloud.org.
The Cancer Genomics Cloud is similar to Seven Bridges’ standard commercial product, in which popular open-source informatics tools, including BWA, the GATK suite, and the Tuxedo suite, can be arranged into reproducible pipelines for custom analysis of sequencing reads. Users can also place their own tools in these pipelines and transfer them across projects.
“Someone who’s used the Seven Bridges Platform in the past will find it very familiar, but there are a number of cancer-specific tools, especially on the visualization side,” says James Sietstra, President of Seven Bridges. His company has previously unveiled specialized tools for cancer analysis that take into account known oncogenic variants, tumor-normal comparisons, and pharmacogenetics.
Seven Bridges also worked with the NCI to create a metadata schema in which genomic data from TCGA is labeled with detailed clinical information, such as the type of tumor that was sequenced, and the patient’s demographics and treatment history. With this system, users of the Cancer Genomics Cloud can search the entire TCGA database for tumors that meet highly specific criteria, something that has been difficult and time-consuming for researchers working directly with TCGA.
“We’ve allowed you to create really detailed queries visually,” says Gruen. “You can very easily define the cohort of folks you’re looking for in your experiment.”
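To give a rough sense of what defining a cohort from clinical metadata involves, here is a minimal Python sketch. It is purely illustrative and is not the Cancer Genomics Cloud's schema, API, or query interface: the `CaseRecord` fields, the `select_cohort` function, and the sample records are all hypothetical, chosen only to mirror the kinds of attributes mentioned above (tumor type, demographics, treatment history).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical clinical metadata attached to one sequenced tumor."""
    case_id: str
    tumor_type: str
    age_at_diagnosis: int
    prior_treatment: bool


def select_cohort(records, tumor_type, min_age, max_age, prior_treatment):
    """Return the records that match every criterion (inclusive age range)."""
    return [
        r for r in records
        if r.tumor_type == tumor_type
        and min_age <= r.age_at_diagnosis <= max_age
        and r.prior_treatment == prior_treatment
    ]


# Made-up records for illustration only; these are not TCGA data.
EXAMPLE_RECORDS = [
    CaseRecord("case-001", "lung adenocarcinoma", 58, False),
    CaseRecord("case-002", "breast carcinoma", 47, True),
    CaseRecord("case-003", "breast carcinoma", 71, True),
    CaseRecord("case-004", "breast carcinoma", 52, False),
]

cohort = select_cohort(
    EXAMPLE_RECORDS,
    tumor_type="breast carcinoma",
    min_age=40,
    max_age=60,
    prior_treatment=True,
)
cohort_ids = [r.case_id for r in cohort]
print(cohort_ids)
```

The point of a shared metadata schema is that a filter like this can run against consistent, labeled fields across the whole dataset, rather than each researcher reconciling clinical annotations by hand.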
The NCI had other requirements for its partners on the Cancer Genomics Cloud pilots, including that all TCGA data be co-located with the analysis platforms, to minimize bottlenecks associated with transferring data. Seven Bridges, whose commercial platform can be deployed in either the Amazon or Google clouds, worked with Amazon Web Services to meet these requirements within a virtual private cloud. Users can also upload their own data to analyze alongside TCGA data.
While the Seven Bridges Cancer Genomics Cloud is the first pilot to go live, the Broad Institute of MIT and Harvard and the Institute for Systems Biology in Seattle are actively working on their own solutions for expanding access to TCGA data. All three organizations have worked together on containers to transfer custom informatics tools across systems, so that users will be able to recreate their pipelines inside each platform.
The NCI has earmarked $1 million to pay for the use of Amazon Web Services compute time inside the Cancer Genomics Cloud, a portion of which can be claimed by any scientist with plans to conduct studies using TCGA data. Additional credits will be made available to groups who upload their own data or analysis tools, with no responsibility to make those resources public. “Any researcher with an Internet connection, including the small academic lab that potentially can’t afford to store TCGA on their local cluster, can access public data at the click of a mouse through the Cancer Genomics Cloud,” says Sietstra.
At the same time the Cancer Genomics Cloud is going live, Seven Bridges has also announced the completion of a Series A funding round totaling $45 million, led by Kryssen Capital.
For a biotechnology firm, Seven Bridges has waited a remarkably long time to bring in private funding. The company was founded by CEO Deniz Kural in 2009, and now has 200 employees across four offices in Cambridge, Mass.; San Francisco, Calif.; London; and Belgrade, Serbia. Until now, Seven Bridges has covered its operating expenses with revenue from its platform and occasional grants from government partners, such as a $6 million commitment from the NCI for the Cancer Genomics Cloud pilot.
“We took the long view with Seven Bridges,” says Sietstra. “Before taking on institutional capital, we really wanted to build the company into something long-term and stable, and that takes a while.”
Sietstra says the new funding will go toward hiring additional engineers, and ongoing development of the core Seven Bridges Platform.