The Seven Bridges CGC: Boon for Bioinformatics, Success for a Startup
By Joe Stanganelli
July 20, 2016 | In May 2014, Lu Zhang, director of R&D cancer informatics at Seven Bridges, spoke at the Biology of Genomes conference at Cold Spring Harbor. Seven Bridges was introducing its open source cancer informatics pipeline: open source algorithms running on the proprietary Seven Bridges cloud infrastructure. Less than six months later, the company was awarded one of three National Cancer Institute Cloud Pilot contracts.
Earlier this year, the company launched its Cancer Genomics Cloud (CGC). The platform is populated with The Cancer Genome Atlas (TCGA) data, to which more than 11,000 patients have contributed their own medical data related to 33 different types and subtypes of cancer.
And at the end of June—just over two years after Zhang introduced the company’s cancer informatics pipeline—MIT’s Technology Review recognized Seven Bridges in its 2016 list of the 50 smartest companies in the world. Tech Review ranked Seven Bridges #42 on its list (and included the company in the general "Computing & Communications" category; indeed, Andrew Gruen, the startup's Director of Marketing, told Bio-IT World, "We refer to ourselves as Seven Bridges, not Seven Bridges Genomics"—though there’s been no corporate name change). The magazine specifically pointed to the company's CGC efforts, noting that its "bioinformatics software platform runs one of the world’s largest genomic data sets."
If businesses, like men, are judged by the company they keep, Seven Bridges has chosen well. Seven Bridges was the only commercial entity to be selected for the NCI Cloud Pilot; it was joined by the Broad Institute and the Institute for Systems Biology. Close by on Tech Review’s Top 50 list was another Cloud Pilot hopeful. Intel was not awarded a contract, but decided to build out its own Collaborative Cancer Cloud anyway. Intel was featured on Tech Review’s list at #49.
It would appear that, with the help of its CGC, Seven Bridges has arrived.
A $45 Million Lesson
After a series of CGC-related interviews, I followed up with Seven Bridges via email to ask the company how the CGC has contributed to the company's success. The company modestly sidestepped, playing its cards close to the vest.
"[Regarding] the CGC itself, there is clear value to working for one of the preeminent research institutes in the world with one of the largest genomic datasets ever assembled: the opportunity to learn," read the email reply from Brandi Davis-Dusenbery, Scientific Program Manager at Seven Bridges and head of the company's CGC project.
"Learn[ing]," it turns out, is pretty darned lucrative. According to Davis-Dusenbery, on February 16, 2016, the Seven Bridges CGC was launched worldwide. On the same day, Seven Bridges announced a $45 million Series A fundraising round, with an undisclosed valuation.
Some context: In 2014, Seven Bridges' pre-CGC headquarters were barely in Cambridge—right along the border of far less sexy Watertown, practically next door to a Star Market grocery store (on the parking-lot side, no less), far from the action in Kendall Square (dubbed by MIT and local business leaders as "the most innovative square mile on the planet"). The company also had an office in Belgrade, Serbia, and a modest presence in London.
Meanwhile, the company's PR and marketing department was in its nascency. At the time, the startup was interviewing to fill a previously nonexistent Head of Marketing and PR position, its legal department consisted of a single law student, and its entire staff numbered fewer than 90.
Since its contract award from the NCI, lots of good things have happened for Seven Bridges—learning opportunities aside.
The company has moved its Cambridge offices at least twice since 2014 and expanded substantially; it is now located in a far ritzier, more clout-wielding office building at 1 Main Street in Kendall Square, minutes on foot from the heart of the MIT campus, the Microsoft New England Research & Development (NERD) Center, and Boston's Massachusetts General Hospital. The company has also built out its European presence, including a recently opened office in Istanbul, and now has an office in San Francisco as well (joining the geographical ranks of other hot tech startups). It has a well-built marketing team and an actual General Counsel (the former law student, now graduated and a full-fledged member of the Bar), and it has outsourced its outreach efforts to the San Francisco PR and social media firm Kickstart Consulting.
And it’s still expanding.
"We have more than 230 [employees] now!" beamed Gruen in a recent email interview. "I’ve asked for the exact number, but honestly we’re hiring so fast I don’t know off the top of my head."
The Submission Process
It all began in 2013, when the US government, overwhelmed by big data, issued a request for information (RFI) asking interested parties in the private sector what they believed the cancer bioinformatics computing field needed and, naturally, how they could meet those needs. A request for proposals (RFP) followed in January 2014.
"Before even [accepting an] initial project request or submission… the government solicited feedback from the research community through a variety of methods," explained Davis-Dusenbery in a separate interview with Bio-IT World. "[There] was a very traditional RFI procedure but then there was also… a forum where people… from all over could give feedback about the deficiencies of computing with genomic data—and, following that, the NCI issued requests for proposals through a broad agency announcement for the Cloud Pilot."
Davis-Dusenbery called special attention to the funding mechanism the government chose: a "broad agency announcement." Under this funding mechanism, Davis-Dusenbery said, the proposal process has "allow[ed] each contract winner to set their own budget, statement of work, and milestones [related to a] large number of criteria and requirements."
"The point of the process was to come up with really innovative approaches to solving a problem," Davis-Dusenbery added. "We were thrilled to be awarded the opportunity to build a system that reflected our vision for the future of genomics research. One of our strengths in developing the [CGC] is our considerable expertise and experience in the underlying methodology of how to build cloud-based systems for research that are usable, collaborative, scalable, and reproducible."
The Practical Side
This four-pronged methodology (which Davis-Dusenbery argued in a May 6 NCI blog post is essential to good cloud-computing practice in general) seems to have formed a sound basis in the real world of cancer research via cloud-based collaboration. According to Davis-Dusenbery, approximately 625 people worldwide have used the CGC to analyze cancer genomics data. She prides herself on the platform's speed and accessibility, and one example stands out in her mind: that of Jeff Chuang, an investigator at the Jackson Laboratory.
"[Chuang's] group ran a pipeline [on the CGC] comparing 3 somatic variant callers across multiple diseases," Davis-Dusenbery told me. "They ran more than 4,800 analyses (3 weeks of computation time in total) over the course of about 3 days after learning the system."
The NCI, too, appears duly impressed with the CGC's practical functionality.
"Users [of the CGC] can add their own tools using their simple SDK [software development kit], which is based on Docker and Python. They have graphical tools for data mining, a genome graph, [and support for] versioning and reproducibility; task outputs are linked in [the] workflow...and [the CGC] has an innovative genome browser," Tony Kerlavage, chief of cancer informatics at the NCI's Center for Biomedical Informatics and Information Technology, told an audience of workshop attendees at this year's Bio-IT World Conference & Expo—going on to note that Seven Bridges has deployed more than 30 public pipelines on the CGC. "Pipelines can be customized using graphical editors [on the CGC]; you can actually drag and drop these things into place and create your own pipelines."
An Outsider's Perspective
Considering these powerful functionalities built upon her four thoughtful use-case principles, I asked Davis-Dusenbery if it would be fair to say that Seven Bridges and the work and philosophies underlying the CGC represent a "more practical side" of genomics and bioinformatics compared to their institutional NCI Cloud Pilot brethren.
"I think that there [are] certainly some aspects of that," Davis-Dusenbery agreed. "We [have done] a lot of interesting, innovative things with the project so far… Because we were coming from outside of the TCGA community—unlike the Broad Institute and the Institute for Systems Biology—we were able to take a different look at the way TCGA data was described, and in doing that we built out what we called a metadata browser, [allowing] researchers to build complex queries either visually or programmatically in a way that was really never possible before."
But What If…?
Seven Bridges also keeps busy with several other projects, including a collaborative R&D agreement with the Department of Veterans Affairs to develop analytic and hybrid cloud initiatives for the VA's Million Veteran Program, participation in the development of the Common Workflow Language project to enhance the portability of analytic workflows, and other efforts supporting genomics and bioinformatics initiatives.
It leaves one to wonder, then, what would have happened if Seven Bridges had not been selected for the NCI Cloud Pilot. Would it have taken the same path as NCI-rejected Intel and developed its own collaborative cloud platform anyway? Or would the company have moved on to focus on other endeavors?
I laid my what-if thought experiment out for Gruen, asking him what the contingency plan might have been; his reply was upbeat, if evasive.
"[W]e applied just as we would go after any potential project: hoping we’d win, but continuing to do many other things," he responded.
Davis-Dusenbery, meanwhile, could only offer her own personal outlook when asked the same question about the CGC—her pet project at Seven Bridges.
"I’ve been, like, living this for the past two years," replied Davis-Dusenbery. "It’s hard to imagine an alternative reality." Moments later, she added: "I probably would have slept a lot more in the past two years."
Correction: We originally misspelled Brandi Davis-Dusenbery's surname. It's been corrected.