DNAnexus debuts cloud computing resource for next-gen sequence data at Bio-IT World Expo.
May 18, 2010 | BOSTON—DNAnexus, a Stanford start-up that aims to marry two of the hottest Bio-IT trends—next-generation sequencing data analysis and cloud computing—formally unveiled its debut offering at the 2010 Bio-IT World Expo.
According to co-founder and chief executive Andreas Sundquist, DNAnexus can be thought of as “a genome browser crossed with Google Maps,” a cloud- and web-based solution for next-gen sequence analysis. The other two co-founders are Stanford pathology professor Arend Sidow, and Sundquist’s former PhD advisor, Stanford computational biologist Serafim Batzoglou.
Sundquist and Sidow said that DNAnexus grew out of the observation that as researchers grappled with the flood of next-generation sequencing (NGS) data, computation would be an issue. “I’m reasonably computer savvy,” said Sidow, who is on the faculty in Stanford’s pathology department, “but many colleagues using next-gen sequencing in cell biology don’t have the expertise to deal with really large data sets. We saw an opportunity to make a company out of it.”
“There are a lot of companies doing bioinformatics,” added Sundquist. “But what’s new here is the next-gen sequencing and the arrival of cloud computing as a real infrastructure you can leverage. We put two and two together, and thought this was a great opportunity to help a lot of people.” DNAnexus was founded in early 2009 with funding from First Round Capital and some angel investors.
DNAnexus is setting out to address familiar problems facing researchers handling the flow, analysis and storage of NGS data. Sidow summarized the workflow problem as follows: “Sequencing machines generate raw data. Next, you need the computer hardware, you need to be software savvy, and you need a visualization tool. Then get the results. DNAnexus obviates the need for the hardware; all you need is a desktop computer. It obviates the need for algorithms and open-source software; we give you that. And it obviates the need for visualization, because there’s a browser associated with the data.”
DNAnexus provides a Web 2.0 app that lets users pan around and zoom in and out, giving them essentially instant access to their data sitting in the cloud. “We have in essence this platform that runs completely in the cloud,” says Sidow. “You upload data directly through the Web site, it gets mapped, and quality checking goes on in the background without you having to do anything.”
Because the data sit in a central server, data transfer is handled by DNAnexus. “We’re storing tens of terabytes of data in the cloud right now,” says Sundquist. “We spin up hundreds of CPUs that do all the work in parallel, and bring them down when we’re done. So it’s a very cost-effective solution. You don’t have to purchase a cluster to get this capacity.”
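The "spin up, work in parallel, bring down" pattern Sundquist describes can be illustrated with a minimal Python sketch. This is not DNAnexus code: a local thread pool stands in for cloud CPUs, the reference sequence and reads are made up, and the exact-match lookup in the hypothetical `map_read` is only a placeholder for a real short-read aligner.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy reference sequence, invented for this illustration.
REFERENCE = "ACGTACGTTAGCCGATAGGCTTAACG"


def map_read(read):
    """Return the first exact-match position of `read` in REFERENCE, or -1.

    Placeholder for a real aligner, which would tolerate mismatches and gaps.
    """
    return REFERENCE.find(read)


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(chunk):
    """Map every read in one chunk; one chunk is one unit of parallel work."""
    return [(read, map_read(read)) for read in chunk]


def map_reads_in_parallel(reads, workers=4, chunk_size=2):
    """Map reads with a temporary pool of workers.

    The workers exist only inside the `with` block: they are started for the
    job and shut down when it finishes, mirroring the elastic usage pattern.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        results = pool.map(map_chunk, chunked(reads, chunk_size))
        return [hit for chunk_hits in results for hit in chunk_hits]


if __name__ == "__main__":
    for read, position in map_reads_in_parallel(["ACGT", "TAGC", "TTTT"]):
        print(read, position)
```

In a real cloud deployment the pool would be replaced by machine instances that are started for a job and terminated afterward, so capacity is paid for only while the work runs.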
And in contrast to the UCSC genome browser, which hosts researchers’ data for a limited time (even though that is not its mission), Sidow says, “our genome browser is for your own data. It’s as if you had Google Maps.”
DNAnexus is using Amazon Web Services, which Sundquist says is the most flexible cloud platform available. “A lot of people talk about using the cloud, and even though it’s an easy resource—anyone can use the cloud instantly—it’s a non-trivial thing to set up an infrastructure that integrates not just the backend for storage and compute but also all the tools, from uploading the data to mapping to visualization of results, to sharing datasets and doing analysis on the results. The cloud is like a nice flexible framework to build interesting things, but it doesn’t do it by itself.”
“We’re going to store the data forever,” Sundquist continues. “People don’t have to worry about how they’re going to manage their data. We’re storing in Amazon’s reliable redundant storage, which means they can be confident it’ll be there forever. There’s no need to store in a backup system.”
Sundquist says DNAnexus is engineered to scale in the cloud. “Compared to the cost of a sequencing run, it’ll be quite a bit cheaper to get the analysis done.” The platform is designed for large data sets, tens or hundreds of millions of reads, belonging to any organism. The initial focus is more on functional genomics—ChIP-seq (chromatin immunoprecipitation), RNA-seq, RNA splicing—than whole-genome data.
DNAnexus uses its own mapping tools, which Sundquist claims are more accurate than tools such as the Bowtie short-read aligner. “Because we use the cloud, we don’t have to use shortcuts to get the answer quickly,” says Sundquist. “We have a very accurate aligner but still ID all the variants.”
More features will be added in the next few months: support for single-nucleotide polymorphisms, small indels, then larger indels and structural variants (SVs). The platform isn’t perfect at launch, but Sundquist says the company will add new features quickly, including more help features, just like Gmail.
This article also appeared in the May-June 2010 issue of Bio-IT World Magazine.