DNAnexus Debuts Cloud Computing Resource for Next-Gen Sequence Data at Bio-IT World Expo



By Kevin Davies

April 20, 2010

BOSTON – DNAnexus, a Stanford start-up that aims to marry two of the hottest Bio-IT trends – next-generation sequencing data analysis and cloud computing – formally unveiled its debut offering today.

According to co-founder and chief executive Andreas Sundquist, DNAnexus can be thought of as “a genome browser crossed with Google Maps,” a cloud- and web-based solution for next-gen sequence analysis.

Sundquist provided details of the DNAnexus offering in a talk on the opening day of the 2010 Bio-IT World Conference & Expo. The other two co-founders are Stanford pathology professor Arend Sidow and Sundquist’s former PhD advisor, Stanford computational biologist Serafim Batzoglou. (Fun fact: Sidow’s lab was the first to have an Applied Biosystems next-generation SOLiD sequencer.)

In an earlier briefing with Bio-IT World, Sundquist and Sidow said that DNAnexus grew out of the observation that as researchers grappled with the flood of next-generation sequencing (NGS) data, computation would be an issue. “I’m reasonably computer savvy,” said Sidow, who is on the faculty in Stanford’s pathology department, “but many colleagues using next-gen sequencing in cell biology don’t have the expertise to deal with really large data sets. We saw an opportunity to make a company out of it.”

“There are a lot of companies doing bioinformatics,” added Sundquist. “But what’s new here is the next-gen sequencing and the arrival of cloud computing as a real infrastructure you can leverage. We put two and two together, and thought this was a great opportunity to help a lot of people.” DNAnexus was founded in early 2009 with funding from First Round Capital and some angel investors.

DNAnexus is setting out to address familiar problems facing researchers handling the flow, analysis and storage of NGS data. Sidow summarized the workflow problem as follows: “Sequencing machines generate raw data. Next, you need the computer hardware, you need to be software savvy, and you need a visualization tool. Then get the results. DNAnexus obviates the need for the hardware; all you need is a desktop computer. It obviates the need for algorithms and open-source software -- we give you that. And it obviates the need for visualization, because there’s a browser associated with the data.”

DNAnexus provides a Web 2.0 app that lets users pan around and zoom in and out, with essentially instant access to their data sitting in the cloud. “We have in essence this platform that runs completely in the cloud,” says Sidow. “You upload data directly through the Web site, it gets mapped, and quality checking goes on in the background without you having to do anything.” Users can visualize their data, add tracks of other people’s data in parallel, and collaborate and share data.

Because the data sit in a central server, data transfer is handled by DNAnexus. “We’re storing tens of terabytes of data in the cloud right now,” says Sundquist. “We spin up hundreds of CPUs that do all the work in parallel, and bring them down when we’re done. So it’s a very cost-effective solution. You don’t have to purchase a cluster to get this capacity.”
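The "spin up, work in parallel, bring down" pattern Sundquist describes can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a local thread pool stands in for cloud machines, exact string matching stands in for a real read aligner, and the reference sequence, reads and function names are all hypothetical. It says nothing about how DNAnexus actually implements its service.

```python
from concurrent.futures import ThreadPoolExecutor

# HYPOTHETICAL toy reference; real data sets hold tens of millions of reads
# mapped against whole genomes.
REFERENCE = "ACGTACGTTAGCCGATAGGCTTAACG"


def map_chunk(reads):
    """Map each read by exact match; position is -1 when the read is unmapped."""
    return [(read, REFERENCE.find(read)) for read in reads]


def split(reads, chunk_size):
    """Split the read list into consecutive chunks of at most chunk_size reads."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def map_in_parallel(reads, workers=4, chunk_size=2):
    """Scatter chunks across a temporary worker pool and gather results in order."""
    chunks = split(reads, chunk_size)
    # The pool exists only for the duration of the job: leaving the `with`
    # block shuts the workers down, so no capacity sits idle afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(map_chunk, chunks)
        return [hit for chunk_hits in results for hit in chunk_hits]


if __name__ == "__main__":
    for read, position in map_in_parallel(["TAGCC", "GGCTT", "AAAAA"]):
        print(read, position)
```

Because each chunk of reads is mapped independently, adding workers shortens the job without changing the result, which is what makes renting capacity for the length of a run attractive compared with buying a cluster.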

And in contrast to the UCSC genome browser, which hosts researchers’ data for a limited time (even though it’s not their mission to do so), Sidow says, “our genome browser is for your own data. It’s as if you had Google Maps.”

Cloud Capacity

DNAnexus is using Amazon Web Services, which Sundquist says is the most flexible cloud platform available. “A lot of people talk about using the cloud, and even though it’s an easy resource -- anyone can use the cloud instantly -- it’s a non-trivial thing to set up an infrastructure that integrates not just the backend for storage and compute but also all the tools, from uploading the data to mapping to visualization of results, to sharing datasets and doing analysis on the results. The cloud is like a nice flexible framework to build interesting things, but it doesn’t do it by itself.”

“We’re going to store the data forever,” Sundquist continues. “People don’t have to worry about how they’re going to manage their data. We’re storing in Amazon’s reliable redundant storage, which means they can be confident it’ll be there forever. There’s no need to store in a backup system.”

Sundquist says DNAnexus is engineered to scale in the cloud. “Compared to the cost of a sequencing run, it’ll be quite a bit cheaper to get the analysis done.” The platform is designed for large data sets, tens or hundreds of millions of reads, belonging to any organism. The initial focus is more on functional genomics – ChIP-seq (chromatin immunoprecipitation), RNA-seq, RNA splicing – than whole-genome data.

DNAnexus uses its own mapping tools, which Sundquist claims are more accurate than tools such as the Bowtie short-read aligner. “Because we use the cloud, we don’t have to use shortcuts to get the answer quickly,” says Sundquist. “We have a very accurate aligner but still ID all the variants.”

More features will be added in the next few months: single-nucleotide polymorphisms, small indels, then larger indels and structural variants (SVs). The platform isn’t perfect at launch, but Sundquist says it will add new features quickly, including more help features, just like Gmail. The main sequence data sources will be Illumina, Life Technologies, and Pacific Biosciences. Sundquist says the platform isn’t quite so compatible with 454 or Complete Genomics.

Price is Right

DNAnexus “flipped the switch” in early April to the public, but alpha users at Stanford and elsewhere have been using the service for 2-3 months. While it might not immediately interest the largest genome centers, Sidow is optimistic that it will meet the unmet needs of the long tail of experimentalists, most of whom need an on-call bioinformatics collaborator. “The democratization of sequencing has been so immense, [we] don’t have to focus on the large centers.”

Sundquist says the pricing is designed to make the system “a no-brainer.”

“If you sequence a lane of [the Illumina] GA – typically reagent costs are $500. We wanted to price an analysis of that same lane of data much lower than the sequencing itself -- $95.” Prices for an Illumina GA lane range from $55 to $95, and for a HiSeq 2000 lane or a SOLiD quarter slide, from $20 to $30 per gigabase, depending on total volume. Life Technologies’ SOLiD data are supported transparently. “We want color space to be an intermediate part of the data,” says Sundquist.
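To put the quoted prices in proportion, the arithmetic can be worked through in a short Python sketch. The dollar figures are the ones cited above; the function names and the 10-gigabase data set are hypothetical, chosen only to illustrate the calculation.

```python
# Figures quoted in the article (US dollars).
GA_LANE_REAGENT_COST = 500          # typical reagent cost for one Illumina GA lane
GA_LANE_ANALYSIS_PRICE = (55, 95)   # per-lane analysis price range
PER_GIGABASE_PRICE = (20, 30)       # HiSeq 2000 lane or SOLiD quarter slide


def analysis_share_of_reagents(analysis_price, reagent_cost=GA_LANE_REAGENT_COST):
    """Return the analysis price as a fraction of the sequencing reagent cost."""
    return analysis_price / reagent_cost


def per_gigabase_cost_range(gigabases, price_range=PER_GIGABASE_PRICE):
    """Return the (low, high) analysis cost for a data set of the given size."""
    low, high = price_range
    return gigabases * low, gigabases * high


if __name__ == "__main__":
    for price in GA_LANE_ANALYSIS_PRICE:
        share = analysis_share_of_reagents(price)
        print(f"${price} per GA lane is {share:.0%} of the $500 reagent cost")
    # HYPOTHETICAL data-set size, used only to show the per-gigabase arithmetic.
    low, high = per_gigabase_cost_range(10)
    print(f"10 gigabases would cost between ${low} and ${high} to analyze")
```

On these figures, analysis of a GA lane costs between roughly a tenth and a fifth of the reagents used to generate it.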

Sundquist says DNAnexus will be very competitive compared to other next-gen data analysis offerings such as CLC bio and GenomeQuest. “It’s astounding how expensive [the competition] is. I know our own costs… it’s not that much.”

Many members of the DNAnexus team, which currently numbers fewer than 10, come from an engineering or computer science background. “We’re engineers at heart,” says Sundquist. “We build things to be scalable.” One of the first recruits is Daryl Thomas, a bioinformatician who was previously at Navigenics, a personal genomics company.

Editor’s Note: The 2010 Bio-IT World Conference & Expo runs from April 20-22, attracting close to 2,000 attendees.

 





