Complete Genomics Details Its First Human Genome Assembly



By Kevin Davies

February 6, 2009 | MARCO ISLAND, Florida – Four months after stunning the next-generation sequencing community with its ambitious plans to launch a cut-price genome sequencing service, Complete Genomics CEO Clifford Reid presented details of its first human genome assembly on Thursday evening at the Advances in Genome Biology and Technology conference.

Reid’s much-anticipated talk focused on the company’s first human genome assembly – a project that was disclosed last October, even though the firm’s scientific advisory board had yet to review it. In total, Reid said Complete had sequenced the genome of an anonymous Caucasian male (one of the HapMap samples) to 90-fold coverage, generating a total of 630 gigabases (Gb) of sequence, of which about 250 Gb have been mapped to the reference genome sequence. “Not an interesting individual, but one where we had some good HapMap data so we could at least measure ground truth,” Reid told Bio-IT World in a briefing prior to his talk. The proportion of mapped data would have been larger but for a power cut that temporarily shut down the firm’s sequencing facility last Christmas.

Reid said the company submitted a disc drive with the raw data to the National Center for Biotechnology Information (NCBI) earlier this week, and invited fellow researchers to perform their own genome assemblies. If they can’t download the full file (20 days on a T1 line!), Complete Genomics will FedEx a disc.

The sequence covered 92% of the genome. “The 8% we didn’t sequence is the 8% you’d expect – the long repeats, the telomeres and centromeres, the places short-read sequencing doesn’t reach into very well,” said Reid. The single-read accuracy was “about 0.34% discordant.” That is, Complete sees a base at odds with the reference genome at about 1 position in 300. “That’s pretty good for raw read accuracy,” he said. The assembled discordance is tougher to measure, said Reid, but after selecting a high-quality subset of the HapMap database, they found 170 discordances, which were confirmed by Sanger sequencing. Reid said the assembly concordance exceeds 99.99%. “It’s obviously a very high quality genome,” he said.
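For readers who want to check the arithmetic behind the quoted error rates, here is a minimal Python sketch. The two percentages come from Reid’s remarks; the helper name `one_in_n` is purely illustrative and not part of any Complete Genomics software.

```python
# Convert a percentage rate into a "1 in N" figure.
# The 0.34% and 99.99% values are quoted in the article; everything else is illustrative.

def one_in_n(percent: float) -> float:
    """Return N such that a rate of `percent` percent equals 1 in N."""
    return 100.0 / percent

raw_discordance_percent = 0.34        # single-read discordance quoted by Reid
assembled_discordance_percent = 0.01  # upper bound implied by ">99.99% concordance"

raw_one_in = one_in_n(raw_discordance_percent)
assembled_one_in = one_in_n(assembled_discordance_percent)

print(f"Raw reads: about 1 discordant base in {raw_one_in:.0f}")
print(f"Assembly: fewer than 1 discordant base in {assembled_one_in:.0f}")
```

The raw-read figure works out to roughly 1 in 294, which Reid rounds to 1 in 300.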

Co-founder and chief science officer Rade Drmanac told Bio-IT World the quality of the assembly compares favorably to other published genomes. “The 92% [coverage] is exactly what can be expected,” he said. “The [missing] 8% is long repeats and segmental duplications.” Drmanac added that his group is improving the long-fragment read technology, simplifying the sample preparation, and initiating automation. “We’re excited about the progress with library preparation, how much more efficient things are.”

The analysis revealed 3.3 million single nucleotide polymorphisms (SNPs), including 350,000 novel variants. “This one has the exact same number of SNPs as Watson’s genome – 3.3 million – unbelievable!” said Drmanac. He added, “This is either the fifth or sixth genome with published reads in the NCBI. I think in addition to demonstrating our technology, it’s a great contribution.” A manuscript is in preparation.

Boys of Summer

Reid insists that Complete remains on course to meet the initial goals outlined last October: to sequence 1,000 human genomes in 2009 and 20,000 in 2010, although he admits the schedule will be tough. “Our plan is still to release genomes at 40X coverage,” Reid said. “Our pricing remains the same. We plan to ship genomes at $5000 each.” However, Reid said the exact price schedule would not be announced until June. “We’ll figure out exactly what’s included, the volume discounts, that sort of thing,” said Reid. “We know the target price is $5000. Our materials cost of the last genome was just under $4000. The materials cost is coming down nicely as we automate and wring out the waste in the system.” The company has just started to sequence samples in collaboration with Lee Hood, and Reid also announced an agreement with the Broad Institute to sequence five genomes.

“There’s this big Gantt chart on the wall, a lot of things have to happen. [It will take] a combination of more instruments and further speed improvements. The first commercial generation of instruments we ship should be able to do 1000 genomes this year… The jeopardy would be we don’t have floor space to put them in!” To that end, Complete Genomics just leased an additional 32,000 square feet of lab and office space to house the genome center, and aims to have the center operational in August. It will include a new generation of faster sequencing instruments. Said Reid: “The current instruments we’re running are R&D boxes. They’re not as fast as the commercial systems. We’ve got two prototypes of commercial systems coming up.” The R&D instruments are running at about 70 Gb/run. “For a $5000 genome, these boxes need to get faster still,” said Reid.

“Once we have this genome center operational, we remain committed to the plan of building additional genome centers around the world,” said Reid. “We expect to put those all over the place. That’s how we’ll expand the organization, rather than trying to build one monolithic genome center in Mountain View, California – a reasonably dumb place to put a genome center given the costs down here.”

Assembly Required

Complete identified some 400,000 short indels [insertions/deletions] using its own proprietary software, but Reid admits there is room for improvement. “The assembly software does not today call large structural variations,” he acknowledged. “That’s one of our next high priority projects – to tease out of the datasets major structural rearrangements, inversions, translocations etc.” Reid calls it “a strategic commitment to write the assembly software that spans the spectrum of variance detection from SNPs to assembling a cancer genome.”

From a current throughput of 70 Gb/run, Reid said in his talk he is aiming for 200 Gb/run by this June, and 600 Gb/run by the end of the year. Sequencing costs would come down with scale; the largest line item was currently computing, though imaging would eventually take its place. Reid showed a movie of the DNA nanoballs dropping into the gridded wells like tiny roulette balls. The final capacity will be 1 billion wells. By 2010, the data center would contain 60,000 cores and 30 petabytes of disk drive. Reid said it would be half the size of the largest computing center in the world.
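To put the throughput targets in context, the following Python sketch estimates how many 40X genomes a single run could yield at each quoted rate. The throughput and coverage figures are from the article; the genome size of roughly 3.1 Gb is an assumption added here, and the calculation optimistically treats every sequenced base as usable. Since only about 250 Gb of the 630 Gb generated for the first genome was mapped, real yields would be lower. This is a back-of-envelope estimate, not Complete Genomics’ own model.

```python
# Rough estimate of genomes per run at the throughput levels quoted in the talk.
GENOME_SIZE_GB = 3.1   # assumption: approximate haploid human genome size, in gigabases
TARGET_COVERAGE = 40   # from the article: genomes to be released at 40X coverage

# Gigabases per run, as quoted by Reid
throughput_gb_per_run = {
    "current": 70,
    "June target": 200,
    "year-end target": 600,
}

def genomes_per_run(gb_per_run: float,
                    coverage: float = TARGET_COVERAGE,
                    genome_gb: float = GENOME_SIZE_GB) -> float:
    """Upper bound on genomes per run, assuming every sequenced base is usable."""
    return gb_per_run / (coverage * genome_gb)

for label, gb in throughput_gb_per_run.items():
    print(f"{label}: {gb} Gb/run -> about {genomes_per_run(gb):.2f} genomes per run")
```

Under these assumptions, a 40X genome needs about 124 Gb of sequence, so the current 70 Gb/run falls short of one genome per run, while the 200 Gb/run target would exceed it.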

In a statement, Complete Genomics advisor and Harvard Medical School professor George Church said the genome assembly was “a major achievement” that “surpassed expectations.” Having reviewed the data, Church said his team had “confirmed that it falls in line with what is expected of an individual genome. It is highly concordant with previously published work on this genome and with data from public variation repositories.”

Reid concluded his presentation by stating his goal was to offer “a turnkey solution for the scientific community.” The focus would be entirely on human genomes – not mouse or any other model organism. “Send us your samples,” he told the audience. “We’ll sequence them, we’ll assemble them, we’ll generate the variants list, and we’ll send it back to you quickly. When we’re doing your assembly, 60,000 processors are going to light up!”

“We’re a wholesaler of complete human genomes to the scientific community. We have no intention of writing NIH grants,” said Reid, adding he planned to partner with genome centers such as the Broad, research centers such as the Institute for Systems Biology, and the direct-to-consumer companies. The five-year mission was to build ten genome centers around the world that would sequence 1 million genomes in that period.

“We’re trying to make sequencing completely ubiquitous.”

 


For reprints and/or copyright permission, please contact Tim McLucas, (781) 972-1342, tmclucas@healthtech.com.