German sequence contractor deploys three rival next-gen systems.
By Kevin Davies
Feb. 1, 2008 | Many executives would envy GATC Biotech CEO Peter Pohl. His company, based in Constance, Germany, near the Swiss border, has enjoyed a 200-fold increase in productivity in the past 12 months, courtesy of the boom in next-generation sequencing systems.
Pohl co-founded the high-throughput sequencing firm in 1990 with his two brothers and his father, a former professor of molecular biology at the University of Constance, who originally conceived the idea. The Pohls launched the company with a mere $12,000-13,000 in starting capital. “It’s possible!” says Pohl.
During the past 15 months, Pohl says GATC has installed “all three validated sequencing systems” — Roche, Illumina, and Applied Biosystems’ (AB) SOLiD — along with what he calls the “gold-standard” 3730 instruments from AB. “We have all four leading technologies under one roof,” says Pohl. “We are the only commercial sequencing provider to my knowledge that can offer all these.”
GATC began life as a service company for research groups and industry. The firm’s first contract was a fragment analysis project for a big pharma company, worth almost $37,000. Today, Pohl says that same work would cost around $588! The next contract was for the European Union, and GATC has been profitable ever since. Much of its work has centered on microbial genome sequencing, with about 100 projects to date. Another major focus is the potato genome project, which still uses standard shotgun cloning and Sanger sequencing (with four AB 3730 instruments).
By embracing new sequencing technologies, GATC has boosted its annual sequence output to some 350 gigabases. The company offered its first 454 sequencing in November 2006, installing the machine in January 2007, followed by an Illumina 1G Analyzer in early 2007, and an AB SOLiD machine last November. “We’re still evaluating the different technologies,” says Pohl, adding that it’s unlikely GATC will focus on one single system. “We’ll use the advantages of all the technologies,” he says (see sidebar: “Data Analysis” below).
Looking ahead, he says, “The whole era of single-molecule sequencing technology will be very interesting. Speed is really taking off.” Purchasing a HeliScope is a possibility. “We wouldn’t be one of the leading service providers in the world if we weren’t considering it,” says Pohl.
This year, Pohl expects GATC to work on a pair of disease-related genomics projects that will involve sequencing up to ten full human genomes.
Pohl says priorities for the instrument makers must be to reduce costs in chemistry and increase capacity. “It comes down to price per base pair,” says Pohl. “Then we have to see which technology is progressing with the most professional team in order to get capacity up, costs per base down, and the best technical support.”
Back in 1990, Pohl says, the cost of sequencing a base pair was $25. “Today, the cost per base pair is definitely less than 0.1 cent/base pair. This is 30,000 times less than just 16 years ago! If we look forward another 16 years, we come to a price for onefold coverage of a human genome of [about $99].” Within a decade, Pohl firmly predicts, whole genome sequencing for medical applications will cost less than $1,000. But ensuring sufficient data quality for use as a diagnostic tool remains a key question.
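Pohl's arithmetic can be checked with a short back-of-the-envelope sketch. The per-base costs and the 30,000-fold figure are his; the genome size of roughly 3 billion bases is an assumption added here for illustration, and the function names are hypothetical.

```python
# Figures quoted in the article.
COST_1990 = 25.0        # dollars per base pair in 1990
COST_TODAY = 0.001      # "0.1 cent" per base pair, expressed in dollars
QUOTED_FOLD = 30_000    # Pohl's rounded fold reduction

# Assumption (not from the article): a human genome of ~3 billion bases.
GENOME_BASES = 3_000_000_000


def fold_reduction(old_cost, new_cost):
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


def projected_genome_cost(cost_per_base, fold, genome_bases):
    """Apply one more fold reduction, then price onefold genome coverage."""
    return cost_per_base / fold * genome_bases


if __name__ == "__main__":
    print(f"Fold reduction since 1990: {fold_reduction(COST_1990, COST_TODAY):,.0f}")
    cost = projected_genome_cost(COST_TODAY, QUOTED_FOLD, GENOME_BASES)
    print(f"Projected onefold genome cost: ${cost:,.0f}")
```

At exactly 0.1 cent per base the reduction works out to about 25,000-fold, which squares with Pohl's 30,000-fold figure given that he puts today's cost at "less than" 0.1 cent. Repeating a 30,000-fold drop lands at roughly $100 per onefold genome under the assumed genome size, in line with the bracketed estimate.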
Sidebar: Data Analysis
Christopher Bauser, GATC’s head of bioinformatics, sees first hand the strengths and weaknesses of the new sequencing platforms.
“The 454 system has relatively long read lengths, so it’s an advantage for de novo sequencing,” says Bauser. “Coupled with paired end reads, it’s a very powerful system for assembly. But the number of reads or bases is significantly less than the SOLiD or Illumina systems per run. The cost of a single run on each of the systems is about the same, but a GS FLX run is only 7-10 hours, whereas the Illumina GA is 3 days. SOLiD is also 3-4 days. The costs/base for SOLiD and Illumina are probably lower in that regard, but the reads are relatively short.”
Meshing data output from different platforms “is an issue, but it’s not a problem,” says Bauser. “Most of the analysis methods we’re using work just fine with text files or FASTA formats. It’s a trivial little program that we use to transform sequencing output into these text files, which we can analyze with standard software.”
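The kind of conversion Bauser describes might look something like the sketch below. This is not GATC's program: the function name, the (name, sequence) input shape, and the 60-column line width are all assumptions made for illustration. It only shows how little is needed to write reads out as FASTA text.

```python
import io


def write_fasta(reads, handle, width=60):
    """Write (name, sequence) pairs to a text handle in FASTA format.

    Hypothetical helper: each record is a '>' header line followed by
    the sequence wrapped at `width` characters per line.
    """
    for name, seq in reads:
        handle.write(f">{name}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


if __name__ == "__main__":
    # Made-up read names and sequences, purely for demonstration.
    example_reads = [("read_1", "ACGTACGT"), ("read_2", "GGCCTTAA")]
    buffer = io.StringIO()
    write_fasta(example_reads, buffer)
    print(buffer.getvalue(), end="")
```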
Bauser says the method of analysis hinges on the underlying technology. “If most of the data we’re analyzing for a specific project is in color space, then we use programs best suited to color space, and just translate any Sanger sequences we have into color space/FASTA format.” In such cases, it often makes sense to use one of the AB software packages, “and translate other technologies into color space.” Roche and Illumina likewise provide analysis software that is suited to their specific technology, but also compatible with standard data formats.
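For readers unfamiliar with color space, translating a base sequence can be sketched as follows. SOLiD's two-base encoding assigns each adjacent pair of bases one of four colors (0-3); with A, C, G, and T coded as 0-3, the color is the XOR of the two codes. The function name is hypothetical, and anchoring the output on the sequence's own first base is a simplifying convention chosen here, not necessarily what the AB software packages do.

```python
# Two-bit codes under which the SOLiD color of a base pair is the XOR.
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Translate a base sequence into an anchor base plus color digits.

    Hypothetical helper: the first base is kept as the anchor, and each
    following digit encodes the transition between adjacent bases.
    Raises ValueError on anything other than A, C, G, or T.
    """
    seq = seq.upper()
    if not seq:
        return ""
    for base in seq:
        if base not in _BASE_CODE:
            raise ValueError(f"unsupported base: {base!r}")
    colors = [
        str(_BASE_CODE[left] ^ _BASE_CODE[right])
        for left, right in zip(seq, seq[1:])
    ]
    return seq[0] + "".join(colors)


if __name__ == "__main__":
    print(to_color_space("ATGGC"))  # prints A3103
```

Because each color depends on two neighboring bases, a single base change alters two adjacent colors, which is part of why it can be simpler to compare everything in color space than to decode color reads back to bases first.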
Nor is handling the data output proving too difficult — at least for now. Storing all the image files would be “a huge amount of data, but the information in the pictures, as soon as they’ve been analyzed, is superfluous. So eight hours after the run is finished, you can dispose of about 90% of the data you’ve generated. The sequences and quality scores need to be saved, and that’s going to be a problem eventually. But at the moment, it’s reasonable to keep these and just get rid of anything else.”
Some of GATC’s 60 employees are heavily involved in developing software tools to help render the information into a form that scientists can analyze quickly. “There’s an entire zoo of programs that Ph.D. students and postdocs have been writing to analyze this information. The next big thing is going to be assembling all of these into a small set of analysis programs that will be suitable to do the different types of sequencing project — de novo sequencing, resequencing, transcriptome research, and so on.” — KD
This article appeared in Bio-IT World Magazine.