Nov. 15, 2006 | With the human genome sequenced several years ago, the challenge for biopharma organizations mining this invaluable trove of data is evolving.
New questions are emerging:
• What is the patent landscape of human genome data?
• How can I search GenBank faster and more effectively?
• How can I produce accurate maps of SNPs and exons?
• How can I annotate resequenced genome data?
Many of these questions can be addressed by GenomeQuest, the flagship product of Gene-IT. The company was founded in 1998 by bioinformatician Jean-Jacques Codani, the Paris-based chief science officer. President and CEO Ronald Ranauro, a software engineer and founder of Blackstone Computing, took over in 2002. GenomeQuest debuted in 2004.
GenomeQuest 3.0 is a biological sequence search product that creates integrated views of genome data, allowing biologists and intellectual property lawyers to evaluate sequence data and their associated patent status. The product includes an indexed archive of GenBank, EMBL databases, and 30 million patented sequences from the US and abroad, which are updated daily. “IP is where the rubber meets the road,” says Ranauro. “Which programs to advance, which experiments to fund?”
The service can be hosted on a client’s in-house servers or accessed as a secure, hosted Internet service if customers don’t have the requisite IT capacity or comfort. Often customers start by using the hosted service, then deploy the system in-house. The hosted “GenomeQuestLive” resource consists of 20 nodes, 80 Opteron CPUs, and 80 GB of RAM. Gene-IT can run three jobs simultaneously across the entire resource, with users able to filter and refine the results using sequence, alignment, and annotation properties. Alerts can be set up to flag new sequences as they are uploaded, and the results can be mined further.
Among the latest additions to Gene-IT’s 60-strong customer roster is Biogen Idec. Others include international patent offices; biotechs such as Celera, Millennium, and Roche Diagnostics; big pharmas including Pfizer, Novartis, and Sanofi-Aventis; and ten biotechnology law practices such as Foley Hoag. Several customers come from the diagnostics area, where FDA approval moves faster yet can still consume $10 million in 6 months. “This is what keeps product managers up at night,” says Ranauro.
As Ranauro sees it, the virtues of GenomeQuest aren’t so much about raw speed as offering a unique view of the genome and patentome that affords scientists, business developers and lawyers the ability to view and mine the same data. The Biogen Idec deal, he says, underscores “our eminence as the leading solution provider for IP sequence search.”
More than two-thirds of Gene-IT customers use GenomeQuest for IP-related searches on genes, proteins, probes, and primers, helping them prioritize research products or abandon programs where rivals may hold stronger IP. Other search applications include high-throughput annotation of resequenced genomes, and validating and aligning SNP and exon-intron data against public databases such as dbSNP.
Ranauro says Gene-IT is evolving GenomeQuest from an application to a platform. The company is focusing on three major enhancements to the product: simplifying access; adding diverse biological archive information to the patent content; and most importantly, providing simple web-level API access to initiate searches via URLs.
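To make the idea of initiating a search via a URL concrete, here is a minimal Python sketch. The host name, path, and parameter names (`query`, `db`, `max_hits`) are all hypothetical, since the article does not describe the actual GenomeQuest interface; the sketch only illustrates the general pattern of encoding a sequence search as a single link.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the article gives no real GenomeQuest URL.
BASE_URL = "https://genomequest.example.com/search"


def build_search_url(sequence: str, database: str, max_hits: int = 50) -> str:
    """Encode a sequence search as one URL (all parameter names are assumed)."""
    params = {
        "query": sequence.upper(),  # normalize the sequence to upper case
        "db": database,             # which archive to search, e.g. patents
        "max_hits": max_hits,       # cap on the number of results returned
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("atggcgtacgtt", "patents")
print(url)
```

A link built this way could be bookmarked, embedded in a report, or generated by another program, which is presumably the appeal of URL-level access for a product moving from application to platform.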
For sheer speed in alignment analysis, few can surpass New Zealand’s Cartesian Gridspeed, which is preparing to release its SLIM Search software. SLIM Search, which does sequence alignments thousands of times faster than BLAST, just completed an international beta phase.
Two months ago, Cartesian signed Agencourt as a major customer. “Agencourt has been extremely happy,” says company founder and CEO Leonard Bloksberg. As part of its contract sequencing service, Agencourt provides pre-analyzed results to its customers. “They were scheduled to buy another $1-million rack to keep up with the growing volume of searches. Instead, they ended up purchasing the SLIM Search software,” says Bloksberg.
Agencourt’s beta evaluation also provided a couple of key enhancements to the product. One was the need for cluster compatibility, so “we wrote a distribution mode to run across a cluster,” says Bloksberg, noting that this will be a standard feature in a future release. “Some companies just want you to sit on their clusters,” says Bloksberg. “There are big jobs where you can require large amounts of RAM at peak performance that could take just minutes on a cluster.”
Cartesian also incorporated a module to format search results in a manner identical to the familiar BLAST output. The software also supports Mac users for the first time.
Bloksberg recounts a demo he gave for a prospective customer a few months ago. He was running the software live on his two-year-old Dell laptop (1.4 GHz Pentium M, with 1 GB of RAM), on battery power. Asked to do a comprehensive search using the entire C. elegans EST transcriptome, Bloksberg started to sweat as his laptop’s system monitor showed all resources running at maximum capacity. After four and a half minutes, the result sputtered out. Bloksberg was embarrassed, until the customer said, “Oh my God, the same search takes 5-10 hours on our 700-node Opteron cluster!” The electricity savings alone could pay for the laptop used to conduct the search.
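The scale of that anecdote is easier to appreciate with the arithmetic written out. The Python sketch below uses only the figures quoted above (4.5 minutes on the laptop, 5 to 10 hours on a 700-node cluster). The node-time ratio rests on an added assumption, namely that every cluster node is busy for the whole run, which the article does not state.

```python
# Figures quoted in the anecdote; everything derived from them is illustrative.
LAPTOP_MINUTES = 4.5
CLUSTER_HOURS_RANGE = (5, 10)
CLUSTER_NODES = 700


def wall_clock_speedup(cluster_hours: float, laptop_minutes: float) -> float:
    """Ratio of elapsed time on the cluster to elapsed time on the laptop."""
    return cluster_hours * 60 / laptop_minutes


def node_time_ratio(cluster_hours: float, laptop_minutes: float, nodes: int) -> float:
    """Ratio of total node-minutes consumed.

    Assumption (not from the article): all nodes are fully occupied
    for the entire duration of the cluster run.
    """
    return nodes * wall_clock_speedup(cluster_hours, laptop_minutes)


low, high = (wall_clock_speedup(h, LAPTOP_MINUTES) for h in CLUSTER_HOURS_RANGE)
print(f"Wall-clock speedup: {low:.0f}x to {high:.0f}x")

low_nt, high_nt = (
    node_time_ratio(h, LAPTOP_MINUTES, CLUSTER_NODES) for h in CLUSTER_HOURS_RANGE
)
print(f"Node-time ratio (assuming full utilization): {low_nt:,.0f}x to {high_nt:,.0f}x")
```

On elapsed time alone the laptop run is roughly 67 to 133 times faster. The much larger node-time figure is an upper bound under the full-utilization assumption and should be read as a rough indication, not a measured result.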
So far, however, the response at major biopharmas “has been somewhat varied,” Bloksberg acknowledges. At one firm, the evaluator was not allowed to change the way the sequences were put through the system to take advantage of SLIM Search’s improvements. “In general, big pharmas move slower, so if they have a system to deal with sequence data, they can’t change that quickly.” New features are in development to conform to biopharma preferences, which Bloksberg says should be ready before the end of the year.
After years of being sidetracked working on forensic software, Ann Arbor-based Gene Codes Corporation is enhancing its flagship Sequencher desktop DNA sequence assembly and analysis product. In recent years, Gene Codes has made headlines for its forensic identification database work after 9/11 and other disasters (see “Soul Searching,” Bio-IT World, Sept. 2003).
Sequencher version 4.7 features an enhanced Variance Table that allows data to be edited within table cells, with those changes automatically propagated to the sample’s sequence and chromatograms, making the identification of SNPs and heterozygotes even easier. Further improvements include enhanced GenBank feature handling, updated HTML help, and expanded file export capabilities. The new version also offers improved forensic capabilities for mtDNA analysis.
Gene Codes founder and President Howard Cash says the response to the latest release has been outstanding. “Nobody can really touch Gene Codes in the small-to-medium sized sequencing market and I have to say I take some personal pleasure in watching other companies scramble to try to copy last year’s functions. We keep our upgrades dirt cheap so no customer ever feels penalized for buying a version too soon, and our users really appreciate that,” Cash told Bio-IT World.
Cash maintains that Sequencher is the community’s “premier DNA sequencing tool,” just as the company’s forensic DNA tools are “several generations ahead” of anything else. “One thing that sets us apart is we don’t scour the free tools on the web and add odds and ends to our programs just to announce a new ‘feature.’ We’re not in the business of adding features. We’re in the business of meeting clearly definable needs in the laboratory.”
Email Kevin Davies.