The Quest to Make Sequence Sense


By Kevin Davies

Nov. 15, 2006 | With the human genome sequenced several years ago, the challenge for biopharma organizations mining this invaluable trove of data is evolving.

New questions are emerging:

  • What is the patent landscape of human genome data?

  • How can I search GenBank faster and more effectively?

  • How can I produce accurate maps of SNPs and exons?

  • How can I annotate resequenced genome data?

GenomeQuest
Many of these questions can be addressed by GenomeQuest, the flagship product of Gene-IT. The company was founded in 1998 by bioinformatician Jean-Jacques Codani, the Paris-based chief science officer. President and CEO Ronald Ranauro, a software engineer and founder of Blackstone Computing, took over in 2002. GenomeQuest debuted in 2004.

GenomeQuest 3.0 is a biological sequence search product that creates integrated views of genome data, allowing biologists and intellectual property lawyers to evaluate sequence data and their associated patent status. The product includes an indexed archive of GenBank, EMBL databases, and 30 million patented sequences from the US and abroad, which are updated daily. “IP is where the rubber meets the road,” says Ranauro, “which programs to advance, which experiments to fund?”

The service can be hosted on a client’s in-house servers or accessed as a secure, hosted Internet service if customers don’t have the requisite IT capacity or comfort. Often customers start by using the hosted service, then deploy the system in-house. The hosted “GenomeQuestLive” resource consists of 20 nodes with 80 Opteron CPUs and 80 GB of RAM. Gene-IT can run three jobs simultaneously across the entire resource, with users able to filter and refine the results using sequence, alignment, and annotation properties. Alerts can be set up to flag new sequences as they are uploaded, and results can be mined further.

Among the latest additions to Gene-IT’s 60-strong customer roster is Biogen Idec. Others include international patent offices; biotechs such as Celera, Millennium, and Roche Diagnostics; big pharmas including Pfizer, Novartis, and Sanofi-Aventis; and ten biotechnology law practices such as Foley Hoag. Several customers come from the diagnostics area, where FDA approval moves faster yet can still consume $10 million in 6 months. “This is what keeps product managers up at night,” says Ranauro.

As Ranauro sees it, the virtues of GenomeQuest aren’t so much about raw speed as offering a unique view of the genome and patentome that affords scientists, business developers and lawyers the ability to view and mine the same data. The Biogen Idec deal, he says, underscores “our eminence as the leading solution provider for IP sequence search.”

More than two-thirds of Gene-IT customers use GenomeQuest for IP-related searches including genes, proteins, probes and primers, helping to prioritize research products or abandon programs where rivals may have greater IP. Other search applications include high-throughput annotation of resequenced genomes, and validating and aligning SNP and exon-intron data against public databases such as dbSNP.

Ranauro says Gene-IT is evolving GenomeQuest from an application to a platform. The company is focusing on three major enhancements to the product: simplifying access; adding diverse biological archive information to the patent content; and most importantly, providing simple web-level API access to initiate searches via URLs.

SlimSpeed
For sheer speed in alignment analysis, few can surpass New Zealand’s Cartesian Gridspeed, which is preparing to release its SLIM Search software. SLIM Search, which does sequence alignments thousands of times faster than BLAST, just completed an international beta phase.

Two months ago, Cartesian signed Agencourt as a major customer. “Agencourt has been extremely happy,” says company founder and CEO Leonard Bloksberg. As part of its contract sequencing service, Agencourt provides pre-analyzed results to its customers. “They were scheduled to buy another $1-million rack to keep up with the growing volume of searches. Instead, they ended up purchasing the SLIM Search software,” says Bloksberg.

Agencourt’s beta evaluation also prompted a couple of key enhancements to the product. One was the need for cluster compatibility, so “we wrote a distribution mode to run across a cluster,” says Bloksberg, noting that this will be a standard feature in a future release. “Some companies just want you to sit on their clusters,” says Bloksberg. “There are big jobs where you can require large amounts of RAM at peak performance that could take just minutes on a cluster.”

Cartesian also incorporated a module to format search results in a manner identical to the familiar BLAST output. The software also supports Mac users for the first time.

Bloksberg recounts a demo he gave for a prospective customer a few months ago. He was running the software live on his two-year-old Dell laptop (1.4 GHz Pentium M, with 1 GB of RAM), on battery power. Asked to do a comprehensive search using the entire C. elegans EST transcriptome, Bloksberg started to sweat as his laptop’s system monitor showed all resources running at maximum capacity. After four and a half minutes, the result sputtered out. Bloksberg was embarrassed, until the customer said, “Oh my God, the same search takes 5-10 hours on our 700-node Opteron cluster!” The electricity savings alone could pay for the laptop used to run the search.

So far, however, the response at major biopharmas “has been somewhat varied,” Bloksberg acknowledges. At one firm, the evaluator was not allowed to change the way the sequences were put through the system to take advantage of SLIM Search’s improvements. “In general, big pharmas move slower, so if they have a system to deal with sequence data, they can’t change that quickly.” New features are in development to conform to biopharma preferences, which Bloksberg says should be ready before the end of the year.

Sequencher Thirst
After years of being sidetracked working on forensic software, Ann Arbor-based Gene Codes Corporation is enhancing its flagship Sequencher desktop DNA sequence assembly and analysis product. In recent years, Gene Codes has made headlines for its forensic identification database efforts after 9/11 and other disasters (see “Soul Searching,” Bio-IT World, Sept. 2003).

Sequencher version 4.7 features an enhanced Variance Table that allows data to be edited within table cells, with those changes automatically propagated to the sample sequences and chromatograms, making the identification of SNPs and heterozygotes even easier. Further improvements include enhanced GenBank feature handling, updated HTML help, and expanded file export capabilities. The new version also offers improved forensic capabilities for mtDNA analysis.

Gene Codes founder and President Howard Cash says the response to the latest release has been outstanding. “Nobody can really touch Gene Codes in the small-to-medium sized sequencing market and I have to say I take some personal pleasure in watching other companies scramble to try to copy last year’s functions. We keep our upgrades dirt cheap so no customer ever feels penalized for buying a version too soon, and our users really appreciate that,” Cash told Bio-IT World.

Cash maintains that Sequencher is the community’s “premier DNA sequencing tool,” just as the company’s forensic DNA tools are “several generations ahead” of anything else. “One thing that sets us apart is we don’t scour the free tools on the web and add odds and ends to our programs just to announce a new ‘feature.’ We’re not in the business of adding features. We’re in the business of meeting clearly definable needs in the laboratory.”

For reprints and/or copyright permission, please contact RMS, 1808 Colonial Village Lane, Lancaster, PA;

(717) 399-1900 ext. 125 or via email to bio-itworld@theygsgroup.com.