Building a Google for Bioinformatics


By Malorye Allison

August 8, 2007 | It’s not getting the data but making sense of it that is the hard part in genomics. “One of the big challenges is combining data sets, such as metabolomic and gene expression data. You have to do a lot of manual manipulation,” says Alan Higgins, senior director of Translational Medicine at Cogenics (formerly Icoria), a division of Clinical Data (CLDA). Then there’s the complexity of adding information from external sources, typically in different formats.

These challenges have been dogging genomics researchers for years, and biologists’ only recourse has been either to learn to do all this data handling themselves or to lean on their IT guys for it.

That’s one reason CLDA’s Cogenics unit teamed up with IO Informatics on an $11.7 million, five-year, Advanced Technology Program grant, funded in 2002 by the National Institute of Standards and Technology. CLDA was searching multiple data types to find biomarkers that predict disease or response to therapy. IO Informatics brought something new to the table: intelligent multidimensional object (IMO) database records.

IMOs are based on the same principles underlying the Semantic Web (see “Masters of the Semantic Web,” Bio•IT World, Oct. 2005). Like PDFs, IMOs are portable and can be easily shared. Unlike PDFs, however, IMOs are created so that specific data types are turned into freeform relational objects: Within the platform, these discrete data types are still distinguished from each other, but now they can also be manipulated, integrated, and compared. Users can thus work with specific data within records as well as pass whole records easily among themselves.
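To make the Semantic Web idea concrete, here is a minimal sketch in plain Python of records stored as subject-predicate-object statements, so that different data types stay distinct yet can be related through the entities they describe. It is not IO Informatics’ IMO format or API, which the article does not detail; every identifier, predicate name, and function below is a hypothetical placeholder.

```python
from collections import defaultdict

# Illustrative statements only; every identifier here is a made-up placeholder,
# not real study data and not the actual IMO record layout.
TRIPLES = [
    ("record:1", "type", "gene_expression"),
    ("record:1", "about", "gene:G1"),
    ("record:2", "type", "metabolite_profile"),
    ("record:2", "about", "metabolite:M1"),
    ("gene:G1", "linked_to", "metabolite:M1"),
    ("record:3", "type", "histopathology_image"),
    ("record:3", "about", "gene:G1"),
]


def build_index(triples):
    """Index statements by subject so each record's properties are easy to look up."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def records_about(index, entity):
    """Return sorted (record, data type) pairs for every record that mentions the entity."""
    hits = []
    for subject, props in index.items():
        if entity in props.get("about", set()):
            for data_type in props.get("type", set()):
                hits.append((subject, data_type))
    return sorted(hits)


index = build_index(TRIPLES)
# One question, answered across two different data types at once.
print(records_about(index, "gene:G1"))
```

Because every record is described with the same kind of statement, a single lookup can span expression data and images without converting either one into the other’s format.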

CLDA’s researchers became collaborators and beta testers for a new IO Informatics software platform, providing queries and other input to the product’s development. The result is Sentient, which lets researchers “look at all the data related to their field of interest, all at the same time, all in the same place, and regardless of type of information or where the data is located,” says IO Informatics CEO Robert Stanley. In short, people can ask complex research questions in a “Google-like” environment.

Sentient Life
The platform was built to accommodate the breadth of data types that constitute the field of systems biology. IO Informatics has also added features to assist scientists in tackling various forms of analysis. Regardless of whether the data sit in a spreadsheet or a complex image, they can be easily moved, integrated, and analyzed. Data can be viewed through the Web Query — a browser that lets researchers peek at a variety of types of data — or the Knowledge Explorer, which lets them search and relate data. 

For example, researchers can select an interesting dataset, then drill down to a finer level and integrate it with other data. Scientists can dart into different databases while focusing on the genes, proteins, or compounds of interest. Because of Sentient’s semantic approach, scientists can also “easily fit data from their own systems silos into internal or published pathways, interaction, or other correlation networks,” says Stanley.

For Higgins and colleagues at CLDA, hunting for toxicity biomarkers related to alcohol and other chemicals, this means being able to combine data from metabolomic and gene expression studies with digitized histopathology images. “This software gives me the ability, for the first time, to ask more complex questions,” says Higgins. “If I am looking at an alcohol study, and seeing effects in liver and brain, I can now ask if that’s happening in other studies, what is common between rats and human, and what is common to acetaminophen and alcohol.” 
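The kind of question Higgins describes (what is common between species, or between compounds) amounts to intersecting sets of findings once the studies share a common structure. The sketch below illustrates that idea only; it is not how Sentient works internally, and the marker names, the `FINDINGS` layout, and both function names are hypothetical placeholders rather than real results.

```python
# Hypothetical layout: compound -> species -> set of affected markers.
# The marker names are placeholders, not real study findings.
FINDINGS = {
    "alcohol": {
        "rat": {"marker_A", "marker_B", "marker_C"},
        "human": {"marker_A", "marker_C"},
    },
    "acetaminophen": {
        "rat": {"marker_A", "marker_D"},
        "human": {"marker_A"},
    },
}


def common_across_species(findings, compound):
    """Markers affected in every species studied for one compound."""
    species_sets = list(findings[compound].values())
    return set.intersection(*species_sets) if species_sets else set()


def common_across_compounds(findings, species):
    """Markers affected by every compound that was studied in the given species."""
    sets = [by_species[species] for by_species in findings.values() if species in by_species]
    return set.intersection(*sets) if sets else set()


# "What is common between rats and human" for the alcohol study:
print(sorted(common_across_species(FINDINGS, "alcohol")))
# "What is common to acetaminophen and alcohol" in rats:
print(sorted(common_across_compounds(FINDINGS, "rat")))
```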

Higgins concedes he could do similar things with other tools. “But it will take much longer and you have to do most of it manually,” he says. “This is a key enabling tool.”

Some of CLDA’s work centers on biomarkers of liver disease, part of the National Institute of Environmental Health Sciences’ Compendium Study. The company is running gene expression, metabolomic, histopathology, blood, and urine studies in animals for various chemicals at different doses and time points to correlate specific signs, such as lobular necrosis, with particular markers.

“One of the key things Sentient lets us do is to ask exactly the same set of questions about different compounds,” says Higgins. He reports that the team is seeing some preliminary correlations between genomic and metabolomic data. In addition, certain common features are starting to emerge. “Oxidative stress is clearly important in a variety of organ toxicities,” says Higgins. “And our data [are] bearing this out.”








