Building a Google for Bioinformatics



By Malorye Allison

August 8, 2007 | It’s not getting the data but making sense of it that is the hard part in genomics. “One of the big challenges is combining data sets, such as metabolomic and gene expression data. You have to do a lot of manual manipulation,” says Alan Higgins, senior director of Translational Medicine at Cogenics (formerly Icoria), a division of Clinical Data (CLDA). Then there’s the complexity of adding information from external sources, typically in different formats.

These challenges have dogged genomics researchers for years, and biologists’ only recourse has been either to learn to do all this data handling themselves or to lean on their IT guys for it.

That’s one reason CLDA’s Cogenics unit teamed up with IO Informatics on an $11.7 million, five-year Advanced Technology Program grant, funded in 2002 by the National Institute of Standards and Technology. CLDA was searching multiple data types to find biomarkers that predict disease or response to therapy. IO Informatics brought something new to the table: intelligent multidimensional object (IMO) database records.

IMOs are based on the same principles underlying the Semantic Web (see “Masters of the Semantic Web,” Bio•IT World, Oct. 2005). Like PDFs, IMOs are portable and can be easily shared. Unlike PDFs, however, IMOs are created so that specific data types are turned into freeform relational objects: Within the platform, these discrete data types are still distinguished from each other, but now they can also be manipulated, integrated, and compared. Users can thus work with specific data within records as well as pass whole records easily to one another.

CLDA’s researchers became collaborators and beta testers for a new IO Informatics software platform, providing queries and other input to the product’s development. The result is Sentient, which lets researchers “look at all the data related to their field of interest, all at the same time, all in the same place, and regardless of type of information or where the data is located,” says IO Informatics CEO Robert Stanley. In short, people can ask complex research questions in a “Google-like” environment.

Sentient Life
The platform was built to accommodate the breadth of data types that constitute the field of systems biology. IO Informatics has also added features to assist scientists in tackling various forms of analysis. Regardless of whether the data sit in a spreadsheet or a complex image, they can be easily moved, integrated, and analyzed. Data can be viewed through the Web Query, a browser that lets researchers peek at many types of data, or the Knowledge Explorer, which lets them search and relate data.

For example, researchers can select an interesting dataset, then drill down to a finer level and integrate it with other data. Scientists can dart into different databases while focusing on the genes, proteins, or compounds of interest. Because of Sentient’s semantic approach, scientists can also “easily fit data from their own systems silos into internal or published pathways, interaction, or other correlation networks,” says Stanley.

For Higgins and colleagues at CLDA, hunting for toxicity biomarkers related to alcohol and other chemicals, this means being able to combine data from metabolomic and gene expression studies with digitized histopathology images. “This software gives me the ability, for the first time, to ask more complex questions,” says Higgins. “If I am looking at an alcohol study, and seeing effects in liver and brain, I can now ask if that’s happening in other studies, what is common between rats and human, and what is common to acetaminophen and alcohol.” 

Higgins concedes he could do similar things with other tools, “But it will take much longer and you have to do most of it manually. This is a key enabling tool.”  

Some of CLDA’s work centers on biomarkers of liver disease, as part of the National Institute of Environmental Health Sciences’ Compendium Study. The company is running gene expression, metabolomic, histopathology, blood, and urine studies in animals for various chemicals at different doses and time points to correlate specific signs, such as lobular necrosis, with particular markers.

“One of the key things Sentient lets us do is to ask exactly the same set of questions about different compounds,” says Higgins. He reports that they are seeing some preliminary correlations of genomic and metabolomic data. In addition, certain common features are starting to emerge. “Oxidative stress is clearly important in a variety of organ toxicities,” says Higgins. “And our data [are] bearing this out.”
