August 8, 2007 | It’s not getting the data but making sense of it that is the hard part in genomics. “One of the big challenges is combining data sets, such as metabolomic and gene expression data. You have to do a lot of manual manipulation,” says Alan Higgins, senior director of Translational Medicine at Cogenics (formerly Icoria), a division of Clinical Data (CLDA). Then there’s the complexity of adding information from external sources, typically in different formats.
These challenges have dogged genomics researchers for years. Biologists have had only two options: learn to do all this data handling themselves, or lean on their IT guys for it.
That’s one reason CLDA’s Cogenics unit teamed up with IO Informatics on an $11.7 million, five-year Advanced Technology Program grant, funded in 2002 by the National Institute of Standards and Technology. CLDA was searching multiple data types to find biomarkers that predict disease or response to therapy. IO Informatics brought something new to the table: intelligent multidimensional object (IMO) database records.
IMOs are based on the same principles underlying the Semantic Web (see “Masters of the Semantic Web,” Bio•IT World, Oct. 2005). Like PDFs, IMOs are portable and can be easily shared. Unlike PDFs, however, IMOs are created so that specific data types are turned into freeform relational objects: Within the platform, these discrete data types are still distinguished from each other, but now they can also be manipulated, integrated, and compared. Users can thus work with specific data within a record as well as pass whole records easily among themselves.
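The Semantic Web principle mentioned here can be illustrated with a minimal Python sketch. It is not IO Informatics’ implementation, and nothing in it reflects the actual IMO format. It assumes only that each piece of data, whatever its type, is stored as a subject-predicate-object statement, so that findings from different kinds of experiments can be compared through shared identifiers. The study names, predicates, and values are hypothetical placeholders, not real results.

```python
from collections import defaultdict

# Hypothetical toy statements in (subject, predicate, object) form.
# These are illustrative placeholders, not real study data.
TRIPLES = [
    ("study:alcohol", "has_compound", "alcohol"),
    ("study:alcohol", "elevated_transcript", "gene:G1"),
    ("study:alcohol", "elevated_metabolite", "metabolite:M1"),
    ("study:alcohol", "histology_finding", "lobular necrosis"),
    ("study:apap", "has_compound", "acetaminophen"),
    ("study:apap", "elevated_transcript", "gene:G1"),
    ("study:apap", "elevated_metabolite", "metabolite:M2"),
    ("study:apap", "histology_finding", "lobular necrosis"),
]


def build_index(triples):
    """Group (predicate, object) pairs by subject for quick lookup."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[subject].add((predicate, obj))
    return index


def shared_findings(index, study_a, study_b):
    """Return the (predicate, object) pairs the two studies have in common."""
    return sorted(index[study_a] & index[study_b])


index = build_index(TRIPLES)
common = shared_findings(index, "study:alcohol", "study:apap")
for predicate, obj in common:
    print(predicate, obj)
```

Because expression, metabolite, and histology findings all share one statement shape, a single set intersection answers a cross-study question without reformatting each data type by hand.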
CLDA’s researchers became collaborators and beta testers for a new IO Informatics software platform, providing queries and other input to the product’s development. The result is Sentient, which lets researchers “look at all the data related to their field of interest, all at the same time, all in the same place, and regardless of type of information or where the data is located,” says IO Informatics CEO Robert Stanley. In short, people can ask complex research questions in a “Google-like” environment.
The platform was built to accommodate the breadth of data types that constitute the field of systems biology. IO Informatics has also added features to assist scientists in tackling various forms of analysis. Regardless of whether the data sit in a spreadsheet or a complex image, they can be easily moved, integrated, and analyzed. Data can be viewed through the Web Query — a browser that lets researchers peek at a variety of types of data — or the Knowledge Explorer, which lets them search and relate data.
For example, researchers can select an interesting dataset, then drill down to a finer level and integrate it with other data. Scientists can dart into different databases while focusing on the genes, proteins, or compounds of interest. Because of Sentient’s semantic approach, scientists can also “easily fit data from their own systems silos into internal or published pathways, interaction, or other correlation networks,” says Stanley.
For Higgins and colleagues at CLDA, hunting for toxicity biomarkers related to alcohol and other chemicals, this means being able to combine data from metabolomic and gene expression studies with digitized histopathology images. “This software gives me the ability, for the first time, to ask more complex questions,” says Higgins. “If I am looking at an alcohol study, and seeing effects in liver and brain, I can now ask if that’s happening in other studies, what is common between rats and human, and what is common to acetaminophen and alcohol.”
Higgins concedes he could do similar things with other tools, “but it will take much longer and you have to do most of it manually. This is a key enabling tool.”
Some of CLDA’s work centers on biomarkers of liver disease, as part of the National Institute of Environmental Health Sciences’ Compendium Study. The company is running gene expression, metabolomic, histopathology, blood, and urine studies in animals, testing various chemicals at different doses and time points, to correlate specific signs, such as lobular necrosis, with particular markers.
“One of the key things Sentient lets us do is to ask exactly the same set of questions about different compounds,” says Higgins. He reports that the team is seeing some preliminary correlations between genomic and metabolomic data. In addition, certain common features are starting to emerge. “Oxidative stress is clearly important in a variety of organ toxicities,” says Higgins. “And our data [are] bearing this out.”