By John Russell
July 9, 2008
Turning semantic web technology into practical applications that enhance biomedical research was a driving force behind Robert Stanley’s and Erich Gombocz’s collaboration long before they founded IO Informatics in 2003. The flood of data and new data types gushing from new instruments and the swelling mass of scientific literature convinced them early on that effective data management and data integration were critical for productive data analysis.
Of course they were right, and they weren’t alone. Many recognized the challenge and opportunity. But they were among the few who adopted semantic web ideas early and worked steadily to incorporate them deeply into a scientific data management platform: Sentient Suite. Semantic software applications use the “RDF” (Resource Description Framework) data model to enhance life science researchers’ ability to understand how data fit together in complex biological systems.
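To make the RDF idea a little more concrete: RDF expresses every fact as a subject, predicate, object “triple,” and a collection of triples forms a graph that can be queried by matching patterns. The Python sketch below is an illustration only and makes no claim about how Sentient stores data. It uses plain tuples rather than an RDF library, and every identifier under the invented `ex:` prefix is hypothetical. A `None` in the pattern plays roughly the role of a variable in a SPARQL triple pattern, and chaining two lookups mimics a simple join across the network.

```python
from typing import Optional

# A triple is (subject, predicate, object). The "ex:" identifiers are
# hypothetical examples, not terms from any real ontology.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("ex:TP53", "ex:encodes", "ex:p53"),
    ("ex:p53", "ex:participatesIn", "ex:ApoptosisPathway"),
    ("ex:CompoundX", "ex:inhibits", "ex:p53"),
    ("ex:p53", "rdfs:label", "tumor protein p53"),
}

def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern, sorted for stable output.
    None acts as a wildcard, loosely like a SPARQL variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Two-hop question: what does CompoundX inhibit, and which pathways
# is that target part of?
for _, _, target in match(graph, s="ex:CompoundX", p="ex:inhibits"):
    for _, _, pathway in match(graph, s=target, p="ex:participatesIn"):
        print(target, pathway)
```

Because every fact has the same three-part shape, data from very different sources can be added to one graph without redesigning a table schema; that uniformity is a large part of RDF’s appeal for integration.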
“We were looking 10 years ago, right when RDF was starting to come out, at how we could solve these general problems and came up with our own semantic data model,” says Stanley. “My background is in epistemology and ontology (University of Chicago), which are core underpinnings of semantics. We were thrilled when [Sir Tim] Berners-Lee and the rest of the W3C (World Wide Web Consortium) gang started pushing this semantic data model because we knew we were right on target with our ideas.”
It’s a decade since Stanley (CEO) and Gombocz (CSO) met and began collaborating. “We were both working with other companies,” says Stanley. “I had put together a business plan and was looking for some business help, interviewing potential CEOs; one said, ‘I’d consider coming and working with you, but you’ve got to let me bring Erich.’” The two hit it off immediately. Jettisoning the would-be CEO, Stanley and Gombocz began working together and founded Biosentients in 2000 to hold the intellectual property (IP) they were developing. In 2003, the team offered the IP, business plan, and emerging software to the investment community; a funding round was closed; and Biosentients was transformed into IO Informatics. Stanley was at first the company’s CTO and was elevated to CEO in April 2007. Guess he sharpened those business chops.
Today, IO Informatics has a rapidly growing staff of 20, with offices in the U.S., Canada, and Europe. It also has roughly 30 customers and “hundreds of users.” These aren’t eye-popping numbers, but they indicate strengthening market traction, particularly given that aggressive sales efforts didn’t start until late 2007. In broad terms, the Sentient Suite comprises five components: a Data Manager, a Process Manager, Web Query, Knowledge Explorer, and Image Interactor.
The Sentient Data Manager module is the heart of the suite. It automates data loading and acts as a data repository. The Process Manager can be thought of as a widely generalizable LIMS (laboratory information management system) with workflow capability. IO Informatics’ patented intelligent multi-dimensional object technology powers both. “These basic features turn data from any source into web-accessible, searchable, annotatable, HIPAA-compliant items and allow end-users to pick data subsets and identify data networks from them,” says Stanley.
The various query and analytics tools are tightly integrated with the Data Manager and Process Manager. Stanley says the company has exceptional image-handling expertise; indeed, before creating the Sentient Suite, he and Gombocz sold image analysis software to Berlex and to Washington University in St. Louis.
“The Web Query and Knowledge Explorer then make this data easily and usefully available in various formats, to a broad set of users,” says Stanley. “Virtually any researcher or manager derives both knowledge and efficiency benefits by using the Web Query to search and browse data and dashboards derived from federated data and to export results to their own applications (such as Partek or Spotfire) in ad hoc or automated workflows. Informaticians and other end-users apply the Knowledge Explorer to visualize integrated networks and run semantic queries. These queries are capable of characterizing a potential drug’s activities, stratifying tumor types based on multidimensional biomarker activity, etc.—using SPARQL and more advanced semantic query languages.”
Much of what sets IO Informatics apart, says Stanley, is its clever use of semantic technology to capture data from virtually any source—instruments or literature, public and proprietary databases—and recast the data into consistent ontologies that make it very easy for scientists to query and integrate diverse data and to analyze them together. “Think of it as creating targeted knowledgebases that can be broad or narrow, depending on the question being asked,” says Stanley, who uses biomarker research to make his point.
“Single-marker based research is running into a lot of challenges. Now, there are also challenges with multi-marker research, but IO Informatics makes it very easy to bring in all of the values from different data sources and networks and really see how they’re functionally connected,” he says. “In other words, how the proteins, the genes, the metabolites, etc., fit together to create a profile that describes the behavior of a molecule or the progression of a disease. You also can track data across time. Sentient makes it very efficient for researchers to create a signature that really tells the difference between a successful compound or a failing compound, or a tumor type.”
IO Informatics also tapped into industry knowledge. Last July the company assembled a small but influential working group to focus on semantic applications for hypothesis generation in translational research. Members included Pat Hurban, executive director, scientific affairs, Cogenics, Inc.; Alan Higgins, VP preclinical development, Viamet; Jonas Almeida, professor of bioinformatics, MD Anderson Cancer Center; Ted Slater, associate director, knowledge management informatics, Pfizer’s Indications and Pathways Center; Mark Wilkinson, assistant professor, medical genetics and bioinformatics at the James Hogg iCAPTURE Center; and Bruce McManus, director of the iCAPTURE Center.
Some were customers, others were not. IO Informatics’ Gombocz served as the chair. The group, which relied on web meetings every few weeks, tackled topics such as the direct use of RDF, OWL and N3 data sources, all of which had already been combined with the “drag-to-knowledgebase” capabilities of the Sentient Knowledge Explorer. Each organization had specific projects.
MD Anderson was interested in stratifying patient populations according to tumor type and suggested treatments. It worked on ways to assemble a variety of biomarkers (molecular profiles, images, time course data) that could be used in stratifying patients. iCAPTURE wanted to be able to distinguish individuals who were good transplant candidates from those with poor prospects. Doing so required assembling and interpreting a panel of diverse biomarkers, including molecular and image data. Pfizer’s interest centered more on compound profiling.
Some of the resulting capabilities found their way back into Sentient.
“Mark Wilkinson has done a lot of work with semantic software from iCAPTURE. Jonas Almeida is one of the world leaders in applying semantic technology. They were giving us feedback on the nuts and bolts,” says Stanley. “You really want to have a thesaurus, not only for classes and terms, but also for ontologies and relationships. So, this was an area where we went ahead and applied that. We put in a thesaurus that supports merging ontologies and mapping different relationships across ontologies. That’s a nice example.”
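Stanley does not describe how Sentient’s thesaurus is built, so the following is only a rough illustration of the general idea. The Python sketch assumes a simple dictionary-based thesaurus that maps each ontology’s local terms, for both classes and relationships, onto shared canonical terms, then merges the rewritten triples. The two “lab” vocabularies (`a:`, `b:`) and the canonical `ex:` terms are all hypothetical.

```python
# Two hypothetical source ontologies state overlapping facts with different
# terms. A thesaurus maps local terms to canonical ones so the graphs can be
# merged and queried together. All identifiers are invented examples.
Triple = tuple[str, str, str]

lab_a: set[Triple] = {
    ("a:Gene_TP53", "a:codesFor", "a:Protein_p53"),
}
lab_b: set[Triple] = {
    ("b:tp53", "b:encodes", "b:P53"),
    ("b:P53", "b:partOf", "b:apoptosis"),
}

# Thesaurus entries cover classes/instances AND relationships.
thesaurus: dict[str, str] = {
    "a:Gene_TP53": "ex:TP53", "b:tp53": "ex:TP53",
    "a:Protein_p53": "ex:p53", "b:P53": "ex:p53",
    "b:apoptosis": "ex:Apoptosis",
    "a:codesFor": "ex:encodes", "b:encodes": "ex:encodes",
    "b:partOf": "ex:participatesIn",
}

def normalize(term: str) -> str:
    # Terms without a thesaurus entry pass through unchanged.
    return thesaurus.get(term, term)

def merge(*graphs: set[Triple]) -> set[Triple]:
    """Rewrite every term to its canonical form and union the graphs;
    identical facts collapse because the result is a set."""
    return {
        (normalize(s), normalize(p), normalize(o))
        for g in graphs
        for (s, p, o) in g
    }

merged = merge(lab_a, lab_b)
for triple in sorted(merged):
    print(triple)
```

Here the two “encodes” statements, written differently in each source, collapse into a single fact once normalized, leaving two triples in the merged graph. A production thesaurus would also need to handle one-to-many mappings and conflicting definitions, which this sketch ignores.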
“It’s also really nice for us to hear what people can say appropriately about competitive analysis. I mean these guys are really looking at what else is out there and within their rights, they’ll tell us what they know about other software. That’s useful for us, and has also been very encouraging. People think they can do great competitive analysis, but it’s hard to do. You really need to have people using the software, the competing software,” says Stanley.
Clearly, other scientific data management platforms and providers are available. InforSense comes to mind. Teranode is another, and it was also an early provider of semantic web technology. There is even overlap with capabilities offered by pathway tool providers such as GeneGo and Ingenuity Systems. Stanley agrees, but says the IO Informatics platform can work well with many of these third-party applications.
IO Informatics’ life science working group is currently wrapping up its work, but the exercise proved so valuable that IO Informatics is preparing to assemble a similar healthcare-oriented working group. “I think we can go into healthcare with healthcare data management. Particularly we’re going to have great strengths for decision support of diagnostics and biomarkers. We’re going to be creating knowledgebases with signatures that are going to be useful for physicians and doctors.”
It will be interesting to check back in a year to see the fruits of IO Informatics’ latest working group.
Stanley on Ontologies vs. Schemas
“Ontologies actually describe how you think something works, so you’re really talking about how you think things fit together in the real world, whereas a schema organizes data for applications. The nice thing about semantic technology with ontologies is that you have the option of using either formally reviewed ontologies from standards bodies such as W3C, GO (Gene Ontology Consortium) or Stanford’s NCBO (National Center for Biomedical Ontology), or you can easily create your own local, less tightly controlled ‘folk’ ontology for dedicated purposes, by adding new classes, adding new relationships, changing them, or merging them.
“We greatly appreciate and effectively use all of the great work that a lot of smart people have been putting into defining ontologies. It’s great for everyone that they’re publicly available and that our customers are able to refine and apply them to their best advantage. It’s how we create the networks.
“There’s a big argument about whether you have to have a perfect ontology or can or should you do things with smaller sub-ontologies or folk ontologies. We’ve learned through our experience that we can do useful things with both—with formal ontologies or with targeted sub-ontologies. Our customers are welcome to share these with their partners or with formal standards bodies. This is how learning happens! The more people who work on and with ontologies, the better. We’re helping pharma companies solve problems. We’re glad that there are people out there doing work on ontologies, and we’re taking advantage of that for customers.” – Robert Stanley, IO Informatics co-founder and CEO.