Nov. 15, 2006 | SAN FRANCISCO — Among the star attractions of the 2006 Oracle OpenWorld conference* for its 42,000 attendees — aside from sessions on product lines including Oracle E-Business Suite, Oracle technology, Oracle Fusion Middleware, PeopleSoft Enterprise, and more — was a concert by Sir Elton John. Not far behind was a session in which pharma executives offered evidence that some, at least, are using the semantic web to work smarter and lower drug development costs.
The session, “Semantic Data Integration in Life Sciences,” was chaired by Susie Stephens, principal product manager at Oracle. Stephens said the semantic web can help life sciences companies integrate what they know and how they know it. Traditional approaches to knowledge management involved standardization of terms, but the focus now is less on a priori standardization, as defining a proper schema for knowledge can be tricky. Rather than spend time and energy on defining what to say, Stephens said, the underlying approach focuses on how to say things, emphasizing data sharing and explicit, ad hoc information.
The semantic web can integrate heterogeneous data by using explicit semantics to make data shareable and available. Oracle is working with the World Wide Web Consortium (W3C) standard data format, the Resource Description Framework (RDF). RDF data consists of triples, each a “node-link-node” structure (subject, predicate, object) whose terms each have their own URI. RDF Schema (RDF-S) provides support for vocabularies. Because each component of a triple has a unique identifier, data from different sources can be merged.
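To make the merging idea concrete, here is a minimal sketch in plain Python rather than in any RDF product. The example.org URIs, the predicate names and the two "sources" are hypothetical stand-ins chosen for illustration; none of them come from the session. Each triple is a (subject, predicate, object) tuple, and because both sources use the same URI for the same gene, combining them is just a set union.

```python
# Hypothetical namespaces; example.org URIs are illustrative only.
GENE = "http://example.org/gene/"
DISEASE = "http://example.org/disease/"
PROTEIN = "http://example.org/protein/"
VOCAB = "http://example.org/vocab#"

# Source A: an assumed in-house target database.
source_a = {
    (GENE + "BRCA1", VOCAB + "associatedWith", DISEASE + "BreastCancer"),
    (GENE + "BRCA1", VOCAB + "locatedOn", VOCAB + "Chromosome17"),
}

# Source B: an assumed public annotation set that overlaps with source A.
source_b = {
    (GENE + "BRCA1", VOCAB + "encodes", PROTEIN + "BRCA1_HUMAN"),
    (GENE + "BRCA1", VOCAB + "associatedWith", DISEASE + "BreastCancer"),
}


def merge_graphs(*graphs):
    """Union several triple sets; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair recorded for one subject URI."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge_graphs(source_a, source_b)
facts = describe(merged, GENE + "BRCA1")
for predicate, obj in facts:
    print(predicate, obj)
```

The duplicate "associatedWith" statement appears only once after the merge, and everything known about the gene can be read from a single graph. No schema had to be agreed on beforehand, only the identifiers.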
Stephens said the Oracle RDF data model provides support for RDF and RDF-S. While triples will be stored as an informational table, users can interact with them as with an object. Links represent complete RDF triples. A table function allows graph queries to be embedded in an SQL query, enabling searches for arbitrary patterns against RDF data. Researchers can therefore query RDF data and also perform inferencing based on RDF, RDF-S and user-defined rules. Oracle does performance testing with UniProt.
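The combination of pattern queries and inferencing can be sketched in a few lines of plain Python. This is a toy illustration, not Oracle's table function or its SQL syntax: the `ex:` terms, the `infer_rdfs` and `match` helpers and the sample data are all assumptions made for the example. It applies two standard RDF-S entailment rules (subclass transitivity and type inheritance) and then matches triple patterns in which names beginning with `?` are variables.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical data: a small class hierarchy plus two instances.
triples = {
    ("ex:Kinase", SUBCLASS, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS, "ex:Protein"),
    ("ex:P1", RDF_TYPE, "ex:Kinase"),
    ("ex:P1", "ex:involvedIn", "ex:PathwayA"),
    ("ex:P2", RDF_TYPE, "ex:Protein"),
}


def infer_rdfs(graph):
    """Add subClassOf-transitivity and type-inheritance triples until nothing new appears."""
    inferred = set(graph)
    while True:
        new = set()
        for a, p1, b in inferred:
            for c, p2, d in inferred:
                if b != c:
                    continue
                if p1 == SUBCLASS and p2 == SUBCLASS:
                    new.add((a, SUBCLASS, d))   # A sub B, B sub D => A sub D
                elif p1 == RDF_TYPE and p2 == SUBCLASS:
                    new.add((a, RDF_TYPE, d))   # x type B, B sub D => x type D
        if new <= inferred:
            return inferred
        inferred |= new


def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern in order."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        trial = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if trial.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, trial)


# "Which enzymes take part in which pathways?"
query = [("?x", RDF_TYPE, "ex:Enzyme"), ("?x", "ex:involvedIn", "?pathway")]

before = list(match(triples, query))      # asserted triples only
closure = infer_rdfs(triples)
after = list(match(closure, query))       # asserted plus inferred triples
print(before)
print(after)
```

Against the asserted triples alone the query finds nothing, since no record says directly that P1 is an enzyme. After inferencing, P1 is returned because it is a kinase and every kinase is an enzyme. A production system would index the triples rather than scan them, but the logic is the same.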
Putting the Web to Work
“The semantic web is a very exciting technology to companies like Pfizer,” said Giles Day, site head for research informatics at Pfizer’s Research Technology Center, in Cambridge, MA. “Pfizer has been through some extraordinary growth, but to maintain growth we will have to change the way we operate.” Day said Pfizer was working to reduce attrition, noting that company scientists must trawl through 85 candidates for each successful drug.
“That type of investment is not sustainable,” Day said. Pfizer continues to acquire companies at a “phenomenal” rate, turning to smaller biotechs such as Rinat Pharmaceuticals in South San Francisco, while also outsourcing to more CROs and collaborating with remote chemistry groups, especially in China. “That provides a challenge in securely passing information,” Day said.
Day said the significance of the semantic web is expanding because of the growth in the scale and complexity of data. Research programs generate complex data at high speed, and that data is hard to integrate across an organization. “[Pfizer] is a global operation, the sun never sets on what we do,” said Day. Pfizer builds huge data warehouses and has many siloed data sets. The intent is to start breaking down these divisional boundaries so that researchers can communicate better and make better decisions.
Day cited a study of an unnamed new medicine in first human trials. “We might see unusual events,” Day said. “Blood pressure might be dropping, we might see something in brain pathology. Are these events linked to another biological pathway? Can this jumpstart a whole new research program that could discover new indications?” Currently events are monitored by hand with doctors using different terms to describe the same phenomena. Having a semantic layer could make data accessible to researchers throughout the company.
Day issued an important caveat: be careful about what inferences one derives from technologies. You can look up vampires, for example, and find that they are hematophagic and that you can stab them with wooden stakes. But you might not see anywhere that they don’t actually exist. In laying ontology layers on top of data, one wants to have confidence that the information is good and the inferences are valid. But with, say, 30,000 objects, one can get “a great big hairball,” Day pointed out.
Eli Lilly’s Patrick Hartman, team leader of discovery informatics, said Lilly has chosen the RDF approach “because pharma has a dilemma.” Getting a drug from bench to market takes, on average, 5,000 screened compounds, 15 years and $1 billion, and that doesn’t even include the competition once the drug is on the market.
Lilly wants to reduce the development attrition curve by cutting risks earlier in the pipeline. Informatics can be used to identify and validate promising targets. Starting with a therapeutic class and disease state, what are the biological pathways? How good is the target? Unfortunately, sources of data on pharmacology, druggability, ligands and toxicology are heterogeneous and may lack sufficient statistical power to draw real conclusions. “We tried data warehousing but it’s too expensive,” Hartman said. “How do we federate?”
Lilly uses RDF to relate ontologies to public data such as Entrez Gene. Lilly also uses a resource called Lingua Franca, comprising its Discovery Target Assessment Tool (TAT), which provides one-stop access to integrated information for target assessment. TAT accesses key content bases including pathways, disease associations, competitive chemical entities, and detailed target analysis. Discovery TAT is built on the Lilly Science Grid (LSG), a single technical architecture for integration of plug-ins.
Hartman anticipates RDF enabling semantic description and comparisons of patients and cellomics data, as in semantically describing cellular localization with other properties, such as how cell size relates to gene expression, and relating gene exons to transcription. Hartman said Lilly is now serializing data to XML, taking a federated approach and leaving it in its original sources.
Both Lilly and Pfizer believe the semantic web will also aid in alternative drug indication discovery. RDF may help them better pick through their databases and researchers’ notebooks for promising compounds that might otherwise be left by the wayside. Both Day and Hartman are mum about results so far, but hint at promise.
Wendy Wolfson is a science and technology writer based in Oakland, CA.
*Oracle OpenWorld 2006; San Francisco, October 23-26, 2006.