Ted Slater’s Semantic Technologies




The semantic web doesn’t exist, but Pfizer’s Slater believes semantic technologies are paving the way.

By Kevin Davies and Phillips Kuhl

March 24, 2009 | The theoretical benefits of the semantic web for life sciences have been debated for a few years now (see “Masters of the Semantic Web,” Bio•IT World, October 2005), but practical examples within pharma remain scarce. Pfizer’s Ted Slater is an interesting exception. Slater heads a small group of four informatics scientists in St. Louis called the Indications and Pathways Center of Emphasis (IPCoE), which supports Pfizer research efforts in identifying and validating inflammation targets.

Slater trained as a molecular biologist, but made the mistake of buying his first computer shortly after starting his Ph.D. at UC Riverside. “72 straight hours later, I realized that I’d made a vocational error and maybe I wanted to learn more about computer science instead.” He took a Master’s in computer science, and went on to work for a string of genomics companies in the ’90s, including Sequana Therapeutics, GCG, and Paradigm Genetics. He was the founding vice president for knowledge engineering at Genstruct, joining Pfizer in 2004.

Slater tends not to use the term “semantic web.” “There just isn’t a semantic web,” he says candidly. “It doesn’t mean there won’t be one in the future, but there isn’t one now.” He prefers “semantic technologies,” so as not to provoke critics who would otherwise argue that he should move on to something else.

At CHI’s Bridging Pharma & IT conference last October, Slater outlined a project focused on pathway data analysis that showed how structuring data in a semantic network could provide substantial benefits, such as automated hypothesis generation, over traditional pathways solutions. The effort has gone from idea to reality over the past six months.

Using semantic technologies in this way helps to eliminate long-standing problems in informatics like data silos, where information is not interoperable with other necessary information, and data tombs, which simply make information very difficult to retrieve. Semantic technologies shift the focus from collecting information and making it safe to actually using the information in its proper context to solve research problems. “The computer should be a way of enhancing our own natural ability to reason, in the same way a bicycle enhances our ability to move ourselves around,” says Slater.

Be Reasonable

Ideally, Slater says, users should be able to reason with the data in silico: using “if-then rules,” the computer generates a hypothesis, and the scientist then weighs its potential implications and draws on existing knowledge to test whether the data support it.
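The kind of if-then reasoning described here can be sketched as a small forward-chaining loop. This is an illustration only, not Pfizer's system: the gene names, the predicates, and the single rule are all invented for the example.

```python
# Hypothetical facts, each a (subject, predicate, object) tuple.
FACTS = {
    ("GeneA", "activates", "GeneB"),
    ("GeneB", "activates", "GeneC"),
    ("GeneC", "activates", "GeneD"),
}

def infer(facts):
    """Apply one illustrative rule until nothing new can be derived:
    IF x activates y AND y (directly or indirectly) activates z
    THEN x indirectly_activates z."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        # Snapshot the current links so we can add to `known` while looping.
        direct = [(s, o) for s, p, o in known if p == "activates"]
        reach = [(s, o) for s, p, o in known
                 if p in ("activates", "indirectly_activates")]
        for x, y in direct:
            for y2, z in reach:
                new_fact = (x, "indirectly_activates", z)
                if y == y2 and new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known

# Everything the rule derived that was not stated explicitly.
hypotheses = sorted(infer(FACTS) - FACTS)
for subject, predicate, obj in hypotheses:
    print(subject, predicate, obj)
```

The derived statements are candidates for the scientist to evaluate, not conclusions; a real rule set would be written by domain experts and would be far richer than this single transitive rule.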

“We constantly hear that the Holy Grail is complete data integration,” says Slater. “I have bad news—it will never happen! Users are able to set up and start building new, independent repositories of data faster than we can integrate existing data. You will never be able to get it all in one place where it is integrated and useable. The goal instead should be data that are interoperable, even if they are not integrated.”

Slater’s group helps scientists to study gene expression and signaling pathways in order to identify alternative indications for drugs in development. “There is no easy way to understand what is going on if you look at a list of 1,000 genes that are significantly up- or down-regulated,” he says. Even if those genes are mapped onto pathways using a commercial pathways tool, one is forced to work with what amounts to a reference tool. Much information is available on individual genes, but you have to try to tell a story about physiology by painstakingly going through the annotation for each gene one at a time.

An alternative approach lets the computer generate hypotheses based on available data, which would distill the range of possibilities to a few key relationships. If the data are represented correctly, you can use data from disparate databases, such that users can create ‘boutique’ knowledge bases for their own needs and easily link them together.
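As a rough sketch of how separately built knowledge bases can be linked, the example below treats each one as a set of (subject, predicate, object) statements and combines them by set union. It assumes both knowledge bases use the same identifier for the same gene, which is the precondition for this kind of interoperability; all names and data are hypothetical.

```python
# Two hypothetical "boutique" knowledge bases, built independently.
EXPRESSION_KB = {
    ("GeneA", "regulation", "up"),
    ("GeneB", "regulation", "down"),
    ("GeneC", "regulation", "up"),
}
PATHWAY_KB = {
    ("GeneA", "member_of", "PathwayX"),
    ("GeneB", "member_of", "PathwayX"),
    ("GeneC", "member_of", "PathwayY"),
}

def link(*knowledge_bases):
    """Combine knowledge bases; shared identifiers join them automatically."""
    return set().union(*knowledge_bases)

def pathways_with_upregulated_genes(kb):
    """Map each pathway to its up-regulated member genes."""
    up = {s for s, p, o in kb if p == "regulation" and o == "up"}
    result = {}
    for s, p, o in kb:
        if p == "member_of" and s in up:
            result.setdefault(o, set()).add(s)
    return result

combined = link(EXPRESSION_KB, PATHWAY_KB)
hits = pathways_with_upregulated_genes(combined)
print(hits)
```

Neither source had to be restructured to answer a question that spans both, which is the sense in which the data are interoperable without being integrated into one repository.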

Adapting the familiar “triple” semantic RDF format—representing information as a subject, predicate, and object—Slater represents the data as a mathematical graph, with subject and object as nodes and the predicate (the relationship between them) as an edge. One triple’s subject can be another triple’s object, and so on, until a very large graph of everything known in some domain is created. In this format, the information can be handled with software to build inferences and test hypotheses. His group uses open source ontology development tools to build OWL ontologies and another open-source tool, Cytoscape, to view the data in graph format. For persistent storage, knowledge graphs can be managed in Oracle’s built-in RDF data model.
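To make the triple-as-graph idea concrete, here is a minimal sketch using only the Python standard library. Subjects and objects become nodes, predicates become labeled edges, and a breadth-first search chains triples together. The entities and relationships are invented for illustration and do not come from PEKE.

```python
from collections import defaultdict, deque

# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("DrugX", "inhibits", "KinaseA"),
    ("KinaseA", "phosphorylates", "ProteinB"),
    ("ProteinB", "regulates", "GeneC"),
]

def build_graph(triples):
    """Nodes are subjects/objects; each edge carries its predicate as a label."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns the chain of triples linking
    start to goal, or None if the two nodes are not connected."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None

graph = build_graph(TRIPLES)
path = find_path(graph, "DrugX", "GeneC")
for subject, predicate, obj in path:
    print(subject, predicate, obj)
```

Because the object of one triple is the subject of the next, the search recovers a chain of relationships between two entities that no single statement records directly.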

PEKE Performance

One of the goals in data analysis, says Slater, is to use heuristics over the knowledge bases to tell a story. Semantic representations of knowledge allow you to apply expert reasoning to experimental data, which may help explain a particular outcome and in turn suggest a testable hypothesis. “You don’t get inferences in a traditional structured database,” says Slater. “We have our share of traditional databases, and we are getting better at data warehousing. For many scientific applications, representing the data as an RDF graph and building for interoperability make the information much more usable. If the description of your problem solution ends with, ‘and then the user can query it,’ then you haven’t thought it through enough.” How you structure the information can either lock up the information in a data tomb or set it free.

The experimental system that Slater and his group have developed is called the “Pfizer Environment for Knowledge Engineering,” or PEKE. “PEKE handles all of the usual storage and querying capacities of traditional databases, but because of its architecture it has some surprising emergent properties,” says Slater. Among these is the ability to create, with just a couple of mouse clicks, new knowledge bases that interoperate essentially automatically with other PEKE knowledge bases.

Another capability of PEKE is that, because the semantics of each knowledge base are explicit in its OWL ontology rather than implicit in a relational database schema, PEKE supports knowledge bases containing any kind of knowledge with no changes to the architecture. While most PEKE knowledge bases are currently about molecular pathways, the ontology Slater uses to demonstrate how easy it is to create PEKE knowledge bases is the OWL pizza ontology from Stanford’s Protégé Team.

Slater says, “PEKE is world-class stuff. We think we can now build knowledge bases faster and cheaper than anyone else in the industry, and do much more with them once they’re built.” We may still be waiting on the semantic web, but semantic technologies are already paving the way for the next wave of informatics innovation. 


This article also appeared in the March-April 2009 issue of Bio-IT World Magazine.
