Ted Slater’s Semantic Technologies



The semantic web doesn’t exist, but Pfizer’s Slater believes semantic technologies are paving the way.

By Kevin Davies and Phillips Kuhl

March 24, 2009 | The theoretical benefits of the semantic web for life sciences have been debated for a few years now (see “Masters of the Semantic Web,” Bio•IT World, October 2005), but practical examples within pharma remain scarce. Pfizer’s Ted Slater is an interesting exception. Slater heads a small group of four informatics scientists in St. Louis, called the Indications and Pathways Center of Emphasis (IPCoE), which supports Pfizer research efforts to identify and validate inflammation targets.

Slater trained as a molecular biologist, but made the mistake of buying his first computer shortly after starting his Ph.D. at UC Riverside. “Seventy-two straight hours later, I realized that I’d made a vocational error and maybe I wanted to learn more about computer science instead.” He earned a master’s degree in computer science and went on to work for a string of genomics companies in the ’90s, including Sequana Therapeutics, GCG, and Paradigm Genetics. He was the founding vice president for knowledge engineering at Genstruct before joining Pfizer in 2004.

Slater tends not to use the term “semantic web.” “There just isn’t a semantic web,” he says candidly. “It doesn’t mean there won’t be one in the future, but there isn’t one now.” He prefers “semantic technologies,” a term that avoids provoking critics who would otherwise argue that he should move on to something else.

At CHI’s Bridging Pharma & IT conference last October, Slater outlined a project focused on pathway data analysis that showed how structuring data in a semantic network could provide substantial benefits, such as automated hypothesis generation, over traditional pathway solutions. Over the past six months, the effort has gone from idea to reality.

Using semantic technologies in this way helps to eliminate long-standing informatics problems such as data silos, in which information cannot interoperate with the other data it needs to be combined with, and data tombs, in which information is stored but is very difficult to retrieve. Semantic technologies shift the focus from collecting information and keeping it safe to actually using it in its proper context to solve research problems. “The computer should be a way of enhancing our own natural ability to reason, in the same way a bicycle enhances our ability to move ourselves around,” says Slater.

Be Reasonable

Ideally, Slater says, users should be able to reason with the data in silico: using “if-then rules,” the computer generates a hypothesis, and the scientist then weighs its potential implications and draws on their own knowledge to test whether it is supported by the data.
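A minimal sketch of that “if-then rule” loop, assuming facts are stored as (subject, predicate, object) tuples. The gene names, the `increases` predicate and the single chaining rule are hypothetical placeholders, not rules taken from Pfizer’s system. The computer applies the rule until nothing new appears (forward chaining) and returns only the derived facts, which a scientist would treat as hypotheses to test rather than as conclusions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical asserted facts: (subject, predicate, object).
FACTS: Set[Triple] = {
    ("GeneA", "increases", "GeneB"),
    ("GeneB", "increases", "GeneC"),
    ("GeneC", "increases", "Inflammation"),
}

def apply_rule(facts: Set[Triple]) -> Set[Triple]:
    """Hypothetical rule: IF x increases y AND y increases z THEN x may increase z."""
    derived = {
        (x, "increases", z)
        for (x, p1, y) in facts
        for (y2, p2, z) in facts
        if p1 == p2 == "increases" and y == y2 and x != z
    }
    return derived - facts

def forward_chain(facts: Set[Triple]) -> Set[Triple]:
    """Apply the rule until no new facts appear; return only the inferred ones."""
    known = set(facts)
    while True:
        new = apply_rule(known)
        if not new:
            return known - facts
        known |= new

hypotheses = forward_chain(FACTS)
for h in sorted(hypotheses):
    print("hypothesis:", *h)
```

Keeping inferred facts separate from asserted ones mirrors the division of labour described above: the machine proposes, the scientist decides.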

“We constantly hear that the Holy Grail is complete data integration,” says Slater. “I have bad news: it will never happen! Users are able to set up and start building new, independent repositories of data faster than we can integrate existing data. You will never be able to get it all in one place where it is integrated and usable. The goal instead should be data that are interoperable, even if they are not integrated.”

Slater’s group helps scientists study gene expression and signaling pathways in order to identify alternative indications for drugs in development. “There is no easy way to understand what is going on if you look at a list of 1,000 genes that are significantly up- or down-regulated,” he says. Even if those genes are mapped onto pathways with a commercial pathway tool, the scientist is still working with what amounts to a reference tool. Much information is available on individual genes, but telling a story about physiology means painstakingly going through the annotation for each gene, one at a time.

An alternative approach lets the computer generate hypotheses from the available data, distilling the range of possibilities to a few key relationships. If the data are represented correctly, data from disparate databases can be used together, so that users can create ‘boutique’ knowledge bases for their own needs and easily link them together.
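A minimal sketch of how two separately built ‘boutique’ knowledge bases can be queried together without first being merged into one warehouse, assuming both use the same identifier for the entities they share. The knowledge-base contents and the prefixed identifiers (`gene:G1`, `pathway:P1`, `study:S1`) are hypothetical.

```python
# Two independently built, hypothetical knowledge bases. They interoperate only
# because both use the same identifier ("gene:G1") for the shared entity.
PATHWAY_KB = {
    ("gene:G1", "memberOf", "pathway:P1"),
    ("gene:G2", "memberOf", "pathway:P1"),
}
EXPRESSION_KB = {
    ("gene:G1", "regulatedIn", "study:S1"),
    ("gene:G3", "regulatedIn", "study:S1"),
}

def objects(kbs, subject, predicate):
    """Objects of matching triples across several knowledge bases; none is modified."""
    return {o for kb in kbs for (s, p, o) in kb if s == subject and p == predicate}

def subjects(kbs, predicate, obj):
    """Subjects of matching triples across several knowledge bases."""
    return {s for kb in kbs for (s, p, o) in kb if p == predicate and o == obj}

# Which genes regulated in study S1 also belong to pathway P1?
kbs = [PATHWAY_KB, EXPRESSION_KB]
hits = sorted(
    g for g in subjects(kbs, "regulatedIn", "study:S1")
    if "pathway:P1" in objects(kbs, g, "memberOf")
)
print(hits)
```

The shared identifier does the real work here: had the expression knowledge base called the gene `G1` instead of `gene:G1`, the same query would return nothing, which is the interoperability problem in miniature.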

Adapting the familiar RDF “triple” format, which represents information as a subject, predicate, and object, Slater models the data as a mathematical graph, with subject and object as nodes and the predicate (the relationship between them) as an edge. One triple’s subject can be another triple’s object, and so on, until a very large graph of everything known in some domain is built. In this format, software can work through the information to build inferences and test hypotheses. His group uses open-source ontology development tools to build OWL ontologies and another open-source tool, Cytoscape, to view the data in graph format. For persistent storage, knowledge graphs can be managed in Oracle’s built-in RDF data model.
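A minimal sketch of the triples-as-graph idea in plain Python, with no RDF library and with hypothetical entity and predicate names: subjects and objects become nodes, predicates become labelled edges, and a breadth-first search recovers the chain of triples connecting two entities, the kind of path a scientist might inspect as a candidate mechanism.

```python
from collections import defaultdict, deque

# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("DrugX", "inhibits", "KinaseK"),
    ("KinaseK", "phosphorylates", "ProteinP"),
    ("ProteinP", "activates", "GeneG"),
    ("GeneG", "associatedWith", "DiseaseD"),
]

def build_graph(triples):
    """Subjects and objects become nodes; each predicate becomes a labelled, directed edge."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns the chain of triples linking start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, pred, nxt)]))
    return None

graph = build_graph(TRIPLES)
for triple in find_path(graph, "DrugX", "DiseaseD"):
    print(*triple)
```

In a production setting the triples would live in an RDF store rather than a Python list; the sketch only shows why chaining subject and object nodes turns isolated facts into a navigable graph.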

PEKE Performance

One of the goals in data analysis, says Slater, is to use heuristics over the knowledge bases to tell a story. Semantic representations of knowledge allow you to apply expert reasoning to experimental data, which may help explain a particular outcome and in turn suggest a testable hypothesis. “You don’t get inferences in a traditional structured database,” says Slater. “We have our share of traditional databases, and we are getting better at data warehousing. For many scientific applications, representing the data as an RDF graph and building for interoperability make the information much more usable. If the description of your problem solution ends with, ‘and then the user can query it,’ then you haven’t thought it through enough.” How you structure information can either lock it up in a data tomb or set it free.
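To make “applying expert reasoning to experimental data” concrete, the sketch below scores candidate upstream regulators by how many observed expression changes their known downstream effects would predict, a simplified form of causal reasoning over a signed knowledge graph. The regulators, genes, effect signs and measurements are all hypothetical, and this illustrates the general idea rather than the analysis Slater’s group actually runs.

```python
from collections import defaultdict

# Hypothetical signed knowledge: (regulator, effect, target); +1 = increases, -1 = decreases.
KNOWLEDGE = [
    ("RegA", +1, "Gene1"),
    ("RegA", +1, "Gene2"),
    ("RegA", -1, "Gene3"),
    ("RegB", -1, "Gene1"),
    ("RegB", -1, "Gene2"),
]

# Hypothetical experiment: +1 = significantly up-regulated, -1 = down-regulated.
OBSERVED = {"Gene1": +1, "Gene2": +1, "Gene3": -1}

def score_regulators(knowledge, observed):
    """Assuming each regulator is activated, count downstream predictions that agree
    or disagree with the data; return (regulator, agree - disagree), best first."""
    counts = defaultdict(lambda: [0, 0])  # regulator -> [agree, disagree]
    for regulator, effect, target in knowledge:
        if target in observed:
            if effect == observed[target]:
                counts[regulator][0] += 1
            else:
                counts[regulator][1] += 1
    scored = [(reg, agree - disagree) for reg, (agree, disagree) in counts.items()]
    return sorted(scored, key=lambda item: -item[1])

ranking = score_regulators(KNOWLEDGE, OBSERVED)
print(ranking)
```

Here RegA’s three agreeing predictions make it the better explanation of the data. A strongly negative score such as RegB’s could itself suggest a hypothesis to test, namely that RegB is inhibited in this experiment.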

The experimental system that Slater and his group have developed is called the “Pfizer Environment for Knowledge Engineering,” or PEKE. “PEKE handles all of the usual storage and querying capabilities of traditional databases, but because of its architecture it has some surprising emergent properties,” says Slater. Among these is the ability to create, with just a couple of mouse clicks, new knowledge bases that interoperate essentially automatically with other PEKE knowledge bases.

Another capability of PEKE is that, because the semantics of each knowledge base are explicit in its OWL ontology rather than implicit in a relational database schema, it can support knowledge bases containing any kind of knowledge with no changes to the architecture. While most PEKE knowledge bases currently describe molecular pathways, the ontology Slater uses to demonstrate how easy it is to create a PEKE knowledge base is the OWL pizza ontology from Stanford’s Protégé team.
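To illustrate why explicit semantics let one architecture hold any kind of knowledge, the sketch below runs a single, domain-independent inference function on two hypothetical knowledge bases: a pizza-style one loosely inspired by the Protégé pizza ontology, and a pathway one. It uses simple RDFS-style subclass reasoning, far simpler than the OWL reasoning a real ontology supports, and it is not PEKE’s implementation. Only the data changes between the two runs; the code does not.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Domain-independent rule: if x has type C and C subClassOf D, then x has type D.
    Repeats until nothing new is derived; returns only the inferred triples."""
    closed = set(triples)
    while True:
        subclass_of = {(sub, sup) for (sub, p, sup) in closed if p == "subClassOf"}
        new = {
            (x, "type", sup)
            for (x, p, cls) in closed if p == "type"
            for (sub, sup) in subclass_of if sub == cls
        } - closed
        if not new:
            return closed - triples
        closed |= new

# Hypothetical pizza-style knowledge base (simplified, illustrative terms).
PIZZA_KB: Set[Triple] = {
    ("Margherita", "subClassOf", "NamedPizza"),
    ("NamedPizza", "subClassOf", "Pizza"),
    ("tonightsDinner", "type", "Margherita"),
}

# Hypothetical pathway knowledge base, handled by exactly the same code.
PATHWAY_KB: Set[Triple] = {
    ("ProteinKinase", "subClassOf", "Enzyme"),
    ("Enzyme", "subClassOf", "Protein"),
    ("KinaseK", "type", "ProteinKinase"),
}

print(sorted(infer_types(PIZZA_KB)))
print(sorted(infer_types(PATHWAY_KB)))
```

Because the class hierarchy is stored as data alongside the facts, adding a new domain means adding new ontology triples rather than changing a schema or rewriting code.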

Slater says, “PEKE is world-class stuff. We think we can now build knowledge bases faster and cheaper than anyone else in the industry, and do much more with them once they’re built.” We may still be waiting on the semantic web, but semantic technologies are already paving the way for the next wave of informatics innovation. 


This article also appeared in the March-April 2009 issue of Bio-IT World Magazine.