Masters of the Semantic Web

By Salvatore Salamone
Oct 17, 2005 | “I see a huge amount of energy from people in the life sciences getting excited about the Semantic Web and what it can do to solve the big IT problems... But also the people involved in the Semantic Web pushing it along are also very excited about getting involved in the life sciences — it’s one of those areas that affect humankind, finding drugs, curing AIDS and cancer, etc. There seems to be a huge energy, and lots of practical technical reasons why this area is crying out to be one of the flagship areas that the Semantic Web really takes off...”

                         — Sir Tim Berners-Lee, Bio•IT World Conference+Expo, May 2005

Anyone who struggles to unite data from disparate resources in their work has heard the story. A vendor or group comes along with a new idea that is going to make information extraction and data access, integration, and sharing easier — much easier.

Over the years, most organizations have experimented with varying approaches ranging from remote procedure calls, to distributed object-oriented data models, to Java-based Web tools and portals to tie together information from numerous sources and present it in a way researchers might actually find useful. Or, they have initiated grand knowledge management projects to search for and extract similar information from disparate data sources.

But alas, most efforts require obscene amounts of programming and often require major changes to — and investments in — IT infrastructures.

Proponents of a new technology called the Semantic Web think they have something that might do the trick. At the simplest level, the Semantic Web puts data into a machine-readable format so that computers can aggregate data and make inferences about relationships between certain types of data. It has applications across many life science arenas including R&D, clinical trials, translational medicine, and personalized medicine (see Semantic Web: Safety and Innovation, page 18).

It sounds too good to be true. So, why would anyone believe yet another proposal that claims to help make it easier to get more information from his or her data?

For starters, the principal evangelist of the Semantic Web has certain credentials to back up his claims. Sir Tim Berners-Lee invented the World Wide Web 15 years ago, while working at CERN. Berners-Lee now heads the World Wide Web Consortium (W3C) at MIT, which is trying to move Semantic Web technology forward.

Berners-Lee sees the Semantic Web as the Web’s next phase. “If I have to define it, which is tricky, then I have to say it’s data integration across application boundaries and organizational boundaries,” he said in his 2005 Bio•IT World Conference + Expo keynote address. “The Semantic Web is about looking at data in a Web-like way, a bottom-up way, it’s not top-down, and quite a lot of the ways of looking at the Semantic Web are different from the ways we’ve done it before. It’s not like object-oriented programming, and it’s not that easy to explain... believe it or not, it wasn’t that easy to explain the World Wide Web 15 years ago.”

Web Power
One way to illustrate the potential power of using the Semantic Web is to compare it to the current Web. As it exists today, the Web presents information that is easy for people to read. It presents text in natural languages (English, French, Chinese, etc.) and uses graphics, images, and videos. But while a person can process this information, computers cannot.

“Today, all tags and bookmarks are designed for humans to read,” says Eric Neumann, an independent consultant and former head of discovery informatics at Sanofi-Aventis. With the Semantic Web, the information on Web pages and in data repositories is machine-readable. This offers several major advantages, says Neumann. First, similar data and information can be aggregated. Second, you can do machine-readable queries. “You can start hubbing and aggregating data,” says Neumann.

Putting such capabilities to work offers some significant benefits. “The Semantic Web allows you to take better advantage of information on the Web,” says Dennis Quan, a researcher at IBM Thomas J. Watson Research Center.

Whereas Berners-Lee prefers the analogy of a London subway map (see Going Underground, page 31), Quan offers a different example to illustrate the advantages of the Semantic Web: “If you want to see a movie, you might go to a number of Web sites and get several reviews and other information [start times, theater location, etc.]. This is okay if you only go to a movie once in a while. But it is not practical if you do it many times.”

“In bioinformatics, this type of [operation] is done all the time,” Quan continues. “People repeatedly go to a number of sites and download information to conduct their work.” The process is labor intensive — it requires opening a new browser session with each site. Frequently data must be cut, reformatted, and pasted into another application for it to be really useful.

The Semantic Web approach greatly simplifies this process. “With the Web today, if you need data from 10 sites, you need to go to all 10 sites and cut and paste the data to get an integrated view,” says Matthew Shanahan, chief marketing officer at the life science experiment design automation software company Teranode. “Semantic Web pushes this job of assembling data out from the desktop into the network. With the Semantic Web, the network knows how to get and assemble the data.”

To that end, a Semantic Web browser can be configured to go to multiple sites, find the specific information required, retrieve it, and display the results in a single view. In essence, this application of Semantic Web technology is akin to a next-generation portal.

Such capabilities make the Semantic Web very interesting to life science organizations — or at least some. “The advent of the Semantic Web is providing the life sciences community with the standards and tools needed to build integrative informatics systems,” says John Reynders, information officer, discovery & development informatics, at Eli Lilly. “We are very interested in the [Semantic Web standards] and view them as essential tools in cracking the heterogeneous data integration challenge facing our drug-hunters here at Lilly.”

Those standards Reynders refers to are the heart of the Semantic Web’s potential for improving data accessibility. The W3C has developed standards for:

  • Data description and identification: the Resource Description Framework (RDF)
  • An ontology language: the Web Ontology Language (OWL)
  • A rule language: the Semantic Web Rule Language (SWRL)

The Semantic Web uses these standards in conjunction with existing data formatting and tagging standards such as XML and the Life Science Identifier (LSID). The result is a way to describe data and the relationships between various data elements (see The W3C Perspective, page 32).

Among vendors, Oracle is leading the effort to support RDF, having added RDF support to its 10g database this summer. “In discovery, RDF helps [researchers] aggregate public data with their own internally generated data,” says Susie Stephens, principal product manager, life sciences, at Oracle. As an application of the technology, she points to Siderean Software, which incorporated RDF into its Seamark Navigation Server and has been demonstrating the application to the life sciences at recent conferences.

A key factor of the Semantic Web is that elements in a data set — a protein, gene, drug, or an author’s name — are uniquely identified and there is some information about the relationship of that element to other elements.

Including information about relationships between elements is what differentiates Semantic Web data from simply adding metadata to a database. In the Semantic Web, elements are defined in statements called Semantic Web triplets, each of which contains a subject, a predicate, and an object.

This triplet description can be used in many ways. For example, it can identify an element in a biological sense or with respect to its use to a company. One could construct triplets along the lines of “kinase” is a “kind of” “protein” or “kinase” is a “kind of” “drug target.” One might also link a data element to a diagram or model. For instance, “human hemoglobin” “has a 3-D structure” of “this” (where “this” is a 3-D representation in the Protein Data Bank).
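
As a minimal sketch of the triplet idea, the statements above can be written as plain Python tuples and searched by pattern. The `ex:` prefix, the predicate names, and the placeholder for the Protein Data Bank entry are illustrative assumptions, not terms from any real vocabulary, and a real deployment would rely on RDF tooling rather than bare tuples.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, invented for illustration only.
TRIPLES = [
    Triple("ex:kinase", "ex:kind_of", "ex:protein"),
    Triple("ex:kinase", "ex:kind_of", "ex:drug_target"),
    Triple("ex:human_hemoglobin", "ex:has_3d_structure", "ex:pdb_entry_placeholder"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "What is a kinase a kind of?" expressed as a machine-readable query.
kinds_of_kinase = [t.obj for t in match(TRIPLES, subject="ex:kinase", predicate="ex:kind_of")]
```

The same `match` function answers questions about structures, targets, or anything else stored as a triplet, which is the sense in which one element can be described both biologically and by its use to a company.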

With this triplet format, some life science areas that previously were treated as very separate entities, requiring different database technologies and analysis and search tools, start to look similar from a pure data-handling perspective. As such, the Semantic Web offers a way to view, analyze, and act on disparate data, something that is critical in new areas of life sciences research. “If we are going to do translational medicine, we need to build bridges,” says Neumann. “If you look at subjects in clinical trials and patients in therapy, these are two different business models. But basically, they’re the same thing: [people and their reaction to drugs].”
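
To illustrate why disparate sources start to look alike, here is a rough sketch in which clinical trial data and therapy data are both held as subject-predicate-object tuples and combined with a simple set union. All identifiers and values are hypothetical, and the sketch assumes both sources already agree on identifiers such as `ex:drug/A`, which is precisely the hard part in practice.

```python
# Hypothetical triples from two separately maintained sources.
clinical_trial = [
    ("ex:person/001", "ex:received", "ex:drug/A"),
    ("ex:person/001", "ex:showed_response", "ex:response/rash"),
]
therapy_records = [
    ("ex:person/002", "ex:received", "ex:drug/A"),
    ("ex:person/002", "ex:showed_response", "ex:response/none"),
]


def merge(*sources):
    """Union the triples from every source into one set of statements."""
    merged = set()
    for source in sources:
        merged.update(source)
    return merged


def people_given(triples, drug):
    """List every subject recorded as having received the given drug."""
    return sorted(s for s, p, o in triples if p == "ex:received" and o == drug)


combined = merge(clinical_trial, therapy_records)
recipients = people_given(combined, "ex:drug/A")
```

Because both sources use the same statement shape, one query covers trial subjects and therapy patients without a separate integration layer for each.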

Web Warriors
Neumann, Quan, Stephens, and a handful of other forward thinkers are leading the charge on the Semantic Web for the life sciences. Under the auspices of the W3C’s Semantic Web for life sciences group, they have developed BioDash, a Semantic Web prototype of a drug development dashboard that associates disease, drug progression stages, molecular biology, and pathway knowledge.

BioDash illustrates the power of the Semantic Web both as an aggregator and as much more. Using BioDash, a researcher can quickly change the information presented based on interest. One view gives all the information collected about the target (say, the enzyme glycogen synthase kinase 3β). With a simple click of a link within the dashboard, the view changes to explore the relationships between various chemical entities and the target.

Neumann notes that one of the most powerful features of the Semantic Web is the ability to write and carry out complex rules with very little programming effort. For example, in BioDash, Neumann can drag and drop one view (a pathway network) onto a relationships view.

With such eye-catching features, appreciation is growing that the Semantic Web offers much more value than a simple data aggregation technology. Increasingly, life scientists are looking at the Semantic Web as an underlying technology to help in decision-making processes.

For example, at this year’s Bio•IT World Conference + Expo, Tonya Hongsermeier, corporate manager of clinical knowledge management and decision support at Harvard Partners, demonstrated a semantics-based knowledge management approach for translational medicine. The system uses RDF, OWL, and SWRL to inspect patients’ medical and family history, combine this with medical and clinical protocols, and then create rules for patient selection and treatment.
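
The general idea of deriving selection rules from patient facts can be sketched as follows. This is not the system demonstrated at the conference and it does not use SWRL syntax: the facts, the predicate names, and the single rule are all hypothetical, and the loop is a deliberately naive form of forward chaining over tuples.

```python
# Hypothetical patient facts, invented for illustration only.
facts = {
    ("ex:patient/17", "ex:has_family_history_of", "ex:condition/X"),
    ("ex:patient/17", "ex:has_diagnosis", "ex:condition/Y"),
    ("ex:patient/42", "ex:has_diagnosis", "ex:condition/Y"),
}


def candidate_rule(known):
    """Hypothetical rule: diagnosis Y plus family history of X implies candidacy for protocol P."""
    diagnosed = {s for s, p, o in known if p == "ex:has_diagnosis" and o == "ex:condition/Y"}
    history = {s for s, p, o in known if p == "ex:has_family_history_of" and o == "ex:condition/X"}
    return {(s, "ex:candidate_for", "ex:protocol/P") for s in diagnosed & history}


def apply_rules(initial, rules):
    """Apply every rule repeatedly until no new statements are produced."""
    known = set(initial)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


# Only the statements the rules added, not the original facts.
inferred = apply_rules(facts, [candidate_rule]) - facts
```

Here only `ex:patient/17` satisfies both conditions, so the rule adds a single new statement about that patient. Inferred statements have the same shape as the original facts, so they can be queried or fed into further rules.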

The Semantic Web is still at an early stage of deployment. As with the original Web, the usefulness of the Semantic Web will grow as more data and sites support RDF and the other Semantic Web standards.

Berners-Lee believes the success of the Semantic Web hinges on grassroots efforts where individual researchers or departments start small — such as putting collected data into RDF format. The payback may not be immediately apparent, however. “If you’re looking for an ROI,” Berners-Lee warns, “it’s difficult to say what’s the Semantic Web going to do for me in 18 months. We’ve got answers for that, but what gets people going is when they realize: If I did this in the next 18 months, and so did a bunch of other people, then look at what would happen, because my data would start connecting to other people’s data.”

Still, many early adopters have a mixed view of the Semantic Web. For example, Anastasia Christianson, director of Discovery Medicine Informatics at AstraZeneca, says that despite some hype about the Semantic Web, “We are looking into it in small areas: it is difficult to do, it is not easy to implement, but it looks like it will be valuable.”

But Rainer Fuchs, Biogen Idec’s vice president of Research Informatics and Operations, raises the thorny issue of standards. “There’s something really attractive and appealing about [Semantic Web technology], but at the same time, it doesn’t solve the underlying problem... We still have to agree on standards. The informatics industry has suffered for 15 years for its inability to agree on any standards. If we don’t agree on standards, the technology doesn’t matter.”

“What we don’t have is a way to systematically describe what the scientists are actually asking,” Fuchs adds. “There are so many vendors out there building metadata repositories, ontologies, you name it, but there’s no one out there who tries to systematically capture what the user is interested in a way that can be computed on.” (Fuchs’ and Christianson’s comments came at the Bio•IT World/CHI Bridging Discovery and IT conference in late September.)

At this stage, it is debatable who has more to gain: do life scientists need the Semantic Web to solve their data management problems, or does the long-term development of the Semantic Web depend on buy-in from life scientists? Berners-Lee isn’t trying to convert everyone. As he told his Bio•IT World keynote audience: “Frankly, of the hundreds of people here, if 20 people here ‘get it,’ and go away into their respective companies as champions, and explain what it means, read papers and write papers, then it will continue to grow at the exponential rate it seems to be now.”

Rather than spending years talking about top-down IT architectures, Berners-Lee’s challenge to the life science community is simple, direct, and doable: “Don’t tell your boss, just get started.”

Photo by Kathleen Dooher

