Oct 17, 2005 | “I see a huge amount of energy from people in the life sciences getting excited about the Semantic Web and what it can do to solve the big IT problems... But also the people involved in the Semantic Web pushing it along are also very excited about getting involved in the life sciences — it’s one of those areas that affect humankind, finding drugs, curing AIDS and cancer, etc. There seems to be a huge energy, and lots of practical technical reasons why this area is crying out to be one of the flagship areas that the Semantic Web really takes off...”
— Sir Tim Berners-Lee, Bio•IT World Conference+Expo, May 2005
Anyone who struggles to unite data from disparate resources in their work has heard the story. A vendor or group comes along with a new idea that is going to make information extraction and data access, integration, and sharing easier — much easier.
Over the years, most organizations have experimented with varying approaches ranging from remote procedure calls, to distributed object-oriented data models, to Java-based Web tools and portals to tie together information from numerous sources and present it in a way researchers might actually find useful. Or, they have initiated grand knowledge management projects to search for and extract similar information from disparate data sources.
But alas, most efforts require obscene amounts of programming and often require major changes to — and investments in — IT infrastructures.
Proponents of a new technology called the Semantic Web think they have something that might do the trick. At the simplest level, the Semantic Web puts data into a machine-readable format so that computers can aggregate data and make inferences about relationships between certain types of data. It has applications across many life science arenas including R&D, clinical trials, translational medicine, and personalized medicine (see Semantic Web: Safety and Innovation, page 18).
It sounds too good to be true. So, why would anyone believe yet another proposal that claims to help make it easier to get more information from his or her data?
For starters, the principal evangelist of the Semantic Web has certain credentials to back up his claims. Sir Tim Berners-Lee invented the World Wide Web 15 years ago, while working at CERN. Berners-Lee now heads the World Wide Web Consortium (W3C) at MIT, which is trying to move Semantic Web technology forward.
Berners-Lee sees the Semantic Web as the Web’s next phase. “If I have to define it, which is tricky, then I have to say it’s data integration across application boundaries and organizational boundaries,” he said in his 2005 Bio•IT World Conference + Expo keynote address. “The Semantic Web is about looking at data in a Web-like way, a bottom-up way, it’s not top-down, and quite a lot of the ways of looking at the Semantic Web are different from the ways we’ve done it before. It’s not like object-oriented programming, and it’s not that easy to explain... believe it or not, it wasn’t that easy to explain the World Wide Web 15 years ago.”
One way to illustrate the potential power of using the Semantic Web is to compare it to the current Web. As it exists today, the Web presents information that is easy for people to read. It presents text in natural languages (English, French, Chinese, etc.) and uses graphics, images, and videos. But while a person can process this information, computers cannot.
“Today, all tags and bookmarks are designed for humans to read,” says Eric Neumann, an independent consultant and former head of discovery informatics at Sanofi-Aventis. With the Semantic Web, the information on Web pages and in data repositories is machine-readable. This offers several major advantages, says Neumann. First, similar data and information can be aggregated. Second, you can do machine-readable queries. “You can start hubbing and aggregating data,” says Neumann.
Putting such capabilities to work offers some significant benefits. “The Semantic Web allows you to take better advantage of information on the Web,” says Dennis Quan, a researcher at IBM Thomas J. Watson Research Center.
Whereas Berners-Lee prefers the analogy of a London subway map (see Going Underground, page 31), Quan offers a different example to illustrate the advantages of the Semantic Web: “If you want to see a movie, you might go to a number of Web sites and get several reviews and other information [start times, theater location, etc.]. This is okay if you only go to a movie once in a while. But it is not practical if you do it many times.”
“In bioinformatics, this type of [operation] is done all the time,” Quan continues. “People repeatedly go to a number of sites and download information to conduct their work.” The process is labor intensive — it requires opening a new browser session with each site. Frequently data must be cut, reformatted, and pasted into another application for it to be really useful.
The Semantic Web approach greatly simplifies this process. “With the Web today, if you need data from 10 sites, you need to go to all 10 sites and cut and paste the data to get an integrated view,” says Matthew Shanahan, chief marketing officer at the life science experiment design automation software company Teranode. “Semantic Web pushes this job of assembling data out from the desktop into the network. With the Semantic Web, the network knows how to get and assemble the data.”
To that end, a Semantic Web browser can be configured to go to multiple sites, find the specific information required, retrieve this information, and display it in a single Semantic Web browser. In essence, this application of Semantic Web technology is akin to a next-generation portal.
Such capabilities make the Semantic Web very interesting to life science organizations — or at least some. “The advent of the Semantic Web is providing the life sciences community with the standards and tools needed to build integrative informatics systems,” says John Reynders, information officer, discovery & development informatics, at Eli Lilly. “We are very interested in the [Semantic Web standards] and view them as essential tools in cracking the heterogeneous data integration challenge facing our drug-hunters here at Lilly.”
Those standards Reynders refers to are the heart of the Semantic Web’s potential for improving data accessibility. The W3C has developed standards for:
- Data description and identification: the Resource Description Framework (RDF)
- An ontology language: the Web Ontology Language (OWL)
- A rule language: the Semantic Web Rule Language (SWRL)
Semantic Web uses these standards in conjunction with existing data formatting and tagging standards such as XML and the Life Science Identifier (LSID). The result is a way to describe data and the relationship between various data elements (see The W3C Perspective, page 32).
Among vendors, Oracle has taken a lead in supporting RDF, adding RDF support to its 10g database this summer. “In discovery, RDF helps [researchers] aggregate public data with their own internally generated data,” says Susie Stephens, principal product manager, life sciences, at Oracle. As an application of the technology, she points to Siderean Software, which incorporated RDF into its Seamark Navigation Server and has been demonstrating the application to the life sciences at recent conferences.
A key factor of the Semantic Web is that elements in a data set — a protein, gene, drug, or an author’s name — are uniquely identified and there is some information about the relationship of that element to other elements.
Including information about relationships between elements is what differentiates Semantic Web data from a database to which metadata has simply been added. In the Semantic Web, elements are defined in statements called Semantic Web triplets, which contain a subject, predicate, and object.
This triplet description can be used in many ways. For example, it can identify an element in a biological sense or with respect to its use within a company. One could construct triplets along the lines of “kinase” is a “kind of” “protein” or “kinase” is a “kind of” “drug target.” One might also link a data element to a diagram or model. For instance, “human hemoglobin” “has a 3-D structure” of “this” (where “this” is a 3-D representation in the Protein Data Bank).
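As a rough illustration of the subject, predicate, and object structure described above, the example statements can be written as simple tuples and queried by pattern. This is only a minimal Python sketch, not how any particular RDF store works: the `Triple` type, the `match` helper, the underscore-style identifiers, and the `pdb:EXAMPLE` placeholder are all assumptions made for illustration (real Semantic Web data would use URIs or LSIDs as identifiers).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Statements modeled on the examples in the text. The identifiers are
# plain strings for readability; "pdb:EXAMPLE" is a placeholder, not a
# real Protein Data Bank entry.
TRIPLES = [
    Triple("kinase", "kind_of", "protein"),
    Triple("kinase", "kind_of", "drug_target"),
    Triple("human_hemoglobin", "has_3d_structure", "pdb:EXAMPLE"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match the given fields (None is a wildcard)."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "What is a kinase a kind of?"
kinds_of_kinase = [t.obj for t in match(TRIPLES, subject="kinase", predicate="kind_of")]
print(kinds_of_kinase)
```

The same `match` call works for any kind of statement, whether it describes biology or a company's use of an element.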
With this triplet format, some life science areas that previously were treated as very separate entities, requiring different database technologies and analysis and search tools, start to look similar from a pure data-handling perspective. As such, the Semantic Web offers a way to view, analyze, and act on disparate data, something that is critical in new areas of life sciences research. “If we are going to do translational medicine, we need to build bridges,” says Neumann. “If you look at subjects in clinical trials and patients in therapy, these are two different business models. But basically, they’re the same thing: [people and their reaction to drugs].”
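One way to see why disparate sources start to look alike is that statements from different places can be pooled as soon as they share identifiers. The following Python sketch assumes two hypothetical sources, one public and one internal, whose statements are plain (subject, predicate, object) tuples. Every identifier in it (`target:T1`, `compound:C-001`, `pathway:example`) and the `aggregate` helper are invented for illustration.

```python
from collections import defaultdict

# Two hypothetical sources that describe the same target under a shared
# identifier. All identifiers here are made up for the example.
public_source = {
    ("target:T1", "kind_of", "kinase"),
    ("target:T1", "involved_in", "pathway:example"),
}
internal_source = {
    ("target:T1", "kind_of", "kinase"),  # same statement as the public source
    ("compound:C-001", "inhibits", "target:T1"),
}


def aggregate(*sources):
    """Union the statement sets, then index the statements by subject."""
    merged = set().union(*sources)
    by_subject = defaultdict(set)
    for subject, predicate, obj in merged:
        by_subject[subject].add((predicate, obj))
    return merged, by_subject


merged, by_subject = aggregate(public_source, internal_source)
print(len(merged))                       # duplicates collapse in the union
print(sorted(by_subject["target:T1"]))   # everything known about the target
```

Because both sources use the same identifier for the target, no reformatting or cutting and pasting is needed to combine them, and the duplicate statement is stored only once.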
Neumann, Quan, Stephens, and a handful of other forward thinkers are leading the charge on the Semantic Web for the life sciences. Under the auspices of the W3C’s Semantic Web for life sciences group, they have developed BioDash, a Semantic Web prototype of a drug development dashboard that associates disease, drug progression stages, molecular biology, and pathway knowledge.
BioDash illustrates the power of the Semantic Web as an aggregator and as much more. Using BioDash, a researcher can quickly change the information presented based on interest. One view gives all the information collected about the target (say, the enzyme glycogen synthase kinase 3β). With a simple click of a link within the dashboard, the view can be changed to explore the relationships between various chemical entities and the target.
Neumann notes that one of the most powerful features of the Semantic Web is the ability to write and carry out complex rules with very little programming effort. For example, in BioDash, Neumann can drag and drop one view, a pathway network, onto a relationships view.
With such eye-catching features, appreciation is growing that the Semantic Web offers much more value than a simple data aggregation technology. Increasingly, life scientists are looking at the Semantic Web as an underlying technology to help in decision-making processes.
For example, at this year’s Bio•IT World Conference + Expo, Tonya Hongsermeier, corporate manager of clinical knowledge management and decision support at Harvard Partners, demonstrated a semantics-based knowledge management approach for translational medicine. The system uses RDF, OWL, and SWRL to inspect patients’ medical and family history, combine this with medical and clinical protocols, and then create rules for patient selection and treatment.
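To give a flavor of what a rule over such statements might look like, here is a toy Python sketch. It is not the system Hongsermeier demonstrated and it does not use SWRL. It is a plain Python stand-in for a single if-then rule, and the facts, identifiers (`patient:A`, `condition:X`, `protocol:P`), and the rule itself are all invented for illustration.

```python
# Toy facts as (subject, predicate, object) tuples. Every identifier is
# hypothetical and carries no clinical meaning.
facts = {
    ("patient:A", "has_condition", "condition:X"),
    ("patient:A", "family_history_of", "condition:X"),
    ("patient:B", "has_condition", "condition:X"),
}


def apply_rule(facts):
    """Invented rule: a patient who has a condition AND a family history
    of that same condition is a candidate for protocol:P."""
    inferred = set()
    for subject, predicate, obj in facts:
        if predicate == "has_condition" and (subject, "family_history_of", obj) in facts:
            inferred.add((subject, "candidate_for", "protocol:P"))
    return inferred


new_facts = apply_rule(facts)
print(sorted(new_facts))
```

Only `patient:A` satisfies both conditions, so only that patient gains a new `candidate_for` statement. A rule language would let the same logic be declared as data, without hand-written loops.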
The Semantic Web is still at an early stage of deployment. As with the original Web, the usefulness of a Semantic Web will grow as more data and sites support RDF and the other Semantic Web standards.
Berners-Lee believes the success of the Semantic Web hinges on grassroots efforts where individual researchers or departments start small — such as putting collected data into RDF format. The payback may not be immediately apparent, however. “If you’re looking for an ROI,” Berners-Lee warns, “it’s difficult to say what’s the Semantic Web going to do for me in 18 months. We’ve got answers for that, but what gets people going is when they realize: If I did this in the next 18 months, and so did a bunch of other people, then look at what would happen, because my data would start connecting to other people’s data.”
Still, many early adopters have a mixed view of the Semantic Web. For example, Anastasia Christianson, director of Discovery Medicine Informatics at AstraZeneca, says that despite some hype about the Semantic Web, “We are looking into it in small areas; it is difficult to do, it is not easy to implement, but it looks like it will be valuable.”
But Rainer Fuchs, Biogen Idec’s vice president of Research Informatics and Operations, raises the thorny issue of standards. “There’s something really attractive and appealing about [Semantic Web technology], but at the same time, it doesn’t solve the underlying problem... We still have to agree on standards. The informatics industry has suffered for 15 years for its inability to agree on any standards. If we don’t agree on standards, the technology doesn’t matter.”
“What we don’t have is a way to systematically describe what the scientists are actually asking,” Fuchs adds. “There are so many vendors out there building metadata repositories, ontologies, you name it, but there’s no one out there who tries to systematically capture what the user is interested in a way that can be computed on.” (Fuchs’ and Christianson’s comments came at the Bio•IT World/CHI Bridging Discovery and IT conference in late September.)
At this stage, it is debatable who has more to gain: do life scientists need the Semantic Web to solve their data management problems, or does the long-term development of the Semantic Web depend on buy-in from life scientists? Berners-Lee isn’t trying to convert everyone. As he told his Bio•IT World keynote audience: “Frankly, of the hundreds of people here, if 20 people here ‘get it,’ and go away into their respective companies as champions, and explain what it means, read papers and write papers, then it will continue to grow at the exponential rate it seems to be now.”
Rather than spending years talking about top-down IT architectures, Berners-Lee’s challenge to the life science community is simple, direct, and doable: “Don’t tell your boss, just get started.”
Photo by Kathleen Dooher