July | August 2006
The explosive growth of digital information is increasingly impeding knowledge-worker productivity due to information overload. Online information is virtually doubling every year, and most of that information is unstructured, typically in the form of text.
Traditional search engines are unable to sustain the pace of information growth primarily because they lack the intelligence to “understand,” semantically process, mine, infer, connect, and contextually interpret information to transform it to — and expose it as — knowledge. Furthermore, end-users want a simple yet powerful user-interface that allows them to flexibly express their context and intent and be able to “ask” natural questions on the one hand, but which also has the power to guide them to answers for questions they wouldn’t know to ask in the first place. Today’s search interfaces, while easy to use, do not provide such power and flexibility.
The Semantic Web addresses the need to understand and contextually interpret information, by marking up documents and providing metadata for the Web. Technologies such as RDF and OWL provide common structures that support exchange of data using XML. In contrast to the Semantic Web, however, there are search technologies that do not require any metadata to understand the documents and therefore require no markup of legacy information.
Knowledge has multiple axes, of which search is only one. Knowledge-workers wish to discover information they might not know they need ahead of time, share information with others, and have information presented in a way that is contextual, intuitive, and dynamic — allowing for further exploration and navigation based on their context. Even within the search axis, there are multiple sub-axes, for instance, based on time-sensitivity, semantic-sensitivity, popularity, quality, brand, trust, etc. The ultimate goal of an information-retrieval system should be to blend multiple axes for retrieval, capture, discovery, annotations, and presentation into a unified medium that is powerful yet easy to use.
In general, an intelligent information retrieval system should allow users to find knowledge, rather than merely information. Knowledge, in this context, is information infused with semantic meaning and exposed in a manner that is useful to people along with the rules, purposes, and contexts of its use. Knowledge requires the presence of context, semantics, and purpose. Today’s search engines have none of these elements and are fundamentally ill equipped to deal with the problem of information overload.
The Problem with Keywords
To mimic the intelligent behavior exhibited by a human researcher or librarian, an intelligent information retrieval system must first be able to “understand” what it stores and indexes. Today’s search engines cannot discern between keywords when those keywords are used in different contexts. For instance, the word “bank” can mean very different things — a commercial bank, riverbank, or “the sudden bank of an airplane.” This shortfall obliges users to manually filter out thousands of irrelevant results that have the right keywords but in the wrong context (false positives).
An intelligent information retrieval system must be able to retrieve information that doesn’t have the user’s expressed keywords but which is semantically relevant to those keywords. This would address the false negatives problem — wherein search engines leave out results that they deem irrelevant only because the results don’t contain the “right” keywords. For instance, the word “bank” and the phrase “financial institution” are semantically very similar in the domain of financial services. An intelligent retrieval system should recognize this and return the right results with either set of keywords.
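The two failure modes described above can be illustrated with a toy sketch. Everything in it is a hypothetical assumption made for illustration: the sense table, the cue words, the three sample documents, and the function names are not drawn from any real product, and a simple cue-word overlap stands in for whatever semantic processing a real system would use.

```python
import re

# Hypothetical sense inventory: each sense lists the phrases that can
# express it and a few context cue words. All entries are illustrative.
SENSES = {
    "financial_institution": {
        "phrases": {"bank", "financial institution"},
        "cues": {"loan", "deposit", "rates", "lending", "account", "transfers"},
    },
    "riverbank": {
        "phrases": {"bank", "riverbank"},
        "cues": {"river", "water", "fishing", "shore"},
    },
    "aircraft_bank": {
        "phrases": {"bank"},
        "cues": {"airplane", "pilot", "turn", "wing"},
    },
}

# Hypothetical sample documents.
DOCUMENTS = {
    "d1": "The bank raised its lending rates.",
    "d2": "A financial institution must report large transfers.",
    "d3": "The pilot put the airplane into a sudden bank.",
}


def contains_phrase(phrase, text):
    """True if `phrase` occurs in `text` as whole words (case-insensitive)."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text.lower()) is not None


def keyword_search(query, docs):
    """Plain keyword matching: ids of documents containing the query phrase."""
    return sorted(d for d, text in docs.items() if contains_phrase(query, text))


def best_sense(text):
    """Pick the sense whose cue words overlap most with the document.

    Only senses with a matching phrase in the text are considered.
    Ties go to the sense listed first in SENSES.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for sense, entry in SENSES.items():
        if any(contains_phrase(p, text) for p in entry["phrases"]):
            scores[sense] = len(words & entry["cues"])
    if not scores:
        return None
    return max(scores, key=scores.get)


def concept_search(sense, docs):
    """Concept matching: ids of documents whose best sense equals `sense`."""
    return sorted(d for d, text in docs.items() if best_sense(text) == sense)


print(keyword_search("bank", DOCUMENTS))
print(concept_search("financial_institution", DOCUMENTS))
```

With these sample documents, the keyword search for "bank" returns d1 and d3: it includes the aviation sentence (a false positive) and misses the "financial institution" sentence (a false negative). The concept search for the financial sense returns d1 and d2 instead.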
Evaluating Knowledge Discovery Solutions
Each company has different constituencies, each with unique requirements when evaluating knowledge discovery solutions. Knowledge workers and researchers should review a solution's ability to discover information across silos, its use of wildcards to simplify queries, and its semantic ranking of results to quickly indicate relevance and precision.
IT staff should evaluate how easily and quickly the solution integrates with existing systems, its scalability and performance, its cost-effectiveness, and the projected ROI in the context of increasing each knowledge worker's productivity.
Meanwhile, business decision makers will examine how a solution leverages the company's current information and knowledge assets and how it increases the return on the information and data the company acquires or creates. The result should be high-level insights across silos to support better decision making earlier in the discovery process.
Beyond standard search, technology exists to bring knowledge discovery to the desktops of life science workers. Understanding the difference between standard search retrieval and true knowledge discovery is an important step in helping knowledge workers gain insight and grapple with larger universal health and medical issues.
Nosa Omoigui is the CEO of Nervana.