
Get Serious About Information Mining

By Salvatore Salamone

May 12, 2006 | Life science companies have long faced the problem of information silos. In a session on text, data mining, and the Web at last month’s Bio-IT World Expo, industry experts talked about some of the challenges and solutions that might help break down silos.

TRACK: IT/Informatics Solutions for Drug Discovery

Jerome Pesenti, chief scientist and co-founder of Vivisimo, noted that one common way companies try to cut across different databases is to run a federated search that encompasses all the information. But according to Pesenti, this approach exacerbates the problem of finding or extracting meaningful information from the collection of data.

“People often talk about information overload,” said Pesenti. “But it’s more overlook than overload.” Doing a search across multiple databases simply returns more results, and researchers may only look at the first few pages of returned results.

Pesenti’s point is that a straight Google-like search, while helpful in many situations, may overwhelm a researcher when hundreds or thousands of hits are returned. The technique Vivisimo has developed creates topic folders on the fly and groups result links into those folders by topic, helping researchers navigate results more efficiently.
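The idea of grouping results into on-the-fly topic folders can be sketched in a few lines. This is a deliberately naive keyword-matching sketch, not Vivisimo’s actual clustering algorithm; the result data and topic list are illustrative.

```python
from collections import defaultdict

# Toy search results as (title, snippet) pairs -- purely illustrative data.
RESULTS = [
    ("BRCA1 mutations in breast cancer", "gene mutation study"),
    ("BRCA1 protein structure", "protein folding analysis"),
    ("Clinical trial results for imatinib", "phase III trial data"),
    ("Imatinib dosing guidelines", "clinical dosing trial"),
]

def cluster_by_keyword(results, topics):
    """Group result links into on-the-fly topic folders by keyword match."""
    folders = defaultdict(list)
    for title, snippet in results:
        text = (title + " " + snippet).lower()
        matched = [t for t in topics if t in text]
        # A result can land in several folders; unmatched ones go to "other".
        for t in matched or ["other"]:
            folders[t].append(title)
    return dict(folders)

folders = cluster_by_keyword(RESULTS, ["brca1", "imatinib", "trial"])
```

A production system would cluster on extracted phrases rather than a fixed topic list, but the user-facing effect is the same: a long, flat result list becomes a handful of labeled folders.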

Along somewhat different lines, Manuel Peitsch, head of informatics and knowledge management at Novartis Pharmaceuticals, spoke about Novartis’ work on an expert system for contextual hyperlinking.

“The idea is to navigate at the conceptual level,” said Peitsch. “You want to be able to connect knowledge bodies [such as] literature, modeling, microarrays, assays, bioinformatics, etc.”

Many companies use simple text-mining techniques to try to find information. But Peitsch noted one problem with text mining is “resolution of ambiguities.” For instance, some terms can be the name of a company, the name of a disease, and the name of a drug target, and a knowledge management system would need to know the context to distinguish one from the other.
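One simple way to resolve such ambiguities is to score the words surrounding a term against cue lists for each candidate meaning. The cue sets below are hypothetical; a real text-mining system would rely on trained models and curated dictionaries rather than hard-coded word lists.

```python
# Hypothetical context cues for each candidate sense of an ambiguous term.
CONTEXT_CUES = {
    "disease": {"patients", "diagnosed", "symptoms", "treatment"},
    "company": {"shares", "acquired", "headquarters", "revenue"},
    "target": {"inhibitor", "binding", "expression", "pathway"},
}

def disambiguate(term, sentence):
    """Guess an ambiguous term's type from the words appearing around it."""
    words = set(sentence.lower().split())
    scores = {cat: len(words & cues) for cat, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

For example, `disambiguate("CML", "patients diagnosed with CML respond to treatment")` would classify the term as a disease name, since the surrounding words match the disease cue set.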

To help its researchers, Novartis developed an approach to knowledge management that intelligently integrates information. One Novartis solution that company officials have been talking about is Knowledge Space and the use of metadata and “knowledge maps” to describe the collection of information in terms of content and the location of the content. Novartis has spent a great deal of time putting data and libraries into suitable format for use in its Knowledge Space approach.

Peitsch spoke about taking advantage of all of this work using what Novartis calls an UltraLink, essentially an intelligent, context-sensitive hyperlink. Because it knows the context of a link, UltraLink presents an appropriate menu of choices to the researcher. An example used in past presentations by Peitsch showed an article with a hyperlink for the phrase “chronic myeloid leukemia.” Clicking on that link, a researcher would be given the option to retrieve information on all known launched products for this disease or a list of any internal or competitive products in development. Other menu items allow a researcher to initiate actions such as performing a search in Ensembl. If a gene appears in the text of an article, one UltraLink menu option might be to launch a BLAST run.
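The context-sensitive menu behavior described above can be sketched as a simple lookup from entity type to actions. The mapping and names here are assumptions for illustration, not Novartis’s actual UltraLink implementation; the menu entries echo the examples Peitsch gave.

```python
# Illustrative mapping from a linked term's entity type to menu actions.
MENU_ACTIONS = {
    "disease": ["List launched products", "List products in development"],
    "gene": ["Search Ensembl", "Launch BLAST run"],
}

def ultralink_menu(entity_type):
    """Return the menu of actions appropriate to a linked term's type."""
    # Fall back to a plain search when the entity type is unrecognized.
    return MENU_ACTIONS.get(entity_type, ["Plain text search"])
```

The design point is that the link itself stays generic; the intelligence lives in the metadata that classifies the term, so adding a new entity type only means adding a new entry to the mapping.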

Wiki World
The Novartis approach is top-down, with well-defined processes for including libraries, integrating information, and developing metadata. Another speaker in the session, Eric Gerritsen, co-founder of BioPeer, discussed bottom-up collaborative knowledge-sharing technologies such as shared wikis, Technorati, and social tagging tools.

Like the Wikipedia online encyclopedia, a shared wiki lets researchers disseminate knowledge about a topic and collaborate on a project. A number of companies, such as JotSpot and SocialText, target businesses with their wiki offerings.

Technorati is a search engine that crawls blogs. Beyond normal content search, Technorati also lets a user search news and blogs via user-generated tags (i.e., tags the author has assigned to the blog’s content). Another tool Gerritsen discussed lets researchers aggregate and tag RSS feeds, URLs, and files in one place. Within a researcher’s own workspace, tags are displayed in what is called a “cloud formation,” where the relative size of a particular tag denotes the number of times it has been used.

When a researcher looks at the tags of all users, the most frequently used tags are displayed. Again, the cloud format is used so that one can quickly get a sense of the topics other people consider to be important. And any tags the researcher has in common with the others appear in red (see figure on page 31).
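The “cloud formation” display boils down to scaling each tag’s font size by its usage count. A minimal sketch of that scaling, with illustrative tags and size bounds chosen here as assumptions:

```python
from collections import Counter

def tag_cloud(tags, min_size=10, max_size=32):
    """Map each tag to a font size proportional to how often it is used."""
    counts = Counter(tags)
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {t: min_size + (c - lo) * (max_size - min_size) // span
            for t, c in counts.items()}

sizes = tag_cloud(["genomics", "genomics", "genomics", "assay",
                   "genomics", "assay", "rnai"])
```

Here the most-used tag (“genomics”) gets the largest size and the least-used (“rnai”) the smallest, so a glance at the cloud conveys which topics dominate. Highlighting shared tags, as described above, would be a separate comparison of one user’s tag set against the aggregate.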

In the short term, such technologies are likely to be adopted primarily by individual researchers, with word of mouth expanding their use to colleagues. Proponents believe that, as with the original Web, the more people use these technologies, the more useful they will become.

