
Spying Content with Document Lens

Praxeon’s new Cloud-based ‘efficiency tool’ extracts themes from scientific documents.

By Kevin Davies

June 8, 2011 | “Praxeon is based on the idea that true knowledge is embedded in textual content. For the past 20 years, I was dealing mainly with structured data—complex connections between chemistry, biology, and genetics. Textual information is the last shore. If you can integrate that along with the structured data, search it, get to the pieces of information you’re really interested in, that’s the Holy Grail. That’s why Praxeon was founded.”

That’s the bold mission statement of Dennis Underwood, the Australian co-founder of Praxeon, based in Cambridge, Mass. Underwood previously served as VP of computational sciences at Infinity Pharmaceuticals, where he succeeded Andy Palmer (see “Conquering Infinity with Chemical Genetics,” Bio•IT World, Feb. 2003), and developed the Infinity informatics platform. He also had stints at DuPont Pharmaceuticals, working in discovery informatics, and Merck. His co-founder at Praxeon is his former informatics colleague at Infinity, Kevin Gilpin.

“All of us are being drowned in data and information,” says Underwood. “The real problem is getting to the information that’s relevant to you in your context. It’s true of everybody... But in pharma, there’s a need to quickly spot patterns and extract usable information from the flood of data coming in. That’s extremely hard and getting harder.”

The solution, as Underwood sees it, is a new Praxeon offering called DocumentLens. Researchers spend tremendous amounts of time searching for information, scanning articles, and trying to integrate and contextualize that information with other data sources. That often unproductive cycle is repeated many times for each topic or project. Meanwhile, keeping abreast of new information from PubMed, patent documents, Google Scholar, and third-party proprietary sources is increasingly difficult. Underwood points out that PubMed houses more than 20 million articles.

“If you have 3-4 topics that you need to follow and understand, it’s very, very tough,” says Underwood. “The net result is knowledge gaps and missed opportunities. But it doesn’t have to be this way.”

DocumentLens is a web-based “efficiency tool” that gives users direct, immediate access to the ideas and content in a large collection of assorted documents, all centered on a topic of the user’s choosing, such as a disease or a gene family. It relies on “semantic fingerprinting,” but that is just one component of the tool, and not one the Praxeon team cares to dwell on. “Mention the term ‘semantics’ and people’s eyes just glaze over. They don’t want to know how it works, but rather want to know that it works,” says head of business development Bill Hayden.

Drag and Drop

The process begins with a web-driven folder that sits on the user’s desktop. “It’s just like any other folder,” says Underwood. “You dump content—articles, PDFs etc.—in there. We digest it, create a model of what the content is, and then we provide information navigators allowing you to search across biology or chemistry or clinical trials data. The Holy Grail is to be able to make connections between bits of information you’ve gleaned from various documents and draw inference, the basis of new ideas and understanding. You can create a theme or story line across documents in a collection that describes the innovations that have occurred and... opportunity for innovation going forward.” (See “Eye of the Lens”)

All of Praxeon’s tools are built within a cloud environment, which the company has been using for some five years. In addition to DocumentLens for the life sciences, Praxeon has built a search product for health care professionals, organized in a physician-friendly way, available at www.Curbside.MD. Another feed, for patients, supplies information from news sources including mainstream media channels.

One of the key features of DocumentLens is that it handles full-text content, not merely abstracts. “We needed full-text analysis for the research community, to drill down into the article and look for the components for the arguments that build understanding that leads to the necessary conclusions. Then we stitch it all together,” says Underwood. The software can also combine text with molecular structure data, chemistry, biology, and clinical trial data to help scientists, and even non-scientists, reach a rational conclusion.

Hayden says Praxeon already has one leading biotech firm using DocumentLens. The reaction has been very positive, particularly to the ability to ask any question of the collection, just as users would ask a colleague. Users also applaud the ability to collaborate with colleagues one-on-one, in groups, and worldwide. Results are returned not just by article but by the page where the keyword is mentioned. “Colleagues can look at each other’s notes in a concept we call the ‘storyline.’” The storyline is a series of annotations on a chronological listing of papers; any colleague can annotate or comment on specific papers, and those comments are then displayed on the storyline. “We build a chronological map of all the annotations on these articles,” says Underwood.
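At its simplest, the storyline described above amounts to grouping annotations by paper and ordering the papers by date. The Python sketch below illustrates only that idea. Praxeon has not published how DocumentLens stores annotations, so the `Paper` and `Annotation` classes, the `build_storyline` function, and the sample titles are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Paper:
    title: str
    published: date


@dataclass(frozen=True)
class Annotation:
    paper: Paper
    author: str
    note: str


def build_storyline(annotations):
    """Group annotations by paper, then order papers oldest first (hypothetical model)."""
    by_paper = defaultdict(list)
    for ann in annotations:
        by_paper[ann.paper].append(ann)  # keeps each paper's notes in arrival order
    ordered = sorted(by_paper, key=lambda p: (p.published, p.title))
    return [(paper, by_paper[paper]) for paper in ordered]


if __name__ == "__main__":
    review = Paper("Amyloid imaging review", date(2009, 3, 1))
    study = Paper("Tau biomarker study", date(2010, 11, 15))
    notes = [
        Annotation(study, "alice", "Cohort is small"),
        Annotation(review, "bob", "Good background on PET tracers"),
        Annotation(study, "bob", "Compare with the 2009 review"),
    ]
    for paper, anns in build_storyline(notes):
        print(paper.published, paper.title)
        for ann in anns:
            print("   ", ann.author + ":", ann.note)
```

Sorting on the publication date, with the title as a tie-breaker, keeps the timeline stable when two papers share a date.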

Although DocumentLens was initially used by individual scientists and small research groups, it also functions as an enterprise program, helping users rapidly build an understanding of a research area and delivering that value back to large, multifunctional, international enterprises. It could also prove useful for large pharma companies seeking to mine the vast collections of buried documents and investigator document libraries left in the wake of mergers and layoffs.


Eye of the Lens

Working with DocumentLens is straightforward. In a demonstration, Underwood quickly installs a dedicated DocumentLens folder on a desktop and drags in any desired material: PDFs, Word documents, and other files. Powered by WebDrive, the folder automatically syncs with the cloud (Amazon EC2). Within minutes, the articles are securely uploaded to the cloud and citations are generated automatically.

As a demo, Praxeon maintains a dedicated collection on dementia for any new clients to explore, including more than 300 full-text research documents (about 3,000 pages of text), peer-reviewed papers and patent information. “Any time I add a new document to the folder, it automatically gets synced with the cloud instance and gets digested,” says Underwood. At the same time, each document in the collection is monitored for relevant new publications in PubMed. “If anything goes in that’s similar, then you’re notified,” says Underwood.
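Praxeon does not say how DocumentLens decides that a new PubMed record is “similar” to a document in the collection. As a rough illustration only, the sketch below assumes a simple bag-of-words term vector (a crude stand-in for semantic fingerprinting, not Praxeon’s method) and flags any new abstract whose cosine similarity to a collection document meets a threshold. The function names, the stop-word list, and the 0.3 threshold are all assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is"}


def term_vector(text):
    """Lower-cased word counts minus stop words; a crude stand-in for a fingerprint."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_alerts(collection, new_abstracts, threshold=0.3):
    """Yield (doc_id, new_id, score) for every new abstract similar to a collection document."""
    vectors = {doc_id: term_vector(text) for doc_id, text in collection.items()}
    for new_id, abstract in new_abstracts.items():
        new_vec = term_vector(abstract)
        for doc_id, doc_vec in vectors.items():
            score = cosine(doc_vec, new_vec)
            if score >= threshold:
                yield doc_id, new_id, score


if __name__ == "__main__":
    collection = {"d1": "Amyloid beta plaques in Alzheimer disease",
                  "d2": "Insulin signaling in the liver"}
    new = {"pm1": "Amyloid plaques and Alzheimer disease progression"}
    for doc_id, new_id, score in find_alerts(collection, new):
        print(f"{new_id} looks similar to {doc_id} (score {score:.2f})")
```

Computing the collection vectors once and reusing them for every incoming abstract keeps the check cheap as new records arrive.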

A dashboard provides an overview of the collection’s content. The software recognizes key terms in the documents and, using what Praxeon calls semantic fingerprinting, interprets them; chemical names are translated into full structures.

To mine the information, users can simply type a question the way they might ask a senior colleague, or drag and drop a paragraph from an article they have come across that captures the type of information they are seeking. K.D.



