
The Language of Text Mining

By Kevin Davies

Dec. 17, 2007 | The fifth Fraunhofer Symposium on text mining in the life sciences, held last September, was supported by TEMIS, a French text mining company with offices in Paris and Grenoble, in Germany, and in Philadelphia.

Last year, TEMIS released a new text mining platform called Luxid, replacing Insight Discoverer Suite. TEMIS subsequently released Luxid for Life Sciences, which is used in three major areas: accelerating drug discovery; analyzing and monitoring company IP assets (especially patents); and reducing adverse event-related risks.

"Luxid for Life Sciences integrates all our knowledge and experience in the field of life sciences. It integrates most of our experience through cooperation with larger pharma organizations and institutes. The key feature is basically the combination of three domains of knowledge - chemistry, biology, and medicine," says Charles Huot, TEMIS' cofounder and COO.

TEMIS' list of pharma customers, which has grown steadily over the years, includes Sanofi-Aventis, Roche, Novartis, and Pfizer. The company recently announced a large deal with Bayer HealthCare to support drug discovery efforts. "The objective for all these projects is really to help scientists read the scientific literature faster, whether it is Medline abstracts or full-text. The goal is always the same - the need to accelerate the discovery process," says Huot.

A major application of text mining is to understand the druggability of a particular target. "The idea is to look at combination between the chemistry model and the biological model," says Huot. "To do that, you need to discover and understand both the chemical compounds within documents, whether text or chemical formulae, and identify the associated gene or protein." Extracting information on both chemicals and biological products "needs two features that are hard to obtain together," says Huot.

Another key feature is normalization, says Huot, which maps the different names used for the same drug onto a single term so that everyone knows which drug is meant. Luxid also allows investigators to "draw the chemical compound and search with the drawing." Huot claims that this dual search system, combining text and chemical structure, is unique.

One area where TEMIS has had particular success is adverse event detection. Huot points to a major U.S. pharma company that was required by the FDA to comply with Sarbanes-Oxley by managing information received in unsolicited emails. "If you have a patient who sent an email to the website of a pharma company, they have to read this email - they can't just throw it away. They have to see whether you might find a potential adverse event," Huot explains.

Such unsolicited emails might say: 'I'm taking this particular drug,' 'I feel sick, it seems to be worse than before,' or 'I get vertigo when I stand up.' "This email must be identified as a potential adverse event. It should be posted into a special FDA form and reported to the FDA in a very short time," says Huot.

Many pharma companies thus employ "an army of people" to read those emails. If anything strange is detected, "you have to report this information to another level," says Huot. "We are working on a system that replaces the human being to detect the potential association by reading the email. This is why we are growing our unit in Philadelphia," says Huot. "We put the system in production May 2007. It's a savings of hundreds of thousands of dollars per year" for the pharma client.

Although Luxid for Life Sciences already encapsulates knowledge from chemists, biologists, and physicians, Huot acknowledges the continuing importance of input from domain experts. The new Luxid release in early 2008 will feature social tagging - "the ability for Luxid to learn from its users," says Huot. "All automatic systems make errors - it's not a problem so long as you can learn from those errors and modify dynamically, and provide to your end user the correction very quickly. Luxid will integrate this ability to quickly integrate the knowledge from hundreds of scientists directly into the system."
