Oct. 16, 2006 | At the Windber Institute in Windber, PA, Michael Liebman uses software from Chicago-based SPSS to scan about 250,000 journal pages per hour. Liebman's group is building a system to process cancer literature as a prototype for a more generalized system, scanning approximately 40 journals and "using the full-text articles and semantic analysis of the text without using a biased thesaurus such as the [PubMed] MESH headings," says Liebman. He is interested both in more fundamental research into knowledge extraction and identification and in its application to his group's research.
Liebman began his group's text-mining efforts using SPSS's Lexiquest Mine tool, now part of Clementine. "It's been very effective," says Liebman, "but we continue to evaluate new tools and processes and will be examining Linguamatics shortly, through our relationship with Inforsense. We are continuously evaluating different text mining approaches and tools ... This may become a spin-out activity of the institute in a more commercial endeavor."
SPSS provides predictive analytics software, not purely text mining solutions, says Eric Martin, SPSS's director of text mining product marketing. "Many customers in life sciences want to better understand huge amounts of data and predict future events including both structured and unstructured information like drug toxicity and protein associations," says Martin. "We use NLP to understand all the sentences and extract the key events. The tool is shipped with a gene ontology dictionary, MESH terms, and predefined patterns to analyze protein interactions and cellular localization."
In addition to combining data and text mining, SPSS's software is grid-computing compliant. "A huge pharma customer in the UK has one of the largest text-mining applications in life sciences," says Martin. "Text mining is applied to grid computing to mine Medline in a few hours." SPSS has partnered with United Devices so its text-mining tool can be deployed on the grid. -- K.D.