Unstructured Data and the Discovery Problem

March 9, 2015

By Bio-IT World Staff  

March 9, 2015 | Tomorrow, Content Analyst Company will announce the general availability of Cerebrant, a SaaS-based discovery platform designed to give subject matter experts in any industry rapid insight into unstructured content.

The company will be demonstrating Cerebrant’s capabilities for the pharmaceutical industry during a Bio-IT World webinar tomorrow, March 10, at 1:00 EDT.

Cerebrant is not intended to be a workflow product, John Felahi, CSO at Content Analyst Company, told Bio-IT World. Instead, it aims to solve a discovery problem by working with unstructured data.

The solution is the first commercial tool built on the company’s CAAT text analytics engine, which has been used to process billions of text items in the US Intelligence Community and is embedded in dozens of eDiscovery software products.

To hear Felahi tell it, Cerebrant is a Carrie Mathison-level analyst, but without the authority issues. Users can select data from public data sets and upload their own data as zip files to their SaaS workspace. Within hours, Cerebrant is parsing the information, learning internal acronyms, and seeing connections that no one else has made.

The company is not new—Content Analyst Company celebrated its 10th anniversary last December—but while past CAAT deployments have been within the firewalls of other products, Cerebrant is available through a browser-based interface requiring little or no IT support.

Other unstructured search products rely on predetermined lists of terms, Felahi explained. Cerebrant responds to how terms are used over time, understanding the “voice of the author,” he said, so nothing is missed.

To do this, Cerebrant uses Content Analyst Company’s proprietary Latent Semantic Indexing (LSI)-based learning engine. Users paste in a selection of text, anything from a short phrase to an entire document, and Cerebrant identifies and ranks the most conceptually related documents, articles, and terms across the selected content sets, which range from tens of thousands to millions of text items.
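For readers unfamiliar with LSI, the general idea of ranking documents by conceptual similarity to pasted text can be sketched in a few lines of Python. This is textbook LSI (a truncated singular value decomposition of a term-document matrix), not Content Analyst Company's proprietary CAAT engine, whose internals are not described here. The four-document corpus, the function names, and the choice of two latent dimensions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus standing in for a selected content set.
DOCS = [
    "kinase inhibitor reduces tumor growth in mice",
    "novel kinase inhibitor slows tumor growth",
    "fda issues draft guidance on clinical trial design",
    "fda releases guidance on trial design",
]


def tokenize(text):
    """Lowercase whitespace tokenizer; real systems do far more."""
    return text.lower().split()


def build_term_document_matrix(docs):
    """Return a (terms x documents) raw count matrix and a term -> row index map."""
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    index = {term: row for row, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(docs):
        for term in tokenize(doc):
            matrix[index[term], col] += 1.0
    return matrix, index


def fit_lsi(docs, k):
    """Truncate the SVD to k latent dimensions and project the documents."""
    matrix, index = build_term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    U_k = U[:, :k]
    # Document j in latent space is U_k^T a_j, which equals s_k * Vt_k[:, j].
    doc_vecs = (s[:k, None] * Vt[:k, :]).T
    return index, U_k, doc_vecs


def rank_documents(query, index, U_k, doc_vecs):
    """Fold the query into the latent space and rank documents by cosine similarity."""
    q = np.zeros(U_k.shape[0])
    for term in tokenize(query):
        if term in index:  # terms outside the vocabulary are ignored
            q[index[term]] += 1.0
    q_hat = U_k.T @ q
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat) + 1e-12
    scores = doc_vecs @ q_hat / denom
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


index, U_k, doc_vecs = fit_lsi(DOCS, k=2)
for doc_id, score in rank_documents("mice", index, U_k, doc_vecs):
    print(f"{score:.2f}  {DOCS[doc_id]}")
```

In this toy run, the second document scores highly for the query "mice" even though it never contains that word, because it shares vocabulary with the one document that does. That is the sense in which LSI matches on concepts rather than on a predetermined list of terms. A production system would add term weighting, far larger vocabularies, and many more latent dimensions.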

Workspaces can be easily shared to facilitate collaboration, and can be deleted once the dataset has been analyzed.

The pricing is meant to facilitate discovery, Felahi said. Cerebrant starts at $5,000 a month for ten users. A pharma version, preloaded with datasets including PubMed Central, FDA guidances and drafts, and pharma industry news, is $7,500 a month for a limited time.