By Mark D. Uehling
May 13, 2003 | What if you could click on genes and proteins like hyperlinks, and they exploded into starbursts and branching family trees, leading to more genes and proteins? That's the promise of software from Anacubis, a new Web-based visualization engine using Java and XML to connect disparate elements.
Anacubis wants to move into the life sciences. Software from its parent company, i2 Inc. of Springfield, Va., not to be confused with supply chain software company i2 Technologies in Dallas, is used by intelligence and law enforcement agencies to find unknown links in jumbled, heterogeneous datasets. Anacubis won't even whisper initials like CIA and FBI, but it sounds like analysts at those agencies would be familiar with i2 products.
"We dominate that market, quite frankly," says Greg Coyle, vice president of business development at the company. "We want to deliver the same technology we've made for law enforcement for commercial users." The Anacubis viewer for Web browsers is free, but accessing proprietary research or patent data is usually by subscription.
Find Protein or Osama
Coyle says that both Osama bin Laden and the snipers in the Washington, D.C. area were tracked with tools from i2, which also sells its wares to Fortune 500 companies that need to monitor competitors, patents, and news feeds. For its part, Anacubis has inked early deals to use its link-generating software to connect selected databases at Google, Lexis-Nexis, and Questel-Orbit, a provider of patent information.
Such gargantuan collections of facts, Coyle points out, are so large they defy manual analysis. "You never get a big picture of the data," he says. "You can't see all the relationships within it. The software exposes data the user might not have seen before."
In Coyle's demo of his software, he clicks on a hypothetical trove of information about a criminal case, with the suspect's cell phone numbers, mortgages, and other data all appearing as discrete icons on screen. It's impressive: a few clicks of the mouse lead to a new suspect. "Visualization is only part of our focus," says Coyle. "What do you do after you have got that visualization?"
Are My Data Structured?
Such analysis is only possible, of course, because the underlying databases have been groomed. That can mean simple XML tagging, or more elaborate formatting. "We don't get involved in data categorization," says Coyle. Categorization is the province of firms such as Entrieva (see sidebar, page XX), Inxight, Autonomy, and Clear Forest, all of which help companies organize their data. "Most business data is now structured," says Coyle.
How new is what Anacubis is doing? Georges Grinstein, founder and chairman of the scientific advisory committee of AnVil and co-director of the Institute for Visualization and Perception Research, says merely presenting large data sets using starbursts is not earth-shattering. "What's going to make the difference," he says, "are the tools that are going to be available to interact and query the data. Does the system support the next step the user wants to do?"
Grinstein notes Anacubis has some similarities to NetMap, an Australian application that has its own following in the world of law enforcement -- and a few life science customers. For instance, after seven backpackers were murdered in Australia in the early 1990s, NetMap software combed through vehicle records, gym memberships, gun permits, and police data. The list of suspects started at 18 million individuals.
Thanks to NetMap software, that number was trimmed first to 230 individuals, and finally to just 32, which included the killer, who was apprehended and convicted. Tools like Anacubis and NetMap could mean it's time to rethink one of the favorite refrains in the life sciences: that the winnowing process to find new drugs is more arduous than any other informational sifting.
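One way to picture that kind of winnowing is as repeated set intersection: each data source contributes the set of people it matches, and only those who appear in every set remain. The sketch below is purely illustrative and is not NetMap's or Anacubis's actual method. The function name `winnow`, the ID numbers, and the three record sets are all hypothetical.

```python
def winnow(candidates, filters):
    """Narrow a candidate pool by intersecting it with each record set in turn.

    Returns the surviving candidates and the pool size after every step.
    """
    remaining = set(candidates)
    history = [len(remaining)]
    for matches in filters:
        # Keep only candidates who also appear in this data source.
        remaining &= set(matches)
        history.append(len(remaining))
    return remaining, history


# Hypothetical toy data: person IDs matched by each source.
population = range(10)
vehicle_records = {1, 2, 3, 4, 5}
gym_memberships = {2, 3, 5, 8}
gun_permits = {3, 5, 9}

suspects, sizes = winnow(
    population, [vehicle_records, gym_memberships, gun_permits]
)
print(sorted(suspects))  # [3, 5]
print(sizes)             # [10, 5, 3, 2]
```

Real investigative tools presumably weigh fuzzy and partial matches rather than requiring exact membership in every set, so this shows only the shape of the idea.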
Sidebar: Giving Form to the Formless
Tom Lewis, president and CEO of a small Virginia company called Entrieva, makes finding needles in hay sound easy. What if you're also looking for grass, wire, thread, hair, and twine?
Entrieva's SemioMap software can crawl through vast reams of data and automatically render them searchable. By putting XML wrappers around key noun phrases, the company's algorithms can index anything. Customers can point the Entrieva algorithms at their own servers full of email, Word documents, spreadsheets, PowerPoint presentations, Lotus Notes files -- or even at the Food and Drug Administration Web site.
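To make the idea of "XML wrappers around key noun phrases" concrete, here is a minimal sketch. It assumes the key phrases are already known and supplied as a list, whereas a product like SemioMap would extract them itself. The function `tag_phrases` and the `<doc>` and `<term>` element names are invented for illustration and are not Entrieva's format.

```python
import re
from xml.sax.saxutils import escape


def tag_phrases(text, phrases, tag="term"):
    """Wrap each occurrence of a known phrase in an XML element.

    The phrase list is supplied by the caller; no linguistic analysis is done.
    """
    if not phrases:
        return "<doc>" + escape(text) + "</doc>"
    # Try longer phrases first so a multi-word phrase wins over a shorter one.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untagged text so the output stays well-formed XML.
        parts.append(escape(text[last:match.start()]))
        parts.append(f"<{tag}>{escape(match.group(0))}</{tag}>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "<doc>" + "".join(parts) + "</doc>"


print(tag_phrases("Asthma drugs & patents", ["asthma", "patents"]))
# <doc><term>Asthma</term> drugs &amp; <term>patents</term></doc>
```

Once phrases are tagged this way, an indexer can treat each tagged element as a searchable key, which is roughly what makes otherwise free-form documents queryable.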
The process creates hierarchical scaffolds of knowledge called taxonomies. A taxonomy of disease could have one branch -- asthma -- or hundreds. It may sound hopelessly abstract, but it isn't. Pfizer, AstraZeneca, GlaxoSmithKline, Hoffmann-La Roche, and other life science companies are using Entrieva's tools to organize their unstructured data. "A lot of large pharmas have been using our technology for years," says Lewis.
As Lewis points out, roughly 80 percent of the data generated in most modern companies is outside databases. Technically, these data are "unstructured." Giving unstructured data form is an emerging art of knowledge management. XML formatting and other meta-tagging can automatically link large quantities of new data to old information, helping companies automatically monitor patents, competitors, or scientific activity in ways that no single individual could.
Entrieva makes Google look like a good tool for elementary school students. Lewis describes a hypothetical client that needs to know about imports of a particular drug, in a particular country, over a certain period of time. Finding that with conventional searching is impossible. But with his software, it's possible to drag and drop query modules for structured data about chemical compounds, nations, and chronology -- and come up with an answer.
The newly structured data can be stored just about anywhere, but data repositories and Oracle databases are the usual choices. With the latest version of the SemioMap program, 6.0, the offerings can be made available via Web services. The explosion of data, Lewis says, means his company and others like it only have to execute in what is clearly a growth market: "It's like death and taxes. The rate of acceleration of information is not going to decrease. It's going to get faster. Our tool helps address that."