Nov. 15, 2006 | In a review published last year, Upinder Bhalla of the Tata Institute in Bangalore argued that modern pathway analysis tools could have detected Vioxx’s off-target effects when it was first in trials in 1999. Last summer, Ken Paigen, director of the Jackson Laboratory, described to Ingenuity Systems’ inaugural user group meeting his use of pathway tools to map the functional organization of the mouse genome. And next spring, William Ricketts of Oncotech will present work at CHI’s Molecular Medicine Tri-Conference using pathway analysis to identify signaling factors implicated in cisplatin resistance in ovarian cancer.
Innovative uses for pathway tools and exciting results from early users are sprouting like mushrooms after a spring rain — albeit following a few harsh winters. Three years ago pathway pioneer Ingenuity Systems had fewer than 1,000 users. Today, CEO Jake Leschly says there are 10,000 Ingenuity users. Fellow pioneer and rival GeneGo claims its market penetration is even bigger — and, of course, that its tools are better.
Whatever the precise numbers, it is clear that the adoption of pathway analysis tools is growing and giving the beleaguered informatics sector something to cheer about. Take, for example, the comments of Venkateshwar Keshamouni, assistant professor in the Department of Internal Medicine at the University of Michigan:
“One senior professor said to me, ‘These software tools don’t do anything that a dedicated graduate student cannot do on his own by doing PubMed searches.’ He’s probably right; however it is pretty darn good if a software can consistently substitute for a dedicated graduate student. Literally with the click of a button these tools can provide a good overview of what is already known about the data set of interest, including potential interactions. On the whole, these tools will become more and more useful as they keep evolving by integrating different types of data and have potential to become indispensable tools for systems biology.”
At the Translational Genomics Research Institute (TGen) in Arizona, Jeff Kiefer, head of Knowledge Mining and Bioinformatics in the institute’s Pharmaceutical Genomics Division, is grappling with huge gene lists generated by high-throughput siRNA screening, much like a microarray experiment. “We employ a number of pathway tools, such as GeneGo’s MetaCore, to assist in a number of areas around the large-scale datasets. MetaCore is used in the pre-high-throughput (HT) siRNA screen to identify relevant genes to serve as controls in assay development steps. This type of analysis is also used in development of weighting schemes to improve hit selection for advancement to confirmation. After confirmation, we use MetaCore to develop actionable hypotheses. We believe pathway tools play an indispensable part in all aspects of HT siRNA.”
Shoot That Pathway Arrow
Don’t get the wrong idea — it’s not all quick leads and validated biomarkers — but pathway analysis tools are gaining traction with researchers. They represent a multi-purpose arrow in the research quiver that straddles knowledge management and knowledge creation. Using them generally requires two components: a database of molecular interactions and biological interpretations, largely drawn from peer journals; and software to search the database, identify relationships, and present putative pathways related to a particular question.
At first, the tools were largely restricted to interpreting gene expression experiments. A researcher would input a list of differentially expressed genes from an experiment; the software would search the database for related interactions, perform some mathematical magic, and spit out suggested pathways shedding light on possible targets, mechanism of action, and possible biomarkers.
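The “mathematical magic” in that workflow is typically an over-representation test: given a list of differentially expressed genes, how unlikely is its overlap with each known pathway by chance alone? As a purely illustrative sketch — not any vendor’s actual algorithm, and with hypothetical function names and a toy pathway structure — a hypergeometric test might look like this:

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """Upper-tail hypergeometric probability: the chance of seeing at least
    k pathway genes when drawing an n-gene list from an N-gene universe
    in which K genes belong to the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def enrich(gene_list, pathways, universe_size):
    """Score each pathway by over-representation of the input gene list.
    `pathways` maps pathway names to sets of member genes."""
    hits = set(gene_list)
    results = []
    for name, members in pathways.items():
        overlap = hits & members
        if overlap:
            p = hypergeom_pvalue(len(overlap), len(members),
                                 len(hits), universe_size)
            results.append((name, sorted(overlap), p))
    # Most significantly enriched pathways first
    return sorted(results, key=lambda r: r[2])
```

On a toy universe of 20 genes, a 5-gene list sharing 3 members with a 5-gene pathway scores around p ≈ 0.07 — the same kind of ranking, scaled up to thousands of curated or text-mined interactions, that lets the tools suggest candidate pathways from a raw gene list.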
Today, the databases and software have grown more complex. Data of all types — gene expression, protein interaction, metabolites, drug interactions, species, and disease information — are rapidly being incorporated into the databases. Analysis feature sets are likewise growing and visualization is improving. Pathway technology suppliers are also racing to make their tools more biologist-friendly and starting to embed predefined workflows that can answer questions with a single mouse click.
Ingenuity’s Leschly gushes, “The penetration of [pathway] tools eventually is going to hit most biologists. So I don’t know how many there are, thousands or a million. They’re all going to have a need to use them because it’s so ubiquitous in its relevance, whether to experimental platforms they are using or to biological questions they are asking.”
Buyers are getting savvy too. “I believe the main difference is that people know they need pathway tools,” says Ilya Mazo, president of Ariadne Genomics. “You don’t have to sell them on the idea. You actually have to sell them on the specific tool. The change was really happening last year and this year we passed that adoption stage. Two years ago it still was perceived as a nice toy, something cool, but not absolutely necessary.”
Currently Ariadne, GeneGo, and Ingenuity lead the U.S. commercial tool space, but there are many others offering pieces of the pathway puzzle. India’s Jubilant Biosys (see “Jubilant Curries Favor with BioPharma,” this issue), for example, has a respected pathway database. Europe’s Genomatix Software has a comprehensive in silico platform encompassing gene expression and pathway analysis (BiblioSphere Pathway Edition), and it is mounting an attack on North American markets. Gene Logic has valuable data assets (omics data and tissues) and analysis tools. And an entire cadre of companies specializes in generating omic data for clients — BG Medicine and Metabolon, for example — and they all do some characterization of the data. Some use tools from pathway technology providers.
Slightly farther up the systems biology chain, biosimulation and model technology providers also use pathway software to help inform their model building. Genstruct (see “Predicting the Future of Systems Biology,” Bio-IT World, September 2006) and Gene Network Sciences (see “GNS Charts ‘Unknown’ Biology,” Bio-IT World, October 2006), for example, both use Ariadne tools, which isn’t surprising in part because their respective inference technology approaches fit well with Ariadne’s natural language processing (NLP) expertise.
Indeed, there is a lingering debate over how best to build pathway databases. GeneGo and Ingenuity swear by manual curation of data (see “Search and Deploy,” Bio-IT World, October 2006). Ingenuity uses an army of contract scientists, whereas GeneGo mostly uses FTEs to ensure quality control. Ariadne instead uses NLP to build its database.
“One of the big differentiators is content at the moment,” says Julie Bryant, VP of business development for GeneGo. “Over the past two years, people have realized the value of manually curated data compared to text mining and NLP because I think two or three years ago they thought more was more. Now they think less is more. When they get all the text mining, they get some of what they want and an awful lot of what they don’t want.”
Not so fast, says Ariadne’s Mazo. “We curate some of the pathways, which are the end results. We don’t think it makes a lot of sense to curate individual interactions or individual facts. It’s similar to what’s happening in microarray data; each individual spot on the microarray has some error noise. After you’ve clustered a few genes or do some statistics, that result is pretty much robust and does not depend on this. Plus we have so many facts that we mine from the literature we can tolerate some error noise and still have a biologically meaningful result.”
Many observers say the two approaches can work together. NLP permits near real-time scanning of the literature, while curated material takes a longer journey into the database. Even then, all curators are not created equal.
“[Of those who] claim that their tools are based on manually curated databases by experienced Ph.D. level scientists, none of them explain or provide the information about the criteria or rules they follow in picking the data,” says Keshamouni. “For example, one would often see myelin basic protein (MBP) as an interacting protein for several kinases. MBP is routinely used as a substrate for several kinases in most of the in vitro kinase assays. An experienced Ph.D. scientist will easily identify this and never record this as a meaningful interaction for a particular kinase, but MBP routinely pops up as an interacting node for several kinases.”
Market youth is part of the problem. Leschly and his competitors say the pathway tool market mirrors the early gene expression array analysis market. As expression data sets grew, so did the need for, and sophistication of, tools to make sense of them. They argue that as pathway tools become easy enough to use, virtually every biologist will use them. And, of course, the data deluge isn’t restricted to gene expression data. GeneGo trumpets its impressive database of diverse data as a unique selling point, and many users agree.
Historically, says Ingenuity CTO Ramon Felciano, researchers used pathway tools primarily for target identification and validation, in step with the growing use of HT technologies. Cancer was the dominant disease examined. Lead identification then became important, particularly with regard to understanding mechanism of action and identifying biomarkers. Most recently, toxicity and safety assessment has emerged as a key application.
Three Steps to Heaven
Felciano describes three typical ways researchers use pathway tools: “They’ve run an experiment, which may be high throughput or a traditional experiment, and they want to understand what the new data mean in the context of what is known about biology and frankly about drug information as well. We’ve got a large amount of knowledge around drugs and which molecules they interact with.
“We see roughly an equal number of people simply come in with questions about biology. Somebody says, ‘I need to learn about prostate cancer, or I’m interested in understanding the role that kinases play in prostate cancer.’ We will tell you what we know about prostate cancer genes and proteins, which drugs have been approved for prostate cancer and so forth.”
The third approach, Felciano says, is relatively new but is starting to gain traction. “It’s using IPA to actually develop novel models of pathways and molecular functions. Users have the ability to put their own hypotheses into the system, so essentially they draw in that interaction into the system and [they can] start to develop proprietary pathways based on a combination of the public research and their knowledge.”
Somewhat surprisingly, the usually risk-averse pharmaceutical industry has led the way in adopting pathway tools. Bryant says that’s not surprising since pharma has a mountain of omic data to decipher.
Some observers have wondered if a shakeout is coming, and when or if pharmaceutical companies will stop sampling the tool smorgasbord and choose a single pathway supplier. Just last month, Ingenuity announced an important win at AstraZeneca, which conducted a lengthy evaluation of pathway tool suppliers before settling on Ingenuity.
“I think the leaders are crystallizing out fast,” Leschly says, but he quickly acknowledges much depends on pricing. “If somebody is pricing the software at $5,000, then pharma has no reason not to buy it. This is no different than the expression market. They bought every conceivable solution in ’96, ’97, and ’98, and then Affymetrix clearly crystallized out as a leading solution.”
Certainly there remains plenty of market for commercial pathway players to divvy up, and most have one or two enterprise-wide deployments to trumpet for marketing purposes. A more important challenge is improving the tools.
Tim Wiggin, a bioinformaticist with the National Center for Integrative Biomedical Informatics at the University of Michigan, is a proponent of GeneGo’s MetaCore. “I found out the hard way that it’s nearly impossible to reproduce a network unless you’re keeping exhaustive notes,” says Wiggin. “Even with clear notes, it’s still extremely easy to go astray if you miss something, and it’s hard to backtrack if you have a new idea. If MetaCore recorded a detailed history, like that produced by Genomatix applications, that would go a long way toward resolving this gripe.”
So there’s work to be done, but don’t try to take Wiggin’s MetaCore suite away.
Further Reading: Bhalla, U.S. “Systems modeling: a pathway to drug discovery.” Current Opinion in Chemical Biology, 2005 Aug; 9(4):400-6.
Sidebar: Ariadne: Industrial Strength Software
Ariadne Genomics is a relatively new kid on the pathway block, founded in 2002 by folks from software developer Informax. It is therefore not surprising that Ariadne president Ilya Mazo cites robustness and scalability of Ariadne software as key differentiators.
“We are probably the only company out there right now that has a scalable product. Scalable means we have a desktop product that we sell for a few thousand dollars and that targets individual researchers and users. People can afford it in academia and everywhere. It scales up to enterprise solutions that you can deploy inside a large organization,” says Mazo.
Ariadne relies on powerful natural language processing technology to build its database and infer pathways. Core products include ResNet 4.0, a database of more than 1 million relations for human, rat, and mouse, as well as a collection of more than 1,000 reconstructed signaling pathways, a portion of which have been manually curated. Its text-mining tool, MedScan, is a past Bio-IT World “Best of Show” winner and has been well received. Pathway Studio (pathway analysis) and PathwayExpert (regulatory mechanisms and drug response modeling) round out the analysis lineup.
Known first for its desktop product, Ariadne now offers enterprise solutions. Mazo says the company has roughly 1,000 installations and another ten site licenses. The National Cancer Institute has also licensed Ariadne’s tool suite.
Based in Rockville, MD, the company has grown to 50 staff, including a sizeable contingent in Moscow. After depending on distributors for sales, Ariadne recently moved to a direct sales model, excluding Japan/Asia Pac.
Mazo says the need to understand biomarkers — “not just be presented with a 100-gene pattern” — and toxicology will drive the market. He is also bullish on inference technology’s ability to eventually make models more sophisticated, perhaps simulatable.
“Based on [client] information, you can start building kinetic models or some more sophisticated logical inference models,” says Mazo. “There are of course companies that are doing this very successfully, perhaps not in the software product, but in contract research organizations. Genstruct is one example; GNS is another. I think we will have to move into this, but of course the challenge for us is to make such tools more simple so they will be adopted by hundreds and thousands of users, regular biologists.” J.R.
Sidebar: GeneGo: Breadth and Depth of Data
Founded in 2000 by Tatiana Nikolskaya, GeneGo has always emphasized data quality enforced by manual curation as well as the rapid incorporation of multiple data types. Based in St. Joseph, MI, the company now has 90 employees, including 40-plus doing manual curation and 15 or so programmers. It has long claimed to be profitable.
Julie Bryant, VP of business development, says GeneGo’s customer base has doubled in each of the past three years, and now includes 165 distinct clients and thousands of users. GlaxoSmithKline is one major customer.
Users have prodigious data requirements, says Bryant. “[It’s] not just genes and proteins; it’s proteins, compounds, antibiotics, endogenous compounds, transcription factors, and hormones. They’ve realized that species is also important, and understood that being able to look at the whole of the cell life is important.” GeneGo can build merged metabolic and signaling networks for pathways, for example.
MetaCore is GeneGo’s core product. The company describes it as “an integrated software suite for functional analysis of microarrays, SAGE, proteomics and other experimental data [including] human protein-protein and protein-DNA interactions, transcriptional factors, signaling, metabolism and bioactive molecules.” GeneGo also offers a toxicogenomics platform, MetaDrug.
A recent enhancement, Compare Experiment, allows researchers to quickly compare results from several experiments. Also new is Map Editor, which assists in building canonical maps. Customers can import their own data using MetaLink.
Most recently, GeneGo began offering wet lab services to clients. The company initially did such work to validate internal research programs, and it holds IP associated with an SBIR project to identify breast cancer biomarkers. That project is now in phase two, which includes a novel large-scale gene expression and genotyping study to be run at the Mayo Clinic, with data analysis in MetaCore.
Available wet lab services include gene expression and proteomics, but not metabolomic work. Bryant was a little fuzzy about whether GeneGo actually owns wet lab equipment and facilities, saying instead it has access to wet lab capacity. Asked if GeneGo would consider purchasing a company to add wet capacity, she declined to comment.
An IPO or secondary offering might provide funding for acquisitions, but Bryant nixes the idea: “So we’re profitable. We’re growing organically. So as we have more funds we hire more people. Customers tell us we’re at least 18 months to two years ahead of our competition.” J.R.
Sidebar: Ingenuity: Ease of Use and More
Founded in 1998, Ingenuity Systems is the senior member of the commercial pathway analysis tool family. Like GeneGo, it manually curates its pathway database (Ingenuity Knowledge Base, IKB) and emphasizes ease of use. Unlike GeneGo, it’s not profitable, but that’s by design, insists CEO Jake Leschly.
“Congratulations to them,” he says. “Obviously we could be profitable if we wanted to, but we’re taking a more long-term view trying to balance our investments” against current demand and expected future demand. “We’re close to profitable,” he says.
Headcount for the Mountain View, CA-based company has reached 80, and Ingenuity has used its early-mover status to achieve high penetration of the top 40 pharma. “I think everybody’s got a license more or less,” says Leschly. The company’s technology has been featured in close to 200 peer-reviewed articles this year alone.
Ingenuity Pathway Analysis (IPA) and IKB are the core products. IPA has a reputation for ease of use and Leschly believes that is critical to reaching a broad market. IKB is growing regularly — just as competitor databases are. Ingenuity has chosen to keep its tool and database broad.
CTO Ramon Felciano says customers with specific needs can address them through other services offered by Ingenuity, such as its Directed Content Acquisition program. The company has also been expanding its ability to handle different omic data. It released support for SNP analysis earlier this year. Felciano says customers are asking for more chemical content and RNAi analysis support.
Ingenuity Labs is an in silico services offering. “Essentially, we’ve gone under the hood to do specific custom-tailored analytics and inference space computations that target a particular research problem a customer has, whether it is in predictive toxicology or combination therapeutics or any area which you could imagine,” says Felciano.
Says Leschly, “We started with gene expression because there was a huge need to make sense of the really large investment made in gene expression, and obviously focused on the pharma guys. Now as that begins to evolve, we are going to get to the point that the person we care the most about is [the] individual researcher, anywhere, [and] provide tools that allow them to make sense out of any biological inquiry, whether with an expression chip, or proteomics platform, or other platform.” J.R.