Automated workflows from InforSense are now a part of Celera’s DNA.
By John Russell
September 15, 2009 | While widespread adoption of automated workflow tools has been sluggish in life sciences, Celera has jumped on board, building on its several-years-old relationship with workflow platform vendor InforSense. Today, Celera’s data handling centers on genome-wide association studies (GWAS) and functional genomics work to identify and validate biomarkers.
As John Sninsky, Celera’s VP, Discovery Research, explains, the turn to automated workflows for informatics analysis stemmed from the evolution of the company’s business plan. Famously founded to sell access to its trove of genomic data, Celera transitioned its business model “to have solely a diagnostic and pharmacogenomic focus.” Two years ago, Celera purchased a CLIA-approved clinical reference laboratory, the Berkeley Heart Lab, enabling it to offer services and generate in vitro diagnostic products.
In changing direction, Celera sought to exploit unmet medical needs. It initially settled on six areas, including autoimmune disease, neurological disease, liver disease, cancers, and cardiovascular disease. A discovery team for each area was established and work progressed. “Over the years we have found important associations but in some cases the therapeutics area hasn’t developed as rapidly as we would like,” says Sninsky. “For example, we were hoping that new drugs would have come on board for Alzheimer’s disease so that our risk markers would have had value in early evaluation of treatment.” Alas, those therapeutics did not materialize and the Alzheimer’s program has been suspended.
Indeed, Celera has pragmatically pared back in areas where the payoff seemed more distant, focusing instead on three areas: cardiovascular disease, cancer, and liver disease.
Two major challenges prompted the adoption of automated workflows. One was the sheer industrial scale that Celera has traditionally managed, whether it be sequencing, SNP discovery, or mRNA profiling. “That industrial approach generates large amounts of data and complex data that you need to filter, sort, analyze, and interpret,” Sninsky says.
Second, as the analysis tools were pushed out into the disease teams, “there began to be idiosyncratic modifications and decisions made about how one analysis would be done and what kind of filters would be used. We started ending up with a non-standardized analysis, very similar but different for different disease indications.”
Sninsky happened to know InforSense CSO Jonathan Sheldon from when they were colleagues at Roche. Before long, the two companies were collaborating. The idea was to rapidly build, archive, and reuse workflows, bringing greater efficiency and better control over the processes Celera scientists used.
David Ross, Celera’s director of computational biology, says, “The standardization and uniformity led to remarkable improvements in how we as an organization dealt with the data.” He cites one workflow developed for expression analysis over a couple of afternoons that would have taken weeks for a developer. “It’s also reduced the amount of time that the patent group needed to address the question of how a particular analysis was done,” he says. “They don’t need to ask, ‘Was this done? How was that done?’ It was done in a standardized way.”
Sninsky estimates efficiency has jumped five-fold, and notes that those productivity improvements are getting the attention of other divisions at Celera, including development and the clinical reference laboratory. “Although we served as the entrée for the Celera organization to the InforSense tools, my expectation is they are going to be embraced by a larger number of people,” says Sninsky.
Despite their power, informatics workflow automation tools have taken time to catch on. InforSense and SciTegic (Pipeline Pilot) were both founded in 1999 to bring the technology to life science research. Teranode was formed in 2002 with a similar strategy. But scaling up business has proven difficult. Accelrys purchased SciTegic in 2004 and IDBS has just acquired InforSense.
One of the many possible reasons is that the platforms, though powerful, can be tricky to use. The pace of change in experimental and analysis technology has also made companies reluctant to invest in workflow automation tools, suspecting there will always be too much manual work required. The ability to easily integrate a sufficient diversity of third-party analytical tools is important as well. Even conveying clearly what the platforms do can be challenging. (Both InforSense and Accelrys/SciTegic have increasingly positioned themselves as business intelligence/analytics platforms suitable for many industries.)
So what is an informatics workflow? Ross describes the elements of a workflow this way: “You grab data, most of the times from a database, which is both internal data and public data that we’ve put in; we manipulate that data—by that I mean pivot it or transform it or whatever we need to do to get it into the form we need to submit it for an analysis procedure… then they are displayed and they can be manipulated.”
The InforSense platform “allows us to put a number of different analytical engines in the middle of that very easily.” It’s an organic system, he says. Manipulations can be done in SAS or R or Matlab. “We can easily grab those new analytical procedures and try them ourselves. We’re particularly interested in the new semantic languages and databases.”
Sheldon agrees. “In an area like genetics,” he says, “there really isn’t a set of five standard workflows that do genome-wide associations. You need a very open platform. You need the ability to rapidly integrate a whole variety of different algorithms from different data sources. Workflow has been the mechanism by which [Celera] could rapidly prototype different approaches to analyze the data.”
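The stages Ross describes (grab data, reshape it, hand it to an interchangeable analytical engine, display the result) can be illustrated with a minimal Python sketch. It is not InforSense code and does not use any InforSense API; the records, gene names, and function names are all hypothetical, chosen only to show how the analysis step can be swapped without touching the rest of the workflow.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical long-format records: (sample_id, gene, expression_value).
Record = Tuple[str, str, float]
Matrix = Dict[str, Dict[str, float]]      # gene -> {sample_id: value}
Analysis = Callable[[Matrix], Dict[str, float]]


def load_records() -> List[Record]:
    """Stand-in for the 'grab data from a database' step; values are invented."""
    return [
        ("s1", "GENE_A", 2.0), ("s2", "GENE_A", 4.0),
        ("s1", "GENE_B", 10.0), ("s2", "GENE_B", 30.0),
    ]


def pivot(records: List[Record]) -> Matrix:
    """Reshape long-format records into a gene-by-sample mapping."""
    matrix: Dict[str, Dict[str, float]] = defaultdict(dict)
    for sample, gene, value in records:
        matrix[gene][sample] = value
    return dict(matrix)


def mean_per_gene(matrix: Matrix) -> Dict[str, float]:
    """One pluggable 'analytical engine': mean value per gene."""
    return {gene: mean(values.values()) for gene, values in matrix.items()}


def max_per_gene(matrix: Matrix) -> Dict[str, float]:
    """An alternative engine with the same interface: maximum value per gene."""
    return {gene: max(values.values()) for gene, values in matrix.items()}


def run_workflow(analysis: Analysis) -> Dict[str, float]:
    """Fixed load and pivot steps, with the analysis step passed in."""
    return analysis(pivot(load_records()))


def display(results: Dict[str, float]) -> str:
    """Render results as tab-separated lines, sorted by gene name."""
    return "\n".join(f"{gene}\t{value:.1f}" for gene, value in sorted(results.items()))


if __name__ == "__main__":
    print(display(run_workflow(mean_per_gene)))
    print(display(run_workflow(max_per_gene)))
```

Because both engines accept the same pivoted structure, the load and reshape steps are written once and reused. That is the kind of standardization across disease teams that Sninsky describes, shown here in miniature.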
Working together early has benefited both companies. InforSense was still developing what would become its Translational Research Solution (TRS), aimed at biomarker discovery activity. Celera was able to influence its direction, for example suggesting early on the inclusion of a SAS node to meet its needs.
Sheldon credits input from Sninsky and Ross over several years with helping to fine-tune GenSense, so that the InforSense platform can cope with genomics data types and include analytical methods appropriate for GWAS. The TRS includes modules for various ‘omic analyses, as well as ClinicalSense for cohort identification and patient stratification, which Sheldon says is “typically one of the first steps that you carry out in a translational research study to identify biomarkers.” VisualSense is the module for report generation, although Spotfire is also supported.
InforSense has also worked extensively with medical institutes such as the Mayo Clinic and Dana-Farber, encapsulating that translational research insight into the product, says Sheldon. He is seeing signs that the surge in GWAS and translational medicine approaches will boost demand for informatics workflow tools. InforSense is working closely with two major pharma companies and with several others at earlier stages. Meanwhile, Celera has at least one test approved (KIF6) and is busy looking for more.
This article also appeared in the September-October 2009 issue of Bio-IT World Magazine.