Celera Pushes Boundaries with Automated Workflows

By John Russell

July 16, 2009 | The use of leading edge informatics tools is part of Celera’s DNA. In the race to sequence the human genome, high performance computing and sophisticated home-grown informatics were as important as high speed sequencing machines to Celera’s success. Celera is hardly that company today, having redefined its mission to become a leader in diagnostics and “personalized disease management.” What hasn’t changed is its use of leading edge informatics.

While widespread adoption of automated workflow tools has been sluggish in life sciences (unlike many other industries), Celera has jumped onboard, building on its several-years-old relationship with workflow platform vendor InforSense. Making sense of vast quantities of data has always been central to Celera’s mission; today that generally means performing GWA studies and associated functional genomics work to identify and help validate biomarkers.

As explained by John Sninsky, Celera’s VP, Discovery Research, the turn to automated workflows for informatics analysis came as a result of the evolution of Celera’s business plan. Famously founded to sell access to its treasure trove of high quality data, Celera transitioned its business model “over the years to have solely a diagnostic and pharmacogenomic focus.” Roughly two years ago Celera purchased a CLIA-approved clinical reference laboratory, the Berkeley Heart Lab, enabling it to also offer services and giving it the potential to generate new in vitro diagnostic products.

In changing direction, Celera sought to exploit unmet medical needs and initially settled on six areas, including autoimmune disease, neurological disease, liver disease, cancers, and cardiovascular disease. “Some were combinations of diseases, cardiovascular diseases such as not only myocardial infarction but also stroke,” says Sninsky. A discovery team for each area was established and work progressed.

“Over the years we have found important associations but in some cases the therapeutics area hasn’t developed as rapidly as we would like. For example, we were hoping that new drugs would have come on board for Alzheimer’s disease so that our risk markers would have had value in early evaluation of treatment. Unfortunately, those therapeutics have not come along so we’ve discontinued our Alzheimer’s work,” he says.

Celera has pragmatically pared back in areas in which the payoff seemed more distant. “The three areas that we’re focusing on now are cardiovascular disease, cancer, and liver disease. So we went from eight different teams down to three teams. [Nevertheless], you can imagine having eight teams do analyses; very quickly their processes diverged.”

Two major challenges prompted the adoption of automated workflows. “One is the industrial scale in which we have operated since our inception whether that’s from a sequencing point of view or from a SNP discovery or messenger RNA profiling perspective. That industrial approach generates large amounts of data and complex data that you need to filter, sort, analyze, and interpret,” he says.

The second issue, “was that as we pushed some of these analysis tools out to the disease area teams within Celera there began to be idiosyncratic modifications and decisions made about how one analysis would be done and what kind of filters that would be used. We started ending up with a non-standardized analysis, very similar but different for different disease indications.”

Adopting a platform able to automate workflows was an attractive solution. Sninsky knew InforSense CSO Jonathan Sheldon from when they both worked at Roche and called him to learn more. The two companies soon began collaborating. The basic idea is to be able to rapidly build, archive, and re-use workflows, which would bring efficiency and better control over the processes Celera scientists used. So it has turned out.

David Ross, Celera’s director of computational biology, says, “The standardization and uniformity led to remarkable improvements in how we as an organization dealt with the data.” He cites one workflow developed for expression analysis. “We sat down for a couple of afternoons and hammered out a fairly complex workflow and visual [report] that would have taken at least a couple of weeks for a developer to do. It’s also reduced the amount of time that the patent group needed to address the question of how a particular analysis was done. They don’t need to ask ‘Was this done? How was that done?’ It was done in a standardized way.”

Sninsky estimates efficiency has jumped perhaps five-fold, adding, “over the last six months we’ve started to demonstrate to other parts of the company, whether that’s development or the clinical reference laboratory, the kind of productivity improvements that come with using these workflows. Although we served as the entrée for the Celera organization to the InforSense tools, my expectation is they are going to be embraced by a larger number of people in other groups.”

It is interesting to note that despite their power and roughly a decade to mature, informatics workflow automation tools haven’t been widely embraced yet. InforSense and SciTegic (Pipeline Pilot) were both founded in 1999 to bring the technology to life science research. Teranode was formed in 2002 with a similar strategy. But scaling up business has proven difficult. Accelrys purchased SciTegic in 2004, and IDBS has just acquired InforSense.

No doubt there are many reasons. The platforms, though powerful, could be tricky to use. The fast pace of change in experimental and analysis technology has sometimes made companies reluctant to invest in workflow automation tools, thinking there will always be too much manual work required. The ability to easily integrate a sufficient diversity of third party analytical tools is also important. Even conveying clearly what the platforms do can be challenging. (Both InforSense and Accelrys/SciTegic have increasingly positioned themselves as business intelligence/analytics platforms suitable for many industries.)

It is perhaps useful to offer a description of an informatics workflow. “To me,” says Ross, “the [elements of a] workflow are you grab data, most of the time from a database, which is both internal data and public data that we’ve put in, we manipulate that data [and] by that I mean pivot it or transform it or whatever we need to do to get it into the form we need to submit it for an analysis procedure or maybe a set of different analysis procedures that may be parallel or serial, then they are displayed and they can be manipulated even in display or displays are static. That pretty much encapsulates everything that we do.”

“[The InforSense platform] allows us to put a number of different analytical engines in the middle of that very easily. It allows us to add additional visuals downstream, so it’s an organic system. Things can be done in SAS or R or other things like Matlab and we can easily grab those new analytical procedures and try them ourselves. We’re particularly interested in the new semantic languages and databases and I think that’s a future area.” 
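
The four stages Ross describes (grab data, reshape it, run one or more analyses, display the results) can be illustrated with a minimal Python sketch. This is not the InforSense platform or its API, and it is not Celera's code; the sample records, gene names, and every function name below are hypothetical and exist only to show how analysis steps can be swapped in and out of the middle of a fixed workflow.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows standing in for a database query result:
# (sample_id, gene, expression_value). All values are made up.
RECORDS = [
    ("s1", "GENE_A", 2.0),
    ("s1", "GENE_B", 5.0),
    ("s2", "GENE_A", 4.0),
    ("s2", "GENE_B", 7.0),
]


def fetch(records):
    """Stage 1: grab data (here, simply copy an in-memory list)."""
    return list(records)


def pivot(rows):
    """Stage 2: pivot long rows into {gene: {sample: value}}."""
    table = defaultdict(dict)
    for sample, gene, value in rows:
        table[gene][sample] = value
    return dict(table)


def mean_per_gene(table):
    """One pluggable analysis: mean expression per gene."""
    return {gene: mean(vals.values()) for gene, vals in table.items()}


def range_per_gene(table):
    """Another pluggable analysis: max minus min per gene."""
    return {gene: max(v.values()) - min(v.values()) for gene, v in table.items()}


def run_workflow(records, analyses):
    """Stage 3: run every registered analysis on the same pivoted table."""
    table = pivot(fetch(records))
    return {name: analysis(table) for name, analysis in analyses.items()}


def render(results):
    """Stage 4: produce a static, tab-separated text report."""
    lines = []
    for name in sorted(results):
        for gene in sorted(results[name]):
            lines.append(f"{name}\t{gene}\t{results[name][gene]:.2f}")
    return "\n".join(lines)


# The set of analyses is just a dictionary, so adding a new engine
# means adding one entry rather than rewriting the workflow.
ANALYSES = {"mean": mean_per_gene, "range": range_per_gene}

if __name__ == "__main__":
    print(render(run_workflow(RECORDS, ANALYSES)))
```

Because every team would call the same `run_workflow` with the same registered analyses, the sketch also hints at how a shared workflow removes the team-by-team variation in filters and methods that Sninsky describes.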

InforSense CSO Sheldon agrees, “In an area like genetics where it’s evolving at such a pace, there really isn’t a set of five standard workflows that do genome wide association. You need a very open platform. You need the ability to rapidly integrate a whole variety of different algorithms from different data sources. Workflow has been the mechanism by which you [Celera] could rapidly prototype different approaches to analyze the data.”

Working together early has benefited both companies. InforSense was still developing what would become its Translational Research Solution (TRS), aimed at biomarker discovery activity. Celera was able to influence its direction, for example by suggesting early on the inclusion of a SAS node to meet Celera’s needs.

Sheldon says, “It’s worth saying that at the start of our relationship we were working on a genetics module for the platform and clearly a lot of the input that John and David have given us over the years has really helped kind of fine tune GenSense so we’re able to cope with the data types that you see in genetic analysis and we have in the system analytical methods which are appropriate for genome wide associations.”

The TRS includes a variety of modules for various omic analyses plus ClinicalSense “for carrying out cohort identification and patient stratification, which is typically one of the first steps that you carry out in a translational research study to identify biomarkers,” says Sheldon. VisualSense is the module for report generation, although Spotfire is also supported.

In addition to Celera, “we worked a lot with large medical institutes like the Mayo Clinic and Dana Farber and learned a tremendous amount about translational research over the last four or five years and we’ve encapsulated into the product,” says Sheldon. The fact that there is a community of InforSense users willing to share best practices is another plus, says Sninsky.

The surge in GWA studies and translational medicine approaches may boost demand for informatics workflow tools, and Sheldon says he is already seeing evidence of this. InforSense is working closely with two major pharma companies and with several others at earlier stages. Time will tell how the molecular diagnostics field evolves. Celera has at least one test approved (KIF6) and is busy looking for more.

This article first appeared in Bio-IT World’s Predictive Biomedicine newsletter.