Market size and vendor jockeying aside, there is increasing agreement among researchers that the tools deliver clear value. “I believe workflow tools are making good headway in biopharma,” says Clarence Wang, Genzyme’s director of science computing. “The personal productivity gain is pretty recognizable. The organizational gain is not yet huge, but steadily growing.”
Wang describes two major uses: First, as an everyday workbench tool for informaticists to transform and manipulate data sets with easily repeatable and re-usable methods. “While an actual application may be created for a one-off type of task, the use of the workflow tool can be simpler than custom Perl scripting, for example,” says Wang.
Second, the tools can be used as laboratory workflow applications that automate data manipulation and analysis in support of high-throughput platforms such as imaging and array processing. “The informaticists create applications that enable a high-throughput laboratory process or instrument workflow that would be completely impractical for a bench scientist to perform manually in a spreadsheet. Eventually, workflow tools can and should supplant many data analysis processes that are currently done with a combination of fragile spreadsheets and marginally re-usable macros,” says Wang.
At Pfizer, workflow tools, notably Pipeline Pilot, are becoming popular “to provide flexible extraction and processing of enterprise-wide, compound-centric screening and molecular property data,” says Enoch Huang, executive director, head of Computational Sciences Center of Emphasis (Pfizer Research).
Skepticism, however, persists about workflow’s uber sciBI aspirations (see “Challenges Facing ‘Workflow’ Vendors”). Mark Murcko, VP and CTO for Vertex Pharmaceuticals, says bluntly that the potential has been “ridiculously overhyped. People have made claims like these for data management tools for 20+ years. Discovering, developing, and commercializing drugs is a rather complex and multifaceted process. Enterprises like pharma and biotech companies need lots of different things to go right in order to function efficiently.”
“Data workflow is not the bottleneck,” says Murcko. “Clear thinking and risk management are the bottleneck. The best possible data management tools would provide 10% of the solution, at most. I wish that the problems of Pharma could all be solved by pipelining tools, but that is a fantasy.”
Early Days
In many ways the commercial workflow market is still young, spurred into existence by the proliferation of instruments, data types, and sheer volume of data. SciTegic, the developer of Pipeline Pilot, and InforSense, developer of the KDE platform, were both formed in 1999. (Each won a Bio-IT World Best of Show award as a young start-up: SciTegic in ’02, InforSense in ’05.) Today there is a frothy constellation of workflow/data flow offerings from commercial and open-source organizations.
In addition to Accelrys and IDBS, there is growing competition from BioFortis and Teranode. On the open-source side, KNIME is a popular choice, along with GenePattern from the Broad Institute and Taverna (developed by Carol Goble, University of Manchester). The biggest player across biopharma and academia is probably still the homegrown workflow, cobbled together by IT-knowledgeable researchers as well as internal and external IT resources.
There was a time when defining workflow was fairly straightforward. Workflow tools were mostly personal productivity tools that allowed a scientist to work with different experimental data types, perform calculations, and generate reports. Some of the analytics were built into the basic offerings, but third-party tools such as Spotfire visualization could be incorporated.
In reality, the tools were powerful but not necessarily easy to use. On the other hand, once created, a well-defined workflow could be archived and re-used. Celera Diagnostics, for example, working with IDBS/InforSense’s platform, is speeding up research and driving compliance with approved workflows to ensure consistency across research (see “Celera’s Workflow Informatics,” Bio•IT World, Sept 2009).
More recently, the simple workflow platforms have evolved. The changes include improved collaboration capabilities, many more analytics, links to electronic notebooks, expanded scientific data search, wider networks of third-party point tools, web enablement, and better GUIs. A new category label is probably needed.
If Accelrys and IDBS/InforSense are the current market leaders—firm numbers are hard to find—the core differences between the two platforms stem from where they started and other products they are associated with.
Looking at the competitive landscape, Greg Caressi, SVP, Healthcare & Life Sciences for market-watcher Frost & Sullivan, says, “Accelrys is more on the R&D to commercialization side whereas InforSense seems to have really taken up the niche around translational medicine.”
Secret Sauce
Accelrys is a powerhouse among chemists. Its robust modeling and simulation tools (Discovery Studio) have dominated medicinal chemistry, and when it purchased SciTegic in 2004, many rich analytics were incorporated into the Pipeline Pilot platform and sold to the Accelrys customer base. In recent years it has added biology capabilities in an effort to penetrate a wider market. Recently it agreed to merge with Symyx, adding the established Symyx ELN and content to the Accelrys portfolio.
Frank Brown, Accelrys CSO, says the data model is the power behind Pipeline Pilot. “[It] has to understand all these different data types, and yet not one single data item will clog up [the pipe]. The chemistry box knows how to look for the chemistry pieces and the imaging box knows how to look for imaging pieces as this stuff is streaming down the pipe at thousands of records per second. That’s really the secret sauce to Pipeline Pilot. We figured out how to stream all this data in extremely rapid fashion.”
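Brown's description of type-aware "boxes" on a streaming pipe can be illustrated with a minimal Python sketch. This is not Pipeline Pilot's data model or API. The function names (`chemistry_box`, `imaging_box`, `run_pipeline`) and record fields (`smiles`, `pixels`) are hypothetical, chosen only to show how each component can act on the records it understands and pass everything else through untouched, one record at a time.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
Box = Callable[[Iterable[Record]], Iterator[Record]]


def chemistry_box(records: Iterable[Record]) -> Iterator[Record]:
    """Annotate records that carry a 'smiles' field; pass others through."""
    for rec in records:
        if "smiles" in rec:
            # Placeholder "analysis": just record the length of the string.
            rec = {**rec, "smiles_length": len(str(rec["smiles"]))}
        yield rec


def imaging_box(records: Iterable[Record]) -> Iterator[Record]:
    """Annotate records that carry a non-empty 'pixels' list; pass others through."""
    for rec in records:
        pixels = rec.get("pixels")
        if pixels:
            rec = {**rec, "mean_intensity": sum(pixels) / len(pixels)}
        yield rec


def run_pipeline(source: Iterable[Record], *boxes: Box) -> Iterator[Record]:
    """Chain the boxes lazily so records stream through one at a time."""
    stream: Iterator[Record] = iter(source)
    for box in boxes:
        stream = box(stream)
    return stream


if __name__ == "__main__":
    mixed = [
        {"id": 1, "smiles": "CCO"},
        {"id": 2, "pixels": [0, 10, 20]},
        {"id": 3, "note": "free text only"},
    ]
    for record in run_pipeline(mixed, chemistry_box, imaging_box):
        print(record)
```

Because each box is a generator, no stage waits for the whole data set, and a record of an unfamiliar type simply flows past a box that does not recognize it.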
Brown was an early user of Pipeline Pilot at Johnson & Johnson before joining Accelrys in 2006. The early user base actually required less hand-holding, he says: “We started with a community that was quite talented in informatics, which could take the tools and imagine what they might do with it and then build it. As IT has cut back, and become smaller, more and more [say], ‘I want an application and I want to roll it out.’” Accelrys is responding by building workbenches in translational medicine—“capabilities to demonstrate a 65 or 85 percent solution so they can imagine how this all hangs together as it becomes more complex,” he says.
InforSense’s roots were core datamining technology and a focus on biology/omic spheres. “We provide some best practices around omics analysis and we’ve been in this field for six or seven years so we’ve internally developed a huge amount of domain expertise,” says Jonathan Sheldon, former CSO of InforSense and now leading translational research for IDBS.
In 2009, InforSense was acquired by IDBS, whose ActivityBase suite is entrenched in biopharma. IDBS’ E-Workbook Suite also has a strong biology-centric offering, Biobook, in addition to Chembook. Lately IDBS/InforSense has emphasized biomarker development and translational development thrusts.
Chris Molloy, IDBS’ VP of corporate development, says the fully integrated InforSense forms one of three suites of IDBS software. “InforSense provides a sophisticated analytics and integration framework on top of not only the data that’s well-curated within ActivityBase and E-Workbook, but it also enables the integration of external data. So it’s the lynchpin in the portfolio, providing data management, business process management, and data deployment across the organization. That’s how the technologies fit and how the company fits together,” says Molloy.
Like Accelrys, IDBS is racing to make the tools easier to use and has packaged a Biomarker Datamining Solution as well as Translational Research Solution (ClinicalSense) to make deployment easier and time-to-benefit shorter.
“In terms of an environment where there are a lot of scientists, we have solutions where we packaged up a whole stack of functionality, the workflows are prebuilt and deployed as web applications in that environment,” says Sheldon. Citing the biomarker solution, he says, “We have capabilities around how you would analyze your omic data—genomic, proteomic, genetic—that are packaged so you’re not forcing the scientist into the workflow authoring environment.”
“If you are in a much larger organization with a large informatics and IT team and a more complex data landscape, then you want the ability to go to the workflow environment and say actually I’d like to pull in that data source and I’d like to pull in some additional analytics and R scripts and you have the ability to do that through the workflow environment. It depends on the scale of the deployment.”
Genzyme’s Wang believes “The market and applications of workflow technology could be accelerated by development of standards around workflow definition and description, such as in the BPM space. This sort of pre-competitive approach would motivate instrument and scientific software providers to make their systems more readily integratable.”
Understandably, both vendors trumpet their successes. IDBS/InforSense touts the Dana-Farber Cancer Institute’s broad use of its Translational Research Solution. DFCI has collected a vast amount of patient data and biological samples for research and has deployed ClinicalSense to speed access to relevant clinical, sample, and research data and to streamline analysis.
Accelrys points to a Pfizer deployment that uses Pipeline Pilot for all the ETL (extract, transform, load) services: gathering data, transforming it, and adding more data tags to it. Brown says the concept is to gather data from multiple sites into a datamart, which then allows the team to provide several different ways to look at the data. “One way is through Pipeline Pilot; one way might be through the Sharepoint bridge we built with Pfizer; and the other one is through another front end, FileMaker Pro.”
Brown says Pfizer can rapidly create multiple team-based datamarts “because they are creating them in parallel fashion in the same Oracle instance and each datamart is not connected to any other, so they can pull one out and pull one down—the opposite idea of a data warehouse where everything is connected to forms.”
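The idea of independent, team-based datamarts living side by side in one database instance can be sketched in a few lines. The example below is an illustration only: it swaps in Python's built-in `sqlite3` for the Oracle instance Brown mentions, and the table layout, column names, and the helpers `build_datamart` and `drop_datamart` are assumptions, not details of the Pfizer deployment.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str, str, float]  # (compound_id, site, assay, value)


def _table_name(team: str) -> str:
    """Build a safe table name; reject anything that is not a plain identifier."""
    if not team.isidentifier():
        raise ValueError(f"unsafe team name: {team!r}")
    return f"datamart_{team}"


def build_datamart(conn: sqlite3.Connection, team: str, rows: Iterable[Row]) -> str:
    """Extract rows, transform and tag them, and load them into the team's own table."""
    table = _table_name(team)
    conn.execute(
        f"CREATE TABLE {table} "
        "(compound_id TEXT, site TEXT, assay TEXT, value REAL, tag TEXT)"
    )
    transformed = [
        # Transform: normalize the identifier and add a data tag.
        (cid.strip().upper(), site, assay, float(value), f"{team}:{assay}")
        for cid, site, assay, value in rows
    ]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", transformed)
    conn.commit()
    return table


def drop_datamart(conn: sqlite3.Connection, team: str) -> None:
    """Remove one team's datamart; no other table references it."""
    conn.execute(f"DROP TABLE {_table_name(team)}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_datamart(conn, "oncology", [(" cmpd-1 ", "site_a", "ic50", 0.5)])
    build_datamart(conn, "neuro", [("cmpd-2", "site_b", "ki", 1.25)])
    drop_datamart(conn, "oncology")
    print(conn.execute("SELECT * FROM datamart_neuro").fetchall())
```

Because no table holds a foreign key into another, pulling one datamart out leaves the rest untouched, which is the contrast Brown draws with a fully connected data warehouse.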
Bumping Heads
As both Accelrys and IDBS grow, they are starting to bump heads more frequently. The consensus is that Accelrys appears to have the stronger chemistry tool kit, but has gaps in its biology tool kit. Unsurprisingly, the reverse is said about InforSense. Many companies use both tools.
One knock sometimes heard on the announced Accelrys-Symyx merger is that it makes Accelrys look even more chemistry-centric, whereas broader deployments mean both platforms must talk to both disciplines effectively.
Jonathan Usuka, former senior director, life science marketing at Accelrys, says his new company Celgene, where he is director of global business partnering, is ground zero for such integration activity between biologists and chemists. “We want to be able to have one type of notebook for search and so that’s exactly what InforSense and Pipeline Pilot are trying to do. They want to be extractors of this kind of information because chemists are using one thing and biologists are using another.”
Usuka says Celgene has traditionally used Pipeline Pilot, but has also been considering InforSense. This year, Celgene took out an enterprise-wide license with Accelrys to evaluate whether the Accelrys solution could be adopted across disciplines.
Biogen Idec is an example of a company using both vendors’ products. “Pipeline Pilot has been an integral tool for cheminformatics,” says William Hayes, director, decision support, Biogen Idec. “Our scientists (mostly the informatics folks) use it for developing quick analytical and data processing workflows.” But Hayes also uses InforSense to support literature informatics in various workflows involving data integration (see, “Search and Deploy,” Bio•IT World, Oct 2006).
One new workflow involves gathering drug safety literature from various search databases, normalizing the results, filtering out duplicates and previously seen literature, inserting a curation step for false positives and then delivering results in PDF or another format to drug safety scientists. Hayes says this new workflow is several-fold more time efficient. “We’ve been using InforSense to integrate multiple RSS feeds, filter, redeploy as combined feeds—similar to Yahoo Pipes but with internal feed integration for quite a while. It also has been used for ad hoc data analysis.”
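The steps Hayes lists (gather, normalize, filter duplicates and previously seen items, curate, deliver) map onto a short pipeline. The Python sketch below is a hypothetical illustration, not Biogen Idec's InforSense workflow: the record fields (`title`, `ref_id`), the function names, and the `is_false_positive` callback standing in for the curation step are all assumptions, and delivery is reduced to returning a list rather than producing a PDF.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Hit = Dict[str, str]


def normalize(hit: Hit) -> Hit:
    """Collapse whitespace, lowercase, and derive a comparison key for one hit."""
    title = " ".join(hit.get("title", "").split()).lower()
    ref_id = hit.get("ref_id", "").strip().lower()
    # Prefer the identifier as the key; fall back to the normalized title.
    return {"title": title, "ref_id": ref_id, "key": ref_id or title}


def run_literature_workflow(
    sources: Iterable[Iterable[Hit]],
    seen_keys: Set[str],
    is_false_positive: Callable[[Hit], bool],
) -> Tuple[List[Hit], Set[str]]:
    """Return the hits to deliver plus the updated set of seen keys."""
    deliver: List[Hit] = []
    keys_this_run: Set[str] = set()
    for hits in sources:                      # gather from each search database
        for hit in hits:
            rec = normalize(hit)              # normalize the result
            key = rec["key"]
            if key in seen_keys or key in keys_this_run:
                continue                      # duplicate or previously seen
            keys_this_run.add(key)
            if is_false_positive(rec):        # curation step
                continue
            deliver.append(rec)
    return deliver, seen_keys | keys_this_run


if __name__ == "__main__":
    db_a = [{"title": "Drug X  Hepatotoxicity", "ref_id": "REF-001"}]
    db_b = [
        {"title": "drug x hepatotoxicity", "ref_id": "ref-001 "},
        {"title": "Unrelated veterinary note", "ref_id": ""},
        {"title": "Old paper", "ref_id": "REF-000"},
    ]
    results, seen = run_literature_workflow(
        [db_a, db_b], {"ref-000"}, lambda r: "veterinary" in r["title"]
    )
    print(results)
```

One design choice worth noting: rejected false positives are still added to the seen set, so a curator is not asked about the same item on the next run.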
One major issue is that the workflow tools can be challenging to learn, and Hayes, who manages a small literature informatics team, doesn’t have enough time to easily develop heavier workflows from scratch. “We’ve been using consulting from InforSense to build more complicated workflows. However, once we have a workflow developed, we’ve been able to ‘clone’ them and re-purpose them to great effect,” he says.
“As a basic platform for data workflows, InforSense has seen slower adoption from what I can tell,” says Hayes. “I’ve not heard of anyone using KNIME yet in biopharma, but it’s looking quite promising... If we didn’t have both Pipeline Pilot for chemistry and InforSense for other data workflows, I know we’d be at least testing KNIME,” says Hayes.
The changing business model within biopharma—the arrival of personalized medicine, tightening links in the health care chain, and the ever-expanding volume of data—will force scientific data management systems to morph into scientific business intelligence platforms.
Sudeep Basu, Frost & Sullivan’s practice leader, innovation services, says biotech and pharma face major challenges integrating their different global divisions. Moreover, “they have all these repositories of knowledge in these niche areas and they are spread across multiple geographies, different segments of the organization, and across different technology platforms. All of that has to be brought together.” As most workflow/data management providers are small, they have trouble convincing top management at Big Pharma to bet big—enterprise-wide—on their tools. More likely, says Basu, is a partnering strategy with big players in business intelligence such as IBM or SAP, or perhaps acquisitions.
That still seems distant for several reasons. The life sciences share of the market has yet to reach the $200-million threshold that would be attractive to bigger players, although Frost & Sullivan projects adoption rates of 30-40% by Big Pharma. Moreover, biopharma IT and R&D are often decentralized. As Basu says, “pharma doesn’t work as one unified organization where the CIO comes in and says here’s what you’ve got to run. It doesn’t work like that.”