
Workflow’s Towering Aspirations

The challenges facing the primary producers of workflow and pipeline software—Accelrys and InforSense—may not be so much with each other as evolving in step with pharma’s needs. 

July 29, 2010 | It’s tempting to paint the commercial workflow market as a fierce duel between Accelrys’ Pipeline Pilot (see, “Pipeline Pilot 8.0”) and IDBS’ new acquisition, InforSense. A more important question may be how far workflow tools can grow beyond personal productivity instruments into what Frost & Sullivan has termed scientific business intelligence platforms serving the enterprise (sciBI). Without such growth, workflow vendors face the constraints of a niche market.

Market size and vendor jockeying aside, there is increasing agreement among researchers that the tools deliver clear value. “I believe workflow tools are making good headway in biopharma,” says Clarence Wang, Genzyme’s director of science computing. “The personal productivity gain is pretty recognizable. The organizational gain is not yet huge, but steadily growing.”

Wang describes two major uses: First, as an everyday workbench tool for informaticists to transform and manipulate data sets with easily repeatable and re-usable methods. “While an actual application may be created for a one-off type of task, the use of the workflow tool can be simpler than custom Perl scripting, for example,” says Wang.

Second, the tools can be used as laboratory workflow applications that automate data manipulation and analysis in support of high-throughput platforms such as imaging and array processing. “The informaticists create applications that enable a high-throughput laboratory process or instrument workflow that would be completely impractical for a bench scientist to perform manually in a spreadsheet. Eventually, workflow tools can and should supplant many data analysis processes that are currently done with a combination of fragile spreadsheets and marginally re-usable macros,” says Wang.

At Pfizer, workflow tools, notably Pipeline Pilot, are becoming popular “to provide flexible extraction and processing of enterprise-wide, compound-centric screening and molecular property data,” says Enoch Huang, executive director, head of Computational Sciences Center of Emphasis (Pfizer Research).

Skepticism, however, persists about workflow’s uber sciBI aspirations (see, “Challenges Facing ‘Workflow’ Vendors”). Mark Murcko, VP and CTO for Vertex Pharmaceuticals, says bluntly that the potential has been “ridiculously overhyped. People have made claims like these for data management tools for 20+ years. Discovering, developing, and commercializing drugs is a rather complex and multifaceted process. Enterprises like pharma and biotech companies need lots of different things to go right in order to function efficiently.”

“Data workflow is not the bottleneck,” says Murcko. “Clear thinking and risk management are the bottleneck. The best possible data management tools would provide 10% of the solution, at most. I wish that the problems of Pharma could all be solved by pipelining tools, but that is a fantasy.”

Early Days

In many ways the commercial workflow market is still young, spurred into existence by the proliferation of instruments, data types, and sheer volume of data. SciTegic, the developer of Pipeline Pilot, and InforSense, developer of the KDE platform, were both formed in 1999. (Each won a Bio-IT World Best of Show award as a young start-up: SciTegic in ’02 and InforSense in ’05.) Today there is a frothy constellation of workflow/data flow offerings from commercial and open-source organizations.

In addition to Accelrys and IDBS, there is growing competition from BioFortis and Teranode. On the open-source side, KNIME is a popular choice, along with GenePattern from the Broad Institute and Taverna (developed by Carole Goble, University of Manchester). The biggest player across biopharma and academia is probably still homegrown workflows, cobbled together by IT-knowledgeable researchers as well as internal and external IT resources.

There was a time when defining workflow was fairly straightforward. Workflow tools were mostly personal productivity aids that allowed a scientist to work with different experimental data types, perform calculations and generate reports. Some of the analytics were built into the basic offerings but third-party tools such as Spotfire visualization could be incorporated.

In reality, the tools were powerful but not necessarily easy to use. On the other hand, once created, a well-defined workflow could be archived and re-used. Celera Diagnostics, for example, working with IDBS/InforSense’s platform, is speeding up research and driving compliance with approved workflows to ensure consistency across research (see “Celera’s Workflow Informatics,” Bio•IT World, Sept 2009).

More recently, the simple workflow platforms have evolved. The changes include improved collaboration capabilities, many more analytics, links to electronic notebooks, expanded scientific data search, expanded networks of third-party point tools, web enablement, and improved GUIs. A new category label is probably needed.

If Accelrys and IDBS/InforSense are the current market leaders—firm numbers are hard to find—the core differences between the two platforms stem from where they started and other products they are associated with.

Looking at the competitive landscape, Greg Caressi, SVP, Healthcare & Life Sciences for market-watcher Frost & Sullivan says, “Accelrys is more on the R&D to commercialization side whereas InforSense seems to have really taken up the niche around translational medicine.”

Secret Sauce

Accelrys is a powerhouse among chemists. Its robust modeling and simulation tools (Discovery Studio) have dominated medicinal chemistry, and when it purchased SciTegic in 2004, many rich analytics were incorporated into the Pipeline Pilot platform and sold to the Accelrys customer base. In recent years it has added biology capabilities in an effort to penetrate a wider market. Recently it agreed to merge with Symyx, adding the established Symyx ELN and content to the Accelrys portfolio.

Frank Brown, Accelrys CSO, says the data model is the power behind Pipeline Pilot. “[It] has to understand all these different data types, and yet not one single data item will clog up [the pipe]. The chemistry box knows how to look for the chemistry pieces and the imaging box knows how to look for imaging pieces as this stuff is streaming down the pipe at thousands of records per second. That’s really the secret sauce to Pipeline Pilot. We figured out how to stream all this data in extremely rapid fashion.”
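The streaming idea Brown describes can be illustrated with a minimal Python sketch. This is not Pipeline Pilot code or its data model; the component names (`chemistry_box`, `imaging_box`), the record fields (`smiles`, `pixels`) and the toy calculations are all assumptions made for illustration. Each component acts only on records carrying the data type it understands and passes everything else down the pipe untouched, one record at a time.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Component = Callable[[Iterator[Record]], Iterator[Record]]


def chemistry_box(records: Iterator[Record]) -> Iterator[Record]:
    """Annotate records that carry a (hypothetical) 'smiles' field."""
    for rec in records:
        if "smiles" in rec:
            # Toy property standing in for a real chemistry calculation.
            rec = {**rec, "smiles_length": len(rec["smiles"])}
        yield rec  # Records without chemistry data flow through unchanged.


def imaging_box(records: Iterator[Record]) -> Iterator[Record]:
    """Annotate records that carry a (hypothetical) non-empty 'pixels' field."""
    for rec in records:
        pixels = rec.get("pixels")
        if pixels:
            rec = {**rec, "mean_intensity": sum(pixels) / len(pixels)}
        yield rec


def run_pipeline(source: Iterable[Record],
                 components: List[Component]) -> Iterator[Record]:
    """Chain generator components so records stream through lazily."""
    stream: Iterator[Record] = iter(source)
    for component in components:
        stream = component(stream)
    return stream


records = [
    {"id": 1, "smiles": "CCO"},
    {"id": 2, "pixels": [10, 20, 30]},
    {"id": 3, "note": "free text"},
]
results = list(run_pipeline(records, [chemistry_box, imaging_box]))
```

Because every stage is a generator, no stage waits for the full data set, and a record of an unfamiliar type never blocks the others.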

Brown was an early user of Pipeline Pilot at Johnson & Johnson before joining Accelrys in 2006. The early user base actually required less hand-holding, he says: “We started with a community that was quite talented in informatics, which could take the tools and imagine what they might do with it and then build it. As IT has cut back, and become smaller, more and more [say], ‘I want an application and I want to roll it out.’” Accelrys is responding by building workbenches in translational medicine—“capabilities to demonstrate a 65 or 85 percent solution so they can imagine how this all hangs together as it becomes more complex,” he says.

InforSense’s roots were core datamining technology and a focus on biology/omic spheres. “We provide some best practices around omics analysis and we’ve been in this field for six or seven years so we’ve internally developed a huge amount of domain expertise,” says Jonathan Sheldon, former CSO of InforSense and now leading translational research for IDBS.

In 2009, InforSense was acquired by IDBS, whose ActivityBase suite is entrenched in biopharma. IDBS’ E-Workbook Suite also has a strong biology-centric offering, Biobook, in addition to Chembook. Lately IDBS/InforSense has emphasized biomarker development and translational development thrusts.

Chris Molloy, IDBS’ VP of corporate development, says the fully integrated InforSense forms one of three suites of IDBS software. “InforSense provides a sophisticated analytics and integration framework on top of not only the data that’s well-curated within ActivityBase and E-Workbook, but it also enables the integration of external data. So it’s the lynchpin in the portfolio, providing data management, business process management, and data deployment across the organization. That’s how the technologies fit and how the company fits together,” says Molloy.

Like Accelrys, IDBS is racing to make the tools easier to use and has packaged a Biomarker Datamining Solution as well as a Translational Research Solution (ClinicalSense) to make deployment easier and time-to-benefit shorter.

“In terms of an environment where there are a lot of scientists, we have solutions where we packaged up a whole stack of functionality, the workflows are prebuilt and deployed as web applications in that environment,” says Sheldon. Citing the biomarker solution, he says, “We have capabilities around how you would analyze your omic data—genomic, proteomic, genetic—that are packaged so you’re not forcing the scientist into the workflow authoring environment.”

“If you are in a much larger organization with a large informatics and IT team and a more complex data landscape, then you want the ability to go to the workflow environment and say actually I’d like to pull in that data source and I’d like to pull in some additional analytics and R scripts and you have the ability to do that through the workflow environment. It depends on the scale of the deployment.”

Genzyme’s Wang believes “The market and applications of workflow technology could be accelerated by development of standards around workflow definition and description, such as in the BPM space. This sort of pre-competitive approach would motivate instrument and scientific software providers to make their systems more readily integratable.”

Understandably, both vendors trumpet their successes. IDBS/InforSense touts the Dana-Farber Cancer Institute’s broad use of its Translational Research Solution. DFCI has collected a vast amount of patient data and biological samples for research and has deployed ClinicalSense to speed access to relevant clinical, sample, and research data and to streamline analysis.

Accelrys points to a Pfizer deployment that uses Pipeline Pilot for all the ETL (extract, transform, load) services: gathering data, transforming it, and adding more data tags to it. Brown says the concept is to gather data from multiple sites into a datamart, which then allows the team to provide several different ways to look at the data. “One way is through Pipeline Pilot; one way might be through the SharePoint bridge we built with Pfizer; and the other one is through another front end, FileMaker Pro.”

Brown says Pfizer can rapidly create multiple team-based datamarts “because they are creating them in parallel fashion in the same Oracle instance and each datamart is not connected to any other, so they can pull one out and pull one down—the opposite idea of a data warehouse where everything is connected to forms.”

Bumping Heads

As both Accelrys and IDBS grow, they are starting to bump heads more frequently. The consensus is that Accelrys has the stronger chemistry tool kit but gaps in its biology tool kit. Unsurprisingly, the reverse is said about InforSense. Many companies use both tools.

One knock sometimes heard on the announced Accelrys-Symyx merger is that it makes Accelrys look even more chemistry-centric, whereas broader deployments mean both platforms must talk to both disciplines effectively.

Jonathan Usuka, former senior director, life science marketing at Accelrys, says his new company Celgene, where he is director of global business partnering, is ground zero for such integration activity between biologists and chemists. “We want to be able to have one type of notebook for search and so that’s exactly what InforSense and Pipeline Pilot are trying to do. They want to be extractors of this kind of information because chemists are using one thing and biologists are using another.”

Usuka says Celgene has traditionally used Pipeline Pilot, but has also been considering InforSense. This year, Celgene took out an enterprise-wide license with Accelrys to evaluate whether the Accelrys solution could be adopted across disciplines.

Biogen Idec is an example of a company using both vendors’ products. “Pipeline Pilot has been an integral tool for cheminformatics,” says William Hayes, director, decision support, Biogen Idec. “Our scientists (mostly the informatics folks) use it for developing quick analytical and data processing workflows.” But Hayes also uses InforSense to support literature informatics in various workflows involving data integration (see, “Search and Deploy,” Bio•IT World, Oct 2006).

One new workflow involves gathering drug safety literature from various search databases, normalizing the results, filtering out duplicates and previously seen literature, inserting a curation step for false positives and then delivering results in PDF or another format to drug safety scientists. Hayes says this new workflow is several-fold more time efficient.  “We’ve been using InforSense to integrate multiple RSS feeds, filter, redeploy as combined feeds—similar to Yahoo Pipes but with internal feed integration for quite a while.  It also has been used for ad hoc data analysis.”
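The steps Hayes lists (gather, normalize, drop duplicates and previously seen items, curate) can be sketched in a few lines of Python. This is an illustration only, not the InforSense workflow used at Biogen Idec: the field names (`doc_id`, `title`), the `is_relevant` callback standing in for the human curation step, and the sample data are all assumptions, and the final delivery step (PDF or another format) is left out.

```python
from typing import Callable, Dict, Iterable, List, Set

Hit = Dict[str, str]


def normalize(hit: Hit) -> Hit:
    """Bring hits from different search databases into one shape."""
    return {
        "doc_id": hit["doc_id"].strip().lower(),
        "title": " ".join(hit["title"].split()),  # collapse stray whitespace
    }


def literature_workflow(sources: Iterable[Iterable[Hit]],
                        seen_ids: Set[str],
                        is_relevant: Callable[[Hit], bool]) -> List[Hit]:
    """Gather, normalize, de-duplicate, and curate literature hits."""
    new_hits: List[Hit] = []
    batch_ids: Set[str] = set()
    for source in sources:
        for raw in source:
            hit = normalize(raw)
            doc_id = hit["doc_id"]
            # Skip items delivered in earlier runs or already in this batch.
            if doc_id in seen_ids or doc_id in batch_ids:
                continue
            batch_ids.add(doc_id)
            # Curation step: a callback stands in for a human reviewer
            # who rejects false positives.
            if is_relevant(hit):
                new_hits.append(hit)
    return new_hits


# Hypothetical sample data from two search databases.
source_a = [
    {"doc_id": " PMID:1 ", "title": "Hepatic  safety signal"},
    {"doc_id": "pmid:2", "title": "Old report"},
]
source_b = [
    {"doc_id": "pmid:1", "title": "Hepatic safety signal"},
    {"doc_id": "pmid:3", "title": "Unrelated marketing note"},
]
previously_seen = {"pmid:2"}

hits = literature_workflow(
    [source_a, source_b],
    previously_seen,
    lambda h: "safety" in h["title"].lower(),
)
```

In a real deployment the identifiers of delivered and rejected items would be persisted between runs, so that the "previously seen" filter keeps working.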

The major issue is that the workflow tools can be challenging to learn, and Hayes, who manages a small literature informatics team, doesn’t have enough time to develop heavier workflows from scratch. “We’ve been using consulting from InforSense to build more complicated workflows. However, once we have a workflow developed, we’ve been able to ‘clone’ them and re-purpose them to great effect,” he says.

“As a basic platform for data workflows, InforSense has seen slower adoption from what I can tell,” says Hayes. “I’ve not heard of anyone using KNIME yet in biopharma, but it’s looking quite promising... If we didn’t have both Pipeline Pilot for chemistry and InforSense for other data workflows, I know we’d be at least testing KNIME,” says Hayes.

The changing business model within biopharma—the arrival of personalized medicine, tightening links in the health care chain, and the ever-expanding volume of data—will force scientific data management systems to morph into scientific business intelligence platforms.

Sudeep Basu, Frost & Sullivan’s practice leader, innovation services, says biotech and pharma face major challenges integrating their different global divisions. Moreover, “they have all these repositories of knowledge in these niche areas and they are spread across multiple geographies, different segments of the organization, and across different technology platforms. All of that has to be brought together.” As most workflow/data management providers are small, they have trouble convincing top management at Big Pharma to bet big—enterprise-wide—on their tools. More likely, says Basu, is a partnering strategy with big players in business intelligence such as IBM or SAP, or perhaps acquisitions.

That still seems distant for several reasons. The life sciences share of the market has yet to reach the $200-million threshold that would be attractive to bigger players, although Frost & Sullivan projects adoption rates of 30-40% by Big Pharma. Moreover, biopharma IT and R&D are often decentralized. As Basu says, “pharma doesn’t work as one unified organization where the CIO comes in and says here’s what you’ve got to run. It doesn’t work like that.”

Pipeline Pilot 8.0 

Accelrys Pipeline Pilot supports more scientists and researchers working individually or as collaborative teams across the wider scientific R&D enterprise, according to Matt Hahn, Accelrys’ senior vice president of R&D. (Hahn was the co-founder and former CEO of SciTegic, which originally developed Pipeline Pilot before the firm was acquired by Accelrys in 2004.)

Pipeline Pilot was developed, as Hahn says, to “capture processes that scientists go through, allowing them to make decisions and understand data, without doing the manual steps over and over again. We understand the complex data of R&D driven companies—molecules, biology, sequence, image, textual data. We make it easy to integrate those data.”

Pipeline Pilot can be deployed through the Web or portal clients such as Microsoft Office SharePoint. “In many cases, clients don’t know they’re using Pipeline Pilot,” says Hahn, “It’s the ‘Intel inside.’”

Hahn says Accelrys has three major goals with Pipeline Pilot 8. “First, move laterally from research chemistry into biology. A significant thrust that we have is to position in biology as strong as chemistry.” Second is imaging. “The growing volume of [life sciences] data is largely image data. We need to understand images as easily as biological and chemical data.” And third, to provide a window into understanding scientific data, moving beyond life sciences into, for example, the material science area.

“Much of our effort has been to make the platform enterprise ready,” says Hahn. To that end, the new edition has improved scalability, so users can provide protocols that can be executed by larger groups, e.g. a Web-based report or running a project overnight.

The 8.0 release is Windows Server 2008 64-bit compatible. “Over 50% of our customers run on Windows servers,” says Hahn. “We also support GRID and cluster processing.” Accelrys has also added new capability to monitor remote servers and beefed up security. There is improved two-way integration with SharePoint as well as PerkinElmer’s Columbus (for image data management) and Thermo Scientific’s Cellomics (for high-content screening).

The Pipeline Pilot release follows the Accelrys merger with Symyx. “Symyx [is] a dominant data management company with an emerging electronic lab notebook business and applicable across various scientific domains. We have the middle piece of the analysis—the automation. Our strategy is to integrate the best of both product lines to create a new generation of products.” — Kevin Davies


Challenges Facing ‘Workflow’ Vendors

Former Accelrys marketing executive Jonathan Usuka is enthusiastic about the workflow tools deployed at his new company, Celgene, but here in his own words, he notes the substantial “mega challenges” confronting vendors’ ambitions.

1. M&As Plus Fewer Big Customers “There are two major dynamics. One is the M&A activity and the second is most new compounds feeding into the development pipeline are coming as in-licensing deals from biotech. As a vendor of these kinds of (workflow) systems, instead of hitting up your ten best customers, you’re now getting less business from what are now your five best customers and you have to navigate a whole constellation of small biotechs to get to the same number of users. The user base hasn’t grown; it’s just shifted.”

2. Off-Shoring of Chemical Synthesis and R&D “Pharma or biotechs are not doing their own chemical synthesis anymore—they’re outsourcing to CROs in China and India. So now, even if you are trying to make good chemical development decisions, you need to incorporate data from CROs in China, and make your decisions, in my case in San Diego. Neither of these tools (Accelrys or IDBS) or any of the others are really addressing that core challenge, which is communications between businesses, not communications between databases.”

3. The Open Source Challenge “There is a direct competitor to Pipeline Pilot called KNIME. This freeware does exactly the same things that InforSense and Pipeline Pilot do. It’s very good for academics. Pharmaceutical companies would be loath to depend on it because of the way pharma works. The problem [vendors] face with KNIME is that academics are active in giving back scientific expertise technology. If a component doesn’t exist for what they want done, they will build one. Commercial vendors don’t have that kind of flexibility of user base.”

4. SciBI or SciFi? Would Senior Execs Use It? “All of this technology is enabling good pharmaceutical development decisions—which compound should continue, which ones you should kill, are you developing the best compounds, etc? But the people who make those decisions currently do so entirely by PowerPoint! They sit in a room, have presentations from project team leaders, and then a decision eventually gets made. Would any of that be made more efficient if it were hooked up to real-time systems analysis software? Could a dashboard be built that would tell you answers? Is it better to have real time reporting than to have people build up PowerPoint presentations and make cases to you?”


This article also appeared in the July-August 2010 issue of Bio-IT World Magazine.
