Cloud-based portal for easy collaboration could seal the deal for pharma.
By Kevin Davies
March 29, 2011 | Recent discussions with several big pharmas, including Sanofi-Aventis and Pfizer, have convinced software architect Karim Chine that big pharma is ready for the Cloud. That’s good news for Chine, who is the designer of an exciting Cloud-based portal called Elastic-R for data analysis and collaboration.
Elastic-R, a secure scientific Web computing platform and a portal for virtual and collaborative computational science, consists of some 250,000 lines of code that Chine, who is based in Cambridge, U.K., has written in five languages over the past three years. “R is becoming the lingua franca of data analysis,” says Chine. “Elastic-R allows you to run anything useful near the data, and expose to the client (your browser, Excel, Word, etc.) capabilities needed to take control.” Chine’s Elastic-R application “wraps the Cloud with capabilities that allow you to make the Cloud work for the user’s needs and software.”

He argues that pharma companies need a framework for statistical drug discovery applications. “The idea of having a consistent platform for data analysis and scientific computing is a big need for many large pharma companies,” says Chine. For example, discussions with Oliver Gigonzac (Sanofi-Aventis) and others indicate that pharma has “plenty of applications related to drug discovery and data analysis, but it’s made of bits and pieces, heterogeneous platforms and technologies. They’re looking for a consistent infrastructure.”
Cloud computing stands to change research practices dramatically, both in terms of cost and in providing new capabilities and flexibility. Provisioning a new machine in the Cloud takes a matter of minutes. “Keeping track of the software and the stack of capabilities used for processing data can be archived at any time. You can trace what you’ve used for data analysis,” says Chine. Moreover, with so many high-throughput computing challenges in the life sciences, the Cloud is ideally suited to tackling highly parallel computational problems.
The problem, as Chine sees it, is taking the data to the Cloud. “Amazon offers capabilities, but other companies allow you to take the data there. Once your data are there, you can benefit from the Cloud elasticity and infinite resource capacity. Elastic-R is there to make running massively parallel computations as easy as adding a formula to a spreadsheet cell.”
Chine’s priority is facilitating real-time collaboration using the Cloud. “Imagine being able to do data analysis in a virtual environment, where we can interact with the data, build the analysis, and look at the data together, keep track of the procedure and make it reproducible.”
For Pfizer and other pharma companies, collaboration is a recurring problem. There is no shortage of tools that let people collaborate on documents, but, says Chine, “Nobody, including SAS, offers help for real-time collaboration as obviously as Facebook or Google Docs. That ubiquity is now possible with the pervasiveness of HTML5-enabled browsers and virtual environments such as Elastic-R. The Cloud is the back end of this new generation of ubiquitous applications.”
“When Microsoft talks about the Cloud, it’s not just SaaS but also Cloud services that are going to empower any Microsoft app on your desktop and get it to talk to Azure. But to make this happen for scientific applications, there’s a need for a new type of software to interact seamlessly with Amazon, for example, and let your Excel spreadsheet compute with a Cloud and work with various data analysis tools... You need a ubiquitous software platform that lives in the Cloud that delivers those capabilities. Elastic-R is a precursor of such platforms.”
Chine believes in taking the computation to the data rather than taking the data to the computer. “The idea of Elastic-R is to create smaller entities to run near the data. That entity looks into data and obeys your orders... Select Cloud, select service as a virtual machine that you’ll pay for hourly, then click: that action will bring to life a server that offers you whatever capability you need. It’s SaaS based on your own private appliances, delivered on your own Cloud platform.”
A query might be: transform this metric, normalize this data, or process this image. Elastic-R brings the aggregated results back to the spreadsheet or document. Any researcher can, in a few clicks, get browser access to a set of tools commonly used in bioinformatics, such as R, Perl or Python. Indeed, Chine says his project could just as easily be called “Elastic-Python or whatever.”
Elastic-R was initially developed as a fully open-source project (the Biocep-R project, www.biocep.net), but because he lacks sufficient grant funding, Chine is targeting a model in which the www.elastic-r.org portal will be free to academics while commercial users will require a subscription. One early user, BD, is considering deploying Elastic-R both on a private cloud and on Amazon EC2. In addition, Chine says that the Cambridge Centre for Scientific Computing has expressed interest in setting up a dedicated Elastic-R portal for Cambridge University researchers.
“I hope one day to have thousands of scientists, educators and students as users and to have a large number of pharma companies as subscribers,” says Chine. “The Cloud is a huge opportunity for countries and researchers that don’t have facilities. Imagine a lab in Tunisia needing to process or work on genomic data. If that data is available on Amazon, and if they can use the Cloud easily and freely via Elastic-R, which was designed to also work over a slow internet connection, that will open their horizons and facilitate large collaborations. The aim is to have a social impact.”
This article also appeared in the March-April 2011 issue of Bio-IT World Magazine.