June 8, 2011 | At the recent Molecular Medicine Tri-Conference (MMTC) in San Francisco*, an industry executive took me aside to ask: “Is Cloud computing ever going to take off?” The question implies an expectation that Cloud adoption should be happening faster, but the answer depends on how you define Cloud computing. If you consider the Cloud as “information-as-a-service,” then life sciences and biological research are indeed the early adopters. Web portals such as NCBI Entrez, the UCSC Genome Browser, and EBI Ensembl carry a tremendous computational load for academic and industrial research. Despite large investments in private bio-portals, researchers often prefer public portals because their information is more up to date and their user interfaces are better.
In contrast, many private portals have not kept pace. In response, pharma IT executives have launched the Pistoia Alliance, a pre-competitive initiative designed to sponsor innovation by defining interoperability standards for commercial tools (see, “The Italian (Informatics) Job,” Bio•IT World, Jan 2010). Pharma executives are giving each other permission to take the lead and outsource services they previously delivered in-house.
Commercial services such as CAS Registry, STN, Delphion, Micropatent, Ingenuity, GeneGo, and GenomeQuest provide information services in secure, private Clouds, offloading significant computational loads from in-house networks and servers. Taken together, the public Web portals and commercial services indicate a significant portion of the market has already moved to the Cloud.
End users don’t care where or how the information is computed and stored. A successful case study for Cloud computing is the Virtual Proteomics Data Analysis Cluster (ViPDAC) at the University of Wisconsin. The system runs protein identification both locally and remotely on Amazon servers, handling all data transfers, machine requisitioning and decommissioning, and parallelization operations for the end-users. This sounds exciting, so why aren’t all applications hosted in the Cloud?
Strong Head Winds
One reason is that the Cloud value proposition doesn’t solve the data integration problem facing discovery research, and may actually make the problem harder. A large pharma enterprise has, at a minimum, hundreds of terabytes of genomics, proteomics, and clinical data to integrate and tens of thousands of relational tables to manage. Even setting aside privacy and security concerns, it is hard to imagine how, logistically, these data would migrate to the Cloud.
The other source of resistance to Cloud computing is inherent in the architecture of the applications themselves. Since 2000, many applications have migrated from a client-server architecture to a Web-based one. Whereas client-server apps operate by downloading data to the client and processing it there, Web-based systems process all the data on the server and only render the results in the browser. For this reason, Web-based applications are more likely to migrate into the Cloud.
Next-gen sequencing (NGS) and high-performance Cloud computing are highly symbiotic. NGS produces reams of data; Cloud computing promises on-demand access to unlimited computing and storage. The obstacle to Cloud adoption remains continued reliance on sneakernets for big file transfers.
Proponents of Cloud solutions argue that so long as bulk data transfers are automated and occur within the cycle time of the sequencing instrument, automation prevails and everyone should be happy. However, most transfer time calculations consider bandwidth on the receiving end, not the fluctuating bandwidth on the sending end where most of the problems occur. Until corporate networks grow to accommodate sustained bulk transfers, sneakernet will remain the preferred method.
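The arithmetic behind that argument can be sketched in a few lines of Python. This is an illustration only: the run size, link speeds, and instrument cycle time below are hypothetical numbers chosen for the example, not measurements, and the function names are invented here. The point it shows is that the slower end of the path, usually the sender, determines whether a transfer fits within the sequencing cycle.

```python
def transfer_hours(data_gb: float, mbps: float) -> float:
    """Hours needed to move data_gb (decimal gigabytes) at a sustained mbps (megabits/s)."""
    if data_gb < 0 or mbps <= 0:
        raise ValueError("data_gb must be >= 0 and mbps must be > 0")
    bits = data_gb * 8e9            # gigabytes -> bits
    seconds = bits / (mbps * 1e6)   # megabits/s -> bits/s
    return seconds / 3600


def fits_in_cycle(data_gb: float, receiver_mbps: float,
                  sender_mbps: float, cycle_hours: float) -> bool:
    """True if the transfer completes within one instrument cycle.

    Throughput is bounded by the slower end of the path.
    """
    effective_mbps = min(receiver_mbps, sender_mbps)
    return transfer_hours(data_gb, effective_mbps) <= cycle_hours


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    run_gb = 500          # data produced by one sequencing run
    receiver_mbps = 1000  # bandwidth advertised at the Cloud end
    sender_mbps = 20      # sustained bandwidth actually available at the lab
    cycle_hours = 48      # time until the instrument finishes its next run

    print(transfer_hours(run_gb, receiver_mbps))  # about 1.1 hours
    print(transfer_hours(run_gb, sender_mbps))    # about 55.6 hours
    print(fits_in_cycle(run_gb, receiver_mbps, sender_mbps, cycle_hours))  # False
```

With these assumed figures, a calculation based on the receiving end suggests the run uploads in about an hour, while the sending end needs more than two days and falls behind the instrument.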
An emergent class of pharmaceutical R&D Cloud applications is slowly migrating computing cycles out of the enterprise and into the Cloud and the associated Web browser. Applications such as Surechem.com, Chemspider.com, and Metamolecular.com are pure-play Web applications taking aim at traditional pharma R&D applications. A stealth Web site, www.druggable.com, claims to provide a comprehensive and intuitive index of druggable targets, chemistry, experimental activity, crystallographic structures, and in silico docking predictions. Another site, Transparency Life Sciences, is a stealth open-source discovery enterprise.
Ironically, these Web applications, by adhering to strict programming rules, retrieve their data from the Web but perform much of their computation in the browser. Software is once again shifting computation from the server back to the desktop. Fortunately, the only one who needs to care is the programmer.
An informal poll of software executives at MMTC indicated that established firms currently host between 5% and 50% of their customers’ applications and data. All of the executives felt the trend toward the Cloud was rising in their market spaces. Taking into account the evolution of software programming methods, the integration challenges of pharma R&D, and the networking hurdles inherent in remote computing, my sense is that the transition to the Cloud is on schedule and will continue its upward trend for the foreseeable future.
Ron Ranauro, former CEO of GenomeQuest, is the founder of Next-Gen Informatics. He can be reached at firstname.lastname@example.org.