By Allison Proffitt
November 10, 2009 | Giles Day, senior director of informatics at Pfizer’s Biotherapeutics & Bioinnovation Center (BBC), has been experimenting with Amazon Web Services (AWS). “It really has transformed the way we do a lot of our work,” he said last April at the Bio-IT World Expo High Performance Computing & Storage workshop. “The pay-as-you-go model for us is really very, very nice.”
The line was echoed by Andrew Kaczorek, associate information consultant with Eli Lilly. “The amount of time it spends and the amount of dollars it costs are entirely predictable. If [researchers] know exactly what workload that they’re pushing into it, they can know exactly when it should be done and how much their bill is going to be.”
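Kaczorek's point about predictability comes down to simple arithmetic: for a fixed, embarrassingly parallel workload on pay-as-you-go instances, both the finish time and the bill follow directly from the instance count and hourly rate. The sketch below is an illustration only; the job size, instance count, and price are invented for the example, not Lilly's actual figures.

```python
# Illustrative only: estimate wall-clock time and bill for a fixed,
# embarrassingly parallel workload on pay-as-you-go instances.
# All numbers below are made up for the example.
import math

def estimate_run(total_cpu_hours, n_instances, price_per_instance_hour):
    """Return (wall_clock_hours, total_cost), assuming perfect scaling
    and per-instance-hour billing rounded up, as EC2 billed in 2009."""
    wall_clock = total_cpu_hours / n_instances
    billed_hours = n_instances * math.ceil(wall_clock)
    return wall_clock, billed_hours * price_per_instance_hour

# A 2,000 CPU-hour job on 100 instances at a notional $0.10/hour:
hours, cost = estimate_run(2000, 100, 0.10)
print(f"done in {hours:.1f} h, bill ${cost:.2f}")  # → done in 20.0 h, bill $200.00
```

Doubling the instance count halves the wall-clock time while leaving the bill essentially unchanged, which is exactly the trade-off researchers can reason about in advance.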
Kaczorek reported that Lilly was running clinical trial optimization in Amazon’s EC2 (Elastic Compute Cloud) and proteomics workflows—the company’s heaviest compute consumer—with the help of Cycle Computing’s CycleCloud and hoping to take advantage of Condor’s scheduling capabilities (see p. 28). “Since then,” Kaczorek told Bio•IT World, “we’ve gone after ten other internal applications. They cover a lot of different areas including proteomics, bioinformatics, statistics, [and] adaptive trial design.”
Day reported running Pfizer's Rosetta macromolecular modeling in the cloud to determine antibody-antigen docking, using about 500 EC2 instances and reducing a typical experiment from 48 hours to about 3.5 hours. Day explained that the Rosetta modeling is done in EC2 and all results are stored in S3 (Simple Storage Service); the results are then scored, ranked, and filtered in EC2, and returned to S3. Because bandwidth between EC2 and S3 is free, bringing the compute to the data saves the company significant money.
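The score-rank-filter pass Day describes can be sketched in a few lines. This is a generic illustration, not Pfizer's actual pipeline: the field names and the scoring convention (lower is better, as with Rosetta energy scores) are assumptions, and in practice each EC2 worker would fetch its inputs from, and write its outputs back to, S3.

```python
# Hedged sketch of the score/rank/filter pass on docking results.
# Field names and the lower-is-better score convention are assumptions
# made for this example.

def rank_and_filter(results, keep_fraction=0.1):
    """Sort docking decoys by score (ascending) and keep the best slice."""
    ranked = sorted(results, key=lambda r: r["score"])
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# Hypothetical decoys from a docking run:
decoys = [
    {"decoy": "ab_001", "score": -212.4},
    {"decoy": "ab_002", "score": -198.7},
    {"decoy": "ab_003", "score": -230.1},
]
best = rank_and_filter(decoys, keep_fraction=0.34)
print([d["decoy"] for d in best])  # → ['ab_003']
```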
Johnson & Johnson Pharma R&D is hosting a couple of analytical solutions in the cloud, specifically NONMEM applications. “We built AWS scripts for configuring node images so that we can scale up or down in order to run large simulation jobs, which can take advantage of an on-demand cloud architecture,” says Rick Franckowiak, J&J’s information technology director.
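The scale-up-or-down decision behind such scripts reduces to picking a node count that clears a batch of simulation runs by a deadline. The helper below is a hypothetical sketch under a perfect-scaling assumption, not J&J's actual AWS scripts, and the job counts and timings are invented.

```python
# Hedged sketch: choose an on-demand node count for a batch of
# simulation jobs that must finish by a deadline. Perfect scaling
# is assumed; all numbers are invented for the example.
import math

def nodes_needed(n_jobs, hours_per_job, deadline_hours, jobs_per_node=1):
    """Smallest node count that clears the batch within the deadline."""
    node_hours = n_jobs * hours_per_job / jobs_per_node
    return math.ceil(node_hours / deadline_hours)

# 400 one-hour simulation runs, due in 4 hours:
print(nodes_needed(400, 1.0, 4.0))  # → 100
```

The same function answers the scale-down question: when the deadline is relaxed or the queue shrinks, it returns a smaller count, and the surplus on-demand nodes can simply be released.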
Of course, using just a few hundred instances is not the end goal. Day put it frankly: “[Pfizer] could quite happily suck up many thousands of Amazon instances without any problem at all.”
“Within the next year, I’d like to see a handful of applications going into a production state,” says Franckowiak. “Generally we are looking at low risk applications, but I do see us at a point soon where these are ready for the production state. I would like to get to a point where we have a documented approach as to how we want to leverage cloud infrastructure for meeting our business objectives. I would also like to have a solid framework with respect to security and guidelines highlighting which solutions make sense in a cloud environment, which ones don’t, and a little background as to why, so we can get beyond simply selling the concepts.”
Once a few applications are running successfully, Franckowiak hopes to “start looking at what it will mean to scale at an enterprise level.” He sees the potential of the cloud increasing exponentially. “Right now, we’re dealing with a small number of opportunities on the cloud. What if that doubles or triples or quadruples or goes beyond that for a large portion of J&J?”
Kaczorek is running about ten applications, and while he doesn't have hard goals for the amount of workload he'd like to put into the cloud in the next year or two, Lilly is "moving forward aggressively, both inside high performance computing and outside." Kaczorek mentions doing non-research and discovery work in the cloud, "qualification of servers, collaboration utilities, and stuff like that."
Kaczorek says one or two of the new applications on the cloud couldn't have been run internally. "Those are workloads [where] the amount of processing we do directly reflects on the precision of the data that the researchers get."
Even though the results are promising, it is still early days. “We choose to be early because we see such great value in it, but with that comes a bit of added complexity,” concedes Kaczorek.
Franckowiak reported in April that cloud computing was a challenging sell to business partners, and the intervening months haven't eased the pressure much. His team is still working to address the security concerns of those who view the cloud as "a very insecure, wild west environment." He says the view is often that using cloud resources means "provisioning extra boxes that Amazon has lying around." He does see a shift in attitudes, though, as more education is done about how the model works and as on-disk, in-memory, and across-the-wire encryption are put into place.
Day reported that getting data in and out of the cloud has been a problem for Pfizer, but not from Amazon's end. Pfizer's corporate IT group closed the Amazon ports at one point. "They didn't think it was an experiment, they thought we were, I don't know, storing MP3 files on S3 or something!"
Day also noted that the cloud isn’t all that easy to use. “Without Adam [Kraut, consultant with the BioTeam], I think we would have failed miserably in our attempts to use EC2 because it’s just not that easy.” Kaczorek agreed, but predicted that “it’s going to get easier.”
Furthermore, said Day, "We find it pretty hard to get metrics out of the cloud. What jobs are running, how long they're taking… shutting down your instances and just general management of the work that's going on on the cluster is just not as simple as you think it should be."
All of that is changing though, and quickly. Still, Franckowiak believes that truly automated provisioning of compute power, letting the customer manage, configure, and deploy virtual environments, is not quite here yet.
“When we talk about cloud now, we’re talking about Amazon AWS, we’re talking about EC2 and S3,” said Kaczorek in April. “That’s not going to be the case pretty soon. And we don’t really want it to be the case pretty soon. As much as we like Amazon, we want the ability to not just move things around to different clouds, but the ability to move machines, workloads, from our internal environment transparently out to Amazon or any other kind of cloud provider. We’d love to get away from the idea of synchronizing file systems, doing replications.”
Kaczorek says that Lilly is keeping its options open, and exploring ways of limiting the technical differences that might tether the company to one vendor. Lilly wants to be able “to leverage clouds both internal and external,” says Kaczorek.
Vendors in the space are eager to please. Franckowiak says that many cloud vendors, including Amazon, have been very willing to address specific security concerns. Kaczorek said that the cloud ecosystem is so new that providers have been receptive to both his feedback and his challenges, even more so than vendors in established IT industries.
Still, the call is for more security and accessibility. "I'd say to the vendors, 'Make it easier to do what we do without the complexity of the technical nuts and bolts,'" says Kaczorek. "The bottom line of all of this is, we want to simplify the experience of HPC for our end users… they just want to push workload in and be done so they can do what they're paid to do."
But the burden doesn’t lie solely with vendors. Ramping up will require changes to pharma’s internal environments. “Part of this is pulling apart the pipeline that we have and trying to optimize it to be more loosely coupled,” says Kaczorek. Franckowiak agrees. “We’re still kind of early on in understanding how this fits with our traditional infrastructures.”
But even with challenges, none of the early adopters are wavering on their commitment. “The risks are becoming manageable,” says Franckowiak, “and a lot of that is the result of us dealing with these vendors and supplying them with our requirements and the things that concern us.” His advice: “Keep an open mind, do your homework, make sure you understand what you’re getting into, and take it one step at a time.”
Lilly views cloud computing as a clear way to improve the quality of research, and is looking forward to "the trickle-down effect of the innovation that comes from being able to compute with no boundaries," says Kaczorek. "To actually be able to solve problems we would never have been able to solve before! I think that some of that, maybe a year ago, was just optimism. I think now we're actually starting to see that become a reality."
This article also appeared in the November-December 2009 issue of Bio-IT World Magazine.