The New England Computing Revolution



REvolution Computing brings Parallel R to life sciences.

By Kevin Davies

July 14, 2008 | With the exponential growth in data, researchers and clinicians are finding it increasingly challenging to run analyses in real time. REvolution Computing, a New Haven-based next-generation analytics software company with a background in parallel computing, is quietly helping life scientists who use the open source R software do just that.

Less than three years ago, REvolution Computing began deploying parallel solutions around R software, which was developed as an open source statistical system, “to fill the gap in the wake of proprietary systems in business intelligence such as SAS and S+,” says CEO Richard Schultz.

REvolution Computing “has a long history of being able to make software run very fast,” says Schultz, whether it be in life sciences or energy exploration. The rapid growth in R’s popularity and market share enables the community of statisticians and bioinformaticians “to contribute research back into [the] project, to see modernization happen in real time.” Indeed, where SAS dominated ten years ago, Schultz says the textbooks are “all R now. That speaks to the pace at which this system has been applied.”

Open Source Model
REvolution’s open source business model around R is similar to the strategies of companies such as MySQL, JBoss, and Red Hat. REvolution is providing commercial support for a user base of 1 million and growing, many of whom want to know, “How do I take this research tool and scale on the clinical side?” Schultz points to the volumes of clinical data (MRIs, gene expression) “that weren’t even storable ten years ago. That’s the future. Mapping that data in real time onto a set of solutions.”

“The benefits of open source have to do with modernizing a challenging and difficult platform,” says Schultz. “In statistics, this is a critical point. The legacy players, their solutions have been around for 40 years. That kind of software had difficulty scaling to modern kinds of usages. Existing software just can’t keep up.”

Without the open source approach, it would take “hundreds of researchers to keep this up,” which Schultz terms “a very difficult proposition.” This way, “we take a worldwide community, the thought leaders, often with leading academic positions, and get them to contribute to the project and give back to the community at large.” He draws parallels with the open source goals of the Human Genome Project (HGP), “furthering society and drug discovery. That’s why we got engaged in this aspect of the project.”

REvolution Computing started working with Pfizer about two years ago, says Schultz, and Pfizer now has “a huge user base of R.” But R was not designed to take advantage of more than one processor core. As it turned out, REvolution had a scalability solution ready. “Our team had been doing research on multi-core computing,” he says. “Can you marry high performance technology with the community’s work on R? We set out to develop Parallel R, a high-performance version of R.”

Schultz claims that Pfizer got a “150-fold speed up on their cluster in one week” (see “Pfizer Partnership”). Moreover, “work that would take six months can be done in a day.” REvolution has quietly been expanding its list of life science customers. The company has also worked with Novartis, Merck, Bristol-Myers Squibb, and other top-15 pharmas. “Beyond that, our software is widely used, sometimes in ways we don’t even know.”

Schultz sounds a little put out when I ask how pharma companies found his company. “For folks looking for high-performance computing, we have a reputation as being leaders in the ability to address very difficult problems.” In fact, REvolution worked on the Star Wars project, and has worked on financial applications and with other major IT companies such as Cray. “I hesitate to say, we’re a very big fish in a small pond!”

Once REvolution began deploying high-performance R solutions, it was soon faced with much broader R questions, such as: How do I support it on my cluster? What do I do with huge gene datasets, or clinical trial data? New challenges in cancer and diabetes are emerging, he says.

Intel Invests
The only external investor so far is Intel. The chipmaker’s interest comes from two perspectives. First, Intel is a leading provider of multi-core hardware, but there’s a dearth of software able to take advantage of the increased performance. “They saw the synergy,” says Schultz. Second, Intel has “a very active open source software group. The R Project reached their radar. Our combination of multi-core and open source R was right in the sweet spot of what they found interesting.”

As a strategic partner, Intel also gives REvolution access to scale. “As our user base grows, our ability to support that is extremely important. Making sure we have the infrastructure to solve today’s problems, rather than yesterday’s, means we’ll have continued investment in R&D. The Intel backing provides further assurance we’ll be the leader,” says Schultz.

“We hope to get ahead of our customers, in the same way the hardware has gotten in front of the software,” Schultz continues. “Everybody’s got a multi-core machine but most software only runs on one of those cores. I’ve got a dual-core Apple laptop in front of me, but I run Word on just one core. Our job is to flip that equation back to the other side, and make the tools the statisticians and clinical researchers utilize that much more powerful.”

REvolution is gaining traction in other verticals, especially financial services: anywhere R is used with lots of data or simulations. But, says Schultz, “life sciences will always be near and dear to us.”

Pfizer Partnership

At the Bio-IT World Conference and Expo last April, REvolution Computing announced results of a benchmark study, conducted with Pfizer, on chemical classification data on quad-core AMD systems. The groups looked at caretNWS, a parallel version of caret implemented using Parallel R, for drug safety studies. (The caretNWS package is publicly available at www.cran.r-project.org.)

Working with Pfizer, REvolution Computing parallelized Pfizer’s caret software. caretNWS provides parallel processing functionality that reduces the computational time needed to build models without sacrificing model quality. The research showed that caretNWS accelerated the analysis of large data sets, helping narrow the pool of candidate molecules for new drugs and increasing the efficiency of drug development.

“We were able to improve our ability to bring new medicines to the market quickly,” said Max Kuhn, Pfizer’s associate director of non-clinical statistics. “caretNWS is an asset in the battle against the rising costs associated with new drug development, which is why this is available on a broad, public basis. The ability to conduct large data analysis across multi-core processors represents a significant benefit for drug discovery and development.”

caretNWS was used to predict the safety profile of compounds, specifically carcinogenic side effects in potential drugs. These models can also eliminate the expensive and time-consuming process of studying large numbers of potential compounds in the physical laboratory. --K.D.

___________________________________________________

 

This article appeared in Bio-IT World Magazine.
