Strategic Insights: No Researcher Left Behind

By BIO-IT World

A new crop of user-friendly cluster productivity tools targets the scientist

BY SALVATORE SALAMONE
 

February 15, 2005 | As clusters take on more of the computational chores within life science organizations, the challenge for researchers is how to make sure software is running efficiently on their systems.

Many open-source and commercial diagnostic tools can probe a cluster's performance. But virtually all of these tools are designed for use by the experienced software developer. This quarter, Engineered Intelligence, Microway, and PathScale are bringing new tools to market that are aimed at the scientists themselves.

The new tools are part of a class of tools that Art Wieboldt, marketing manager for IBM Deep Computing, and others call productivity software. "This is a layer of software that [functionally] sits above the operating environment and management layer software," Wieboldt says.

"It's the next layer [of software] up the stack from operating environment tools such as debuggers, compilers, and math libraries," Wieboldt says. He notes that productivity software includes such things as interconnect management tools, trace analyzers, performance tuning tools, and parallelization tools.


Changing Requirements 
The need for new tools — especially ones that work at Wieboldt's productivity layer — is being driven by two major life science software development trends.

First, there is the porting (to clusters) of software that used to run on SMP Unix mini-computers and high-end workstations. Industry experts say there is a large installed base of such software (much of which was custom written) within the life sciences.

The second trend is to take software that was written to run on a single Linux server and move it to a cluster.

In both areas, the big issue is how to get software that was designed to run on a system with tightly integrated memory, processors, and data management to run efficiently on multiple independent cluster nodes.

For several years, application developers and programmers have had a bevy of tools that helped in this area. For example, vendors such as Verari Systems Software (formerly MPI Software Technology), Scali, Intel, and Scientific Computing Associates have been offering tools that help parallelize applications, tune performance, and manage the interconnection of server nodes and storage systems.

The new tools from Engineered Intelligence, Microway, and PathScale work in these areas, but with a twist. The tools are some of the first designed specifically with the needs of a researcher — and not necessarily the experienced application developer or programmer — in mind.

One distinction between these new tools and what has typically been available is ease of use. "I have been using trace analyzers for years," says Martin Cuma, scientific applications programmer at the University of Utah. In some cases, the newer tools simply handle mundane tasks that were required to use the older tools. For instance, Cuma notes that with some trace analyzer tools he would have to run the software on each node and then gather the data to look for patterns. He is now using PathScale's OptiPath MPI Acceleration Tools, which carries out performance analysis of applications that use Message Passing Interface (MPI) techniques to run in a distributed mode on a cluster.

PathScale says that the OptiPath software is easier to use than many open-source performance analysis programs. For instance, the software automates many manual tasks (e.g., running test programs in multiple nodes and aggregating the information in one place) that typically are required to diagnose a cluster performance problem.
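To make the aggregation step concrete, here is a minimal Python sketch of the kind of manual bookkeeping such tools automate: merging timing records collected separately on each node into one cluster-wide summary. It is an illustration only and does not reflect how OptiPath works internally. The record format, the node names, the sample numbers, and the function name aggregate_traces are all hypothetical.

```python
from collections import defaultdict


def aggregate_traces(per_node_records):
    """Merge per-node (call_name, seconds) records into one summary.

    Returns (totals, mpi_fraction):
      totals       -- seconds spent in each call, summed over all nodes
      mpi_fraction -- per node, the share of recorded time spent in
                      calls whose name starts with "MPI_"
    """
    totals = defaultdict(float)
    mpi_fraction = {}
    for node, records in per_node_records.items():
        node_total = 0.0
        mpi_time = 0.0
        for call_name, seconds in records:
            totals[call_name] += seconds
            node_total += seconds
            if call_name.startswith("MPI_"):
                mpi_time += seconds
        mpi_fraction[node] = mpi_time / node_total if node_total else 0.0
    return dict(totals), mpi_fraction


# Hypothetical sample data: what a scientist might otherwise collect
# by hand from two nodes.
SAMPLE = {
    "node0": [("compute", 8.0), ("MPI_Send", 1.0), ("MPI_Recv", 1.0)],
    "node1": [("compute", 4.0), ("MPI_Send", 1.0), ("MPI_Recv", 5.0)],
}

if __name__ == "__main__":
    totals, mpi_fraction = aggregate_traces(SAMPLE)
    for node, fraction in sorted(mpi_fraction.items()):
        print(f"{node}: {fraction:.0%} of recorded time in MPI calls")
```

In this made-up data, the second node spends most of its recorded time waiting in a receive call, the sort of pattern that is hard to spot when each node's output is inspected in isolation.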

The PathScale OptiPath MPI Acceleration Tools software has been in limited distribution for several months; it will be commercially available this spring.

In a similar ease-of-use vein, Microway's MPI Link-Checker offers performance analysis features that a developer could use, but the tool presents that information in a way that helps scientists troubleshoot problems and tune and optimize an application's performance.

In the past, performance monitoring was often done using benchmark programs alone. But this approach has limitations. For instance, if an application underperforms, benchmarking software typically cannot tell if the problem is a single bad cable, a systems-wide problem, or poorly written application algorithms. And even when a benchmark program can isolate such problems, the user must sift through data to determine the root cause of a problem.

The MPI Link-Checker runs an MPI application on each node and then measures latency and bandwidth between all of the computational nodes of a cluster. The software then collects the results for all the nodes and plots the latency and bandwidth between all pairs of nodes in the cluster. The visual display of the data makes it easy to identify problems. For instance, if a single cable, network interface card, or node is bad, the program's display flashes a yellow background on the bad node to point out the problem.
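The idea of reading an all-pairs measurement for trouble spots can be sketched in a few lines of Python. This is not Microway's algorithm; it is a simple illustration under stated assumptions: latencies are already measured and stored in a symmetric matrix, the median link is treated as healthy, and a link counts as slow when it exceeds the median by a chosen factor. The function names, the factor of 3, and the sample values are all hypothetical.

```python
from statistics import median


def symmetric_matrix(n, base, overrides):
    """Build an n x n latency matrix with `base` everywhere off the
    diagonal, then apply {(i, j): value} overrides in both directions."""
    m = [[0.0 if i == j else base for j in range(n)] for i in range(n)]
    for (i, j), value in overrides.items():
        m[i][j] = m[j][i] = value
    return m


def classify_links(latency_us, factor=3.0):
    """Return (bad_nodes, bad_links) from a symmetric latency matrix.

    A link is slow if its latency exceeds factor * median latency.
    A node is bad if every one of its links is slow; remaining slow
    links that touch no bad node are reported individually.
    """
    n = len(latency_us)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return [], []
    baseline = median(latency_us[i][j] for i, j in pairs)
    slow = {(i, j) for i, j in pairs if latency_us[i][j] > factor * baseline}
    bad_nodes = [
        node for node in range(n)
        if all(p in slow for p in pairs if node in p)
    ]
    bad_links = sorted(
        p for p in slow if not any(node in bad_nodes for node in p)
    )
    return bad_nodes, bad_links


if __name__ == "__main__":
    # Hypothetical six-node cluster: node 2 is slow on every link,
    # and the 0-4 link is slow on its own.
    overrides = {(2, k): 50.0 for k in range(6) if k != 2}
    overrides[(0, 4)] = 40.0
    matrix = symmetric_matrix(6, 5.0, overrides)
    print(classify_links(matrix))
```

Because the baseline is the median of the measured links, this sketch can separate a bad node from a single bad link, but it cannot detect a system-wide slowdown in which every link is equally degraded. That case would need comparison against an expected figure for the interconnect.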

Microway originally used the tool to test its own systems before shipping them, and also provided it to its distributors and partners. After feedback from partners, systems vendors, and others, Microway decided to commercialize the tool. The MPI Link-Checker was announced late last year and will be commercially available this quarter.


Test Driving Applications
The PathScale and Microway products can help troubleshoot, optimize, and tune clustered applications' performance. But before such testing is required, an application must be running on a cluster. And that's where the Engineered Intelligence tool comes in.

Labs looking to move applications onto clusters often need help porting software that ran on a single machine to the distributed cluster environment. Many organizations do not have the technical skills to do this and must hire developers to perform the porting.

Engineered Intelligence's CxC tool helps a scientist develop a parallel application on his or her desktop computer, and then helps deploy that application to a cluster. Using virtual machine technology (in a similar way that Java applications run on virtual machines), CxC lets a scientist use a desktop computer to prototype and test a program intended for a cluster.

Essentially, the CxC software lets scientists define a parallel computer environment. In that environment, the scientist can then run existing programs or create new applications and test them on a virtual system with a pre-defined number of nodes. (CxC can work with programs written in C, C++, or Fortran.)
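The general idea of trying out a parallel decomposition on a desktop with a chosen number of simulated nodes can be illustrated with a short Python sketch. It does not use CxC or reflect its virtual machine technology; here each "virtual node" is simply a thread in one process, and the names run_on_virtual_nodes and partial_sum_of_squares are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_virtual_nodes(data, num_nodes, node_task, combine):
    """Split `data` across `num_nodes` simulated nodes, run
    `node_task(rank, chunk)` on each, and combine the partial results."""
    # Round-robin distribution: node `rank` gets every num_nodes-th item.
    chunks = [data[rank::num_nodes] for rank in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_task, range(num_nodes), chunks))
    return combine(partials)


def partial_sum_of_squares(rank, chunk):
    """Work done by one simulated node on its own slice of the data."""
    return sum(x * x for x in chunk)


if __name__ == "__main__":
    values = list(range(1, 101))
    # The answer should not depend on how many nodes are simulated.
    for nodes in (1, 4, 8):
        result = run_on_virtual_nodes(
            values, nodes, partial_sum_of_squares, sum
        )
        print(f"{nodes} virtual node(s): {result}")
```

Running the same program with different node counts and checking that the result does not change is a cheap way to catch decomposition mistakes before any cluster time is used. Threads in one process say nothing about real interconnect costs, so this checks correctness rather than performance.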

CxC and the other new, easier-to-use productivity layer tools are part of a general trend in high-performance computing. "There's been a maturing of cluster computing," says Michael Swenson, research manager at Life Science Insights. "There's an expanding base of users for clusters."

In the past few years, the increased use of clusters has focused attention on the operational management of the systems. Today, Swenson says, "you see cluster management tools maturing; IT vendors offering packaged, pre-loaded small cluster systems for the life sciences; and improving tools and compilers for developing parallel applications."







