Scientists and IT groups can create a return on their computing investments.
Dec. 17, 2007 | Several trends over the past 25 years have revolutionized how scientists and engineers work: PCs replacing minicomputers for data acquisition and analysis, high-level environments replacing Fortran for technical computing, and modeling and simulation playing a key role in embedded system development.
Those trends reflected scientists' thirst for more speed and the need to handle ever-larger data sets. It was never simply a question of CPU quantity or clock speed or addressable memory, but rather the user's ability to harness that compute power to do useful work. Nowadays, as I talk with biomedical researchers about their challenges and computing needs, I have a sense of déjà vu, as I see some familiar patterns mixed with some new twists.
Back in the centralized-computing days of VAX, Convex, Alliant, and Cray, the computer's OS and system administrators bore the administrative burdens while end users struggled at their terminals to set up their work. As standalone PCs took over, end users flourished: they controlled what their computer did, high-level environments enabled them to develop algorithms and analyze data without the need for low-level programming, and ever-faster processors gave them power boosts without additional effort. However, system administrators struggled to manage and support those dispersed computing activities.
At that time, it was groundbreaking to provide a technical computing environment that let end users work interactively while insulating them from hardware differences: the same work ran on a PC, workstation, or supercomputer without change. There was no need to recompile or figure out the byte order of the CPU with which you were working.
Today's landscape has some familiar aspects and other attributes that are new. End users still use their PCs, except they now have dual- and quad-core processors. Today's multiprocessor clusters feel a lot like the old mainframe/supercomputer paradigm, except cheaper and based on "standard" processors and operating systems. And end users still desire more speed, better data handling, and improved productivity.
One twist is the mixed composition of users in biopharma: biologists and chemists who rely on data and want turnkey, easy-to-use software; statisticians and mathematicians who need to create, refine, and deploy new algorithmic approaches; and computer scientists and programmers who want tools for rapidly generating production-computing applications that work on huge volumes of data.
Somewhere along the way from mainframes and minis to PCs to today, scientists and IT groups seem to have lost track of each other. End users crave speed and power but don't talk to their IT groups (except to badger them for a faster PC with more memory). Meanwhile, IT groups are buying more multi-core PCs (there really isn't any other option nowadays) that users can't fully take advantage of. They also set up server farms and then have to search in-house for projects and end users interested in using them. It is ironic!
Hidden in this bizarre situation is a very interesting opportunity. The multi-core PC enables end users to do parallel and distributed computing without impacting IT groups. The improved OS, scheduling, and administrative tools of compute servers (along with the fact that they're based on the same processors and operating systems as the PCs) mean that more compute power is available, and more affordable, than ever before.
Recent enhancements in technical computing environments provide greater consistency in how users can capitalize on today's variety of computing systems. High-level technical computing tools that can distribute work on a server farm, without the need for low-level message-passing programming, allow users to harness multiprocessor clusters for applications such as Monte Carlo simulations and sequence analysis. Scientists can also work with larger data sets, because the number of processors and the amount of memory can scale up. And these same environments can also take advantage of multi-core PCs, enabling end users to make full use of the computing systems at their disposal, with minimal changes to what they do on their own PC.
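To make the idea concrete, here is a minimal sketch of a Monte Carlo simulation split across workers without any explicit message passing. It uses Python's standard `concurrent.futures` module purely as a stand-in; it is not the technical computing environment discussed in this column. The function names (`count_hits`, `estimate_pi`), the chunking scheme, and the per-chunk seeds are all illustrative assumptions.

```python
import random
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Type


def count_hits(args: tuple[int, int]) -> int:
    """Count random points in the unit square that fall inside the quarter circle."""
    seed, n_samples = args
    rng = random.Random(seed)  # separate, reproducible stream per chunk (assumed scheme)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples: int, chunks: int = 4,
                executor_cls: Type[Executor] = ProcessPoolExecutor) -> float:
    """Split the simulation into chunks and let the executor schedule them."""
    per_chunk = total_samples // chunks
    tasks = [(seed, per_chunk) for seed in range(chunks)]
    # The executor decides where each chunk runs; no send/receive code is written.
    with executor_cls(max_workers=chunks) as pool:
        hits = sum(pool.map(count_hits, tasks))
    return 4.0 * hits / (per_chunk * chunks)
```

Called as `estimate_pi(400_000)` from a script's `if __name__ == "__main__":` block, the chunks run in separate processes on a multi-core PC. Passing `executor_cls=ThreadPoolExecutor` runs the same function on threads, which is useful mainly for testing. The point of the sketch is that the simulation code stays the same while the scheduling layer changes underneath it.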
While speeding up existing applications is unquestionably good news, the most exciting opportunity is still to come: providing tools so that algorithm and application developers can more easily create techniques that make explicit and optimal use of parallel-computing systems, whether the system is a dual-core PC or a server farm with hundreds of processors. The programming and system administration tools are reaching the point where end users can really focus on their problems and applications, taking advantage of available hardware without the need to deal with it explicitly.
But so what? Are you doing drug discovery, or advancing your understanding of systems biology, better than the next guy? Now you and your users have the opportunity to create that return on your investment in computing resources.
Jim Tung is a MathWorks Fellow. He can be reached at firstname.lastname@example.org.
Subscribe to Bio-IT World magazine.