April 14, 2006 | Bioinformatics and related information technologies are undergoing a paradigm shift: the prototypical user is changing from a very small set of highly specialized individuals operating fantastically expensive, custom supercomputing systems to all scientific researchers working on large systems that can be purchased affordably off the rack. This dramatic change in user profile requires the producers of these technologies to respond with a level of “ease of use” suited to a large-volume, low-cost market and a less specialized user.
If you read your computing history (or are old enough to remember it), you know that research computing has been through a similar paradigm shift once before. In the early days of computer-aided scientific research, scientists did all of their own programming, data entry, computer operation, and computer repairs. Because these computers were so costly, it was considered of utmost importance to keep them in continual use. This led to the development of the first operating systems, which were essentially batch processing programs: instead of operating the computers themselves, scientists would put their programs onto punch cards and turn them over to a computer operator. The computer operator, a semi-skilled laborer, would load each program, run it, and collect the results for the scientist. Because the costs of the computer and the programmer were so large relative to the cost of data-entry personnel, programs were written for the convenience of the computer and the programmer rather than for the computer operator. This approach of having the user conform to the computer and the programmer made a lot of economic sense at the time, but it persisted long after the prices of computers and programmers had dropped and the economic model had reversed. We all know what happened there, right? Computers became sufficiently inexpensive that anyone could own one (or more), and the human-computer interface became simple enough that children could operate them.
Bio-IT is repeating this computing history. In recent years, because of the high cost and novelty of supercomputing for bioinformatics research, it has been common to see a Ph.D.-level scientist up to his or her elbows in supercomputing hardware infrastructure design, mucking with RAM, disks, CPUs, and networking. Computing hardware costs have since dropped so far that supercomputing resources are well within reach of small academic research laboratories, yet the researcher is still required to conform to the computer. The ease of use is not there yet. A large number of shared network services must be installed and configured by IT professionals to transform the many discrete computers into a single virtualized compute resource, and the scientists are still plunking away at the UNIX command line, writing their own data analysis algorithms and high-performance parallelization and batch processing routines. This reminds me of roughly the point in time just after I bought my first personal computer (a Timex Sinclair 1000, with a BASIC command-line shell) and just before I bought my fourth (an Atari 520 ST, with a point-and-click graphical user interface). That is to say, scientists can afford to buy the computer but still have to work quite hard, through a clumsy interface, to get their work done.
What must and will occur in bio-IT is an advance in ease of use equivalent to the transition from the BASIC command-line shell to the point-and-click graphical user interface. I’m not at all sure when this will occur or what the solution will ultimately look like, but I know the ease of use I’m looking for. Massive and massively scalable compute resources will be self-assembling. I’m not talking about nanobots that will do the racking and stacking; rather, these resources will require zero configuration of any shared network services and involve no more than placing two or more computers on the same network. Whether I’m performing some data analysis as a single-threaded process on my laptop or as a highly parallel, high-performance analysis executing over many discrete remote systems, I as a user won’t know or even care. I will simply get the results I’m looking for, right now. All of my data analysis algorithms will know the data types they accept and the data types they produce and will thereby interoperate, and each of these algorithms will be accessible directly from my desktop and from within my desktop productivity tools. The ease of use I’m looking for is for bioinformatics research tools to be as simple to use as my iPod.
E-mail Bill Van Etten at firstname.lastname@example.org.