Our lives are shaped by those byproducts of the space race, from CAT scans and kidney dialysis, to satellite communications, advanced weather forecasting, and fuel cells. The same challenge that inspired stirring feats in rocketry and space flight is responsible for candy wrappers and cordless power tools, not to mention the dubious cook/chill concept for serving delicious airplane food.
Our generation is challenged with a mission no less ambitious (or outrageous) than the moon: to leverage the full potential of genomic knowledge to revolutionize how we cure disease. There is every reason to believe that we, too, will meet our goal and benefit from scientific and technical innovations that must necessarily come of such an endeavor.
The IT industry will clearly be one of the primary beneficiaries of this race to exploit the genome. The emerging computation requirements and data complexity related to R&D are pushing the current boundaries of many different technologies within the IT industry. Following are illustrations of just some of the technology areas in which IDC expects to see accelerated growth.
Large shared-memory server architectures: Much of genomic and proteomic computing makes fewer demands on raw processor speed than it does on the input/output required to move very large amounts of data (terabytes and more) from one place to the next. As soon as a researcher goes beyond microarray analysis and into metabolic and signal transduction pathway simulations, for example, the complexity of the modeling task increases precipitously. Many researchers say that a new class of large shared-memory supercomputers is critical for genomic comparison and assembly applications, casting doubt on the premise that "commodity clusters" will suffice for all future biological approaches. IDC believes that both computational environments will likely play a role in bioscience workloads. The real question is whether the current economics will allow high-end niche supercomputer suppliers to invest in new classes of systems (see June Bio·IT World, page 50).
Clusters and grid computing: That said, a significant portion of life science computational workloads remains "embarrassingly parallel," i.e., they tend to consist of running a series of mostly independent jobs with little or no internal communication requirements. The modest node-to-node communication needed for protein threading and microarray analysis, for instance, qualifies them as embarrassingly parallel and as such makes them particularly suitable for distributed computing environments. Consequently, we see increasingly large installations of massively parallel processing (MPP) systems to solve advanced biological and chemical problems. At the far extreme, IBM Corp.'s Blue Gene supercomputer, a 64,000-processor, petaflop (quadrillion floating-point operations per second) behemoth, will aid the study of protein folding (see "Think Blue ... Again," page 28). MPP computers of this size and scale give systems designers an opportunity to work on the core problems of large-scale systems design, such as the use of cellular designs for massively parallel systems, integrated processor-memory logic, error recovery, algorithms, and new programming models and tools.
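For readers who want to see what "embarrassingly parallel" looks like in practice, here is a minimal sketch in Python. It is an illustration only, not a method described in this column: the GC-content calculation is a toy stand-in for a real per-sample job, and the names gc_fraction and run_independent_jobs are hypothetical. The point is simply that each job needs nothing from any other job, so the work can be spread across workers with no coordination beyond collecting the results.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(sequence: str) -> float:
    """Toy per-job computation: fraction of G and C bases in one sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def run_independent_jobs(sequences, max_workers=4):
    """Run one job per sequence; no job reads or waits on another's result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so they line up with sequences.
        return list(pool.map(gc_fraction, sequences))


if __name__ == "__main__":
    samples = ["ATGC", "GGCC", "ATAT"]  # made-up example data
    print(run_independent_jobs(samples))  # prints [0.5, 1.0, 0.0]
```

A thread pool keeps the sketch short and portable. On a cluster or grid the same pattern would typically hand each job to a separate process or node, which is possible precisely because the jobs never talk to one another.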
Heterogeneous data integration and query: The heterogeneity and changeable nature of the data intrinsic to modern research has led to extreme integration and query challenges. In response, many IT vendors are creating integrated analysis, data-mining and data-integration tools, ranging from adaptive, warehouse-based data query schemas, to platforms capable of parsing hundreds of public and private databases. Unlike high-performance computing, where bigger is always better, it is difficult to predict which technologies will emerge from this exercise of turning raw data into knowledge. One possible scenario is a computing utility services model in which the full complement of Web services converges with large-scale grid installations, resulting in a highly specialized Internet search engine surpassing anything we know today. We are likely years away from this type of infrastructure. Much can happen in the meantime.
These are just some of the IT areas likely to benefit from the converging life and information sciences. Other benefits will emerge that are not technological. Much ado has been made of the economic upside to the bio-IT industry from this convergence. Unlike the space race, however, we have no great governmental patron funding the enterprise. Sadly, our "patron" can rightly be seen as the worsening economics of drug R&D. The drug industry knows that its need for IT requires immediate attention if the industry is to survive.
Kennedy continued his 1961 speech: "No single ... project in this period will be more exciting, or more impressive to mankind, or more important ... and none will be so difficult or expensive to accomplish."
Amen to that.
Mark Hall is director of life science research at International Data Corp. and can be reached at email@example.com.