It also depends on money, which the federal government appears to recognize. In his State of the Union speech, President Bush proposed funding Project BioShield to the tune of about $6 billion over 10 years; the money would be used to develop and purchase countermeasures against smallpox, anthrax, and other deadly agents.
In the Defense Science Board's report Protecting the Homeland, prepared in the summer of 2000, the authors identify 50 pathogens posing the greatest threat to humans (see "Murderous Microbes"), including various bacteria, viruses, and toxins. If this laundry list of pathogens isn't frightening enough, the report also discusses former programs at the biowarfare center in Almaty, Kazakhstan, where Soviet scientists created 900 different strains of plague, 300 strains of anthrax, 200 strains of tularemia, and 200 strains of cholera. (The Russian government says it is increasing security at the facility and working with the United States to clean up this and other hazardous sites.)
In the face of these clear and present dangers, the bio-IT community has a unique opportunity to contribute to national programs in biodefense, creating new commercial products for detection and treatment.
A report from the Defense Science Board lists 50 pathogens posing the greatest threat to humans. Bio-IT hardware and software could be used to detect and respond to these deadly agents.
One immediate need in considering biodefense opportunities from an informatics perspective is comparative genomics. A key proposal in the Defense Science Board 2000 study is the creation of a "Bio-Print" database, comprising unique signatures for each of those 50 pathogens that could be exploited for terrorism. The goal is to find unique genetic tags in each pathogen that could be used to distinguish that organism from less virulent microbes. These sequences could be printed on a chip to provide immediate diagnoses of diseases documented in the database, flagging manmade or unusual diseases to healthcare workers. Groups such as The Institute for Genomic Research, Washington University in St. Louis, and the University of Wisconsin are sequencing and studying the basic biology of Category A and B pathogens to identify genomic regions that can be put on a Bio-Print chip. Multiple strains of weaponized pathogens will need to be compared to look for novel regions that might suggest human tampering. Common regions conserved across multiple strains of a bug could be used as targets for vaccine creation.
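To make the idea of a unique genetic tag concrete, here is a minimal Python sketch. It is only an illustration, not the method proposed in the Defense Science Board study: it treats a "signature" as a short substring (a k-mer) found in every strain of a pathogen and in none of its less virulent relatives. The function names, the toy sequences, and the choice of k are all assumptions made for this example, and the sketch ignores reverse-complement strands and sequencing errors.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_signatures(target_strains, background_genomes, k):
    """K-mers found in every target strain and in no background genome."""
    # Conserved across all strains of the pathogen of interest.
    shared = set.intersection(*(kmers(s, k) for s in target_strains))
    # Anything also seen in a less virulent relative cannot discriminate.
    background = set()
    for genome in background_genomes:
        background |= kmers(genome, k)
    return shared - background


# Toy sequences invented for illustration; real genomes run to millions of bases.
pathogen_strains = ["ACGTTGCAAGGCT", "TTACGTTGCAAGG"]
harmless_relatives = ["ACGTAGCAAGGCT"]

signatures = unique_signatures(pathogen_strains, harmless_relatives, k=6)
print(sorted(signatures))
```

The same set logic covers the vaccine-target case in the paragraph above: the intersection step alone yields the regions conserved across strains.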
Several companies (including mine, I should point out) and academic institutions are pursuing novel strategies for ultra-rapid DNA sequencing. Assuming that economical, rapid sequencing technology arrives in the near future, scientists will be able to generate whole genome sequences for a host of microbes. Instead of looking for biologically meaningful regions in one bacterial genome, researchers will have at their disposal hundreds of strains of an organism for comparison.
While this is an intrinsically exciting prospect, it's important to realize that bioinformatics systems for performing meaningful analysis across dozens of related genomes do not yet exist. Researchers are unlikely to want to scroll through a typical genome viewer showing 10 or more 3-megabase genomes, flagging similar or different regions by hand for further analysis. Bioinformatics software will need to identify and tag the regions that may interest microbiologists and present them so that scientists can make meaningful discoveries without being overwhelmed by data.
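As a rough illustration of the tagging such software would do, the Python sketch below scans a set of genomes and reports only the stretches where they disagree, so a scientist sees a short list of intervals rather than the full sequences. It assumes the genomes have already been aligned to equal length, which is itself a hard problem at this scale. The function name and the toy sequences are hypothetical.

```python
def variable_regions(aligned):
    """Return half-open (start, end) intervals where aligned sequences differ."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")

    regions = []
    start = None  # start of the variable run currently being extended
    for i in range(length):
        differs = len({seq[i] for seq in aligned}) > 1
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:  # a variable run reaches the end of the alignment
        regions.append((start, length))
    return regions


# Three toy "strains", invented for illustration.
strains = [
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCCAC",
]
flagged = variable_regions(strains)
print(flagged)
```

A real tool would also need to handle insertions, deletions, and rearrangements, which this position-by-position comparison does not.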
The Need for Speed and Storage
Clearly, new methods of data visualization will be needed. Speed and storage will become major issues for institutions sequencing multiple strains of related organisms. Consider a government entity such as the Centers for Disease Control employing a whole genome sequencing technology and analyzing the genomes of 100 or more strains of Staphylococcus aureus. To ensure accuracy, each genome will be sequenced as many as eight times, and difficult regions more often still. The S. aureus genome is about 2.8 megabases (Mb), so this would entail more than 2 billion bases of raw sequence. After these genomes are assembled and sequencing errors removed, scientists will want to rapidly compare 280 million bases to identify single nucleotide polymorphisms, novel coding regions, and so on.
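The arithmetic behind those figures is easy to check. The short Python calculation below uses only the numbers given in the paragraph above (a 2.8-million-base genome, 100 strains, eightfold sequencing). The storage estimate at the end assumes one byte per base, which is an illustrative assumption and not a figure from the study.

```python
GENOME_SIZE_BASES = 2_800_000   # S. aureus, about 2.8 million bases
STRAINS = 100
COVERAGE = 8                    # each genome sequenced up to eight times

# Raw sequence produced before assembly.
raw_bases = GENOME_SIZE_BASES * STRAINS * COVERAGE

# Finished sequence left to compare after assembly and error removal.
assembled_bases = GENOME_SIZE_BASES * STRAINS

# Assumption for illustration: one byte per base, no quality scores or metadata.
raw_gigabytes = raw_bases / 1e9

print(f"raw bases:       {raw_bases:,}")
print(f"assembled bases: {assembled_bases:,}")
print(f"raw storage:     {raw_gigabytes:.2f} GB at one byte per base")
```

Extra passes over difficult regions would push the raw total higher.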
Once these rapid sequencing technologies are adopted, new opportunities will exist for IT storage providers and for vendors developing alignment and pattern-matching hardware and software. There is no point in sequencing a genome in a day if it takes six months to analyze the results.
The issue of data reporting will become critical as well. As new microbes are sequenced and genetic changes identified, these will need to be placed in a centralized database made available to government and healthcare centers across the country. If a microbe is identified as weaponized (through insertion of a novel pathogenic protein or set of proteins), that information will need to be reported to other sequencing centers so that outbreaks of disease can be screened for these new coding regions. In turn, those new data will need to be included in the database so scientists can look for mutations associated with drug resistance. During an outbreak, the flow of data between infected regions, sequencing centers, and government databases could bring an IT infrastructure to its knees.
The problem of national biodefense offers opportunities for researchers in bio-IT to create solutions. The need for rapid and massive sequence comparison, enormous data storage, novel visualization methods, analysis and reporting tools, and communications infrastructure should open up new opportunities for visionary companies.
James Golden is manager of business development at 454 Corp., a subsidiary of CuraGen. He can be reached at firstname.lastname@example.org.