Sequence Signatures and Homeland Security

Horizons
GUEST COMMENTARY 


April 15, 2003 | The threat of chemical and biological weapons has been made all the more tangible by the use of chemical agents in a Tokyo subway, deaths in the United States due to anthrax inhalation, fears over Iraq's deployment of chemical weapons, and the recent discoveries of ricin in London and Paris. The ability to detect and prevent infections related to bioterrorism incidents depends critically on the state of biomedical science and the technology infrastructure put in place to predict, detect, and monitor infectious diseases.

It also depends on money, which the federal government appears to recognize. In his State of the Union speech, President Bush proposed funding Project BioShield to the tune of about $6 billion over 10 years; the money would be used to develop and purchase countermeasures against smallpox, anthrax, and other deadly agents.

In the Defense Science Board's report Protecting the Homeland, prepared in the summer of 2000, the authors identify 50 pathogens posing the greatest threat to humans (see "Murderous Microbes"), including various bacteria, viruses, and toxins. If this laundry list of pathogens isn't frightening enough, the report also discusses former programs at the biowarfare center in Almaty, Kazakhstan, where Soviet scientists created 900 different strains of plague, 300 strains of anthrax, 200 strains of tularemia, and 200 strains of cholera. (The Russian government says it is increasing security at the facility and working with the United States to clean up this and other hazardous sites.)

In the face of these clear and present dangers, the bio-IT community has a unique opportunity to contribute to national programs in biodefense, creating new commercial products for detection and treatment.

Murderous Microbes 
A report from the Defense Science Board lists 50 pathogens posing the greatest threat to humans. Bio-IT hardware and software could be used to detect and respond to these deadly agents.

One immediate need in considering biodefense opportunities from an informatics perspective is comparative genomics. A key proposal in the Defense Science Board 2000 study is the creation of a "Bio-Print" database, comprising unique signatures for each of the 50 pathogens that could be exploited for terrorism. The goal is to find unique genetic tags in each pathogen that could be used to distinguish that organism from less virulent microbes. These sequences could be printed on a chip to provide immediate diagnoses of diseases documented in the database, flagging manmade or unusual diseases to healthcare workers. Groups such as The Institute for Genomic Research, Washington University in St. Louis, and the University of Wisconsin are sequencing and studying the basic biology of Category A and B pathogens to identify genomic regions that can be put on a Bio-Print chip. Multiple strains of weaponized pathogens will need to be compared to look for novel regions that might suggest human tampering. Regions conserved across multiple strains of a bug could be used as targets for vaccine creation.
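As a rough illustration of the signature idea, the sketch below treats a signature as a short subsequence (a k-mer) found in every strain of a target pathogen but in none of its less virulent relatives. The function names, the choice of k, and the toy sequences are assumptions made for illustration; this is not the method proposed in the Defense Science Board study, and a real pipeline would work on whole genomes, consider both DNA strands, and tolerate sequencing errors.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a DNA sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def signature_kmers(target_strains, near_neighbors, k):
    """K-mers shared by every target strain and absent from all near neighbors."""
    # Conserved core: k-mers present in every strain of the target pathogen.
    conserved = set.intersection(*(kmers(s, k) for s in target_strains))
    # Background: every k-mer seen in any less virulent relative.
    background = set()
    for genome in near_neighbors:
        background |= kmers(genome, k)
    return conserved - background


# Toy sequences (hypothetical, far shorter than any real genome).
pathogen_strains = ["ACGTTGCAAGGCT", "TTACGTTGCAAGG"]
harmless_relatives = ["ACGTTGAAAGGCT"]

print(sorted(signature_kmers(pathogen_strains, harmless_relatives, k=6)))
```

The intersection step also captures the point about conserved regions: anything that survives it is, by construction, common to all the strains supplied.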

Several companies (including mine, I should point out) and academic institutions are pursuing novel strategies for ultra-rapid DNA sequencing. Assuming that economical, rapid sequencing technology arrives in the near future, scientists will be able to generate whole genome sequences for a host of microbes. Instead of looking for biologically meaningful regions in one bacterial genome, researchers will have at their disposal hundreds of strains of an organism for comparison.

While this is an intrinsically exciting prospect, it's important to realize that bioinformatics systems for performing meaningful analysis across dozens of related genomes do not yet exist. It is unlikely that researchers will want to scroll through a typical genome viewer showing 10 or more genomes of 3 million bases each, flagging similar or different regions by hand for further analysis. Bioinformatics software will need to identify and tag regions that may be interesting to microbiologists and present them so that scientists can make meaningful discoveries without being overwhelmed by data.
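One simple way such software might tag candidate regions is to slide a window along pre-aligned genomes and flag the windows where the strains disagree at several positions. The sketch below assumes the genomes are already aligned to equal length; the function name, window size, and threshold are hypothetical choices, not features of any existing tool.

```python
def flag_variable_windows(aligned_genomes, window, min_variable_sites):
    """Return (start, end, variable_sites) for windows worth a closer look.

    Assumes all sequences are pre-aligned and of equal length.
    """
    length = len(aligned_genomes[0])
    if any(len(g) != length for g in aligned_genomes):
        raise ValueError("aligned genomes must have equal length")

    # A site is variable if the strains do not all carry the same base there.
    variable = [len(set(column)) > 1 for column in zip(*aligned_genomes)]

    flagged = []
    for start in range(0, length, window):
        end = min(start + window, length)
        count = sum(variable[start:end])
        if count >= min_variable_sites:
            flagged.append((start, end, count))
    return flagged


# Toy alignment of three hypothetical strains.
strains = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACGAACTT",
    "ACGTACGTACGTTCGT",
]
print(flag_variable_windows(strains, window=4, min_variable_sites=2))
```

Reporting only the flagged windows, rather than every base of every genome, is one way to keep the reader from being overwhelmed.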


The Need for Speed and Storage
Clearly, new methods of data visualization will be needed. Speed and storage will become major issues for institutions sequencing multiple strains of related organisms. Consider a government entity such as the Centers for Disease Control employing a whole-genome sequencing technology and analyzing the genomes of 100 or more strains of Staphylococcus aureus. To ensure accuracy, each genome will be sequenced as many as eight times, with difficult regions sequenced even more often. The S. aureus genome is about 2.8 million bases, so sequencing 100 strains at eightfold coverage would entail more than 2 billion bases. After these genomes are assembled and sequencing errors removed, scientists will want to rapidly compare 280 million bases to identify single nucleotide polymorphisms, novel coding regions, and so on.
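The arithmetic behind these figures can be checked in a few lines. The sketch assumes exactly 100 strains, eightfold coverage, and a genome of 2.8 million bases, and it ignores the extra passes over difficult regions; the variable names are illustrative.

```python
GENOME_SIZE_BASES = 2_800_000   # approximate S. aureus genome size
STRAINS = 100
COVERAGE = 8                    # each genome sequenced up to eight times

# Raw output of the sequencing runs, before assembly.
raw_bases = GENOME_SIZE_BASES * STRAINS * COVERAGE
# One finished genome per strain, after assembly and error removal.
assembled_bases = GENOME_SIZE_BASES * STRAINS

print(f"Raw sequence to generate: {raw_bases / 1e9:.2f} billion bases")
print(f"Assembled sequence to compare: {assembled_bases / 1e6:.0f} million bases")
```

Under these assumptions the raw total comes to 2.24 billion bases, and the assembled set to 280 million.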

Once these rapid sequencing technologies are adopted, new opportunities will exist for IT storage providers and for vendors developing alignment and pattern-matching hardware and software. There is no point in sequencing a genome in a day if it takes six months to analyze the results.

The issue of data reporting will become critical as well. As new microbes are sequenced and genetic changes identified, these will need to be placed in a centralized database made available to government and healthcare centers across the country. If a microbe is identified as weaponized (for example, through insertion of a novel pathogenic protein or set of proteins), that information will need to be reported to other sequencing centers so that outbreaks of disease can be screened for these new coding regions. In turn, those new data will need to be included in the database so scientists can look for mutations associated with drug resistance. During an outbreak, the flow of data between infected regions, sequencing centers, and government databases could bring an IT infrastructure to its knees.

The problem of national biodefense offers opportunities for researchers in bio-IT to create solutions. The need for rapid and massive sequence comparison, enormous data storage, novel visualization methods, analysis and reporting tools, and communications infrastructure should open up new opportunities for visionary companies.



James Golden is manager of business development at 454 Corp., a subsidiary of CuraGen. He can be reached at jgolden@454.com.







