By Malorye A. Branca

April 7, 2002 | The intensified focus on anthrax—and specifically, its evolution—is driving researchers to new heights in genomics.

New studies of the bacterium that causes anthrax will involve “a lot more sequencing, phylogenetic context, and developing molecular signatures for strains,” says P. Scott White, a researcher in the Bioscience Division at Los Alamos National Laboratory’s Center for Human Genome Studies.

White heads up the Bioscience Division’s Microbial Sequencing Program, which is one of several centers now working on Bacillus anthracis genomics. The enhanced steps he foresees should help researchers revamp microbial family trees, making it easier to track and study such features as virulence and resistance, and to develop drugs against them. New phylogenetic trees could also provide better analytical tools for predicting the functions of unknown proteins.

Bacterial phylogeny research was a low-profile activity until last fall’s series of anthrax contaminations, which killed five people and infected more than a dozen others. The FBI subsequently sought the help of the scientific community, most significantly Northern Arizona University researcher Paul Keim. Keim has been collaborating on anthrax studies with Los Alamos researchers, including Richard Okinaka and Paul Jackson.

Also on the hunt is The Institute for Genomic Research (TIGR) in Rockville, Md., which has sequenced more than 20 microbial genomes since 1995. TIGR researchers are preparing to publish a completed sequence for an index version of the Ames strain of B. anthracis used in the bioterror attacks. They are also publishing the first look at one of the bioterror samples, a comparison of the Ames index with the anthrax that killed photo editor Robert Stevens at American Media Inc. headquarters in Boca Raton, Fla.

The anthrax bacterium is particularly slow to mutate, making it difficult to find DNA sequence differences among isolates. To find telltale differences among samples, Keim has been looking at repetitive stretches of the genome, called variable number tandem repeats (VNTRs), using a technique called multi-locus VNTR analysis (MLVA). He has also studied the anthrax pXO1 plasmid, a circular DNA molecule that contains several toxin-coding genes. However, because VNTRs mutate at a high rate, there is a greater chance that two samples bearing the same mutation are unrelated and arrived at the same genetic makeup by coincidence. To make a firm identification, more sequencing and new approaches for matching isolates are needed.
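
As a rough illustration of how an MLVA-style comparison works, the Python sketch below treats each isolate as a set of repeat counts at a few VNTR loci and counts the loci at which two isolates differ. The locus names, sample names, and repeat counts are invented for illustration; they are not real typing data, and this is not the method used by Keim's group.

```python
def mlva_distance(profile_a, profile_b):
    """Count the shared loci at which two isolates carry different repeat numbers."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])

# Hypothetical profiles: locus name -> number of tandem repeats observed.
isolates = {
    "sample_1": {"locus_A": 4, "locus_B": 10, "locus_C": 7},
    "sample_2": {"locus_A": 4, "locus_B": 10, "locus_C": 7},
    "sample_3": {"locus_A": 5, "locus_B": 10, "locus_C": 9},
}

d12 = mlva_distance(isolates["sample_1"], isolates["sample_2"])
d13 = mlva_distance(isolates["sample_1"], isolates["sample_3"])
print(d12, d13)
```

A distance of zero means only that the profiles match at the loci examined. Because VNTRs change quickly, a match on its own does not establish that two samples share a source.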

In new findings, Keim and colleagues at Los Alamos and TIGR have recently identified differences among several Ames samples. The groups are looking for markers that may help investigators find the specific laboratory source of the anthrax outbreak.

A New Angle on Anthrax
An improved B. anthracis family tree is also important in case future attacks occur using strains other than Ames. “Groupings based on VNTRs are good for finding similar samples, but do not necessarily reflect family history,” says Jonathan Eisen, an assistant investigator at TIGR and a specialist in phylogenetic analysis. “Now we need the rest of the story on the relationships between more distantly related changes.”

A method known as multi-locus sequence typing (MLST), similar to MLVA, looks at the relative amount of variation in a combination of sites across the genome. But the trick will be deciding which sites to study. “One could sequence a 250-base pair region in 10 different locations in the genome of B. anthracis and find zero variation among diverse isolates,” White says.
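
White's point about choosing sites can be made concrete with a small Python sketch. Assuming the same fragment has been sequenced and aligned across several isolates, it lists the positions at which the isolates disagree, so that loci with no variation can be set aside. The locus names and sequences are made up for illustration.

```python
def variable_sites(sequences):
    """Return the positions at which aligned sequences are not all identical."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in sequences}) > 1]

# Hypothetical aligned fragments from three isolates at two candidate loci.
candidate_loci = {
    "locus_1": ["ACGTACGT", "ACGTACGT", "ACGTACGT"],
    "locus_2": ["ACGTACGT", "ACGAACGT", "ACGTACGA"],
}

# Map each locus to the list of positions that vary among the isolates.
informative = {name: variable_sites(seqs) for name, seqs in candidate_loci.items()}
print(informative)
```

Here the first locus shows no variation at all and would contribute nothing to telling the isolates apart, which is the situation White describes for a slowly mutating organism.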

TIGR intends to sequence the genomes of as many as 14 anthrax strains from around the world, far more than have been sequenced for any other species. “What is needed is an archive containing a fair number of samples and a strategy for going from sequencing of multiple, whole genomes to finding regions containing highly specific molecular signatures,” White says.

The Los Alamos group is developing a process for establishing such signatures in any pathogen based on a molecular taxonomic approach, which White describes as the “rational design of DNA signatures.” The development of such signatures for bacteria is fairly new, and the researchers do not yet know how many data points will be required to provide more reliable classification. But once this work is done, the result will be a molecular genetics-based B. anthracis family tree, and a blueprint for building one in other species.

But microbe phylogenies pose unique challenges. For starters, White says, “The nomenclature for many microbial groups is a nightmare.” As a result, the names of the microbes do not necessarily correlate well with the branches of the family tree.

Another problem is that microbes can trade genes back and forth, a process known as horizontal gene transfer. “Researchers are only now realizing that there should be a convergence of phylogenetic analysis and signature development,” White says. The DNA sequence-based signatures will provide a much finer degree of resolution for classification.

Better trees will then provide a better context for understanding how genes that affect such traits as virulence or drug resistance evolve, and where in the genome to look for them. For example, “There are strains of bacteria that are clearly B. anthracis but don’t cause disease,” White says.

Microbes in certain groups can evolve to become toxic more quickly than others, and so data linking genes to pathology can become a public health tool. Researchers in Great Britain have developed an international database that contains MLST data on several disease-causing organisms, including Haemophilus influenzae, Neisseria meningitidis, and Streptococcus pneumoniae. “It’s a quick and easy way to compare bacterial strains,” says Man-Suen Chan, one of the database’s developers. To investigate a particular sample, one simply needs to run a relatively short amount of sequence (typically 500 base pairs) for the identifying gene fragments, and then compare them to what is in the database.
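
The lookup Chan describes can be pictured with the following Python sketch. It assumes a reference table that assigns a number to each known allele sequence at each gene, and a second table that maps a combination of allele numbers to a strain label. The gene names, sequences, labels, and the function type_isolate are all hypothetical, and real fragments would be hundreds of bases long; this does not reflect the actual database's interface.

```python
# Hypothetical reference data: allele sequence -> allele number, per gene.
ALLELES = {
    "gene_1": {"ACGTAC": 1, "ACGTAT": 2},
    "gene_2": {"GGCATT": 1, "GGCTTT": 2},
}
# Hypothetical table of known allele profiles -> sequence type label.
SEQUENCE_TYPES = {(1, 1): "ST-1", (2, 1): "ST-2"}

def type_isolate(fragments):
    """Map sequenced fragments to allele numbers, then to a sequence type.

    Returns (profile, sequence_type). An allele or profile that is not in
    the reference tables is reported as None.
    """
    profile = tuple(ALLELES[gene].get(fragments[gene]) for gene in sorted(ALLELES))
    return profile, SEQUENCE_TYPES.get(profile)

profile, st = type_isolate({"gene_1": "ACGTAT", "gene_2": "GGCATT"})
print(profile, st)
```

Reducing each fragment to an allele number is what makes the comparison quick: two laboratories need only exchange a short list of integers to tell whether their isolates match.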

By understanding how microbes turn deadly, researchers hope to find new ways to treat disease and combat drug resistance. And understanding the similarities of genes that cause these traits will help in the design of better drugs to address them.

Phylogeny Finds Function
Accurate phylogenetic trees will give much better information for protein function prediction. Software to predict function based on sequence/structure similarity is not always reliable, partly because so many uncharacterized structures remain. With the wealth of protein targets now at hand, demand is rising for better tools to select among these targets and to design highly specific compounds against them.

Knowing the history of how a particular trait has evolved is critical to being able to predict function. “After all, if you simply classified animals based on whether they could fly, you’d have all the bats, bugs, and birds in one group,” Eisen says. 

A trait can evolve independently in different families, and a phylogenetic tree reveals the many pathways functions can follow. Companies such as EraGen Biosciences Inc. are already using phylogenetic informatics tools for drug discovery, for example, to predict whether an animal model has the appropriate protein pathways for studying a particular drug mechanism.

“If you want to target a drug to a protein,” Eisen says, “you can use the evolutionary history of that protein not just to find out which sites are invariant in the population, but how sites have changed in the population.”