Aug 15, 2005 | We’ve heard the term “postgenomic era” quite a bit over the past half-decade. Despite this, it seems that each week we hear about the publication of the genome of a new species — a pathogenic microbe, an agricultural staple, or an obscure new species of mammal — making it hard to imagine finding ourselves in such an era anytime soon. Nevertheless, the technology of sequencing genomes is very late ’90s. What is the postgenomic era about? Bioinformatics.
Each of the scores of sequenced genomes comprises thousands of genes that encode an even larger number of proteins and RNAs. Moreover, the expression of each of these genes is regulated by coordinated and combinatorial sets of regulatory elements and factors. It is unlikely that every one of these decoded genes will ever be studied in detail; there are simply too many. This, however, is exactly the sort of challenge that bioinformatics tools are suited to address.
The analysis of gene expression networks is a case in point. While microarrays allow us to measure gene expression in a massively parallel format, they cannot reveal the molecular mechanisms underlying the dynamic coordination of gene expression. In contrast, in a study* in the July 2005 issue of Trends in Genetics, researchers have shown that comparative genomics can be used to discover new systems of coordinated gene regulation.
By comparing 140 sequenced bacterial genomes, the authors uncovered a regulatory system key to bacterial replication, identifying both a regulatory sequence and the transcription factor that acts through it. They first used phylogenetic footprinting to scan the regulatory regions of genes involved in converting ribonucleotides into the deoxyribonucleotides used to replicate DNA. This revealed a palindromic sequence that overlapped the promoter region of many of these genes, suggesting it might serve as a binding site for a repressor. They then used phylogenetic profiling to correlate the presence or absence of this sequence in each genome with the presence or absence of a protein factor. One protein cluster met the criteria, and it represented a protein family with characteristics of transcription factors. Finally, using positional clustering (based on the principle that functionally related genes often inhabit the same genomic neighborhood), the team found that the gene encoding the transcription factor they identified resided close to the genes containing its putative binding sequence. Wet-bench data confirmed the virtual findings.
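The phylogenetic profiling step described above can be illustrated with a small sketch. This is not the authors' method: the genome names, protein family names, and presence/absence data below are hypothetical, and a simple agreement fraction stands in for whatever scoring the study actually used. The idea is only to show how a motif's presence profile across genomes can be compared with the profile of each candidate protein family.

```python
def profile_agreement(motif_genomes, family_genomes, all_genomes):
    """Fraction of genomes in which the motif and the protein family
    are either both present or both absent."""
    matches = sum(
        (g in motif_genomes) == (g in family_genomes) for g in all_genomes
    )
    return matches / len(all_genomes)


def rank_families(motif_genomes, families, all_genomes):
    """Return (family, score) pairs sorted from best to worst agreement."""
    scores = {
        name: profile_agreement(motif_genomes, members, all_genomes)
        for name, members in families.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: five genomes, three candidate protein families.
all_genomes = ["g1", "g2", "g3", "g4", "g5"]
motif_genomes = {"g1", "g2", "g4"}  # genomes carrying the palindromic motif
families = {
    "family_A": {"g1", "g2", "g4"},              # co-occurs with the motif
    "family_B": {"g1", "g3", "g5"},              # mostly anti-correlated
    "family_C": {"g1", "g2", "g3", "g4", "g5"},  # present everywhere
}

ranking = rank_families(motif_genomes, families, all_genomes)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy case the family whose presence exactly tracks the motif ranks first. A family found in every genome still scores fairly well on raw agreement, which is one reason a real analysis would likely need a more discriminating statistic than this one.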
Prediction and Interaction
Bioinformatics is also having an impact on the prediction of protein structure and function, which is directly relevant to developing therapeutic agents to treat disease. Rutgers University announced a nearly $55-million research effort, funded by the Protein Structure Initiative of the National Institute of General Medical Sciences, to take into the production phase a pilot program for developing tools that streamline and speed protein structure prediction. Much of the work is done at the wet bench, although it is clearly impractical to crystallize and perform detailed structural studies on every member of the proteomes of interest. The studies will nonetheless continue to provide template structures for protein families that should be useful for computer-based structure modeling and drug design. Two hundred structures generated in the pilot studies have provided predictive models for some 40,000 proteins of the 100,000 or so protein families predicted to exist in nature.
A complementary technology that provides information on protein function is interaction mapping analysis, which aims to establish networks of interacting proteins in cells. Hybrigenics is a proteomics services company launched in 2003 that uses an automated yeast-based two-hybrid process to detect protein interactions. The company identifies interacting partners for a customer’s protein(s) of interest, maps the protein domains involved in each interaction, and provides statistical scores that allow the client to evaluate the quality of the results. It also offers a bioinformatics tool called PIM-Rider that allows clients to integrate their results with data available through the Protein Interaction Map. The data will be useful in modeling protein interaction families in related organisms.
Bioinformatics has quickly become an integral part of the drug discovery process. While many algorithms and tools remain open-source and public-domain, it seems only a matter of time before we see a greater number of proprietary services. That shift will help define the so-called postgenomic era.
* Rodionov, D.A. et al. “Identification of a bacterial regulatory system for ribonucleotide reductases....” Trends Genet 21, 385-9; 2005.
Robert M. Frederickson is a biotech writer based in Seattle. E-mail: email@example.com.