October 15, 2003 | MUCH HAS BEEN made of the importance of predicting the biologically active conformation of a protein from its primary amino acid sequence — the protein-folding problem (see "Computational Biologists Join the Fold," June 2002 Bio·IT World). It is worth considering the current state of the art and asking whether, in this (post-) genomic era, protein-folding predictions can play a practical role in the drug discovery process.
The most successful structure-prediction approaches attempt to match a protein to an already solved structure in the Protein Data Bank (see "Banking on Structures," Oct. 2002 Bio·IT World, page 60) [1-4]. Indeed, a long-term goal of structural genomics is to solve enough structures so that an arbitrary protein is within modeling distance of an already solved structure [5]. Recently, for simple (single-domain) proteins, we have discovered that the library of solved structures is already complete at the level of low-to-moderate resolution structures [6]. The problem now is to develop algorithms that can match these structures to the sequence of interest.
In large-scale comprehensive benchmarks that take advantage of the substantial computer power available to groups such as ours, we have found that about 90 percent of proteins can be matched to their approximate folds, while for about 25 percent there are significant local mismatches but good global matches to the structure. The good news is that low-to-moderate resolution structure predictions can be made for the remaining two-thirds of all single-domain proteins below 200 residues. Accepting the premise that low-to-moderate resolution structures can be predicted on a genomic scale, what good are they? Can we tell what a protein does simply from its projected 3-D structure?
Unfortunately, this is true for only a fraction of proteins [7, 8]. What about the rest? It turns out that these low-to-moderate resolution structures are good for predicting the biochemical/enzymatic function of a protein by matching to a known active-site template extracted from some other previously solved structure (see "The Sequence-to-Structure-to-Function Paradigm"). Here, we are limited by the number of known active-site templates, but this number will certainly grow. Indeed, using such template matching, we can in many cases suggest what the protein does or, at worst, suggest a small number of experiments to establish its function [9].
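To make the idea of active-site template matching concrete, here is a minimal sketch in Python. It is not the method used in the cited work; it only illustrates the general idea of searching a predicted model for a small set of residues whose types and pairwise distances resemble a stored template. The template contents, the distance values, the tolerance, and the names `TEMPLATE`, `MODEL`, and `match_template` are all hypothetical and invented for illustration.

```python
from itertools import product
from math import dist

# Hypothetical active-site template: residue types plus target distances
# (in angstroms) between their C-alpha atoms. All values are invented.
TEMPLATE = {
    "residues": ["SER", "HIS", "ASP"],
    "distances": {(0, 1): 8.0, (1, 2): 6.5, (0, 2): 10.0},
}


def match_template(structure, template, tolerance=1.5):
    """Return residue-index tuples in `structure` that fit the template.

    `structure` is a list of (residue_name, (x, y, z)) entries, one per
    residue, using C-alpha coordinates. The loose default tolerance is an
    assumption meant to reflect low-to-moderate resolution models.
    """
    wanted = template["residues"]
    # For each template position, collect residues of the required type.
    candidates = [
        [i for i, (name, _) in enumerate(structure) if name == want]
        for want in wanted
    ]
    hits = []
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # the same residue cannot fill two template positions
        if all(
            abs(dist(structure[combo[a]][1], structure[combo[b]][1]) - target)
            <= tolerance
            for (a, b), target in template["distances"].items()
        ):
            hits.append(combo)
    return hits


# Toy predicted model with invented coordinates.
MODEL = [
    ("GLY", (0.0, 0.0, 0.0)),
    ("SER", (1.0, 0.0, 0.0)),
    ("HIS", (9.0, 0.0, 0.0)),
    ("ASP", (9.0, 6.0, 0.0)),
    ("ALA", (20.0, 20.0, 20.0)),
]

print(match_template(MODEL, TEMPLATE))
```

A hit does not establish function; in the spirit of the text, it only nominates a candidate active site, and hence a small number of biochemical experiments worth running.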
Thus, we can suggest biochemical assays to be done on protein targets for eventual use as diagnostic tools. If a pair of proteins adopts the quaternary (3-D) structure of a previously seen complex (even one that is quite distantly related in evolutionary terms), a technology we have developed can often identify these protein-protein interactions, and the interacting amino acids, with reasonable accuracy [10, 11]. Both experimental and theoretical progress will be further augmented by the creation of a large, experimental benchmark set of interacting proteins and their associated quaternary structures.
But one could argue that the world is awash in protein targets, so what use is this? By having most members of the family in a given genome, one could design in specificity to the target of interest [12]. But there is more. If one has a known inhibitor, one can use these predicted low-resolution structures to predict where it binds in about two-thirds of the cases, again pointing out the important regions of the molecule [13]. It remains to be seen if such structures can be used for the large-scale virtual screening of ligand libraries. But even if one cannot predict the best lead molecule, a guarantee that such a lead is among the top 100 or 1,000 compounds could accelerate the drug discovery process and reduce its cost.
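The arithmetic behind that last point can be sketched in a few lines of Python. The library size and per-assay cost below are invented for illustration and do not come from any real screening campaign; `screening_savings` is a hypothetical helper name.

```python
def screening_savings(library_size, top_n, cost_per_assay):
    """Compare assaying a whole ligand library with assaying only the
    top-ranked compounds from a virtual screen.

    Assumes the true lead is guaranteed to be within the top `top_n`.
    """
    if not 0 < top_n <= library_size:
        raise ValueError("top_n must be between 1 and library_size")
    return {
        "fraction_tested": top_n / library_size,
        "fold_reduction": library_size / top_n,
        "cost_full": library_size * cost_per_assay,
        "cost_top_n": top_n * cost_per_assay,
    }


# Hypothetical numbers: a one-million-compound library at one cost unit
# per assay, with the lead guaranteed to be in the top 1,000.
result = screening_savings(library_size=1_000_000, top_n=1_000, cost_per_assay=1.0)
print(result)
```

Under these assumed numbers, only 0.1 percent of the library would need to be assayed, a thousandfold reduction in experimental effort, even though the ranking never identifies the single best compound.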
The clear direction for the future is to elucidate the role that proteins play in cellular pathways to pick and choose those that are likely to be "druggable." Preliminary indications are that structure prediction can assist in the automated assignment of proteins to known pathways — a first step in this process.
I am not suggesting that the complete solution to protein structure prediction has arrived. There remain difficult issues: refining structures to the higher resolution necessary for a number of applications; the one-third of small proteins that cannot yet be predicted at acceptable accuracy; and the extension to larger, multidomain proteins where the fold library is incomplete. Nevertheless, protein structure prediction is no longer just an interesting and challenging theoretical problem. Rather, computational approaches to the sequence-to-structure-to-function paradigm are becoming a reality.
The bottom line: By prioritizing targets and suggesting a relatively small number of experiments, protein structure prediction can play a practical role in drug discovery.
Jeffrey Skolnick is director of the Center of Excellence in Bioinformatics, University at Buffalo. He can be reached at firstname.lastname@example.org.
1. Bonneau, R. et al. J Mol Biol 322, 65; 2002.
2. Xu, D.; Crawford, O. H.; LoCascio, P. F.; Xu, Y. Proteins Suppl 5, 140-148; 2001.
3. Zhang, Y.; Kolinski, A.; Skolnick, J. Biophys J 85, 1145-1164; 2003.
4. John, B.; Sali, A. Nucleic Acids Res 31, 3982-3992; 2003.
5. Burley, S. K. Nat Struct Biol 7 Suppl, 932-934; 2000.
6. Kihara, D.; Skolnick, J. J Mol Biol, submitted; 2003.
7. Fetrow, J. S. et al. Protein Sci 10, 1005-1014; 2001.
8. Skolnick, J.; Fetrow, J. S. Trends Biotechnol 18, 34-39; 2000.
9. Arakaki, A.; Zhang, Y.; Skolnick, J. Proc Natl Acad Sci USA, submitted; 2003.
10. Lu, L.; Lu, H.; Skolnick, J. Proteins 49, 350-364; 2002.
11. Lu, L.; Arakaki, A. K.; Lu, H.; Skolnick, J. Genome Res 13, 1146-1154; 2003.
12. Betz, S. F.; Baxter, S. M.; Fetrow, J. S. Drug Discov Today 7, 865-871; 2002.
13. Wojciechowski, M.; Skolnick, J. J Comput Chem 23, 189-197; 2002.