Protein Structure Prediction in Drug Discovery

By BIO-IT World
Horizons
GUEST COMMENTARY 
 

October 15, 2003 | MUCH HAS BEEN made of the importance of predicting the biologically active conformation of a protein from its primary amino acid sequence — the protein-folding problem (see "Computational Biologists Join the Fold," June 2002 Bio·IT World). It is worth considering the current state of the art and asking whether, in this (post-) genomic era, protein-folding predictions can play a practical role in the drug discovery process.

The most successful structure-prediction approaches attempt to match the structure of a protein to that of an already solved protein structure in the Protein Data Bank (see "Banking on Structures," Oct. 2002 Bio·IT World, page 60) [1-4]. Indeed, a long-term goal of structural genomics is to solve enough structures so that an arbitrary protein is within modeling distance of an already solved structure [5]. Recently, for simple (single-domain) proteins, we have discovered that the library of solved structures is already complete at the level of low-to-moderate resolution structures [6]. The problem now is to develop algorithms that can match these structures to the sequence of interest.

In large-scale, comprehensive benchmarks that take advantage of the substantial computer power available to groups such as ours, we have found that about 90 percent of proteins can be matched to their approximate folds, while for about 25 percent there are significant local mismatches despite good global matches to the structure. Together, the roughly 10 percent without a fold match and the 25 percent with local mismatches make up about one-third of cases. The good news is that low-to-moderate resolution structure predictions can be made for the remaining two-thirds of all single-domain proteins below 200 residues. Accepting the premise that low-to-moderate resolution structures can be predicted on a genomic scale, what good are they? Can we tell what a protein does simply from its projected 3-D structure?

Unfortunately, structure alone reveals function for only a fraction of proteins [7, 8]. What about the rest? It turns out that these low-to-moderate resolution structures are good for predicting the biochemical/enzymatic function of a protein by matching to a known active-site template extracted from some other previously solved structure (see "The Sequence-to-Structure-to-Function Paradigm"). Here, we are limited by the number of known active-site templates, but this number will certainly grow. Indeed, using such template matching, we can in many cases suggest what the protein does or, at worst, suggest a small number of experiments to establish its function [9].
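To make the idea of active-site template matching concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the cited work: it assumes a template can be described by a few residue types plus pairwise C-alpha distances with a tolerance, and it searches a structure for residues that fit. The template values, the `match_template` function, and the toy structure are all hypothetical; the distances are made up for the example, not measured.

```python
from math import dist

# Hypothetical template: residue types plus pairwise C-alpha distances in
# angstroms. The distances and tolerance are illustrative, not measured values.
TEMPLATE = {
    "residues": ("SER", "HIS", "ASP"),
    "distances": {(0, 1): 8.0, (1, 2): 6.5, (0, 2): 10.0},
    "tolerance": 1.5,
}


def match_template(structure, template):
    """Return tuples of residue indices in `structure` that fit the template.

    `structure` is a list of (residue_name, (x, y, z)) C-alpha records.
    """
    names = template["residues"]
    tol = template["tolerance"]
    # Candidate positions for each template slot, filtered by residue type.
    candidates = [
        [i for i, (name, _) in enumerate(structure) if name == wanted]
        for wanted in names
    ]
    hits = []

    def extend(chosen):
        slot = len(chosen)
        if slot == len(names):
            hits.append(tuple(chosen))
            return
        for i in candidates[slot]:
            if i in chosen:
                continue
            # Check every template distance that ends at this slot.
            ok = all(
                abs(dist(structure[i][1], structure[chosen[a]][1]) - d) <= tol
                for (a, b), d in template["distances"].items()
                if b == slot
            )
            if ok:
                extend(chosen + [i])

    extend([])
    return hits


# Toy "predicted structure": only residues 0, 2 and 3 fit the template.
toy = [
    ("SER", (0.0, 0.0, 0.0)),
    ("GLY", (3.8, 0.0, 0.0)),
    ("HIS", (8.0, 0.0, 0.0)),
    ("ASP", (8.0, 6.5, 0.0)),
    ("ASP", (30.0, 30.0, 0.0)),
]
print(match_template(toy, TEMPLATE))  # prints [(0, 2, 3)]
```

The tolerance is what would let a scheme like this work on low-to-moderate resolution models: the looser it is, the more coordinate error it absorbs, at the price of more false matches.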

Thus, we can suggest biochemical assays to be done on protein targets for eventual use as diagnostic tools. We have also developed a technology for protein-protein interactions: if a pair of proteins adopts the quaternary (3-D) structure of a previously seen complex (which can even be quite distantly related in evolutionary terms), it can often identify the interaction, and the interacting amino acids, with reasonable accuracy [10, 11]. Both experimental and theoretical progress will be further augmented by the creation of a large, experimental benchmark set of interacting proteins and their associated quaternary structure.

But one could argue that the world is awash in protein targets, so what use is this? By identifying most members of a protein family in a given genome, one could design in specificity to the target of interest [12]. But there is more. If one has a known inhibitor, one can use these predicted low-resolution structures to predict where it binds in about two-thirds of the cases, again pointing out the important regions of the molecule [13]. It remains to be seen if such structures can be used for the large-scale virtual screening of ligand libraries. But even if one cannot predict the best lead molecule, a guarantee that such a lead is among the top 100 or 1,000 compounds could accelerate the drug discovery process and reduce its cost.
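The "top 100 or 1,000" argument can be put in numbers with a small sketch. This is a hypothetical illustration, not a screening method: it assumes each compound in a library already has a predicted score (higher is better), and the function names and the library size of one million are assumptions chosen for the example.

```python
def rank_of_lead(scores, lead_id):
    """1-based rank of `lead_id` when compounds are sorted by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(lead_id) + 1


def in_top_n(scores, lead_id, n):
    """True if the lead would be among the first `n` compounds sent for assay."""
    return rank_of_lead(scores, lead_id) <= n


def fraction_to_assay(n, library_size):
    """Share of the library that must be tested if only the top `n` are assayed."""
    return n / library_size


# Hypothetical scores for a three-compound library.
scores = {"cpd_a": 0.2, "cpd_b": 0.9, "cpd_c": 0.5}
print(rank_of_lead(scores, "cpd_c"))         # prints 2
print(fraction_to_assay(1_000, 1_000_000))   # prints 0.001
```

Under these assumptions, a ranking that reliably places the true lead within the top 1,000 of a million-compound library would cut the experimental work to 0.1 percent of the library, even though the ranking never singles out the best compound.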

The clear direction for the future is to elucidate the role that proteins play in cellular pathways to pick and choose those that are likely to be "druggable." Preliminary indications are that structure prediction can assist in the automated assignment of proteins to known pathways — a first step in this process.

I am not suggesting that the complete solution to protein structure prediction has arrived. There remain difficult issues: refining structures to the higher resolution necessary for a number of applications; the one-third of small proteins that cannot yet be predicted at acceptable accuracy; and the extension to larger, multidomain proteins where the fold library is incomplete. Nevertheless, protein structure prediction is no longer just an interesting and challenging theoretical problem. Rather, computational approaches to the sequence-to-structure-to-function paradigm are becoming a reality.

The bottom line: By prioritizing targets and suggesting a relatively small number of experiments, protein structure prediction can play a practical role in drug discovery.

Jeffrey Skolnick is director of the Center of Excellence in Bioinformatics, University at Buffalo. He can be reached at skolnick@buffalo.edu.

References

1. Bonneau, R. et al. J Mol Biol 322, 65; 2002.

2. Xu, D.; Crawford, O. H.; LoCascio, P. F.; Xu, Y. Proteins Suppl 5, 140-148; 2001.

3. Zhang, Y.; Kolinski, A.; Skolnick, J. Biophys J 85, 1145-1164; 2003.

4. John, B.; Sali, A. Nucleic Acids Res 31, 3982-3992; 2003.

5. Burley, S. K. Nat Struct Biol 7 Suppl, 932-934; 2000.

6. Kihara, D.; Skolnick, J. J Mol Biol, submitted; 2003.

7. Fetrow, J. S. et al. Protein Sci 10, 1005-1014; 2001.

8. Skolnick, J.; Fetrow, J. S. Trends Biotechnol 18, 34-39; 2000.

9. Arakaki, A.; Zhang, Y.; Skolnick, J. Proc Natl Acad Sci USA, submitted; 2003.

10. Lu, L.; Lu, H.; Skolnick, J. Proteins 49, 350-364; 2002.

11. Lu, L.; Arakaki, A. K.; Lu, H.; Skolnick, J. Genome Res 13, 1146-1154; 2003.

12. Betz, S. F.; Baxter, S. M.; Fetrow, J. S. Drug Discov Today 7, 865-871; 2002.

13. Wojciechowski, M.; Skolnick, J. J Comput Chem 23, 189-197; 2002.






