Protein Structure Prediction in Drug Discovery

Horizons
GUEST COMMENTARY 
 

October 15, 2003 | MUCH HAS BEEN made of the importance of predicting the biologically active conformation of a protein from its primary amino acid sequence — the protein-folding problem (see "Computational Biologists Join the Fold," June 2002 Bio·IT World). It is worth considering the current state of the art and asking whether, in this (post-) genomic era, protein-folding predictions can play a practical role in the drug discovery process.

The most successful structure-prediction approaches attempt to match a protein of unknown structure to an already solved structure in the Protein Data Bank (see "Banking on Structures," Oct. 2002 Bio·IT World, page 60) [1-4]. Indeed, a long-term goal of structural genomics is to solve enough structures so that an arbitrary protein is within modeling distance of an already solved structure [5]. Recently, for simple (single-domain) proteins, we have discovered that the library of solved structures is already complete at the level of low-to-moderate resolution structures [6]. The problem now is to develop algorithms that can match these structures to the sequence of interest.
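As a toy illustration of matching a query to a library of solved templates, the Python sketch below picks the closest library entry by Smith-Waterman local alignment score. It is a minimal sketch under stated assumptions, not the method of refs. [1-4]: real fold-recognition and threading methods score how well a sequence fits a 3-D structure, whereas this example compares sequences only. The template names, sequences, and scoring parameters (`match`, `mismatch`, `gap`) are all hypothetical.

```python
# Toy template selection: score a query sequence against a library of
# template sequences and keep the best-scoring one. Sequences, template
# names and scoring parameters are hypothetical illustrations.


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_template(query: str, library: dict) -> tuple:
    """Return (template_name, score) for the highest-scoring template."""
    scores = {name: local_alignment_score(query, seq)
              for name, seq in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    library = {
        "template_A": "GGMKTAYIAKQRGG",  # hypothetical template sequences
        "template_B": "WWWWPPPP",
    }
    print(best_template("MKTAYIAKQR", library))  # ('template_A', 20)
```

Replacing `local_alignment_score` with a structure-aware sequence-to-template score would be the natural next step; the selection logic in `best_template` would stay the same.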

In large-scale, comprehensive benchmarks that take advantage of the substantial computer power available to groups such as ours, we have found that about 90 percent of proteins can be matched to their approximate folds, although for about 25 percent there are significant local mismatches despite good global matches to the structure. The good news is that this leaves roughly two-thirds (about 90 minus 25, or 65 percent) of all single-domain proteins below 200 residues for which low-to-moderate resolution structure predictions can be made. Accepting the premise that low-to-moderate resolution structures can be predicted on a genomic scale, what good are they? Can we tell what a protein does simply from its predicted 3-D structure?

Unfortunately, this is possible for only a fraction of proteins [7, 8]. What about the rest? It turns out that these low-to-moderate resolution structures are good enough to predict the biochemical/enzymatic function of a protein by matching it to a known active-site template extracted from some other previously solved structure (see "The Sequence-to-Structure-to-Function Paradigm"). Here, we are limited by the number of known active-site templates, but this number will certainly grow. Indeed, using such template matching, in many cases we can suggest what the protein does or, at worst, suggest a small number of experiments to establish its function [9].
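Active-site template matching can be pictured with a small geometric sketch. Assuming each residue of a predicted model is reduced to a single representative point, the hypothetical `find_site_matches` below looks for sets of residues whose types and pairwise distances agree with an active-site template to within a tolerance; the loose default tolerance reflects the low-to-moderate resolution of the models. The three-residue site, its distances, and all coordinates are made up for illustration, and real active-site descriptors are considerably richer than residue types plus distances.

```python
import itertools
import math

# A residue is (residue_type, (x, y, z)); one representative point per residue.
# The site definition and coordinates used below are hypothetical.


def find_site_matches(model, site_types, site_dists, tol=1.5):
    """Return tuples of model indices that satisfy the site template.

    site_types: residue type required at each template position.
    site_dists: {(p, q): distance} between template positions p and q.
    tol: allowed deviation from each template distance (same units).
    """
    # Candidate model residues for each template position, filtered by type.
    candidates = [[i for i, (rtype, _) in enumerate(model) if rtype == t]
                  for t in site_types]
    hits = []
    for combo in itertools.product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one residue cannot fill two template positions
        if all(abs(math.dist(model[combo[p]][1], model[combo[q]][1]) - d) <= tol
               for (p, q), d in site_dists.items()):
            hits.append(combo)
    return hits


if __name__ == "__main__":
    model = [
        ("SER", (0.0, 0.0, 0.0)),
        ("HIS", (3.0, 0.0, 0.0)),
        ("ASP", (6.0, 0.0, 0.0)),
        ("GLY", (1.0, 1.0, 1.0)),
        ("SER", (20.0, 0.0, 0.0)),
    ]
    site_types = ("SER", "HIS", "ASP")
    site_dists = {(0, 1): 3.0, (1, 2): 3.0, (0, 2): 6.0}
    print(find_site_matches(model, site_types, site_dists))  # [(0, 1, 2)]
```

Filtering candidates by residue type before checking geometry keeps the search small; a match in a predicted model is a hypothesis about function to be tested by experiment, not a proof of it.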

Thus, we can suggest biochemical assays to be done on protein targets for eventual use as diagnostic tools. If a pair of proteins adopts the quaternary (3-D) structure of a previously seen complex (which may be only distantly related in evolutionary terms), a technology we have developed can often identify the interaction, and the interacting amino acids, with reasonable accuracy [10, 11]. Both experimental and theoretical progress will be further aided by the creation of a large, experimental benchmark set of interacting proteins and their associated quaternary structures.
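The idea that two proteins may interact if they jointly match a known complex can be reduced, very roughly, to a filter over template hits. In this sketch, which is not the method of refs. [10, 11], each query protein is assumed to already have match scores against individual chains of known complexes (for example, from threading; higher is better, in hypothetical units), and a pair is flagged when one protein matches each partner chain of the same complex above a hypothetical `threshold`. A realistic method would also evaluate the predicted interface, which this sketch omits.

```python
# chain_scores[protein][template_chain] -> match score (higher is better).
# complexes[complex_id] -> (chain_a, chain_b): the two partner chains.
# All names, scores and the threshold are hypothetical.

NO_MATCH = float("-inf")


def predict_interactions(chain_scores, complexes, threshold=4.0):
    """Flag protein pairs whose members match the two chains of a known complex."""
    predictions = []
    proteins = sorted(chain_scores)
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:  # distinct, unordered pairs only
            for complex_id, (chain_a, chain_b) in complexes.items():
                # Either protein may play the role of chain_a.
                for x, y in ((p, q), (q, p)):
                    if (chain_scores[x].get(chain_a, NO_MATCH) >= threshold
                            and chain_scores[y].get(chain_b, NO_MATCH) >= threshold):
                        predictions.append((x, y, complex_id))
                        break  # one orientation per complex is enough
    return predictions


if __name__ == "__main__":
    chain_scores = {
        "P1": {"cplx1_A": 6.2},
        "P2": {"cplx1_B": 5.1},
        "P3": {"cplx1_B": 2.0},
    }
    complexes = {"cplx1": ("cplx1_A", "cplx1_B")}
    print(predict_interactions(chain_scores, complexes))  # [('P1', 'P2', 'cplx1')]
```

The template complex also tells you which residue positions sit at its interface, which is how a match of this kind can point to the interacting amino acids.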

But one could argue that the world is awash in protein targets, so what use is this? By having structures for most members of a protein family in a given genome, one could design in specificity for the target of interest [12]. But there is more. If one has a known inhibitor, one can use these predicted low-resolution structures to predict where it binds in about two-thirds of the cases, again pointing out the important regions of the molecule [13]. It remains to be seen whether such structures can be used for large-scale virtual screening of ligand libraries. But even if one cannot predict the best lead molecule, being able to guarantee that such a lead is among the top 100 or 1,000 compounds could accelerate the drug discovery process and reduce its cost.
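The "top 100 or 1,000" criterion corresponds to a standard virtual-screening measure, the enrichment factor: the fraction of known actives among the top N ranked compounds divided by their fraction in the whole library. The sketch below assumes a docking-style score for each compound where lower means better predicted binding; the compound IDs, scores, and actives are synthetic.

```python
# Enrichment of known actives among the top-N ranked compounds.
# Assumes lower score = better predicted binding; all data are synthetic.


def top_n(scores, n):
    """IDs of the n compounds with the best (lowest) scores."""
    return sorted(scores, key=scores.get)[:n]


def enrichment_factor(scores, actives, n):
    """(fraction of actives in top n) / (fraction of actives in library)."""
    hits = len(set(top_n(scores, n)) & actives)
    return (hits / n) / (len(actives) / len(scores))


if __name__ == "__main__":
    # 10,000 synthetic compounds ranked by index; 4 planted "actives".
    scores = {f"c{i}": float(i) for i in range(10_000)}
    actives = {"c5", "c50", "c500", "c5000"}
    for n in (100, 1_000):
        print(n, round(enrichment_factor(scores, actives, n), 2))
```

An enrichment factor of 1 means the ranking is no better than random. Here, with 4 actives among 10,000 compounds, finding 2 of them in the top 100 gives (2/100) / (4/10,000) = 50, so only the short list would need to go to assay.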

The clear direction for the future is to elucidate the roles that proteins play in cellular pathways, in order to pick out those likely to be "druggable." Preliminary indications are that structure prediction can assist in the automated assignment of proteins to known pathways, a first step in this process.

I am not suggesting that the complete solution to protein structure prediction has arrived. There remain difficult issues: refining structures to the higher resolution necessary for a number of applications; the one-third of small proteins that cannot yet be predicted at acceptable accuracy; and the extension to larger, multidomain proteins where the fold library is incomplete. Nevertheless, protein structure prediction is no longer just an interesting and challenging theoretical problem. Rather, computational approaches to the sequence-to-structure-to-function paradigm are becoming a reality.

The bottom line: By prioritizing targets and suggesting a relatively small number of experiments, protein structure prediction can play a practical role in drug discovery.

Jeffrey Skolnick is director of the Center of Excellence in Bioinformatics, University at Buffalo. He can be reached at skolnick@buffalo.edu.

References

1. Bonneau, R. et al. J Mol Biol 322, 65; 2002.

2. Xu, D.; Crawford, O. H.; LoCascio, P. F.; Xu, Y. Proteins Suppl 5, 140-148; 2001.

3. Zhang, Y.; Kolinski, A.; Skolnick, J. Biophys J 85, 1145-1164; 2003.

4. John, B.; Sali, A. Nucleic Acids Res 31, 3982-3992; 2003.

5. Burley, S. K. Nat Struct Biol 7 Suppl, 932-934; 2000.

6. Kihara, D.; Skolnick, J. J Mol Biol, submitted; 2003.

7. Fetrow, J. S. et al. Protein Sci 10, 1005-1014; 2001.

8. Skolnick, J.; Fetrow, J. S. Trends Biotechnol 18, 34-39; 2000.

9. Arakaki, A. K.; Zhang, Y.; Skolnick, J. Proc Natl Acad Sci USA, submitted; 2003.

10. Lu, L.; Lu, H.; Skolnick, J. Proteins 49, 350-364; 2002.

11. Lu, L.; Arakaki, A. K.; Lu, H.; Skolnick, J. Genome Res 13, 1146-1154; 2003.

12. Betz, S. F.; Baxter, S. M.; Fetrow, J. S. Drug Discov Today 7, 865-871; 2002.

13. Wojciechowski, M.; Skolnick, J. J Comput Chem 23, 189-197; 2002.

For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .