
By Kevin Davies

March 7, 2002 | In 1994, a young Australian biologist named Marc Wilkins was grappling for a simple term to portray the complete set of proteins encoded in a genome. It took a few years for his suggestion to take hold, but "proteomics"— the catchy counterpart to "genomics"—has become a fixture of biotechnology-related research papers, press releases, and business plans. As the industry (and media) spotlight that was once trained on the race for individual genes and later complete genomes shifts to this new field of discovery—"Introducing the biology of the future" cried one recent press release—it is worth asking whether the science lives up to its breathless billing. Judging by a clutch of reports published recently in the journals Nature, Science, and Current Biology, the hyperbole may in fact be justified.

The term may be relatively new, but the basic prerequisites for proteomics have existed for several decades. Two-dimensional gel electrophoresis is the routine if tedious method for separating complex mixtures of proteins, while sequence and structural databases have been an indispensable component of protein research for years. Thanks to recent advances in mass spectrometry and genome sequencing, researchers finally have the tools to conduct systematic surveys of the proteomes of various molecular machines, tissues, and organisms. While the first priority is to take a full inventory of proteins expressed in a given cell or tissue, researchers are particularly interested in mapping the complex network of physical partners for each protein. The logic is simple: if two proteins specifically associate under physiological conditions, there is probably a functional reason.

While the goal for proteomic companies is to translate information on human protein pathways into drug targets, many groups are ramping up by studying model organisms with more tractable protein collections. The rationale is not unlike Celera's decision to sequence the DNA of the fruit fly Drosophila melanogaster before embarking on the human genome. But whereas fruit flies carry some 14,000 genes, the consensus choice for initial proteomic studies is the baker's yeast, Saccharomyces cerevisiae, the genome of which was completely sequenced back in 1996. Possessing a mere 6,000 genes, yeast has just a fraction of the roughly 30,000 to 40,000 genes in the human genome, not to mention a five-year head start in terms of functional analysis.

Introducing the Interactome 
Writing in the January 10 issue of Nature, two industrial/academic consortia (one featuring investigators at MDS Proteomics in Canada and Denmark, the other at the German company Cellzome AG) describe impressive progress in organizing the yeast proteome. "A formidable challenge of postgenomic biology," according to Anne-Claude Gavin, Giulio Superti-Furga and colleagues at Cellzome, "is to understand how genetic information results in the concerted action of gene products in time and space to generate function." The first step in that quest is to characterize about 30,000 protein-protein interactions in yeast (the "interactome"), assuming each protein has five partners on average. The Cellzome approach is called tandem-affinity purification (TAP), but the two groups' methods are quite similar: first, prepare a "bait" protein by attaching a chemical tag. Next, introduce the DNA encoding the bait into a yeast cell. Then, fish out the bait proteins along with any attached partners by running the purified protein mixture through an affinity column. The resulting protein complexes are fingerprinted using mass spectrometry (MS) and identified using bioinformatics.
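The interactome estimate above is back-of-envelope arithmetic, and a minimal sketch makes the assumption explicit. The function name is hypothetical, and treating yeast's roughly 6,000 genes as roughly 6,000 proteins is a simplification made here for illustration, not a claim from the Nature papers.

```python
def estimated_interactions(num_proteins: int, avg_partners: int) -> int:
    """Rough interactome size: every protein contributes its average number of partners."""
    return num_proteins * avg_partners


# Assumed inputs: about 6,000 yeast proteins (one per gene) and 5 partners each.
YEAST_PROTEINS = 6_000
AVG_PARTNERS = 5

print(estimated_interactions(YEAST_PROTEINS, AVG_PARTNERS))  # 30000
```

The figure is only as good as the average-partner assumption; doubling that average doubles the estimate.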

The joint effort from Cellzome and the European Molecular Biology Laboratory in Heidelberg, Germany, studied more than 1,700 yeast genes and identified 589 purified tag proteins, 80 percent of which were associated with other proteins. Following MS of these proteins (some present in amounts as small as 15 copies per cell) and bioinformatic data searches to identify redundancies, Gavin's group was left with 232 distinct protein complexes, sometimes referred to as "molecular machines." Ninety-eight of these complexes had been previously catalogued and deposited in the yeast protein database, but over 130 complexes were previously unknown, and 91 percent contained at least one protein of unknown function. In several cases, the authors found large complexes that are virtually identical in yeast and human cells, which is not surprising given that 40 percent of yeast proteins are conserved through evolution.

Featured Reports
A-C. Gavin et al. "Functional organization of the yeast proteome by systematic analysis of protein complexes." Nature 415, 141-147 (2002).

Y. Ho et al. "Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry." Nature 415, 180-183 (2002).

A.H. Tong et al. "A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules." Science 295, 321-324 (2002).

J.S. Andersen et al. "Directed proteomic analysis of the human nucleolus." Current Biology 12, 1-11 (2002).

A.H. Fox et al. "Paraspeckles: A novel nuclear domain." Current Biology 12, 13-25 (2002).

At first glance, renderings of the maze of protein-protein interactions pouring out of such studies are akin to some incoherent piece of modern art, but Superti-Furga suggests that a better analogy is to a French pointillist painting. "If you stand too close," he explains, "all you see are single-colored dots. As you move away, you begin to see a coherent picture."

A similar approach called high-throughput mass spectrometric protein complex identification (HMS-PCI) was taken by Yuen Ho and coworkers from the University of Toronto and MDS Proteomics. Working from 600 baits, they identified more than 1,500 distinct interacting proteins, or 25 percent of the yeast proteome. Data on these novel complexes has been entered into the recently created Biomolecular Interaction Network Database (BIND), produced by co-authors Gary Bader and bioinformatician Christopher Hogue, which stores data on protein-protein interactions. The database includes a tool called PreBIND, which can be used to search abstracts in the scientific literature for information on protein-protein binding. (Visit for more information. Full details of the yeast proteome maps are available at and

Collaborative approach 
Another multicenter analysis of the yeast proteome demonstrates the value of combining "wet lab" and computational approaches for protein identification, thereby helping to discard some of the inevitable false positives. Four groups—led by Charles Boone and Hogue (University of Toronto), Stanley Fields (University of Washington), and Gianni Cesareni (University of Rome)—used two complementary methods to identify proteins that bind to a well-known protein-binding domain called SH3. The first involved a computational search for ligands that could potentially bind one or more of the 24 yeast proteins containing the SH3 domain. The second used the classic two-hybrid method developed by Fields to identify gene products binding to the SH3 domain. Combining both datasets revealed 59 interactions in common.
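The idea of keeping only the interactions that both methods agree on can be sketched as a set intersection. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the protein and ligand labels are made-up placeholders, and each interaction is treated as an unordered pair.

```python
def normalize(pairs):
    """Store each interaction as an unordered pair so (a, b) matches (b, a)."""
    return {frozenset(pair) for pair in pairs}


def common_interactions(predicted, observed):
    """Return the interactions present in both the computational and two-hybrid sets."""
    return normalize(predicted) & normalize(observed)


# Placeholder data only; these are not real yeast proteins or published results.
predicted = [("sh3_A", "ligand_1"), ("sh3_A", "ligand_2"), ("sh3_B", "ligand_3")]
observed = [("ligand_1", "sh3_A"), ("sh3_B", "ligand_4"), ("ligand_3", "sh3_B")]

shared = common_interactions(predicted, observed)
print(len(shared))  # 2
```

An interaction seen by only one method is not necessarily wrong; requiring agreement trades some true interactions for fewer false positives.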

These important studies are simply the first round in attempts to characterize the yeast proteome functionally, but given that the human genome may contain only five times as many genes as yeast, the Cellzome group concludes the current technology "may provide drug discovery programmes with a molecular context for the choice and evaluation of drug targets."

Indeed, in results published contemporaneously in Current Biology, the first steps toward that goal have been taken. In what is the largest proteomic study so far for a single human organelle, the groups of Angus Lamond at the University of Dundee and Matthias Mann in Denmark have teamed up to compile an inventory of the components of the human nucleolus. Originally described more than 150 years ago by Rudolph Wagner, the nucleolus is a dynamic, membrane-free compartment of the cell nucleus where many components of the ribosome (the cell's protein-synthesizing machinery) are produced. Using nanoelectrospray tandem MS to analyze the purified components, the Lamond-Mann group identified 271 nucleolar proteins, of which fully 30 percent were novel. Many were quite unexpected, including factors typically associated with protein synthesis and the cytoskeleton. The authors also describe dynamic novel nuclear compartments called paraspeckles, which are thought to be involved in the processing of RNA.
