BY MALORYE BRANCA
Senior Informatics Editor
August 13, 2002 | Long before the human genome was completed, proponents of proteomics were already trumpeting the virtues of proteins over genes. If the goal is to better understand biology and quickly develop new products, they argued, why dally with genes when you could be looking at proteins?
A single gene can give rise to multiple protein isoforms, expressed at different times in different cell types. Moreover, the complex series of biochemical steps from gene to protein makes it difficult to interpret the physiological significance of an upregulated (turned on) gene. Proteins, on the other hand, are the molecules that carry out the gene's will. Find the protein, and you've found the biological beef.
As a result, excitement about proteomics has been building steadily for the past few years. Hoping to do for proteins what has been achieved for the human genome, several companies, including Myriad Genetics Inc., MDS Proteomics Inc., and Cellzome AG, have mounted ambitious programs to catalog the entire panoply of proteins and protein-protein interactions in human or yeast cells (see "Big Picture Proteomics," right).
|Big Picture Proteomics
|In the past few years, a number of commercial groups have toyed with the idea of mapping the human proteome. But the question has always concerned what to map, followed quickly by how to turn that into a profit.
By filling in critical pieces of the genomic puzzle, these types of studies provide a glimpse into a future in which researchers may probe detailed biochemical pathway maps for precise answers to their questions. Unfortunately, to many observers, those rewards appear to be a long way off. Many pharmaceutical companies feel that they have yet to reap benefits they were promised from genomic tools, such as sequence databases, bioinformatics software, and gene expression analysis.
"I think the big pharmas overpaid for a lot of technologies, and they did not get what they wanted," said Michael E. Lytton of venture capital firm Oxford Bioscience Partners, speaking at a recent roundtable discussion at the Harvard Biotechnology Club in Boston.
"A lot of money was deployed, and it turned out that wonderful nature is more complicated than we thought," said Edwin M. Kania Jr., senior managing director and chairman of Flagship Ventures, at the same event.
So, while many experts argue that all that's needed is more time, patience is running out in the boardroom. As a result, there has been a huge downturn in the genomic tools market, putting tremendous pressure on those trying to peddle tools to large pharmaceutical companies.
"It's been tough. Big Pharma is in a wait-and-see mode," says Clarissa Desjardins, executive vice president of corporate development at Caprion Pharmaceuticals Inc. in Montreal. "They are clamping down, and there have not been a lot of deals."
The Proteomics Pipeline
Keenly aware of this trend, many proteomics companies are trying to deliver results quickly, anywhere they can in the drug discovery and development pipeline.
|Each colored region represents a distinct protein expression pattern. The first three are from ovarian tumors, the next four from healthy patients. Correlogic's Proteome Quest software distinguishes the differences between them. X axis, nodes or clusters; Y axis, signal intensity; Z axis, features (mass value derived from mass spec).
The goal is to build an ironclad business model and offer a value proposition that can break down doors. For example, Caprion's expertise is in protein localization and bioinformatics; its first goal is to find valuable new protein targets on the cell membrane. "What it takes is data, and we are presenting more and more compelling data," Desjardins says. "So we are confident we will close a major proteomics deal."
Working with proteins is a cumbersome process, especially compared with genomics, where researchers can effortlessly generate copious amounts of data using gene sequencers and DNA arrays. Anyone who has seen a 2-D gel, let alone tried to analyze one, would admit that traditional proteomics is by no means high-throughput. Until now, the choice has been to trudge through the proteome, or race through the genome.
But proteomics is catching up. Upgraded tools, new alternatives, and some fancy accessories — such as breakthrough 2-D gel image analysis software — are making it much less grueling to do large-scale protein expression and abundance studies.
Australia's Proteome Systems Ltd. even offers a wholesale solution — the ProteomIQ system. This "soup-to-nuts" platform contains everything from basic reagent kits, 2-D gel apparatus, and imaging equipment to TOF and MS/MS mass spec instrumentation, IT hardware, and the informatics required to draw meaningful data from it all. Instruction and consulting services are also offered to help customers get a quick start.
Some labs are bypassing 2-D gels altogether, thanks to new mass spectrometry-based approaches and protein chips. "We've been doing proteomics for 10 years, but only since we have been using arrays has it been parallelized, reproducible, and user friendly," says Ian Humphery-Smith, chief scientific officer of Glaucus Proteomics B.V. in the Netherlands. Glaucus is developing antibody and protein arrays for lead optimization, target identification and validation, and diagnostics.
Advances in protein chip technology (see "Protein Chips Go Public," right) could eventually help proteomics eclipse gene expression analysis as the most popular activity in genomics laboratories. Some problems still need to be ironed out, including the challenge of gathering and correlating the information necessary to turn proteins into products, but the promise of proteomics is evident through the recent explosion of compelling data on a variety of fronts.
|Protein Chips Go Public
|It might be tough to tear some protein researchers away from their 2-D gels, but the majority would embrace protein chips if the technology was affordable, accessible, and offered the proteins that interest them.
For example, several proteomics studies using Ciphergen Biosystems Inc.'s ProteinChip System have found biomarkers for ovarian cancer. These include one from researchers at Boston's Brigham and Women's Hospital, and another published in February in The Lancet from Lance Liotta and colleagues at the National Cancer Institute, in cooperation with bioinformatics firm Correlogic Systems Inc.
For proteomics toolmakers, the first challenge has been to push proteomics beyond the discovery arena. "I think that genomics was oversold as a target identification technology," says Ruth VanBogelen, head of genomics and proteomics for global research and development at Pfizer Inc. "I was at a meeting recently where therapeutic heads were saying, 'RNA profiling, we don't see the value in it.'"
She says the value lies in lead optimization — the refinement of drug candidates. "We applied proteomics to lead compound optimization from the beginning, and no one here is saying, 'proteomics has no value,'" VanBogelen says. A couple of years ago, Pfizer also shifted RNA profiling over from new target identification to lead optimization. "After two years of trying to resell it, I think we are making good strides," she says.
VanBogelen hasn't given up on genomics and proteomics for target identification and validation, but she says, "The new targets will come later on." The problem is that genomics represents a true paradigm shift for the industry, a completely different way of doing things. "The data we are generating will get us there, but we need more time to figure out the biology," she says.
She is pleased with the progress thus far, which she attributes to now having a better understanding of molecular physiology. "After 10 years of using proteomics for lead compound development, I'm now able to give the therapeutic areas some ideas about how to screen for new mechanisms or targets," VanBogelen says. "I can now say, 'This is what I think your best target is,' because I'm starting to understand how the cell is thinking."
But not everyone is on the fence about using proteomics in the discovery arena. The trick, some argue, is to combine proteomics, genomics, and other data in a holistic, systems biology approach. "Proteomics is one part of functional genomics; it does not stay by itself," says Dalia Cohen, vice president and head of functional genomics at Novartis AG. "Proteomics is a piece of the overall puzzle, which is the disease."
The strategy at Novartis is to combine proteomic data with gene expression, functional assays, and model organism studies using a variety of tools. For target discovery, one approach is immunoprecipitation and 1-D gels. The group is using cutting-edge mass spectrometry techniques, like those used in the recent studies from MDS Proteomics and Cellzome for protein interaction mapping. "We are looking for associated proteins that do the same work in the cell," Cohen says. "The best approach is to map them to a pathway, and then study them further."
The Novartis researchers are also applying proteomics in looking for disease markers and in lead optimization, typically using model organisms and 2-D gel electrophoresis. But the trick, Cohen says, is how you pull the information together. "A few years ago people thought DNA chips would solve all our problems," she says. "But the strength is in having all the tools you need to solve the problem."
For example, DNA chips can sometimes point to markers of disease. "But deciding whether you want to go after those markers for drug discovery requires more data from other tools," Cohen says. Markers, she points out, don't necessarily explain the role of a gene in a disease, which is necessary to truly develop novel drugs.
But biomarkers still make good prognostics and diagnostics, and many proteomics companies will be happy just getting some in hand, without necessarily understanding the role of these markers in disease.
Fremont, Calif.-based LumiCyte Inc. is one such company. "We are now capable of routinely generating protein maps with upwards of 1,500 proteins from just microliter quantities of unfractionated human serum," says LumiCyte CEO T. William Hutchens.
With help from clinicians, LumiCyte researchers try to answer two questions. First: How common is a given set of protein expression changes in a population? And second: How many other aspects of health do these changes correlate with? Their goal is to generate biomarkers that can be used to determine drug response and diagnose disease.
This involves building and maintaining a set of health- and disease-specific protein profiles. "We can take a sample, run it, and compare it to an image of the protein profile of a patient with early-stage prostate cancer," says Hutchens. LumiCyte uses SELDI (surface-enhanced laser desorption/ionization) with protein arrays to generate these profiles. The proteins are captured and then analyzed and identified directly on the surface of the chip using mass spectrometry.
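As a rough illustration of this kind of profile matching (not LumiCyte's actual algorithm, and with invented numbers), a sample's binned peak intensities can be compared against stored reference profiles using a simple similarity score:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(sample, references):
    """Return the reference profile label most similar to the sample.

    `sample` is a list of peak intensities binned at shared m/z values;
    `references` maps a condition label to a reference intensity vector.
    """
    return max(references,
               key=lambda label: cosine_similarity(sample, references[label]))

# Hypothetical reference profiles: intensities at four shared m/z bins.
references = {
    "healthy": [10.0, 2.0, 1.0, 8.0],
    "early-stage prostate cancer": [3.0, 9.0, 7.0, 1.0],
}
sample = [2.5, 8.0, 6.0, 1.5]
print(best_match(sample, references))  # → early-stage prostate cancer
```

Real systems match far richer profiles, but the core operation, scoring a new spectrum against a library of labeled ones, has this shape.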
The Impact of Informatics
Making the leap from data to products involves major informatics challenges. It's not that analyzing proteomic data is that much different from gene expression analysis. "The problems associated with mass spectrometry data are the same as with microarray analysis," says Sandip Ray, president and CEO of Canada-based X-Mine Inc. "The number of variables you work with are in the thousands, and out of that, you are trying to find a few that are important."
Ray says he believes that for these types of analyses researchers should use statistical approaches called "supervised methods." X-Mine offers a set of supervised analysis tools and related services. In supervised methods, the researcher starts by putting the data into certain categories, such as tumor or nontumor, and a computer is programmed to pick out the markers that distinguish the types. "Supervised analysis is important, because you are focusing on the domain expertise of the experimenter," Ray says. "All those clustering models do not let you exploit your experimental annotation enough."
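A bare-bones sketch of the supervised idea Ray describes, with hypothetical data and a simple t-statistic-style score standing in for X-Mine's actual methods:

```python
import statistics

def marker_scores(samples, labels):
    """Rank features by how well they separate two labeled classes.

    `samples` is a list of feature vectors (e.g. peak intensities) and
    `labels` a parallel list of class names such as "tumor"/"nontumor".
    Returns (feature_index, score) pairs, best separators first, using
    a crude t-statistic-like score: mean difference over pooled spread.
    """
    classes = sorted(set(labels))
    assert len(classes) == 2, "expects exactly two classes"
    a = [s for s, l in zip(samples, labels) if l == classes[0]]
    b = [s for s, l in zip(samples, labels) if l == classes[1]]
    scores = []
    for i in range(len(samples[0])):
        xa = [row[i] for row in a]
        xb = [row[i] for row in b]
        spread = statistics.stdev(xa) + statistics.stdev(xb) + 1e-9
        scores.append((i, abs(statistics.mean(xa) - statistics.mean(xb)) / spread))
    return sorted(scores, key=lambda t: -t[1])

# Toy data: feature 1 cleanly separates tumor from nontumor; feature 0 is noise.
samples = [[5.0, 9.1], [4.8, 8.9], [5.1, 2.2], [4.9, 2.0]]
labels = ["tumor", "tumor", "nontumor", "nontumor"]
print(marker_scores(samples, labels)[0][0])  # → 1 (the discriminating feature)
```

The labels supplied by the experimenter are doing the work here; an unsupervised clustering of the same data would have no way to privilege the tumor/nontumor distinction.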
The trick is distinguishing false associations from those with real statistical power. "It's a pattern recognition problem," says Nikolaj Ivancic, vice president of software engineering at LumiCyte.
As a result, researchers not only have to rigorously validate the markers, they must also comb through as much information related to their samples as possible. LumiCyte's researchers use about a half-dozen analytical tools to generate their own data, and they employ several search engines to peruse hundreds of different databases, including DNA sequence, gene expression, crystal structure, and biochemical pathways, as well as databases of clinical and disease information.
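One standard way to gauge whether a marker's separation has real statistical power, sketched here with toy numbers rather than any vendor's actual pipeline, is a label-permutation test:

```python
import random

def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """Estimate how often a marker's class separation arises by chance.

    `scores` holds one marker's values and `labels` the true class of
    each sample. Shuffling labels breaks any real association, so the
    fraction of shuffles with an equal or larger mean difference
    approximates a p-value.
    """
    rng = random.Random(seed)

    def mean_diff(labs):
        a = [s for s, l in zip(scores, labs) if l == "tumor"]
        b = [s for s, l in zip(scores, labs) if l != "tumor"]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_diff(shuffled) >= observed:
            hits += 1
    return hits / n_perm

scores = [9.1, 8.9, 9.3, 2.2, 2.0, 1.8]
labels = ["tumor"] * 3 + ["nontumor"] * 3
# ≈ 0.1: with only six samples, even a perfectly clean split
# cannot look very significant, which is why validation needs numbers.
print(permutation_p_value(scores, labels))
```

The instructive part is the floor on the p-value: with few samples and thousands of candidate markers, apparently clean splits arise by chance, which is exactly the false-association problem Ivancic describes.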
It's not just the amount and range of data, but the types of calculations that must be made. One of Glaucus' aims is to create arrays for studying all the proteins in any tissue. Because there could be as many as a half-million proteins in the human body, Glaucus will approach this goal one chip at a time. Once all the biological, clinical, and other data needed to characterize proteins and antibodies are added, making associations between these data becomes a massive computing task.
"Most human diseases are multigenic," Humphery-Smith says. "So they are caused by combinations and permutations of different genes. This puts you immediately into high-dimensional space, and you shut down most computers once you hit the return key on that kind of a problem."
|Our list of proteomics suppliers.
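Humphery-Smith's point about combinations of genes is easy to make concrete: merely counting the candidate gene subsets explodes. Taking a round figure of 30,000 human genes (an assumption for illustration):

```python
from math import comb

# Number of candidate gene subsets of size k drawn from n genes.
# Even modest k makes exhaustive search infeasible.
n = 30_000
for k in (2, 3, 4):
    print(k, comb(n, k))
# k=2 alone is ~4.5e8 pairs; k=3 is ~4.5e12 triples.
```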
As their studies are completed, the Glaucus researchers will also be storing scanned images from the processed arrays. "We are looking at imaging hundreds of thousands of biochips designed to screen antibodies for specificity," Humphery-Smith says.
To tackle this challenge, Glaucus has teamed up with Dutch IT heavyweights SARA Computing and Networking Services and GigaPort. SARA hosts the national supercomputer for the Netherlands and gives Glaucus access to a 1,024-CPU system comprising two SGI Origin 3800 512-CPU systems with a peak speed of 1 teraflop. Glaucus will connect with SARA through the GigaPort network, which currently provides a 1-Gbps connection.
"We get speed, fantastic storage, security, optimized systems, and the ability to do huge amounts of computation," Humphery-Smith says, "and we didn't have to spend $10 million and 10 months to get it up and running."
But some observers say Glaucus' informatics foresight is the exception.
"Every person I talk to has failed to budget adequately for their informatics needs," says John Macchia, vice president of sales and marketing at Proteome Systems. "They think about the mass spec data, but they forget they also have to account for where the sample came from, how you treated it, and assemble that with other data to get results."
The bulk of proteomic data is generated by mass spectrometers. Caprion's eight instruments run around the clock and generate 60GB of data per machine, per hour. The company uses 1-D gels and feeds identified proteins directly into the mass spec machines. "This forced us to innovate and develop a new method for measuring protein abundance," Desjardins says; the effort produced a proprietary profiling system.
"People have known for a long time that the amplitude of the ions flying through the mass spec is proportional to the abundance," Desjardins says. The challenge was to create software that could measure the abundance of every ion, and correlate that back to a particular protein. "Everyone else is working on this, so having done it first gives us a unique competitive advantage."
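The principle Desjardins describes can be sketched in miniature (a toy spectrum and a naive windowed sum, not Caprion's proprietary method): sum the ion signal around each protein's known m/z value and treat it as a relative abundance.

```python
def peak_abundance(spectrum, center, window=0.5):
    """Sum ion intensities within ±window of a peak's m/z value.

    `spectrum` is a list of (mz, intensity) pairs. Summing the signal
    around each peak gives a relative abundance measure, on the
    assumption that ion signal scales with protein abundance.
    """
    return sum(i for mz, i in spectrum if abs(mz - center) <= window)

# Toy spectrum: two peaks, the second roughly twice as abundant.
spectrum = [(999.8, 4.0), (1000.0, 12.0), (1000.2, 4.0),
            (1499.9, 9.0), (1500.0, 22.0), (1500.1, 9.0)]
protein_peaks = {"protein A": 1000.0, "protein B": 1500.0}
for name, mz in protein_peaks.items():
    print(name, peak_abundance(spectrum, mz))
# protein A 20.0
# protein B 40.0
```

The hard engineering, which this toy ignores, is doing that for every ion across thousands of runs and mapping each signal back unambiguously to a protein.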
To support their CellCarta high-throughput protein identification system, Caprion researchers have a range of systems including a Sun Fire 6800 server, capacity for up to 100 terabytes of online storage, and Sun grid software to support a distributed computing system.
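Back-of-the-envelope arithmetic from the figures quoted above (eight instruments at 60GB per machine per hour, decimal units, round-the-clock operation, no filtering or compression; all assumptions for illustration) shows why such storage capacity is not extravagant:

```python
instruments = 8
gb_per_machine_hour = 60
storage_tb = 100  # quoted online storage capacity

tb_per_day = instruments * gb_per_machine_hour * 24 / 1000
print(f"{tb_per_day:.1f} TB/day")  # 11.5 TB/day
print(f"{storage_tb / tb_per_day:.1f} days to fill {storage_tb} TB")
```

In practice raw spectra are reduced before archiving, but the raw rate alone would exhaust 100 terabytes in under nine days.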
Clearly, IT will continue to be a major part of proteomics.
"This won't go away," Desjardins says. "Studying tens of thousands of proteins is a new paradigm that requires IT infrastructure and integrating your IT with research from the beginning."