
Gene expression signatures are becoming powerful diagnostic tools, but work remains to make them consistent enough to have an impact in the clinic. 

By Malorye Branca 

April 7, 2002 | As an oncologist, Todd Golub knows exactly what he wants from a diagnostic test: high accuracy, low cost and ease of use. But as a researcher, Golub is measuring gene expression patterns in cancer by using DNA microarrays, one of the most complex, expensive, and, until recently, unreliable tools available.

Now, a series of reports suggests that microarrays are on the verge of becoming powerful diagnostic tools. "Over the last 24 months there's been a shift in the kinds of papers coming out," says Stephen Friend, vice president at Merck & Co. and a co-founder of Rosetta Inpharmatics. "More people are using arrays to identify distinguishing patterns, not just to find new genes."

And the timing of these findings is no coincidence. Researchers are capitalizing on the introduction of better arrays and improved informatics to obtain more reliable results and to tackle harder questions. "There's been a blossoming of the field," Golub says.

Creating diagnostics from gene expression data will depend on refining these tools even further, getting more details on the biology of cancer, and developing highly accurate markers of disease, called molecular signatures.

Among the recent reports are several from Golub's teams at the Whitehead Institute for Biomedical Research in Cambridge, Mass., and the Dana Farber Cancer Institute in Boston. In one study, published in the January issue of Nature Medicine, the group used microarrays to examine gene expression patterns in cells from patients with diffuse large B-cell lymphoma, the most common form of adult lymphoid cancer. Data from 13 genes was enough to accurately separate the patients into two groups: those likely to die from the disease, and those with a high chance of surviving it.

In a similar study published in the January 31, 2002, issue of Nature, researchers from The Netherlands Cancer Institute and Friend's team measured gene expression patterns in breast tumors using DNA microarrays. They described a molecular signature that spells a high probability of recurrence and a poor prognosis for the patient. Such markers, the researchers suggest, could be used to select patients who need further therapy, as well as those who can be spared it.

Other researchers are finding signatures that distinguish certain types of stomach tumors, leukemias, lung cancers and more. This wave of findings is one of the most encouraging signs so far that the widely heralded technology of DNA microarrays is about to play a major role in clinical diagnostic testing. These studies stand out not just because of their increasing number, but also because of the strength of their findings, and the fact that many use marker genes with known effects on cancer-related processes. "The fact that gene expression patterns that are both biologically and clinically relevant actually exist is a huge step forward," Golub says. "Three or four years ago that was just a theoretical consideration."

The challenge on the commercial front is to optimize the arrays and informatics in a clinic-friendly kit that gives better results than the standard tests, quickly, and without requiring a surgical procedure or a technician with a doctorate to run the analysis.

Dozens of companies are hoping to take the lead in this nascent field, which offers some of the earliest and most significant commercial opportunities from genomics. In a recent report published by Waltham, Mass.-based Decision Resources, consultant Ken Rubenstein projected that by 2007, tests based on gene expression analysis will account for $70 million of a $900-million worldwide post-genomic diagnostic market. But growth of this market may be slow initially because such tests will be radically different from those in use. Once doctors and technicians become familiar with the new tests and convinced of their value, sales are expected to soar.

Another obstacle is that microarrays and other genomic- or proteomic-based tools generate far more data points than current diagnostic tests, which typically examine just one or two markers. This profusion of data is both a great challenge and a strength. On the one hand, it makes the tests more complex, harder to validate and interpret, and more likely to draw scrutiny from regulatory agencies. At the same time, Rubenstein says, "Looking at so many data points should give you greater certainty and also provide the opportunity to answer multiple questions. And there is tremendous value in that."

A New Era
Capable of measuring the expression of tens of thousands of genes in parallel, DNA microarrays caught on immediately after their development in the mid-1990s by academic labs such as Patrick Brown's at Stanford University and companies such as Affymetrix Inc. (see "What Sits on A Chip," below, and "Making DNA Microarrays," at right). But early chips for gene expression analysis produced results that were sometimes inconsistent or, worse, incorrect.

The overall quality of the chips has improved recently with a crush of new competitors entering the field. Among them are companies such as Agilent Technologies Inc. and Motorola Inc., instrument manufacturers that are quality control experts. Agilent, for example, tests its arrays for multiple parameters that include sensitivity, array uniformity and specificity. Affymetrix has taken steps to improve results from its chips. The probes applied to the chips are being more carefully selected, and their sequences are now publicly available. In all, about 20 companies are competing in the microarray market, and Rubenstein estimates that market will grow from approximately $500 million in 2002 to $1.2 billion by 2006.

What Sits on a Chip
They are tiny and quick, but beyond that, most chips used in genomics have little in common with their silicon counterparts.

The stakes are much higher in the clinic than in the research lab, and thus the demands on the microarray technologies are also different. Today's research tools for gene expression need serious makeovers to become diagnostic kits. Companies are already taking the necessary steps. Management at Affymetrix is confident its chips can be transitioned to the clinic. "Diagnostics is a promising application for microarrays," says Elizabeth Kerr, Affymetrix's director of marketing for gene expression. "We think we are well-positioned because of the reliability of our product, and we are working closely with customers that have data sets we think could lead to great diagnostics." Affymetrix's key partners in this arena include Roche and bioMerieux.

But others question whether DNA microarrays will ever be consistent enough to have an impact in the clinic. It is still not uncommon to find variation as high as 20 percent in the results from manufactured chips, and that's not good enough for the clinic. "Microarrays won't cut it, at least not soon," says Rubenstein, who considers bead-based systems, such as those from Luminex Software Inc. and Illumina Inc., the leading contenders. In these systems, probes are attached to tiny coded beads. Potential advantages of bead-based systems are that the beads can be reliably manufactured in bulk, and when they are optimized, these systems have extremely high throughput.

Another possible player is Applera Corp., the parent company of Celera Genomics and Applied Biosystems. "Celera is a content factory, and Applied Biosystems has access to key technologies," Rubenstein says. Applied's TaqMan, for example, is regarded as one of the most accurate tools on the market for quantitative gene expression analysis, but it is also one of the most expensive. TaqMan is a popular form of real-time PCR (polymerase chain reaction) and is used with Applied's 7700 Sequence Detection System.

The eTag system from Aclara BioSciences Inc. is another hot prospect. Aclara has developed more than 300 of these tiny fluorescent molecules, which have unique properties that allow them to be separated easily by electrophoresis. Notably, eTags can measure protein as well as gene expression. Over the next few years, the company plans to create a LabCard that accommodates the entire test — from assay through electrophoresis. Each card would include 96 capillaries, and more than 30 data points could be evaluated in each capillary. "It can generate more than 3,000 data points in about four hours, it's extremely sensitive, and it has high sample throughput," Tina Tian, Aclara's director of genomics, says. "That makes a very reasonable diagnostic."

Stepping Up to Statistics
During the early years of microarray studies, researchers relied largely on two methods for analyzing gene expression: fold change and hierarchical clustering. Today, few experienced researchers would limit themselves to these tools (see "Taking AIM to Track Analysis," below). Fold change, in particular, works simply by calculating the proportional change in expression across experiments for each gene and setting an arbitrary threshold for declaring a change to be significant. "Fold change with arbitrary cutoffs is appealing because it's simple and intuitive," says Nat Goodman, a bioinformatics consultant and the author of DNA Microarray Informatics (a report from Cambridge Healthtech Institute). "But you cannot get reliable conclusions from a study involving tens of thousands of data points if you aren't using real statistics."

Gene Chips in Action
Stephen Friend and colleagues used gene signatures to predict whether breast cancer was likely to recur. They reported that their method was more accurate than conventional tests.

What's needed are methods that can deal with a large number of data points and a large variation in results. Variation can creep into experiments through manufacturing processes, experimental designs, or the selection of analytical parameters. There is also a wealth of biological variation, which cannot be controlled. Particular DNA sequences, for example, will bind their probes more tightly than others simply because they contain a different proportion of A's and T's to C's and G's — the bases that make up the nucleotide sequence. This can affect the measurement of expression at a particular spot on the array.
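The base-composition effect described above can be made concrete with a small sketch. It computes the G+C fraction of a probe and applies the Wallace rule of thumb for short oligonucleotides (2 degrees C per A or T, 4 degrees C per G or C). The rule and the two probe sequences are illustrative assumptions, not something taken from the studies discussed here.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature (deg C) for a short oligo.

    Wallace rule: 2 per A/T base plus 4 per G/C base. It is only a
    rough guide and is meant for short sequences.
    """
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Hypothetical 12-base probes, chosen only to show the contrast.
probes = {"at_rich": "ATTATAATTATA", "gc_rich": "GCCGCGGCCGCG"}
for name, seq in probes.items():
    print(name, gc_fraction(seq), wallace_tm(seq))
```

Under this rough rule the GC-rich probe melts at about twice the temperature of the AT-rich one, so at a single hybridization temperature the two spots would not report expression on quite the same scale.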

Recognizing the analytical challenges, many microarray users have begun adopting hard-core statistical methods, such as t-tests and ANOVA error models. Academic statisticians and companies such as Rosetta Biosoftware, Silicon Genetics, and Partek Inc. are leading the charge in this regard. With better quality data points in hand, researchers are also seeking more sophisticated numerical analysis tools, including principal components analysis, neural networks and support vector machines. More than 20 companies now offer statistical and numerical analysis packages for microarray researchers. Many specialized tools are also emanating from the academic and government research communities, with some making their way to commercialization. Many resources, including analysis software, guides and evaluations, are available on the Web (see below).
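A minimal sketch of why a t-test can disagree with a fold-change cutoff: the code below computes both a fold change and Welch's t statistic for two genes. The intensity values, gene names and the idea of a two-fold cutoff are hypothetical, picked only to illustrate the contrast, and a real analysis would also need p-values and a correction for testing thousands of genes at once.

```python
import math
from statistics import mean, variance


def fold_change(control, treated):
    """Ratio of mean expression in treated samples to control samples."""
    return mean(treated) / mean(control)


def welch_t(control, treated):
    """Welch's t statistic: difference in means over its standard error.

    Uses the sample variance of each group, so noisy replicates
    shrink the statistic even when the means differ a lot.
    """
    se = math.sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / se


# Hypothetical replicate intensities: (control, treated) per gene.
genes = {
    "noisy_gene": ([100, 400, 250, 50], [900, 150, 300, 250]),
    "steady_gene": ([200, 205, 195, 200], [300, 295, 305, 300]),
}
for name, (control, treated) in genes.items():
    fc = fold_change(control, treated)
    t = welch_t(control, treated)
    print(f"{name}: fold change {fc:.2f}, t {t:.2f}")
```

Here the noisy gene reaches a two-fold change but has a t statistic close to 1, while the steady gene changes only 1.5-fold yet has a t statistic above 30. A fixed two-fold cutoff would flag the first and miss the second.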

The new tools are already having important effects. "People are talking about how these robust statistical methods are helping them get past the large-scale effects to the small-scale effects, and that is a critical development because it takes you beyond the obvious," says Goodman, adding that the best is yet to come. "There is a huge body of mathematical and statistical methods available based on work in other fields, and we are only at the first stages of fitting those methods into microarray research."

Goodman predicts that it will take another year or so for statistical methods to fully penetrate the field. The next hurdle will be to introduce more powerful numerical analysis methods to supplement the simple clustering methods now in vogue — an improvement that will be essential for accurate diagnostics.

Already, new methods are emerging to pull patterns from microarray data, such as plaid clusters, terrain maps, theme maps and relevance networks. "We don't yet know what the best clustering methods are for microarrays," Goodman says. "But it is likely there will end up being several, and some of them may be completely new to this field."

Good analytical tools are not all that is needed to handle data. The fact that the field lacks a universal platform, set of protocols, and language has been a serious problem (see "Setting the Standards for Microarray Data," page 9). But more data is becoming publicly available for testing and evaluation, and wider scrutiny by an increasing number of researchers should cull out mistakes and improve the data's quality. New tools for tracking steps in analysis are also becoming available.

Signatures of Disease
With these improved tools in hand, researchers are now trying to get more from their data. "People are beginning to leverage other experiments," Friend says. "They have realized that if you keep the experimental conditions coherent, you can use the data you generate to ask other questions later on."

They are also asking new kinds of questions. "Before, most people did hierarchical clustering," Friend says. "That tells you what the predominant pattern is." Now researchers are more likely to pick out genes that seem to be at the root of a process, such as cell cycle control, and see what genes are clustering in it — what Friend calls "molecular seeding." "By doing molecular seeding instead of hierarchical clustering, you end up with less intense numbers, but with more specific signatures. It is microclustering rather than macroclustering."
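Friend does not spell out an algorithm for molecular seeding, so the following is only one plausible reading, offered as a sketch: start from a chosen seed gene and keep the genes whose expression profiles correlate strongly with it. The Pearson correlation, the 0.9 threshold, the gene names and the profile values are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def seed_cluster(profiles, seed, min_r=0.9):
    """Return the genes whose profile correlates with the seed's by at least min_r."""
    ref = profiles[seed]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != seed and pearson(ref, profile) >= min_r
    )


# Hypothetical expression profiles across five samples.
profiles = {
    "seed_gene": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_a": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with the seed
    "gene_b": [5.0, 4.0, 3.0, 2.0, 1.0],  # moves opposite to the seed
    "gene_c": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship
}
print(seed_cluster(profiles, "seed_gene"))
```

Unlike hierarchical clustering, which groups every gene by the dominant patterns in the data, this kind of search returns a small set tied to one chosen process, which fits Friend's description of fewer but more specific signatures.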

Some groups will use DNA microarray data as a pointer to potential protein markers, while others may use a combination of techniques to identify a variety of marker types. For example, researchers at Variagenics Inc. are studying gene expression, genetic variation and other tumor-specific biological markers, such as loss of heterozygosity. Their strategy is to focus on well-understood pathways, such as those involving the activity of common chemotherapy drugs, and then identify any markers that will predict whether a tumor will respond to the drug or resist it.

Making DNA Microarrays
The four steps involved in making DNA microarrays.

Understanding function is key. Finding a marker with a known and relevant function helps to validate a test. Conversely, finding a marker linked to an unknown pathway leads to other questions. "One of the big questions is: How heterogeneous is cancer?" says Charles M. Perou, a cancer researcher at the University of North Carolina. "Many of the patterns we are seeing indicate unique biological properties. Once we understand the biology behind these new cancer subtypes, we might be able to find ways to treat them, and then these tests go from being prognostic to being predictive."

At the University of Pennsylvania, computational biology director Michael Liebman's group is using gene expression and a variety of clinical data to build models of disease progression. "We can then link pathology to genomics data at a much higher degree of resolution," he says. This program covers multiple diseases, including several cancers. The goal, he says, is to find signatures that are finely detailed. "Knowing it's a ductal carcinoma in situ and what stage it is at is not enough," Liebman says. Researchers need red flags that tell them how a tumor is going to progress, and what might stop it.

To be validated, findings will need to be replicated by multiple, large and independent studies. "What's left," Golub says, "is to show that if you identified the signature in one data set, it still works when another researcher applies it to the same identical question, but in a completely different data set." This will not be straightforward, however, because most researchers still do their studies quite differently, appropriate clinical samples are hard to come by, and these samples are diagnosed using methods — such as histopathology — that are somewhat arbitrary and imprecise themselves. "If the data from the array are precise, but the diagnosis of the tissue was inaccurate, then the result is still inconsistent," Liebman says.
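The validation step Golub describes can be sketched as follows: fit a signature on one data set, then score it on a second set collected independently. The nearest-centroid classifier, the two-gene signature and every number below are hypothetical stand-ins, not the method used in any of the studies mentioned.

```python
from statistics import mean


def centroids(samples, labels):
    """Mean expression vector for each class label in the training set."""
    out = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        out[label] = [mean(col) for col in zip(*rows)]
    return out


def predict(centroid_map, sample):
    """Assign a sample to the class whose centroid is nearest (squared Euclidean)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(sample, center))
    return min(centroid_map, key=lambda label: dist(centroid_map[label]))


def accuracy(centroid_map, samples, labels):
    """Fraction of samples whose predicted class matches the recorded label."""
    hits = sum(predict(centroid_map, s) == l for s, l in zip(samples, labels))
    return hits / len(labels)


# Hypothetical two-gene signature measured in two separate studies.
train_x = [[1.0, 5.0], [1.2, 4.8], [4.9, 1.1], [5.1, 0.9]]
train_y = ["good", "good", "poor", "poor"]
test_x = [[0.8, 5.3], [5.5, 1.4], [3.2, 2.9]]
test_y = ["good", "poor", "good"]

model = centroids(train_x, train_y)
print("same data set:", accuracy(model, train_x, train_y))
print("independent data set:", accuracy(model, test_x, test_y))
```

In this toy case the signature is perfect on the data it was built from but misclassifies one of the three independent samples. That gap is what replication is meant to expose, and Liebman's caveat still applies: the recorded diagnosis may itself be the part that is wrong.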

Putting It All Together
Despite these challenges, the potential of microarrays to improve diagnostics is attracting a growing number of scientists. Several trends supporting diagnostic development are emerging, including the demand for more focused chips that contain specific subsets of the human genome. The use of new platforms, such as the bead-based systems, is also expanding, as researchers seek higher throughput and greater reproducibility. More groups are also combining their expression results with information about function and other data, either for validation or to optimize their choice of markers.

Cancer is the lead target for new microarray-based tests. "Many of the drugs in use for cancer are very toxic and have low success rates," says Anne Bailey, vice president of diagnostic and process development at Variagenics. Oncology is also an area of particularly high unmet need, and one in which pharmaceutical companies see a large opportunity with few risks to their established drug markets. "A lot of the older drugs are off-patent, and the newer drugs have high toxic effects, too," Bailey says. "By giving the physicians better tests to help them choose the right therapy for their patients, you don't decrease drug use, you may actually expand it."

Taking AIM to Track Analysis
Getting a satisfying answer from a microarray experiment requires managing a lot of data, and doing many manipulations of the data. However, researchers have few efficient methods available for keeping track of those critical analysis steps.

During the next few years, budding gene expression-based diagnostics will be tested as "home brews" in research settings, where Food and Drug Administration (FDA) approval is not required. And this is where developers will put the finishing touches on their products, readying them for widespread clinical use.

But there is intense pressure to produce profits quickly. Lab costs to make a specific array range from $15,000 to $55,000, and manufactured chips run from $50 to $1,000. Many companies use hundreds, even thousands of arrays a year, and then there are the costs of reagents and staff. A single, successful diagnostic could ease that pain substantially.

From the FDA's perspective, however, microarrays have a long way to go. "I think the big problem is the lack of standardization of the platforms and the statistical programs," says Joseph Hackett, the FDA's associate director for clinical laboratory devices. "We are very concerned about variability from test to test." Another concern for Hackett is the challenge posed by the number of data points. "We ask companies how many data points they have, and they just don't answer," he says. The FDA and the Pharmaceutical Manufacturers Association task forces are examining some of these regulatory issues.

Experienced diagnostics companies will have a huge advantage over new entrants. "Companies like Beckman Coulter, Abbott, Bayer and Roche are the ones who will be first to turn these out," says Variagenics' Bailey, whose company will develop its kits in partnership with diagnostics firms. Other genomics firms may follow the same strategy.

But with so many intriguing results, it is almost certain some newcomers will try to break into the burgeoning field of molecular diagnostics, and that's when things will really get interesting. "With these results, we are already adding value far above and beyond what you can get from the current clinical markers," UNC's Perou says.

And Friend offers these words of caution: "One thing that will slow this emerging industry is if people think that they already have the best method, or that a single method will serve them. Eighty to 90 percent of the most potent tools are yet to come."

Essential Web Resources 
The Web offers an assortment of gene expression databases, analysis tools, and useful guides. Here is a representative sampling:

ABRF Microarray Research Group Study: A Current Profile of Microarray Laboratories

The National Human Genome Research Institute (NHGRI) Microarray Project

The National Center for Biotechnology Information's Gene Expression Omnibus

The Stanford Microarray Database

European Bioinformatics Institute's Microarray Informatics site

Gary Churchill's site at the Jackson Laboratory

Y. F. Leung's site

Terry Speed's site

Tool Suppliers 
Following is a list of some suppliers of DNA microarrays and/or related informatics tools:

Affymetrix Inc., Santa Clara, Calif.

Agilent Technologies Inc., Palo Alto, Calif.

Amersham Biosciences, Uppsala, Sweden

Applied Maths, Sint-Martens-Latem, Belgium

BioDiscovery Inc., Marina del Rey, Calif.

BD Biosciences/Clontech, Palo Alto, Calif.

GeneData AG, Basel, Switzerland

Gene Network Sciences, Ithaca, N.Y.

Genomic Solutions Inc., Ann Arbor, Mich.

InforMax, Bethesda, Md./Oxford, U.K.

Iobion Informatics, La Jolla, Calif.

Invitrogen Corp., Carlsbad, Calif.

LION Bioscience AG, Heidelberg, Germany

MiraiBio Inc., Yokohama, Japan

Motorola Life Sciences, Northbrook, Ill.

Nanogen, San Diego, Calif.

NuTech Sciences Inc., Atlanta

Partek Inc., St. Charles, Mo.

PerkinElmer Life Sciences, Boston

Rosetta Biosoftware, Kirkland, Wash.

Silicon Genetics, Redwood City, Calif.

Spotfire Inc., Somerville, Mass.

Stratagene, La Jolla, Calif.

