
The Data Deluge: Deal or No Deal?

By Kevin Davies

June 14, 2006 | In these days of “omics” overload, researchers and executives are often heard bemoaning the mountains of data that they have to manage. Assorted high-throughput systems are producing torrents and terabytes of data — gene expression patterns, protein networks, DNA variants, imaging data, and clinical records — that must be stored, integrated, and analyzed effectively. Commentators talk of “drowning in data,” and suggest there is already too much information pouring into the drug discovery pipeline, when beleaguered researchers have a hard time dealing with what is already inside.

Writing last month in the journal Genome Research*, Joseph Nevins and colleagues from the Duke Institute for Genome Sciences & Policy argue that, far from decrying the data glut, we should embrace the complexity of genomic and other sources of data, particularly for its predictive properties in the field of personalized medicine.

Since 2000 or thereabouts, advances in microarray profiling, as well as mass spectrometry analysis, have revolutionized the molecular diagnostics of cancer, offering new classifications and prognostics. “The ability to find structure in the [data is] transforming biology from an observational molecular science to a data-intensive quantitative genomic science,” Nevins and colleagues write. “The dimension and complexity of such data provide opportunity to uncover patterns and trends that can distinguish subtle phenotypes in ways that traditional methods cannot.”

In 2002, Laura van ’t Veer and colleagues identified a group of 70 genes that serves as a prognostic indicator of the risk of metastasis in women with stage I or II breast cancer. That test has since been commercialized by Agendia, an Amsterdam-based biotechnology company.


PREDICTION: Integrated use of genomic, clinical, and other data to predict clinical and biological phenotypes.

However, analysis by an Israeli group (Ein-Dor, L. et al. Bioinformatics 2005) concluded that this 70-gene cluster is far from unique. In fact, at least eight separate gene sets could be used equally effectively to predict low- and high-risk patient groups. “Simply choosing one ‘best’ gene expression signature, but ignoring multiple other choices that reflect other relevant aspects of cancer biology, is an oversimplification and a potentially dangerous, misleading strategy,” say Nevins and colleagues. Moreover, studies from Stanford University have shown that the prognostic value of the 70-gene subset can be markedly improved by integrating data on a wound-healing set of genes, first described in 2004, along with other pertinent clinical data.

But such integrative models do rely exclusively on gene expression data. An integrated body of data, incorporating mutation analysis, gene expression signatures, serum protein markers, and clinical data, will be necessary for individualized prognosis. Recent reports (e.g., Yanagisawa, K. et al. Lancet 362, 415; 2003) demonstrate the utility of proteomic profiling of cancer patient samples, predicting good and poor outcomes. While there will likely be general similarities between protein and gene expression results, there may be radically important distinctions between the two data sets.

Nevins and colleagues put forward several recommendations for adopting genomic data in pursuit of personalized medicine. The first is using these methodologies to stratify patients in current clinical practice. “No longer should drug treatment studies be performed without a component that attempts to identify those patients most likely to respond to a particular therapeutic regimen,” write the Duke authors. However, care is required not to sacrifice key molecular information in favor of convenience in designing an accessible “kit” suitable for widespread medical usage.

A final barrier is elucidating a physiological basis for the observed expression patterns that distinguish disease states and patient groups. To that end, new tools show great promise for cancer, diabetes, and other diseases. One is gene set enrichment analysis (GSEA), developed by Jill Mesirov and colleagues at the Broad Institute, in which genes are analyzed in functional groups or pathways, based on prior biological knowledge, rather than as individual entities.
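For readers curious about what analyzing genes "in functional groups" can look like in practice, here is a minimal sketch of an unweighted running-sum enrichment score in the spirit of GSEA. It is a simplification for illustration, not the Broad Institute's implementation: the function name `enrichment_score`, the gene labels, and the assumption that each gene appears once in the ranked list are all hypothetical, and the published method also uses correlation weighting and permutation testing, which are omitted here.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score for one gene set (illustrative only).

    ranked_genes: genes ordered from most to least associated with a
                  phenotype; assumed to contain no duplicates.
    gene_set:     genes sharing a pathway or function (prior knowledge).
    """
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0  # current value of the running sum
    best = 0.0     # largest deviation from zero seen so far
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hits   # step up when a set member is found
        else:
            running -= 1.0 / n_miss   # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list and pathway, for illustration only.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
pathway = {"g1", "g2"}
score = enrichment_score(ranked, pathway)
```

A score near +1 suggests the set's genes cluster at the top of the ranking, a score near -1 that they cluster at the bottom, and a score near zero that they are scattered throughout. The point is that the set as a whole is scored, so a modest but coordinated shift across a pathway can register even when no single gene stands out.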

* West, M. et al. “Embracing the complexity of genomic data for personalized medicine.” Genome Res 16, 559-66; 2006.

