
June 11, 2002 | Searching and matching gene sequence data is straightforward compared with the tasks data miners face today. The nature of genomic data is changing dramatically and now includes content as complex as photomicrographs of single cells and pictures of tissue slices. Most of the new data are "high-dimensional," meaning they contain many variables and require sophisticated analysis tools. In addition, much of the data are numeric, generated as long tables of numbers representing, for example, the relative expression level of genes in a gene chip experiment.

Add to this confusing jumble of data types the sheer volume of biological data, which is growing to tsunami proportions, and the scope of the analysis challenge becomes clear.

"In statistics, the best situation is one where you are asking one question but have multiple data points from which to derive your answer," explains Tom Downey, Partek's president and CEO. "With something like a gene chip, each gene represents a question, so you end up with 10,000 questions and usually few data points to answer each question."
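The problem Downey describes can be illustrated with a small simulation. The sketch below is not drawn from Partek's software or from any method named in this article. It assumes 10,000 simulated genes with no real difference between two groups, three measurements per group, a simple z-test with a known variance of 1, and a Bonferroni correction as one common remedy. The function name `count_chance_hits` and all of its parameters are hypothetical.

```python
import random
from statistics import NormalDist, mean


def count_chance_hits(n_genes=10_000, n_per_group=3, alpha=0.05, seed=0):
    """Simulate genes with NO real effect and count how many look significant.

    Assumption: every measurement is drawn from a standard normal
    distribution, so any "significant" gene is a false positive.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference between two group means,
    # each computed from n_per_group unit-variance measurements.
    std_err = (2 / n_per_group) ** 0.5

    naive_hits = 0      # genes passing the usual p < alpha cutoff
    corrected_hits = 0  # genes passing the Bonferroni cutoff alpha / n_genes

    for _ in range(n_genes):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(treated) - mean(control)) / std_err
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            naive_hits += 1
        if p_value < alpha / n_genes:
            corrected_hits += 1

    return naive_hits, corrected_hits


naive, corrected = count_chance_hits()
print(f"'Significant' genes at p < 0.05 with no real effect: {naive}")
print(f"Still significant after Bonferroni correction: {corrected}")
```

Under these assumptions, roughly 5 percent of the genes (on the order of 500) pass the uncorrected cutoff by chance alone, while the corrected cutoff removes nearly all of them. The price is that so strict a cutoff also makes real effects harder to detect when each gene has only a few data points.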

The change in the quantity and quality of data has forced researchers to find new ways to analyze it. "Most bioinformaticians are experts in databases, data integration, and sequence analysis, not in statistics or numerical analysis," explains Soheil Shams, BioDiscovery's president and chief scientific officer. As a result, life science researchers can spend years just learning how to wring useful data from their new technologies. This situation contrasts sharply with fields such as space exploration, financial forecasting, and defense, in which data-intensive problems are old hat and appropriate tools are plentiful.

At last, a trickle of tools is emerging that lets life scientists without statistics expertise ask sophisticated questions of their data. "A lot of people aren't going to become proficient with Matlab or SAS [professional statistical packages]," says Bill Ladd, director of analytic applications at Spotfire Inc. "So one thing we have focused on is making it possible for a specialist to pick out useful analytics, and deploy those to end users."

At the front line of data mining, however, only experts should apply. "To get consistently good results, you need statisticians and real data miners who can be innovative about their approaches rather than using the same technique over and over again," says John Hotchkiss, chief technology officer at AnVil Inc. "You have to be able to understand the techniques and to tune them."

—Malorye Branca

