Stir into this confusing jumble of data types the fact that the flood of biological data is growing to truly tsunami proportions, and the scope of the analysis challenge becomes clear.
"In statistics, the best situation is one where you are asking one question but have multiple data points from which to derive your answer," explains Tom Downey, Partek's president and CEO. "With something like a gene chip, each gene represents a question, so you end up with 10,000 questions and usually few data points to answer each question."
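Downey's point can be made concrete with a small simulation. The sketch below is an illustration only, not a method described by anyone quoted here: it assumes 10,000 hypothetical genes, three replicates per condition, no real difference between conditions, and a known noise level of 1 so that a simple z-test applies. Under those assumptions, roughly 5 percent of the genes still look "significant" at the usual 0.05 cutoff, and a Bonferroni correction (one common remedy, chosen here for simplicity) removes nearly all of them.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_genes=10_000, n_reps=3, alpha=0.05, seed=0):
    """Count 'significant' genes when no gene truly differs between groups.

    All numbers are illustrative assumptions: expression values are drawn
    from a standard normal distribution for both conditions.
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_genes):
        control = [rng.gauss(0, 1) for _ in range(n_reps)]
        treated = [rng.gauss(0, 1) for _ in range(n_reps)]
        diff = sum(treated) / n_reps - sum(control) / n_reps
        # Standard error of a difference of two means with known sigma = 1.
        z = diff / math.sqrt(2 / n_reps)
        p_values.append(two_sided_p(z))

    raw_hits = sum(p < alpha for p in p_values)
    # Bonferroni: divide the cutoff by the number of questions asked.
    bonferroni_hits = sum(p < alpha / n_genes for p in p_values)
    return raw_hits, bonferroni_hits


raw_hits, bonferroni_hits = simulate()
print(f"Uncorrected 'hits' among 10,000 null genes: {raw_hits}")
print(f"Hits after Bonferroni correction: {bonferroni_hits}")
```

With so few replicates per gene, the corrected cutoff also makes real effects much harder to detect, which is the trade-off the quote alludes to.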
The change in the quantity and quality of data has forced researchers to find new ways to analyze it. "Most bioinformaticians are experts in databases, data integration, and sequence analysis, not in statistics or numerical analysis," explains Soheil Shams, BioDiscovery's president and chief scientific officer. As a result, life science researchers can spend years just learning how to wring useful data from their new technologies. This situation contrasts sharply with fields such as space exploration, financial forecasting, and defense, in which data-intensive problems are old hat and appropriate tools are plentiful.
Finally, there is a trickle of tools that allow life scientists without statistics expertise to ask sophisticated questions of their data. "A lot of people aren't going to become proficient with Matlab or SAS [professional statistical packages]," says Bill Ladd, director of analytic applications at Spotfire Inc. "So one thing we have focused on is making it possible for a specialist to pick out useful analytics, and deploy those to end users."
At the front line of data mining, however, only experts need apply. "To get consistently good results, you need statisticians and real data miners who can be innovative about their approaches rather than using the same technique over and over again," says John Hotchkiss, chief technology officer at AnVil Inc. "You have to be able to understand the techniques and to tune them."