By Salvatore Salamone
March 17, 2004 | In one of the first large-scale diagnostic applications of neural networks, researchers at Children's Memorial Hospital in Chicago are using neural net algorithms to evaluate brain tumors in children. Hospital researchers have found that the algorithms can help them search for gene-expression patterns in microarray data of tumor samples in order to determine appropriate treatment.
Medicine has traditionally relied on pathologists and neurologists to visually classify tumors. However, classification is tricky because some tumor types often look much alike.
There are 12 types and subtypes of pediatric brain tumors, and four stages ranging from 1 (benign) to 4 (malignant). Proper identification is critical; incorrectly characterizing a tumor as stage 3 when it is actually stage 2 would subject a patient to more aggressive chemotherapy than necessary.
In situations where visual interpretation is not conclusive, researchers have turned to microarray analysis of tumor samples. But there's a problem -- while some cancers have unique gene-expression patterns that make classification easier, pediatric brain tumors do not.
Patterns must be picked out from the thousands of data points that come from a microarray's output. "You can get 7,000 to 20,000 data points for each sample, and there are about 100 to 150 genes that you are looking at," says Eric Bremer, director of brain research at Children's Memorial. "The problem is how to make sense of them." This led Bremer to explore a variety of data analysis routines; the neural net analysis worked best.
"In a typical data set, you're looking at about 100 samples for each 10,000 variables," says Gregory Piatetsky-Shapiro, president of the research and consulting company KDnuggets, which focuses on data mining and knowledge discovery as applied to bioinformatics. "You want to capture the real differences in the genes, but it is very difficult to separate noise [in the data] from true variations."
Bremer agrees. "You want to find biological variations, not differences introduced in sample preparation," he says.
In addition to running the data through neural net algorithms, Bremer uses decision-tree analysis routines. The neural nets, while more accurate, do not show how they derive their results. That's not the case with the decision-tree routines. So in cases where the neural net and the decision-tree programs agree, researchers get some insight into how the conclusion was reached.
To run these different analysis routines against the same data, Bremer uses an SPSS software product called Clementine, which incorporates a variety of analysis tools, including neural net algorithms. "It's a [data mining] workbench that makes it easy to try different methodologies," Bremer says.
Using Clementine makes it easier to incorporate data from different sources and subject that data to the same analysis. Bremer has used in-house data as well as publicly available data (Pomeroy et al., Nature 415, pp. 436-442, 2002). This has enabled him to test his classification model on 133 tumor samples and classify them with better than 95 percent accuracy.
Children's Memorial's application of neural nets to microarray data is not unique. Over the past few years, about a dozen papers in scientific journals have applied neural net analysis to microarray data, though with fewer genes under consideration.
What is significant about the Children's Memorial efforts is the scope of the application, both in the number of genes under consideration (up to 150) and the method of analysis. Most previous efforts used one or perhaps a few different neural net algorithms to search for patterns in microarray data. Bremer uses 40 neural net routines and then looks for a consensus.
Classifications are made only when more than 20 of the 40 algorithms come to the same conclusion. The more algorithms that agree, the more certain the diagnosis. In fact, each evaluation of a tumor sample is assigned a confidence level based on the percentage of routines that arrive at the same classification.
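The consensus rule can be illustrated with a short Python sketch. This is not Bremer's code or anything taken from Clementine; the function name, the `min_agreement` parameter, and the placeholder labels `type_A` and `type_B` are assumptions made for illustration. It only shows the arithmetic described here: take the predictions of 40 routines, accept the most common label if more than 20 agree, and report the share of agreeing routines as the confidence level.

```python
from collections import Counter


def consensus_classify(votes, min_agreement=21):
    """Combine per-routine predictions into one call (illustrative only).

    votes: list of labels, one per routine (e.g. 40 neural net outputs).
    min_agreement: hypothetical threshold; 21 means "more than 20".
    Returns (label, confidence). label is None when too few routines agree.
    """
    if not votes:
        raise ValueError("votes must not be empty")

    # Find the label predicted by the largest number of routines.
    label, count = Counter(votes).most_common(1)[0]

    # Confidence is the fraction of routines that chose that label.
    confidence = count / len(votes)

    if count < min_agreement:
        return None, confidence  # no classification is made
    return label, confidence


# Hypothetical example: 33 of 40 routines predict "type_A".
votes = ["type_A"] * 33 + ["type_B"] * 7
label, confidence = consensus_classify(votes)
print(label, confidence)
```

With an even 20-to-20 split the sketch returns no label, since neither class has more than 20 votes.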
"It's majority rules," Bremer says.