By Malorye A. Branca
August 13, 2002 | An international collaboration sponsored by the Hereditary Disease Foundation (HDF) has created one of the world’s largest public gene expression databases. The NeuMetrix Data Repository, which will be posted on the HDF Web site (www.hdfoundation.org) this month, contains more than 15,000 data files generated from DNA microarray experiments using a variety of animal models for Huntington’s disease and related disorders.
“There are discoveries to be made from this data,” says Gary Churchill, a staff scientist at the Jackson Laboratory and an authority on gene expression data analysis. “People need access to it to make them.”
The data were generated by the Hereditary Disease Array Group (HDAG), which includes approximately 60 investigators at 20 universities. “In the past, [these] types of data would have been collected by 20 competing labs, using many different models and techniques,” says project coordinator Jim Olson, an assistant member of the Fred Hutchinson Cancer Research Center and assistant professor at the University of Washington. “And many of their interpretations would have been incorrect.”
Having data generated on the same platform, in the same laboratory, should yield higher-quality results. “You reduce a big chunk of the variability right there,” says Churchill. Microarrays can be finicky to work with, and results between laboratories or across platforms are often difficult to compare.
Samples for the studies came from a variety of rodent models for triplet repeat, or polyglutamine, diseases. These diseases, such as Huntington’s disease, are caused by heritable expansions of DNA triplet repeats such as CAG (which codes for the amino acid glutamine).
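To make the idea of a triplet repeat concrete, here is a minimal Python sketch that counts the longest uninterrupted run of CAG triplets in a DNA string. The function name and the sample sequence are illustrative assumptions, not part of the HDAG work, and the simple pattern match does not check reading frame.

```python
import re

def longest_cag_run(seq: str) -> int:
    """Return the number of CAG triplets in the longest uninterrupted run.

    Hypothetical helper for illustration only: it scans the raw string and
    does not verify that the run is in the protein's reading frame.
    """
    runs = re.findall(r"(?:CAG)+", seq.upper())
    # Each matched run is a whole number of 3-letter triplets.
    return max((len(run) // 3 for run in runs), default=0)

# Made-up sequence: one run of three CAGs, then a single CAG.
example = "ATGCAGCAGCAGTTTCAG"
repeats = longest_cag_run(example)
print(repeats)        # number of CAG triplets in the longest run
print("Q" * repeats)  # the glutamine (Q) stretch that run would encode
```

An expansion, in these terms, is simply a longer run inherited in the gene, which translates into a longer glutamine stretch in the protein.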
“It takes 30 to 40 years for Huntington’s to show up in a person,” says Edmond Chan, an HDAG investigator and a post-doctoral fellow at the Center for Molecular Medicine and Therapeutics at the University of British Columbia. “So it’s very complicated to model it. Some models replicate earlier stages of the disease, while some resemble the later stages.”
Most of the samples were prepared and studied using Affymetrix GeneChips in Olson’s laboratory in Seattle. The researchers then convened to analyze the data and review the findings. Several papers stemming from this work were published in the journal Human Molecular Genetics this month.
The NeuMetrix database boasts what Olson calls “the most powerful search engine I have seen.” The database and its Web-accessible interfaces were all developed by Cambridge, Mass.-based 3rd Millennium Inc., as part of the company’s ongoing Advanced Technology Program grant to develop software for managing biological pathway data.
“We wanted researchers to be able to ask very fine-grained questions, such as ‘Show me all the experiments where the striatal neuronal cells show an increase, at least half the time, for enkephalin probes,’ ” says Jack Pollard, the company’s principal investigator in bioinformatics.
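The kind of question Pollard describes can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the NeuMetrix search engine or its schema: the record layout, the "I"/"D"/"NC" change calls, the experiment names, and the function name are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProbeResult:
    experiment: str   # hypothetical experiment identifier
    tissue: str       # tissue or cell type sampled
    gene: str         # gene the probe targets
    calls: List[str]  # one change call per comparison: "I", "D", or "NC"

def experiments_with_increase(results, tissue, gene, min_fraction=0.5):
    """List experiments where the probe is called increased ("I")
    in at least min_fraction of the comparisons."""
    hits = set()
    for r in results:
        if r.tissue != tissue or r.gene != gene or not r.calls:
            continue
        if r.calls.count("I") / len(r.calls) >= min_fraction:
            hits.add(r.experiment)
    return sorted(hits)

# Made-up records, for illustration only.
results = [
    ProbeResult("exp-A", "striatum", "enkephalin", ["I", "I", "NC", "D"]),
    ProbeResult("exp-B", "striatum", "enkephalin", ["I", "NC", "NC", "NC"]),
    ProbeResult("exp-C", "cerebellum", "enkephalin", ["I", "I", "I"]),
    ProbeResult("exp-D", "striatum", "enkephalin", ["I", "I", "I"]),
]

print(experiments_with_increase(results, "striatum", "enkephalin"))
# prints ['exp-A', 'exp-D']
```

Raising `min_fraction` tightens the "at least half the time" condition; a production system would presumably express the same filter as a database query rather than a Python loop.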
Participants were thrilled by the collaboration and its results: a wealth of data, some valuable publications, and a great learning experience, particularly for those new to microarrays. “It was hugely successful,” says HDAG collaborator Elena Cattaneo, professor of pharmacology at the University of Milano. “I’m convinced we would never have been able to reach this goal without Jim and the consortium.”
But the big rewards are clearly still ahead. “Once you have worked with these microarray data sets, you realize immediately that there is so much data it would take an army of people to sort it all out and to comb it fully,” says Ruth Luthi-Carter, another of the project’s investigators. Put another way: It’s nice to be analyzing data instead of just generating it.