By Malorye A. Branca
February 10, 2003 | In a dramatic preview of post-genomic research, a team of bioinformaticians and geneticists has identified the gene defect underlying Leigh Syndrome, French Canadian type (LSFC) -- a fatal hereditary disease prevalent in the Saguenay-Lac-Saint-Jean (SLSJ) region of Quebec. The discovery, which resulted from cross-referencing DNA, protein, and gene expression databases, is published in the January 14, 2003, issue of the Proceedings of the National Academy of Sciences.
One out of 2,000 children born in SLSJ suffers from this recessive form of Leigh Syndrome, which causes mental retardation and ultimately premature death. About one out of 23 inhabitants in this region carries a copy of the defective gene. Children who inherit two copies of the faulty gene develop the disease. The new finding will form the basis of a genetic test to screen for gene carriers and to identify children with the disease before serious symptoms arise.
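The two figures are consistent with each other. As a rough check, and assuming random mating within the region (a simplification the study itself does not state), a child is affected only when both parents are carriers and each passes on the faulty copy. The function name below is illustrative only.

```python
from fractions import Fraction


def expected_incidence(carrier_frequency):
    """Expected share of affected children for a recessive disease.

    Assumes random mating: both parents must be carriers, and each
    carrier parent passes on the faulty copy half of the time.
    """
    both_parents_carriers = carrier_frequency * carrier_frequency
    child_inherits_two_copies = Fraction(1, 4)
    return both_parents_carriers * child_inherits_two_copies


incidence = expected_incidence(Fraction(1, 23))
print(incidence)  # 1/2116
```

One in 2,116 is close to the reported one in 2,000.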
Two years ago, researchers at the Whitehead Institute Center for Genome Research pinned the location of the LSFC gene to a section of chromosome 2 containing about 30 known and suspected genes. During a weekend in fall 2001, Whitehead fellow Vamsi Mootha ran a custom-built software program against two large gene expression databases: READ (Riken Expression Array Database) and a compilation of cancer study results published online by the Whitehead group.
Mootha -- who has a B.S. in mathematical and computational science from Stanford and an M.D. from Harvard -- was investigating whether any of those 30 candidate LSFC genes showed similar expression patterns to known genes involved in the synthesis and function of mitochondria (microscopic energy-generating organelles within cells). In the early 1990s, researchers at the Hospital for Sick Children in Toronto had shown that LSFC patients suffer defects in energy metabolism.
Why use a cancer gene database to hunt down a metabolic disease gene? Gene chip studies typically produce data on tens of thousands of genes, thereby providing information about many genes that aren’t directly involved in cancer.
Within a few days of starting the analysis, one intriguing “mitochondrial-like” gene kept “popping up,” Mootha says. He then checked data from a proteomics study of mitochondria that he had previously (and conveniently) worked on. He also ran the same analysis on two expression databases from the Gene Expression Atlas of the Genomics Institute of the Novartis Research Foundation.
Everything pointed to the same gene, LRPPRC, which codes for an RNA-binding protein likely involved in the processing of mitochondrial gene transcripts.
Finally, genetic testing verified what the software search had indicated, revealing the precise abnormality in LSFC. Twenty-two patients and 32 parents were tested for the mutation. All had mutations in LRPPRC, including one patient with a distinct mutation in each copy of the gene. This last finding, Mootha says, is the “crowning evidence” that proves this is the genuine defect.
It took about seven months to go from identifying the gene to confirming it was the culprit. But that’s lightning speed compared to traditional approaches to disease-gene hunting, which typically took years, sometimes decades, in the pregenome era.
In the Neighborhood
Mootha’s neighborhood analysis software finds genes that have similar gene expression patterns. “If you ask, ‘Who does this gene travel with?’ you can use that information to find new genes that do the same general thing as ones you already know about,” he says, such as genes involved in energy metabolism. Because the expression databases Mootha used are public, anyone could follow his lead, in principle (Mootha will even send the program to those requesting it by e-mail).
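The general idea of asking "who does this gene travel with?" can be sketched in a few lines of Python. This is not Mootha's program, whose details the article does not give; it is a minimal illustration that assumes Pearson correlation as the similarity measure and uses made-up gene names and expression values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_candidates(expression, reference_genes, candidate_genes):
    """Rank candidates by mean correlation with the reference gene set."""
    scores = {}
    for candidate in candidate_genes:
        correlations = [
            pearson(expression[candidate], expression[ref])
            for ref in reference_genes
        ]
        scores[candidate] = sum(correlations) / len(correlations)
    # Highest-scoring (most similar) candidates come first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression levels across five tissues (invented numbers).
expression = {
    "mito_ref_1": [9.0, 7.5, 2.0, 1.0, 8.0],
    "mito_ref_2": [8.5, 7.0, 2.5, 1.5, 9.0],
    "candidate_A": [8.0, 6.5, 1.5, 1.0, 7.5],
    "candidate_B": [2.0, 3.0, 8.0, 9.0, 1.0],
    "candidate_C": [5.0, 4.0, 5.5, 4.5, 5.0],
}

ranking = rank_candidates(
    expression,
    reference_genes=["mito_ref_1", "mito_ref_2"],
    candidate_genes=["candidate_A", "candidate_B", "candidate_C"],
)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

In this toy data, candidate_A rises and falls with the known mitochondrial genes and ranks first, while candidate_B moves in the opposite direction and ranks last. A real analysis would cover thousands of samples and, as Mootha notes, would need tailoring for each case.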
“Many platforms are now generating a lot of genomics information,” says Scott Jokerst, senior informatics product manager at Affymetrix Inc. “Research like this provides clear evidence that people can put data together in a way that can be applied.” Mootha hopes this approach can be used to find more disease genes, even in cases where multiple genes may be involved.
“Complex diseases are still going to be difficult, but it’s heartening when you see this happen with even simpler diseases, where the genetic location wasn’t known,” Jokerst says.
“We’re excited about using it [the technique] again,” Mootha says, “and a few people have already approached us to suggest projects. But this is not a program that you just drop onto the data. It has to be tailored for each case.”
Several Canadian groups collaborated with Whitehead on this study, including the Genome Quebec Innovation Centre, McGill University, the Montreal Genome Centre, and The Hospital for Sick Children in Toronto. The proteomics studies were conducted at MDS Proteomics in Denmark.