Databases Down Under

November 2, 2010

By Allison Proffitt

BRISBANE - At TRX10 (Translational Research Excellence), held last month in Brisbane, the wide-ranging program covered everything from stem cell models and commercializing academic innovations to translational approaches to cancer and central nervous system disease and trial design. Amid all of this, discussion turned to repositories for genetic and clinical data. Researchers encouraged one another to share raw data, analyses, and clinical findings to further research and enable advances in medicine.

GWAS Data in Japan
Nao Nishida of the Department of Human Genetics at the University of Tokyo presented his work on using GWAS to predict treatment response in hepatitis C. With only 154 Japanese hepatitis C virus patients, Nishida identified a strong association between the SNP rs8099917, near the IL28B gene, and patients who did not respond to treatment.

Nishida posits that the study’s success comes from homogeneity in the Japanese population, penetrance of risk genes among populations, and the selection of samples with extreme symptoms.

With this success in mind, the group launched the Human Genome Variation Database (https://gwas.lifesciencedb.jp), part of the integrated database project of the Japanese Ministry of Education, Culture, Sports, Science & Technology. The database will house data from genome-wide association studies (GWAS) and copy number variation (CNV) research.

The goal of the database, said Nishida, is to achieve continuous and intensive management of GWAS/CNV data, open GWAS/CNV results to researchers, and share GWAS/CNV data among researchers.

The database comprises four component databases for data deposit: control SNP, case-control (GWAS), control CNV, and CNV case-control.

The control SNP database contains one million data points of SNP information, including allele frequency, genotype frequency, and Hardy-Weinberg equilibrium (HWE) test results, for 700 Japanese control samples. The case-control GWAS database includes data on nine diseases, allows filtering by statistical model (allelic, dominant, recessive, and additive), and provides a graphic viewer for searching across diseases by SNP candidates. For the control CNV database, CNV regions were estimated using genome-wide genotyping data for 200 Japanese samples. The CNV case-control database is still under construction.
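
For readers unfamiliar with the per-SNP control statistics mentioned above (allele frequency, genotype frequency, and HWE test results), the following is a minimal sketch of how they are commonly computed from genotype counts. It is not the database's own pipeline, which the talk did not describe; the function name `hwe_summary` and the genotype counts are hypothetical, and the test shown is the standard one-degree-of-freedom chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Summarize one biallelic SNP from genotype counts (AA, AB, BB).

    Hypothetical helper for illustration only: returns allele frequencies,
    genotype frequencies, and a chi-square test of Hardy-Weinberg equilibrium.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")

    # Allele frequencies: each sample carries two alleles.
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B

    # Observed genotype frequencies.
    genotype_freq = (n_aa / n, n_ab / n, n_bb / n)

    # Genotype counts expected under Hardy-Weinberg equilibrium.
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Chi-square goodness of fit; classes with zero expectation are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {
        "allele_freq": (p, q),
        "genotype_freq": genotype_freq,
        "chi2": chi2,
        "p_value": p_value,
    }


# Made-up counts for 700 control samples (not taken from the database).
result = hwe_summary(175, 350, 175)
print(result)
```

A very small p-value suggests the genotype counts deviate from Hardy-Weinberg proportions, which in control samples is often treated as a sign of genotyping problems; the threshold used by any particular database is a separate choice not covered here.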

To add data to the database, researchers are invited to email (gwas@lifesciencedb.jp) or upload either raw data or analyses. Access to the database is granted at three security levels, after a special application and review.

Nishida says that submitted data undergo data cleaning for quality control and to ensure that all datasets are in comparable form.

Linking Institutions with BioGrid Australia
In Australia, the BioGrid project (www.biogrid.org.au) was initiated by clinicians to link hospitals, medical research, and health organization data in an ethically approved, privacy-protected, and controlled way. The goal of BioGrid is to address medical research problems, said Marienne Hibbert, project director for BioGrid Australia. BioGrid includes data from 30 hospitals, 4 universities, government records, and research institutions from Australia and partners worldwide; 190,000 patients are represented.

BioGrid includes data on cancer, neuroscience, diabetes, irritable bowel disease, well-woman health, and more, said Hibbert. The data include clinical outcomes, treatment regimens, images, genetic data, and a specimen bank.

Each organization stores its own data, but shares it with the central BioGrid repository. BioGrid will manage database linkage with other data sources or between clinical data and biospecimen data. BioGrid also offers audit and reporting services and project management.

Hibbert lists many BioGrid successes. Cart-Wheel.org is collecting data for its patient registry for rare tumors. Researchers studying epilepsy used the database to identify five SNPs associated with adverse events in patients taking anti-epilepsy drugs. BioGrid is collating data on lung function and treatment in 800 cystic fibrosis patients from three major Victorian hospitals.

“All of these [research projects] could have been done without BioGrid,” Hibbert says, “but the database enables much faster research.” Since BioGrid handles data collection, researchers can spend more time on their research.