TGen’s Discovery Pipeline in the Desert


By Kevin Davies

Aug 15, 2005 | The Human Genome Project has provided gene hunters with a rich terrain to search for errant genes responsible for a host of diseases. But as countless researchers can attest, some disease genes are easier to pinpoint than others. In particular, the quest for genes that underlie extremely rare disorders or contribute to complex traits confounded by environmental factors remains a formidable challenge.

 

Category: Discovery and Basic Research
Title: Whole-Genome SNP Scanning Pipeline to Identify Disease-Causing Lesions
Organization: Translational Genomics Research Institute (TGen)
Partners: Affymetrix and Silicon Genetics

At the Translational Genomics Research Institute (TGen) in Phoenix, Ariz. (see “A Diamond in the Desert,” December 2003 Bio-IT World, page 26), Dietrich Stephan, David Craig, and colleagues have developed an industrial-style informatics strategy and discovery infrastructure that has pinpointed some 25 disease genes in the past 24 months, many of which are pending identification.

The TGen pipeline uses Affymetrix microarrays and the VARIA algorithm from Silicon Genetics (recently acquired by Agilent Technologies) to locate and identify disease-causing mutations. TGen claims a 100-fold increase in throughput using array-based genetic scanning approaches, automated data extraction and warehousing, and automated analysis.

“This pipeline has allowed us...to scan the genomes at ultra-high density [more than 11,500 positions in each individual] in over 10,000 individuals — and identify the genetic bases of five human conditions,” says Stephan. These include a form of sudden infant death syndrome (SIDS), intractable epilepsy, and forms of mental retardation and spinocerebellar ataxia.

The genotyping strategy uses single nucleotide polymorphisms (SNPs) rather than microsatellite markers. SNPs provide superior “accuracy, informativeness, marker density, availability of analysis options, and throughput,” says Stephan. The core of the TGen pipeline is the Affymetrix 10K SNP GeneChip array. “We’ve fully equipped the lab with the Affy hardware and find it very easy and reproducible,” says Stephan. “We did buy a Sequenom and an Illumina for the Institute, but haven’t yet worked our way into those technologies.”

DATAGENIC: TGen's Dietrich Stephan and colleagues assisted in pinpointing 25 disease genes in two years.

Setting up the pipeline required an extraordinary amount of validation for accuracy and replication quality. The quality and reproducibility of the Affymetrix arrays were checked against sequence data from various sources, yielding excellent concordance and reproducibility rates ranging from 99.5 to 99.99 percent. Although a single SNP carries less information than a microsatellite marker, the sheer number of SNPs and the speed of SNP genotyping produce much greater information content at any point in the genome.
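
As a rough illustration of what a concordance check of this kind involves, the Python sketch below compares array genotype calls against reference calls for the same SNPs and reports the fraction that agree. It is not TGen's validation code: the SNP identifiers, the `NO_CALL` placeholder, and the two-letter genotype encoding are all assumptions made for the example.

```python
from typing import Mapping, Optional

NO_CALL = "NN"  # hypothetical placeholder for a failed genotype call


def normalize(genotype: str) -> str:
    """Treat 'AG' and 'GA' as the same unordered genotype."""
    return "".join(sorted(genotype.upper()))


def concordance_rate(
    array_calls: Mapping[str, str],
    reference_calls: Mapping[str, str],
) -> Optional[float]:
    """Fraction of SNPs called in both sources whose genotypes agree.

    Returns None when no SNP can be compared.
    """
    compared = 0
    matches = 0
    for snp_id, call in array_calls.items():
        ref = reference_calls.get(snp_id)
        # Skip SNPs missing from the reference or uncalled in either source.
        if ref is None or call == NO_CALL or ref == NO_CALL:
            continue
        compared += 1
        if normalize(call) == normalize(ref):
            matches += 1
    return matches / compared if compared else None


# Made-up calls for illustration only.
array = {"snp_1": "AG", "snp_2": "CC", "snp_3": "NN", "snp_4": "TT"}
reference = {"snp_1": "GA", "snp_2": "CT", "snp_4": "TT", "snp_5": "GG"}
rate = concordance_rate(array, reference)
```

Here three SNPs are comparable and two agree, so the toy rate is two-thirds. Whether uncalled SNPs are excluded, as in this sketch, or counted as errors changes the resulting rate, so any reported figure depends on that choice.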

Stephan says the SNP pipeline workflow “generates two orders of magnitude increase in throughput.” With a genome-wide panel of 400 microsatellites, it would typically take the TGen group two weeks to type 384 individuals — an annual pace of 3.5 million genotypes (8,640 individuals). By contrast, a single SNP array (which takes three days to run) can produce 11,555 SNP genotypes. The TGen lab runs 1,000 arrays per week in parallel, or close to 600 million genotypes per year, with no additional call checking. A bar-coding system tracks all data through the pipeline and into an Affymetrix GCOS warehouse, before they are exported into Silicon Genetics’ VARIA algorithm for analysis.
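
The throughput comparison above can be checked with simple arithmetic. The Python sketch below uses only the figures quoted in the article, plus one assumption: that the SNP pipeline runs 52 weeks a year. The constant and function names are illustrative.

```python
# Figures quoted in the article.
MICROSAT_MARKERS = 400                  # markers in the genome-wide panel
MICROSAT_INDIVIDUALS_PER_YEAR = 8_640   # annual pace stated for microsatellites
SNPS_PER_ARRAY = 11_555                 # genotypes from one 10K SNP array
ARRAYS_PER_WEEK = 1_000                 # arrays run in parallel each week

# Assumption (not stated in the article): the lab runs all 52 weeks.
WEEKS_PER_YEAR = 52


def microsatellite_genotypes_per_year() -> int:
    """Markers per individual times individuals typed per year."""
    return MICROSAT_MARKERS * MICROSAT_INDIVIDUALS_PER_YEAR


def snp_genotypes_per_year(weeks: int = WEEKS_PER_YEAR) -> int:
    """Genotypes per array times arrays per week times weeks per year."""
    return SNPS_PER_ARRAY * ARRAYS_PER_WEEK * weeks


def fold_increase() -> float:
    """Ratio of annual SNP genotypes to annual microsatellite genotypes."""
    return snp_genotypes_per_year() / microsatellite_genotypes_per_year()


print(f"Microsatellites: {microsatellite_genotypes_per_year():,} genotypes/year")
print(f"SNP arrays:      {snp_genotypes_per_year():,} genotypes/year")
print(f"Fold increase:   {fold_increase():.0f}x")
```

Under these assumptions, the microsatellite pace works out to about 3.5 million genotypes per year and the SNP pace to roughly 600 million, a ratio of about 170-fold. That is consistent with the "two orders of magnitude" Stephan describes.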

“What sets VARIA apart,” explains Stephan, “is that it works seamlessly with the Affymetrix 10K genotyping data. The algorithm is able to handle the huge number of genotype calls where other freeware fails... Additionally, the software has wonderful graphics which make visualizing pedigrees with haplotypes trivial.” Stephan’s group served as an alpha site for the software. “Working with [Silicon Genetics], we ironed out the bugs and got it working at a practical level.”

Linkage Analysis
The first real test for Stephan’s team was to apply VARIA to data from a family with SIDDT (SIDS with dysgenesis of the testes). After the team traced the errant gene to chromosome 6, DNA sequencing revealed a single-base insertion in a gene called TSPYL. Impressively, the linkage scanning and gene identification portion of the SIDDT project was finished in just five days. These results were published in 2004 (see “Genome Scan Yields SIDS Clue,” August 2004 Bio-IT World, page 10).

The TGen data management and analysis engine workflow is recognized as the largest linkage scanning genotyping pipeline in the country. Stephan’s division is an Affymetrix Center of Excellence in Genotyping and Resequencing. More recently, using Affymetrix 100K and 500K arrays, the group has begun partnering with consortia focused on other diseases, including Alzheimer’s disease, bipolar disorder, and multiple sclerosis. It is tracking samples in such a way as to enable pharma partners to select subsets of patients with certain clinical nuances or genomic subtypes and then enroll them in prospective trials. Stephan has devoted about one-third of his lab over the past six months to the National Alliance for Autism Research’s Autism Genome Project, performing linkage scans on more than 7,500 DNA samples from 1,500 families.
