
VAAST Potential for Genome Analysis Software


Rapid identification of disease-causing mutations speeds diagnosis.

By Kevin Davies
September 27, 2011

A new software program called VAAST, developed by scientists at the University of Utah and a Bay Area software company, Omicia, can rapidly identify disease-causing mutations from individual genome sequences. The software was introduced in a paper in Genome Research, while its potential for rapidly identifying a deleterious mutation was illustrated in the American Journal of Human Genetics. VAAST, the Variant Annotation, Analysis and Selection Tool, was developed by Mark Yandell’s group at the University of Utah School of Medicine in collaboration with a team at Omicia in San Francisco, led by CEO Martin Reese. It was developed using stimulus funding from the National Human Genome Research Institute (NHGRI), specifically a “Grand Opportunities” (GO) grant.

“VAAST is an integrative tool that uses a number of inputs to rank the [DNA] variants based on clinical gene importance in an automatic way,” says Reese. The mutation tracking software is designed to screen individual human genome sequences for clinically significant mutations.

In the Genome Research paper, Yandell, Reese and colleagues show that VAAST can accurately and swiftly analyze the variations in a handful of personal genome sequences to identify the causative mutation. In fact, this can be done with as few as three genomes from unrelated children, or with the genomes of the parents and two children. Other approaches to filtering genomic data for causative mutations are also paying dividends (see “Endeavor Aids Disease Gene Hunt”).

The program works like the classic sequence homology program BLAST—hence the name. “In BLAST, you take a sequence and run it against a background database, asking: How similar is my sequence to the other databases? VAAST does the same thing for personal genomes, but does it for dissimilarities,” says Reese.

“SIFT and Polyphen, and other academic [mutation prediction] platforms—they’re all a collection you have to run,” says Reese. “But VAAST does it integrated in one program. It looks at a mutation and its [putative] physiological function. Then it also looks at the frequency of that mutation in a background distribution. You can use the frequency from the 1000 Genomes project or other sources, or you can put in your own background distributions.”

By incorporating variant frequency data in the algorithm, VAAST is able to rapidly zero in on the likely causative mutation, which would be expected to be present very rarely in the population. The program compares variations from a patient against dozens or hundreds of healthy genomes, and automatically scores those mutations in the form of a gene-by-gene ranking summary.
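The idea of combining a functional weight with rarity in a background population can be illustrated with a toy gene-by-gene ranking. This is only a sketch under stated assumptions: it is not VAAST's actual statistic, and the function name `rank_genes`, the `floor` pseudo-frequency, and all gene and variant identifiers are hypothetical.

```python
import math
from collections import defaultdict


def rank_genes(patient_variants, background_freq, severity, floor=1e-4):
    """Rank genes by summed per-variant scores (higher = more suspicious).

    patient_variants: list of (gene, variant_id) pairs seen in the patient
    background_freq:  variant_id -> allele frequency among healthy genomes
    severity:         variant_id -> weight in [0, 1] from a functional predictor
    floor:            frequency assumed for variants unseen in the background
    """
    scores = defaultdict(float)
    for gene, variant in patient_variants:
        # Rare variants get a large rarity term; common ones get a small one.
        freq = max(background_freq.get(variant, 0.0), floor)
        scores[gene] += severity.get(variant, 0.0) * -math.log10(freq)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: v1 and v2 are absent from the background set.
patient = [("GENE_A", "v1"), ("GENE_A", "v2"), ("GENE_B", "v3"), ("GENE_C", "v4")]
background = {"v3": 0.25, "v4": 0.01}
severity = {"v1": 0.9, "v2": 0.8, "v3": 0.9, "v4": 0.1}

ranking = rank_genes(patient, background, severity)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy data GENE_A comes out on top because it carries two damaging variants that never appear in the background, while GENE_B's equally damaging variant is too common to be a plausible cause of a rare disease.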

In the Genome Research paper, Yandell and colleagues describe a proof-of-principle study of patients with Miller syndrome, a rare genetic disorder whose genetic basis was uncovered last year. The group looked at six Miller patients, each of whom is a compound heterozygote (harboring two different mutations in the same gene, one inherited from each parent).
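The compound-heterozygote definition above maps onto a simple check. The following is a minimal sketch, assuming the parent of origin of each variant is already known; the function name `compound_het_genes` and the gene and variant labels are hypothetical.

```python
def compound_het_genes(child_variants):
    """Return genes in which the child carries two different variants,
    one inherited from each parent.

    child_variants: list of (gene, variant_id, parent_of_origin) tuples,
    where parent_of_origin is "mother" or "father".
    """
    by_gene = {}
    for gene, variant, parent in child_variants:
        origins = by_gene.setdefault(gene, {"mother": set(), "father": set()})
        origins[parent].add(variant)

    hits = []
    for gene, origins in by_gene.items():
        # Require a maternal and a paternal variant that are not the same one.
        if any(m != f for m in origins["mother"] for f in origins["father"]):
            hits.append(gene)
    return sorted(hits)


# Hypothetical toy data.
child = [
    ("GENE_A", "a1", "mother"), ("GENE_A", "a2", "father"),  # compound het
    ("GENE_B", "b1", "mother"), ("GENE_B", "b2", "mother"),  # both maternal
    ("GENE_C", "c1", "father"),                              # single variant
    ("GENE_D", "d1", "mother"), ("GENE_D", "d1", "father"),  # same variant twice
]
compound_hets = compound_het_genes(child)
print(compound_hets)
```

GENE_D is excluded here because inheriting the same variant from both parents makes the child homozygous rather than a compound heterozygote.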

When VAAST was run on a single patient, the known Miller syndrome gene ranked #86 out of 20,000 genes in the human genome. With a second patient added, the gene rose to #2 on the list, and it jumped to the top with three or more patients. (Like BLAST, VAAST provides a P value of statistical significance in the results.)

Reese adds that when the VAAST program was run retrospectively on the family that was sequenced last year to discover the Miller syndrome gene, the analysis took about a day, compared with several months for the original analysis.

“VAAST solves many of the practical and theoretical problems that currently plague mutation hunts using personal genome sequences,” said Yandell. “This tool substantially improves upon existing methods with regard to statistical power, flexibility, and scope of use. Further, VAAST is automated, fast, works across all variant population frequencies and is sequencing platform independent.”

X Marks Spot

Writing in the American Journal of Human Genetics, Gholson Lyon at the Children’s Hospital of Philadelphia, Yandell and colleagues show how VAAST can be applied to tease out the mutation responsible for a devastating childhood syndrome of unknown etiology.

Lyon, who was formerly at the University of Utah, had been working with a family in the area in which four affected boys had severe neurological damage and signs of progeria (premature aging), and died by the age of 4. Recognizing that the disorder was X-linked, Lyon restricted the next-generation sequencing to the coding regions of the X chromosome. But even then, using traditional tools, he was only able to narrow down the list of candidates to five genes.

Lyon gave the data to Yandell, who ran it through VAAST. Within an hour, he was certain he had found the gene, NAA10, which had been one of Lyon’s original candidates. A few weeks after the manuscript was originally submitted, one of the reviewers contacted the authors, as he had seen a family with similar characteristics. Affected members in that family proved to have the same gene mutation. The disorder has been provisionally named Ogden syndrome.

“One of the most important and exciting opportunities in genomic medicine is the newfound ability to pinpoint the root cause of an unknown idiopathic disease in an individual,” commented Eric Topol, director of the Scripps Translational Science Institute. “The VAAST tool will markedly facilitate this and represents a major advance in the field.”

The VAAST IP is shared by the University of Utah and Omicia, says Reese. Omicia plans to release its Genome Analysis System commercially in late 2011, with VAAST integration to follow in 2012. Meanwhile, Yandell is offering academic collaborations under academic-only licenses via his website.

Reese says his team is starting to apply VAAST to cancer and other areas. “VAAST works really well right now on rare genetic diseases, but we need more feeling on it. There’s a whole bunch of applications where VAAST can work—we just need to run it through and improve it in the next 6-12 months.”  

Endeavor Aids Disease Gene Hunt

In a paper published earlier this year in Genome Research, Yaniv Erlich, a fellow at the Whitehead Institute, showed the value of layering disease network analysis into the hunt for a rare disease gene. Erlich translated early bioinformatics success with DNA Sudoku into a coveted position as a Whitehead Institute Fellow. Earlier this year, he scored another hit by identifying a novel disease mutation in a single family by applying a new informatics strategy.

Erlich has a strong computational background that he is applying to genetics questions. A meeting with Dor Yeshorim, the organization that performs carrier screening for ultra-Orthodox Jews, resulted in a successful collaboration with Hadassah to find a disease gene.

Hadassah then told Erlich about a clinic it runs serving various populations around Jerusalem, including Palestinians. One family presented with an orphan disease—a form of hereditary spastic paraparesis (HSP). Hadassah invited Erlich to try exome sequencing to find the mutation.

HSP, characterized by weakness of the legs and an abnormal gait, is a very heterogeneous cluster of diseases—with some 20 known genes. “We couldn’t identify any mutations in the known genes in the family, so we thought, maybe it’s a new gene.” For economic reasons, and to simplify the search for the mutation signal, Erlich opted for exome sequencing. “I could do whole genome [sequencing], but if I have a false negative, I have only one mutation. It’s a dead end. We wanted to focus where we might find a signal.”

In successful genome sequencing studies over the past two years, researchers have used two main arguments to exclude bystander variations. One is genetics—excluding variations that don’t segregate per the disease model. The other is a functional approach—using protein prediction algorithms to identify the most likely deleterious variants. But Erlich faced a quandary. “Genetic arguments are very strong when you have multiple families,” he explains. “The problem in a single family is you run out of information quite quickly… We only had one family, so we had to use another layer of argument.”
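The genetic argument described above can be pictured as a set filter over family members. This is a minimal sketch, assuming a simple, fully penetrant model in which every affected relative carries the variant and no unaffected relative does; a recessive model would need genotypes rather than carrier sets. The function name `segregating_variants` and the pedigree labels are hypothetical.

```python
def segregating_variants(carriers, affected, unaffected):
    """Keep variants carried by every affected member and by no unaffected one.

    carriers:   variant_id -> set of family members carrying the variant
    affected:   iterable of affected family members
    unaffected: iterable of unaffected family members
    """
    affected, unaffected = set(affected), set(unaffected)
    return sorted(
        variant
        for variant, members in carriers.items()
        if affected <= members and not (unaffected & members)
    )


# Hypothetical toy pedigree: II-1 and II-2 are affected, II-3 is not.
carriers = {
    "v1": {"II-1", "II-2"},          # segregates with the disease
    "v2": {"II-1"},                  # missing from one affected member
    "v3": {"II-1", "II-2", "II-3"},  # also present in an unaffected member
}
kept = segregating_variants(carriers, affected=["II-1", "II-2"], unaffected=["II-3"])
print(kept)
```

With only a few relatives, many bystander variants pass this filter by chance, which is the single-family problem Erlich describes.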

Erlich turned to a set of algorithms that rank genes by biological function and similarity. “It doesn’t depend on the size of the family and it doesn’t incorporate any sequence information. It asks, what is the signature? We wanted to stratify genes we couldn’t exclude by the previous arguments.”

Given the significant genetic heterogeneity in HSP, Erlich elected to use the data about those 20 known HSP genes and look for the same signature in his candidates. He focused on a promising genetic signal from the tip of chromosome 2. “With a single family, it’s very risky to exclude variation you can’t detect,” he says. “We cannot tolerate any false negatives—or that’s it, we lose the signal. We have only one target.”

Erlich looked at three programs—Endeavor, Toppgene, and SUSPECTS (the latter has now been shut down, which Erlich says is unfortunate). “Endeavor is, I think, the strongest one [of the three],” he says. “It’s extremely clever and the GUI is very intuitive. The algorithm takes multiple layers of data—expression data, sequence similarity, protein networks, text mining—and you train it with a subset of genes. Then you introduce a list of new genes, and it ranks the genes for you.” The program is so accessible that “any biologist can use it.”
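The train-then-rank workflow Erlich describes can be sketched in a few lines. This is an illustrative toy, not Endeavor's implementation or API: it scores each candidate by Jaccard similarity to a profile built from the training genes in each data layer, then fuses the layers by summing ranks (a swapped-in simplification). The function name `rank_candidates`, the layer names, and every gene and feature label are hypothetical.

```python
def rank_candidates(training, candidates, layers):
    """Rank candidate genes by their summed rank across data layers.

    training:   known disease genes used to build a profile per layer
    candidates: genes to be prioritized
    layers:     layer_name -> {gene: set of features in that layer}
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    rank_sums = {gene: 0 for gene in candidates}
    for features in layers.values():
        # Profile for this layer: all features seen in the training genes.
        profile = set().union(*(features.get(g, set()) for g in training))
        ordered = sorted(
            candidates,
            key=lambda g: jaccard(features.get(g, set()), profile),
            reverse=True,
        )
        for position, gene in enumerate(ordered, start=1):
            rank_sums[gene] += position
    # A lower summed rank means the gene is consistently similar to the profile.
    return sorted(candidates, key=lambda g: rank_sums[g])


# Hypothetical toy data with two layers.
layers = {
    "expression": {
        "KNOWN1": {"neuron", "axon"}, "KNOWN2": {"neuron", "motor"},
        "CAND_X": {"neuron", "axon", "motor"}, "CAND_Y": {"liver"},
        "CAND_Z": {"neuron"},
    },
    "text_mining": {
        "KNOWN1": {"transport"}, "KNOWN2": {"transport", "motility"},
        "CAND_X": {"transport", "motility"}, "CAND_Y": {"metabolism"},
        "CAND_Z": {"transport", "metabolism"},
    },
}
prioritized = rank_candidates(
    ["KNOWN1", "KNOWN2"], ["CAND_Y", "CAND_Z", "CAND_X"], layers
)
print(prioritized)
```

Note that the ranking uses no sequence data from the family at all, which matches Erlich's point that this layer of argument does not depend on family size.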

By including functional analysis, Erlich’s team identified the novel HSP gene as KIF1A. Erlich is now working on other genetic disorders, and says he hopes other scientists try this approach. “This is a way to incorporate that old information to find new genes, not just to find deleterious mutations but mutations with a specific signature.” K.D.

FURTHER READING:
Erlich, Y. et al. “Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis.” Genome Res. 21 (2011). doi: 10.1101/gr.117143.110
 

Yandell, M. et al. “A probabilistic disease-gene finder for personal genomes.” Genome Res. 21 (2011). doi: 10.1101/gr.123158.111

Rope, A. et al. “The use of VAAST to identify an X-linked disorder resulting in lethality in male infants due to N-terminal acetyltransferase deficiency.” American Journal of Human Genetics, June 23 (2011). doi: 10.1016/j.ajhg.2011.05.017

This article also appeared in the 2011 September-October issue of Bio-IT World magazine.

