Working Group Recommendations for Investigating Causality for Sequence Variants

By Allison Proffitt

April 23, 2014 | An excellent Nature Perspectives piece, published today, takes on the issues surrounding causality. The paper represents the conclusions of a working group of experts in genomic research, analysis and clinical diagnostic sequencing convened by the US National Human Genome Research Institute. The authors* make recommendations for study design; gene-level implication; variant-level implication; publication and databases; and implications for clinical diagnosis.

Much of what the group highlighted is the continued murkiness surrounding pathogenicity (see “As Genetics Moves to the Clinic, Pathogenic Variants Still Subject to Doubt and Debate”). In their general guidelines, the authors caution researchers: “Do not regard prior reports of gene or variant implication as definitive.”

They highlighted challenges in experimental design. “Optimal approaches to discovering rare pathogenic variants in complex diseases remain unclear: exome sequencing, deep and low-coverage whole-genome sequencing and/or next-generation genotyping arrays with enhanced coverage of protein-coding variants are all being applied in research settings,” the authors said.

They predicted that the falling cost of sequencing would make whole-genome sequencing the preferred method, but also noted the challenges inherent in that much data. “It is worth emphasizing that the whole-genome sequence data sets are in some ways more prone to misinterpretation than earlier analyses because of the sheer wealth of candidate causal mutations in any human genome, many of which may provide a compelling story about how the variant may influence the trait,” they said.

The authors considered both gene- and variant-level implication and stressed the need for statistical integrity: watching carefully for statistical significance, maintaining false discovery rates below 5%, and fully reporting results. “It is now important to consider a conservative baseline threshold for declaring significance purely from sequencing data of cases, in the absence of other genealogical information,” they said.
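For readers unfamiliar with how a false discovery rate is held below a target such as 5%, here is a minimal sketch using the Benjamini-Hochberg step-up procedure. This is an illustration only: the article does not say which correction method the working group had in mind, and the function name and example p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Flag which hypotheses are rejected while controlling FDR at `fdr`.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    not a method prescribed by the working group.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four candidate variants.
example = [0.04, 0.001, 0.03, 0.5]
flags = benjamini_hochberg(example, fdr=0.05)
```

In this made-up example only the second variant (p = 0.001) survives correction, even though three of the four raw p-values fall below 0.05.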

They encouraged researchers to carefully present and report the statistical support for any disease association, and to follow up with family studies and larger population studies where possible.

Finally, the authors acknowledged that the number of false-positive associations is high, and that the only way to improve the landscape is to build “robust, centralized repositories of mutation data, incorporating explicit, structured evidence for variant pathogenicity and systems for rapid correction of entries.” Sharing sequence and phenotype data “to the fullest possible extent” is necessary, the authors stated, to advance the field. They highlighted the efforts of ClinVar, LOVD, and other databases, but called on funding bodies, journals, research consortia, clinical organizations and others to continue to push forward.

In setting priorities for research and infrastructure development, the authors stressed the need for improved databases for variants, improved incentives for data sharing, and the “development and benchmarking of standardized, quantitative statistical approaches for objectively assigning probability of causation to new candidate disease genes and variants.”

Nature 508, 469–476 (24 April 2014) doi:10.1038/nature13127

* One of the authors, Heidi Rehm, will be speaking next week at the Bio-IT World Conference & Expo.