Evidence-Based Variant Interpretation For The Modern Era

September 4, 2019

By Mark Kiel

In April, we announced the winners of the Bio-IT World Best of Show Awards during the Bio-IT World Conference & Expo. These awards are given with the goal of recognizing the best of the innovative product solutions for the life sciences industry on display at the conference. We wanted to highlight these products as they measurably improve workflow and capacity, enabling better research. – The Editors

Next-generation sequencing (NGS) has ushered in a new era of high-throughput genomic analysis and increased our ability to understand the molecular basis of disease. While the capacity for generating genetic data has exploded, the capacity to interpret the meaning of this data has lagged. Information about variants and mutations discovered by NGS and its predecessors (Sanger sequencing and array genotyping) is dispersed throughout the scientific literature and often clouded by the vagaries of variant nomenclature.

With over 500,000 new genomic publications expected this year, it is impossible to obtain a comprehensive view of the published genomic research with manual searching alone.

When genetic data is used in a clinical setting, having complete information about a variant is critical, as these tests can inform diagnostic, prognostic, and therapeutic decisions, and can often have life or death consequences.

Manual curation of a single variant by searching Google Scholar or a similar tool and reading each article can take 5 to 10 hours or more, depending on the volume of literature that exists for that variant. Even with this effort, manual curation is likely to be incomplete due to the complexity of variant nomenclature; a variant scientist must do the up-front work of figuring out all of the ways a variant can be represented to ensure completeness. In addition, differences in how individual variant scientists search can introduce interpretation bias.

It was these issues that led the team at Genomenon to create Mastermind and, more recently, Mastermind Reporter. A team curating variants from the same comprehensive source that is organized by clinical relevance and uses unified variant nomenclature greatly reduces this troubling variability.

Genomenon uses Artificial Intelligence (AI) and Machine Learning (ML) to accelerate the literature curation process in Mastermind, the most comprehensive database of genomic information in the world. Mastermind scans the titles and abstracts of the entire scientific literature, comprising over 30 million scientific papers, and screens them for genomic information. The full text of papers with genomic information is then indexed to develop a comprehensive view of the genomic landscape. To date, Mastermind has indexed nearly 7 million genetic publications and over 500,000 supplemental published data sets to cover over 4.1 million variants.

The genomic data found in the publications is processed through Genomenon’s patented Genomic Language Processing (GLP). GLP extends Natural Language Processing to cover genomics. It identifies hundreds of ways that authors describe a gene or variant and filters out erroneous information that can be mistaken for genomic data. Reconciling and normalizing variant nomenclature solves one of the most intractable problems in variant interpretation and provides a platform that can be searched with different nomenclatures to provide the same results.
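To make the nomenclature problem concrete, here is a minimal Python sketch of how several spellings of one protein-level substitution can be collapsed into a single search key. It is a toy illustration only, not Genomenon's GLP: the function name `normalize_protein_variant` and the choice of one-letter codes as the canonical form are assumptions made for this example, and it handles only simple substitutions.

```python
import re
from typing import Optional

# Standard three-letter to one-letter amino acid codes ("Ter" marks a stop).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Three-letter codes are tried first, then one-letter codes ("X" is legacy stop).
_AA = "|".join(THREE_TO_ONE) + r"|[ACDEFGHIKLMNPQRSTVWY*X]"
_PATTERN = re.compile(rf"^(?:p\.)?\(?({_AA})(\d+)({_AA})\)?$", re.IGNORECASE)


def _one_letter(token: str) -> str:
    """Convert a matched residue token to its one-letter code."""
    if len(token) == 3:
        return THREE_TO_ONE[token.capitalize()]
    token = token.upper()
    return "*" if token == "X" else token


def normalize_protein_variant(text: str) -> Optional[str]:
    """Return a canonical key such as 'p.V600E', or None if not recognized."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    ref, position, alt = match.groups()
    return f"p.{_one_letter(ref)}{position}{_one_letter(alt)}"


# Hypothetical usage: different author spellings map to one key.
spellings = ["V600E", "p.Val600Glu", "val600glu", "p.(Val600Glu)"]
keys = {normalize_protein_variant(s) for s in spellings}
```

All four spellings above reduce to the same key, so a search on any one of them could retrieve the others. The sketch also shows why filtering matters: a cell line name such as T47D has the same shape as a substitution and would be wrongly accepted here, which is the kind of false match a production system has to screen out using context.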

The search results are presented in a web interface, which uses sophisticated algorithms designed to show the most clinically relevant results first. Advanced filtering options allow the user to search broadly to maximize sensitivity or narrowly to optimize specificity. If the user has licensed access to a journal, they can see the article directly in the interface.

Mastermind can cut variant curation time from several hours down to 30 minutes or less per variant. A search producing no results also cuts the variant curation time dramatically; the user can have confidence in the comprehensiveness of Mastermind compared to any other search approach available for genomics.

Genomenon’s latest offering, pre-curated variants in Mastermind Reporter, can further reduce the variant curation time to minutes. Mastermind Reporter presents a pre-curated set of genes selected by the client. Mastermind performs an automated review of every mention of a gene-variant-disease relationship that exists in the literature, using its AI and Machine Learning algorithms to expedite the process. Once the literature has been exhaustively interrogated, the evidence is aggregated and pathogenicity is determined according to the American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) guidelines for pathogenicity determination. Each variant is then manually reviewed by Genomenon’s scientific curation team to ensure quality.
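As an illustration of what aggregating evidence under the ACMG/AMP framework can look like, the sketch below counts evidence codes by strength and applies the combining rules as commonly summarized from the 2015 ACMG/AMP guidelines. It is a simplified sketch, not Genomenon's implementation: the function name `classify` is hypothetical, strength-modified codes are not handled, and the rules should be checked against the published guideline before any real use.

```python
from collections import Counter

# Code prefixes: PVS = very strong, PS = strong, PM = moderate, PP = supporting
# (pathogenic); BA = stand-alone, BS = strong, BP = supporting (benign).
KNOWN_PREFIXES = {"PVS", "PS", "PM", "PP", "BA", "BS", "BP"}


def classify(criteria):
    """Combine ACMG/AMP evidence codes (e.g. "PVS1", "PM2") into a call."""
    n = Counter(code.upper().rstrip("0123456789") for code in criteria)
    unknown = set(n) - KNOWN_PREFIXES
    if unknown:
        raise ValueError(f"Unrecognized evidence codes: {sorted(unknown)}")

    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = ("Pathogenic" if pathogenic
                 else "Likely pathogenic" if likely_pathogenic else None)
    benign_call = ("Benign" if benign
                   else "Likely benign" if likely_benign else None)

    # Conflicting or insufficient evidence falls back to uncertain significance.
    if path_call and benign_call:
        return "Uncertain significance"
    return path_call or benign_call or "Uncertain significance"


# Hypothetical usage with evidence codes gathered from the literature.
example_call = classify(["PS1", "PM2"])
```

Keeping the individual codes alongside the final call, rather than storing only the call, is what lets a reviewer trace a classification back to the evidence behind it.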

The result is a fully curated dataset of variants with a recommendation for pathogenicity, the evidence used to make that determination (displayed in ACMG/AMP categories), and a link to that evidence for each variant. This allows variant scientists in a clinical setting to review the pre-curated call on any variant along with the evidence before making the final determination of pathogenicity for their patient report.

Mastermind Reporter provides a user interface (UI) that enables researchers to view, search, and filter large collections of the curated data, displaying a complete functional variant landscape for the disease in question. Mastermind Reporter data sets can be used by both variant scientists in clinical diagnostics and pharmaceutical scientists in drug discovery. Leading pharmaceutical companies have used Mastermind data sets as evidence in their companion diagnostic (CDx) submissions to the FDA to aid in selecting patients for clinical trials.

The Mastermind Genomic Search Engine, together with data sets in Mastermind Reporter, gives pharma, bio-pharma, and clinical diagnostic labs the most comprehensive genomic landscape for any disease assembled from the published research for applications in drug discovery, clinical trial target identification, and clinical diagnostics.

Mark Kiel is Co-Founder and Chief Science Officer at Genomenon, where he oversees the company’s scientific direction and product development. After 15 years of academic research, he founded Genomenon and created the Mastermind Genomic Search Engine, which connects doctors with evidence in the literature to help diagnose patients with genetic diseases and cancer. He can be reached at mark.kiel@genomenon.com.