GA4GH Holds Annual Event, Reports On Driver Projects

October 5, 2018

By Bio-IT World Staff

The Global Alliance for Genomics and Health (GA4GH) held its annual event this week in Basel, Switzerland. Speakers presented the first set of deliverables developed under GA4GH Connect, the group’s 5-year strategic plan to enable real-world genomic data sharing by 2022, and opened the call for new GA4GH Driver projects.

The core of the GA4GH mission remains accelerating progress in genomic research and ensuring responsible sharing of genomic data, said Peter Goodhand, GA4GH Executive Director and President of the Ontario Institute for Cancer Research in Toronto, Canada, when GA4GH Connect was announced last October. GA4GH will focus on “cultivating a common framework of standards and harmonized approaches” for achieving the goals of progress and shared data, Goodhand said then.

Ewan Birney, co-director of EMBL-EBI and chair of GA4GH, echoed those sentiments during this year’s plenary session, adding that we are currently undergoing a change in how research and healthcare are brought together.

Earlier this year, representatives from the six GA4GH technical work streams and two foundational work streams, along with representatives of the first 15 Driver Projects, developed a strategic roadmap consisting of 28 deliverables spanning everything from data discovery, access, and transfer to mechanisms for secure cloud-based storage and analysis.

Now the first of those deliverables are being announced and reported on.

“A year ago I said we were going to do some stuff,” David Glazer of Verily, co-chair of GA4GH’s Cloud Work Stream, said during his presentation Thursday morning. “We did it.”

Among the projects are:

  • the GA4GH Search API, built on Matchmaker Exchange;
  • the GA4GH refget API, to allow access to reference genomic sequences from different databases and servers using a checksum identifier unique to the sequence itself;
  • updates to the Beacon Project;
  • the Workflow Execution Service (WES) API, which facilitates running a single workflow on multiple cloud environments using either Common Workflow Language (CWL) or Workflow Description Language (WDL);
  • and htsget, a genomic data retrieval specification that allows users to download read data for subsections of the genome in which they are interested.
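To make the htsget idea of downloading only a subsection of the genome concrete, here is a minimal sketch of how a client might build such a request. The server address and sample ID are hypothetical, and the `/reads/<id>` path and the `format`, `referenceName`, `start`, and `end` query parameters reflect our reading of the htsget specification rather than anything stated at the event.

```python
from urllib.parse import quote, urlencode


def build_reads_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Build an htsget-style URL requesting reads for one genomic region.

    Assumption: start is 0-based inclusive and end is exclusive.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/reads/{quote(read_id)}?{query}"


# Hypothetical server and sample identifier, for illustration only.
url = build_reads_url("https://htsget.example.org", "sample1", "chr1", 1000, 2000)
print(url)
```

In this sketch the client only builds the request; a server would be expected to answer with the locations of the data blocks covering the requested region, which the client then downloads and concatenates.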

The Story So Far

In a lightning round of speakers, members of GA4GH gave updates and announced new standards from the strategic roadmap.

The new releases most notably include Beacon API V1.0.0, refget API V1.0.0, and WES API V1.0.0.

“The new release of Beacon builds on existing work by adding support for additional types of genomic variants, making it an even more powerful tool for molecular geneticists around the globe to use in their variant classification efforts,” Marc Fiume, Co-Founder and CEO at DNAstack and co-lead of the GA4GH Discovery Work Stream, which, together with ELIXIR, maintains the Beacon API, said during the event. “By demonstrating the ELIXIR Authorization and Authentication Infrastructure (AAI) with the reference implementation, we have also enabled additional risk mitigation strategies to ensure the data served by Beacons is as secure as possible.”

When introducing the first version of refget, Andrew Yates, Team Leader, Genomics Technology Infrastructure at EMBL-EBI and co-lead of the refget subgroup of the GA4GH Large Scale Genomics Work Stream, said that the deliverable offers a service that serves as the “baseline of knowledge used for interpretation analysis.”

“Reference sequences are fundamental to how genomic analysis is performed as they provide a baseline of knowledge of the human genome. Without a clear unambiguous link back to that baseline it can be challenging to compare, aggregate and share knowledge between researchers and clinical settings,” Yates said. “I am confident that in time refget will be so fundamental to how reference sequences are referred to and accessed we will wonder how we managed without it.”
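The “checksum identifier unique to the sequence itself” can be illustrated with a short sketch. It assumes, based on our reading of the refget specification, that identifiers are derived from the uppercase sequence with whitespace removed, using either MD5 or a SHA-512 digest truncated to its first 24 bytes; the function names are ours, not part of any GA4GH library.

```python
import hashlib


def normalize(sequence):
    """Uppercase the sequence and drop all whitespace (assumed normalization)."""
    return "".join(sequence.split()).upper()


def md5_id(sequence):
    """MD5 checksum of the normalized sequence, as 32 hex characters."""
    return hashlib.md5(normalize(sequence).encode("ascii")).hexdigest()


def trunc512_id(sequence):
    """First 24 bytes of the SHA-512 digest, as 48 hex characters."""
    digest = hashlib.sha512(normalize(sequence).encode("ascii")).digest()
    return digest[:24].hex()


# The same bases always yield the same identifiers, regardless of how the
# sequence was formatted or which database it came from.
print(md5_id("acgt\nACGT") == md5_id("ACGTACGT"))
print(trunc512_id("ACGTACGT"))
```

Because the identifier depends only on the sequence content, two servers holding the same reference sequence would return it under the same checksum, which is what makes the link back to the baseline unambiguous.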

The inaugural version of the WES API takes on the motto, “Write once, run everywhere,” according to Glazer.

“If I build a workflow, I should be able to take that workflow wherever I want and on whatever data I have and be confident I’ll get the same answer,” Glazer said.

WES allows users to execute the same scientific tools and workflows in a variety of clouds, platforms, and environments without modification. In particular, WES enables users to submit workflow requests to workflow execution systems, and to monitor their execution.
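The submit-and-monitor pattern described above can be sketched as follows. The field names (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`) and the run states reflect our understanding of the WES specification and should be treated as assumptions; the workflow URL and parameters are hypothetical, and no network call is made.

```python
import json

# Run states after which a client can stop polling (assumed from the WES spec).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_run_request(workflow_url, workflow_type, workflow_type_version, params):
    """Assemble the form fields a client would send to submit a workflow run."""
    if workflow_type not in ("CWL", "WDL"):
        raise ValueError("workflow_type must be 'CWL' or 'WDL'")
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # Workflow inputs travel as a JSON-encoded string.
        "workflow_params": json.dumps(params, sort_keys=True),
    }


def is_finished(state):
    """Return True once a reported run state means the run will not change."""
    return state in TERMINAL_STATES


# Hypothetical workflow location and inputs, for illustration only.
request = build_run_request(
    "https://example.org/workflows/align.cwl", "CWL", "v1.0", {"threads": 4}
)
print(request["workflow_params"])
print(is_finished("RUNNING"), is_finished("COMPLETE"))
```

The same request fields could be sent unchanged to any WES-compatible service, which is the sense in which a workflow is written once and run in many environments.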

WES implementations can run on multiple cloud services, including AWS and Google.

Driver Project Call

In addition to announcing the new standards, GA4GH opened the call for new Driver Projects. Last year the group announced 15 inaugural projects, real-world genomic data initiatives that help guide development efforts and pilot tools. GA4GH wants to enable an ecosystem of data stewards who uphold their responsibility not just to protect the data they own, but to use it, Goodhand said last year when he announced the first 15 projects.

GA4GH will accept up to five new Driver Projects in 2019. Projects should increase GA4GH’s global representation by including data from populations typically underrepresented in genomics, being located in a country where GA4GH does not currently have Driver Project representation, and representing an international collaboration.

Projects should have scientific merit, impacting the international community’s genomic understanding of human disease or an important clinical or health problem, and be committed to a culture of data sharing.

Finally, projects must be able to pilot a subset of GA4GH standards upon completion, and contribute at least $250K USD in in-kind resources over two years.

The entry window officially opens October 5 and applications are due October 31. New Driver Projects will be announced and onboarded in early 2019.