Instant Genome Browsing Over the Web with the NextCODE Exchange
By Bio-IT World Staff
October 20, 2014 | This Sunday, at the American Society of Human Genetics meeting in San Diego, NextCODE Health launched a new genomic data sharing environment called the NextCODE Exchange. Data shared on the Exchange can be accessed over the web at single-base resolution, letting users share not only their top-level findings about genetic variants, but also their raw sequencing data mapped base-by-base across the human genome. NextCODE, which was founded to commercialize technology platforms first developed at Kari Stefansson’s deCODE genetics, expects the Exchange to be a useful tool for diagnosing rare disease, as well as a home for collaborative research on more complex conditions.
The NextCODE Exchange is an extension of genomic discovery tools that NextCODE has been building since its launch as an independent company last October. The Exchange’s online interface, and associated genome browser, come equipped with analytics tools for finding variants in a patient’s genome that may explain unusual symptoms and phenotypes. These tools can filter variants by inheritance patterns and their predicted impact on protein expression, as well as draw on the massive database of variants from the Icelandic population assembled by deCODE genetics over more than a decade of population-scale sequencing.
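To make the idea of filtering by inheritance pattern and predicted impact concrete, here is a minimal Python sketch for a parent-child trio. It is an illustration only, not NextCODE's software: the `Variant` record, the genotype strings, the impact labels and the gene names are all hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical trio record; field names are assumptions for this sketch."""
    gene: str
    child: str   # genotype: "0/0" (reference), "0/1" (heterozygous), "1/1" (homozygous)
    mother: str
    father: str
    impact: str  # made-up impact labels: "HIGH", "MODERATE", "LOW"


def fits_recessive(v):
    # Affected child carries two copies; each unaffected parent carries one.
    return v.child == "1/1" and v.mother == "0/1" and v.father == "0/1"


def fits_de_novo(v):
    # Child carries the variant; neither parent does.
    return v.child in ("0/1", "1/1") and v.mother == "0/0" and v.father == "0/0"


def candidates(variants, pattern, impacts=("HIGH", "MODERATE")):
    """Keep variants that match the inheritance pattern and an accepted impact label."""
    return [v for v in variants if pattern(v) and v.impact in impacts]


variants = [
    Variant("GENE_A", "1/1", "0/1", "0/1", "HIGH"),
    Variant("GENE_B", "0/1", "0/0", "0/0", "MODERATE"),
    Variant("GENE_C", "1/1", "0/1", "0/1", "LOW"),
    Variant("GENE_D", "0/1", "0/1", "0/0", "HIGH"),
]

recessive_hits = [v.gene for v in candidates(variants, fits_recessive)]
de_novo_hits = [v.gene for v in candidates(variants, fits_de_novo)]
print(recessive_hits, de_novo_hits)
```

In this toy data, only GENE_A passes the recessive filter and only GENE_B passes the de novo filter. A real pipeline would also weigh factors such as population frequency, which is where a reference database like deCODE's would come in.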
Different data sharing options on the Exchange will also give users access to clinically annotated, anonymized data from geneticists around the world. Researchers will have the option to make the variants they comment on, or even their raw sequencing data, visible to selected partners or to anyone on the Exchange — with patient-identifying information stripped out. The idea is to easily find patients with similar conditions at other institutions, and compare their genetic mutations in order to make or confirm a diagnosis.
“You’re not uploading your sequence data to somebody else’s computer,” stresses Jeff Gulcher, President and CSO of NextCODE. “You’re simply allowing them to see a little sliver of your raw data where it overlaps with a particular mutation.” Users can also send messages to one another requesting more in-depth collaborations, making it possible to build on a chance discovery.
For more extensive research partnerships, data administrators can also grant collaborators full access to their data. This feature can be used to create a central data repository for research consortia, or in clinical institutions where labs, universities and care centers may be widely dispersed and struggle to share the sheer volume of data involved in looking at a patient’s whole genome. “When you’re trying to solve a case, you’re really looking to others to help you solve it,” says Gulcher. “The only way that can be accomplished is if you’re willing to share a lot of your own data with others.”
Built for Whole Genomes
When NextCODE split off from deCODE genetics one year ago, it took with it the Genomic Ordered Relations (GOR) database, a system for storing DNA data developed during Stefansson’s mass genotyping campaign in Iceland in the early 2000s. The GOR database was built as a solution to a fundamental problem of working with large amounts of genetic information: the sheer volume of sequence makes it difficult for computers to search for and retrieve relevant gene regions once the data has been stored. “The traditional relational databases don’t store the data very efficiently,” says Gulcher. “There’s a very long lag pulling the data out.”
The GOR system gets around this problem by storing sequence according to its physical position on a human chromosome. A DNA fragment from the long arm of chromosome 1 would be stored at the top of a table, while the short arm of the X chromosome goes at the bottom, saving applications built on top of the GOR database a huge amount of time searching for individual reads. NextCODE has built all of its commercial platforms on top of this architecture, including its flagship Clinical Sequence Analyzer, used mainly to search for the rare variants behind undiagnosed genetic disorders.
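The benefit of position-ordered storage can be sketched in a few lines of Python. This is not the GOR implementation, whose internals the article does not describe. It is a toy table that keeps rows sorted by chromosome and position, so a region lookup becomes a binary search rather than a full scan. The class name, chromosome ordering and read labels are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Assumed chromosome ordering (1..22, X, Y); the real GOR ordering may differ.
CHROM_ORDER = {name: i for i, name in
               enumerate([str(n) for n in range(1, 23)] + ["X", "Y"])}


def sort_key(chrom, pos):
    """Map a genomic coordinate to a tuple that sorts in genome order."""
    return (CHROM_ORDER[chrom], pos)


class PositionOrderedTable:
    """Toy table of (chrom, pos, payload) rows kept in genome order."""

    def __init__(self, rows):
        self._rows = sorted(rows, key=lambda r: sort_key(r[0], r[1]))
        self._keys = [sort_key(r[0], r[1]) for r in self._rows]

    def query(self, chrom, start, end):
        """Return rows on `chrom` with start <= pos <= end via binary search."""
        lo = bisect_left(self._keys, sort_key(chrom, start))
        hi = bisect_right(self._keys, sort_key(chrom, end))
        return self._rows[lo:hi]


table = PositionOrderedTable([
    ("X", 5000, "read_c"),
    ("1", 200, "read_a"),
    ("1", 150000, "read_b"),
    ("2", 300, "read_d"),
])

hits = table.query("1", 100, 1000)
print(hits)
```

Because the rows are sorted once up front, each region query touches only the slice that overlaps the requested interval, which is the property that makes browsing a small window of a very large genome practical.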
The GOR database also plays a key role in the NextCODE Exchange, making it possible to browse patients’ genomes at the level of individual base calls, even when accessing a collaborator’s data over the web. That’s an important quality control measure when analyzing someone else’s data, letting users confirm the accuracy of a variant call, or see any conflicting reads.
“Typically you might have fifty or a hundred reads that help confirm a mutation,” says Gulcher. “But in some cases, next generation sequencing can be a little noisy, so you end up with some false positives.” While dissenting base calls are often stifled when DNA data is reformatted for sharing, in the NextCODE Exchange they can be made visible to any user.
That’s been an important feature for the launch partners who have had early access to the NextCODE Exchange during its development. Sean Ennis, Director of the Academic Center on Rare Diseases at University College Dublin, has been working in the Exchange for around six months to resolve rare disease cases.
“We have groups in three or four different hospitals, a diagnostic center, and the university,” Ennis tells Bio-IT World. “Logistically, to transfer whole exome or whole genome data is a serious task. With the Exchange, if somebody is working on a project in one of the hospitals, I can look at it from the university or even from home, and we can have a discussion.”
NextCODE has also announced that the Exchange will act as a new central access point for data collected by the Simons Foundation Autism Research Initiative (SFARI), an international consortium aiming to find meaningful subtypes of autism spectrum disorder that could reveal the genetic and cellular pathways underlying the condition. Among SFARI’s assets is the Simons Simplex Collection, a set of nearly 10,000 whole exomes collected from more than 2,600 families with children affected by autism.
The Collection will now be hosted on NextCODE’s servers and available to all members of SFARI through the Exchange, saving the tedious step of distributing this massive set of exomes to the nearly two hundred researchers working under the SFARI umbrella. In addition, the Simons Foundation hopes to vet new outside researchers for access to the Collection, effectively crowdsourcing research into the genetics of autism to interested Exchange members.
“You can not only look at the data, you can also run your own algorithms, or algorithms we’ve already baked into the system,” says Gulcher. “You’re able to work with the entire set of 10,000 exomes simultaneously… They want as many different groups as possible to tap into this data.”
Meanwhile, NextCODE’s core focus remains on enabling genomic medicine with rapid analytics tools for clinical geneticists. The NextCODE Exchange, if it can pick up a critical mass of users, is designed to foster unexpected connections between centers that have so far tended to keep their genomic data in silos.
“From a rare disease point of view, I think it’s quite possible this will make a big impact down the line,” says Ennis. “Because of the small sample sizes in rare disease, it’s hard to get right down to the [causative] mutation. Maybe you’re left with twenty or thirty mutations in a study, and you just can’t figure out which one it is — it’s only when you get someone else coming alongside with a similar case that you can narrow that down. So I think it will have an impact on some of those unsolvable cases in the near future.”