Oxford Nanopore's Software Side
By Aaron Krol
November 19, 2015 | Oxford Nanopore Technologies has built an impressive hardware operation from its home in the UK. Its MinION device, a handheld DNA sequencer smaller than a remote control, once seemed like the stuff of science fiction, built on the molecular engineering of “nanopores” at the scale of just a few atoms. Today the company is churning out hundreds of them to ship around the globe.
But Oxford Nanopore is also a software company, and its computer programs for use alongside the MinION are likely to become a larger part of its business in the future.
When the MinION was first launched, Oxford’s software efforts were limited to getting usable DNA data off the instrument. The MinION’s unique method of nanopore sequencing feeds out “squiggle plots” of fluctuating electrical signals, which have to be translated into the familiar genetic language of As, Ts, Cs and Gs before scientists can begin to interpret them. Oxford released the MinION with a platform to do this base calling in the cloud, giving the company its minimum viable product: a sequencer that professional geneticists can use to feed data into all their usual analysis tools.
Oxford’s real aim, however, is to push genetics outside the highly specialized world of bioinformatics. The MinION sequencer is radically easy to use without advanced training in laboratory techniques, and Oxford’s software spinout, Metrichor, wants to build analysis tools to match.
“What we really want is to empower people who are not confident doing their own bioinformatics,” says Dan Turner, Oxford Nanopore’s Senior Director of Applications. “We want to be able to bring sequencing into the hands of people who are a little bit intimidated by the analyses, and make it as easy and user-friendly as we can.”
Earlier this month, Turner’s Applications group, working with the Metrichor team, described a workflow called What’s in my Pot? (WIMP) on the preprint server bioRxiv, introducing Oxford Nanopore’s first user-facing tool for interpreting the results of sequencing. WIMP is designed to identify the species and strains of microbes in an unknown sample, swiftly and automatically. Turner and his colleagues envision the program as a first port of call for real-world functions like monitoring outbreaks, diagnosing infections, testing foods and other consumer products for contamination, and conducting a wide range of environmental research.
“WIMP is something that was made between the Applications group and the Metrichor group, to show off all the features we have, like long reads and fast, real-time sequencing,” says Sissel Juul, lead author of the bioRxiv paper and Applications Manager at Oxford Nanopore’s New York office.
WIMP was built as a useful complement to the on-the-go MinION, analyzing data almost as quickly as the sequencer can generate it. The application needs as little as one second to match a single base-called MinION read to its species; in test runs, Juul’s team has gone from raw sample (in the bioRxiv post, a volume of unpasteurized milk) to a list of species in three and a half hours, fast enough for a variety of field situations.
WIMP is far from the first program to tackle the problem of species identification, and the Metrichor team didn’t start from scratch. At the heart of WIMP is an algorithm called Kraken, first published in March 2014 by researchers at Johns Hopkins University.
Most genomic search tools scan through large reference databases of DNA data, trying to find the closest match to a new DNA read. Kraken takes a simpler approach: it looks only for exact matches of very short DNA sequences, a much faster operation. As a result, its uses are narrower than those of other search tools, but it excels at rapid classification.
“It’s very good at looking at single reads and making a decision,” says Turner. “It’s a very good fit for the platform.”
To use Kraken to identify species, WIMP stores the genomes of thousands of bacteria, viruses, and fungi collected from the public RefSeq database, and breaks them into 24-base-long sequences, or 24-mers. Each of those 24-mers is labeled with the narrowest taxonomic unit it’s unique to: strain, species, or often a higher level like genus. When a MinION user sends a new read to WIMP, that read is also broken into 24-mers, and Kraken swiftly finds matches, which together produce a consensus call for what organism the sample DNA belongs to.
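The indexing and matching steps described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Kraken's or WIMP's actual code: the toy taxonomy, the function names, and the path-based scoring rule are all assumptions made for the example, and it ignores practical details such as reverse-complement strands and sequencing errors.

```python
from collections import Counter

K = 24  # k-mer length mentioned in the article

# Hypothetical toy taxonomy (child -> parent); real databases are far larger.
PARENT = {
    "root": None,
    "genus_A": "root",
    "species_A1": "genus_A",
    "species_A2": "genus_A",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lowest_common_ancestor(a, b):
    """Narrowest taxon that contains both a and b."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError("taxa do not share a root")

def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(genomes, k=K):
    """Label each k-mer with the narrowest taxon it is unique to."""
    index = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            if kmer in index:
                # Seen before: widen the label until it covers both sources.
                index[kmer] = lowest_common_ancestor(index[kmer], taxon)
            else:
                index[kmer] = taxon
    return index

def classify(read, index, k=K):
    """Return (best_taxon, hits) using exact k-mer matches only."""
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    if not hits:
        return None, hits

    def path_score(taxon):
        # Assumed consensus rule: add up the hits along the root-to-taxon path.
        return sum(hits[t] for t in lineage(taxon))

    return max(hits, key=path_score), hits

# Toy usage with k=4 so the sequences stay short.
toy_genomes = {"species_A1": "ACGTACGGTTCA", "species_A2": "ACGTACTTTGGA"}
toy_index = build_index(toy_genomes, k=4)
print(classify("GTACGGTT", toy_index, k=4))
```

In the toy run, k-mers shared by both genomes are labeled at the genus level, while k-mers found in only one genome keep a species label, which is what lets a handful of exact matches settle on a single organism.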
Depending on the proportion of 24-mers that successfully match, WIMP will also give a confidence score for each read, which might indicate, for instance, that the program is very sure of the species but not of the strain.
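One way to picture such a score is as the fraction of a read's k-mers that support a given taxon or anything beneath it. The sketch below assumes that definition; the article does not give WIMP's actual formula, and the taxonomy and hit counts here are invented for illustration.

```python
# Hypothetical toy taxonomy (child -> parent).
PARENT = {
    "root": None,
    "species_X": "root",
    "strain_X1": "species_X",
    "strain_X2": "species_X",
}

def is_within(taxon, ancestor):
    """True if taxon equals ancestor or descends from it."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT[taxon]
    return False

def confidence(kmer_hits, total_kmers, taxon):
    """Fraction of a read's k-mers matching taxon or anything below it."""
    if total_kmers == 0:
        return 0.0
    supporting = sum(n for hit, n in kmer_hits.items() if is_within(hit, taxon))
    return supporting / total_kmers

# Invented example: a read with 100 k-mers, 20 of which matched nothing.
hits = {"species_X": 60, "strain_X1": 15, "strain_X2": 5}
print(confidence(hits, 100, "species_X"))  # species-level support
print(confidence(hits, 100, "strain_X1"))  # strain-level support
```

With these made-up counts, most k-mers support the species while few distinguish between strains, which is the "sure of the species but not of the strain" situation.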
This approach is pretty typical for metagenomics platforms. For a variety of reasons, however, it’s especially well-suited to MinION sequencing. First, the MinION generates data in real time; it can feed its squiggle plots to the cloud for base calling even while it’s still in the middle of a sequencing run. That means WIMP can run in real time too, calling species as reads come through.
Second, the MinION sequences DNA in much longer fragments than most other instruments. A typical sequencer delivers reads of around 100 to 200 bases, not always enough to clearly identify the species. The MinION can keep reading for thousands of bases at a stretch. In practice, the Oxford Nanopore team says that reads of at least 700 bases give the best results for WIMP.
And third, the MinION works directly with DNA taken from the sample, not DNA that’s been copied by polymerase chain reaction. That makes it especially reliable for quantitation: using the number of DNA reads from various species to guess how abundant they are in the sample. This could be important, for example, in studies of human gut bacteria, where the same basic mix of species tends to be found from person to person, but in very different ratios.
Juul’s group has tested this feature with artificial DNA potpourris, as well as by spiking a known species at low abundance into a natural sample. Because they know what quantities of these species to expect, the team can see that WIMP is correctly reporting their abundance.
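At its simplest, this kind of quantitation is a matter of counting classified reads per species and comparing the resulting shares with what was expected. The sketch below assumes exactly that and nothing more; the function names and the example numbers are hypothetical, not taken from the bioRxiv paper.

```python
from collections import Counter

def relative_abundance(read_calls):
    """Share of classified reads assigned to each species.

    read_calls holds one species label per read; None marks an unclassified read.
    """
    counts = Counter(call for call in read_calls if call is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}

def abundance_error(observed, expected):
    """Absolute difference between observed and expected share, per species."""
    return {s: abs(observed.get(s, 0.0) - share) for s, share in expected.items()}

# Invented example: a spike-in expected to make up 10% of classified reads.
calls = ["host_species"] * 9 + ["spike_in"] + [None] * 2
observed = relative_abundance(calls)
print(observed)
print(abundance_error(observed, {"spike_in": 0.10}))
```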
WIMP also stands out for its clean user interface, where results are shown in color-coded lists, graphs, and taxonomic trees. Users can quickly see the relationships between species in their sample, the confidence scores assigned to each organism found, and the proportions of different species in the mix. Changing parameters, like the minimum confidence score needed to call an organism, is done with sliders. As much as these kinds of features are standard in consumer software, they’re still a rarity in bioinformatics.
That’s a clue to the kinds of users Oxford Nanopore is designing for. “We want everyone to use it,” says Juul. “Doctors. People who do quality control of food. We have people in the MAP [MinION Access Program] community who have sequenced Ebola in Guinea, and any kind of viral or bacterial outbreak at point of care we’re interested in.”
Metrichor already has several other programs in the works, including a tool for RNA sequencing and another for finding antibiotic resistance mutations. These, too, are likely to be geared to an expanded customer base who need user-friendly ways to view and interact with their results.
Meanwhile, Metrichor, which is wholly owned by Oxford Nanopore and shares the same executive team, will also be keeping an eye on the more radical promises of its parent company’s sequencing technology.
An especially exciting proposal is to perform analysis directly inside what Turner calls “squiggle space.” The MinION is already reading DNA in terms of electrical signals; members of Oxford Nanopore see translating those signals into base calls as potentially a waste of time. In the future, Metrichor may write algorithms to recognize and search for patterns in the squiggle plots themselves.
There are no existing tools, like Kraken, for this kind of analysis, so Metrichor would be going back almost to the drawing board. But there are some convincing advantages to working this way. “Various bits of information can be overlooked or lost during the base calling process,” says Turner. “In squiggle space, you keep a lot more information in, which means you can get a very accurate signal.”
That higher accuracy could also be a major time saver. At present, the MinION relies heavily on “2D” reads, where double-stranded DNA is unzipped, tied together at one end, and then fed through the sequencer such that the sequence is read twice. That lets the MinION double-check each base call, which is important to give programs like WIMP accurate enough information to do their jobs.
But it demands some time-consuming chemistry upfront, which might be unnecessary if the more information-dense squiggle plots can deal with “1D” reads just fine.
“The time-consuming part is still making the sample,” says Juul. “The whole analysis part is fast, and WIMP is especially fast.”
Squiggle space analysis is tempting for another reason, too: it could let Metrichor build applications that run even before the MinION has finished sequencing a particular DNA strand. One new program the company is building would let users specify in advance what gene regions they want to sequence, perhaps to capture a particular part of the human genome associated with a genetic disease. As reads pass through the MinION, this tool would watch the squiggle plots, and reject any that don’t match the target — letting the sequencer discard the read and move on.
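The accept-or-reject decision can be illustrated with a deliberately simple signal comparison. The sketch below is not Metrichor's method and uses no MinION API: it assumes the target region is available as an expected signal trace, normalizes both signals, and slides the first chunk of a read along the target looking for a close fit. The threshold is arbitrary, and a real tool would have to cope with variable strand speed, which this fixed-window comparison does not.

```python
from statistics import mean, pstdev

def z_normalize(signal):
    """Rescale a signal to zero mean and unit variance."""
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]

def best_match_distance(chunk, reference):
    """Smallest mean squared distance between chunk and any equally long
    window of the reference signal (both z-normalized first)."""
    query = z_normalize(chunk)
    n = len(query)
    best = float("inf")
    for start in range(len(reference) - n + 1):
        window = z_normalize(reference[start:start + n])
        dist = sum((q - w) ** 2 for q, w in zip(query, window)) / n
        best = min(best, dist)
    return best

def should_reject(chunk, reference, threshold=0.5):
    """True if the start of a read does not resemble the target signal.

    The threshold is an arbitrary value chosen for this illustration.
    """
    return best_match_distance(chunk, reference) > threshold

# Invented signals: the on-target chunk is a rescaled slice of the reference.
target_signal = [1, 5, 2, 8, 3, 9, 4, 7, 2, 6]
on_target = [2 * x + 10 for x in target_signal[2:7]]
off_target = [1, 2, 3, 4, 5]
print(should_reject(on_target, target_signal))
print(should_reject(off_target, target_signal))
```

Normalizing first means a chunk can match even when its overall current level or amplitude differs from the reference, which is why the rescaled slice is still recognized.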
At the same time, hardware changes will continue to extend the reach of applications like WIMP. Oxford Nanopore has been tinkering with prototypes of a device it calls VolTRAX, a handheld microfluidic chip that prepares DNA for sequencing automatically, replacing more complex lab equipment.
“Ultimately, VolTRAX will be capable of taking a raw sample, whether it’s blood or plant extract or whatever, getting DNA, and then doing the library prep of your choice,” says Turner. “We have people who have been sequencing in remote locations, like in Guinea, or in jungles, and even though the sequencer itself is completely portable, they still have to carry around a centrifuge and other bits of laboratory equipment with them.”
Before long, it may be possible to run entire experiments far from any kind of conventional lab, with almost nothing but a MinION and a laptop. Users may even be able to do away with Internet access when needed, as Metrichor adds alternate versions of its applications that run locally, without connecting to the cloud.
Local analysis is coming sooner rather than later, as Oxford Nanopore gears up to ship its second sequencer, the high-throughput PromethION, to early access users. Although the company hasn’t made any data from the PromethION public yet, it has announced that the instrument will contain 48 MinION-style flow cells sequencing DNA in parallel.
“With the PromethION, it won’t really be feasible to do cloud-based base calling, just because it’s so much data,” says Juul. As Oxford’s hardware advances, Metrichor and the Applications team will be hard at work making sure the software keeps up.
For more on the MinION and Oxford Nanopore, see the Bio-IT World feature, "Nanopore Sequencing Is Here to Stay."
UPDATED 11/19/15: The text of this article has been updated to clarify the corporate relationship between Metrichor and Oxford Nanopore Technologies.