By Kevin Davies
September 27, 2012 | Knome, the informatics company co-founded by George Church that bills itself as the “human genome interpretation” company, is launching a “genome supercomputer” to enhance the interpretation of genome sequences.
Designed chiefly to run Knome’s kGAP genome interpretation software, the compute system is intended -- metaphorically perhaps -- to sit next to a sequencing instrument, and has been soundproofed for that purpose. The unit weighs in at two pounds shy of 600 pounds, and comes with a starting price tag of $125,000.
Knome will start accepting orders for the knoSYS 100 today, and will begin shipping before the end of the year. The first units will process one genome/day, but with headroom for much higher throughput later on.
“The advent of fast and affordable whole genome interpretation will fundamentally change the genetic testing landscape,” commented Church, Harvard Medical School professor of Genetics. “The genetic testing lab of the future is a software platform where gene tests are apps.”
The launch of the so-called genome supercomputer represents “an evolution of our thinking,” says Knome president and CEO Martin Tolar. While the larger genomics research organizations have dedicated teams and datacenters to handle genome data, for the majority of Knome’s clients, Tolar says, “you really want to have integrated hardware and software systems.”
With some 2,000 next-generation sequencing (NGS) instruments on the market, each close to sequencing a genome a day, Tolar asks: “Why not have a box sitting next to it to do the interpretation?” Ideally, he says, every NGS instrument should have a companion knoSYS 100 nearby. The system is a localized version of Knome’s existing genome analysis and interpretation software. “We’ve localized it, shrunk it, and modified it to work on a local system that sits behind the client’s firewall,” says Tolar.
“The software is still our core ‘value add,’” says Jonas Lee, Knome’s senior vice president of marketing and corporate development. “But most institutions have trouble putting together the precise [IT] system to run the software, so why not take that burden off their hands? Not everyone has a really deep IT bench -- so we did it for them.”
But is there real value in and pent-up demand for a “server-in-a-box”? Lee says many clients “need handholding” and makes the case: “This is new ground in the genomics business. Everyone’s saying, ‘How do we solve this informatics interpretation problem?’ Home solutions have come out, open source solutions, everyone’s cobbling stuff together to roll their own solution. We’ve heard there needs to be a stable solution with support that makes it easy. Once you do that, it’s not forcing them to develop their own hardware platform.”
Getting to Know You
The specs of the knoSYS system were designed internally, says Lee, who notes that Knome has several expert hardware professionals in its ranks. “The difficult part is optimizing the system,” says Lee. “You have processing, storage and bandwidth requirements -- any of these can be bottlenecks. So you have to design to what the software can do. We’ve created the optimized system around that.”
The system is optimized for the processing and I/O demands of genome informatics. Running on Linux, it includes four 2.4-GHz 8-core/16-thread Intel Xeon E5-2665 processors (20 MB cache); 64 GB of DDR3 ECC 1600 memory; 2x250-GB SATA drives; 18 to 54 terabytes of usable disk storage; and Gigabit Ethernet. The system boasts 1.2 teraflops of processing power, “which is a lot,” says Lee, providing a typical analytical throughput of 1 genome/day.
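As a rough back-of-envelope check on the quoted teraflops figure, theoretical peak throughput can be estimated as sockets x cores x clock speed x floating-point operations per cycle. The sketch below is not Knome's calculation. The figure of 8 operations per core per cycle is an assumption about this processor generation's vector units, and the two socket counts are the "four processors" listed in the specs and the "four-node dual-socket" description quoted later in this story.

```python
def peak_teraflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak in teraflops: sockets x cores x GHz x FLOPs per cycle."""
    gigaflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0


# Assumption, not stated in the article: 8 double-precision
# floating-point operations per core per clock cycle.
ASSUMED_FLOPS_PER_CYCLE = 8

# Core count (8) and clock speed (2.4 GHz) come from the published specs.
four_socket = peak_teraflops(4, 8, 2.4, ASSUMED_FLOPS_PER_CYCLE)
eight_socket = peak_teraflops(8, 8, 2.4, ASSUMED_FLOPS_PER_CYCLE)

print(f"4 sockets: {four_socket:.2f} teraflops")
print(f"8 sockets: {eight_socket:.2f} teraflops")
```

Under that assumption, four sockets come to roughly 0.6 teraflops and eight to roughly 1.2, so the quoted figure lines up with the eight-socket reading. Sustained throughput on real genome workloads would be lower than any theoretical peak.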
The “server room in a box” weighs 598 pounds, measuring 40 in (h) x 29.5 in (w) x 44.5 in (d). It requires a 30-Amp electrical source. The soundproofed enclosure provides nearly 20 decibels of noise reduction, such that it could in principle be housed in a laboratory environment. The knoSYS 100 is platform agnostic, and handles Illumina and Complete Genomics data from launch, with the Ion Torrent sequence format being readied for early 2013.
In addition to kGAP, the unit ships with two other Knome software products -- knomeVARIANTS and knomePATHWAYS -- as well as 76 annotated whole genome sequences. Several other data sources are also provided, including the human reference genome (HG19), dbSNP, Ensembl, HapMap III, 1000 Genomes, Human Protein Reference db, KEGG, and Human Gene Mutation db (HGMD). kGAP accepts call files (either GATK or Complete Genomics). The output format includes the Human Genome Format (.HGF), and utilities to convert to VCF files.
Knome says one of the benefits of the knoSYS pipeline is the ability to run in silico gene panels, dropping in subsets of gene and/or variant searches as needed against the full exome or genome dataset.
“This idea -- the ability of developing in silico superpanels -- is pretty groundbreaking,” says Lee. “It’s simple – once you have the genome, it’s on file, you can keep banging software tests up against it until you’re blue in the face. Many of these tests are being created as we speak.”
“In silico super panels allow hundreds of conditions to be tested simultaneously and open the door to the development of a new class of molecular diagnostics for complex, multi-gene disorders,” said Church. “Moving from a world of assays to apps will expand the definition of what a gene ‘test’ actually is.”
Lee cites the work of Knome scientific advisory board member Heidi Rehm on a cardiomyopathy gene panel and the studies of Stephen Kingsmore and colleagues in developing a panel of more than 600 rare early-onset disease genes. “These are all panels that can be run in the software,” says Lee. “Our technology allows you to save the genes that you care most about. Once you know the genes, or variants of pathogenic significance, you can save those variants so you can run those tests.”
“You’re not limited by cost to a handful of genes anymore,” Lee continues. “Take cardiomyopathy – you can layer onto this co-morbidities, other related diseases, or drug response… You don’t have to go through this cycling process.” The knoSYS 100 includes pre-installed “templates” or panels for cardiomyopathy, epilepsy, and breast cancer.
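The in silico panel idea described above amounts to filtering one stored set of variant calls against different gene lists, as often as needed. The following is a minimal sketch of that idea only: it does not use kGAP or the .HGF format, the function names are hypothetical, the variant records are made-up toy data, and the gene lists are short illustrative examples rather than the contents of Knome's pre-installed templates.

```python
from typing import Dict, Iterable, List, Set

Variant = Dict[str, str]

# Hypothetical panels: a few well-known genes per condition, for illustration only.
PANELS: Dict[str, Set[str]] = {
    "cardiomyopathy": {"MYH7", "MYBPC3", "TNNT2"},
    "breast_cancer": {"BRCA1", "BRCA2"},
}


def run_panel(variants: Iterable[Variant], panel_genes: Set[str]) -> List[Variant]:
    """Return the variants that fall in any gene on the panel."""
    return [v for v in variants if v["gene"] in panel_genes]


def run_all_panels(
    variants: Iterable[Variant], panels: Dict[str, Set[str]]
) -> Dict[str, List[Variant]]:
    """Apply every panel to the same stored variant set."""
    stored = list(variants)  # the genome is "on file"; reuse it for each panel
    return {name: run_panel(stored, genes) for name, genes in panels.items()}


# Toy variant calls (invented identifiers, not real data).
genome_variants: List[Variant] = [
    {"id": "var1", "gene": "MYH7"},
    {"id": "var2", "gene": "BRCA2"},
    {"id": "var3", "gene": "TTN"},
    {"id": "var4", "gene": "BRCA1"},
]

hits = run_all_panels(genome_variants, PANELS)
for panel_name, panel_hits in hits.items():
    print(panel_name, [v["id"] for v in panel_hits])
```

Adding a new "test" in this picture means adding another entry to the panel dictionary. The sequencing data itself is never regenerated, which is the point Lee makes about running software tests repeatedly against a genome already on file.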
Inside the Black Box
Bio-IT World shared the specs of the knoSYS 100 system with two NGS/IT experts for comment. Shawn Levy, director of the Genome Services Lab at the HudsonAlpha Institute for Biotechnology, said the knoSYS 100 had potential as a turnkey solution for some groups, but argued that the amount of storage was “too small for this to be a long-term data storage and pre-processing solution.”
“It really needs to be double or more the disk storage listed,” Levy said. “Labs concerned with clinical data or labs interested in investing in a dedicated compute solution will find storage limiting in a reasonably short time.” Levy thinks 100 TB would have been preferable.
As for potential user concerns about depositing data in the cloud, Levy noted that “Life Technologies and Illumina are touting the cloud. I appreciate the concerns with clinical data being in the cloud but with cloud security options evolving, I think you will see those concerns alleviated soon.”
Given Knome’s reputation for good interpretative software, Levy says he would not be surprised if the knoSYS 100 finds a receptive audience. “It will also be interesting to see if a single software provider can achieve a significant market share,” he added. “One consistency in genomics so far has been that no one software provider, whether academic or commercial, has been able to be dominant for any significant time.”
Chris Dagdigian, a bio-IT computing expert with the BioTeam, said the system was “an interesting idea” that could find a market. “There are a ton of instrument owners who operate in an island without IT support that really understands scientific computing. A fully packaged -- and more importantly supported -- system that is clean looking and quiet enough to run in a lab is an interesting idea. The popularity of systems like this may rise as the cost of sequencing falls so low that anyone can have an NGS instrument,” he said.
But some people will hate the idea of a joint hardware/software solution, Dagdigian said. The knoSYS 100 will not fit conveniently into datacenters because its width exceeds a standard rack, whereas its height means that a lot of vertical airspace can't be used. Many research IT professionals have “fought long-running battles to keep this sort of IT gear out of labs,” he said.
One issue is that the 30-amp circuit will require some advanced electrical work, but a bigger issue, Dagdigian says, is that “the larger companies and bigger organizations will hate an odd form factor system” and may prefer to stick with Knome’s software for integration with existing resources. Alternatively, he suggests “a purchase option that makes the soundproofed enclosure optional so that the innards can be mounted in traditional racks.”
Dagdigian says at first glance the processing/storage specs are not particularly astounding. “It's a four-node dual-socket HPC cluster with InfiniBand interconnects attached to a single storage server… Maybe the exotic stuff is in the packaging, support, form factor and functionality.”
Although they haven’t tested the actual unit, Knome’s early access partners “really liked this plug-and-play system,” says Lee. About a dozen institutions have signed on to the early access program to pilot the system, including Cedars-Sinai Medical Center, Cincinnati Children's Hospital Medical Center, Hyundai Cancer Institute at Children’s Hospital of Orange County, ARUP Laboratories (University of Utah), University of Liverpool (UK) and the University of Verona, Italy.
The knoSYS might appeal in particular to users who prefer to keep human genome data behind a firewall while concerns linger over the security of patient data in the cloud, from consent issues to privacy and legal concerns. “Many institutions say they want something that’s not cloud based,” says Lee. Indeed, the knoSYS 100 is “for research purposes only,” Tolar stresses, although that might change in time.
If one genome/day sounds like a rather modest analytical output, Lee says the system will become quicker over time. “We do an incredible amount of processing upfront, not just annotation but standardization and also comparing exomes to genomes.”
“My vision,” says Tolar, “is that in personalized medicine, if you have a patient with a specific problem, you can go to the network [and ask], who has been in this situation before? What was the outcome? Which drug, which treatment [worked]? For that, you’d need to run a mini trial in that population. That’s what we’re enabling upfront.”
Tolar says Knome will invest more than $50 million in R&D in the coming years. “This is where we intend to make a lasting contribution to molecular-based, precision medicine.”
*9-27: This story was updated to include quotes from Chris Dagdigian (BioTeam).