Edited Oct 16, 1:42: Quotes added from the Broad and David Altshuler in paragraphs five through seven.
By Allison Proffitt
October 16, 2012 | The Broad Institute and Appistry announced an arrangement this morning for Appistry to distribute and support the Broad’s Genome Analysis Toolkit (GATK) for next-generation sequencing data analysis. For a subscription fee, Appistry will distribute the GATK and its suite of analysis tools to for-profit users and provide commercial-grade customer support.
First developed and introduced by the Broad Institute in 2009, the GATK framework is an industrial-strength, computational engine empowering the development of analysis tools for next-generation sequencing. On top of this engine, the Broad Institute has built a set of analysis tools (“Apps”) to process data from any next-generation sequencing platform, and to identify changes in the sequences that may be associated with disease.
Introduced as an open source option, the toolkit currently has about 100,000 users a month: bioinformatics professionals, biomedical researchers, and clinicians from both non-profit and for-profit organizations. GATK has been used on initiatives ranging from the 1000 Genomes Project to the NIH’s Cancer Genome Atlas, and by many sequencing centers and leading researchers to conduct population studies and explore the genetic origins of disease.
Moving forward, Appistry will make the GATK 2.0 available to users at for-profit companies under a license agreement with a subscription fee that will cover commercial-grade support for installation, configuration, and documentation as well as long-term support for each commercial release. Nonprofit organizations can still use the GATK as an open source tool, and will apply for their licenses directly from the Broad.
“The GATK framework has been—and remains—an open source computational framework for next generation sequencing,” stressed David Altshuler, deputy director of the Broad Institute, to Bio-IT World. “The Broad Institute has also built, and continues to build, a variety of core and premium apps, if you will, that run on the GATK framework for doing particular analyses... Other people can and will continue to develop their own apps on the framework.”
Altshuler continued: “We are motivated to try and continue to advance the GATK for everybody’s use, but it’s actually difficult to raise grant money to support software development. So while continuing to give out the GATK framework in open source to everyone, and having a license such that all nonprofit research users can have all the tools without cost, we’ve set up this relationship with Appistry to provide higher levels of support to for-profit users or anyone else that wants to pay for it, and in return the Broad receives financial support that we use then to continue developing the software so it will remain a vibrant and ongoing capability for the community rather than being static.”
Altshuler already has plans for further development of the GATK with the income from licensing the toolkit. “There are other kinds of variants that still remain either difficult to identify or where the sensitivity or specificity is not as high, for example, as they are for SNPs. So we will use the resources both to extend… which types of sequence variation can be detected with the same levels of sensitivity and specificity that we all expect and need, [and] to increase the speed so that larger and larger samples can be analyzed and the computational efficiency.”
Appistry’s flavor of high performance computing, which the company calls “fabric computing,” weaves together high performance computing and analytics to provide throughput and scale for big data problems (see “Appistry’s Fabric Computing”). The company’s own life sciences offering, Ayrris Bio, combines HIPAA-compliant data storage with fast analysis.
The company is already “in the business of scaling the science in this genomic domain,” Appistry’s VP of sales and marketing, Trevor Heritage, told Bio-IT World. Appistry was a good fit for this project because, “on a day to day basis we interact with exactly the same target customer base that GATK would have. We already have an established go-to-market operation that the Broad can leverage, thereby enabling their activities to focus on what they do best, which is the scientific innovation.”
Appistry’s CEO, Kevin Haar, told Bio-IT World that the arrangement would ensure the toolkit’s position in the market. “I think there’s so much happening in this space and this space is moving so aggressively forward, that there’s more and more opportunities for a toolkit like GATK to make a contribution. I think the Broad really sees this as the best way of securing the future of GATK in terms of being the best available tool and I think that’s really important that we all enable the acceleration of the science and the acceleration of what’s possible in the space.”
Appistry’s experience in compliance is also an advantage, Heritage said, for personalized medicine applications. “You start to see requirements from the customer base for HIPAA compliance and CLIA compliance in both their sequencing and their data management and the data analysis. So the combination of GATK 2.0 here with Appistry actually provides a solution that meets those requirements. In terms of our high performance compute infrastructure, we’re able to offer our customers a HIPAA-compliant capability, and GATK 2.0 provides, through Appistry, the ability to have a version-controlled data analysis pipeline.”