By Kevin Davies
July 18, 2012 | Genomatix is one of a rapidly growing number of software companies competing in the genome analysis, annotation and interpretation space. But its executives believe that the company’s forte – transcription and gene expression analysis – still gives it a distinct edge in the marketplace.
The company was founded by Thomas Werner in 1998. “Thomas is a world-renowned figure in the world of transcription factor binding site science,” says Peter Grant, CEO of US Operations for Genomatix.
Werner studied gene regulation and promoter organization. “We still get people who come to us and say, ‘Jeez, I've got a gene of interest. It's a transcription factor and I want to find out if it binds to this promoter.’ And I’m so tempted to say, ‘I'll bet you $1 it is’ … The current consensus is that there's a transcription factor binding site every ten base pairs across the entire genome,” says Grant.
Werner was heading up a research group at the National Genomic Research Center for Environment and Health in Munich (today it's called the Helmholtz Center) when he was encouraged to spin out a software company focusing on eukaryotic (mostly mammalian) gene regulation.
Genomatix management (from left to right): Klaus May, Martin Seifert, Peter Grant (photo: Kevin Davies)
“The whole field started to evolve,” says chief executive Martin Seifert. “In those days, you were happy if you had one promoter sequence. And then the Human Genome Project took off and the first publication was chromosome 22. We had to think about methods to identify a promoter in a mammalian genome. It sounds easy, but it isn't.”
Back then, Seifert recalls, there were only rudimentary promoter prediction tools. “We had to think about clever ways of doing mammalian promoter recognition and there was one breakthrough paper – a program called Promoter Inspector, developed by our CTO Matthias Scherf – which boosted the specificity up to over 80 percent. Today we know it's more than 97 percent, because all the predictions over time now have been experimentally verified.”
The early business model was to target academic labs and some pharma companies. Genomatix developed a chromosome-wide promoter annotation for chromosome 22, later expanded to the entire genome. The company licensed this promoter database to DoubleTwist, but just as Genomatix attempted to move into genome annotation, the Bay Area bioinformatics company folded.
“We had to think about our own ways to annotate,” Seifert says. “We had a good foundation of knowledge about how to deal with mapping, so we created our own annotation pipeline. Our annotation database is largely based on the re-mapping of genomic features, including spliced mapping, so we do get novel findings and can verify existing knowledge.”
Genomatix conducted exhaustive cross-mapping of transcripts from many mammalian organisms. “We had just filed for this annotation patent – we were researching mapping and alignments years before genome sequencing -- and that's why we think we are pretty good in this area,” says Seifert.
The move into gene annotation was a natural progression from transcription factors and promoters, encompassing literature mining, network building, pathways, and systems biology. “We built analysis pipelines for microarray experiments to understand how genes are co-regulated and how they go together – this was our ChipInspector system,” says Seifert.
With the growing popularity of microarray platforms, notably from Affymetrix, Genomatix quickly appreciated that its database presented a much more complex, nuanced picture of gene isoforms and alternative transcription. “We figured out that many of these probe sets may be located in one transcript but only half in another transcript from the same location,” he says. Genomatix shifted its algorithm from whole probe sets to a single-probe approach, calling a transcript once a certain number of individual probes point to it. “A lot of data which we initially thought was not really worth very much turned out to be a really good driver of basic science.”
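The single-probe idea can be illustrated with a minimal sketch. It is not Genomatix's ChipInspector algorithm, whose details the article does not describe; the class names, the `significant` flag, the coordinates and the `min_probes` threshold are all hypothetical. Each probe is mapped to transcripts on its own, and a transcript is called only when enough individual probes land in its exons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    probe_id: str
    position: int      # hypothetical genomic coordinate of the probe
    significant: bool  # assumed result of some upstream per-probe test


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    exons: tuple       # (start, end) half-open intervals, hypothetical


def probe_hits_transcript(probe, transcript):
    """True if the probe position falls inside any exon of the transcript."""
    return any(start <= probe.position < end for start, end in transcript.exons)


def call_transcripts(probes, transcripts, min_probes=3):
    """Call a transcript when at least min_probes significant single probes map to it."""
    called = {}
    for transcript in transcripts:
        support = [
            p.probe_id
            for p in probes
            if p.significant and probe_hits_transcript(p, transcript)
        ]
        if len(support) >= min_probes:
            called[transcript.transcript_id] = support
    return called


# Two isoforms from the same locus: the short one lacks the second exon,
# so only half of the probes in this (made-up) probe set map to it.
long_isoform = Transcript("long", ((100, 200), (300, 400)))
short_isoform = Transcript("short", ((100, 200),))
probes = [
    Probe("p1", 120, True),
    Probe("p2", 160, True),
    Probe("p3", 320, True),
    Probe("p4", 360, True),
]

result = call_transcripts(probes, [long_isoform, short_isoform], min_probes=3)
print(result)  # {'long': ['p1', 'p2', 'p3', 'p4']}
```

In this toy case the probe set as a whole would be ambiguous between the two isoforms, while counting probes one by one gives the long isoform four supporting probes and the short isoform only two, so only the long one is called.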
In 2005, Genomatix scientists published a paper in Trends in Genetics on microarray data mining, laying out how microarray data could be put into context to obtain a real biological picture. “With the arrival of the first next gen sequencing data, we thought, ‘OK, let’s focus on that,’” says Seifert.
In 2006, Genomatix entered a collaboration with the Max Planck Institute for Molecular Genetics in Berlin, which had one of the first Solexa next-gen sequencing machines. The collaboration bore fruit, including the sequencing of two human cell lines and publication of the first human RNA-Seq paper in Science.
“We are convinced that to identify what is really new, you have to know what is already known and you must have a good genome annotation in place,” says Seifert. “You have to look into literature to be able to put everything into context. But if you want to discover a new transcript you should better know what known transcripts are there and this is where we really get it right -- very good data content with the algorithmic layer above that. Our aim always is to create biological value for our customers and sometimes also for the company.”
From its historic strength in transcriptomics, Genomatix has broadened its offerings in response to the NGS data explosion, including data analysis of DNA sequencing, re-sequencing, small RNA sequencing, epigenomics-driven sequencing strategies and discovering patterns in the DNA with transcription factors from ChIP-Seq experiments.
“We have an integrated solution where you don't need to take your data from one step to another manually. Everything is in one place and you don't have to do data conversions. We’ve tried to bring together all the different bits and pieces to reveal a full biologic picture at the end,” says Seifert. It is a fully integrated annotation system, from raw reads to biological interpretation.
With greater interest in clinical genome analysis, Seifert says that “an integrated solution is important to have control of the different steps so that you can … standardize your system.”
To a degree, Genomatix is adding clinical information as it curates the literature, which is done manually by an expert team in Germany. “Together with partners, we’re trying to build up databases containing medical information and bringing this together with the research part of the analysis pipeline,” says chief business officer Klaus May.
One key collaboration is with the Center for Prostate Disease Research (CPDR) in Washington D.C., part of a Department of Defense-funded consortium. Other collaborators include the National Eye Institute, the University of Miami’s Department of Ophthalmology, and the University of Pittsburgh.
Genomatix is also a partner in the BLUEPRINT Project, part of the IHEC Consortium, and a systems biology effort in breast cancer, funded by the European Union. The company also has good relationships with Novartis, Pfizer and Boehringer-Ingelheim.
The CPDR study looks at expression analysis among a phenotypically differentiated group of prostate cancer samples from patients who are part of the U.S. Department of Defense healthcare system. “We can work retrospectively and prospectively,” says Seifert. “Together with our partners we’ve found that there are differentiating aspects between those groups which have the potential to be used as a prognostic marker [such as elevated FLT3 expression] for metastatic processes. The CPDR tested this on more than 110 additional patients and the predictive power looks very promising.”
With the growing trend in cancer genome sequencing, Seifert says scientific knowledge of key genomic regions may be critical in evaluating a particular therapy. “This information can be pulled out from this data and can change a patient’s therapeutic regime,” he says. “In many cases, we think there is enough background knowledge to make a better informed decision for the patient. And by presenting the background data and bringing it together with the experimental data in a proper way, we can help the physicians to make a better choice for the patient.”
“We want to come up with a standardized system which can be intuitively used by physicians, to strip down the richness of the scientific information to a level where a physician can immediately use it as a support for diagnostics,” says May. “It's not finished but we’re integrating medical data on the one hand, and doing meta-analysis on the other. In many cases it's necessary to not only look at the DNA but also the epigenome and expression and then overlay.”
The genome interpretation space is getting crowded, but Grant doesn’t see Genomatix butting heads with established DNA analysis vendors such as Partek, CLCbio and Ingenuity. “The real differentiator of Genomatix is its richness in background data in an integrated solution combined with the scientific expertise of the team,” he says.
Genomatix runs a private cloud in-house, with significant volumes of EMC data storage. Seifert stresses the system’s scalability and security. “We have complete control about everything, but in principle we can also run our technology in a public cloud like Amazon.”
“From very early on, we had an online business model,” says May. “We already had a cloud model in the year 2000, deploying the software as a service to our customers -- computing on demand.”
However, legal regulations in some countries about distribution of patient data make the public cloud less of a priority. “We're doing business with Arabian countries but normally sequencing data from people there is not allowed to leave the country. The in-house solution is the only feasible one,” says Seifert.
One exciting initiative is the launch of a new service in plant genome annotations. “Plant genomes are more difficult than mammalian genomes to annotate because of all their specific aspects. But still our technology can help to speed up this process significantly,” says May.
In addition to deploying its NGS software as a turnkey solution or as a cloud-based solution, Genomatix now offers a third component: mygenomatix.com. “The idea was to make the technology available to a broader scientific community, as well as those who don't do next-gen sequencing on a regular basis, but are just doing a couple of experiments per year. This is picking up very nicely now on a pay-per-project basis,” says Grant.
One tagline from Genomatix has always been ‘multiple lines of evidence,’ says Grant. “If you can find information coming from expression, from methylation, from the literature, then let's step outside our domain [and consider] proteomics, clinical -- the more information you can start bringing together that all point in the same direction the better your confidence level.”