The Druggable Genome Is Now Googleable

By Aaron Krol

November 22, 2013 | Relationships between human genetic variation and drug responses are being documented at an accelerating rate, and have become some of the most promising avenues of research for understanding the molecular pathways of diseases and pharmaceuticals alike. Drug-gene interactions are a cornerstone of personalized medicine, and learning about the drugs that mediate gene expression can point the way toward new therapeutics with more targeted effects, or novel disease targets for existing drugs. So it may seem surprising that, until October of this year, a researcher interested in pharmacogenetics generally needed the help of a dedicated bioinformatician just to access the known background on a gene’s drug associations.

Obi and Malachi Griffith are particularly dedicated bioinformaticians, who specialize in applying data analytics to cancer research, a rich field for drug-gene information. Like many professionals in their budding field, the Griffiths pursued doctoral research in bioinformatics applications at a time when this was not quite recognized as a distinct discipline, and quickly found their data-mining talents in hot demand. “We found ourselves answering the same questions over and over again,” says Malachi. “A clinician or researcher, who perhaps wasn’t a bioinformatician, would have a list of genes, and would ask, ‘Well, which of these genes are kinases? Which of these genes has a known drug or is potentially druggable?’ And we would spend time writing custom scripts and doing ad hoc analyses, and eventually decided that you really shouldn’t need a bioinformatics expert to answer this question for you.”

The Griffiths – identical twin brothers, though Malachi helpfully sports a beard – had by this time joined each other at one of the world’s premier genomic research centers, the Genome Institute at Washington University in St. Louis, and figured they had the resources to improve this state of affairs. The Genome Institute is generously funded by the NIH and was a major contributor to the Human Genome Project; the Griffiths had congregated there deliberately after completing post-doctoral fellowships at the Lawrence Berkeley National Laboratory in California (Obi) and the Michael Smith Genome Sciences Centre in Vancouver (Malachi). “When we finished our PhDs, we knew we would like to set up a lab together,” says Obi. At the Genome Institute, they pitched the idea of building a free, searchable online database of drug-gene associations, and soon the Drug Gene Interaction Database (DGIdb) was under development.

In Search of the Druggable Genome

Existing public databases, like DrugBank, the Therapeutic Target Database, and PharmGKB, were the first ports of call, where a wealth of information was waiting to be re-aggregated in a searchable format. “For their use cases [these databases] are quite powerful,” says Obi. “They were just missing that final component, which is user accessibility for the non-informatics expert.” Getting all this data into DGIdb was and remains the most labor-intensive part of the project. Because the aggregated records sit at least two steps removed from the original sources establishing each interaction, the Griffiths felt they had to reexamine each data point, tracing it back to publication and scrutinizing its reliability. “It’s sort of become a rite of passage in our group,” says Malachi. “When new people join the lab, they have to really dig into this resource, learn what it’s all about, and then contribute some of their time toward manual curation.”

The website’s main innovation, however, is its user interface, which presents itself like Google but returns results a little more like a good medical records system. The homepage lets you enter a gene or panel of genes into a search box, and if desired, add a few basic filters. Entering search terms brings up a chart that quickly summarizes any known drug interactions, which can then be further filtered or tracked back to the original sources. The emphasis is not on a detailed breakdown of publications or molecular behavior, but on immediately viewing which drugs affect a given gene’s expression and how. “We did try to place quite a bit of emphasis on creating something that was intuitive and easy to use,” says Malachi. Beta testing involved watching unfamiliar users navigate the website and taking notes on how they interacted with the platform.
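For readers who think in code, the search described above can be pictured as a lookup over a table of curated interactions. The Python sketch below is only an illustration under stated assumptions: the function name `search_panel`, the record layout, and the source labels are hypothetical, the handful of records is hand-typed for the example, and none of it is taken from DGIdb’s actual code or API.

```python
# Illustrative, hand-typed records; the "source" labels are placeholders,
# not real DGIdb source names.
INTERACTIONS = [
    {"gene": "EGFR", "drug": "erlotinib", "type": "inhibitor", "source": "source_a"},
    {"gene": "EGFR", "drug": "cetuximab", "type": "antibody", "source": "source_b"},
    {"gene": "BRAF", "drug": "vemurafenib", "type": "inhibitor", "source": "source_a"},
    {"gene": "ABL1", "drug": "imatinib", "type": "inhibitor", "source": "source_b"},
]


def search_panel(genes, interaction_type=None, records=INTERACTIONS):
    """Summarize known interactions for each gene in a panel.

    Returns {gene: [(drug, type, source), ...]}. Genes with no match are
    kept with an empty list so the summary covers the whole panel.
    """
    wanted = {g.strip().upper() for g in genes}
    summary = {g: [] for g in sorted(wanted)}
    for rec in records:
        if rec["gene"] not in wanted:
            continue
        # Optional basic filter on the kind of interaction.
        if interaction_type and rec["type"] != interaction_type:
            continue
        summary[rec["gene"]].append((rec["drug"], rec["type"], rec["source"]))
    return summary


if __name__ == "__main__":
    for gene, hits in search_panel(["egfr", "TP53"], interaction_type="inhibitor").items():
        print(gene, hits)
```

Keeping the source label on every hit is what would let a user trace a result back to where it came from, which is the behavior the site emphasizes.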

DGIdb went live in February of this year, followed by a publication in Nature Methods this October, and the database is now readily accessible at http://dgidb.org/. The code is open source and can be modified for any specific use case, using the Perl, Ruby, Shell, or Python programming languages, and the Genome Institute has also made available their internal API for users who want to run documents through the database automatically, or perform more sophisticated search functions. User response will be key to sustaining and expanding the project, and the Griffiths are looking forward to an update that draws on outside researchers’ knowledge. “A lot of this information [on drug-gene interactions] really resides in the minds of experts,” says Malachi, “and isn’t in a form that we can easily aggregate it from… We’re really motivated to have a crowdsourcing element, so that we can start to harness all of that information.” In the meantime, the bright orange “Feedback” button on every page of the site is being bombarded with requests to add specific interactions to the database.

Not all these interactions are easy to validate. “Another area that we’re really actively trying to pursue,” adds Malachi, “is getting information out of sources where text mining is required, where information is really not in a form where the interaction between genes and drugs is laid out quickly.” He cites the example of clinicaltrials.gov, where the results of all registered clinical trials in the United States are made available online. This surely includes untapped material on drug-gene interactions, but nowhere are those results neatly summarized. “You either have a huge manual curation problem on your hands – there’s literally hundreds of thousands of clinical trial records – or you have to come up with some kind of machine learning, text-mining approach.” So far, the Genome Institute has been limited to manual curation for this kind of scenario, but with a resource as large as the clinical trials registry, the Griffiths hope to bring their programming savvy to bear on a more efficient attack.
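To give a sense of what the simplest possible text-mining pass might look like, the Python sketch below flags sentences in which a gene name and a drug name appear together. It is a deliberately naive illustration, not the Griffiths’ approach: the function name `cooccurrences` is hypothetical, the sample sentence is made up, and a co-occurrence is only a candidate for manual review, not evidence of an interaction.

```python
import re


def cooccurrences(text, genes, drugs):
    """Return (gene, drug, sentence) triples where both names share a sentence."""
    hits = []
    # Crude sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence.lower()))
        for gene in genes:
            for drug in drugs:
                if gene.lower() in tokens and drug.lower() in tokens:
                    hits.append((gene, drug, sentence))
    return hits


if __name__ == "__main__":
    # Made-up text standing in for a free-text trial record.
    sample = (
        "Patients with BRAF V600E mutations received vemurafenib. "
        "Imatinib was given as a comparator."
    )
    for hit in cooccurrences(sample, ["BRAF", "KIT"], ["vemurafenib", "imatinib"]):
        print(hit)
```

Matching on whole tokens rather than substrings avoids some false hits on short gene symbols, but a real pipeline would still need synonym handling and human validation.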

In the meantime, new resources are continuously being brought into the database, rising from eleven data sources on launch to sixteen now, with more in the curation pipeline. DGIdb is already regularly incorporated in the Genome Institute’s research. Every cancer patient sequenced at Washington University has her genetic data run first through an analytics pipeline to find genes with unusual variants or levels of expression, and then through DGIdb to see whether any of these genes are known to be druggable. This is an ideal use case for the database, which is presently biased toward cancer-related interactions, the Griffiths’ own area of research.
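The two-step workflow described here, flagging unusual genes and then checking which of them are druggable, can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the function names, the z-score cutoff, the patient values, and the tiny drug table are hypothetical and do not describe the Genome Institute’s actual pipeline.

```python
def flag_genes(expression_z, variant_genes, z_cutoff=2.0):
    """Step 1: genes with a variant call or an expression z-score beyond the cutoff."""
    outliers = {gene for gene, z in expression_z.items() if abs(z) >= z_cutoff}
    return outliers | set(variant_genes)


def druggable_subset(flagged, drug_table):
    """Step 2: keep only the flagged genes that have at least one known drug."""
    return {gene: drug_table[gene] for gene in sorted(flagged) if drug_table.get(gene)}


if __name__ == "__main__":
    # Hypothetical patient values and a hand-typed drug table.
    expression_z = {"ERBB2": 3.1, "TP53": -0.4, "GAPDH": 0.2}
    variant_genes = ["BRAF", "TP53"]
    drug_table = {"ERBB2": ["trastuzumab"], "BRAF": ["vemurafenib"]}

    flagged = flag_genes(expression_z, variant_genes)
    print(druggable_subset(flagged, drug_table))
```

In this toy run TP53 is flagged but drops out at the second step because the table lists no drug for it, which is the kind of narrowing the database lookup provides.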

The twins have a personal investment in advancing cancer therapeutics. Their mother died in her forties from an aggressive case of breast cancer, while Obi and Malachi were still in high school, and their family has continued to suffer disproportionately from cancer ever since. Says Obi, “We’ve had the opportunity to see [everything from] terrible, tragic outcomes… to the other end of the spectrum, where advances in the way cancer is treated were able to really make a huge difference to both our cousin and our brother,” both in remission after life-threatening cases of childhood leukemia and Ewing’s sarcoma, respectively. “Everyone can tell these stories,” Malachi adds, “but we’ve had a little more than our fair share.”

DGIdb can’t influence cancer care directly – most of the data available on drug-gene interactions is too tentative for clinical use – but it can spur research into more personalized treatments for genetically distinct cancers, and increasingly for other diseases as more information is brought in. Meanwhile, companies like Foundation Medicine and MolecularHealth are drawing on similar drug-gene datasets, narrowed down to the most actionable information, to tailor clinical action to individual cancer patients. The Griffiths are cautiously optimistic that research like the Genome Institute’s is approaching the crucial tipping point where finely tuned clinical decisions could be made based on a patient’s genetic profile. “We’re still firmly on the academic research side,” says Malachi, but “we’re definitely at the stage where this idea needs to be pursued aggressively.”