January 15, 2005
Bio·IT World invited five experts from academia and industry to discuss the burgeoning field of integrative genomics.
Q: What do you think 'integrative genomics' actually means in the post-genome era? Just another buzzword, or a meaningful concept that is important for the industry?
Pictured (left to right):
Darrell O. Ricke, executive director, Functional Genomics, Life Sciences Informatics, Novartis Institute for BioMedical Research
Ricke manages a team of eight bioinformaticians in the life science informatics (LSI) group, part of a Swiss-U.S. team that supports the core functional genomics group at Novartis.
Christopher W. Botka, computational biologist, Center for Genomics Research, Bauer Laboratory, Harvard University
Botka's group manages IT infrastructure for Harvard University's life science division, and fosters collaborations between computational and bench biologists.
J. Andrew Whitney, director, Research Informatics, Cellular Genomics
Whitney manages the use of chemical genetics technology for functional selectivity profiling of lead compounds and biomarker identification and runs IT infrastructure and knowledge management at Cellular Genomics, which focuses on kinase drug discovery.
John Hotchkiss, director of IT and informatics, EnVivo Pharmaceuticals
Hotchkiss handles assay automation and analysis at EnVivo, a biopharma that screens 20,000 transgenic Drosophila melanogaster (and generates up to 25 GB of movies) each week to find drugs for neurodegeneration. He also handles bioinformatics support and software integration.
Michael J. McManus, vice president, BioSciences Group, Fujitsu America
McManus' main focus is on desktop tools for high-throughput docking of small molecules into proteins, in combination with genomics-based methods to study the effect of genetic variation. Fujitsu produces bioinformatics and cheminformatics software, as well as hardware, in Japan, aimed at the U.S. market.
BOTKA: I think it has successfully made it past the buzzword stage because they're naming programs at universities 'Integrated Genomics.' You know, if they're going to chisel it into the side of a building ... It's tough to define. If you Google it, it's different things to different people. Data integration is sort of the name of the game for us all the way around, so the 'integrative' is definitely a key word.
RICKE: You can go as extreme as saying it's personalized medicine, by integrating genomics into your whole discovery and clinical process. But that word has some bad connotations, depending on whom you are talking to. Some of the larger pharmas don't like the term 'personalized medicine.' Novartis is more open to the concept of actually working on it, but with the understanding that it will never reach the extreme of 'one person, one drug.'
Integration of platforms makes a lot of sense. Once you get a couple of different science technologies functionalized, you want to do the data mining, using the genome as a scaffold. A lot of people are focused on protein-coding genes; now we can move into regulatory regions. The noncoding genes have been largely ignored — how important a role will they play in the future ... We've been studying cell biology for centuries now, but suddenly the technologies have high-throughput capability. It's a pretty exciting time.
WHITNEY: Data integration is obviously a key element, but from my perspective I'd like to get to the biology in more integrated fashion. Ultimately, we're going to have to have a mind shift of people who can train in one area who will have to have something above this. I'd refer to that as integrative biology, but for now it's integrative genomics. You have to have something to climb on.
HOTCHKISS: We're using genomics both for disease models and to manage the production of 20,000 flies a week. We have the females fluoresce so we can use the cell sorter to sort them out! So we use integrative biology in both senses — we look at the whole organism — instead of trying to track from the bottom up through pathways, we'll go back, understand the pathology of disease, and do target identification.
WHITNEY: We take individual targets, kinases, then use our chemical genetics-based technologies to get more predictive power about what would happen to an organism when you acutely inhibit this protein. We're using this internally and with partners to help biotechs lower late-stage attrition rates, and to develop better predictive tools and strategies for clinical development. That's obviously very important.
HOTCHKISS: The business sense is, what's the ROI? In our case, we'll be better about making decisions in mammalian systems by using Drosophila, which will save money and time.
McMANUS: John has an interesting comment: Rather than mess around at a low level to see the top, he's looking at what's more important, drilling top down to see what's underneath. But ultimately, aren't we all trying to construct a large network diagram that says, 'This is how all these things interact, it starts here and ends up with a person that looks like you'?
WHITNEY: A company like ours is less focused on defining complete pathways, and more focused on finding the elements of the pathway we need to move forward.
HOTCHKISS: It doesn't necessarily make any sense to define the entire pathway.
What sort of integrative approaches are paying off?
McMANUS: I can give two examples from an IT perspective. We're working with a pharma to use high-throughput docking to examine genomic variations (SNPs in proteins) and how those proteins interact with the same molecules. We look at docking scores on a variety of different proteins. The chemists, working at the desktop, are finding this very interesting. It was typically reserved for the realm of the computational chemist, but the technology is becoming more commonplace; it's been worked over enough for the chemists to use themselves. We haven't discovered any drugs doing this yet, but we've highlighted an unexplained variation in the efficacy of certain small molecules that they hadn't been able to rationalize.
Another is our in situ hybridization (ISH) platform. Japan tends to prefer oral to inhaled medicines — they have a long history of herbal remedies. The main target gene for our Japanese pharma partner, for which they had an effective inhibitor, was found in non-target tissues using this ISH platform, occurring in small intestines. That taught the company that they should go back to an inhaled formulation. So they are now producing that compound as an inhalable drug.
RICKE: Even using basic expression technology, we have a candidate in the pipeline discovered using that technology. You take expression and integrate it with the literature; we're doing a lot of fundamental research. We're taking commercial products for literature mining and creating networks of genes, combining that with screening assays and expression data, and also looking at protein binding; then coupling that to groups making animal models in Drosophila.
Sometimes we'll develop models within our group — for example, Drosophila, zebrafish — and we'll build models for partnering with disease areas they want to target. They may ask us to help analyze the data, or devise experiments; other times they'll say, 'We want to use your technology as a platform.' Paul Herrling saw the vision for Novartis and built it up, and Mark Fishman has continued to build on this. Some groups may have some hesitation in this, but it's viewed as being very successful within Novartis.
HOTCHKISS: Our biggest success story is histone deacetylase inhibition. We confirmed the neuroprotective properties, and used that as a basis for a relationship with a company with chemistry in that space. We're very excited about it. Work in the fly has led to a compound that is showing some efficacy. So that's our biggest success story.
We have some others as well. When you get inside a screen, you see the power of it to pick up compound families that seem to have a certain kind of efficacy.
WHITNEY: Half of our technology is to identify direct substrates of kinases, as well as to study the effects of potent and selective kinase inhibition. With the substrate ID technology, we've been able to extend the biology of known kinases through identification of new substrates in relevant cells and tissues.
We've also used this to accelerate assay development. We're wrapping up a collaboration with Affymetrix to define the molecular fingerprint of the potent and selective inhibition of a kinase, and have used this fingerprint to assess functional selectivity of lead compounds.
BOTKA: The statistics thing is a running question: Do you normalize microarray data — that still hasn't been answered. There's basically an entire lab of statisticians working at the [Bauer] Center, but none are trained as biologists. There's another group that just does image processing, looking at microarray and proteomics data. I think that's a second front of integration that's decidedly different than having a biologist who can understand what questions they want to ask — these are people who have a completely different perspective focusing on biology.
The biggest thing we've gotten is when we brought people who are not traditionally thought of as life scientists into the field and said, 'What can you bring to the table for us?' That's been great; everyone's been very happy with those relationships. We're just beginning to ask questions you didn't even know you could consider. Most biologists don't come through graduate school with a strong math or physics or statistics background, and to be able to talk to people about whether their data are meaningful, and to talk about data from a statistical perspective, is incredibly powerful.
How important is it to get biologists savvy to computational mathematics?
BOTKA: That's the Holy Grail right there! The training-versus-education question comes into play. Is it worth training people how to do things and understanding it before they start doing it? The jury's still out on that.
McMANUS: There's a paradigm shift going on. We've gone on intuition, largely driven by what we feel and hear and see in the lab. That was fine for the low-hanging fruit.
HOTCHKISS: But it goes both ways. The Drosophilists come back to me and say, 'Look, I'm seeing this effect with this drug, but your stats aren't showing it!' It's created a great market for companies like Ingenuity, who take the fact that people can't compute that, because they [Ingenuity] can sell it.
RICKE: I see two paradigms. The statistician is learning the biology ... I'm embedded in a team of bioinformaticians, statisticians, chemists, even a vet, bringing in that expertise ... The vet is a wonderful person mining the data. I'd love to see an M.D. join the team next.
To be a biologist, you need stats, programming, etc. Previously, I was with a group of 200 scientists in San Diego. We ran a class on Perl and had to offer it three times, and then an intermediate version. About half the scientists attended that class to learn those skills needed to work with their data. What it means to be a scientist in biology is changing rapidly.
BOTKA: People want to learn Perl, but it's too slow right now. One thing that needs to happen is the integration of good, intuitive tools to start asking questions, so you won't have to learn Perl and retrain yourself. I don't think a biologist should necessarily have to retrain as a computational scientist to ask the questions they want to ask, but that's where it is right now.
RICKE: Biologists are users of software tools. With the large volumes of data, in order to move the science forward, they didn't have enough tools. Most of them don't want to be programmers, but they were doing it because they saw it as the next step. They'll have the tools, and in the future they'll become experts in using tools like Spotfire, GeneSpring, MATLAB, and others to work with their datasets.
BOTKA: They're stuck in the purgatory between Perl and an Excel spreadsheet right now. It's just a bad place to be! Taking something they can find in large quantity and adding a column to a spreadsheet or having a database and not knowing what to do with that.
McMANUS: LION thought they could sell a large platform. But people want components ... I like the Lego approach. You can buy Legos, and make your own tank. But no one wants your tank — they want the Legos to augment their own stuff. So you need to have tools that could be integrated loosely into a platform conceptually but that could be plucked out individually and integrated into someone else's platform.
Is predictive modeling far-fetched, or do we have the computational capability to model pathways and systems in a meaningful way?
McMANUS: I think there's value in in silico technology, but what's missing is the feedback loop. What we're pushing at Fujitsu is in silico prediction to experimental validation. We have some wet-lab techniques, like in situ hybridization and microinjection, and we have predictive techniques like high-throughput docking. What's missing is, we might use 10 compounds I predicted, but what about the knowledge that went along with those that failed? Where is that information being stored? We're not using the negatives to inform us in our next choice. For in silico technology to really be worth anything, you've got to capture the failures, and use them in future steps. Until we can do that, we're just throwing away reams of data that could potentially inform us about things we've missed.
BOTKA: I believe that at some point, we'll be able to make a model. I've no idea when that is. Not in my lifetime ... but only because my lifetime will probably be quite short! We can barely manage the data from one experiment in 10 that is successful, let alone all the ones that failed. There are a lot of people sighing with relief because they don't have to analyze those data as well.
McMANUS: The guys in physics have said this for years. Physicists are always ahead of the chemists and biologists. It's the negative data that often holds a lot of promise.
BOTKA: A lot of the effort around systems biology has been more in the development of ontologies and ways of representing data, and making data computable that haven't otherwise been. There are people publishing pathways that are being generated by in silico data or whatever, but that's largely not where the main effort of systems biology is right now. It's trying to figure out how to make the things that will end up becoming the networks and the pathways, how to make the information we have computable ... [The Entelos approach] is the Lego approach: you're not getting paid to discover the pathway, but to discover a drug. The people where I am are getting paid to discover the pathway.
What are some of your major IT challenges and hurdles today?
HOTCHKISS: Where to begin?
WHITNEY: Can we say yes to that?
HOTCHKISS: In my ideal world, I'd have this neat LIMS system capturing every aspect of experimental processes. We'd feed in the assay results, the results would be elegantly available, we could browse them, step back through the data, they'd be concisely available to the chemists. That's the system I'm trying to put in place.
We use Spotfire ... that's worked well for us. But the LIMS system has been tough. There need to be better commercial products out there. We tried to integrate existing products without great success — I don't want to name names ...
McMANUS: A lot of LIMS systems came out of analytical chemistry. They started in an area where the concept of a biological entity was not even present, and have evolved from that distant point. They still have a lot of discrete, measurable things about them for things that aren't necessarily discrete and measurable ... A company doing in situ hybridization ends up building a custom LIMS system with 20 GB per day of images, and all the biology around it. Images, let alone movies, weren't part of the data [the LIMS] would normally retain.
HOTCHKISS: We're still young, still changing our lab processes. If we were in a perfectly static environment, it would be a much easier problem to solve — supporting the type of throughput we do, the amount of data we generate. I bought the best one for the job — it wasn't good enough. We'll depend more and more on our custom work to capture results.
WHITNEY: We started customizing a commercial product for one project, and decided that expanding that to the rest of the company was not really feasible. But the nice thing about this LIMS is it took care of the lab process, especially from the chemistry side. I want to know everything we've done with this compound — we can do that now.
RICKE: We've evaluated different commercial LIMS products, but haven't actually brought them in ... Even on an existing project right now, we're working on a custom LIMS, tailored specifically to the science, focused on supporting just this type of data. There are a lot of generic LIMS systems out there, but when it gets to biotech things tend to be highly specialized. The data are evolving very rapidly; we have new data types we didn't have a year ago. That's a big challenge.
We talk about XML databases; if we can have some accepted standards for how to represent that XML, people would readily adopt it. Or some standards for Web services ... It's either that or the next-step Holy Grail will be the 'Microsoft Office' biological tool, which doesn't exist. So the challenge is integrating the data. What are the common ways of connecting the data? One of the key things is the genes mapped to the genome, but that's not enough. These literature-mining tools, interaction networks, are just one element of the problem. They're very useful, but very early stages.
Five years from now, we'll have tools that are much more integrative. It's only just beginning today — we see a couple of different tools linking products together. Pathway tools, you put that into tools for expression like GeneSpring, other things with Spotfire (BioConductor), putting these components together ... Biologists want to use these tools to model the data.
How important is the open-software community?
BOTKA: It's clearly important for us, but it's why comparative genomics tools have not been marketable — it's all been open source. You can't really sell the algorithms, so that's why getting those integrated with the new technologies has languished a bit.
McMANUS: On the chemistry side, it's not open source, it's all closed, all for sale. You've got a process that involves chemistry and biology — they have completely different models or paradigms for how to get access to this stuff. That's a barrier right there.
What is your hope for the field in the next 10 years?
WHITNEY: It's not personalized medicine, but a step on that path. Better clinical trial design, being able to focus on patient subpopulations. We're hearing that the FDA is thinking about these things — what technology will be applicable to the development process, not just the discovery process. I hope we can see more integration of that.
McMANUS: In a clinical trial, you choose what you hope is a representative population, but when you get out there you find the diversity is greater than you thought. We need genomics tools to select a better clinical trial population, so you have a better idea of how the drug will perform.
RICKE: Science will get a good handle on a cell. We've got the genome today, but given the dynamics with proteins, RNA, there'll be a lot of inroads, but cell-cell interactions, tissues, will be beyond 10 years. The fundamentals of a single cell, the different cell types, a lot of that will come in the next 10 years.
In drug development, there are a lot of trials now, where they know the specific gene targeted with the drug, but do they go out to the patients and look at their alleles, and the other genes that might interact with the drug? There might be a duplicated gene, another enzyme that interacts with the drug. My guess is in 10 years a fundamental part of the clinical trials will be looking at those parameters — what we know about the targets, all the way through the clinical trials, adverse drug reactions, the nonresponders, the negative data, capturing that in the process ...
McMANUS: The Iressa data were a 'shocker,' but they knew this from day one.
HOTCHKISS: There are any number of drugs that haven't been looked at that way. They become the bastard child, but you have to think there are some interesting compounds out there.
So 10 years from now, what are they going to be chiseling on the side of your department?
Within 10 years, the most progress in my world will be standardizing a way to take data from microarray or genomics to proteomics experiments and put them into one format that can be computable by one set of algorithms, more readily than they are now. Whether the technology will be there to predict halfway, I don't know, but enough momentum is in place to be available for the analysis.
Special thanks to Mario Fante of Aviator PR, a life sciences consultancy, for helping to organize this discussion.