A visit with Genentech’s new senior director of bioinformatics and computational biology.
By Wendy Wolfson
May 18, 2010 | While a statistics professor at the University of Auckland in the mid-1990s, Robert Gentleman and his colleague Ross Ihaka developed R, an open-source statistical programming language for data analysis and graphics (www.r-project.org). R is now ubiquitous, used in applications that go beyond drug discovery to the financial sector and defense. Gentleman later moved to Harvard and the Dana-Farber Cancer Institute. Most recently he was at the Fred Hutchinson Cancer Research Center leading the Bioconductor project (www.bioconductor.org), which he started in 2001 to develop open-source tools for bioinformatics and computational biology. In September 2009, Gentleman joined Genentech as senior director of bioinformatics and computational biology. He is on the board of Palo Alto-based REvolution Computing, a commercial provider of software and support for R. Wendy Wolfson spoke to Gentleman about his new role.
Bio•IT World: What is R and how is it being used in life sciences these days?
Gentleman: R is primarily a command line tool. Most biologists would probably use a different tool, unless they start to become more computational and want the flexibility you get from a system like R. It is a very customizable system with a very deep toolbox, and it will make life easier, but you need to know a fair amount of statistics and be reasonably comfortable with computing and programming.
Ross [Ihaka] and I started R around 1992, but at the time we didn’t have the notion that anybody else would use it, let alone that we would actually finish it. It was sort of a toy. We were playing with ideas and they got bigger and bigger over time. Around 1996, we said that people who want to use it can try it out. It certainly couldn’t hold a candle to what it is now. Making it open source with a particular license was a way to get our university to agree to let it out. The success of R really came from the formation of what is called the R core team, something on the order of 20 people who were heavily invested in making that project work. Ross and I started it but we got out of the way pretty quickly.
R is in widespread use—they were using it at Genentech before I arrived. Applications, R packages, are being developed all the time. I believe that R is used in many industries, including pharma. We use it for all kinds of problems that come up here: the next-gen sequencing, visualization, any sort of data analysis, microarray analysis. There are lots of people who have developed products on top of it.
What about the Bioconductor project?
Bioconductor is different from R. It had funding, mainly from the NIH, which R never had. That experience for me was more germane to what I am doing now, but on a smaller scale. For that project I was really a day-to-day manager. I haven’t stopped doing the scientific advisory part and am still actively involved in the Bioconductor project.
Commercial software has a tendency to lag 4 or 5 years behind what the real leading scientific groups need to do. For a place like Genentech Research, I don’t see commercial software tools as a good solution. We are buying end tools to analyze data, and are probably using fairly old computational ideas, whereas Bioconductor in particular is able to go way out in front. Bioconductor is primarily a tool for research scientists because it doesn’t have a customer base, it has a scientist base.
What attracted you to try a stint in industry and Genentech in particular?
There are not many industry jobs that look like this one. It is a pretty attractive place to be. I am in Genentech Research and Early Development (gRED). It is the very early phase of drug development, not the more downstream clinical trials and marketing side. What I do every day is not actually that different now from what I did as an academic. It is a lot more directed toward making drugs, knowing whether drugs work, and how bioinformatics and computational biology play a role in that process.
What are your chief goals in your new job?
I’m a manager more than anything, so one of my goals is getting my team to function well as a department. I have a fairly tight research group of about 6 or 7 people that takes explicit directions from me, and I head a department of about 50 people. Our research group currently comprises about 1300 people out of the 2000 employees of gRED. My research group is starting to focus in particular on things such as comparative genomics. How do we learn about the human genome and human diseases by looking at model organisms? We have long relied on model organisms for a variety of things. Mice are model organisms for a lot of therapies. Sometimes they are great model organisms and sometimes they aren’t. And starting to understand why they aren’t such good model organisms will help us to better interpret our mouse data in the context of wanting to go forward in humans. There is a lot of anecdotal and experimental evidence that we don’t see the same kind of behavior in mice that we do in humans. Hopefully the genomics will lead us through that.
We collaborate with all scientists in gRED and indeed with others outside of gRED in Genentech or in the larger Roche Group. The only important question is whether we can provide benefits for the project.
Will you be focusing on any particular applications? What are your research directions?
In broad strokes, the applications of next-generation sequencing capability to relevant problems would be one area. One specific application which is hard (maybe too hard, but it is always fun having something hard to play with) is to try to understand whether repetitive DNA plays a bigger role than we currently know about. That is probably a fairly low bar because we pretty much ignore repetitive DNA. That is one area where I think there is a lot of room to understand variation between people and to find associations with human disease. GWAS [genome-wide association studies] have been very popular, finding associations between single nucleotide polymorphisms and diseases. But that is just the beginning. There are many more ways in which your genome and somebody else’s genome will differ, so can we understand those and what role they play?
I am also interested in transcription factor behavior. The problem with that is that it doesn’t necessarily lead to druggable targets. Transcription factors are typically not good things to target with drugs. I do not believe that there are any drugs that target wild type functional transcription factors, for fairly good reasons. But it is important to decide how the cells are wired, so you can understand what happens when they become deregulated and things go bad. I am interested in trying to understand how we can use next-generation sequencing information, a whole genome view of things, to better understand what is happening inside a cell when mutations are present.
All of biology is fascinating to me, mostly cancer biology. I’m pretty motivated to understand what is going on and to help devise therapies for it. The same goes for the immune system. The way that the technology has changed to allow us to explore the genome and variability in human systems is really going to open up great opportunities. I think there are some really big challenges in trying to understand how we can use that data to work out which treatments are effective and don’t hurt people. I was reading a paper today that said that the problem isn’t trying to find drugs that will kill cancer. That’s easy to do. Drugs that kill cancer and do not kill people are hard to find.
A few years ago personalized medicine was very popular. I don’t doubt that my children will grow up in a world where they will get treatments for diseases that are much more tailored to them and their particular genomic characteristics. But it is pretty slow going.
Evidence-based medicine, clinical trials, all of these things really started in the 1940s and 1950s. We learned how to say that this drug is only useful if we can show in a reasonable setting that it helps more people than it hurts. And that is great, but as soon as you start talking about personalized medicine, you have a sample size of one and no control group. How do we start dividing people up so we can make those statements? It is pretty complicated. We have to figure out what to average and not to average. •
This article also appeared in the May-June 2010 issue of Bio-IT World Magazine.