These are just a few of the questions J. Craig Venter answers in the second half of Bio·IT World's two-part interview with the controversial scientist and entrepreneur. Venter talks through his latest to-do list, describing in detail his not-for-profit institutes, congressional lobbying efforts, plans to create a synthetic organism, and vision for genomics' future.
Some editing was necessary to fit this expansive interview into two issues of Bio·IT World magazine. To hear the interview in its entirety, go to the Bio·IT World Web site, where the full interview will be broadcast during December.
Q: Do you have prescriptions for the data glut that's mounting by the day? Mining these data for something meaningful has become exceedingly difficult.
A: It's only difficult because people don't know what the knowledge is to extract from the data. Ninety-nine percent of [the human genome] probably is not particularly relevant to our biology except at an early stage. So with the human resequencing project that we're [The Institute for Genomic Research (TIGR)] just starting, we're going to look at less than 2 percent of the genetic code.
It was important once or twice to get the whole structure, or most of it. I think this effort going on for the last year spending hundreds of millions of dollars of public money to sequence Alus [repetitive DNA sequences] is of questionable intellectual significance, and we could've accomplished a lot more science with that. At the same time, having all the gaps and things closed one time is probably not a bad thing. I don't think it will advance science or understanding of anything, but emotionally it's nice. It's an expensive emotion to satisfy when people are dying of breast cancer. But at the same time, the information that will matter to you about your life is a fraction of your genetic code, probably less than 1 percent in reality. So, there is an information data glut, but with just decent computing we're able to handle that quite easily.
Q: Was sequencing the human genome worth the effort? Would it have been better to simply compile a library of expressed genes?
A: Certainly not better. I proposed that early on. Early advocates of that [approach in the 1980s] were Sydney Brenner [Molecular Sciences Institute] and Paul Berg [Stanford University] (see First Base). I learned arguments about cDNA approaches from them. In fact, we held up publishing our [1991 Science] paper for a long time, offering to co-publish with Sydney with what he was doing.
I didn't invent a cDNA approach. I invented ESTs [expressed sequence tags], which made a cDNA approach actually viable and gave scientific data to the hypotheses of people who were being shouted down because of the complete misunderstandings about gene expressions that were out there. I mean, you can look at [Jim] Watson's book [Molecular Biology of the Gene] and it basically says a few genes would be overexpressed and you wouldn't see any of the other RNAs. And that's why people shouted down the cDNA approach.
I think having the complete genetic code is an essential thing for finding the gene sets, because so many genes are going to be rarely expressed. We jump-started it. We had apparently over half the human genes by '92-'93. Some people made really good use of it. Bert Vogelstein [Johns Hopkins University] did with colon cancer and really moved that field ahead. But it was almost an underground movement because, politically, ESTs were a four-letter word, because Watson and people made them that way. But yet, they changed what people could do scientifically.
If you go back and read the original EST paper [Adams, M.D., et al. "Complementary DNA sequencing: expressed sequence tags and human genome project." Science 252, 1651-6 (1991)], I said this is the method that will be used to annotate the human genome. And it totally was! None of my critics that tried to shout down ESTs from the beginning, or used them extensively as the only method they had to annotate the human genome, have acknowledged that ... but that's all right.
Q: We should talk about your new genome center, if that's the right title. How is it organized? And when did you establish your foundation and the endowment?
A: The foundation is a recent thing. But the endowment [is not], I mean TIGR just finished 10 years. So we started in June of '92, and I started an endowment in TIGR that initially tapped the shares that I could have taken for Human Genome Sciences [Inc.]. One of the motivations for agreeing to some of the terms for Celera [Genomics Group] is that the endowment would be built with Celera stock. My goal was to try and create a situation where I could do the science I wanted to do by having the resources to do it.
What I'm trying to do now [is] driving genomics forward, trying to [resequence] a thousand human genomes. I mean, you'd probably love to be a fly on the wall of genome study section reviewing that, right? I'm sure they'd get quite a chuckle over that. So I'm using money that I've put into the endowment to jump-start the basic science. We're building this new sequencing center. Just the basic equipment is going to be on the order of $30 million.
Q: Have you made a decision as to whom you're going to go with?
A: It's coming down to the wire. It's going to be either Amersham [Biosciences] or ABI [Applied Biosystems Group]. In terms of today's equipment, there's really no other choice. They are in there pitching, and we've had a 15-year, very successful, track record with ABI equipment, but on these large facilities there are all kinds of cost differentials.
Q: Can we go over the goals of your new organizations?
A: So, we transferred the endowment from TIGR to what's now called the J. Craig Venter Science Foundation, and that's set up to support three, now four, centers [including the planned new sequencing center], but basically three: TIGR, TCAG [The Center for the Advancement of Genomics], and IBEA [the Institute for Biological Energy Alternatives]. Instead of being a hierarchical organization, it's truly support. We provide money and infrastructure support in terms of legal and HR support to all three organizations.
TIGR doesn't need much because it's largely independent and has a per capita grant portfolio that probably exceeds that of any other organization in the U.S. It's just below the radar screen, so it doesn't show up, but I think its total grants [are] something like $175 million in committed grant funds.
Q: How would you classify TCAG? Is it a think tank? There's no actual wet-lab research in there?
A: There's bioinformatics research, but it was set up as a public policy institute to deal with the interpretation of the human genetic code, which means everything from comparative genomics to the social and legal implications of that work. To me, you can't do that in a vacuum. So a center of policy without it being science-driven is sort of why the ELSI [ethical, legal, and social issues] program associated with the public genome effort is basically a failure. It funded all kinds of academic esoteric discussions about whether enhancement gene therapy is moral or not, when gene therapy doesn't work. I mean, those are great discussions you can have late at night in bars, but when people are losing jobs and insurance because of a misunderstanding of genetics, I think there are far more pressing issues to deal with.
Q: You have been on Capitol Hill pressing for federal genetic privacy legislation, which I think everybody would agree is a no-brainer. Why don't we have it?
A: The Bill [H.R. 602, The Genetics Nondiscrimination in Health Insurance and Employment Act] has over 200 co-sponsors in the House, but the Republicans won't let it come up for a vote. It's hard to trace [who's responsible] as this is a very elusive kind of shadowy blockage. Why the pharmaceutical industry is against this legislation is really hard to understand, and I've asked executives at different pharmaceutical companies — they didn't even know they were against it! But their lobbyists seem to know that they're against it and some insurance companies [do too].
But even within the insurance industry, there's a debate about how useful this information could be. I think some of the insurance companies are a lot smarter than even the human genetics community about the value of the data. It can have great actuarial interpretation value assessing population risks, but that's part of what we have to understand. The risk of the population is not your risk as an individual. Our risks are binary. You either have it or you don't. The population is the statistic.
Q: Would the safeguards be effective as written? Or is this just something where we're feeling our way through?
A: We're still feeling our way through, but I think the legislation as it's written would be a huge step forward. It's slightly ironic that the Republican leadership brought me to testify on their side on a committee. I was a little bit stunned by that. What the chairman of the committee had seen were statements I [made] that we are more than the sum total of our genes, and [that] genetic determinism won't work.
So they were sure that I was going against the legislation because it's not needed. I said no, in fact, that's why we need it, because between the scientific community, the public, and the press, everything is being interpreted in that way. Prejudice is not built usually on any kind of fact basis. People can have these prejudices and be misinformed and carry off those prejudices. If you're fired and somebody didn't base it on science — you know, they were just confused about interpreting the genetic code — does that make it all right?
I argued we need legislation more because of the ignorance out there than the scientific fact. I said ultimately the scientific data will prove this is probably not necessary, but we have a huge transformation to undergo to get through that process.
Q: What's on the horizon for the Institute for Biological Energy Alternatives? You're talking about replacing fossil fuels and reducing the carbon dioxide being pumped into the atmosphere.
A: We're talking about doing the research basis of that. I mean your earlier questions, [about] would I form another company? If I knew the [research results] now, I could probably raise a billion dollars overnight for a new energy company to do that. If the science supported that, I would either do it or encourage somebody else to do it.
Q: It would do a number on the oil stock.
A: Well, because the best future we have, in my view, is developing a hydrogen-based economy. The sooner we do that, the better chance we have of not destroying the rest of the environment that we haven't destroyed yet. What portion biology can contribute to that transformation is not known. But it was being largely ignored. So we're doing some research on it.
The first project that IBEA is working on right now is the follow-up of the minimal genome project (Hutchison, C.A., et al. "Global transposon mutagenesis and a minimal Mycoplasma genome." Science 286, 2165-9 (1999)). We're trying to make a synthetic chromosome to see if we can get to a synthetic life form as a step along the way toward having a genetically engineered species that could very efficiently produce hydrogen or capture CO2. It's truly basic science, just following along the route of trying to understand can we come up with a true definition from the gene side of life. We have to define the environment side as well. And can we, knowing that, transform a species based on that information?
I think that's essential for our basic understanding of biology. But if we can do that, it creates the basis of formally modifying organisms, a specific laboratory-based organism that couldn't grow outside the lab, in a way to produce very efficient metabolism toward whether it's capturing CO2 and replacing the petrochemical industry by using that as a carbon source for products. Dupont's done this very nicely with E. coli.
Q: Has this work been on hiatus? You assembled an ethics panel after that first Science paper.
A: Well, the ethical panel reviewed this and brought in major religions and had extensive discussions for a year and a half or two years [and] concluded they didn't see anything to block it going ahead as we proposed it. I was worried about the biological warfare implications from teaching other people how to make a synthetic species. It's not clear if we're successful that we'd necessarily publish exactly how to do that. And I had this little side project that I wanted to do called sequencing the human genome. So it went on a hiatus for those two reasons.
Q: That brings up a very interesting issue about bio-defense and the public availability of genome sequence information, such as the recent (re)sequencing of polio. Do you think information should sometimes be withheld in the interest of biodefense?
A: Well, there are some things that should definitely not be published. I sequenced the smallpox genome. And when I did that (first at NIH [National Institutes of Health] and finished at TIGR, but even before I left NIH), there was this huge discussion about whether that should be published. We had this incredible discussion in [former NIH director] Bernadine Healy's office, where people from the defense establishment and other government agencies were extremely concerned about it being published. One likened it to publishing a blueprint of an atom bomb, and they talked about putting a barbwire fence around my NIH lab. And I asked, "Is the wire going to point in or out?" Some of them are wishing they probably did that at the time. But it was a very intense discussion.
What actually swayed the day was that the Russians were also sequencing a strain of smallpox and they were going to publish their data. So the U.S. government decided, well, we're not going to be beaten by the Russians, so Venter you can go ahead and publish the genome. But, if some of this data is not published, then the only ones that have it are people that you may not necessarily want to be the only ones having this data.
What I argued very successfully with President Clinton [was that] genomic information put in the public domain can be one of the best deterrents to biological warfare if we do something with it. I urged him to start new programs, new vaccine programs, and new antibiotic programs. Various government agencies use all our microbial genomes as the basis of PCR and other detection assays.
The public just sort of assumed maybe this was going on in secret. Maybe something was there. It's very different than the scientific community being able to compare other genomes now to smallpox. There are several monkey poxes. What led me to think about synthetic genomes initially was how close Vaccinia was to smallpox. The Vaccinia sequence was already in the public domain. I don't think we have to be afraid of the information itself, as long as we're acting responsibly as a nation, and that's what is taking so long. These are not new arguments. They didn't just happen because some kook released a little bit of anthrax. We have a situation where there are major deaths from bacterial resistance, new emerging infections, that we need better armaments, better vaccines, better antibiotics, better antivirals, and we should be using the information from microbial genomes to drive that.
That's starting to happen. A lot of money is now going in, but it's taken people being killed deliberately to provoke that. And so, genomic data should be in the public domain. What I objected to with the poliovirus [publication is that] it was a stunt. It was not important science. Science hyped it by sending out press releases. (The paper in question describes the artificial synthesis of the poliovirus genome.)
Q: I can't think of a journal doing that ...
A: It's one thing publishing it if the author stated, as he did after the fact, that he was doing this to be provocative and Science said they were publishing it to be provocative. Then that's honest. But doing it to pretend this is breakthrough science makes a mockery of the whole system. And the other thing is people didn't even read the paper: It had 1/10,000th of potency of native poliovirus. So does each molecule have 1/10,000th the potency, or do they have one out of 10,000 molecules of native poliovirus in there and they didn't even have it working in the first place?
If they were doing science, they would've at least answered those questions before it was published. It hurts the whole field when somebody does something for those reasons after we went through a two-year ethical review as to whether doing synthetic genomes made sense.
Q: You recently joined the scientific advisory board of a company, U.S. Genomics Inc., which I think is the first time you've made that kind of commitment.
A: No, I've actually joined their board [of directors].
Q: Have you been on other boards?
A: Over a long time — that's where I learned about the praying mantis syndrome. I've been on scientific advisory boards of a substantial number of biotech companies and pharmaceutical companies. ... I've avoided boards [of directors] in the past, but U.S. Genomics is a very interesting company in terms of it has a young scientist/entrepreneur [CEO and founder Eugene Chan], who is perhaps not far from having his head bit off.
Q: Well, you're on the board.
A: In fact, I think that's why I am. Both sides recognize that I have some experience that might help both of them avoid those mistakes. I mean, the major funder was the original backer of Human Genome Sciences. So, that's one reason ... Eugene's a pretty impressive guy. ...
In this [genome sequencing] field, it's so hard to tell what's hand waving and what's real. While U.S. Genomics can't sequence molecules yet, they can certainly order them and determine them, and the technology they've built to this stage is real. So that's why I decided to [join the board]. I felt that I could give them a unique contribution in terms of my experience, although I have no experience in terms of manufacturing and selling instruments. That's not what they're looking for. They're more just looking for scientific/philosophical guidance.
I think it's the uses of the technology that lead to the breakthroughs. It's because I effectively used the Applied Biosystems machine back in 1987 that they even exist as a company. For the first three years, I was the only one who could get it to work, and we published the first paper in 1987. Had we not been successful, they might have gone under a long time ago.
Q: Does your new sequencing center have a name yet?
A: In fact, we're setting that up as a not-for-profit organization that will be jointly run by TIGR, TCAG, and IBEA, and right now it's just the Joint Technology Center; we [will] have a naming contest to see what the name should be. But it's basically just to be the sequencing, informatics, and proteomics center for those three institutes. ...
TIGR will have its own informatics in terms of the annotation of genomes, but this will allow a very substantial scale up of what TIGR can do. TIGR was becoming capacity-limited for all its new grants and projects. This now gives TIGR basically unlimited capacity, while we do a million genomes and start this human sequencing.
Q: When you say, "resequencing across hundreds of individuals," are you interested in producing haplotype maps? Custom sequencing? What are the prime goals?
A: So what are the two limitations? You hit on one. It's technology. So we need faster, cheaper technology to get it so you can get your genomes done to understand your propensity for disease and some personal health planning kinds of things. The bigger problem, in my view, because I think the technology is going to happen, is we don't know how to interpret this data. If you go through some of the big debates I had early on with Watson, he wanted to turn it into just a mechanical thing of just sequencing the genome. Who cared about interpretation?
Well, that's where the intellectual value comes in. And we still don't know how to interpret it. We don't know what most of our genes do. We don't know their relationships to disease. As much as the early genome project was driven by human geneticists doing classical genetics, those tools have been essentially exhausted and they've come up short. The only way we're going to understand most human traits and the genetic components of most human disease is through large population statistics, and that's where computing comes into such an important play.
Q: What do you think of what Kari Stefansson is doing with deCODE Genetics?
A: I think he absolutely has the right idea in characterizing a large population. He's trying to do the extension of classical genetics, which is repeat mapping in the genome looking for linkage to disease in the population. And there'll be some diseases where that approach still works. But I trained as a biochemist. I think in terms of structural, functional terms, not in loose associations.
So when we sequenced the first neurotransmitter from [the] human brain, the receptor, we then went on and did extensive site-directed mutagenesis and found if you change this nucleic acid, it will change that amino acid, which will change this function. And so, that's my classical thinking. I think about the mutations ... that change protein structure or protein regulation that play a role in biological functions. Many geneticists are happy just to find somebody that knows somebody famous. Right? An association, a link because it gets you in the neighborhood. 'I have this friend that knows Madonna. So, I'm pretty cool!' I never really got into those associations.
Q: Have you thought about any special IT needs for the genome institute?
A: Yes. In fact, that is something I will be talking about more in the future. What we did with Compaq [Computer Corp.] and Celera was a critical, essential experiment for the time that there are very good reasons never to repeat again. We built a special-purpose computer for assembling the human genetic code under the conditions that we had. Even with all the 1.5 teraflops [trillion floating-point operations per second] of capacity that we had there, for many of our things we were still compute-limited. But the cost of that facility, I mean the room the computers were in was a $6-million room before the first computer went in.
Each hospital, each clinical center has to have the compute capacity to deal with its patient population's genetic codes. Not as the be-all and end-all of all medicine, but it's going to contribute to the understanding of every bit of medical care in the future. So we have to have a cheaper, replicable system. I like this notion of the green computers that the DOE is talking about, just dealing with clusters that don't need massive air conditioning, lower power consumers. I mean, just what we had to have in terms of megawatt generators to back up the [Celera] compute center. Instead of genomics driving down health-care costs, which it should do massively, if we have to replicate what Celera built in computing, that will drive up the health-care costs with computers. There's no reason for that. There hasn't been much of a driving force for high-performance computing in the U.S. I think biology will be the ultimate driving force.
Q: What's the status of the Celera discovery system? Does that have a chance of succeeding now that it's inside ABI?
A: I hope it survives. There's nothing like it [anywhere else], and there will never be anything like it in the public domain because NIH can't fund things like that. I found there was such a difference in what people called academic-grade computer code and professional code. I had to learn how to do software audits and all these things [to ensure the code was] really robust and would really work, versus a scientist writing a few lines of code to sort of hack something together in his lab or her lab. There are huge differences, so without private industry, there won't be advances. I suspect that a Microsoft will come in and take over this field because of the ineptitude of the smaller businesses.
Q: Have you ever talked to Bill Gates about this?
A: I have a meeting with him in the not-too-distant future to talk about this. I've heard him talk to others about this point. So, we're going to chat about it.
Q: What do you think Microsoft could bring to the game that would be special or unique?
A: The ability to make it happen. You have to understand the market to own it. I mean, if your goal is to sell widgets, you go to a widget manufacturer. If your goal is to sell software and information and computer software, I can't think of a better source to do that. I'm obviously impressed with what they've accomplished. I was hoping, quite frankly, to turn Celera into that, and then it was cut short. I thought we could take them out!
Q: Any reaction to Celera's decision to rip out those three-year-old Alpha systems and put in the big IBM Regatta servers?
A: Where's the camera on that one?! I could gesture! So, yes. [With] the refresh time in high-performance computing, if you can get through a two-year cycle you're doing well. So it was clearly time. I think a different question is, what could Celera possibly need a 2-teraflop computer for when they're out of the genomics business? So I think with IBM, it's probably just too little, too late, and I think Compaq made very good hay out of their cooperation with us on sequencing the genome. I think they went from being behind, to my understanding, to being sort of the dominant player in high-performance computing in the life sciences because of that association.
I didn't know how to evaluate high-end computing. I'm not a computer scientist. I'm an experimental biologist. So I set up an experiment. I had IBM and Compaq bring in their best machines, and we ran an experiment. We gave them a bacterial genome to assemble and [watched] who could do it faster. The alpha chip was able to do it about three times faster.
Q: Was it a mistake for Compaq to abandon Alpha?
A: I was personally very disappointed, but again, I'm an IT consumer. I'm not an IT expert. We could not have assembled the human genome without it, and so I developed a certain affection for the Alpha chip. If the next level is far faster and far cheaper, then I'm delighted. But the Alpha was truly a wonderful technology.