By Kevin Davies
March 14, 2013 | Steven Salzberg, computer scientist at Johns Hopkins University, is the winner of the 2013 Benjamin Franklin Award, presented annually by the Bioinformatics Organization. (His full title is Professor of Medicine, Biostatistics, and Computer Science and Director of the Center for Computational Biology at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine.)
Salzberg, who beat out four other distinguished bioinformaticians in the balloting, will receive his award on April 10 during the Bio-IT World Conference in Boston. He spoke to Kevin Davies about the highlights of his lab’s research over the past 15 years and his philosophy on open-source and open access.
Bio-IT World: Steven, congratulations on the 2013 Benjamin Franklin Award. What was your reaction on hearing the news?
Salzberg: I knew I was nominated but was still rather surprised, since I’d been nominated twice before. So I was pleased and surprised!
What have been some of your lab’s signature achievements over the past 10-15 years?
Our main accomplishments are in the development of new algorithms for sequence analysis, starting with our bacterial gene finding work in the late ’90s, still going on today with a program called GLIMMER, as well as whole-genome alignment in bacteria, which we extended to bigger genomes.
We worked on genome assembly starting in the early 2000s... All of this has been open-source for 15 years now. Most recently, we developed a suite of tools for next-gen sequence alignment and transcriptome alignment and assembly, including Bowtie, TopHat and Cufflinks, which are all very widely used. Of course they’re all open source, which is in the spirit of the Franklin Award.
Another fairly major project I was involved in was the influenza genome sequencing project, which I co-founded with David Lipman (NCBI). We started it when I was at TIGR [The Institute for Genomic Research]… We had funding to sequence pathogens, and decided to start sequencing flu strains. Flu is a very small genome, but it’s RNA, which makes it very difficult and leads to challenges. When we started, there were only something like 5-7 completed flu genomes in GenBank. Although this kind of project is now more commonplace, there was nothing like it before. The idea was just to sequence as much of a population of pathogens as we can to gain a picture of how the pathogen is changing… Even though the flu is one of the most common infectious agents in the human population, it had not been sequenced very much.
We changed that dramatically. There are now 10,000 complete flu genomes sequenced. We also did something the flu genome community had not been doing—we committed to releasing all the data right away as we were sequencing. That community wasn’t doing that at all—they still don’t like it! It was actually quite a challenge to get people to provide samples for us to sequence. They generally hold onto their samples until they get funding to sequence them and then they don’t release the sequences until they’ve written every paper they possibly can.
Will this have a big impact on flu vaccine development?
It already has. It’s provided a huge resource for people who study flu and decide on the vaccine strains every year. Every year, there’s a new flu vaccine. Around February, they finalize the choice of the three strains that will go into next year’s flu vaccine. (They do it six months later for the southern hemisphere.) Now, the committee that makes this decision looks at what is known about the circulating strains out there today. The whole goal of this project was to make much better, more detailed genomic information available. So hopefully they’re able to do a better job.
You mentioned TIGR. Of course you co-authored several papers with Craig Venter. How much impact did working with him and at the Institute have on your early career?
[TIGR] was a great place to work, certainly a good move for me to go there. TIGR was a very unusual kind of institute. I was one of very few computer scientists there, but it was an exciting time. We were doing the first genomes of most pathogens. We were also a big player in the Arabidopsis genome and the human genome.
Craig is an interesting character… we got along well. Within a year of me joining TIGR, he left to launch Celera. That was an opportunity for me—I became the head of bioinformatics at TIGR… I didn’t want to be in the for-profit world. Less than a year after that, Claire Fraser became president of TIGR, and I was able to persuade Claire to make all our software open-source—over the objections of our lawyer, who did not like it.
I presume your interest in free data extends to the biomedical literature and open access?
I’m a big supporter of open access and have been a big fan since the beginning. I know [past Franklin laureates] Mike and Jonathan Eisen very well—Jonathan and I had offices next to each other for several years at TIGR. I watched the Public Library of Science grow since its first days. And I supported BioMed Central and published papers in the very first year of Genome Biology, their first and still probably their best genomics journal. So I publish as much as I can in open access journals—the only exceptions I make, I have to admit, are for Nature and Science and the other Nature journals. It’s hard to ignore when someone is reaching a big audience—you want your papers to be read by a lot of people. Fortunately, because of the NIH policy, the papers at Nature and Science are also made open access within a year. I would rather it be immediate; I’m very sympathetic to that and support that [movement] as much as I can.
I think we’re going to win. Open access is a current battle but I have no doubt that the scientists are going to win this battle. The fact is that we write the papers, we review the papers, we actually edit the papers and the publishers do very little. Their old model, which they are clinging to and fighting tooth-and-nail, is going to be replaced by open access at some point. It’s just a matter of when.
You joined Johns Hopkins not so long ago, in 2011. What are some of the major projects going on in the lab?
The reason I moved here is that I wanted to work on human genomes and human genetics. The University of Maryland was wonderful to me, nothing but good things to say about them, but I was working at College Park, not the medical school, and when you want to work with human subject data, you almost have to be in a medical school or hospital environment. There are a lot of ethical constraints on sharing those data with outside collaborators… I want my work to have an impact on curing human diseases. I want to be as close to human data as I can.
A good example is a project led by David Valle, in which we’re doing exome sequencing of many hundreds of different patients who have diseases of unknown origin that we know are genetic. We’re trying to figure out what genes are responsible by sequencing their exomes. We’ve solved a few of them, and there are many more that we haven’t solved and are still trying to figure out.
Why exome rather than whole-genome sequencing? Is that just a matter of economics at the moment?
Yes, it’s exactly that. I’ve already made the argument—and will keep making it—that we should do whole genomes. Even with an exome, we’re sort of overwhelmed by how much data we get. If the mutation is in the exome, then it’s a more efficient way to find it. But we’ve already got a number of cases where we haven’t found it and the mutation is probably somewhere else…
It’s mostly cost as to why people aren’t doing whole genomes yet... [Exome sequencing] is an imperfect technology that has had some dramatic successes.
Can you give us a preview of your laureate lecture at the Bio-IT World Conference in April?
I only heard about the award a day or two ago, so I haven’t decided! I thought I should say something about open-source and open access. I might talk about gene patents, which I’ve written about recently… I published a commentary in Clinical Pharmacology & Therapeutics last year on gene patents. In that piece—“The Perils of Gene Patents”—I was making a plea directly to scientists not to file gene patents, no matter what their technology transfer office says.