By Laurie Goodman
July 15, 2003 | Cold Spring Harbor, N.Y. -- A guy walks into a bar and asks, “What’s the difference between a weed, a mouse, and a human?” The answer -- if it refers to total number of genes -- is “not much.” That is the result of the three-year betting pool known as Genesweep, where scientists bet on the sum total of genes in the human genome.
The winning number, announced by Ewan Birney of the European Bioinformatics Institute (EBI) during a May symposium at Cold Spring Harbor Laboratory (CSHL) on “The Genome of Homo sapiens,” came to a mere 21,000 -- far short of the roughly 100,000 that was the conventional wisdom of the 1990s.
The idea for the Genesweep pool began in 2000 during the annual CSHL genetics meeting. David Stewart, executive director of meetings at CSHL, recalls that Birney was discussing the number of genes in the human genome with Francis Collins, director of the National Human Genome Research Institute, when Birney ran off to grab a “ratty, old lab notebook,” in which he began logging bets. Birney formally announced the genome sweepstakes during his talk, with the winner to be announced in 2003. The cost of a bid rose from $1 in 2000, to $5 in 2001, to $20 last year.
The overall winner, and recipient of the prize for the closest bet in 2001, was Lee Rowen, a bioinformatician from the Institute for Systems Biology in Seattle, who predicted 25,947 genes. She collected half of the final pot of $1,140 -- and a signed copy of The Double Helix. The other half of the pot was split between Paul Dear (MRC Laboratory of Molecular Biology, Cambridge, England) who predicted 27,462 in 2000, and Olivier Jaillon (Genoscope-CNS, Paris, France) who estimated 26,500 last year.
Birney initially hesitated about announcing a winner, given the lingering uncertainty over the precise number of human genes despite the completion of the sequence last April. Bets ranged from Lee Rowen’s low bid to more than 300,000 genes. But the original rules of Genesweep dictated a winner be announced this year. Several researchers at this year’s meeting presented data showing that the number of protein-coding regions (the definition of a gene, according to Genesweep rules) was well under 25,000.
The final tally of 21,000 genes was calculated by Ensembl (www.ensembl.org), a computational suite of tools developed by Birney and colleagues at the EBI and the Wellcome Trust Sanger Institute. Ensembl includes a core database, a pipeline that can annotate raw DNA assemblies, an intuitive Web site, and a data-mining system. The system is also capable of handling extensive genome comparisons, and includes annotation on everything from human to rat to worm to mosquito.
The final gene count comes as a blow to the human ego, given that the predicted number of genes is no greater than that of the mustard weed Arabidopsis thaliana. The result also raises interesting questions about the origins of complexity in the human brain.
So what is the difference between a weed, a mouse, and a human? Clearly, the answer lies less in the total number of genes, and more in how we use them.
Laurie Goodman is the editor of Genome Research.