GenBank Celebrates 25th Anniversary


By Kevin Davies

May 12, 2008 | Any conference that brings together the likes of Nobel laureates Sydney Brenner and Rich Roberts, Craig Venter, and Francis Collins is worth a look. When it marks the 25th anniversary of a resource as valuable as GenBank, it proves irresistible*.

"It's hard to imagine where we would be without the dedication of GenBank," said Collins. GenBank was created in 1982, and moved to its present home at the National Center for Biotechnology Information (NCBI) ten years later. Exchanging data on a daily basis with the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ), GenBank is a critical, yet largely overlooked, component of the biological enterprise.

A procession of speakers paid tribute to the tireless work of NCBI director David Lipman, James Ostell, and their colleagues. Remarkably, the number of staff overseeing GenBank has hardly changed over the past 15 years, despite the exponential growth in data. Following Moore's Law, the volume of data doubles every 18 months. Today, GenBank contains more than 110 million sequences and 200 billion bases from 260,000 organisms.

Craig Venter is one of GenBank's most prolific depositors. He essentially launched the expressed sequence tag (EST) database when he catalogued the first 331 back in 1991. By 2001, there were some 10 million. Today, that number is close to 51 million, including more than 8 million human sequences.

Venter - who quipped that his institute was selling its old ABI capillary sequencing instruments for a bargain $50,000 - said that both his effort and the international genome sequencing consortium "were dramatically flawed. We thought they'd have the same sets of genes, and only 0.1% difference." That proved to be a stark underestimate. Venter also ruefully remembered that Celera sequenced DNA from five individuals (including himself). "Had we sequenced from one individual, we'd have gotten the right answer," about genome variation, he said.

Information Crisis
Sydney Brenner said there is an information crisis in biology that needs to be solved. "Data is not enough - we have to convert the data into knowledge. You get credit for collecting data, credit for distributing data, but nobody gets credit for organizing data. The task we have now, I think, is not to lose this tremendous capital that has been accumulated, which will be forgotten."

Brenner said it was "a scandal" that most research papers don't cite papers further back than the mid-1990s. "We must not lose all the information that has been gathered," said Brenner, before adding mischievously, "We could get grants to rediscover it all again." His goal, he said, was to "turn GenBank from a bank into an organization where [scientists] can make withdrawals with interest."

"Most biology today is low input, high throughput, no output biology," said Brenner. "The idea [that] we'll dissect [cellular] complexity by making lots of measurements is bound to fail... Everyone's hoping for a magic computer program - experimental data, pharmacogenomics data, the whole lot - and it will come out with the answer. That's a vague hope. Because I have to tell you, computers are incredibly stupid! It's better to combine human intelligence with artificial stupidity than the other way around."

Glorious Time
Collins recalled assembling the longest gene region on paper - 40,000 bases - back in 1984. He paid his then-14-year-old daughter $2/hour for proofreading. Having led the assembly of the human genome, he offered a few "notes from the frontlines." Collins cited comparative genomics, the Cancer Genome Atlas, and the 1000 Genomes and ENCODE projects as exciting areas. "We may get to the $1000 genome much sooner than the 7-8 years that people have been predicting," he said. No kidding!

Progress in mapping genes for common disease has made for "a glorious 18 months," said Collins. The HapMap project helped us understand genetic variation across the genome, enabling 500,000 SNPs to serve as proxies for the rest. Of course, the precipitous drop in genotyping costs, from 50 cents in 2002 to 0.1 cent today, hasn't hurt.

In Collins' own field of type 2 diabetes, research has "moved into totally new territory," with the identification of 16 new gene loci. Many of these rare variants confer only a modest odds ratio, suggesting these loci would not make good drug targets for the broader population. But Collins noted that of the first ten diabetes genes identified, two (KCNJ11, PPARG) are mainstays of diabetic therapy. A new NCBI database, dbGaP (Genotype and Phenotype), would provide further stimulus to the collaborative analysis of vast genome-wide mapping data.
________________________________

*GenBank Celebrates 25 Years of Service, NIH, April 7-8, 2008. Video Webcast: www.tech-res.com/GenBank25

___________________________________________________

 This article appeared in Bio-IT World Magazine.