GenBank Celebrates 25th Anniversary



By Kevin Davies

May 12, 2008 | Any conference that brings together the likes of Nobel laureates Sydney Brenner and Rich Roberts, Craig Venter, and Francis Collins is worth a look. When it marks the 25th anniversary of a resource as valuable as GenBank, it proves irresistible*.

"It's hard to imagine where we would be without the dedication of GenBank," said Collins. GenBank was created in 1982, and moved to its present home at the National Center for Biotechnology Information (NCBI) ten years later. Exchanging data on a daily basis with the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ), GenBank is a critical, yet largely overlooked, component of the biological enterprise.

A procession of speakers paid tribute to the tireless work of NCBI director David Lipman, James Ostell, and their colleagues. Remarkably, the number of staff overseeing GenBank has hardly changed over the past 15 years, despite the exponential growth in data. Following Moore's Law, the volume of data doubles every 18 months. Today, GenBank contains more than 110 million sequences and 200 billion bases from 260,000 organisms.

Craig Venter is one of GenBank's most prolific depositors. He essentially launched the expressed sequence tag (EST) database when he catalogued the first 331 back in 1991. By 2001, there were some 10 million. Today, that number is close to 51 million, including more than 8 million human sequences.

Venter - who quipped that his institute was selling its old ABI capillary sequencing instruments for a bargain $50,000 - said that both his effort and that of the international genome sequencing consortium "were dramatically flawed. We thought they'd have the same sets of genes, and only 0.1% difference." That proved to be a stark underestimate. Venter also ruefully remembered that Celera sequenced DNA from five individuals (including himself). "Had we sequenced from one individual, we'd have gotten the right answer" about genome variation, he said.

Information Crisis
Sydney Brenner said there is an information crisis in biology that needs to be solved. "Data is not enough - we have to convert the data into knowledge. You get credit for collecting data, credit for distributing data, but nobody gets credit for organizing data. The task we have now, I think, is not to lose this tremendous capital that has been accumulated, which will be forgotten."

Brenner said it was "a scandal" that most research papers don't cite papers further back than the mid-1990s. "We must not lose all the information that has been gathered," said Brenner, before adding mischievously, "We could get grants to rediscover it all again." His goal, he said, was to "turn GenBank from a bank into an organization where [scientists] can make withdrawals with interest."

"Most biology today is low input, high throughput, no output biology," said Brenner. "The idea [that] we'll dissect [cellular] complexity by making lots of measurements is bound to fail... Everyone's hoping for a magic computer program - experimental data, pharmacogenomics data, the whole lot - and it will come out with the answer. That's a vague hope. Because I have to tell you, computers are incredibly stupid! It's better to combine human intelligence with artificial stupidity than the other way around."

Glorious Time
Collins recalled assembling the longest gene region on paper - 40,000 bases - back in 1984. He paid his then-14-year-old daughter $2/hour for proofreading. Having led the assembly of the human genome, he offered a few "notes from the frontlines." Collins cited comparative genomics, the Cancer Genome Atlas, and the 1000 Genomes and ENCODE projects as exciting areas. "We may get to the $1000 genome much sooner than the 7-8 years that people have been predicting," he said. No kidding!

Progress in mapping genes for common disease has made for "a glorious 18 months," said Collins. The HapMap project helped us understand genetic variation across the genome, enabling 500,000 SNPs to serve as proxies for the rest. Of course, the precipitous drop in genotyping costs, from 50 cents in 2002 to 0.1 cent today, hasn't hurt.

In Collins' own field of type 2 diabetes, research has "moved into totally new territory," with the identification of 16 new gene loci. Many of these rare variants confer only a modest odds ratio, suggesting these loci would not make good drug targets for the broader population. But Collins noted that of the first ten diabetes genes identified, two (KCNJ11, PPARG) are mainstays of diabetic therapy. A new NCBI database, dbGaP (Genotype and Phenotype), would prove a further stimulus to the collaborative analysis of vast genome-wide mapping data.
________________________________

*GenBank Celebrates 25 Years of Service, NIH, April 7-8, 2008. Video Webcast: www.tech-res.com/GenBank25

___________________________________________________

 This article appeared in Bio-IT World Magazine.