
Dealing with the Data Bonanza at Bio-IT World Europe


At the second European event, attention turned to genomes, storage, and the cloud.

By Kevin Davies

November 16, 2010 | HANNOVER—The second annual Bio-IT World Europe conference, held at 2010 BioTechnica, drew a large and enthusiastic crowd who were treated to three days of top-class presentations on multiple aspects of IT infrastructure, data storage and knowledge management.

One of the undoubted highlights of the conference was a charming presentation from clinical geneticist Marjolein Kriek (University of Leiden), who described her personal experiences as the first woman to have her genome completely sequenced. Kriek (initially chosen because her name sounds like “Crick” in Dutch) said her genome contains more than 4,500 known substitutions (including 11 nonsense) and 600 unknown substitutions (137 non-synonymous variants) in coding genes. “Are there other advantages? Yes, I have my own statue!” she joked.

Chris Dagdigian (BioTeam) said that data management is less scary in 2010 than a few years ago, but while the amount of next-generation sequencing (NGS) data that is routinely being saved off the instruments is dropping, research consumption is going up. The physical movement of data is becoming more important, with Dagdigian saying he was a fan of eSATA toasters as the fastest way to carry Terabytes of data around campus.

Guy Coates (Wellcome Trust Sanger Institute) said the Sanger was likely to have close to 15 Petabytes of storage by the end of 2010 after installing about 30 new Illumina HiSeq instruments. Coates’ colleagues have experienced these data challenges before, each time new platforms arrived. Takeaways included the virtue of periods of “masterly inactivity,” during which no storage is bought until users have cleaned up their archives, and applying a “storage surcharge” to PIs requesting sequencing capacity to alert them to IT costs.

Jurgen Eils (LSDF, Heidelberg) is providing data to the bioinformaticians working for the International Cancer Genome Consortium. “The Large Hadron Collider produces 15 Petabytes/year. By contrast, ICGC expects to produce up to 50 Petabytes data per annum,” said Eils. He predicted his group would be dealing in Exabytes within a few years, and would have to think about new data management strategies such as those from Elixir.

Other IT infrastructure highlights included a presentation from Rupert Lueck (EMBL), who is managing large volumes of microscopy and movie data with an IBM blade center and NetApp NFS storage in a new 0.8 MegaWatt data center with water-cooled server racks. Sweden’s Ingela Nystrom (Uppmax) said UPPNEX, which launched in March 2010, is playing a national role connecting seven institutions. The system has 3,000 cores, 10 TB of RAM and 800 TB of Panasas storage; users consume 800,000 core hours/month and fill up to 10 TB of storage per week. At the Erasmus Medical Center, Bert Eussen and Peter Walgemoed were using HP’s local Storage Cloud X9000 to address their data management challenges in translational research and clinical care. “Proteomics is the biggest challenge of data,” said Eussen.
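
To put the UPPNEX figures quoted above in perspective, here is a rough back-of-envelope sketch in Python. It is not a calculation presented at the conference. It assumes the peak fill rate of 10 TB/week is sustained, that the 800 TB starts empty, and that an average month has 730 hours; the function names are illustrative only.

```python
# Back-of-envelope sketch using the UPPNEX figures reported in the article.
# Assumptions (not from the talk): the 10 TB/week peak rate is sustained,
# storage starts empty, and a month averages 730 hours (8,760 / 12).

HOURS_PER_MONTH = 730


def weeks_to_fill(capacity_tb, fill_rate_tb_per_week):
    """Weeks until storage is full at a constant fill rate."""
    return capacity_tb / fill_rate_tb_per_week


def core_utilization(core_hours_used, cores, hours_in_period=HOURS_PER_MONTH):
    """Fraction of the available core hours actually consumed in the period."""
    return core_hours_used / (cores * hours_in_period)


weeks = weeks_to_fill(capacity_tb=800, fill_rate_tb_per_week=10)
utilization = core_utilization(core_hours_used=800_000, cores=3_000)

print(f"Weeks to fill 800 TB at 10 TB/week: {weeks:.0f}")
print(f"Approximate core utilization: {utilization:.1%}")
```

Under these assumptions the storage would fill in roughly a year and a half, and monthly usage would correspond to a little over a third of the theoretical core hours available. Both numbers are only as good as the assumptions behind them.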

New Resources

The Structural Genomics Consortium (SGC), discussed by Brian Marsden (Oxford), has deposited more than 1,100 protein structures to date and 28 percent of all novel structures since 2009, but communicating that information to the research community is a huge challenge. That is changing thanks to a new approach called iSee (http://whatisisee.org), developed with Ruben Abagyan (MolSoft), which allows annotated 3D visualizations of protein structures and dynamic animations. The content is gaining acceptance from journals such as PLoS ONE. Peer reviewers love it, said Marsden. One reportedly said, “I’m no structural biologist, but this is freakin’ sweet!”

“Consider Elastic-R as a huge jukebox,” said Karim Chine, who provided a superb live demonstration of running R-based applications in the Amazon cloud and sharing the results in real time with another user on an iPad. The early applications may focus on teachers, but there are many potential life science applications.

Reinhard Schneider (EMBL) is also helping the cause of enlivening static journals with a tool called Reflect, which won the Elsevier Grand Challenge in 2009. Reflect is a plug-in that can tag proteins, genes, or small molecule names in web pages, providing a convenient summary of that molecule’s properties.

Editor’s Note: In addition to Bio-IT World Europe, CHI held two other conferences at the same time, PEGS Europe and Molecular Diagnostics Europe. The third annual Bio-IT World Europe will be held on October 11-13, 2011.


This article also appeared in the November-December 2010 issue of Bio-IT World Magazine.
