Data Sharing and Training Rise to the Top in Lisbon

December 13, 2013

By Bio-IT World Staff 
 
December 13, 2013 | Lisbon welcomed attendees to the 2013 Clinical Genomics & Informatics event last week with blue skies, plenty of pastéis de nata, and excellent discourse on clinical exome sequencing, high-scale computing, RNA sequencing, and genome informatics. 
 
Trends from the Talks  
 
Many researchers called for more open data sharing. Keynote speaker Anne Cambon-Thomsen, Director of Research at the Centre National de la Recherche Scientifique (CNRS) in France, reported on the global push for data sharing (see Nature News, June 2013). The consortium now consists of 116 institutions. The key to furthering data sharing, Cambon-Thomsen said, is to incentivize it. The concept is much discussed but poorly implemented. She suggested creating a bioresource research impact factor. The methodology was published in GigaScience in May. 
 
Niklas Blomberg, director of ELIXIR, a pan-European research infrastructure for biological information based at the Wellcome Trust Genome Campus, is working on a cloud model of data sharing. He gave an update on the ELIXIR consortium, which now consists of 16 countries. The volume of life sciences data is ballooning, Blomberg said, and Europe needs a distributed infrastructure. Blomberg promoted “embassy clouds” as a way for researchers across Europe to access reference data virtually, as if on foreign soil. 
 
Liz Worthey, Human and Molecular Genetics Center, The Medical College of Wisconsin, warned against data sharing that is focused on big databases. Many people championing shared data, she said, are really saying: Share your data with me, in my big database, at my big institution. Worthey suggested that data stay where they are. Informatics tools can access data in their home locations and still enable sharing, she said. 
 
Galaxy was mentioned several times in various tracks. Gianmauro Cuccuru with CRS4 Bioinformatics Laboratory, Italy, hosts a public instance of Galaxy and has implemented a thin integration between Galaxy and Hadoop, so that his group can run Hadoop-based programs via Galaxy. Several researchers also mentioned using a Galaxy portal for NGS analysis. 
In the High Scale Computing Track, training emerged as a theme, with speakers calling for more training on IT infrastructure. Christophe Blanchet, CNRS, pointed out that cloud deployments are particularly useful for training. His lab uses Galaxy for NGS analysis, and he creates a special instance with dedicated datasets for new users to play with. 
 
Researchers reported using both whole exome and whole genome sequencing. Timothy Hubbard, who recently moved from the Sanger Institute to King’s College London, stated unequivocally: “We’re not doing exomes in the UK.” Research must focus on whole genome sequencing, Hubbard said. “Once you go to whole genomes, you increase the discovery rate.” 
Hubbard called for increased awareness of genomic complexity. Start thinking about how you’re going to handle a more complex genome, he advised. There are many uncertainties, he said, listing long non-coding RNAs, pseudogenes, and more. We need to go much broader than the exome to capture all functionality, he said.  
 
Other researchers still heralded the usefulness of whole exome sequencing for clinical use. It’s cheaper and we can do it, they argued. Pascal Joset, Institute of Medical Genetics, University of Zurich, Switzerland, reported that whole exome sequencing delivered diagnoses for 60% of patients. Conceicao Bettencourt, University College London, presented exome sequencing of neurodegenerative disease that identified two novel pathogenic variants. Paolo Missier, Newcastle University, United Kingdom, is using exome sequencing to study rare neurological cases, and is deploying e-Science Central, a cloud-based workflow management system. 
 
Industry Insights 
 
Several excellent industry speakers used the event to share news. Janis Landry-Lane, Program Director, World Wide High Performance Technical Computing, Life Sciences/Higher Education Segments, IBM, praised the industry for the open source software available at the app level. At the systems level, though, she stressed the importance of good middleware. 
Landry-Lane briefly mentioned IBM and CLC bio’s genomics sequencing analytics solution. The IBM Application Ready Solution for CLC bio consists of a compute cluster with optimized IBM hardware and software, CLC Genomics Server software for high-throughput sequencing, and the CLC Genomics Workbench platform for data analysis and visualization. 
 
Aspera’s Michelle Munson, the first Emmy-winning speaker, told the story of Aspera’s conversation with Netflix two years ago. Netflix asked for software to get data in and out of Amazon’s S3 storage quickly. The result is Aspera on Demand, which runs at 20-100x the speed of traditional methods, Munson said. Munson also gave a quick peek at Aspera Drive 1.0, a brand new product for Windows (Mac & Linux to come). Everything is available in the file explorer, Munson said. It’s a new unified sharing and collaboration platform for big data that allows transfer and synchronization of file sets of any size and any number with maximum speed and robustness at any distance, with the full access control, privacy and security of Aspera technology. 
 
James Reaney from SGI felt right at home, he said, since SGI invented the term “big data”. (Reaney showed a slide from Supercomputing 1996 as proof.) He said that the data landscape—not just data volume—is changing quickly. 70% of data generated in 2020 will be created by consumers, not researchers, he said. 80% of that data will be held by companies. For genomics workflows, Reaney said that he generally believes that UV is a better tool, but hinted that SGI will soon release a “sandbox” with UV and non-UV options. 
 
Eduardo Gonzalez-Couto, CSO of Integromics, presented OmicsOffice VariantExplorer. Based on Spotfire but adding statistical tools and other workflows, the tool lets physicians analyze genomics data interactively with lovely graphics. 
 
For play-by-play coverage, read the Storify collection or follow the event hashtag #CGIE13. Mark your calendars. The 2014 event will be held in Lisbon December 2-4, 2014.