A Bright iDEA: Illumina Launches Data Visualization Competition

The competition’s goal is to improve the scientific utility of genomics data.

July 29, 2010 | In an effort to challenge the life sciences community to create more informative and accessible ways to look at the wealth of genomic and other life science data, Illumina is launching what it calls the iDEA (Illumina Data Excellence Award) Challenge. The initiative was first discussed at a small workshop of the Genomics Informatics Alliance, held in early May in Seattle.

Jacques Retief, Illumina’s senior manager of scientific affairs, says the goal behind the iDEA Challenge is “to find ways to improve the scientific utility of genomics data and provide an intellectual leap for the community.”

“Illumina’s technologies have the potential to help us understand diseases and the living world around us at the molecular level. Unlocking that knowledge will enable radical improvements in human health and quality of life over the next decade,” states the competition Website. “iDEA is a program designed to challenge the scientific community to develop new and creative visualization and data analysis techniques.”

The competition is open to all comers and will run for about nine months, from June 15, 2010, when Illumina released a standard data set for the challenge, until March 15, 2011. The winners will be announced at an iDEA conference in San Diego in June 2011.

Retief says the biological sciences have always had a strong observational component. “It is no coincidence we use the word insight to describe improved understanding,” he says. “In the golden era of biology, scientific breakthroughs by Mendel, Pasteur, Darwin and others were based on sharp observations using small data sets. The size and scope of modern day scientific breakthroughs have changed with the sequencing of the human genome and the development of next-generation sequencing technology. As a result, we are now entering the second golden era of biology, an era that is fueled by large amounts of data.”

Ironically, Retief says, the sheer size and complexity that make modern genomic data so valuable also make them very challenging to visualize. “The size of the data sets we are dealing with now have a closer resemblance to those generated in high energy physics than those a typical biologist is used to handling.” Take, for example, the ongoing 1000 Genomes Project. “How do you intuitively display and compare 1,000 genomes?” asks Retief. “Our challenge to the community is to answer that very question by increasing the utility of next-gen sequencing data through creative integration and visualization.”

Software Solutions

The iDEA contest seeks to recognize creative advances in algorithms and visualization. “Algorithms are rarely in the spotlight, but a well-crafted algorithm can play a key role in integrating or improving the display of data,” says Retief. “The same can be said for data formats. The data formats common in the field today, such as BAM and FASTQ, are not particularly elegant or compact... you cannot visualize or use the data if you cannot transport or parse it efficiently.”

There will be six awards in total, divided into two categories: academic and commercial. The overall academic winner will receive a $50,000 grant from Illumina, while the overall winner in the commercial category will receive a one-year co-marketing deal with Illumina. In addition, awards will be given for the most creative algorithm and the most creative visualization in both categories.

Illumina will provide any interested iDEA contestant with a hard drive containing a data set derived from breast cancer cell lines, including methylation, gene expression, RNA-seq and genotyping data. The data set will include different read-lengths, paired ends on mRNA, small RNA, DNA and methylation, as well as biological information on the breast cell lines (drug response, tumor progression, etc.).

Entries will be judged by a five-member panel on the magnitude of the intellectual advance over the present state of the art; novelty or uniqueness of the entry; utility to the typical scientist; intuitiveness of results presented to the user of the algorithm or visualization; and integration of data.

Retief expects the visualization category to be the most exciting aspect of the competition and to generate the most interest.

Illumina is not requiring that the entries be complete, fully functioning software. “Entries do not need to have functioning software. For the Algorithm and Visualization judging, storyboards, sketches and cartoons are all eligible,” entry details state. “Sometimes the best ideas come from unusual sources and we want to encourage the brightest minds to participate,” explains Retief. 

This article also appeared in the July-August 2010 issue of Bio-IT World Magazine.

