Nov 15, 2005 | A joint submission from Aaron Darling of the University of Wisconsin-Madison Computer Sciences Department and Victor Ruotti of the WiCell Research Institute has been selected as the winning entry in the Bio•IT World and Orion MultiSystems Personal Supercomputing Contest.
The submitted project will perform high-throughput alignment and identification of micro-rearrangements in mammalian genomes for stem cell research. Working with Stu Jackson, Orion’s manager of application engineering, the researchers are getting two weeks of run time on an Orion DS-96, a 96-node deskside cluster, to perform their analysis work. (The two-week run on the Orion DS-96 is the equivalent of about 3.68 years of CPU time.)
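The 3.68-year figure can be checked with simple arithmetic. The sketch below assumes that each of the 96 nodes counts as one CPU, that all nodes are busy for the full 14 days, and that a year has 365 days; none of these assumptions is stated in the contest announcement, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the "about 3.68 years of CPU time" figure.
# Assumptions (not from the announcement): one CPU per node, full
# utilization for the whole run, and a 365-day year.

NODES = 96           # nodes in an Orion DS-96
RUN_DAYS = 14        # length of the two-week run
DAYS_PER_YEAR = 365


def cpu_years(nodes, run_days, days_per_year=DAYS_PER_YEAR):
    """Return total CPU time in years for a run that uses every node."""
    return nodes * run_days / days_per_year


print(f"{cpu_years(NODES, RUN_DAYS):.2f} CPU-years")  # prints "3.68 CPU-years"
```

In other words, 96 nodes times 14 days is 1,344 CPU-days, or roughly 3.68 years of work for a single processor.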
Using the Orion DS-96 compute time, Darling and Ruotti hope to accomplish a high-throughput comparative analysis of human, mouse, and rat genomes to better understand the important genetic regulatory elements that play a role in human embryonic stem cells. Specifically, when comparing the respective genomes, they will be looking for small conserved genomic rearrangements.
“We find it particularly important to identify which portions of the human genome have been subject to past rearrangements,” the pair said in their contest submission. They note that previous comparative alignments of mammalian genomes have been done, but none has used a method sensitive enough to detect micro-rearrangements.
To look for the rearrangements, the two researchers will use a multiple genome alignment system called Mauve, which was developed by Darling. The two note that aligning whole genomes is a fundamentally different problem than aligning short sequences. To that end, Mauve helps to construct multiple genome alignments in the presence of large-scale evolutionary events such as rearrangements. With the run on the DS-96, Ruotti hopes to look at micro-rearrangements that might play a significant role in human embryonic stem cells.
The analysis Ruotti and Darling want to carry out requires a great deal of computer processing power. Darling notes that in his lab he has a network of PCs and access to the University of Wisconsin’s Condor system, which harvests idle compute cycles for high-throughput computing projects. But because Condor is a shared resource, researchers often do not know when a job will start or how long it will take to finish. Worse, the large genome sequence data sets pose a particularly thorny problem for Condor, which offers limited means for data motion and locality, according to Darling. So the two entered the Personal Supercomputing Contest to get a large amount of computational time in a relatively short period.
The entry was chosen because the nature of the problem was a great fit for the computational power of the Orion system. “Sequence analysis tasks are ideal for the Orion platform and will really highlight the power of the DS-96,” says Jackson.
In addition to using the data for stem cell research, Darling and Ruotti plan to make the data freely available to the public.