By Salvatore Salamone
April 7, 2002 | An open source, distributed data exchange technology called the Distributed Annotation System is gaining popularity among genomic researchers.
“DAS is the Napster of genomics research,” says Manuel Glynias, senior vice president of strategic planning at LION bioscience AG. “We’re collecting more and more data, and it’s harder to get information out of it. It’s harder to hook everything together,” Glynias said during a presentation at the recent Scientific American BioSilico 2002.
That’s where DAS comes in, allowing a single computer to gather information from multiple, geographically dispersed annotation databases, collate the information, and display it to the researcher in a single, integrated view, says Lincoln Stein, co-writer of the DAS specification and a researcher at the Cold Spring Harbor Laboratory in New York.
DAS was developed by Stein and fellow genomic researchers Robin Dowell and Sean Eddy about two to three years ago. As with most open source software, multiple developers continually add enhancements to DAS. Currently, developers share information about DAS through the group biodas.org.
DAS’ usefulness and popularity are driven by the growth in genomics research, which generates untold amounts of data.
Researchers exploring a particular gene make annotations about the structure of the gene and how that gene might interact with other structures. For large-scale projects, much of the work is distributed among many labs. That was the case with the Human Genome Project, but it also happens in large research organizations.
The information generated by researchers is stored in a handful of centralized databases, all of which use different formats for their data.
There are two problems with this approach. First, large centralized databases sometimes do not scale well, placing a burden on the database administrators charged with archiving and storing the annotation information.
Second, much of the information researchers note in their work can go undiscovered because of the volume of data and because there is no easy way for others to search and retrieve the annotations. “It’s harder to see the forest because the trees are growing at some outrageous rate,” says Glynias.
DAS overcomes these problems by using a distributed architecture, one that allows for the search and retrieval of annotation information.
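The idea of collating annotations from several independent sources into one view can be illustrated with a short Python sketch. This is not the DAS protocol or any real DAS client: the in-memory "sources" stand in for remote annotation servers, and every name here (`Annotation`, `make_source`, `collate`, the lab names and coordinates) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Annotation:
    source: str   # which (hypothetical) server supplied the record
    start: int    # first position covered on the segment
    end: int      # last position covered on the segment
    label: str    # free-text description of the feature

# A source is any callable that returns the annotations overlapping a region.
Source = Callable[[str, int, int], List[Annotation]]

def make_source(name: str, records: List[Tuple[str, int, int, str]]) -> Source:
    """Build an in-memory stand-in for a remote annotation server."""
    def query(segment: str, start: int, end: int) -> List[Annotation]:
        return [
            Annotation(name, s, e, label)
            for seg, s, e, label in records
            if seg == segment and s <= end and e >= start  # overlap test
        ]
    return query

def collate(sources: List[Source], segment: str, start: int, end: int) -> List[Annotation]:
    """Ask every source about the same region and merge the answers by position."""
    merged: List[Annotation] = []
    for query in sources:
        merged.extend(query(segment, start, end))
    return sorted(merged, key=lambda a: (a.start, a.end, a.source))

# Two made-up labs annotating the same reference segment.
lab_a = make_source("lab_a", [("chr1", 100, 200, "exon"), ("chr2", 50, 80, "exon")])
lab_b = make_source("lab_b", [("chr1", 150, 400, "predicted gene"), ("chr1", 900, 950, "repeat")])

view = collate([lab_a, lab_b], "chr1", 1, 500)
for a in view:
    print(f"{a.start}-{a.end}\t{a.source}\t{a.label}")
```

Each lab keeps its own records, and the client does the merging. That division of labor is the point the sketch is meant to convey; a real deployment would replace the in-memory sources with network requests to annotation servers.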
To illustrate the complexity researchers face, Stein cites the example of weekly submissions of sequences to GenBank/European Molecular Biology Laboratory (EMBL). Typically, a researcher would have to write a script to get the relevant data from one Web site, write another script to parse the data into an appropriate format, and then write a third script to input the data into a private database.
In this process, researchers all write different scripts, and none of the scripts work together. That inefficiency is causing many to look to open source data exchange solutions, of which DAS is just one.
Today, DAS servers are operating to support some large-scale public genomics research projects at Ensembl; The Institute for Genomic Research; University of California at Santa Cruz; WormBase; and FlyBase.
But DAS is also gaining support within corporations. “You will see DAS servers being deployed in companies,” says Glynias. He notes it helps with the data exchange process and it lets the annotation be done by the people who know what they are doing, instead of a third party entering the annotations.