
Hip Hop Offers Lessons on Life Science Data Integration


Hip hop artists often combine sections of several songs to create a new piece of music. The technique is known as a mashup, since it mashes together disparate sounds from different sources into one recording.

A similar mashup technique is now getting the attention of scientists as a way to quickly bring together disparate informatics, biological, chemical, and imaging information when conducting research.

The idea behind mashups is simple: Using some relatively simple programming techniques, take information that is available on the Web or in company databases and combine the data. Thus integrated, the data may offer more insight into a problem than when kept or viewed separately.
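As a rough illustration of the idea, the sketch below joins records from two sources on a shared identifier. The compound IDs, field names, and values are invented for the example, and the function name mash_up is hypothetical; real sources would be Web pages, services, or company databases rather than in-memory lists.

```python
# Hypothetical records standing in for two separate sources; the compound
# identifiers, field names, and values are invented for illustration only.
assay_results = [
    {"compound_id": "C001", "ic50_nm": 12.5},
    {"compound_id": "C002", "ic50_nm": 480.0},
]
chemical_properties = [
    {"compound_id": "C001", "mol_weight": 312.4},
    {"compound_id": "C003", "mol_weight": 150.2},
]


def mash_up(left, right, key):
    """Merge records from two sources that share the same key value."""
    right_by_key = {row[key]: row for row in right}
    combined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Fields from the right-hand source win if names collide.
            combined.append({**row, **match})
    return combined


merged = mash_up(assay_results, chemical_properties, "compound_id")
print(merged)
```

Only records present in both sources survive the join, which is one design choice among several; a mashup could just as reasonably keep unmatched records and leave the missing fields blank.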

The idea of aggregating data in this way is not new. But what is drawing attention to mashups these days is that, increasingly, public databases are making their contents available in formats that make the data easier to aggregate. At the same time, some programming aids and utilities are making it easier for non-technical people to pull this data together.

Over the last six months, mashups have received a lot of publicity, mainly because of Google, which offers an API (application programming interface) that makes it relatively easy to overlay geographical data on a map. Essentially, with Google Maps, each data point is displayed as a virtual stick-pin on a map.
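The stick-pin idea can be sketched without calling any mapping service. The example below does not use the Google Maps API; it assumes the records already carry latitude and longitude and converts them to GeoJSON, a format that many Web mapping tools can load as a marker layer. The site names, coordinates, and the function names to_marker and to_marker_layer are illustrative assumptions.

```python
import json


def to_marker(record):
    """Turn one record with coordinates into a GeoJSON point feature."""
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {
            "type": "Point",
            "coordinates": [record["lon"], record["lat"]],
        },
        # Everything except the coordinates becomes the pin's pop-up data.
        "properties": {
            k: v for k, v in record.items() if k not in ("lat", "lon")
        },
    }


def to_marker_layer(records):
    """Wrap all markers in a single GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [to_marker(r) for r in records],
    }


# Hypothetical sites, invented for illustration.
sites = [
    {"name": "Site A", "lat": 47.61, "lon": -122.33, "note": "example"},
    {"name": "Site B", "lat": 42.36, "lon": -71.06, "note": "example"},
]

layer = to_marker_layer(sites)
print(json.dumps(layer, indent=2))
```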

A July 2005 BusinessWeek article noted that many people were using Google Maps mashups to pull together data as varied as real estate listings and neighborhood crime statistics.

This technique was seized upon last November, when the most recent list of the world’s most powerful supercomputers was announced at the SC05 conference in Seattle. At that time, Top500.org published its traditional top 500 list, but the group also created an interactive map displaying the locations of the world’s 100 most powerful computer systems. Moving a cursor over the stick-pins on the map produces a bubble with information about the particular computer installation.

The ability to display data in this manner has many applications in the life sciences. An article titled “Mashups Mix Data into Global Service” in last week’s Nature (Vol. 439, January 5, 2006, pp. 6-7) noted that this technique could be used to track the progression of an infectious disease or study global health and disease patterns. To emphasize this point, Nature created its own mashup tracking avian-flu outbreaks by combining information from the World Health Organization (WHO) and the UN Food and Agriculture Organization into a Google map.

The article also stated that mashups are not limited to just aggregating geographical data onto maps. It noted that the data in many life science databases, such as GenBank, is easily accessible and could be combined with other information.

One example cited was the mashup iSpecies.org. When a user enters a species name into what looks like a regular search box, the mashup returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar.
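The pattern behind a site like this, one query fanned out to several sources and the answers gathered onto a single page, might look roughly like the sketch below. It is not how iSpecies.org is implemented. The three lookup functions are stubs that return placeholder text and make no calls to NCBI, Yahoo, or Google Scholar; all names are hypothetical.

```python
# Stub lookups standing in for real services. None of these contact NCBI,
# Yahoo, or Google Scholar; they return placeholder strings only.
def lookup_sequences(species):
    return [f"placeholder sequence record for {species}"]


def lookup_images(species):
    return [f"placeholder image link for {species}"]


def lookup_articles(species):
    return [f"placeholder article citation for {species}"]


SOURCES = {
    "sequences": lookup_sequences,
    "images": lookup_images,
    "articles": lookup_articles,
}


def species_page(species):
    """Send one query to every source and collect the answers by section."""
    page = {"query": species}
    for section, lookup in SOURCES.items():
        try:
            page[section] = lookup(species)
        except Exception as error:
            # One failing source should not blank the whole page.
            page[section] = [f"unavailable: {error}"]
    return page


page = species_page("Gallus gallus")
for section, items in page.items():
    print(section, items)
```

Keeping the sources in a dictionary means a new data source can be added by registering one more function, without touching the code that assembles the page.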

A limiting factor in using mashups is that much of the data in public databases is not machine-readable. Typically, a person has to manually cut and paste data from a website for it to be used by another application. This approach will not work with a mashup.

Some sites are addressing this problem (and not just for the sake of mashups) by enhancing the way data is accessed. For example, many sites are moving from traditional command line interfaces and onscreen queries to exposing a site’s data to applications via a Web services interface.

Another approach that would greatly expand the amount of data available for mashups and other applications would be to use Semantic Web technology such as RDF. Sites that publish their data in RDF format make that data computer-readable. This makes the data easier to find, search, save, and access, and as such makes it easier to incorporate that data into a mashup or other application.
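To give a sense of what RDF-published data looks like, the sketch below writes a record as N-Triples, a simple line-based RDF format in which each line states one subject, predicate, and object. The URIs under example.org, the field names, and the record itself are invented for illustration, and the helper names are hypothetical. A production site would more likely rely on an RDF library and an agreed vocabulary than on hand-built strings.

```python
def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(subject_uri, properties, vocabulary):
    """Emit one 'subject predicate object .' line per property."""
    lines = []
    for name, value in properties.items():
        lines.append(
            f"<{subject_uri}> <{vocabulary}{name}> {literal(value)} ."
        )
    return "\n".join(lines)


# Hypothetical URIs and field names under example.org, for illustration only.
record = {"organism": "Gallus gallus", "description": 'sample "demo" entry'}
triples = to_ntriples(
    "http://example.org/record/1",
    record,
    "http://example.org/vocab#",
)
print(triples)
```

Because every statement names its subject and property with a URI, another program can pick out the fields it needs without knowing how the publishing site lays out its pages.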

The combination of new tools like the Google Maps API and increased adoption of Web services and Semantic Web will give researchers new ways to view and aggregate their data in the coming year.

To that end, Web services and Semantic Web are two key IT trends that potentially will have a great impact on life sciences this year. Listen to the accompanying podcast for more on the major IT trends likely to impact drug discovery in 2006. And for those who want more details about how these technologies are being used today in major life science organizations, check out Bio-IT World’s Life Sciences Conference + Expo to be held in Boston April 3-5, 2006.

What do you think about mashups? Do you think they are just a fad? Are you using them today? What applications do you envision them being used for? Drop me a note at Salvatore_Salamone@bio-itworld.com and share your thoughts on the subject. 

