Hip Hop Offers Lessons on Life Science Data Integration


By Salvatore Salamone, Bio-IT World

Hip hop artists often combine sections of several songs to create a new piece of music. The technique is known as a mashup, since it mashes together disparate sounds from different sources into one recording.

A similar mashup technique is now getting the attention of scientists as a way to quickly bring together disparate informatics, biological, chemical, and imaging information when conducting research.

The idea behind mashups is simple: Using some relatively simple programming techniques, take information that is available on the Web or in company databases and combine the data. Thus integrated, the data may offer more insight into a problem than when kept or viewed separately.
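As a rough illustration of the idea, the sketch below merges records from two sources on a shared identifier. Everything in it is an assumption made for the example: the record IDs, the field names, and the `mash_up` helper are hypothetical and do not come from any real database.

```python
# Hypothetical records standing in for a public Web source and a company database.
# All identifiers and values are made up for illustration.
public_records = [
    {"id": "ID-001", "description": "example entry one"},
    {"id": "ID-002", "description": "example entry two"},
]
inhouse_records = [
    {"id": "ID-001", "assay_hits": 12},
    {"id": "ID-003", "assay_hits": 7},
]


def mash_up(left, right, key):
    """Merge two lists of dicts on a shared key.

    Records that appear in only one source are kept as-is; records that
    appear in both are combined into a single dict.
    """
    merged = {}
    for record in list(left) + list(right):
        merged.setdefault(record[key], {}).update(record)
    # Return the combined records in a stable, sorted order.
    return [merged[k] for k in sorted(merged)]


combined = mash_up(public_records, inhouse_records, "id")
for row in combined:
    print(row)
```

The combined view puts the description and the assay count for ID-001 side by side, which neither source shows on its own.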

The idea of aggregating data in this way is not new. What is drawing attention to mashups these days is that public databases are increasingly making their contents available in formats that are easier to aggregate. At the same time, some programming aids and utilities are making it easier for non-technical people to pull this data together.

Over the last six months, mashups have been getting a lot of publicity, mainly because of Google, which offers an API (application programming interface) that makes it relatively easy to overlay geographical data on a map. Essentially, Google Maps displays each data point as a virtual stick-pin on a map.

A July 2005 BusinessWeek article noted that many people were using Google Maps mashups to pull together data as varied as real estate listings and neighborhood crime statistics.

This technique was seized upon last November, when the most recent list of the world’s most powerful supercomputers was announced at the SC05 conference in Seattle. At that time, Top500.org published its traditional top 500 list, but the group also created an interactive map displaying the locations of the world’s 100 most powerful computer systems. Moving a cursor over a stick-pin on the map produces a bubble with information about that particular computer installation.

The ability to display data in this manner has many applications in the life sciences. An article titled “Mashups Mix Data into Global Service” in last week’s Nature (Vol. 439, January 5, 2006, pp. 6-7) noted that this technique could be used to track the progression of an infectious disease or to study global health and disease patterns. To emphasize this point, Nature created its own mashup tracking avian-flu outbreaks by combining information from the World Health Organization (WHO) and the UN Food and Agriculture Organization into a Google map.
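To make the stick-pin idea concrete, here is a minimal sketch that turns location records into map markers. It does not use the Google Maps API. Instead it assumes GeoJSON, a format the article does not mention, as a neutral way to describe points that a mapping tool could plot. The site names, coordinates, and the `to_geojson` helper are hypothetical.

```python
import json

# Hypothetical records; the names, coordinates, and notes are made up.
sites = [
    {"name": "Site A", "lat": 47.6, "lon": -122.3, "note": "example installation"},
    {"name": "Site B", "lat": 42.4, "lon": -71.1, "note": "another example"},
]


def to_geojson(records):
    """Convert records with lat/lon fields into a GeoJSON FeatureCollection.

    Each record becomes one Point feature, the equivalent of one pin.
    GeoJSON orders coordinates as [longitude, latitude].
    """
    features = []
    for record in records:
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [record["lon"], record["lat"]],
                },
                # The properties would supply the text for a pop-up bubble.
                "properties": {"name": record["name"], "note": record["note"]},
            }
        )
    return {"type": "FeatureCollection", "features": features}


geojson_text = json.dumps(to_geojson(sites), indent=2)
print(geojson_text)
```

The point of the sketch is only that each record needs a location and some descriptive text. The mapping service then takes care of drawing the pins.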

The article also stated that mashups are not limited to just aggregating geographical data onto maps. It noted that the data in many life science databases, such as GenBank, is easily accessible and could be combined with other information.

One example cited was the mashup iSpecies.org. When a user enters a species name into what looks like a regular search box, the mashup returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar.

A limiting factor in using mashups is that much of the data in public databases is not machine-readable. Typically, a person has to manually cut and paste data from a website for it to be used by another application. This approach will not work with a mashup.

Some sites are addressing this problem (and not just for the sake of mashups) by enhancing the way data is accessed. For example, many sites are moving from traditional command line interfaces and onscreen queries to exposing a site’s data to applications via a Web services interface.
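The difference a machine-readable interface makes can be sketched in a few lines. The example below assumes a Web service that returns XML. The payload, its element names, and the `parse_records` helper are hypothetical and do not represent any particular site's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload standing in for what a Web service might return.
# The element names and values are assumptions made for this example.
RESPONSE = """\
<results>
  <record id="ID-001"><organism>Example species A</organism><length>1200</length></record>
  <record id="ID-002"><organism>Example species B</organism><length>860</length></record>
</results>
"""


def parse_records(xml_text):
    """Turn each <record> element into a plain dict that other code can use."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": rec.get("id"),
            "organism": rec.findtext("organism"),
            "length": int(rec.findtext("length")),
        }
        for rec in root.iter("record")
    ]


records = parse_records(RESPONSE)
for record in records:
    print(record)
```

Because the structure is explicit, no one has to cut and paste values from a rendered web page. A program can pick out the fields by name.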

Another approach that would greatly expand the amount of data available for mashups and other applications would be to use Semantic Web technology such as RDF. Sites that publish their data in RDF format make that data machine-readable. This makes the data easier to find, search, save, and access, and therefore easier to incorporate into a mashup or other application.
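RDF describes data as subject-predicate-object statements. As a minimal sketch, the code below writes simple records out in N-Triples, one of the plain-text RDF serializations. The `http://example.org/` namespace, the field names, and the helper functions are hypothetical. A real site would use its own identifiers and an established vocabulary.

```python
BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(records, id_key):
    """Emit one triple per field: <record> <field> "value" ."""
    lines = []
    for record in records:
        subject = f"<{BASE}record/{record[id_key]}>"
        for field, value in record.items():
            if field == id_key:
                continue  # the identifier is already part of the subject URI
            predicate = f"<{BASE}vocab/{field}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


example = [{"id": "ID-001", "organism": "Example species A"}]
print(to_ntriples(example, "id"))
```

Each line is a self-contained statement, so software can read, search, and combine statements from different sites without knowing how either site lays out its pages.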

The combination of new tools like the Google Maps API and increased adoption of Web services and the Semantic Web will give researchers new ways to view and aggregate their data in the coming year.

To that end, Web services and Semantic Web are two key IT trends that potentially will have a great impact on life sciences this year. Listen to the accompanying podcast for more on the major IT trends likely to impact drug discovery in 2006. And for those who want more details about how these technologies are being used today in major life science organizations, check out Bio-IT World’s Life Sciences Conference + Expo to be held in Boston April 3-5, 2006.

What do you think about mashups? Do you think they are just a fad? Are you using them today? What applications do you envision them being used for? Drop me a note at Salvatore_Salamone@bio-itworld.com and share your thoughts on the subject. 



