Hip Hop Offers Lessons on Life Science Data Integration



Hip hop artists often combine sections of several songs to create a new piece of music. The technique is known as a mashup, since it mashes together disparate sounds from different sources into one recording.

A similar mashup technique is now getting the attention of scientists as a way to quickly bring together disparate informatics, biological, chemical, and imaging information when conducting research.

The idea behind mashups is simple: Using some relatively simple programming techniques, take information that is available on the Web or in company databases and combine the data. Thus integrated, the data may offer more insight into a problem than when kept or viewed separately.
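To make the idea concrete, here is a minimal sketch of the combining step, assuming two sources that share a common identifier. The function name `mash_up`, the field names, and the records are all made up for illustration; a real mashup would pull the records from a Web source or a company database first.

```python
from collections import defaultdict

def mash_up(sources, key):
    """Merge records from several sources that share a common key field."""
    combined = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            for field, value in record.items():
                if field != key:
                    # Prefix each field with its source so its origin stays visible.
                    combined[record[key]][f"{source_name}.{field}"] = value
    return dict(combined)

# Placeholder records (not real data): one list stands in for a Web source,
# the other for an in-house database.
web_records = [{"compound": "C-001", "target": "kinase A"}]
lab_records = [
    {"compound": "C-001", "ic50_nm": 12.5},
    {"compound": "C-002", "ic50_nm": 480.0},
]

merged = mash_up({"web": web_records, "lab": lab_records}, key="compound")
print(merged["C-001"])
```

Viewed together, the record for `C-001` carries fields from both sources, which is the whole point of the exercise.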

The idea of aggregating data in this way is not new. But what is drawing attention to mashups these days is that, increasingly, public databases are making their contents available in formats that are easier to aggregate. At the same time, some programming aids and utilities are making it easier for non-technical people to pull this data together.

Over the last six months, mashups have been getting a lot of publicity, mainly due to Google, which offers an API (application programming interface) that makes it relatively easy to overlay geographical data on a map. Essentially, with Google Maps, each piece of data is displayed as a virtual stick-pin on a map.
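The data preparation behind such a map can be sketched in a few lines. The example below does not use the Google Maps API itself; it swaps in GeoJSON, an open format that many mapping tools can read, to show how rows with a latitude and longitude become stick-pins. The helper names `to_pin` and `to_pin_layer` and the site records are placeholders, not part of any real service.

```python
import json

def to_pin(name, lat, lon, info):
    """Build one GeoJSON Point feature. GeoJSON orders coordinates as [lon, lat]."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range for {name!r}")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # The properties are what a map would show in the pop-up bubble.
        "properties": {"name": name, "info": info},
    }

def to_pin_layer(rows):
    """Wrap a list of rows into one GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": [to_pin(**row) for row in rows]}

# Placeholder rows (not real installations or outbreaks).
rows = [
    {"name": "Site A", "lat": 47.6, "lon": -122.3, "info": "placeholder record"},
    {"name": "Site B", "lat": 42.4, "lon": -71.1, "info": "placeholder record"},
]

layer = to_pin_layer(rows)
print(json.dumps(layer, indent=2))
```

The range check is there because swapped latitude and longitude values are among the easiest mistakes to make when preparing map data.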

A July 2005 BusinessWeek article noted that many people were using Google Maps mashups to pull together data as varied as real estate listings and neighborhood crime statistics.

This technique was seized upon last November, when the most recent list of the world’s most powerful supercomputers was announced at the SC05 conference in Seattle. At that time, Top500.org published its traditional top 500 list, but the group also created an interactive map displaying the locations of the world’s 100 most powerful computer systems. Moving a cursor over the stick-pins on the map produces a bubble with information about the particular computer installation.

The ability to display data in this manner has many applications in the life sciences. An article titled "Mashups Mix Data into Global Service" in last week’s Nature (Vol. 439, January 5, 2006, pp. 6-7) noted that this technique could be used to track the progression of an infectious disease or study global health and disease patterns. To emphasize this point, Nature created its own mashup tracking avian-flu outbreaks by combining information from the World Health Organization (WHO) and the UN Food and Agriculture Organization into a Google map.

The article also stated that mashups are not limited to just aggregating geographical data onto maps. It noted that the data in many life science databases, such as GenBank, is easily accessible and could be combined with other information.

One example cited was the mashup iSpecies.org. When a user enters a species name into what looks like a regular search box, the mashup returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar.
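The pattern behind a site like this, one query fanned out to several sources and the answers assembled into a single page, can be sketched as follows. This is not how iSpecies.org is actually implemented; the three `fetch_*` functions are hypothetical stand-ins that return placeholder text instead of calling any real service, and one of them fails on purpose to show that a single unresponsive source need not sink the whole page.

```python
def build_species_page(query, fetchers):
    """Send one query to every source and collect the results by section."""
    page = {"query": query, "sections": {}}
    for section, fetch in fetchers.items():
        try:
            page["sections"][section] = fetch(query)
        except Exception as err:
            # Record the failure and keep going with the other sources.
            page["sections"][section] = {"error": str(err)}
    return page

# Hypothetical stand-ins for the remote sources; no network calls are made.
def fetch_sequences(name):
    return [f"sequence record for {name} (placeholder)"]

def fetch_images(name):
    return [f"image of {name} (placeholder)"]

def fetch_articles(name):
    raise TimeoutError("literature source did not respond")

page = build_species_page(
    "Gallus gallus",
    {"genomics": fetch_sequences, "images": fetch_images, "articles": fetch_articles},
)
for section, content in page["sections"].items():
    print(section, "->", content)
```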

A limiting factor in using mashups is that much of the data in public databases is not machine-readable. Typically, a person has to manually cut and paste data from a website for it to be used by another application. This approach will not work with a mashup.

Some sites are addressing this problem (and not just for the sake of mashups) by enhancing the way data is accessed. For example, many sites are moving from traditional command line interfaces and onscreen queries to exposing a site’s data to applications via a Web services interface.
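The difference a Web services interface makes is that a program receives structured data it can parse directly, with no cutting and pasting. Below is a minimal sketch, assuming a service that answers with XML; the element names and the sample response are invented for illustration and do not reflect any particular database's format.

```python
import xml.etree.ElementTree as ET

# Invented sample response; a real client would receive this text over HTTP.
SAMPLE_RESPONSE = """
<results>
  <record id="r1"><organism>Gallus gallus</organism><length>1701</length></record>
  <record id="r2"><organism>Anas platyrhynchos</organism><length>2341</length></record>
</results>
"""

def parse_records(xml_text):
    """Turn the XML response into plain dictionaries an application can use."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": rec.get("id"),
            "organism": rec.findtext("organism"),
            "length": int(rec.findtext("length")),
        }
        for rec in root.iter("record")
    ]

records = parse_records(SAMPLE_RESPONSE)
for rec in records:
    print(rec["id"], rec["organism"], rec["length"])
```

Once the records are ordinary dictionaries, they can be fed straight into whatever combining step the mashup performs.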

Another approach that would greatly expand the amount of data available for mashups and other applications would be to use Semantic Web technology such as RDF. Sites that publish their data in RDF format make that data machine-readable. This makes the data easier to find, search, save, and access, and as such makes it easier to incorporate that data into a mashup or other application.
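RDF expresses data as subject-predicate-object statements, or triples. As a rough sketch of what publishing a record this way involves, the code below writes a record out in N-Triples, one of the simplest RDF serializations. The `http://example.org/` namespace, the vocabulary terms, and the record itself are placeholders, and the escaping covers only the most common cases; a production system would normally use an established RDF library and a shared vocabulary.

```python
BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary

def uri(value):
    """Wrap an identifier in the angle brackets N-Triples uses for URIs."""
    return f"<{value}>"

def literal(value):
    """Quote a plain literal, escaping backslashes, quotes, and newlines."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def record_to_ntriples(record_id, fields):
    """Emit one subject-predicate-object line per field of the record."""
    subject = uri(BASE + "record/" + record_id)
    lines = []
    for name, value in fields.items():
        predicate = uri(BASE + "vocab/" + name)
        lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines

# Placeholder record (not real data).
triples = record_to_ntriples("r1", {"organism": "Gallus gallus", "note": 'a "quoted" note'})
print("\n".join(triples))
```

Because every statement names its subject and property with a URI, another application can pick out the organism of a record without knowing anything about the page layout of the site that published it.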

The combination of new tools like the Google Maps API and increased adoption of Web services and Semantic Web technology will give researchers new ways to view and aggregate their data in the coming year.

To that end, Web services and the Semantic Web are two key IT trends that could have a great impact on the life sciences this year. Listen to the accompanying podcast for more on the major IT trends likely to impact drug discovery in 2006. And for those who want more details about how these technologies are being used today in major life science organizations, check out Bio-IT World’s Life Sciences Conference + Expo, to be held in Boston April 3-5, 2006.

What do you think about mashups? Do you think they are just a fad? Are you using them today? What applications do you envision them being used for? Drop me a note at Salvatore_Salamone@bio-itworld.com and share your thoughts on the subject. 




