Bridging Gaps with Web Services



 

November 19, 2004 | Researchers need to be able to migrate their processes from local development environments to large compute facilities. Organizations need to be able to address global computing requirements with flexible and interoperable systems. Data-processing pipelines and workflows need to be transferable and repeatable. IT needs to encourage collaboration, not isolate communities.

The emerging technologies known as Web services address portions of all these needs. They allow access to remote code at the level of a function or an object call. (This is far from a new idea: A pedantic case could be made that remote access to functions started with the first "jump" instruction in early microcode.)

At the core, Web services are really nothing more than a generic way to provide an application programming interface (API) to remote function calls in a language-independent manner. A server advertises its functions using a document in Web Services Description Language (WSDL), an XML document type. Client/server communication is entirely via documents written in Simple Object Access Protocol (SOAP), another dialect of XML.
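As a minimal sketch of this idea, the Python below serializes a function call as a SOAP 1.1 envelope and then recovers the call on the receiving side. The operation name `reverseComplement`, its `sequence` parameter, and the namespace `urn:example:sequence-tools` are hypothetical, chosen only for illustration; a real service would take these from its WSDL document, and a production client would normally use a SOAP library rather than building the XML by hand.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:sequence-tools"  # hypothetical service namespace


def build_request(operation, params):
    """Serialize a function call as a SOAP envelope (returned as bytes)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(document):
    """Recover the operation name and its parameters from a SOAP envelope."""
    envelope = ET.fromstring(document)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the first child of Body is the call element
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag: child.text for child in call}
    return operation, params


# Client side: turn a call into a document.
request = build_request("reverseComplement", {"sequence": "ATGC"})

# Server side: turn the document back into a call.
operation, params = parse_request(request)
```

Nothing in the document itself depends on the language that produced it, which is the point: any client that can emit this XML can call the function.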

Because the API is specified in a formal yet generic manner, and because all communications are expressed in an intermediate protocol, Web services are entirely independent of language, operating system, client, and wire protocol.

DEAN PROUDFOOT 
The majority of implementations use a Web server to provide access to WSDL documents, and to pass SOAP documents from client to server and back again. From our perspective, ease of use is the only reason to implement Web services over the Web. For debugging purposes, The BioTeam has used everything from Simple Mail Transfer Protocol (SMTP) to flat files to move SOAP documents between client and server. Our experience has shown that these alternative transports are particularly convenient when a firewall or other network barrier makes direct Web access impossible. In terms of client and server languages, there is a SOAP module or library for every language we've checked. We have experimented with Perl, Ruby, C++, C#, and Java, but there are many others.
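To illustrate why the transport is interchangeable, here is a hedged sketch in Python of a flat-file exchange. The `sequenceLength` operation, the file names `request.xml` and `response.xml`, and the single-process "server" are all assumptions made for the example; this is not a description of the setup The BioTeam actually used.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def make_envelope(tag, text):
    """Wrap a single element in a SOAP envelope and return it as bytes."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    ET.SubElement(body, tag).text = text
    return ET.tostring(envelope, encoding="utf-8")


def read_payload(document):
    """Return (tag, text) of the first element inside the SOAP Body."""
    body = ET.fromstring(document).find(f"{{{SOAP_ENV}}}Body")
    return body[0].tag, body[0].text


def serve(document):
    """Toy server (hypothetical): answer with the length of the sequence."""
    _, sequence = read_payload(document)
    return make_envelope("sequenceLengthResponse", str(len(sequence)))


def call_via_files(request, spool_dir):
    """Flat-file transport: request and response travel as files on disk."""
    spool = Path(spool_dir)
    (spool / "request.xml").write_bytes(request)        # client drops a request
    reply = serve((spool / "request.xml").read_bytes())  # server picks it up
    (spool / "response.xml").write_bytes(reply)          # server drops a reply
    return (spool / "response.xml").read_bytes()         # client collects it


request = make_envelope("sequenceLength", "ATGCATGC")
direct_reply = serve(request)  # same handler, no transport at all
with tempfile.TemporaryDirectory() as spool_dir:
    file_reply = call_via_files(request, spool_dir)
```

The handler never learns how the document arrived, so swapping the spool directory for HTTP or a mail queue would change only `call_via_files`.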

On a recent project, we had the opportunity to add a Web services interface to an existing cluster that was already using PISE (the open-source software for building interfaces) to automatically provide Web access to a number of bioinformatics applications. Our experience with that project convinced us of the value and power provided by a formally published API backed up by a SOAP interface. Web services provided a middle ground between the raw power and matching complexity of the command line, and the powerful but sometimes limiting interface of Web pages.

SOAP and WSDL are not a panacea, and they do not completely eliminate the subtle and time-consuming process of debugging. We tested our system using a variety of client languages, which revealed differences in interpretation and implementation. Multiple return values, implicit data types and default values, large data streams, and many forms of overloading all bit us. In terms of usability, the SOAP::Lite serialization from Perl is (naturally) much more forgiving about syntax than the same serialization implemented in C++.
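The implicit-data-type problem can be sketched in a few lines of Python. The two decoders below are illustrative assumptions, not the behavior of SOAP::Lite or of any particular C++ toolkit: one refuses a value that lacks an explicit `xsi:type`, the other guesses. The element name `evalue` is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

# Only the handful of schema types this sketch needs.
CASTS = {"xsd:int": int, "xsd:double": float, "xsd:string": str}


def decode_strict(element):
    """Strict decoder: refuse any value that lacks an explicit xsi:type."""
    type_name = element.get(f"{{{XSI}}}type")
    if type_name is None:
        raise ValueError(f"<{element.tag}> has no xsi:type")
    return CASTS[type_name](element.text)


def decode_lenient(element):
    """Forgiving decoder: honor xsi:type if present, otherwise guess."""
    if element.get(f"{{{XSI}}}type") is not None:
        return decode_strict(element)
    for cast in (int, float):
        try:
            return cast(element.text)
        except ValueError:
            continue
    return element.text


typed = ET.fromstring(
    f'<evalue xmlns:xsi="{XSI}" xmlns:xsd="{XSD}" xsi:type="xsd:string">10</evalue>'
)
untyped = ET.fromstring("<evalue>10</evalue>")

as_declared = decode_lenient(typed)   # the sender said "string", so a string
as_guessed = decode_lenient(untyped)  # nothing declared, so the decoder guesses int
```

The same text, `10`, comes back as a string in one case and an integer in the other, and `decode_strict(untyped)` raises an error outright. Two clients that disagree in this way can both be reasonable and still fail to interoperate.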

It is not that we have found any particular client to be more mature or correct than the others; it's just that it is impossible to debug an interoperability standard in a single language, and as we move forward from "Hello World," the bugs get more difficult and more subtle. We are excited about the forthcoming version 2.0 of the WSDL specification, since it clarifies a number of these issues. It also explicitly addresses service discovery.


Interoperability Benefits 
Public APIs like those used to implement Web services can help IT purchasers avoid being locked into a single vendor. APIs published via WSDL and SOAP will remain accessible in the face of flux in software vendors and versions. They are also amenable to integration with the similarly enabled products of other vendors. Web services can, at some level, be thought of as a "plug-and-play" protocol. The big win comes when an organization knows that most or all of its services can work together smoothly and can be combined in novel and unexpected ways. This also allows each component to be tuned and debugged independent of the others, which is always a plus for IT.

Some vendors of client and/or server software may perceive Web services as a threat to their existing client base. This is shortsighted: When clients and servers communicate through open protocols with free technology, developers can focus on providing powerful services and compelling interfaces rather than on constantly tweaking a custom API.


Workflows, Workgroups, and Taverna 
Web services can be used to enable workflows because the atomic units of functionality from various systems are published in a generic, machine-readable, network-accessible format. The best workflow editors will be Web services clients. They will dynamically read interface specifications (including WSDL) to discover services, and will present them to the user in an easy-to-use format. The less static, client-side information is kept in a workflow editor, the better.
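A rough sketch of that discovery step, in Python: the client reads a WSDL document and lists the operations it finds, keeping no hard-coded knowledge of the service. The embedded WSDL is a stripped-down, hypothetical fragment (a real document would also declare messages, bindings, and a service endpoint), and `discover_operations` is an illustrative helper, not part of Taverna or any other tool.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, heavily abbreviated WSDL document.
EXAMPLE_WSDL = """<?xml version="1.0"?>
<definitions name="SequenceTools"
             targetNamespace="urn:example:sequence-tools"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="SequenceToolsPort">
    <operation name="reverseComplement">
      <documentation>Reverse-complement a DNA sequence.</documentation>
    </operation>
    <operation name="translate">
      <documentation>Translate DNA into protein.</documentation>
    </operation>
  </portType>
</definitions>
"""


def discover_operations(wsdl_text):
    """Map each operation name in the WSDL to its documentation string."""
    root = ET.fromstring(wsdl_text)
    found = {}
    path = f"{{{WSDL_NS}}}portType/{{{WSDL_NS}}}operation"
    for operation in root.iterfind(path):
        doc = operation.findtext(f"{{{WSDL_NS}}}documentation", default="")
        found[operation.get("name")] = doc.strip()
    return found


operations = discover_operations(EXAMPLE_WSDL)
for name, doc in operations.items():
    print(f"{name}: {doc}")
```

Because the list is rebuilt from the WSDL each time, an operation added on the server shows up on the next read without any change to the client.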

We have had success with Taverna, a free, open-source client used in the myGrid project. While Taverna is still a little rough around the edges, it was inspiring to open up the Taverna application, point it to our WSDL document, and see glyphs appear for the various services that we had published. As the services evolved, a "reload" on the client side was all that was needed to discover new functionality and have it available to the user.

Beyond the level of a single organization, groups can use Web services technology to share resources with the world and enable processes that might literally span the globe. Several major bioinformatics groups are already providing Web services interfaces to their tools and data resources, including KEGG, EBI, and the SeqHound and BioMOBY projects. Of course, a performance penalty is associated with running one step of a process in, say, Japan and the next in the United Kingdom. However, given that the alternative is installing and maintaining every single one of the software packages locally, the network delay doesn't seem too high a cost.

As with grid computing, Web services have been the victim of marketing hyperbole. Under the hood, though, there is a rapidly maturing technology with immediate benefits to both developers and users of scientific software.

Chris Dwan is a senior consultant with The BioTeam. E-mail: cdwan@bioteam.net. 







For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .