November 19, 2004
| Researchers need to be able to migrate their processes from local development environments to large compute facilities. Organizations need to be able to address global computing requirements with flexible and interoperable systems. Data-processing pipelines and workflows need to be transferable and repeatable. IT needs to encourage collaboration, not isolate communities.
The emerging technologies known as Web services address portions of all these needs. They allow access to remote code at the level of a function or method call. (This is far from a new idea: A pedantic case could be made that remote access to functions started with the first "jump" instruction in early microcode.)
At their core, Web services are nothing more than a generic way to provide an application programming interface (API) to remote function calls in a language-independent manner. A server advertises its functions in a document written in Web Services Description Language (WSDL), an XML document type. Client and server then communicate entirely via messages in Simple Object Access Protocol (SOAP), another XML format.
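To make this concrete, here is a minimal sketch of what a SOAP request looks like on the wire: a function call serialized as an XML "envelope." The service name (`getSequence`) and its namespace are invented for illustration; a real service would define both in its WSDL document.

```python
# Build a minimal SOAP 1.1 request envelope for a hypothetical
# getSequence(id) call, using only Python's standard library.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SVC_NS = "urn:example:sequence-service"  # hypothetical service namespace

def build_request(seq_id):
    """Serialize a single function call as a SOAP envelope (bytes)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}getSequence")
    param = ET.SubElement(call, "id")  # one parameter: the sequence ID
    param.text = seq_id
    return ET.tostring(envelope, encoding="utf-8")

print(build_request("NM_000546").decode("utf-8"))
```

The envelope is the entire client/server contract: any language that can produce and parse this document can participate.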
Because the API is specified in a formal yet generic manner, and because all communication passes through an intermediate protocol, Web services are entirely independent of language, operating system, client, and wire protocol.
The majority of implementations use a Web server to provide access to WSDL documents and to pass SOAP documents between client and server. From our perspective, ease of use is the only reason to implement Web services over the Web. For debugging purposes, The BioTeam has used everything from Simple Mail Transfer Protocol (SMTP) to flat files to move SOAP documents from client to server and back again. Our experience has shown that this is particularly convenient when a firewall or other network barrier makes direct Web access impossible. As for client and server languages, there is a SOAP module or library for every language we've checked. We have experimented with Perl, Ruby, C++, C#, and Java, but there are many others.
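The flat-file trick works because a SOAP message is simply an XML document; any channel that moves bytes can serve as the transport. A sketch of the idea, assuming a hand-written request envelope for the same hypothetical sequence service rather than one emitted by a real client:

```python
# Transport independence in miniature: the "client" writes a SOAP request
# to a flat file, and the "server" reads and parses it. No HTTP involved.
import os
import tempfile
import xml.etree.ElementTree as ET

REQUEST = b"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getSequence xmlns="urn:example:sequence-service">
      <id>NM_000546</id>
    </getSequence>
  </soap:Body>
</soap:Envelope>"""

def send_via_file(payload, path):
    with open(path, "wb") as f:   # client side: drop the document off
        f.write(payload)

def receive_via_file(path):
    with open(path, "rb") as f:   # server side: pick it up and parse it
        return ET.fromstring(f.read())

path = os.path.join(tempfile.mkdtemp(), "request.xml")
send_via_file(REQUEST, path)
doc = receive_via_file(path)
op = doc.find(".//{urn:example:sequence-service}getSequence")
print(op.find("{urn:example:sequence-service}id").text)  # -> NM_000546
```

Swap the file for an SMTP message or an HTTP POST body and neither endpoint's parsing logic changes, which is exactly why debugging through a firewall this way is practical.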
On a recent project, we had the opportunity to add a Web services interface to an existing cluster that was already using PISE (open-source software that automatically generates Web interfaces for command-line programs) to provide Web access to a number of bioinformatics applications. That project convinced us of the value and power of a formally published API backed by a SOAP interface. Web services provide a middle ground between the raw power, and matching complexity, of the command line and the powerful but sometimes limiting interface of Web pages.
SOAP and WSDL are not a panacea, and they do not completely eliminate the subtle and time-consuming process of debugging. We tested our system using a variety of client languages, which revealed differences in interpretation and implementation. Multiple return values, implicit data types and default values, large data streams, and many forms of overloading were among the issues that bit us. In terms of usability, Perl's SOAP::Lite serialization is (naturally) much more forgiving about syntax than the same serialization implemented in C++.
It is not that we have found any particular client to be more mature or correct than the others; it's just that it is impossible to debug an interoperability standard in a single language, and as we move forward from "Hello World," the bugs get more difficult and more subtle. We are excited about the forthcoming version 2.0 of the WSDL specification, since it clarifies a number of these issues. It also explicitly addresses service discovery.
Public APIs like those used to implement Web services can help IT purchasers avoid being locked into a single vendor. APIs published via WSDL and SOAP will remain accessible in the face of flux in software vendors and versions. They are also amenable to integration with the similarly enabled products of other vendors. Web services can, at some level, be thought of as a "plug-and-play" protocol. The big win comes when an organization knows that most or all of its services can work together smoothly and can be combined in novel and unexpected ways. This also allows each component to be tuned and debugged independently of the others, which is always a plus for IT.
Some vendors of client and/or server software may perceive Web services as a threat to their existing client base. This is shortsighted: When clients and servers communicate through open protocols with free technology, developers can focus on providing powerful services and compelling interfaces rather than on constantly tweaking a custom API.
Workflows, Workgroups, and Taverna
Web services can enable workflows because the atomic units of functionality from various systems are published in a generic, machine-readable, network-accessible format. The best workflow editors will be Web services clients: They will dynamically read interface specifications (including WSDL) to discover services and present them to the user in an easy-to-use format. The less static client-side information a workflow editor keeps, the better.
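The dynamic-discovery step is simpler than it sounds: a client reads the WSDL document and enumerates the operations it advertises. A minimal sketch, using a hypothetical two-operation WSDL fragment rather than any real published interface:

```python
# What a workflow client does on "reload," in miniature: parse a WSDL
# document and list the operations the service advertises.
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
              name="SequenceService">
  <portType name="SequencePort">
    <operation name="getSequence"/>
    <operation name="translateSequence"/>
  </portType>
</definitions>"""

def list_operations(wsdl_text):
    """Return the operation names advertised in a WSDL document."""
    root = ET.fromstring(wsdl_text)
    return [op.get("name")
            for op in root.iter(f"{{{WSDL_NS}}}operation")]

print(list_operations(WSDL))  # -> ['getSequence', 'translateSequence']
```

Each discovered operation name can then drive a menu entry or canvas glyph, with the parameter details pulled from the same document, so the editor itself stores nothing about any particular service.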
We have had success with Taverna, a free, open-source client used in the MyGrid project. While Taverna is still a little rough around the edges, it was inspiring to open up the Taverna application, point it to our WSDL document, and see glyphs appear for the various services that we had published. As the services evolved, a "reload" on the client side was all that was needed to discover new functionality and have it available to the user.
Beyond the level of a single organization, groups can use Web services technology to share resources with the world and enable processes that might literally span the globe. Several major bioinformatics groups are already providing Web services interfaces to their tools and data resources, including KEGG, EBI, and the SeqHound and BioMOBY projects. Of course, a performance penalty is associated with running one step of a process in, say, Japan and the next in the United Kingdom. However, given that the alternative is installing and maintaining every one of those software packages locally, the network delay doesn't seem too high a cost.
As with grid computing, Web services have been the victim of marketing hyperbole. Under the hood, though, there is a rapidly maturing technology with immediate benefits to both developers and users of scientific software.
Chris Dwan is a senior consultant with The BioTeam. E-mail: firstname.lastname@example.org.