By Salvatore Salamone
Just as companies are discovering the enormous power of peer-to-peer (P2P) computing and adopting it in droves, P2P is increasingly blending with its more complex cousin, distributed computing.
That hardly means P2P computing is going away. Indeed, there are more tools than ever to help companies harness the unused computing power in their employees' PCs.
After several high-profile implementations, however, it's clear that architecture, security, and systems management issues are trickier than first thought. For example, managers often prefer to operate P2P systems as separate entities, but doing so is impossible in most deployment scenarios.
The increasing reach of P2P beyond the desktop and into distributed computing's fringe will further magnify those stumbling blocks. But that will hardly deter the rising tide of P2P implementers. Dramatic cost savings and the ability to do parallel processing (a big plus in the life sciences) will continue to drive P2P use higher.
A company with slightly fewer than 2,000 desktop computers can reap nearly 1 teraflop (one trillion floating-point operations per second) of computing capacity. That's about one-quarter of the processing power of the Advanced Simulation and Computing Program's Blue-Pacific supercomputer at the Lawrence Livermore National Laboratory. Even better, the company would capture that power from computers it already owns that sit idle at night and work at less than full capacity during the day.
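The arithmetic behind those figures can be checked with a rough sketch. It assumes the article's numbers are rounded to exactly 2,000 desktops and exactly 1 teraflop; the `harvestable_flops` helper and its idle fraction are hypothetical additions for illustration, not figures from any vendor.

```python
# Back-of-the-envelope check of the capacity figures quoted above.
# Assumption: "slightly fewer than 2,000" is rounded to 2,000 desktops and
# "nearly 1 teraflop" is rounded to 1.0e12 floating-point operations per second.
TERAFLOP = 1.0e12

desktops = 2_000
aggregate_flops = 1.0 * TERAFLOP

# Average contribution each desktop must make to reach the aggregate.
per_desktop_flops = aggregate_flops / desktops

# "About one-quarter" of the supercomputer implies roughly four times the aggregate.
implied_supercomputer_flops = aggregate_flops / 0.25


def harvestable_flops(n_desktops, flops_each, idle_fraction):
    """Hypothetical helper: capacity available if each PC is idle only part of the time."""
    return n_desktops * flops_each * idle_fraction


print(f"Per desktop: {per_desktop_flops / 1e6:.0f} megaflops")
print(f"Implied supercomputer capacity: {implied_supercomputer_flops / TERAFLOP:.0f} teraflops")
```

Under those rounding assumptions, each desktop contributes about 500 megaflops, and the comparison implies a supercomputer of roughly 4 teraflops.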
Traditionally, there have been three categories of "distributed" computing:
Cluster computing: Similar machines — generally servers of similar power and configuration — are joined to form a virtual machine. Linux clusters are a good example.
Peer-to-peer: Many desktop computers are linked to aggregate processing power. The distinguishing characteristic is the machine itself, which almost exclusively is a low-power client PC. Often, the link is via the Internet.
Distributed computing: Increasingly known as grid computing, this approach connects a wide variety of computer types and computing resources, such as storage area networks, to create vast "virtual" reservoirs of computers serving geographically far-flung user communities.
Although clustering retains a distinct niche, the line between P2P and distributed computing is blurring. One reason is that many distributed computing software vendors are making it easier to incorporate a mix of PCs, Macs, Linux and Unix servers, and even high-end multiprocessor servers as nodes in a peer-to-peer computing system.
Last May, for example, distributed software vendor Platform Computing Inc. announced that its Platform LSF software would run on Xserve, Apple Computer Inc.'s new high-end server that runs the Mac OS X server software.
Industry Groups Galvanize
The computer industry has taken notice and is trying to cater to the trend, primarily through the work of a couple of grid initiatives.
Just as Platform Computing was introducing its new commercial LSF product, the National Science Foundation (NSF) was launching NMI Release 1.0, a software and tools kit to help scientists implement distributed computing via the Internet. The package of middleware was developed as part of the NSF Middleware Initiative (NMI), launched in September 2001. At the time, the NSF committed $12 million over three years to develop advanced network services that simplify access to a wide variety of Internet services. (NMI Release 1.0 is available at www.nsf-middleware.org.)
The NMI's work is the combined effort of two groups: the Grids Research Integration Deployment and Support (GRIDS) Center and the Enterprise and Desktop Integration Technologies (EDIT) Consortium.
The GRIDS Center itself is a joint effort of the University of Southern California's Information Sciences Institute, the National Center for Supercomputing Applications at the University of Illinois, the University of Chicago, the University of California at San Diego, and the University of Wisconsin. The EDIT group comprises the Internet2 Consortium, the Southeastern Universities Research Association, and an association of universities and corporations called EDUCAUSE.
The package bundles distributed computing software, such as the Globus Toolkit, with security tools. Also included are tools to help a company set up and manage a distributed computing system. Specifically, the software performs functions such as resource discovery, data management, scheduling of online resources, and security. These functions are considered essential to any P2P computing deployment, and until recently, many of these functions had to be cobbled together by researchers.
The NMI recognized that utility and ease of use were key drivers in its initiative. "This is technically challenging because in such systems there is a need to reach various qualities of service when the [distributed computing system] is running on top of the Internet, which might not have the features needed to support those levels of service," says NMI co-principal investigator Ian Foster, a computer science professor at the University of Chicago. He notes that adding the functions that would enable the desired qualities for a distributed system should not interfere with keeping the system simple to deploy and use.
Sidebar: Getting Down to Business
As distributed computing gets adopted more often, companies need more tools, including:
Access control tools that grant or deny a researcher access to a particular computer, storage capacity, or software licenses.
Business management tools that link access control and performance management tools so managers can set priorities for the use of a distributed system.
Software development kits or consulting services to port custom developed applications to the distributed computing environment.
More evidence of coalescence between P2P and distributed computing emerged earlier this year when the Peer-to-Peer Working Group (P2PWG) joined forces with the Global Grid Forum (GGF).
The P2PWG was formed to accelerate the advancement and efficient interaction of P2P computing. GGF participants come from more than 30 countries and 250 organizations, most of which are either academic institutes or government labs and agencies. The group touts itself as a "community-initiated forum" working on distributed computing technologies.
"Convergence of peer-to-peer and grid computing is a natural outcome of recent distributed systems thinking," said Colin Evans, director of distributed systems at Intel Labs, in a statement released by the GGF.
Taking on Web Services
In the past, both the NMI and the GGF saw the Internet as one of many methods for connecting peers in a P2P system. Now, both groups aim to enhance their existing P2P approaches by strengthening their support of Web services technologies.
Web services are increasingly considered the underlying network framework for accessing Internet applications. Virtually all applications, including P2P computing applications, can be Web service-enabled.
The World Wide Web Consortium is responsible for most Web services standards. Several vendors, notably Microsoft, IBM, and Sun Microsystems, are developing Web services-enabling technologies.
Any application enabled to support Web services typically uses a mix of underlying technologies, including XML, the Simple Object Access Protocol (SOAP), the Web Services Description Language (WSDL), and Universal Description, Discovery and Integration (UDDI).
XML and SOAP are used to share and exchange data as well as support communications between different pieces of a program's code, which may reside on different computers. UDDI is essentially used as a directory of Web services, and WSDL is to Web services what HTML is to Web pages — a standards-based language that defines interfaces to Web services.
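To make the division of labor concrete, here is a minimal sketch of a SOAP message wrapping an XML payload, built with Python's standard library. The envelope namespace is the SOAP 1.1 one; the `urn:example:p2p-jobs` namespace, the `SubmitJob` operation, and its fields are hypothetical, invented for illustration rather than taken from any real grid product.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
JOB_NS = "urn:example:p2p-jobs"  # hypothetical application namespace

NAMESPACES = {"soap": SOAP_NS, "job": JOB_NS}


def build_submit_request(job_name, input_file):
    """Wrap a (hypothetical) SubmitJob call in a SOAP envelope and return it as text."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{JOB_NS}}}SubmitJob")
    ET.SubElement(call, f"{{{JOB_NS}}}Name").text = job_name
    ET.SubElement(call, f"{{{JOB_NS}}}InputFile").text = input_file
    return ET.tostring(envelope, encoding="unicode")


def read_job_name(message):
    """Parse the envelope on the receiving computer and pull out the job name."""
    root = ET.fromstring(message)
    return root.find("soap:Body/job:SubmitJob/job:Name", NAMESPACES).text


message = build_submit_request("sequence-search", "targets.fasta")
print(read_job_name(message))
```

In a real deployment, a WSDL document would describe the `SubmitJob` interface and a UDDI registry would advertise where the service lives; neither is modeled here.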
P2P computing is likely to borrow heavily from all of these Web services-related technologies, but there is a catch.
Normal Web services deal with persistent services, or services that are always available. In a P2P computing environment, transient services must also be taken into account. For example, a particular P2P computation might require that a higher-end computer be available. If that specific computer were not available when the task attempted to run, the computation would fail.
Traditional Web services are not designed to take sporadic availability of a resource into account. For instance, a P2P computing application might need an analysis tool's software license to be available when a job runs, or it might require a certain amount of hard disk space to carry out a calculation.
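The kind of check a scheduler would need can be sketched as follows. This is an illustrative model only: the `Node` and `JobRequirements` types, their fields, and the matching rules are assumptions made for the example, not part of any Web services standard or grid toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A peer as seen at one moment; every field can change between jobs."""
    name: str
    online: bool
    cpu_mhz: int
    free_disk_gb: float
    licenses: set = field(default_factory=set)


@dataclass
class JobRequirements:
    min_cpu_mhz: int = 0
    min_free_disk_gb: float = 0.0
    licenses: set = field(default_factory=set)


def can_run(node, req):
    """True only if the node is up and meets every requirement right now."""
    return (
        node.online
        and node.cpu_mhz >= req.min_cpu_mhz
        and node.free_disk_gb >= req.min_free_disk_gb
        and req.licenses <= node.licenses  # all required licenses are present
    )


def pick_node(nodes, req):
    """Return the first eligible node, or None so the caller can requeue the job."""
    for node in nodes:
        if can_run(node, req):
            return node
    return None
```

Returning `None` instead of raising an error is a deliberate choice in this sketch: it lets the caller hold the job until a suitable peer reappears rather than letting the computation fail.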
Accommodating such transient needs in a Web services-enabled P2P computing system is difficult. In February, IBM and the Globus Project teamed up to propose a solution, and they have since suggested specifications that combine the benefits of P2P computing with Web services.
Named the Open Grid Services Architecture (OGSA), the system taps existing Web services standards for data sharing and standard application interface definitions. OGSA also leverages features of the open-source Globus Toolkit, which is considered the de facto standard for many grid protocols and services. In particular, OGSA can make use of Globus' security features and its management of transient grid services.
This summer, Globus will release an alpha version of the Globus Toolkit 3.0 based on OGSA. The new version will include the distributed computing services available in the Globus Toolkit 2.0, along with integration with traditional Web services and with Sun Microsystems' development tool Java 2 Platform Enterprise Edition. It will also support a new file transfer protocol called GridFTP, which Globus says offers more reliable file transfer than standard FTP.
The importance of OGSA to P2P computing has led to the formation of a new group within the GGF, the Open Grid Services Infrastructure working group, to address the associated technical challenges.
"The OGSA will make it possible to develop and integrate applications for distributed computing that move beyond the scientific and technical community to the world of business applications," says Jack Dongarra, a computer science professor at the University of Tennessee.
Ready for Prime Time
True to Dongarra's words, many life science companies turning to P2P computing say their efforts are yielding quantifiable business advantages. For instance, Entelos Inc. recently started using distributed computing software from Platform Computing (see June Bio·IT World, page 32). Alex Bangs, Entelos' chief technology officer, notes that in addition to giving internal researchers more computing power, he plans to leverage the additional computational capacity to support the work of more customers and business partners.
Novartis AG is using distributed computing software from United Devices Inc. to increase its efficiency in the early stages of drug discovery research. Specifically, Novartis plans to use existing desktop computers, running in silico technologies on a distributed platform, to achieve a tenfold increase in the number of targets identified for further investigation.
But as IT managers transform P2P and distributed computing systems from exclusive tools for researchers into corporate instruments to achieve business objectives, they will encounter two intimately related challenges: administration and security.
Administrative tasks often center on scheduling jobs to run on a distributed computing system, and that scheduling may require granting specific researchers or groups specific rights to use particular computers, data storage capacity, or software.
Making computers, storage, and licenses available on an individual, group, or job basis isn't trivial. P2P computing systems must have a way to authenticate users and to control what resources individual users can access.
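Those two requirements, authentication and per-resource access control, can be illustrated with a small sketch. The `AccessController` class and its methods are hypothetical and use only Python's standard library; commercial products implement this differently, and a production system would rely on an existing directory or identity service rather than an in-memory table.

```python
import hashlib
import hmac
import os


class AccessController:
    """Toy model: authenticate users, then check grants made to users or groups."""

    def __init__(self):
        self._users = {}   # username -> (salt, password_hash, set of groups)
        self._grants = {}  # resource name -> set of user or group names

    @staticmethod
    def _hash(password, salt):
        # Salted, iterated hash so plaintext passwords are never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def add_user(self, username, password, groups=()):
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt), set(groups))

    def authenticate(self, username, password):
        record = self._users.get(username)
        if record is None:
            return False
        salt, stored, _groups = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(stored, self._hash(password, salt))

    def grant(self, resource, principal):
        """Allow a user or a group (same namespace in this sketch) to use a resource."""
        self._grants.setdefault(resource, set()).add(principal)

    def is_allowed(self, username, resource):
        record = self._users.get(username)
        if record is None:
            return False
        principals = {username} | record[2]
        return bool(principals & self._grants.get(resource, set()))
```

A scheduler built on this sketch would call `authenticate` when a researcher submits a job and `is_allowed` for each computer, storage pool, or software license the job requests.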
This is an area where commercial distributed computing products have more robust features. Avaki, Blackstone Computing, Entropia, Platform Computing, and United Devices all have recently introduced new systems management tools that target the corporate manager (see May Bio·IT World, page 21).
From a corporate perspective, the key is having a tool that ties the management of the P2P system with a company's business goals. "If a group that uses 1,000 [desktop computers] today suddenly identifies a new biological target in a hot market, you want to be able to quickly shift another 1,000 machines working on something of less importance to the effort," says Martin Stuart, vice president of life sciences at Entropia.
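The reallocation Stuart describes can be sketched as a simple priority rule. The function below is a hypothetical illustration, not Entropia's method: project names, priority numbers, and the "take from the least important project first" policy are all assumptions.

```python
def shift_machines(allocations, priorities, target, count):
    """Move up to `count` machines to `target`, draining lowest-priority projects first.

    allocations: project name -> number of machines currently assigned
    priorities:  project name -> importance (higher means more important)
    Returns (new allocations, number of machines actually moved).
    """
    new_alloc = dict(allocations)  # leave the caller's dict untouched
    new_alloc.setdefault(target, 0)
    remaining = count

    # Only projects less important than the target give up machines.
    donors = sorted(
        (p for p in new_alloc if p != target and priorities[p] < priorities[target]),
        key=lambda p: priorities[p],
    )
    for project in donors:
        if remaining == 0:
            break
        moved = min(new_alloc[project], remaining)
        new_alloc[project] -= moved
        new_alloc[target] += moved
        remaining -= moved

    return new_alloc, count - remaining
```

In this sketch, a group holding 1,000 machines that asks for 1,000 more would receive them from whichever lower-priority projects have machines to spare; if too few are available, the function reports how many it actually moved.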
Beware of Sloppy Security
Traditional security issues are also important. Many companies worry about guarding intellectual property in data shared on a P2P system. And IP issues can exist even if a P2P computing system is deployed within a single company. "You are still putting data out on secretaries' computers," says Raymond Lopez, an independent systems consultant.
The extent of the security problem is often debated. Says one manager, who did not wish to be identified, "In a distributed environment, even if a hacker gets all the data on one computer, it is only a small part of a big puzzle."
Others in the security industry disagree. "It might only be a small part of all the data, but the bigger security issue is that if one computer has been compromised, hackers could get to others," says Bob Bales, CEO and founder of PestPatrol Inc., a security software company that specializes in detecting and removing malicious software. "The security issue here has less to do with an individual getting access to small pieces of data than the fact that the data makes its way back to some central machine. It's possible that an unscrupulous participant might try to hijack the session to gain access to other computers in the distributed network."
Although hijacking a computer in a P2P network without the user's knowledge may seem unlikely, beware: It has already happened.
Brilliant Digital Entertainment, a Woodland Hills, Calif.-based digital advertising company, provides a popular file-sharing program, Kazaa, to anyone who wants it. Kazaa enables users to share and listen to music files, among other things.
However, unwary users who download the program often fail to read the user agreement, which states: "You hereby grant the right to access and use the unused computing power and storage space in your computer for use in distributed computing. The user acknowledges and authorizes this use without the right of compensation."
The Kazaa example, while extreme, illustrates a security weakness in P2P systems that managers must watch. One weapon to protect data and intellectual property is encryption technology.
Encryption can be applied when data passes between computers, when it resides on a computer, or in both cases. For instance, the data used in the Oxford Anthrax research peer-to-peer project (see March Bio·IT World, page 12) were encrypted both in transit and while they resided on the computer doing the calculations.
To some extent, the strength of security measures implemented should correspond to the value of data and the potential damage incurred if proprietary information is disclosed. For example, a public project whose goal is to speed cancer research might not require more than a simple user name and password.
However, as life science companies move more sophisticated computing tasks to P2P platforms, they will need to take a more rigorous approach to managing both performance and security. And as with many areas of the rapidly evolving world of P2P computing, the tools are emerging to make that possible.
ILLUSTRATION BY JIM FRAZIER; PHOTO BY KALIM BHATTI