By Salvatore Salamone
May 7, 2002 | Corporate IT networks are like officials in sporting events. They go unnoticed if everything is OK. But if a problem arises, all attention is focused on them. Any complaint — slow data retrieval, inability to share information with another researcher, lack of access to an application — puts an unwanted spotlight on the network and the staff that manages it.
That is why network administrators spend a lot of time and effort designing and maintaining networks. The goal is to give users access to whatever networked computing resources they need, and at the same time make sure networks can accommodate the traffic generated by the applications researchers are using.
These tasks are formidable in any corporate setting. But life science applications have two major traits that pose specific networking challenges not typically encountered in most corporations, or even in some other sciences.
[Figure: A Basic Network Infrastructure. A look at a typical life sciences company network.]
First, there is the huge amount of data in life science research. Naturally, data must be stored, retrieved, and moved in an efficient and timely manner. This has a profound impact on the way a life science network is designed.
Second, many life science applications require their voluminous data to be an intimate part of calculations, something that is not common in other sciences. For instance, in many physics applications data are used to baseline a computational experiment, and additional data are not required at every computational step of the way.
"When supercomputers are used in other sciences, they tend to have a lot of data in memory — archived data," says Bill Blake, vice president of high-performance technical computing at Compaq Computer Corp. "These applications are not used to the flood of data that is the norm in life sciences. In the life sciences, you have supercomputing combined with high data throughput."
Blake notes that applications in other sciences often require data about some initial state of a system. This information is stored in memory, and calculations are performed that simulate how a system behaves from that starting point. For example, an atmospheric modeling program might start with a complete set of temperatures, winds, and particle densities and calculate how these properties change with time under certain atmospheric conditions.
With life science applications, new data are constantly being pulled into calculations. For example, a virtual screening application that performs 3-D molecule fitting requires that billions of candidate molecules be used as input to the computer model that simulates the real-world biology.
How does this impact a network? Plenty. In most life science companies, computer networks are ever-evolving beasts. The challenge for the networking staff is to keep ahead of constantly changing bottlenecks that limit performance.
Inventorying the Network
To understand the intimate role networking technology plays in life science research, it is necessary to look at the common elements that comprise every network. At the simplest level, there are really only three elements: the research scientist's workstation, from which data are viewed and programs are created and launched; the high-performance computer on which calculations are performed; and the storage devices that hold the data.
Obviously, these three elements need some way to communicate, which is where networking enters the picture. Here's a typical setup:
Workstations connect to the network using what is called a network interface card (NIC). A cable connects the card to a wall socket near a researcher's desk.
A cable then runs behind the wall from the wall jack to a wiring closet where the core of the networking infrastructure, called a wiring hub or switch, is located. The cable plugs into a port in the wiring hub.
Similarly, the high-performance computers and storage devices also connect to the wiring hubs and switches (see diagram above).
It is quite common for the hubs, switches, and NICs that link the workstations to use either Ethernet or Fast Ethernet connections. Ethernet and Fast Ethernet support data transfer rates of up to 10Mbps and 100Mbps, respectively.
In many cases, the servers and high-performance computing platforms that perform the scientific calculations and manage the databases connect using either Fast Ethernet or Gigabit Ethernet, which supports data transfer rates of 1Gbps.
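To put those line rates in perspective, here is a rough Python sketch of the best-case time needed to move a data set across each type of link. The 1 GB data set size and the function name are assumptions chosen for illustration, and the calculation ignores protocol overhead and contention, so real transfers will be slower.

```python
# Nominal line rates cited in the article, in bits per second.
LINK_RATES_BPS = {
    "Ethernet": 10_000_000,
    "Fast Ethernet": 100_000_000,
    "Gigabit Ethernet": 1_000_000_000,
}


def transfer_seconds(size_bytes, rate_bps):
    """Best-case seconds to move size_bytes over a link running at rate_bps.

    Assumes the link is otherwise idle and ignores all protocol overhead.
    """
    return size_bytes * 8 / rate_bps


# Hypothetical 1 GB data set (an assumption, not a figure from the article).
dataset_bytes = 1_000_000_000

for name, rate in LINK_RATES_BPS.items():
    print(f"{name}: {transfer_seconds(dataset_bytes, rate):.0f} seconds")
```

Under these assumptions, the same data set that ties up an Ethernet link for about 800 seconds needs about 80 seconds on Fast Ethernet and about 8 seconds on Gigabit Ethernet.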
The trend with networks in life science companies is to extend connectivity to more and more people. Life science networking expenditures were $276 million in 2000, according to Debra Goldfarb, group vice president of worldwide systems and life science research at International Data Corp. (IDC), a market research firm. But Goldfarb says spending is "expanding significantly as sites grow their internal and external networking infrastructure to support greater collaborative activities and integrate Web services more directly into the R&D process."
Goldfarb says there will be an increasing demand for gigabit or higher-speed switched infrastructure during the next few years, and that demand will result in a 35 percent compounded annual growth rate in the spending by life science companies on networking products. IDC estimates that life science companies will spend $1.6 billion on networking equipment in 2006.
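As a quick sanity check on those figures, the short Python sketch below compounds the $276 million spent in 2000 at 35 percent a year and also works backward from the $1.6 billion estimate. It assumes the growth rate applies over the full six years from 2000 to 2006, which the IDC figures quoted here do not state explicitly; the function names are illustrative.

```python
def compound_growth(start, annual_rate, years):
    """Value reached after compounding start at annual_rate for the given years."""
    return start * (1 + annual_rate) ** years


def implied_cagr(start, end, years):
    """Compound annual growth rate that takes start to end over the given years."""
    return (end / start) ** (1 / years) - 1


spending_2000 = 276e6   # life science networking spending in 2000, per IDC
estimate_2006 = 1.6e9   # IDC estimate for 2006
years = 6               # assumption: growth runs from 2000 through 2006

projected = compound_growth(spending_2000, 0.35, years)
rate = implied_cagr(spending_2000, estimate_2006, years)

print(f"Projected 2006 spending at 35 percent growth: ${projected / 1e9:.2f} billion")
print(f"Growth rate implied by the 2006 estimate: {rate:.1%}")
```

Under that assumption, 35 percent annual growth yields roughly $1.7 billion by 2006, and the $1.6 billion estimate implies a rate of about 34 percent, so the two figures are consistent with each other.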
Closely tied to the need for higher-speed networking infrastructure is the issue of storage.
"We're collecting more and more data," says Manuel Glynias, senior vice president of business at LION Bioscience AG. He notes, as many others in the industry have, that the mushrooming amount of data poses a growing challenge. For instance, at the National Center for Biotechnology Information in Bethesda, Md., discovery-relevant data doubles every six to nine months.
The growth of genomic, and soon proteomic, databases is driving storage. By 2006, life science companies will spend $11.8 billion for storage hardware and software, which will represent the largest single element of bio-IT spending, according to IDC.
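To see what a doubling time of six to nine months means in practice, the following Python sketch converts it into a growth multiple over a given period. The one-year and four-year horizons are assumptions picked for illustration, and the function name is hypothetical.

```python
def growth_factor(months_elapsed, doubling_months):
    """Multiple by which data volume grows if it doubles every doubling_months."""
    return 2 ** (months_elapsed / doubling_months)


# Doubling times cited for discovery-relevant data: six to nine months.
for doubling_months in (6, 9):
    one_year = growth_factor(12, doubling_months)
    four_years = growth_factor(48, doubling_months)
    print(
        f"Doubling every {doubling_months} months: "
        f"{one_year:.1f}x in one year, {four_years:.0f}x in four years"
    )
```

At the fast end of that range, data volume quadruples every year; at the slow end it grows about two and a half times a year. Either way, storage and the network links that feed it must be sized for growth well beyond the present load.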
Explosive growth in the volume and value of data is not unique to life sciences. In fact, providing access to large databases is changing the way networks are designed and built in companies in many different fields.
Storage Demands Shape Networks
The increased demands for continuous availability and use of applications, as well as the ability to access increasing computing capabilities and storage, are driving a new network architecture, according to In-Stat/MDR, a networking industry market research firm. The total market for backbone network solutions will have a compound annual growth rate of more than 140 percent over the next four years, reaching $173.3 billion in 2006, according to In-Stat/MDR.
"[Storage area networks] are at the core of this new backbone network trend," says Lauri Vickers, manager of In-Stat/MDR's LAN group. Storage area networks (SANs) essentially are dedicated high-speed networks designed expressly to speed the retrieval of data from large databases.
In addition to hardware to store and retrieve data, companies will need a way to efficiently manage the data they have stored on their networks. "Companies are trying to rein in the costs of managing storage, and they need to protect the information assets they own," says David Hill, research director for storage and storage management at the Aberdeen Group Inc. consultancy. To that end, storage management software spending worldwide will grow to $21.2 billion for all companies by 2005, according to the Aberdeen Group.
Clearly, life science companies are on the leading edge of networking infrastructure trends thanks in part to the demands created by the huge volume of data that researchers are generating and manipulating in their calculations.
The challenge during the next few years will be how to keep ahead in the game by building higher-speed networking infrastructures to handle the data.
As part of Bio·IT World's mission to help life scientists and IT professionals better understand each other's worlds, we will occasionally publish subject primers — our "101" guides — such as this one on networking. Salvatore Salamone, Bio·IT World's senior editor for information technology, is both an experienced IT journalist and an established networking author and expert. His books include The Complete Guide to VPNs (published by InternetWeek, 1999), LAN Times Guide to Managing Remote Connectivity (Osborne McGraw-Hill, 1997), and Reducing the Cost of LAN Ownership (Van Nostrand Reinhold, 1995, co-written with Greg Gianforte). If you have networking questions, we invite you to go to our Web site, www.bio-itworld.com, and submit questions to Salamone as part of an ongoing program to make experts in many areas periodically available to answer your questions.