By Salvatore Salamone
June 12, 2002 | Much of the data collected in drug research efforts and clinical trials never changes. Yet it must be retained for more than a decade, which makes it a storage-management headache.
To address this challenge, EMC Corp. has introduced Centera, a storage product line optimized to more efficiently manage large volumes of permanent data over long periods of time. Centera takes a fresh approach by using a new storage architecture that intimately links data storage with the applications used to create, view, and store data.
In a traditional storage system, whenever a file is moved to a new storage device, a network administrator or the application's user has to update the file's network address within the application.
With Centera, the only thing an application needs to find any data file is a unique identifier called a Content Address, which is generated the first time the file is stored and does not change when a file is moved. The Content Address relieves the application of keeping track of the data file's physical location.
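The difference between location-based and content-based addressing can be illustrated with a toy store. This is a minimal sketch, not EMC's Centera API: the class and method names (`ContentAddressedStore`, `store`, `retrieve`, `migrate`) and the device names are hypothetical, and deriving the address from a SHA-256 digest of the file's contents is an assumption made for illustration. The application keeps only the address; the store privately tracks which device currently holds the bytes.

```python
import hashlib


class ContentAddressedStore:
    """Toy content-addressed store (hypothetical; not EMC's Centera API).

    Applications hold only a content address. The store keeps a private
    map from address to the device currently holding the bytes, so moving
    data between devices never changes what the application holds.
    """

    def __init__(self):
        # device name -> {content address: bytes}
        self.devices = {}
        # content address -> name of the device currently holding the object
        self.location = {}

    def store(self, data: bytes, device: str) -> str:
        # Assumption: the address is a SHA-256 digest of the content itself.
        address = hashlib.sha256(data).hexdigest()
        self.devices.setdefault(device, {})[address] = data
        self.location[address] = device
        return address

    def retrieve(self, address: str) -> bytes:
        # The caller supplies only the address; the physical location is looked up internally.
        return self.devices[self.location[address]][address]

    def migrate(self, address: str, new_device: str) -> None:
        # Move the bytes to another device; the address the application holds is unchanged.
        old_device = self.location[address]
        data = self.devices[old_device].pop(address)
        self.devices.setdefault(new_device, {})[address] = data
        self.location[address] = new_device


store = ContentAddressedStore()
addr = store.store(b"sequencing run results", device="array-A")  # hypothetical device names
store.migrate(addr, new_device="array-B")
print(store.retrieve(addr))  # the same address still finds the data after the move
```

Only the store's internal `location` map changes during a migration, which is the property the article attributes to the Content Address.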
This simplifies long-term storage management, according to EMC. Over time, files are likely to be moved to different storage drives or devices. “When we’re talking long periods of time, the physical media holding the data might not last, and even if it does, the format the data is stored in may become outdated,” says Barry Burke, director of integrated solutions in networked storage marketing at EMC.
In for the Long Haul
The issue of managing permanent data is just starting to get some attention.
No specific studies have determined what percentage of life science data does not change. Across all markets, however, 75 percent of all new digital data is fixed content, according to Hal R. Varian, dean of the School of Information Management and Systems at the University of California at Berkeley.
Anecdotally, the 75 percent figure seems reasonable in life science applications.
Most research and development experiments generate lab results that are simply kept on file somewhere. For example, sequencing data is retained for future annotation and comparisons. And in clinical trials, patient information such as X-rays, medical histories, and drug reactions is collected once and never modified.
For such long-term storage “there are lots of problems with tape and optical storage systems,” said Varian during a Webcast interview on storage issues earlier this year. “One of the biggest problems is the formats keep changing. And whenever you have a change in format, you have a big problem with data migration.”
“It’s easier to have the data available on hard drives, because migrating becomes a much smaller problem,” Varian said. Centera manages such migrations.
EMC is not the only game in town when it comes to improving the management of large volumes of fixed-content data. Network Appliance Inc. is working on the problem but takes a somewhat different approach.
Network Appliance offers its NetCache and NearStore product lines, which complement the company's high-availability, fast-access storage systems, the NetApp Filers.
NetCache moves data closer to the people who use it by caching it locally after the first retrieval. All requests for data and files go through NetCache, which checks to see if the desired information is available locally. If it is, the request is answered on the spot. If not, the request is forwarded to the central data repository.
Without caching, if 100 workers in a branch office want to view the same file, it must be pulled from the central repository 100 times. With NetCache, the file crosses the network once, and the remaining 99 requests are answered from the local copy, saving corporate bandwidth.
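The request flow and the 100-worker arithmetic described above can be sketched as a simple read-through cache. This is a hypothetical illustration, not NetApp's implementation or API: `CentralRepository`, `BranchCache`, and the file name `protocol.pdf` are invented for the example. The sketch also omits cache expiration and invalidation, which a real product would need but which matter less for fixed content.

```python
class CentralRepository:
    """Stand-in for the central data store; counts how often it is hit (hypothetical)."""

    def __init__(self, files):
        self.files = files
        self.fetches = 0

    def fetch(self, name):
        self.fetches += 1
        return self.files[name]


class BranchCache:
    """Read-through cache modeled on the article's description of NetCache (hypothetical)."""

    def __init__(self, origin):
        self.origin = origin
        self.local = {}

    def get(self, name):
        if name in self.local:          # available locally: answer on the spot
            return self.local[name]
        data = self.origin.fetch(name)  # otherwise forward to the central repository
        self.local[name] = data         # and keep a local copy for later requests
        return data


# 100 branch-office workers request the same file through the cache.
repo = CentralRepository({"protocol.pdf": b"..."})
cache = BranchCache(repo)
for _worker in range(100):
    cache.get("protocol.pdf")
print(repo.fetches)  # 1

# The same 100 requests sent straight to the repository, with no cache.
no_cache_repo = CentralRepository({"protocol.pdf": b"..."})
for _worker in range(100):
    no_cache_repo.fetch("protocol.pdf")
print(no_cache_repo.fetches)  # 100
```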
Infrequently accessed data that must remain online can be managed by NearStore, which Network Appliance bills as a lower cost solution than NetApp Filers.