By Salvatore Salamone
May 13, 2003 | After trying a traditional relational database management system and an object-oriented database, the Center for Computational Pharmacology (CCP) at the University of Colorado at Denver opted for neither. It took the less-traveled road and selected a native XML database architecture, a move with some benefits and some pitfalls.
The process leading to the decision began about two years ago. The CCP set out to build a Web site that would let researchers in the National Institutes of Health’s Integrated Neuroscience Initiative on Alcoholism program share data and perform gene expression analysis. The challenge was finding a suitable backend database to support the work. “We wanted to be able to exchange data with other labs, and we wanted this task to be as simple as possible,” says Ron Taylor, director of gene expression analysis at CCP.
Part of the difficulty was that no one knew the format of all the data the site would need to support. “We believed the lab would need to accommodate many new types of data over the five-year life of the project,” Taylor says.
Relational database management systems (RDBMSes) make that a common problem, because they require the structure of a database to be defined in advance. A researcher or programmer must set up specific categories for each type of data, and any associations between those categories must be spelled out before any data can be entered.
Predefining the structure of a database works only when you know exactly what type of data needs to be supported. The CCP staff did not have that luxury. They knew only some of what the data would be: gene expression results from microarrays, for example.
Data from the Future
But they did not know every format and data type that would need to be accommodated. Besides gene expression data, CCP officials wanted the system to manage future information needs as well, such as storing data on protein levels and signaling pathways in the brain.
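To make that constraint concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a relational system; it is not the CCP's actual setup. The table name, columns, gene and sample identifiers, and values are all hypothetical. A table declared for gene expression results rejects a protein-level record its schema did not anticipate, until the structure itself is altered.

```python
import sqlite3

# Hypothetical schema: the columns are declared up front, before any data
# is entered, as a relational system requires.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (gene TEXT, sample TEXT, level REAL)")

# A microarray result fits the predefined structure (illustrative values).
conn.execute(
    "INSERT INTO expression (gene, sample, level) VALUES (?, ?, ?)",
    ("Gabra1", "cortex-01", 2.7),
)


def try_insert(conn, sql, params):
    """Run one INSERT; return None on success or the database error message."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.OperationalError as err:
        return str(err)


PROTEIN_SQL = "INSERT INTO expression (protein, sample, level) VALUES (?, ?, ?)"
PROTEIN_ROW = ("GABRA1", "cortex-01", 0.8)

# A protein-level measurement was not anticipated when the schema was
# defined, so the database refuses it.
rejected = try_insert(conn, PROTEIN_SQL, PROTEIN_ROW)
print(rejected)  # table expression has no column named protein

# The structure has to be changed before the new kind of data can go in.
conn.execute("ALTER TABLE expression ADD COLUMN protein TEXT")
accepted = try_insert(conn, PROTEIN_SQL, PROTEIN_ROW)
print(accepted)  # None
```

Each new kind of data means another structural change like the `ALTER TABLE` step, made before that data can be entered.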
So Taylor tried an object-oriented database, which offered more flexibility in supporting a variety of formats. But when he tried to load a protein database, the upload took several days. Indexing the items in the database was also difficult, so this approach was quickly ruled out.
While this research was going on, Harvey Greenberg, director of the Center for Computational Biology (CCB) at the University of Colorado at Denver, was looking for corporate partners and industry alliances to further the university's work. (The CCP is one of the institutions within the CCB.) He met with NeoCore, a local company that sells a native XML database system called NeoCore XMS, and the product turned out to be a good fit for the CCP’s database needs.
“A native XML database matched our database model better than an RDBMS,” Taylor says. “It allows a more natural match-up between what we put in and what we take out of the database.” Additionally, NeoCore XMS supported XML query languages such as XPath and XQuery, which are commonly used in informatics research.
Virtually all the data the CCP was using, and planned to use, was already in XML format, so a native XML database as the backend for the CCP site would make it easier to move experiment results online. For example, the protein database that took several days to upload and index into Taylor’s object-oriented database took only 45 minutes to load into the native XML database.
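The idea behind storing XML as-is and querying it with path expressions can be sketched with Python's standard `xml.etree.ElementTree` module as a stand-in. This is not NeoCore XMS's interface, which the article does not describe, and ElementTree supports only a limited subset of XPath and no XQuery. The element names, attributes, and values below are hypothetical: two records of different shapes sit side by side without any schema being declared first.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML records with different structures. Neither shape had
# to be declared to the store before loading (illustrative values).
RECORDS = [
    """<experiment type="microarray">
         <sample id="cortex-01"/>
         <gene symbol="Gabra1"><expression>2.7</expression></gene>
         <gene symbol="Gabra2"><expression>1.4</expression></gene>
       </experiment>""",
    """<experiment type="proteomics">
         <sample id="cortex-01"/>
         <protein name="GABRA1"><level>0.8</level></protein>
       </experiment>""",
]

# A list of parsed documents stands in for the database.
store = [ET.fromstring(doc) for doc in RECORDS]


def values_at(store, path):
    """Evaluate one ElementTree path (an XPath subset) against every stored
    document and return the matching element texts as floats."""
    return [float(node.text) for doc in store for node in doc.findall(path)]


print(values_at(store, ".//gene[@symbol='Gabra1']/expression"))  # [2.7]
print(values_at(store, ".//protein[@name='GABRA1']/level"))      # [0.8]
```

The same query function works on both record shapes; adding a third kind of document would need no change to the store, only a new path expression.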
Use of native XML databases is relatively new. But many see such databases playing a major role in bio-IT. “As we become more aligned with other disciplines, [a native XML database] makes it easier to incorporate data from these other disciplines,” Greenberg says.
Is a native XML database for everyone? The short answer is no.
In cases like the CCP’s, where there would be many varied forms of XML data, a native XML database makes sense. “But an RDBMS can handle XML if the XML document is very rigid,” says Ronald Schmelzer, a senior analyst with the XML and Web-services research firm ZapThink.
Schmelzer adds that RDBMSes can’t simply be dismissed. “RDBMSes have lots of features that most native XML systems don’t,” he says. For instance, many relational systems run on a wide range of operating systems, support data-protection services such as backup, and have built-in resiliency features such as workload balancing and failover when a server goes down.
Another word of caution about adopting such new products: watch the vendors’ finances. NeoCore closed its doors in March, and CEO Ric Miles is looking for funding to resurrect the company. For now, the CCP is sticking with its native XML database.