By Mark D. Uehling
Sept. 9, 2002 | Database die-hards may not be trash-talking professional wrestlers, but they're close.
"We can perform better getting access to Oracle data than Oracle can," says Bill Wong, director of DB2 for Linux and life sciences at IBM Corp. "We have never lost to IBM at a large client since the life sciences started," replies Vijay Pillai, Oracle Corp.'s principal product manager for life science server technology.
One of the most vexing questions for research-oriented customers is sorting out which database is best in this proteomic age. IBM and Oracle have been butting heads in databases for years. What's new is that, despite being an unofficial standard in the pharmaceutical industry, Oracle now seems vulnerable in the life sciences. Consultants who work with both companies say IBM has informatic and human capital specific to the life sciences that could eventually force Oracle to play a less prominent role.
"IBM has basically caught Oracle in market share," says Herb Edelstein, president of Two Crows Corp., a database consulting firm with major life science clients. "The whole database business has shifted from being a system sale, governed by IT folks, to an end user sale, governed by the consumers of the data. The techniques that work so well for building a data warehouse of market and product information don't necessarily translate to a genomics or proteomics database."
IBM on the Move
Edelstein praises DiscoveryLink, IBM middleware that allows users to query heterogeneous databases all over the world. There are now more than a dozen software wrappers that allow information in a variety of files or formats -- whether BLAST, SQL Server, Oracle, Informix, Excel spreadsheets, Sybase, Documentum or flat files -- to be searched via a single DiscoveryLink query. Says Edelstein: "What DiscoveryLink is trying to do is solve a business problem that is very difficult and that no company concerned with building drugs can or should be solving for themselves."
Such comments puzzle Oracle's Pillai. For him, DiscoveryLink is old wine in new bottles. "I thought a few years ago [DiscoveryLink] was going to be a big threat," Pillai says. "Then I looked under the covers." What he found, he says, was more or less Relational Connect, a paleolithic database migration package from IBM that is roughly comparable to Oracle's Gateway software. "If you want to query four databases around the world and make it look transparent, Oracle has that capability today," says Pillai.
As the man who managed the Celera Genomics account at Oracle for three years, Pillai is intimately familiar with some of the most monumental scientific databases ever created. He says most researchers don't need to manipulate such mammoth files on a regular basis, or query multiple uniquely structured databases at the same time -- a "federated" database approach. "If I am trying to do a cross-species analysis, trying to find where each gene is expressed or not expressed in each of those species, those kinds of things cannot be done using a federated environment," Pillai says.
A 'Consolidated' Environment?
But at a June meeting in Cambridge, Mass., sponsored by the Whitehead Institute Center for Genome Research and Hewlett-Packard Co., Pillai had a different emphasis. He was skeptical of the trend toward proliferating databases. "We see more and more databases. Every drug has its own database," Pillai says. "It may be good from a license standpoint, but it's not good for your environment. You have to use [the database] the way it was supposed to be used. There may be problems with performance. We are trying to move our customers to a more consolidated environment."
Pillai declined to elaborate, but IBM is delighted to interpret. IBM contends some Oracle customers are aghast that the company is recommending they move all of their data into one Oracle database. When slow performance is an issue, some are apparently told to manually partition large Oracle databases into smaller, more easily queried subcomponents. "The way that Oracle is telling customers to do it is unacceptable," says IBM's Wong. "We're not going to insist that you put everything together into one database."
When it comes to using multiple databases, Structural Bioinformatics Inc. is a case study. CEO Ed Maggio is developing drugs, but he can rattle off a variety of company databases licensed to biotech and pharmaceutical rivals. All are gargantuan repositories of computer-generated and experimentally verified structural data about proteins. His company maintains its data on both DB2 and Oracle 8 platforms. IBM has invested in his company, but he speaks highly of both companies.
Still, Maggio believes IBM has an edge. "There is no way you can require literally thousands of users to adhere to a single standard for storing information and retrieving it," he says. "What IBM has recognized is that people are going to generate and store data, and they're going to do it in a variety of ways. What's really important is the ability to integrate the use of this data. That's where the power resides. It isn't going to happen by forcing people to basically homogenize their database."
The aversion to proprietary formats is shared at the Biomolecular Interaction Network Database (BIND). That massive, public proteomics database is run by Chris Hogue, senior scientist at Toronto's Samuel Lunenfeld Research Institute. BIND is at the frontier of current databases: It will have 30 or 40GB online by year's end, primarily of protein-protein interactions and small molecule structures, all exported in XML or ASN.1 formats.
Hogue says he could have built the BIND database using DB2 or Oracle. When he chose DB2 in 2000, it looked like Oracle would scale to only eight Linux boxes. Hogue was planning to use more than 100. "There's a huge difference in performance," he says. "We went with DB2 because it would cluster on all 108 machines on my system. DB2 has always been cluster-enabled. Oracle is kind of a retrofit."
The Oracle Standard
Which is not to say Oracle customers are unhappy. Bob Bradish is senior database administrator at AstraZeneca Group. He has big genomic databases on Oracle, and they're running fine, thanks. Oracle is his company's official standard. "We are to use Oracle for any new or existing application unless that application or product we're purchasing requires a very specific database that cannot run on Oracle," Bradish says.
AstraZeneca does support DB2, but any meaningful migration away from Oracle is almost unthinkable. "Once you get a database in, even a small database, it is very difficult to shift to another database, especially when you have an application that is running fine," says Bradish. "You can't get your user groups to buy in and pay for a conversion."
So the question is not whether Oracle databases will be ripped out. It's whether Oracle will continue to be the star at the center of the bioinformatic universe or whether it will have to share more of the glory. What scientists demand could be key. If researchers insist on Oracle-friendly analytical software tools from companies like MDL Information Systems Inc., Tripos Inc., Accelrys Inc., and Daylight Chemical Information Systems Inc., even small biotech firms will choose Oracle and feel in sync with Big Pharma peers. If scientists must access multiple databases routinely, however, DB2 could continue to expand its market share at Oracle's expense.
Either way, IBM may spot key developments first. Database users and consultants often cite IBM's life science staff, heavily sprinkled with Ph.D.s, as a reason to feel more comfortable with DB2 in the years ahead. That talent could allow IBM to be more closely attuned to what scientists require. "My impression is that IBM has made a larger and more organized investment in life sciences, and that Oracle is revving up," notes Richard Winter, president of Winter Corp., a specialist in jumbo-sized databases. "It's an important question to ask: Who has more focus on the market, and who is investing more resources in anticipating needs?"
Sidebar: Life Left in Those Other Databases
Oracle and IBM are not the only game in life sciences databases. Glaxo Wellcome and Wyeth use Sybase, for example, to manage marketing and sales tasks. MySQL has a certain following as well. As a graduate student, Harvard University's Chris Dagdigian loaded Swiss-Prot into a MySQL database in a single day; that database is in use today at Harvard's Bauer Center for Genomic Research.
Indeed, the demands of very large databases -- especially with mushrooming genomic data -- mean there is room for new ideas in the field. "There is an opportunity for someone to do something different here," says Mike Swenson, a life sciences analyst at IDC.
At the Winter Corp., consultant Richard Winter notes that there is another company that bears watching, even if its life science strategy has yet to be articulated formally. "Microsoft is a huge factor in the database space," he says. "But at this point it is mostly mid- to low-end applications that are implemented on SQL Server. They have been strengthening SQL Server. They're investing at a substantial pace. I believe there soon will be announcements from HP or other players for very large, robust platforms for SQL Server. It will give Oracle and IBM a run for their money in the mid range."
One win for Microsoft is an Israeli company, Rosetta Genomics, which does genomics data analysis. Using off-the-shelf dual-processor Dell servers and SQL Server, Rosetta plans to build a 10-terabyte database within a year.