By Mark D. Uehling
October 15, 2003 | All Jake Chen wanted to do was find similar pairs of proteins in yeast and humans. As principal bioinformatics scientist at Myriad Proteomics, he does not mince words about one of his most indispensable tools. "Using BLAST in the traditional way can be a laborious process and a data management hell," Chen says, ticking off half a dozen steps of copying, pasting, and file creation as he switches between a variety of applications, databases, and Web sites.
Now, he says, he's found a better way: the next release of Oracle's database, 10g. Much of the work could occur within the Oracle environment. "Now what seemed quite daunting in the old paradigm seems extremely easy. It's simply a join between four tables. We [at Myriad] demand a richer and richer syntax and biological support with the database management system. Oracle 10g can benefit our scientific discovery in protein interactome studies."
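To make Chen's "join between four tables" concrete, here is a minimal sketch. The article does not say which four tables he joins, so the schema (yeast_protein, human_protein, blast_hit, human_annotation), the column names, and the score threshold are all hypothetical. The sketch uses Python's built-in sqlite3 module as a stand-in for a relational database; it does not use Oracle 10g or its built-in BLAST functions, and it assumes the similarity scores were already computed and loaded into a table.

```python
import sqlite3

# Hypothetical schema: none of these table or column names come from
# Myriad or Oracle. They only illustrate a four-table join.
SCHEMA = """
CREATE TABLE yeast_protein (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE human_protein (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE blast_hit (yeast_id INTEGER, human_id INTEGER, bit_score REAL);
CREATE TABLE human_annotation (human_id INTEGER, description TEXT);
"""

# One query replaces the copy, paste, and file-shuffling steps:
# hits link yeast proteins to human proteins, and the annotation
# table adds a description for each human match.
QUERY = """
SELECT y.name, h.name, b.bit_score, a.description
FROM blast_hit AS b
JOIN yeast_protein AS y ON y.id = b.yeast_id
JOIN human_protein AS h ON h.id = b.human_id
JOIN human_annotation AS a ON a.human_id = h.id
WHERE b.bit_score >= ?
ORDER BY b.bit_score DESC
"""


def build_demo_db():
    """Create an in-memory database with placeholder rows (invented values)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO yeast_protein VALUES (?, ?)",
        [(1, "yeast_A"), (2, "yeast_B")],
    )
    conn.executemany(
        "INSERT INTO human_protein VALUES (?, ?)",
        [(10, "human_X"), (11, "human_Y")],
    )
    # Scores are made up for illustration, not real BLAST output.
    conn.executemany(
        "INSERT INTO blast_hit VALUES (?, ?, ?)",
        [(1, 10, 350.0), (2, 11, 700.0), (1, 11, 20.0)],
    )
    conn.executemany(
        "INSERT INTO human_annotation VALUES (?, ?)",
        [(10, "placeholder annotation X"), (11, "placeholder annotation Y")],
    )
    return conn


def similar_pairs(conn, min_score):
    """Return (yeast name, human name, score, annotation) rows above min_score."""
    return conn.execute(QUERY, (min_score,)).fetchall()


if __name__ == "__main__":
    for row in similar_pairs(build_demo_db(), 100.0):
        print(row)
```

The point of the sketch is the shape of the query: once sequence-similarity results sit in a table next to the protein and annotation data, the cross-species comparison becomes a single SQL statement rather than a chain of exported files.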
10g is the next edition of Oracle's database, and it was announced in September at OracleWorld, the database company's annual trade show and pep rally in San Francisco. BLAST algorithms from the National Center for Biotechnology Information (NCBI) have been incorporated directly into 10g -- and that is not the only functionality specifically targeted toward the life sciences. Also being added are statistical tools and cutting-edge bioinformatic algorithms such as support vector machines (SVMs), used for the analysis of large data sets. Much of that was specifically requested by life science heavies like the Whitehead Institute, where Pablo Tamayo, a consulting member of technical staff, works half-time for Oracle.
Although CEO Larry Ellison is not quite able to articulate a coherent life science strategy different from, say, the company's financial services strategy, his lieutenants are salivating at the prospect of ever-larger quantities of scientific data. "We're trying to make it easier to get the data into the database, to give you more and more things to do once it's in there, and give you fewer excuses to ever have to take it out," says Charlie Berger, Oracle's senior director, product management, life sciences.
One of the biggest changes to Oracle's next database is an evolution of distributed computing features built into the company's current database, 9i. Oracle is promoting "grid" computing and attempting to redefine the term. The company is relegating what it calls "scientific" grid computing to a few large U.S. government labs. What Oracle says is new is "enterprise" grid computing for the corporate world -- dynamically and automatically allocating additional computational resources to any garden-variety server-based application (from Oracle, SAP, or Siebel) that requires it.
Benny Souder, vice president of distributed computing at Oracle, notes that many big applications and computational resources sit idle, awaiting the end of a fiscal quarter or a payroll period. "I want to be able to say, Emily's on the verge of discovering a new drug -- take Charlie's computer and Mark's computer and everybody else's computer and the server and let's apply it to the problem," Souder says. "And let's get the answer in four hours. That's the kind of capability we're talking about."
Ellison is no less grandiose. He did not demo the grid features or the database or disclose pricing; he also declined to name a date by which the product would ship. But he said his new database would run 10 times faster at one-tenth the cost. In a briefing with the press, Ellison made it clear he's been trying to do grid computing for more than a decade. "We started this clustering quest 14 years ago," he said. One of his inspirations, he added, was a product known as Sysplex, an obscure clustering technology that runs on IBM mainframes.
A key part of the hypothetical savings could be fewer database administrators (DBAs). Here Ellison is treading on slippery ground. The DBA is part of the company's well-entrenched market position. Knowing both Oracle and a particular company's implementation of it allows selected DBAs to be paid more than the president of the United States. For the CIO considering an upgrade to the company's database, Oracle is unapologetically promising savings by building in automated administrative and management functions. "It's not surprising that our industry matures and becomes less labor intensive," Ellison says.
Oracle's partners echoed that theme. "Our industry is way too complex," Sun Microsystems CEO Scott McNealy told the OracleWorld audience. "It's too expensive to deliver the services we're delivering today. There is an order of magnitude way too many employees delivering functionality that we're delivering."
HP: We Do Grids, Too
Another big, struggling company, HP, also sent its highest executive. CEO Carly Fiorina put her own spin on the grid: "When I hear other people preach about the path to grid and say that standardization is the path, my hype meter goes off," she said. "The grid is about a whole lot more than a single rack and a lot of servers. We are focused squarely on the management and execution of grid services. This is where we intend to make a major contribution."
Ironically, for all the professed solidarity of Sun and HP -- and the companies are working closely with Oracle -- it is much cheaper hardware that seems to be driving the shift to the grid. As racks of cheap Intel or Linux boxes continue to drop in price and become available in blade form factors, Oracle is betting that people like Matthew Cockerill, of BioMed Central, will continue to use Oracle.
As Cockerill explained at OracleWorld, BioMed Central has some 70GB of data online and 250,000 users. For him, the combination of XML and Oracle simplifies serving up data to 15,000 simultaneous users: The XML tags can show every strange German or Scandinavian character, and enable text searches on the full name of a journal or an abbreviated name. Keeping everything in Oracle, he said, simplified the management of the mountain of articles, images, PDFs, and movies on the site. There are no discrepancies between a database and a file system; instead, everything is in a single data repository.
What Do Scientists Want?
Of course, it's by no means clear whether people outside Oracle are as excited by the grid. Even Oracle's handpicked speakers conceded as much in unguarded moments. Peter Smith, director of discovery research applications at Wyeth Research, is a fan of Oracle and disparaged the storage of critical data in a nonauditable file like an Excel spreadsheet. But it's not clear when he'll migrate to 10g. "The cost to move from 8 to 9 is huge," Smith said. "We have to validate everything again."
In part, Oracle's strategy seems designed to have scientists using tools from third-party developers -- Rosetta Biosoftware, Spotfire, MDL, Accelrys, Applied Biosystems, Agilent, and others -- to gather and analyze data. That scenario is plausible: As the pressure to streamline the scientific workflow increases, the manifold inefficiencies could become more apparent to scientists and the people who pay their salaries.
Still, the complexity of Oracle and the learning curves associated with applications built on top of Oracle should not be underestimated. "Will the scientists use Oracle?" asks Srinivasan Seshadri, CEO of Strand Genomics. "I'm not convinced."