Ellison Charts Course for 10g


By Mark D. Uehling

October 15, 2003 | All Jake Chen wanted to do was find similar pairs of proteins in yeast and humans. As principal bioinformatics scientist at Myriad Proteomics, he does not mince words about one of his most indispensable tools. "Using BLAST in the traditional way can be a laborious process and a data management hell," Chen says, ticking off half a dozen steps of copying, pasting, and file creation as he switches among a variety of applications, databases, and Web sites.

Now, he says, he's found a better way: the next release of Oracle's database, 10g, in which much of the work can occur within the Oracle environment. "Now what seemed quite daunting in the old paradigm seems extremely easy. It's simply a join between four tables. We [at Myriad] demand a richer and richer syntax and biological support with the database management system. Oracle 10g can benefit our scientific discovery in protein interactome studies."

10g, the next edition of Oracle's database, was announced in September at OracleWorld, the database company's annual trade show and pep rally in San Francisco. BLAST algorithms from the National Center for Biotechnology Information (NCBI) have been incorporated directly into 10g -- and that is not the only functionality aimed at the life sciences. Also being added are statistical tools and cutting-edge algorithms such as support vector machines (SVMs), used to analyze large data sets. Much of that was specifically requested by life science heavyweights like the Whitehead Institute, whose Pablo Tamayo also works half-time for Oracle as a consulting member of technical staff.

Built-In BLAST
Although CEO Larry Ellison is not quite able to articulate a coherent life science strategy different from, say, the company's financial services strategy, his lieutenants are salivating at the prospect of ever-larger quantities of scientific data. "We're trying to make it easier to get the data into the database, to give you more and more things to do once it's in there, and give you fewer excuses to ever have to take it out," says Charlie Berger, Oracle's senior director, product management, life sciences.

One of the biggest changes to Oracle's next database is an evolution of distributed computing features built into the company's current database, 9i. Oracle is promoting "grid" computing and attempting to redefine the term. The company is relegating what it calls "scientific" grid computing to a few large U.S. government labs. What Oracle says is new is "enterprise" grid computing for the corporate world -- dynamically and automatically allocating additional computational resources to any garden-variety server-based application (from Oracle, SAP, or Siebel) that requires it.

Benny Souder, vice president of distributed computing at Oracle, notes that many big applications and computational resources sit idle, awaiting the end of a fiscal quarter or a payroll period. "I want to be able to say, Emily's on the verge of discovering a new drug -- take Charlie's computer and Mark's computer and everybody else's computer and the server and let's apply it to the problem," Souder says. "And let's get the answer in four hours. That's the kind of capability we're talking about."

Ellison is no less grandiose. He did not demo the grid features or the database, disclose pricing, or name a ship date. But he said his new database would run 10 times faster at one-tenth the cost. In a briefing with the press, Ellison made it clear he has been pursuing grid computing for more than a decade. "We started this clustering quest 14 years ago," he said. One of his inspirations, he added, was IBM's Sysplex, a little-known clustering technology for IBM mainframes.

A key part of the hypothetical savings could come from needing fewer database administrators (DBAs). Here Ellison is treading on slippery ground. DBAs are part of what keeps Oracle's market position so well entrenched: knowing both Oracle and a particular company's implementation of it allows some DBAs to earn more than the president of the United States. For the CIO considering an upgrade to a new Oracle database, Oracle is unapologetically promising savings by building in automated administrative and management functions. "It's not surprising that our industry matures and becomes less labor intensive," Ellison says.

Oracle's partners echoed that theme. "Our industry is way too complex," Sun Microsystems CEO Scott McNealy told the OracleWorld audience. "It's too expensive to deliver the services we're delivering today. There is an order of magnitude way too many employees delivering functionality that we're delivering."

HP: We Do Grids, Too
Another big, struggling company, HP, also sent its top executive. CEO Carly Fiorina put her own spin on the grid: "When I hear other people preach about the path to grid and say that standardization is the path, my hype meter goes off," she said. "The grid is about a whole lot more than a single rack and a lot of servers. We are focused squarely on the management and execution of grid services. This is where we intend to make a major contribution."

Ironically, for all the professed solidarity of Sun and HP -- both of which are working closely with Oracle -- it is much cheaper hardware that seems to be driving the shift to the grid. As racks of cheap Intel or Linux boxes continue to drop in price and become available in blade form factors, Oracle is betting that people like Matthew Cockerill, of BioMed Central, will keep using Oracle.

As Cockerill explained at OracleWorld, BioMed Central has some 70GB of data online and 250,000 users. For him, the combination of XML and Oracle simplifies serving up data to 15,000 simultaneous users: XML can represent every unusual German or Scandinavian character and enables text searches on either the full or the abbreviated name of a journal. Keeping everything in Oracle, he said, simplified managing the mountain of articles, images, PDFs, and movies on the site. There are no discrepancies between a database and a file system; instead, everything sits in a single data repository.

What Do Scientists Want?
Of course, it's by no means clear whether people outside Oracle are as excited by the grid. Even Oracle's handpicked speakers conceded as much in unguarded moments. Peter Smith, director of discovery research applications at Wyeth Research, is a fan of Oracle and disparaged storing critical data in a nonauditable file such as an Excel spreadsheet. But it's not clear when he'll migrate to 10g. "The cost to move from 8 to 9 is huge," Smith said. "We have to validate everything again."

In part, Oracle's strategy seems designed to get scientists gathering and analyzing data with tools from third-party developers -- Rosetta Biosoftware, Spotfire, MDL, Accelrys, Applied Biosystems, Agilent, and others. That scenario is plausible: as pressure to streamline the scientific workflow increases, its many inefficiencies could become more apparent to scientists and to the people who pay their salaries.

Still, the complexity of Oracle and the learning curves associated with applications built on top of Oracle should not be underestimated. "Will the scientists use Oracle?" asks Srinivasan Seshadri, CEO of Strand Genomics. "I'm not convinced."
