Oct. 10, 2007
One way to address the data management issue is to store data more efficiently so that it takes up less space and is easier to query. That is the general idea behind a new database from start-up Vertica.
The company was founded by life sciences veteran Andy Palmer and database veteran Michael Stonebraker. Palmer was most recently CIO and senior vice president at Infinity Pharmaceuticals. He also served as president of the Interoperable Informatics Infrastructure Consortium (I3C). Stonebraker was the main architect of the INGRES relational DBMS, and the object-relational DBMS, POSTGRES.
Most databases are optimized to handle a large number of updates. The Vertica Database, by contrast, is a general-purpose relational database system designed to provide extremely good performance on read-intensive query workloads.
“In many [industries], there are applications and uses of database technology where people spend much more time reading rather than writing to a database,” said Palmer. “I figured there was an opportunity to build from scratch an SQL database for read-only mode.”
The database organizes data on disk as columns of values from the same attribute, as opposed to storing it as rows of tabular records. This means that when a query needs to access only a few columns of a particular table, only those columns need to be read from disk. By contrast, in a row-oriented database, every value in each row is typically read from disk, including the columns the query never uses, which wastes I/O bandwidth.
Storing data in the column-oriented manner improves performance. “Because of the way the data is represented, queries can be completed in reasonable times,” said Palmer.
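The difference between the two layouts can be illustrated with a minimal Python sketch. It is not Vertica's implementation: the table, the column names, and the two helper functions are hypothetical, and in-memory lists stand in for data on disk. The sketch only counts how many stored values a single-column query has to touch under each layout.

```python
# Hypothetical three-column table; names and values are illustrative only.
rows = [
    ("gene_a", "chr1", 12.5),
    ("gene_b", "chr1", 7.0),
    ("gene_c", "chr2", 3.25),
]

# Row-oriented layout: each record is stored as one contiguous unit.
row_store = list(rows)

# Column-oriented layout: all values of one attribute are stored together.
column_store = {
    "name": [r[0] for r in rows],
    "chromosome": [r[1] for r in rows],
    "expression": [r[2] for r in rows],
}


def sum_expression_row_store(store):
    """Sum the third column; every value of every record is read."""
    total = 0.0
    values_touched = 0
    for record in store:
        values_touched += len(record)  # the whole record comes off "disk"
        total += record[2]
    return total, values_touched


def sum_expression_column_store(store):
    """Sum the 'expression' column; only that column is read."""
    column = store["expression"]
    return sum(column), len(column)


print(sum_expression_row_store(row_store))        # (22.75, 9)
print(sum_expression_column_store(column_store))  # (22.75, 3)
```

Both functions return the same total, but the row layout touches nine values while the column layout touches three. On a wide table with many rows, that gap is the I/O saving the column-oriented design is after.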
The Vertica Database also uses aggressive compression of data on disk, as well as a query execution engine that is able to keep data compressed while it is operated on. “Because of [the] significant compression, [it] is much more efficient allowing you to keep more data,” said Palmer.
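The article does not say which compression schemes Vertica uses. As an assumption for illustration only, the Python sketch below uses run-length encoding, a common choice for columns with many repeated values, to show what it means to answer a query while the data stays compressed. The function names and sample column are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_equal(runs, target):
    """Count rows equal to target without expanding the runs."""
    return sum(length for value, length in runs if value == target)


# Hypothetical column with long runs of repeated values.
column = ["chr1"] * 4 + ["chr2"] * 3 + ["chr1"] * 2

runs = rle_encode(column)
print(runs)                       # [('chr1', 4), ('chr2', 3), ('chr1', 2)]
print(count_equal(runs, "chr1"))  # 6
```

Here nine stored values shrink to three pairs, and the count is computed from the pairs directly, so the query does less work as well as reading less data.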
According to Vertica, these technologies let its database execute queries much faster than traditional relational database management systems while requiring significantly less storage space.
Palmer notes that the technology is well suited to life sciences applications such as those that tag data using the World Wide Web Consortium’s Resource Description Framework (RDF). -- S.S.