February 11, 2012
Bio-IT World


Pack It In



Oct. 10, 2007 | One way to address the data management issue is to store data more efficiently so that it takes up less space and is easier to query. That is the general idea behind a new database from start-up Vertica.

The company was founded by life sciences veteran Andy Palmer and database veteran Michael Stonebraker. Palmer was most recently CIO and senior vice president at Infinity Pharmaceuticals. He also served as president of the Interoperable Informatics Infrastructure Consortium (I3C). Stonebraker was the main architect of the INGRES relational DBMS and the POSTGRES object-relational DBMS.

Most databases are optimized to handle a large number of updates. The Vertica Database, by contrast, is a general-purpose relational database system designed for high performance on read-intensive query workloads.

“In many [industries], there are applications and uses of database technology where people spend much more time reading rather than writing to a database,” said Palmer. “I figured there was an opportunity to build from scratch an SQL database for read-only mode.”

The database organizes data on disk as columns of values from the same attribute, rather than as rows of tabular records. When a query needs only a few columns of a table, only those columns are read from disk. In a row-oriented database, by contrast, every value in each row is typically read from disk even when the query uses only a few of them, which wastes I/O bandwidth.
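The difference in data read can be illustrated with a small sketch. This is not Vertica's storage format; the table, the column names, and the functions `to_columns`, `row_store_scan`, and `column_store_scan` are hypothetical, and the "values read" counter is only a stand-in for disk I/O.

```python
# Illustrative sketch only: a toy comparison of row-oriented and
# column-oriented layouts. All names and data here are assumptions.

rows = [
    {"gene": "BRCA1", "tissue": "breast", "expression": 7.2},
    {"gene": "TP53", "tissue": "lung", "expression": 5.1},
    {"gene": "EGFR", "tissue": "lung", "expression": 9.4},
]


def to_columns(rows):
    """Pivot a list of row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def row_store_scan(rows, wanted):
    """Fetch every full record, then keep only the wanted columns."""
    values_read = 0
    result = []
    for row in rows:
        values_read += len(row)  # the whole record is fetched
        result.append(tuple(row[name] for name in wanted))
    return result, values_read


def column_store_scan(columns, wanted):
    """Fetch only the wanted columns and stitch them back into tuples."""
    values_read = sum(len(columns[name]) for name in wanted)
    result = list(zip(*(columns[name] for name in wanted)))
    return result, values_read


columns = to_columns(rows)
row_result, row_cost = row_store_scan(rows, ["expression"])
col_result, col_cost = column_store_scan(columns, ["expression"])
print(row_cost, col_cost)
```

Both scans return the same answer, but in this toy model the row scan touches every value in the table while the column scan touches only the one column the query asked for.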

Storing data in a column-oriented manner improves query performance. “Because of the way the data is represented, queries can be completed in reasonable times,” said Palmer.

The Vertica Database also uses aggressive compression of data on disk, as well as a query execution engine that is able to keep data compressed while it is operated on. “Because of [the] significant compression, [it] is much more efficient allowing you to keep more data,” said Palmer.
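The article does not say which compression schemes Vertica uses. As a minimal sketch of the general idea of operating on compressed data, the example below assumes run-length encoding, a scheme that suits sorted columns with repeated values. The functions `rle_encode` and `count_equal` and the sample column are hypothetical.

```python
# Illustrative sketch only: run-length encoding is an assumed scheme,
# chosen to show how a query can run without decompressing the column.

def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [(value, length) for value, length in runs]


def count_equal(runs, target):
    """Count rows matching target by summing run lengths directly."""
    return sum(length for value, length in runs if value == target)


tissue_column = ["breast", "breast", "lung", "lung", "lung", "liver"]
compressed = rle_encode(tissue_column)
lung_rows = count_equal(compressed, "lung")
print(compressed, lung_rows)
```

Here the count is computed from three runs rather than six stored values, so the query does less work and the column takes less space.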

According to Vertica, these technologies help execute queries much faster than traditional relational database management systems and require significantly less storage space.

Palmer notes that the technology is well suited to life sciences applications such as those that tag data using the World Wide Web Consortium’s Resource Description Framework (RDF). -- S.S.

