February 11, 2012
Bio-IT World


Written in Stone


By Hillel Alpert

December 15, 2002
Until recently, only expensive mainframes and high-powered supercomputers could be summoned to grapple with the flood of complex genomic data, which is expected to grow to 250 billion records (25TB) in the next 12 months and to 1 trillion records (100TB) the following year. Rosetta Genomics Ltd. (not to be confused with Rosetta Inpharmatics Inc. in Seattle), a startup in Jerusalem's Ein Karem neighborhood, has a relatively inexpensive answer to the computing challenge of genome sequence analysis on this scale: a PC-based platform built on an industry-standard database, namely Microsoft SQL Server 2000.
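Both projections imply the same average record size, about 100 bytes, which is a quick sanity check on the figures. The minimal Python sketch below works through that arithmetic. It assumes decimal terabytes (1TB = 10^12 bytes), since the article does not say which convention it uses.

```python
# Back-of-the-envelope check of the growth projections quoted above.
# Assumption: 1 TB = 10**12 bytes (decimal units).
TB = 10**12

projections = {
    "next 12 months": (250 * 10**9, 25 * TB),   # 250 billion records, 25TB
    "following year": (1 * 10**12, 100 * TB),   # 1 trillion records, 100TB
}

def bytes_per_record(records: int, total_bytes: int) -> float:
    """Average storage per record implied by a (records, bytes) pair."""
    return total_bytes / records

for label, (records, size) in projections.items():
    print(f"{label}: {bytes_per_record(records, size):.0f} bytes/record")
```

Both lines print 100 bytes/record, so the two projections scale record count and storage in step.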

The platform's most important feature is its massive scalability, according to Isaac Bentwich, Rosetta's founder, chairman, and CEO. "If you want to scale up a mainframe-based system, you simply can't line up 100 mainframes next to each other. Working with databases, this is quite possible," Bentwich explains. "Our platform facilitates easy chaining of additional servers to multiply the processing power at approximately one-fifth the cost." That represents a huge savings for extremely heavy genomic data-mining tasks, which may cost as much as $10 million.

Microsoft Corp.'s SQL Server 2000 was originally built to handle 2 billion records. Rosetta Genomics approached the software giant with real-life customer applications that would require a capacity of 100 billion records. Working together, the two companies overcame the technical barriers to scaling up the database's size and performance; it can now handle tasks such as simulating the genome and running complex queries across billions of records.
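The article does not say what produced the original 2-billion-record ceiling. One well-documented limit of about that size is the range of SQL Server's 32-bit `int` type (maximum 2,147,483,647), which is often used for identity keys. Assuming, purely for illustration, that the row key were the constraint, this Python sketch checks which key width leaves room for 100 billion records. The `key_fits` helper is hypothetical.

```python
# Hypothetical illustration: does a signed integer key of a given width
# have room for a target number of rows? Assumes keys start at 1 and only
# positive values are used, as with a typical IDENTITY(1,1) column.
INT_MAX = 2**31 - 1      # SQL Server int:    2,147,483,647
BIGINT_MAX = 2**63 - 1   # SQL Server bigint: 9,223,372,036,854,775,807

def key_fits(rows: int, key_max: int) -> bool:
    """True if rows numbered 1..rows all fit under key_max."""
    return rows <= key_max

target = 100 * 10**9  # 100 billion records, as in the customer applications

print("int fits:", key_fits(target, INT_MAX))        # int fits: False
print("bigint fits:", key_fits(target, BIGINT_MAX))  # bigint fits: True
```

Under this assumption, 2 billion rows would just fit in an `int` key while 100 billion would need a 64-bit key.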

Rosetta Genomics has also developed a genomic query accelerator that speeds up complex queries by as much as 900 percent, as well as a genomic data compression method that compresses data by 40 percent while leaving it fully indexable.
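The article gives no details of Rosetta's compression scheme. To show what "indexable" compression can mean, here is a swapped-in technique, not Rosetta's: packing each DNA base (A, C, G, T) into 2 bits. Because every base occupies a fixed width, the base at any position can be read directly without decompressing anything else. This sketch assumes a pure ACGT alphabet (no N or other ambiguity codes), and its ratio differs from the 40 percent quoted above.

```python
# Fixed-width 2-bit packing of DNA: a swapped-in illustration of
# "indexable" compression, NOT Rosetta Genomics' method.
# Assumption: sequences contain only A, C, G, T.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"

def pack(seq: str) -> bytearray:
    """Pack four bases per byte, first base in the lowest two bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= ENCODE[base] << (2 * (i % 4))
    return out

def base_at(packed: bytearray, i: int) -> str:
    """Random access: read base i without unpacking the rest."""
    return DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]

def unpack(packed: bytearray, length: int) -> str:
    """Recover the full sequence (length is needed to drop padding bits)."""
    return "".join(base_at(packed, i) for i in range(length))

seq = "GATTACACGT"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 10 bases -> 3 bytes
print(base_at(packed, 3))                          # T
assert unpack(packed, len(seq)) == seq
```

Against one-byte-per-base text this saves about 75 percent; data that must also carry ambiguity codes or annotations needs more bits per symbol, so real-world ratios are lower.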

"Currently, we have one database instance measured at 2TB and 20 billion individual records, partitioned into 16 data file groups, each comprising eight files," Bentwich says. "There is also one temporary database file group, comprising four files, and one log file group comprising four files. Primary data is stored in one file group, comprising four files."
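To make the quoted layout easier to follow, this Python sketch tallies the files it describes. The per-file average assumes, hypothetically, that the 2TB is spread evenly over the 16 data file groups (128 files) and uses decimal terabytes; the article gives no per-file sizes. The `FileGroup` class is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class FileGroup:
    role: str
    groups: int
    files_per_group: int

    @property
    def files(self) -> int:
        return self.groups * self.files_per_group

# Layout as quoted by Bentwich.
layout = [
    FileGroup("data", 16, 8),
    FileGroup("primary", 1, 4),
    FileGroup("temporary", 1, 4),
    FileGroup("log", 1, 4),
]

total_files = sum(fg.files for fg in layout)
data_files = layout[0].files

TB = 10**12
db_bytes = 2 * TB          # "measured at 2TB"
records = 20 * 10**9       # "20 billion individual records"

print("total files:", total_files)                             # 140
print("data files:", data_files)                               # 128
print("bytes/record:", db_bytes // records)                    # 100
print("avg GB per data file:", db_bytes / data_files / 10**9)  # 15.625
```

Spreading each table's data over many files in several file groups is a common way to let SQL Server issue I/O against many disks in parallel.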

Bentwich says the company will be able to reach 100 terabytes with high-volume storage (such as two EMC Symmetrix systems) attached to either two 16-processor servers or four eight-processor servers running the Microsoft Windows 2000 Datacenter Server operating system, possibly using SQL Server's distributed partitioned views.
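In SQL Server 2000, a distributed partitioned view combines member tables on separate linked servers with `UNION ALL`; each member table carries a `CHECK` constraint on a partitioning column, which lets the optimizer skip members that cannot hold the requested rows. The Python sketch below imitates only that routing idea. The server names, key ranges, and the `route` and `members_for_range` helpers are hypothetical, not Rosetta's configuration.

```python
from bisect import bisect_right

# Hypothetical member servers, each owning a half-open key range
# [lower, upper), analogous to a CHECK constraint on the partitioning column.
MEMBERS = [
    ("server1", 0, 25 * 10**9),
    ("server2", 25 * 10**9, 50 * 10**9),
    ("server3", 50 * 10**9, 75 * 10**9),
    ("server4", 75 * 10**9, 100 * 10**9),
]
LOWER_BOUNDS = [lower for _, lower, _ in MEMBERS]

def route(key: int) -> str:
    """Return the single member server whose range contains key."""
    idx = bisect_right(LOWER_BOUNDS, key) - 1
    if idx < 0 or key >= MEMBERS[idx][2]:
        raise KeyError(f"key {key} outside all partitions")
    return MEMBERS[idx][0]

def members_for_range(start: int, stop: int) -> list[str]:
    """Members whose ranges overlap [start, stop); all others are pruned."""
    return [name for name, lower, upper in MEMBERS if lower < stop and start < upper]

print(route(42))                                  # server1
print(route(80 * 10**9))                          # server4
print(members_for_range(20 * 10**9, 30 * 10**9))  # ['server1', 'server2']
```

Adding capacity in this model means adding a member with a new key range; rebalancing existing data across members is a separate problem this sketch ignores.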

Rosetta Genomics also aspires to be a gene discovery company, not only supplying the technology but also using it to discover novel disease-related genes.

—Hillel Alpert 
