
By Hillel Alpert

December 15, 2002
Until recently, only expensive mainframes and high-powered supercomputers could be summoned to grapple with the flood of complex genomic data, which is expected to grow to 250 billion records (25 terabytes) in the next 12 months and to 1 trillion records (100 terabytes) the following year. Rosetta Genomics Ltd. (not to be confused with Rosetta Inpharmatics Inc. in Seattle), a startup located in Jerusalem's Ein Karem neighborhood, has a relatively inexpensive solution to accommodate the computing challenge of genome sequence analysis on such a large scale: It is PC-based and uses an industry-standard database, namely Microsoft SQL Server 2000.
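As a rough check on those projections, the quoted record counts and volumes can be divided out. The sketch below assumes decimal terabytes (10^12 bytes), which the article does not specify; the names `TB`, `figures`, and `bytes_per_record` are illustrative only.

```python
# Assumption: 1 TB = 10**12 bytes (decimal); the article does not state the unit convention.
TB = 10**12

# (label, number of records, volume in terabytes) as quoted in the article
figures = [
    ("next 12 months", 250 * 10**9, 25),
    ("following year", 10**12, 100),
]


def bytes_per_record(records, terabytes):
    """Average storage per record implied by a record count and a volume."""
    return terabytes * TB / records


for label, records, terabytes in figures:
    size = bytes_per_record(records, terabytes)
    print(f"{label}: about {size:.0f} bytes per record")
```

Under that assumption, both projections imply the same average record size, so the two figures are consistent with each other.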

The platform's most important feature is its massive scalability, according to Isaac Bentwich, Rosetta's founder, chairman, and CEO. "If you want to scale up a mainframe-based system, you simply can't line up 100 mainframes next to each other. Working with databases, this is quite possible," Bentwich explains. "Our platform facilitates easy chaining of additional servers to multiply the processing power at approximately one-fifth the cost." That represents a huge savings for extremely heavy genomic data-mining tasks, which may cost as much as $10 million.

Microsoft Corp.'s SQL Server 2000 was originally built to handle 2 billion records. Rosetta Genomics approached the software giant and presented it with real-life customer applications that would require a 100-billion-record capacity. Together, the two companies overcame the technological barriers to increasing the size and performance of the database, which can now perform functions such as simulating the genome and executing complex queries on billions of records.

Rosetta Genomics has also developed a genomic query accelerator that improves complex query times by as much as 900 percent, as well as a method of indexable genomic data compression that achieves 40 percent compression while still allowing the data to be fully indexable.

"Currently, we have one database instance measured at 2TB and 20 billion individual records, partitioned into 16 data file groups, each comprising eight files," Bentwich says. "There is also one temporary database file group, comprising four files, and one log file group comprising four files. Primary data is stored in one file group, comprising four files."
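The layout Bentwich describes can be tallied in a few lines. This is only an illustrative sketch: the dictionary `layout` and the function `total_files` are hypothetical names, and the per-file average assumes the 20 billion records are spread evenly across the 16 data file groups, which the article does not state.

```python
# File groups as quoted: name -> (number of file groups, files per group)
layout = {
    "data": (16, 8),
    "temporary": (1, 4),
    "log": (1, 4),
    "primary": (1, 4),
}


def total_files(layout):
    """Total number of files across all file groups."""
    return sum(groups * files for groups, files in layout.values())


data_groups, files_per_group = layout["data"]
data_files = data_groups * files_per_group

# Assumption: records are distributed evenly over the data files.
records = 20 * 10**9
records_per_data_file = records // data_files

print(f"data files: {data_files}")
print(f"all files: {total_files(layout)}")
print(f"records per data file (even spread assumed): {records_per_data_file:,}")
```

Spreading a database over many files in this way lets the files sit on separate disks, though the article does not say how Rosetta maps its files to storage.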

Bentwich says that the company will be able to reach 100 terabytes on a high-volume storage device (such as two EMC Symmetrix machines) working with two 16-processor servers or four eight-processor servers, the Microsoft Windows 2000 Datacenter Server operating system, and possibly SQL Server's Distributed Partitioned Views technology.

Rosetta Genomics also aspires to be a gene discovery company, not only supplying the technology but also using it to discover novel disease-related genes.


