Isilon’s Data Storage Odyssey

CEO Sujal Patel and the quest to “remove storage as an impediment to progress.”

By Kevin Davies

May 19, 2009
During his five years working at Real Networks, the pioneering streaming media Internet company, Sujal Patel began to appreciate the sheer size and growth of the data his customers were trying to store. Their frustrations sharpened his ideas on how to solve the storage scalability problem. In 2001, he founded Isilon Systems in Seattle, with the goal of building a storage architecture from scratch. Patel realized this file-based explosion was much broader than digital content, reaching all sorts of data: oil and gas, media, design, and bioinformatics.

In 2007, Patel took over as CEO after the company went public. Since then, Isilon has reacted to what Patel calls the “avalanche of life sciences data,” attracting marquee clients both in academia (Broad Institute, UCLA) and industry (Complete Genomics). Nowhere is that growth more evident than in the data-intensive next-generation sequencing space. The proportion of life science customers grew from 2% to 12% in 2008, while total revenue almost doubled in two years, pushing above $100 million. The company’s client base will soon top 1,000. “We have an incredible cash balance ($80 million) to innovate during this time,” says Patel. “There’s still a lot of work to be done to take storage out as an impediment to progress.”

On a recent visit to Bio•IT World headquarters, Patel unveiled a new range of storage products. Isilon is purely a storage company, says Patel, but in the broadest sense. “Storage for us is about data protection, data management, long-term data analysis, anything having to do with data purely at the storage level.” Asked why the company is launching the products now, Patel replies simply: “We’re done building them!”

There are two big problems facing life sciences organizations, says Patel. “One is a capture problem: how do I get the data from the sequencer, get it in and start to work with it? Then, even larger, is the huge repository of the genome sequences that you want to keep online for a long time, refer to them and analyze them. That’s the larger opportunity.” For example, Patel says the problem facing Complete Genomics is how to generate 30 petabytes (PB) of human genome data and keep it online for a year. “It’s a huge task,” he admits.

Isilon’s traction in the biomedical community can be traced to several factors. “Isilon has a scalable architecture that allows you to add both performance and capacity incrementally and thus scale to where your application demands you to be,” says Noemi Greyzdorf, research manager with IDC.  Traditional NAS [network-attached storage] approaches limit users to the performance of a single node. Clustered file systems are typically associated with high-performance computing environments “but the advances in applications in the biomedical world and other verticals, have increased demand for that architecture in more commercial applications.” Moreover, “Isilon has done a lot of work on making it user friendly so a typical storage/NAS administrator can maintain the environment as any other NAS.”

Designing Storage
Patel realized in founding Isilon that regardless of the data type—sequence data, video clips, chip designs—storing huge repositories of fast-growing files required a much more scalable architecture than what existed at the time. “We leveraged principles of clustered computing to create from scratch a whole new architecture for storage, built around the needs of these huge stores of file-based data. When you have a huge store of file-based data, you have to get it onto the storage quickly, grow and scale effectively, utilize all hardware effectively and simplify that whole architecture. We did that and it intersected exactly with next-generation sequencing requirements.”

While “very sophisticated” organizations such as the Broad Institute (see “A Broad View,” Bio•IT World, Apr 2008) and Complete Genomics are among Isilon’s more prestigious clients, many other customers have impressive needs of their own, such as a few sequencers and 1.5 PB of storage. “For anyone who knows how to manage storage and move data around, the data protection strategy, the simplicity of our solutions is something that customers in this space really like,” says Patel.

In providing solutions for the life sciences market, Isilon draws on its experience in other markets. “In the semiconductor space, it’s getting to the point where chip designs are getting so complex that the needs out of the storage system have grown by an order of magnitude. That sort of growth parallels the same sort of thing we’re seeing in life sciences and next-gen sequencing, the same sort of thing we saw a couple of years ago in media.”

Ideally, Patel says, it would be nice to have “a big Z drive in the sky, you’d put my genome, your genome, and so on. But if you look at NetApp and EMC, the two leaders in the NAS space, the maximum volume of size they can build is 16 TB…. I can’t even fit a genome onto one volume.”

That necessitates “an extremely manual process of scaling up hundreds of these separate volumes; that’s an incredible amount of complexity and leads to poor utilization of the storage systems. It means you’re buying twice as much as you need. And you have a ton of staff micro-managing this daily.”

Isilon’s storage typically requires fewer people to manage the load balancing, scalability, and hardware upgrades, says IDC’s Greyzdorf. “The clustered architecture allows you to automatically load balance users and data across the nodes in the cluster. It also allows you to scale simply by adding more nodes instead of doing a forklift upgrade to the next model,” she says. Upgrading hardware is also seamless, as the data are simply swapped to other nodes in the cluster. The system automatically rebalances the data across the new node.

According to VP marketing Ram Appalaraju, the Broad Institute formerly had three people managing the institute’s NetApp infrastructure full time. Now it’s about 1/3 of a person. At last year’s supercomputing conference, Broad Institute CIO Jill Mesirov said the institute had been using eight NetApp filers and 54 Sun Thumper filers before it moved 92% of its workflow onto Isilon. “Now they have 1.5 PB on the same cluster,” says Appalaraju. “Because we’re driving utilization up to 80%, our 1.5 PB is equivalent to 2.25 PB of our competitors. You get much more of your money’s worth.”
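Appalaraju's comparison can be checked with a little arithmetic. The sketch below assumes that "equivalent" means equal usable capacity, which the article does not spell out; the function names are illustrative only. Under that assumption, 1.5 PB at 80% utilization yields about 1.2 PB of usable space, and a competitor needing 2.25 PB to match it would be running at roughly 53% utilization, in line with Patel's remark about buying twice as much as you need.

```python
def usable_pb(raw_pb: float, utilization: float) -> float:
    """Usable capacity in PB for a given raw capacity and utilization fraction."""
    return raw_pb * utilization


def implied_utilization(usable: float, raw_pb: float) -> float:
    """Utilization fraction implied when `raw_pb` is needed to hold `usable` PB."""
    return usable / raw_pb


# Figures quoted in the article: 1.5 PB at 80% utilization vs. 2.25 PB elsewhere.
isilon_usable = usable_pb(1.5, 0.80)
competitor_utilization = implied_utilization(isilon_usable, 2.25)

print(f"Usable capacity on Isilon: {isilon_usable:.2f} PB")
print(f"Implied competitor utilization: {competitor_utilization:.0%}")
```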

New Series
Isilon has added three new products to its existing X-Series nodes, providing a tiered set of storage solutions. To create and capture data, it has introduced the S-Series—the IQ 5400S—which allows customers to build clusters. This will appeal to “those who need high performance but don’t typically have a lot of data that need to be processed at the same time,” says Greyzdorf.

Patel says the S-Series “is going to be incredibly important for high-speed data analysis, such as coming off next-gen sequencers.” That said, Patel thinks the S-Series unit is probably faster than most of the apps currently being targeted in life sciences. But “if we get to the point where the sequencers’ output grows another half order of magnitude than the current generation of machines,” then all bets are off. “I bet you we’re going to run into at least a few life science customers where the performance needs get up high enough we’ve got to go this route,” he says.

For archiving, Isilon released the NL (nearline) storage IQ 36NL. The NL-Series disk-based deep archiving solution economically scales up to 3.45 PB within a single file system, at a unit price of about $2/GB, with 80% storage utilization. Rounding out the X-Series is the new top-of-the-line IQ 36000. The X-Series suits those with lesser performance needs but more capacity. It allows Isilon to scale up to 2 or 3 PB in a single cluster. It is a 4U box (two processors per box) with 36 TB per node, which can be clustered together to create extremely large file systems with high performance. It provides up to 30 GB/second performance.

“From the data management perspective, we can uniquely tailor a particular product while still bringing in overall benefits of scalability,” says Appalaraju. Says IDC’s Greyzdorf: “It is designing to the need instead of trying to position the same product for all types of needs.”

Patel signs off with a flourish, pointing to the remarkable specification enhancements over the past few years that put Moore’s Law to shame. “If you look at 2010, our vision is to be able to build 10-, 15-, 20-PB clusters with performance that’s virtually unlimited, so we can take away the challenge of storage being an inhibitor to applications,” he says.

“Our key focus is around data management, data protection, and ensuring our customers have a well rounded offering—you plug it in, turn it on and it goes. From a hardware perspective, it means you’ll see us do things like integrate SSD drives instead of hard disc drives, and improve our hardware technologies. These are all things that are in the works.”  

5400 S-Series (Create and Capture)
• Enterprise Class 2U Isilon IQ Node
• 12 450-GB 15K RPM SAS drives
• 5.4 TB storage per node
• Dual, quad-core 2.33 GHz CPU
• 4 x 1 GbE interfaces
• DDR InfiniBand cluster interconnect

IQ 36000 X-Series Node (Process)
• Enterprise Class 4U Isilon Platform Node
• 36 1-TB 7200 RPM SATA 3Gb/s drives
• 36 TB capacity per node
• Dual Quad-Core Intel 2.33 GHz CPU
• 4 x 1 GbE Front-End Interface
• DDR InfiniBand for intra-cluster communication
• 7-node minimum cluster configuration (252 TB)

IQ 36NL (Archive)
• Enterprise Class 4U Isilon Node
• 36 1-TB 7200 RPM SATA 3Gb/s drives
• 36 TB capacity per node
• Single, Quad-Core Intel 2.33 GHz CPU
• 1 x 1 GbE Front-End Interface
• DDR InfiniBand for intra-cluster communication
• 7-node minimum cluster configuration (252 TB)
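
The capacity figures in the three lists above follow from the drive counts. The sketch below reproduces them, assuming decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB); the helper names are illustrative and not part of any Isilon tool. It also estimates how many 36 TB nodes the 3.45 PB NL-Series ceiling would correspond to. The article gives no node count, so that last figure is an inference from the numbers quoted.

```python
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000


def node_capacity_gb(drives: int, drive_gb: int) -> int:
    """Raw capacity of one node: number of drives times drive size."""
    return drives * drive_gb


def cluster_capacity_gb(node_count: int, node_gb: int) -> int:
    """Raw capacity of a cluster of identical nodes."""
    return node_count * node_gb


def nodes_needed(target_gb: int, node_gb: int) -> int:
    """Smallest whole number of nodes whose raw capacity reaches target_gb."""
    return -(-target_gb // node_gb)  # integer ceiling division


s_node = node_capacity_gb(12, 450)             # 5400 S-Series: 12 x 450 GB
x_node = node_capacity_gb(36, 1_000)           # IQ 36000 / IQ 36NL: 36 x 1 TB
min_cluster = cluster_capacity_gb(7, x_node)   # 7-node minimum configuration
nl_nodes = nodes_needed(3_450_000, x_node)     # 3.45 PB NL-Series ceiling

print(f"S-Series node: {s_node / GB_PER_TB} TB")
print(f"X/NL node: {x_node / GB_PER_TB} TB")
print(f"Minimum 7-node cluster: {min_cluster / GB_PER_TB} TB")
print(f"Nodes for 3.45 PB: {nl_nodes} "
      f"({cluster_capacity_gb(nl_nodes, x_node) / GB_PER_PB} PB raw)")
```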

This article also appeared in the May-June 2009 issue of Bio-IT World Magazine.