High Performance and High Throughput
High-throughput genome sequencing, or nextgeneration genome sequencing (NGS), is being driven by the high demand for low-cost sequencing. NGS parallelizes the sequencing process, producing thousands or millions of sequences at once [1,2]. The latest NGS sequencers from 454 Sequencing [3], Solexa (Illumina) [4] and Applied BioSystems (SoLiD) [5] now routinely produce terabytes (TB) of data. For example, the SoLiD 5500xl produces in one run (~7days) over 4TB of data. With additional overheads of reference genome storage/access, and type of analysis to be done, there is a requirement for cost effective, high performance and high throughput clusters and storage to handle these tasks. The ultimate goal is to bring down the cost of genome sequencing to within $1K with a turn-around time of one week, enabling personalized genomics medicine to become commonplace. Currently, times vary from one week to four weeks depending on the cluster infrastructure, and costs are still high. Figure 1 below shows the current associated cost structure for a human-sized gen