August 2, 2011 | Insights / Outlook
Todd Smith is the senior leader of research and applications at Geospiza, now part of PerkinElmer (acquired May 2011). As senior leader, Smith helps develop the company’s research roadmap around high-performance computing and ensures Geospiza’s GeneSifter software scales to meet the future demands of high-throughput sequencing systems. Smith was interviewed by Insight Pharma Reports for its latest report on next-generation sequencing. Here are some extracts from that interview.
On Geospiza’s cloud strategy: I guess we’ve always felt the cloud would be very important, so we’ve always had that as part of our strategy. Going forward it becomes more of a technical implementation of our strategy. So we’re not saying we have to do more of this or less of it. In our marketing, we probably stress the word “cloud” more than “application service provider” or other terms we used to employ. So in that sense, we’re going with the flow, but the cloud’s always been very important in our strategy, because in general IT costs certainly can be prohibitive in getting started with next-generation sequencing. So when people try out cloud services and do some experiments, I think they definitely find some scale issues. When they have a data center-size operation, they need to consider accessing someone’s hosted service center versus building their own. I think those are the kinds of things people consider, and we need to consider those things as we mature and increase our business. But I’m going to call them technical implementation issues. How do you offer more services at a lower cost? That’s something we focus a lot of energy on.
There’s an appeal to being able to use cloud services for data storage, most importantly for backup and for the infrastructure that goes with maintaining the data. In our cost structure, the way the fees work is transaction-based, so it’s focused on the analysis.
On third-generation sequencing systems and the way informatics deals with the data: There will be new problems to solve. I think at one level they will be incremental problems. One of the very interesting features of Pacific Biosciences’ system is the ability to produce very long sequences, and a lot of alignment algorithms that are now doing very high-throughput work are dealing with short sequences. So people have to adapt those tools to handle longer sequences, and they will. There will be strategies to deal with that. I’m a little less familiar with Oxford Nanopore in terms of the kinds of data that are coming out. But largely, they are producing bases like any other system. There are 20 or 30 years of alignment experience now in the collective community, and if you counted that by individuals it would be many hundreds of years of cumulative experience. People will solve those kinds of problems.
Some interesting work is going on using MapReduce kinds of technologies to make these things super-scalable. Each new instrument is going to produce new varieties of the data that people will need to deal with. I don’t think any of these are going to limit adoption of the technology or be intractable given the vast amount of experience that now exists. What is a challenge is what to do with those alignments. How do you then go that next step and summarize and visualize the information contained in the large bodies of data? I did a FinchTalk post in which I talked about Illumina’s new HiSeq instrument and recent articles about cloud computing. Often these conversations focus on the alignment challenges, and yet there’s a far greater challenge, once you’ve done those alignments, with using that body of information to understand what your data means. That’s where I think we’ve done a particularly good job, and people like what we’ve done.
On library preparation for sequencing: The benefits of next-generation sequencing override the library preparation difficulties, and this has been demonstrated in the literature. We certainly see it in plenty of examples. Compared to microarrays, you’re going to get a higher dynamic range in terms of the sensitivities, so with next-generation sequencing you get at genes that are less expressed. Also you don’t blow out your signal, if you will, so you can measure high levels of expression to a finer degree. But more importantly, with microarrays you can only measure with the probes you have on that chip. With next-generation sequencing we’re finding that there are many regions of the genome that aren’t annotated and are showing expression. These sequences are not on today’s microarrays, so I have a [chance] to discover new genes, gene boundaries, and exons through next-generation sequencing. This is information that until now you couldn’t get in a microarray experiment.
Having said that, there are artifacts that you can see in an RNA-Seq experiment that you’d never measure in a microarray, and those can get in the way. Ribosomal RNA is an example. It’s very important to have good preparation methods to remove those contaminating molecules. So there are some trade-offs. One of the nice things is that since we have a LIMS product, we can start to capture laboratory information about the experimental process. Our strategy integrates that laboratory information with the analytical information so that people know more quickly whether their experiments are on track.
Further reading: Next-Generation Sequencing Gains Momentum: Markets Respond to Technology and Innovation Advances. June 2011.
This article also appeared in the July-August 2011 issue of Bio-IT World.