What Should We Make of BaseSpace?

March 11, 2014 | As the market leader in gene sequencing, Illumina is not in the habit of promising revolutions. Competitors want to overturn how we think about sequencing, but when Illumina announced this January that it had delivered the world’s most powerful sequencer with the launch of the HiSeq X, it had gotten there through incremental refinements to the company’s existing technology. But faster sequencers won’t solve the biggest bottleneck in genetics, which is interpreting the wave of nucleotides that comes out of the instruments. The industry’s new race is to build bioinformatics software into the sequencing pipeline, to make sense of the raw data as efficiently as possible. Here, in a strange twist, it’s staid Illumina with the radical vision for transforming how sequencing is done.

BaseSpace, Illumina’s app store for analytics, has now been open for 18 months. Unlike the enterprise solutions of competitors Life Technologies and QIAGEN, BaseSpace outsources its innovation, asking users to work with tools from a multitude of contributors. The vision is of an evolving platform that can collect state-of-the-art tools in a single space. But with each app built separately, and with a different provider’s interests in mind, will BaseSpace be able to serve the high-volume labs whose complex, flexible workflows are the biggest use cases for genetic informatics?

Bio-IT World contributor Aaron Krol spoke with Jordan Stockton, Illumina’s Director of Marketing in Enterprise Informatics and an early contributor to the company’s vision for BaseSpace, to discuss some lingering questions about the project.

Bio-IT World: Illumina relies on third-party developers to add new tools to BaseSpace, one or a few at a time. Are you concerned that this will cause BaseSpace to lag behind in creating a complete home for genetic analysis?

Jordan Stockton: We looked at all of the types of solutions that are being delivered by other partners, and concluded that engaging a community to build different types of applications was the only way to keep up with the progress in this field. And only by engaging both the open source community and commercial developers together would we be able to provide a broad enough offering, in the long term, to attract people to the platform at all. So that was very strategic, and it’s all about application breadth.

In the short term, I think the only time lag is getting the ecosystem built. We’re actively working with development partners in the commercial and the academic space, to bring together people who complete different parts of the workflow. So if you provide a variant caller, you need there to be a suitable aligner upstream of it. I don’t think there’s going to be a single inflection point for the whole industry. I think there will be individual inflection points for different users. For instance, with the most recent release of Core Apps, if your primary function as a lab is to produce RNA-Seq data, that work will be very capably served by the RNA-Seq apps that are part of the BaseSpace Core Apps. There are other types of users who have needs that are vast and diverse, and those customers will be served better and better over time as the diversity of applications grows.

In BaseSpace, an informatics workflow is likely to travel through tools developed by multiple companies. How can users structure these apps into workflows? Can those workflows be easily stored and repeated?

Application providers are wrapping more and more of a workflow into a single end-to-end solution, but I think that’s not really a long-term solution. The longer-term solution speaks to our roadmap going forward, and that’s to provide ways to not only capture the state and the options you use to run individual apps, but also string them together in a very systematic way. We’re trying to do this while maintaining the ease of use, and the point-and-click user experience, that I think is unique to BaseSpace. That’s a challenge, but that’s certainly the direction we’re heading over the next year – this notion of a workflow that is both repeatable and connects multiple different applications.

Illumina lets app developers charge for use of their apps, which is an important incentive to bring developers to BaseSpace. Will it be an obstacle for high-volume users to navigate multiple layers of payment?

I don’t think so, and I would look to other industries that have built similar types of billing systems. If you look at the iCredit system, the notion is that you have pre-purchased credit that you use over time, and I think that’s very similar to the way the telecommunications market lets users prepay for different types of phone and text messaging services, and also very similar to the way the cable industry sells a subscription level of content, and then pay-per-use content. What’s interesting about this is that, for high-volume labs, the per-use cost is incrementally very small, and it also allows them to model their costs exactly, because those costs will scale with utilization.

A lot of the early apps that BaseSpace has attracted so far are “teaser” apps, which give users free access to a limited piece of the developer’s more comprehensive enterprise software. The developers see this as a way to reach the huge user community around Illumina’s instruments, without giving away the store. Do you worry that there’s an incentive for companies to hold back in creating their BaseSpace apps?

At the end of the day, we’re trying to enable customers to do more with genomic information, and that’s fundamentally good for Illumina. And if they’re able to do that in, for instance, desktop solutions that are connected in teaser form to BaseSpace, I think everybody wins. The value that we’ve already started to see recognized by our commercial app partners is that BaseSpace allows them to take a piece of their app, and put it literally a click away from the largest collection of Illumina sequencing data that exists in the world. We have a couple of different app partners who started with a teaser app, and who are now building out more and more complete, higher-value, higher-cost solutions right in BaseSpace. We really like that trend.

From a convenience standpoint, it’s absolutely a goal to provide as complete a solution as possible in BaseSpace, and that’s really about the logical goal of bringing the analysis physically close to the data. There’s a big data problem when you have to move data to the analysis component, so from a customer experience standpoint it’s absolutely critical to provide as complete a solution as possible without having to move data around.

A recent major change to BaseSpace was the launch of the Native App Engine in November. Has that been successful in drawing new developers to BaseSpace?

Absolutely. The early part of the pipeline is better than it’s ever been in terms of bringing new developers on, and I think you’ll see that in the coming year. The other thing that it’s done, more importantly, is brought on a whole new class of developer. Whereas our previous API required you to host your own system, and so was more enticing to commercial entities, what we’ve really seen is an onboarding of academic, open source providers, who just want a place to prototype their tool – and then, if it works, a place to popularize their tool. For academic researchers in the field of informatics who are looking for a way to popularize what they do, I think there’s no more powerful way to put their expertise close to where the data is.

A selection of the apps available in the BaseSpace Store. Image credit: Illumina

Cloud platforms for bioinformatics are becoming increasingly popular, and companies like DNAnexus have been popping up rapidly in the last year to capitalize on that trend. One thing these platforms have stressed is that private clouds and encryption measures are available for users who have to worry about data security. By contrast, BaseSpace was open for over a year before Illumina announced the launch of BaseSpace OnSite to provide those services this January. Why the delay in building private clouds into BaseSpace? Was this kind of use case less of a priority at Illumina?

No, I think it was about product evolution. One thing to remember is that BaseSpace OnSite is literally a copy of the code stack that works on BaseSpace Cloud. We wanted to preserve the same user experience in both environments. So we could easily have rolled out and cleaned up an existing pipeline, and called that a local cloud, but we wanted to make sure that the exact same user experience could be rolled out in BaseSpace OnSite. Everything from the user interface to, in the future, the Docker-based ability to roll out Native Apps, is available at someone’s institution. And those weren’t available until this year. So we wanted a complete user experience that we could call BaseSpace, and then we wanted to clone it in the local appliance. You could think of it like commercial appliances, such as a TiVo or a video game unit. This is a unit that does genomic data processing, data storage and interpretation.

How do you expect the use of BaseSpace to evolve as new apps are added? Do you think a typical foray into BaseSpace will look very different a year from now than it does today?

Two things are going to be different a year from now. One is certainly the diversity of apps, and the inclusion of apps from the academic community, and you’ll see that grow. You’ll see a two-pronged approach, where there will be more apps that represent the core use cases from our customers, and then a growing diversity of apps from both commercial and academic partners.

The second is, as practices become standard in different markets and different use cases, what you’ll see is workflows that start further up in the sample prep space. One of the features that was released early this year was the BaseSpace Prep Tab, and it allows you to organize samples into pools, and pools into runs, in a way that makes sense, right on BaseSpace – and then kick off that run from the instrument, having all that predefined knowledge about how your samples and pools are set up in BaseSpace already. So you’ll see workflows that start in BaseSpace upstream of the sequencing itself, and go linearly downstream, farther than most of the apps go today, into interpretation and customized reporting, all in a very linear, simplified workflow.

In addition to new apps, will there be structural changes to BaseSpace in 2014?

The most exciting thing, and a thing you’ll see in the near future, is a tighter connection between what we’re doing in BaseSpace, and what we’re doing in the NextBio interpretation, analysis and understanding platforms. You’ll see programmatic and data processing links between those, and they’ll become more integrated. In general, we think about the world in terms of information producers, who have sequencers and analyze their data in BaseSpace, and people on the other side, who may only consume data. They are fundamentally interested in using that data to make some sort of decision. And that’s what the NextBio platform represents, a place for information consumers to look at reports and graphical outputs. We’re really excited, from the perspective of making a seamless workflow, about connecting the BaseSpace workflow to the interpretation and reporting workflow that NextBio has. That’s a huge effort right now.

We also published pricing for storage almost a year ago, and we just haven’t turned it on. There are a few goals there. We wanted to give people an opportunity to use BaseSpace before we started charging them, but we also wanted to set an expectation for what the long-term costs of using BaseSpace will be. I think 2014 will probably be the year we start charging for both storage and processing.

Is there a pitch you’d make to high-volume users and labs to choose BaseSpace over another informatics environment?

Certainly if you’re a core facility, and you’re serving multiple customers, there’s probably no better dissemination vehicle than BaseSpace to get your data into customers’ hands. That’s just a key market. Our ability to share data without actually moving it from where it physically lives, and the ability to do that in a way that’s integrated with the instrument, provides an economy and an efficiency for a core lab with multiple customers that I don’t think anyone else can match.