It’s Time for Wheels on the Big Data Sports Car

By Chris Molloy

April 30, 2013 | Guest Commentary | Our ‘always-on’ digital world has made data a key resource that is fundamentally changing our lives. We’re swimming in it—oceans of it. Whether or not IBM’s view that 50% to 90% of the world’s data has been generated in the last two years is totally accurate, there’s no doubt the amount of genomic data, peer-reviewed journals, image data, secure de-identified clinical information and telemedicine data is astonishing. So why, at a time when global research and development investment is over $1.5 trillion and is more externalized than ever, are we still bemoaning a lack of open access to decision-making data and an innovation deficit? Big Data is here, so how do we make sense of this brave new world?

This collaborative data ecosystem has become an increasingly globalized, multiparty, multidisciplinary environment. The information we generate and share continues to grow exponentially, so we must urgently rethink how we use and further exploit both our internal communities and the new global ones to increase R&D productivity. The bottom line is that it’s high time we got more bang for our buck.

Becoming leader of the data pack

The world of R&D-centric companies revolves around the creation, use, and monetization of information—a capital asset. As data are added to, interpreted and shared, they become increasingly complex and valuable. This process is supported by various teams, each providing skills and guidance to move products from inception to delivery. It can be viewed as an ecosystem of ideas, data, and information.

This ecosystem is, however, too often unstable, stifled by ineffective collaboration. Researchers commonly use multiple, disjointed systems to capture, compute and structure their data. The priority of any R&D function must be to maintain the flow of ideas. How can we enable communities to connect and to collaborate through data? How are we to gain insights and knowledge from this ecosystem? The answer is not just through new enterprise analytics but also through improvements to underlying data quality, provenance and availability.

The use of multiple nodes and processes, coupled with Infrastructure-as-a-Service capability, is now established in financial and genomic analysis. Yet it is a fallacy to believe that R&D is all about Big Data storage and algorithms alone. The bigger picture is that competitive advantage goes to those who first find and fill the data gaps. Those who most rapidly appreciate the fearsome competitive edge that harnessing their data gives them can surely end up as pack leaders.

Data needs context

The secret to a valuable knowledge ecosystem is to capture contextual data at their source, enabling all the clever people who collaborate to have secure access to what everyone else is doing and—as importantly—how they are doing it. Capturing metadata, including the instrument used, sample preparation methods, analysis parameters and observational data, is essential to R&D. Contextualized data needs to be stored alongside its ontology and provenance. Data provenance comes from being able to establish who generated the data and how they did it, backed by a full audit trail of any modifications. This is what enables data to be compared and used effectively, weighed properly against competing data, and quality controlled.
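To make the idea of contextualized, provenance-tracked data a little more concrete, here is a minimal Python sketch. It is an illustration only, not an IDBS product or any real system’s API: the class and field names (ContextualRecord, AuditEntry, update_value, history) are hypothetical. It assumes a single record that keeps its observed values alongside the context named above (who created it, instrument, sample preparation, analysis parameters) and an append-only audit trail of every modification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    """One modification: who changed which value, when, from what, to what, and why."""
    user: str
    timestamp: datetime
    field_name: str
    old_value: Any
    new_value: Any
    reason: str


@dataclass
class ContextualRecord:
    """A result stored together with its context and provenance (hypothetical schema)."""
    record_id: str
    created_by: str                      # provenance: who generated the data
    instrument: str                      # context: instrument used
    sample_preparation: str              # context: how the sample was prepared
    analysis_parameters: dict[str, Any]  # context: how the analysis was run
    values: dict[str, Any]               # the observations themselves
    audit_trail: list[AuditEntry] = field(default_factory=list)

    def update_value(self, user: str, name: str, new_value: Any, reason: str) -> None:
        """Change an observed value and append (never overwrite) an audit entry."""
        old_value = self.values.get(name)
        self.values[name] = new_value
        self.audit_trail.append(
            AuditEntry(user, datetime.now(timezone.utc), name, old_value, new_value, reason)
        )

    def history(self, name: str) -> list[AuditEntry]:
        """Return every recorded change to one value, oldest first."""
        return [entry for entry in self.audit_trail if entry.field_name == name]


# Example usage with made-up identifiers and values.
record = ContextualRecord(
    record_id="EXP-001",
    created_by="asmith",
    instrument="HPLC-2",
    sample_preparation="solid-phase extraction",
    analysis_parameters={"wavelength_nm": 254},
    values={"purity_pct": 97.8},
)
record.update_value("bjones", "purity_pct", 98.1, "reintegrated peak")
```

The design choice that matters here is that changes are appended to the trail rather than silently replacing the old value, so a colleague can later see who produced a number, under what conditions, and how it reached its current state.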

Tag it

Scientific arguments should be peer to peer, not paper and PowerPoint. The social interaction of researchers is a vital data asset; it is context that is too often dismissed as transient, yet it is the most important information to preserve. To capture the intelligence of the community as its members interpret and challenge community data, we need to harness effective social tools close to the experiment. A recent report from McKinsey discusses the untapped value from social technologies lying in “improved communication and collaboration within and across enterprises.” This idea of social media as a serious business tool is an important one. To enable this interaction, we should adapt what are now social norms, such as tagging, commenting and easy sharing, across the R&D landscape. Social capability, delivered inside existing scientific data tools, will engender confidence, trust and security within a peer group, so let’s embrace it before more valuable insights get lost.

Set science free

In the new world of Big Data, it’s time to free up science and our scientists. Let’s break out from the entrenched view that R&D is a linear progression through basic research, new product discovery, regulated trials, and manufacturing. Let’s reflect the reality that data, information, and knowledge are created through complex interacting processes that span research, design, development, patent filing, manufacture and post-market.

Big Data is too often a fabulous sports car running on crude oil rather than refined fuel. Its success is limited without access to high-context, connected stores that can effectively aggregate and assimilate data to generate a high-quality information landscape. Ensuring that we capture, structure and store the right high-quality data from the bench to the boardroom is not just nice to have; it’s essential. And the tools we use must enable this now, not somewhere down the line. Embracing this will allow federated, high-quality stores to share their treasures with relevant decision-makers, enterprise analytics and communities. Then we’ll start seeing powerful decision-making data and some truly amazing innovations, all delivered at lower cost and within tighter timescales. Now that’s a brave new Big Data-driven world I want to see!

Chris Molloy is VP Corporate Development for IDBS, and has had a twenty-year career in the life sciences and high-tech sectors. He can be reached at CMolloy@idbs.com.