Transforming Information Into Knowledge In The Big Data Era

November 13, 2017

Contributed Commentary by Peter Derycz

November 13, 2017 | If the data scientists are right, more than 40 zettabytes (or 40 trillion gigabytes) of digital data will have been generated by 2020. In addition to spawning a new field of study and analysis, this overwhelming amount of information, or the “Data Deluge” as some experts call it, is reshaping the way life science researchers work.

Despite the overwhelming volume and complexity of scientific literature, and the myriad ways of accessing and managing it, the success of an entire research organization may depend on quick, efficient, cost-effective, and unfettered access to the information contained in reference publications. Simplifying access to the right information across the organization has become the mantra for the successful, research-driven enterprise — but it is only the first step in an enterprise-wide knowledge management strategy.

So, how do biomedical and drug discovery researchers effectively transform information into useful knowledge in the Big Data era?

The answers lie in how the magnitude of available information is being harnessed and exploited. With at least 50 million scholarly journal articles already filling information pipelines, and more than 2.5 million added each year, the ways content is discovered and used by scientists and technologists at millions of companies must evolve.

Accessing Information Across The Scientific Literature Lifecycle

While a casual observer might assume that access to more data would make the researcher’s job easier, the opposite is true more often than not. The race to acquire information, and the urgency to publish, patent, deliver, and protect that information, is in fact putting new pressure on researchers to rethink their approaches to content creation and dissemination. As a result, the focus is on the scientific literature life cycle—that sometimes daunting process of discovering, accessing, storing, publishing, and reusing scholarly content—and on emerging technologies that facilitate easy access to information at each step along the way.

In a recent Science News article, author Tom Siegfried argues that Big Data often conflicts with the scientific research workflow and with the literature it produces. “In medical research, for example, many factors can influence whether, say, a drug cures a disease,” Siegfried wrote. “Traditional medical experimentation has been able to study only a few of those factors at a time. The empirical scientific approach—observation, description and inference—can’t reliably handle more than a few such factors. Now that Big Data makes it possible to actually collect vast amounts of the relevant information, the traditional empirical approach is no longer up to the job. To use Big Data effectively, science might just have to learn to subordinate experiment to theory.”

Siegfried’s point is well taken and hints at the challenge of making Big Data useful and visible within the scientific research workflow. The new paradigm will be to leverage large volumes of data that augment scientific reference materials while, at the same time, improving access to the scientific literature across the organization.

The good news is that the constraints of science can make Big Data more reliable and useful, and putting Big Data in the hands of responsible stewards, such as those handling peer review and scientific publication, could address the problems Siegfried hints at. Kent Anderson, CEO of publishing analytics firm RedLink, describes the potential blending of skills this way: “Scholarly publishing has often been associated with terms such as ‘old school’ or ‘traditional’. However, without the proper use of data, adapting to the trends of scholarly publishing isn’t possible. Becoming data-oriented is necessary for success. Where scholarly and scientific publishing can contribute specifically is in placing guardrails and confirming trust in the system. Trust is vital for the coming data environment. It means operating with integrity and good faith. Scholarly publishers are accustomed to forming bilateral trust relationships—with authors, institutions, and others. They can do the same around Big Data.”

The Breakthrough: Personalized Research Workflows

Beyond digital platforms and computing power, technologies that create highly personalized research environments, ones that take the individual researcher into account for the first time, will soon become mainstream. These customizable workflows will combine data insights with reference-specific search engines, analytics, and visualization tools. Far more valuable than simply presenting information, personalized access tools will help researchers make the most of available reference data by pinpointing what matters for their particular research projects.
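To make this concrete, below is a minimal sketch, in Python, of one building block such a workflow could assemble: a personalized literature feed that pulls recent PubMed records matching a researcher’s own topic terms through the public NCBI E-utilities API. The topic terms, result count, and printed fields are illustrative assumptions, not a description of any particular vendor’s tool.

    import requests

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

    def recent_articles(topic_terms, max_results=10):
        """Return (date, title) pairs for recent PubMed records on a topic.

        Sketch of a personalized literature feed: topic_terms would come
        from the individual researcher's project profile.
        """
        query = " AND ".join(topic_terms)
        search = requests.get(f"{EUTILS}/esearch.fcgi", params={
            "db": "pubmed", "term": query,
            "retmax": max_results, "retmode": "json",
        }).json()
        ids = search["esearchresult"]["idlist"]
        if not ids:
            return []
        summary = requests.get(f"{EUTILS}/esummary.fcgi", params={
            "db": "pubmed", "id": ",".join(ids), "retmode": "json",
        }).json()
        return [(summary["result"][uid]["pubdate"],
                 summary["result"][uid]["title"]) for uid in ids]

    # Hypothetical researcher profile; the terms are placeholders.
    for date, title in recent_articles(["CRISPR", "drug discovery"]):
        print(date, "-", title)

In a fuller workflow, the record identifiers returned here would feed the analytics and visualization layers described above.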

Looking to the near future, there will be a continuous shift toward workflows and working environments that comprise systems of related activities specific to the type of research being conducted. Access to scholarly articles and the extraction of key information at each step of the literature life cycle will become easier and less costly.

The scientific research community is more than ready for a new way of thinking when it comes to transforming the data locked within an ever-growing body of scientific literature into useful knowledge. From a researcher’s point of view, the revolution can’t begin soon enough.

Peter Derycz founded Reprints Desk and parent company Research Solutions in 2006 and has served as its Chief Executive Officer and President since its inception. He can be reached at pderycz@reprintsdesk.com.