Data Virtualization Powers Next Generation Sample Intelligence

By Suresh Chandrasekaran

August 8, 2014 | Contributed Commentary | More than one billion samples are now stored in biobanks around the world. Research advancements, genomics and biomarker development, and reduced research budgets are all driving the need for unified sample intelligence. In the past, samples were treated as a commodity, but in today’s clinical environment you need to know where a sample is located, what it can be used for, and how it has been consented (that is, whether the sample and its associated intelligence can be used more broadly than initially planned). If these questions can’t be answered, the sample cannot be used. As a result, there is a need to move from treating a sample as a commodity to treating it as a reusable, valuable scientific asset. Many research organizations are seeking technologies and strategies to improve sample intelligence and optimization, and one of the more popular solutions is data virtualization.

As translational medicine catches up to genomics and genetics, it is increasingly apparent that research organizations need to associate sample intelligence with the potential uses of every sample they acquire. Organizations also need to navigate all of their data sets more easily, not just the data sets associated with the physical sample.

In the past, samples typically remained within one particular trial, visit, or therapeutic environment. Today, “future use consent” allows those samples and the associated intelligence to be used more broadly than even the researcher initially envisioned. Often, samples from one study hold key biomarker information that is relevant to another study. Because research organizations cannot know what the future holds, it is critical that samples and their related data are able to talk to one another. Disparate data must be merged and analyzed in order to make predictions, and users must be able to see and report comprehensively on their scientific assets in both the physical inventory environment and the virtual inventory environment, which includes sample-associated data.

Sample data describes the attributes of the sample and typically includes details such as how it has been consented, what type of sample it is, and the patient’s demographic information. A second dataset might contain clinical and patient information, while a third is focused on genotyping data. Often, sample data is housed in one database, clinical data in another, and genotyping data in yet another. Each database could be associated with a different vendor or legacy system.

In addition, there is often metadata related to each sample: the date it was collected, the date it was stored, and its location in the bin. While this intelligence is fairly structured, data such as lab notes taken in context with the sample are not. The challenge becomes how to extract structure from that unstructured or semi-structured data (through keywords, string parsing, and so on) and represent it as a relational data source so that it can be combined with more structured data sources.
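As a rough illustration of the extraction step described above, the following Python sketch pulls a few fields out of free-text lab notes with keyword and string matching, stores them as relational rows, and joins them to structured sample metadata. The sample IDs, note wording, patterns, and table names are all hypothetical, invented for this example; real lab notes would need a much richer vocabulary.

```python
import re
import sqlite3

# Hypothetical free-text lab notes keyed by sample ID (illustrative data only).
LAB_NOTES = {
    "S-001": "Hemolysis observed. Thawed 2 times. Volume 1.5 mL remaining.",
    "S-002": "No visible issues; volume 0.8 mL remaining.",
}

# Assumed string patterns for two numeric fields.
THAW_RE = re.compile(r"thawed\s+(\d+)\s+times?", re.IGNORECASE)
VOLUME_RE = re.compile(r"volume\s+(\d+(?:\.\d+)?)\s*mL", re.IGNORECASE)


def parse_note(sample_id, text):
    """Turn one free-text note into a flat, relational-style row."""
    thaw = THAW_RE.search(text)
    volume = VOLUME_RE.search(text)
    return (
        sample_id,
        # Naive keyword flag: it would misread a phrase like "no hemolysis".
        int("hemolysis" in text.lower()),
        int(thaw.group(1)) if thaw else 0,
        float(volume.group(1)) if volume else None,
    )


def build_database():
    """Load structured metadata and parsed note facts into one in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample_meta (sample_id TEXT PRIMARY KEY, collected TEXT, bin TEXT)"
    )
    conn.executemany(
        "INSERT INTO sample_meta VALUES (?, ?, ?)",
        [("S-001", "2014-01-15", "A3"), ("S-002", "2014-02-02", "B7")],
    )
    conn.execute(
        "CREATE TABLE note_facts "
        "(sample_id TEXT, hemolysis INTEGER, thaw_count INTEGER, volume_ml REAL)"
    )
    conn.executemany(
        "INSERT INTO note_facts VALUES (?, ?, ?, ?)",
        [parse_note(sid, text) for sid, text in LAB_NOTES.items()],
    )
    return conn


def usable_samples(conn):
    """Join structured metadata with note-derived facts; skip hemolyzed samples."""
    return conn.execute(
        "SELECT m.sample_id, m.bin, n.volume_ml "
        "FROM sample_meta m JOIN note_facts n ON n.sample_id = m.sample_id "
        "WHERE n.hemolysis = 0 ORDER BY m.sample_id"
    ).fetchall()
```

Calling `usable_samples(build_database())` returns only the rows whose notes did not mention hemolysis. Once the note content is in tabular form, it can be queried alongside the collection date and bin location like any other structured source.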

Bringing together structured data, internal unstructured content systems, and external data that has structure but is not easily accessible has never been more important for research organizations. Equally important, data scientists and clinical researchers need constant visibility into, and access to, all data assets.

Data Virtualization Enhances Sample Intelligence

Unified sample intelligence can yield improved time to market, speed to solution and greater flexibility in how research and clinical information is delivered. In terms of ROI, once you can make sample data assets more accessible and reusable, you automatically increase the return on those data assets. Data virtualization provides the flexibility and the agility to help life science organizations optimize sample intelligence, which ultimately results in faster, more efficient drug discovery. Researchers need the ability to identify which samples are meaningless or redundant, glean more genetic information and draw more correlations among the therapeutic groups. This is the exponential value of unified sample intelligence powered by data virtualization.

Additionally, data virtualization addresses the heart of the issue by giving researchers an operational engine that delivers data through a dashboard, a report, or a portal. The insight and analysis gleaned through data virtualization reduce research costs, time to market, and the number of samples collected.

BioStorage Technologies, Inc., a global comprehensive sample management solutions company that provides information intelligence in the management of research samples across all phases of drug development, is now using data virtualization technology to help research organizations build a holistic "Single View of Research Data" across multiple R&D data services with minimal replication to preserve privacy.

Data virtualization powers the company's proprietary sample management inventory system, which helps to support agile research decision-making based on external and emerging data sources, as well as reusable data services for multiple applications and users. Further, the technology enables visualization of complete data sets to support improved asset optimization and faster go/no-go decisions in drug development. As a result, research organizations can improve their overall global sample data integration; create 24/7 data visibility and access; easily connect to bioprocessing data; establish sample tracking consent; provide intuitive search and discovery of critical intelligence; access data services; and easily create custom reporting.

Leveraging data virtualization, drug development teams and partners can create a virtual data layer: in effect, a logical or canonical view of entities drawn from disparate structured, semi-structured, or unstructured data sources. In the case of sample intelligence, the entities are samples, patients, informed consents, the type of disease, inhibitor or molecule, and so on. Once you have created that virtual data layer, you have unified access to the intelligence, which can be served up to users in various ways and reused as needed. This makes it easy for data scientists and clinical researchers to achieve real-time or right-time access to intelligence that is located across distributed or disparate data sources. This access also significantly reduces the need to constantly replicate the information, from, say, 10-20 times down to 2-3, which saves time and cost and establishes an easier path to governance through security, access rules, and similar controls.
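To make the idea of a virtual data layer more concrete, here is a toy Python sketch, not a representation of any vendor's product or API. It assumes three hypothetical sources (an inventory database, a clinical consent database, and an in-memory genotyping lookup) and assembles a canonical sample entity at query time instead of copying everything into a central store. All class, table, and field names are invented for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SampleView:
    """Canonical, read-only view of one sample, assembled at query time."""
    sample_id: str
    sample_type: str
    location: str
    future_use_consent: bool
    genotyped: bool


def make_inventory_db():
    """Hypothetical source 1: physical inventory system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id TEXT, type TEXT, location TEXT)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [("S-001", "plasma", "Freezer 2 / Bin A3"),
         ("S-002", "serum", "Freezer 5 / Bin B7")],
    )
    return conn


def make_clinical_db():
    """Hypothetical source 2: clinical system holding consent records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE consent (sample_ref TEXT, future_use INTEGER)")
    conn.executemany(
        "INSERT INTO consent VALUES (?, ?)", [("S-001", 1), ("S-002", 0)]
    )
    return conn


# Hypothetical source 3: genotyping results exposed as a simple lookup.
GENOTYPING_RESULTS = {"S-001": {"marker_1": "AG"}}


class VirtualSampleLayer:
    """Queries each source on demand; the data stays where it lives."""

    def __init__(self, inventory, clinical, genotyping):
        self.inventory = inventory
        self.clinical = clinical
        self.genotyping = genotyping

    def get(self, sample_id):
        row = self.inventory.execute(
            "SELECT type, location FROM samples WHERE id = ?", (sample_id,)
        ).fetchone()
        if row is None:
            return None
        consent = self.clinical.execute(
            "SELECT future_use FROM consent WHERE sample_ref = ?", (sample_id,)
        ).fetchone()
        return SampleView(
            sample_id=sample_id,
            sample_type=row[0],
            location=row[1],
            future_use_consent=bool(consent and consent[0]),
            genotyped=sample_id in self.genotyping,
        )

    def reusable(self):
        """Samples whose consent permits use beyond the original study."""
        ids = [r[0] for r in self.inventory.execute(
            "SELECT id FROM samples ORDER BY id")]
        return [v for v in map(self.get, ids) if v.future_use_consent]
```

A caller would build the layer with `VirtualSampleLayer(make_inventory_db(), make_clinical_db(), GENOTYPING_RESULTS)` and then ask for `layer.get("S-001")` or `layer.reusable()`. Access rules and security checks could be applied once, inside the layer, instead of separately in each replicated copy.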

In response to an increasing array of cost-reduction initiatives, efficiency objectives, and safety drivers from the commercial, regulatory, and patient sectors, research organizations are embracing innovative technologies across the entire drug development continuum. The explosion in data volume and complexity, together with new types of sources, is driving the need for research companies to become agile enough to incorporate new data sources and to grow and adapt to the market quickly. Data virtualization enables these organizations to elevate sample data from a commodity to a strategic, reusable, and valuable scientific asset. By providing the flexibility needed to unify data, creating easy access to the data, and migrating all of the pertinent information together in a secure manner, life sciences companies can accelerate their drug development and research.

Suresh Chandrasekaran, senior vice president at Denodo Technologies, is responsible for global strategic marketing and growth. He can be reached at suresh@denodo.com.