Leveraging Private Cloud Technologies to Improve Prior Investments in Semantic Technologies and Data Warehouses

By Eric Little, PhD

December 5, 2012 | Guest Commentary | Pharmaceutical R&D divisions normally possess a significant amount of warehoused legacy data that bears directly on other areas of the enterprise, such as new product development, manufacturing pipelines, clinical studies, and regulatory affairs, where better understanding and reuse of data can lower costs and raise productivity. In large enterprises, it can be difficult for scientists, engineers, management and other stakeholders to quickly and effectively access the pertinent data that allow for improved business decisions. Getting that data out of legacy systems is hard, which is one reason semantic technologies have been sought after for some years in the pharmaceutical and biotech sectors.

The Value of Semantics

Semantic technologies provide a means to capture data in complex logical models called ontologies, which are often represented as data graphs composed of nodes and edges. Ontologies capture subject matter expertise that exists implicitly in people’s heads, organize it, and present it as a set of explicit statements about a given domain. Rules of inference and automated reasoning techniques can then be run over those models to derive new facts from one’s data. This provides a very flexible and extensible approach to data modeling, where new data sources can be easily added over time to meet the needs of growing organizations that rely on ever-changing scientific data. Many pharmaceutical companies have been working with semantic-based systems for some time and have been rather early adopters of the technology, with varying degrees of success.
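
To make the idea of deriving new facts concrete, here is a minimal sketch in Python of a graph of subject-predicate-object statements with two rules of inference applied until nothing new appears. The compound and class names, the predicates `is_a` and `subclass_of`, and the two rules are all assumptions chosen for illustration; a production system would use a real ontology language and reasoner rather than this toy loop.

```python
from typing import Set, Tuple

# A fact is a (subject, predicate, object) statement, i.e. one edge in the graph.
Triple = Tuple[str, str, str]


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Apply two illustrative rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if o != s2 or p2 != "subclass_of":
                    continue
                # Rule 1 (assumed): subclass_of is transitive.
                if p == "subclass_of":
                    new.add((s, "subclass_of", o2))
                # Rule 2 (assumed): an instance of a class is also an
                # instance of every superclass of that class.
                if p == "is_a":
                    new.add((s, "is_a", o2))
        if new <= known:
            return known
        known |= new


# Hypothetical asserted facts, standing in for expert knowledge made explicit.
asserted = {
    ("compound_X", "is_a", "kinase_inhibitor"),
    ("kinase_inhibitor", "subclass_of", "enzyme_inhibitor"),
    ("enzyme_inhibitor", "subclass_of", "small_molecule"),
}

# Facts that were never stated directly but follow from the rules.
inferred = infer(asserted) - asserted
for fact in sorted(inferred):
    print(fact)
```

In this toy case, three asserted statements yield three further statements, including the fact that the hypothetical compound_X is a small molecule, which no one entered by hand.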

The Limitations of Semantics

In spite of the advantages mentioned above, semantics has not been the “silver bullet” many had hoped it would be. Semantic technologies provide a sophisticated way of modeling complex relationships between data, but the graphs created within semantic solutions can quickly grow to enormous sizes. They capture not only the elements contained within an enterprise’s raw data but also the many related facts and relationships generated by automated reasoning, which can produce 10 to 100 times as much new data from a single data source.

As an example, imagine taking one’s raw assay data on a given compound, then linking it to all known data about related clinical studies and phenotypic effects, as well as underlying genomics data. The ensuing data graph would quickly become unmanageable, and that is only a small portion of the relevant data one may wish to interrogate for a given study. Semantic solutions alone do not solve the problem of providing people with the data they need to make better decisions across a large enterprise; rather, they can cause a logjam of computational resources that makes large-scale tractability nearly impossible.
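
The growth described above can be illustrated with a small Python sketch. It assumes a single transitive relation over a simple chain of hypothetical terms, which is far simpler than a real ontology, and it is meant only to show how the number of facts after reasoning can outpace the number of facts asserted; the figures it prints are properties of this toy model, not measurements from any real system.

```python
def transitive_closure(edges):
    """Return all pairs reachable through a transitive relation."""
    closure = set(edges)
    while True:
        # Join every (a, b) with every (b, d) to derive (a, d).
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def chain(n):
    """Build n asserted links: term_0 -> term_1 -> ... -> term_n (hypothetical names)."""
    return {(f"term_{i}", f"term_{i + 1}") for i in range(n)}


for n in (10, 20, 40):
    asserted = chain(n)
    total = transitive_closure(asserted)
    ratio = len(total) / len(asserted)
    print(f"asserted={len(asserted)} after_reasoning={len(total)} ratio={ratio:.1f}x")
```

For a chain of n asserted links, the closure holds n(n+1)/2 links, so the ratio of total to asserted facts keeps climbing as the chain gets longer. Real graphs have many relations and rules interacting, which is why the expansion becomes hard to manage on a single machine.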

Utilizing Cloud Technologies

Cloud technologies can be leveraged to ease the computational issues facing semantic systems by providing a scalable and computationally tractable infrastructure to support numerous computations over large data graphs.

A private cloud of this nature contains multiple physical machines, each of which hosts multiple virtual machines. This provides a highly dynamic and scalable back-end for enterprise-level systems that can be employed to a greater or lesser extent based on the sophistication of the query at hand, the number of users on a system, etc. A cloud-based semantic system allows for improved federation of virtualized computational resources, whereby queries can be threaded across multiple nodes at once and data can be integrated from multiple back-end sources with relative ease, using standards-based approaches such as Apache Hadoop. Open standards help to limit vendor lock-in and reliance on a single approach or configuration, which may not be prudent in the long run as technologies evolve.
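
The pattern of threading one query across several nodes and merging the results can be sketched as follows. This is not Hadoop code: it uses Python's standard `concurrent.futures` thread pool as a stand-in for a cluster, and the partitions, predicate names, and the `federated_query` helper are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands in for the slice of the data
# graph held by one virtual machine or back-end source.
PARTITIONS = [
    [("compound_X", "tested_in", "assay_1"), ("compound_Y", "tested_in", "assay_2")],
    [("compound_X", "studied_in", "trial_7")],
    [("compound_Z", "tested_in", "assay_3"), ("compound_X", "targets", "gene_A")],
]


def query_partition(partition, subject):
    """Run the query against a single partition (one node's share of the work)."""
    return [triple for triple in partition if triple[0] == subject]


def federated_query(subject, partitions=PARTITIONS, max_workers=3):
    """Send the same query to every partition at once, then merge the answers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda part: query_partition(part, subject), partitions)
        # Flatten the per-partition answers into one sorted result list.
        return sorted(triple for partial in partials for triple in partial)


results = federated_query("compound_X")
for triple in results:
    print(triple)
```

Each partition answers independently, so adding machines adds capacity without changing the query itself; the caller sees a single merged result regardless of how many back-end sources contributed to it.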

Real World Benefits of Semantics in a Cloud

One recognized benefit of semantic cloud technology is the ability to link disparate data from various points within the enterprise (R&D, manufacturing, clinical, etc.) and deliver it to users’ personal computing devices via secured web services. Users, in turn, see the time needed to generate key reports for regulatory affairs, clinical outcomes, and business process monitoring drop significantly, from weeks to seconds. They have access to data in clear and understandable terms through sophisticated user interfaces that can be customized to their needs and provide near real-time monitoring of systems.

As a direct result, clients realize a greater return on their investments in data warehouses, semantic models and business intelligence tools, while at the same time reducing costs for future system integration. This is because new capabilities can be quickly and easily built on the base infrastructure, leveraging one’s past investments. The end result is an enterprise-wide application that allows our customers to utilize their data in new ways and improve their competitive position within the market.

Eric Little is the director, information management, at Orbis Technologies. He has published and presented extensively in the field of semantic technologies, and can be reached at elittle@orbistechnologies.com.