
Square Pegs in Round Holes?

By Thaddeus H. Grasela

Sept. 13, 2007 | A curious researcher stumbles upon a cache of data on the Internet and after a quick analysis discovers a dangerous public health threat.

Sound like the plot of a new medical thriller? In fact, it is the backdrop for a recent series of newspaper articles on the potential safety risks of Avandia, a medication for Type 2 diabetes mellitus. A meta-analysis performed on data from multiple clinical trials suggested that Avandia might increase the risk of heart attacks in diabetics. The report generated significant concern among patients and physicians about the safety of Avandia. In May, the FDA issued a safety alert for Avandia, but decided to keep the drug on the market. In August, the FDA announced that Avandia, along with other antidiabetic drugs, would feature new warnings on their packaging.

The idea of analyzing data combined from multiple clinical trials (i.e., meta-analysis) is an attractive strategy for monitoring and assuring drug safety. While each clinical trial is designed to answer a specific question (or questions) in a specific patient population, a meta-analysis of the pooled data from multiple clinical trials potentially provides a more complete picture of the risk-to-benefit tradeoffs. A critical consideration in performing a meta-analysis is the choice of studies to pool. Factors that weigh heavily in the selection of trials include the study designs, patient populations, and outcome metrics captured in each trial. Thus, a successful meta-analysis requires comprehensive information about the types of patients enrolled in each trial, including demographic characteristics, disease severity, treatment regimens, and use of concomitant medications, among other factors.
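To make the pooling step concrete, here is a minimal sketch of one common way to combine trial results: a fixed-effect, inverse-variance pooled odds ratio. It is not the method used in the Avandia analysis, which this article does not describe. The trial names and counts are invented for illustration, and the sketch assumes the trials have already been judged comparable in design, population, and outcome definition and that no cell count is zero.

```python
import math

# Hypothetical 2x2 counts per trial, invented for illustration only:
# (events_treated, n_treated, events_control, n_control)
trials = {
    "trial_A": (12, 400, 8, 400),
    "trial_B": (5, 150, 4, 150),
    "trial_C": (20, 900, 15, 900),
}

def log_odds_ratio(a, n1, c, n2):
    """Return (log odds ratio, variance) for one trial.

    Assumes no zero cells; a real analysis would need a continuity
    correction or a different estimator for sparse data.
    """
    b, d = n1 - a, n2 - c
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, var

def pooled_odds_ratio(trials):
    """Fixed-effect, inverse-variance pooled odds ratio with a 95% CI."""
    weighted_sum = 0.0
    total_weight = 0.0
    for counts in trials.values():
        log_or, var = log_odds_ratio(*counts)
        weight = 1 / var  # more precise trials get more weight
        weighted_sum += weight * log_or
        total_weight += weight
    pooled = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
    )

pooled_or, ci_low, ci_high = pooled_odds_ratio(trials)
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The arithmetic is the easy part. The pooled estimate is only meaningful if an "event" and a "patient" mean the same thing in every trial, which is exactly where the data standards discussed in this article come in.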

The informatics challenges to performing a successful meta-analysis are some of the key driving forces for the pursuit of semantic interoperability and the development of data standards by organizations such as the Clinical Data Interchange Standards Consortium (CDISC). The efforts directed at data standardization come at an important time. The cost of drug development has soared in recent years, and challenges regarding drug safety, such as the potential for an increased risk of heart attacks with Avandia, have drawn the scrutiny of Congress.

Data standardization, as it is currently being practiced, involves bringing a group of experts together to share their experiences and personal perspectives with respect to specific concepts of interest. While this exercise is valuable in exploring nuances, a problem arises when the group moves to develop a standard definition. Often, instead of retaining the rich granularity revealed during the discussions of the concepts, the group moves to achieve consensus by developing a definition that satisfies a majority of the experts.

Imagine a group of experts called together to develop a definition for “happy.” Individuals drawing upon their recent experiences might describe feelings such as glad, content, cheerful, joyful, beaming, ecstatic, jubilant, and rapturous. The consensus definition (in this case drawn from the Oxford English Dictionary, ninth edition) might be “feeling or showing pleasure or contentment.” Unfortunately, the granularity that gave Shakespeare the tools to represent the human condition is lost in the consensus-forming process. So while current efforts at data standardization ensure that the primary statistical calculations for a study can be replicated, the loss of granularity reduces the ability to represent nuances that can be essential for the interpretation of future meta-analyses.

Premature Standards’ Problems
The desire of medical researchers to achieve the promise of semantic interoperability has created a sense of urgency for the development and deployment of data standards. This urgency provides the justification for distributing early versions of a standard, with the idea that the early versions will be improved in subsequent releases.

This rush to implement a standard has two important consequences. First, late adopters have a reason for holding back from implementation because of instability in the standard. Second, and perhaps more importantly, the early adopters are forced to use what is available, which leads to the emergence of different dialects as they accomplish tasks not anticipated by the initial version. This need to pound square pegs into round holes creates an obstacle in the pursuit of semantic interoperability because it is difficult to rectify these issues once a premature standard has come into widespread use.

The goal of semantic interoperability, which includes the goal of facilitating analyses across trials for drug safety assessments, will require several changes to the current strategy of data standardization. First, the short-term goal of data standardization must shift from a focus on promulgating standards to an emphasis on unraveling the meanings behind complex concepts. Second, the output of this process must then be encoded in a scientific ontology built on standard formats and methodologies for ontology development, maintenance, and use in order to foster the creation of principled ontologies. (Additional information can be found at )

Semantic interoperability may very well remain elusive for the foreseeable future. One way to approach this goal incrementally might be to adopt a short-term focus on learning about ambiguities sooner, so that we can reach a higher level of semantic interoperability faster. This process, known to informaticians as disambiguation, involves unraveling the complexities that are often implicitly represented in a particular data standard term.
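One small, practical step toward disambiguation can be sketched in a few lines, assuming that each record retains the verbatim source term alongside the consensus term it was mapped to. The record layout, field names, and subject identifiers below are hypothetical and do not come from any particular standard; the terms echo the "happy" example above. The function flags standard terms that conceal more than one distinct source meaning.

```python
from collections import defaultdict

# Hypothetical records: each keeps the verbatim source term next to the
# consensus (standard) term it was mapped to.
records = [
    {"subject": "S1", "verbatim": "content", "standard": "happy"},
    {"subject": "S2", "verbatim": "ecstatic", "standard": "happy"},
    {"subject": "S3", "verbatim": "glad", "standard": "happy"},
    {"subject": "S4", "verbatim": "ecstatic", "standard": "happy"},
    {"subject": "S5", "verbatim": "calm", "standard": "calm"},
]

def ambiguous_terms(records):
    """Return standard terms that absorb more than one verbatim term,
    each mapped to the sorted list of verbatim terms it hides."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["standard"]].add(rec["verbatim"])
    return {std: sorted(verb) for std, verb in seen.items() if len(verb) > 1}

print(ambiguous_terms(records))
# prints {'happy': ['content', 'ecstatic', 'glad']}
```

If only the standard term had been stored, the distinction between "content" and "ecstatic" would be unrecoverable. Keeping the verbatim term is what makes it possible to revisit the mapping later.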

A growing number of ontologies are being created to address various scientific domains. Of particular importance to the complex data standardization efforts in the biomedical sciences is the implementation of a curation effort. This effort aims to consolidate the terms generated from disparate ontologies in order to ensure their reusability, and to ensure compatibility between neighboring ontologies.

Such curation has been a critically valuable component in the development of the Gene Ontology for organizing and mining newly elucidated genomic information. New approaches to drug development must evolve if we are to see continued improvement in research productivity and drug safety. The move toward scientific ontologies as a basis for developing data standards is one way to prepare for these changes and to allow the informatics backbone of the pharmaceutical and biotechnology industry to evolve.

Thaddeus H. Grasela is president and CEO of Cognigen Corp. Email:

