Square Pegs in Round Holes?



By Thaddeus H. Grasela

Sept. 13, 2007 | A curious researcher stumbles upon a cache of data on the Internet and after a quick analysis discovers a dangerous public health threat.

Sound like the plot of a new medical thriller? In fact, it is the backdrop for a recent series of newspaper articles on the potential safety risks of Avandia, a medication for Type 2 diabetes mellitus. A meta-analysis performed on data from multiple clinical trials suggested that Avandia might increase the risk of heart attacks in diabetics. The report generated significant concern among patients and physicians about the safety of Avandia. In May, the FDA issued a safety alert for Avandia, but decided to keep the drug on the market. In August, the FDA announced that Avandia, along with other antidiabetic drugs, would feature new warnings on their packaging.

The idea of analyzing data combined from multiple clinical trials (i.e., meta-analysis) is an attractive strategy for monitoring and assuring drug safety. While each clinical trial is designed to answer a specific question (or questions) in a specific patient population, a meta-analysis of the pooled data from multiple clinical trials potentially provides a more complete picture of the risk-to-benefit tradeoffs. A critical consideration in the performance of a meta-analysis involves the choice of studies to pool. Logistical factors that play importantly into the selection of trials include the study designs, patient populations, and outcome metric captured in each trial. Thus, a successful meta-analysis requires the availability of comprehensive information about the types of patients enrolled in clinical trials, such as information on demographic characteristics, disease severity, treatment regimens, and use of concomitant medications among other factors.

The informatics challenges to performing a successful meta-analysis are some of the key driving forces for the pursuit of semantic interoperability and the development of data standards by organizations such as the Clinical Data Interchange Standards Consortium (CDISC). The efforts directed at data standardization come at an important time. The cost of drug development has soared in recent years, and challenges regarding drug safety, such as the potential for an increased risk with Avandia, have drawn the scrutiny of Congress.

Data standardization, as it is currently being practiced, involves bringing a group of experts together to share their experiences and personal perspectives with respect to specific concepts of interest. While this exercise is valuable in exploring nuances, a problem arises when the group moves to develop a standard definition. Often, instead of retaining the rich granularity revealed during the discussions of the concepts, the group moves to achieve consensus by developing a definition that satisfies a majority of the experts.

Imagine a group of experts called together to develop a definition for “happy.” Individuals drawing upon their recent experiences might describe feelings such as glad, content, cheerful, joyful, beaming, ecstatic, jubilant, and rapturous. The consensus definition (in this case drawn from the Oxford English Dictionary, ninth edition) might be “feeling or showing pleasure or contentment.” Unfortunately, the granularity that gave Shakespeare the tools to represent the human condition is lost in the consensus-forming process. So while current efforts at data standardization ensure that the primary statistical calculations for a study can be replicated, the loss of granularity reduces the ability to represent nuances that can be essential for the interpretation of future meta-analyses.

Premature Standards’ Problems
The desire of medical researchers to achieve the promise of semantic interoperability has created a sense of urgency for the development and deployment of data standards. This urgency provides the justification for distributing early versions of a standard, with the idea that the early versions will be improved in subsequent releases.

This rush to implement a standard has two important consequences. First, late adopters have a reason for holding back from implementation because of instability with the standard. Second, and perhaps more importantly, the early adopters are forced to use what is available — resulting in the emergence of different dialects in the accomplishment of tasks not anticipated by the initial version. This need to pound round pegs into square holes creates an obstacle in the pursuit of semantic interoperability because it is difficult to rectify these issues once a premature standard has come into widespread use.

The goal of semantic interoperability, which includes the goal of facilitating analyses across trials for drug safety assessments, will require several changes to the current strategy of data standardization. First, the short-term goal of data standardization must shift from a focus on promulgating standards to an emphasis on unraveling the meanings behind complex concepts. Second, the output of this process must then be encoded in a scientific ontology built on standard formats and methodologies for ontology development, maintenance, and use in order to foster the creation of principled ontologies. (Additional information can be found at www.obofoundry.org. )

Semantic interoperability may very well remain elusive for the foreseeable future. One approach to incrementally achieve this goal might be to adopt a short-term focus on developing a strategy to learn about ambiguities sooner so that we can get to a higher level of semantic interoperability faster. This process, known to informaticians as disambiguation, involves the unraveling of complexities that are often implicitly represented in a particular data standard term.

A growing number of ontologies are being created to address various scientific domains. Of particular importance to the complex data standardization efforts in the biomedical sciences is the implementation of a curation effort. This effort aims to consolidate the terms generated from disparate ontologies in order to ensure their reusability, and to ensure compatibility between neighboring ontologies.

This effort has been a critically valuable component in the development of the gene ontology for organizing and mining newly elucidating genomic information. New approaches to drug development must evolve if we are to see continued improvement in research productivity and drug safety. The move towards scientific ontologies as a basis for developing data standards is one approach to preparing for these changes and allowing for the evolution of the informatics backbone for the pharmaceutical and biotechnology industry. 

Thaddeus H. Grasela is president and CEO of Cognigen Corp. Email: ted.grasela@cognigencorp.com.

Subscribe to Bio-IT World  magazine.

Click here to login and leave a comment.  

0 Comments

Add Comment

Text Only 2000 character limit

Page 1 of 1



White Papers & Special Reports

sgi whp 2
Managing the Modern Genomics Data Flood
Sponsored by SGI

Managing and storing the perfect storm of multi-disciplined data pouring from next generation sequencers and other omics instruments is a central challenge in life sciences. Discover in this paper how the SGI ArcFiniti storage solution, optimized for unstructured genomics and life sciences data can: 

  • Reduce costs, proactively protect data integrity, and deliver the high performance I/O required for genomics data processing and analysis.  
  • Effectively manage capacities from 156TB to 1.4PB as a disk based, integrated hardware and software platform 


sgi - whp 1
Turning Genomics Data into Practical Insight
Sponsored by SGI

With worldwide sequencing capacity approaching 13 quadrillion DNA bases annually turning genomics data into knowledge is a true computational challenge. Read this paper and learn how the SGI UV coherent shared memory platform can:  

  • Speed results time while cost competitively tackling the most difficult computational problems across all omics disciplines. 
  • Push performance by scaling to extraordinary levels, up to 256 sockets (2,560 cores, 4,096 threads) per single system (one OS image). 

Provide support for up to 16TB of coherent shared memory in a single system image enabling extreme efficiency across a wide range of compute demands. 



accerlys-logo_2012_wh
New Complimentary Market Survey…
Collaborations and Communications Within Drug Discovery Research
Sponsored by Accelrys
This survey was conducted by the Cambridge Healthtech Media Group in January, 2012. It was sponsored by Accelrys related to their HEOS initiative to gather valid information around externalizing collaborative research while improving communications in the cloud. With 310 qualified industry respondents the survey findings reveal useful usage and trends patterns.  An insightful follow-on discussion and webinar related to this survey, and the HEOS by Scynexis SaaS portal is also available on the Bio-IT World website for complementary viewing.
 


Job Openings

tessella logo 
Scientific Software Engineer
Boston MA
$70,000 to $95,000
 
Apply at http://jobs.tessella.com   

oxford nanopore logo 


Early Access Collaborations ManagersClick here to find out more and apply   

Oxford Nanopore's GridION technology, VP, Sales and Marketing Click to  Apply  

For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .