
Of Data Silos and Sacred Cows

By Kevin Davies

Delegates at CHI’s third annual Bridging Pharma & IT conference share winning informatics strategies that span the drug discovery pipeline.

Nov. 13, 2007 | BOSTON — Of the many illuminating examples of the ways that life sciences and IT/Informatics groups and organizations can work effectively together, the problems that faced the Research Portfolio Group (RPG) at Pfizer were perhaps unique. “The [Pfizer] portfolio has become so large it’s hard to track information,” said Peter Thadeio, a project analyst in the recently formed RPG at Pfizer Global R&D in Groton.

But as Thadeio and his colleague Melinda Rottas showed at CHI’s 3rd annual Bridging Pharma & IT conference last month, there is no substitute for ingenuity, even for an organization with a budget as big as Pfizer’s.

Thadeio and his colleagues were tasked with producing a system that would afford researchers and executives an overview of all compounds — active and inactive — in development from Phase I-IV, as well as properties such as compound history, structures, status, etc. The challenge was to integrate data from disparate parts of the company. “Discovery and development has been housed in totally separate systems... ne’er the twain shall meet,” said Thadeio. “We ran into a huge roadblock — the information was siloed.” As Thadeio’s group drew up solutions to merge the data, he received discouraging news. “We fell below the line [for funding],” he recalled. Another catch: “Failure to deliver was not an option!”

With no funds, Rottas and colleagues had to cobble together existing software — Business Objects (recently acquired by SAP), Microsoft Excel, Spotfire, and PowerPoint — to create a coherent stream of information that would allow data mining in a standardized format. The key goal was to develop a consolidated file that would contain both discovery and development data, easily updateable on a regular basis.

Step one involved using Business Objects and several data pulls to gather the relevant attributes on specific compounds into a single consolidated view. Next, an Excel data extraction was performed on a proprietary database and merged with the Business Objects data pull. The merged information was housed in Excel, where syntax inconsistencies that had accumulated over the years (e.g., GTN/Groton) could be resolved.

The next step used Spotfire to inspect the data for the presence of duplicates, which could be removed. Finally, the team populated a sequence of tabs with the updated information (from idea to registration) and built a series of formulae around the core data set, which automatically update. These formulae generate a consolidated file of portfolio information.
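The merge, clean-up, and de-duplication steps described above can be sketched in a few lines of Python. This is only an illustration of the general approach, not the Pfizer team's workflow, which relied on Business Objects, Excel, and Spotfire. The field names, compound IDs, and sample rows are hypothetical; only the GTN/Groton alias comes from the talk.

```python
# Site aliases to normalize. Only GTN -> Groton is taken from the article;
# a real table would be built from the actual data.
SITE_ALIASES = {"GTN": "Groton"}


def normalize(record):
    """Return a copy of the record with the site spelled consistently."""
    site = record.get("site", "").strip()
    cleaned = dict(record)
    cleaned["site"] = SITE_ALIASES.get(site.upper(), site)
    return cleaned


def consolidate(discovery_rows, development_rows):
    """Merge two data pulls into one record per compound, dropping duplicates."""
    merged = {}
    for row in list(discovery_rows) + list(development_rows):
        row = normalize(row)
        key = row["compound_id"].strip().upper()
        target = merged.setdefault(key, {"compound_id": key})
        # Fill in fields not seen yet; values already present are kept.
        for field_name, value in row.items():
            if field_name != "compound_id" and value not in ("", None):
                target.setdefault(field_name, value)
    return [merged[key] for key in sorted(merged)]


# Hypothetical sample pulls.
discovery = [
    {"compound_id": "c-001", "site": "GTN", "structure": "CCO"},
    {"compound_id": "C-002", "site": "Groton", "structure": "CCN"},
]
development = [
    {"compound_id": "C-001", "site": "Groton", "phase": "Phase II"},
    {"compound_id": "C-001", "site": "Groton", "phase": "Phase II"},  # duplicate
]

portfolio = consolidate(discovery, development)
for compound in portfolio:
    print(compound)
```

Keying every row on a normalized compound ID is what lets discovery and development attributes land in the same record, and it makes repeated rows harmless.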

RPG now produces monthly reports containing graphs and calendars for 12 therapeutic areas, including current status and forecasting, said Rottas. There is interest in a dashboarding system, although licensing rights have been an issue, she said.

“Considering the size of the company and the complexity of the legacy systems, we consider this a ‘win’ to be able to make this happen,” said Thadeio. Rottas’ final words of advice? “You don’t need to be an IT expert to create these reports,” she said. “I’m a chemist by training.”

Reagent Records
At Merck, there was less concern about tracking drug candidates. “We track compounds like gold in Fort Knox,” said Vic Uebele, a research fellow with Merck’s neuroscience group at West Point, Penn. But the situation with reagents was not so good, especially plasmids, cell lines, and antibodies. It was like “pennies at the cashier convenience store,” he said. Each lab tracked its own reagents, often on paper. Uebele’s lab stored information on 2,500 plasmids in 3-ring binders. Problems were compounded by people moving, office relocations, “lost” reagents, legal restrictions, and duplicated projects. In addition to tracking reagents, there was a lengthy list of attributes for each reagent to be recorded, including sequence data, species, source, growth conditions, safety factors, and last but not least, location.

According to Uebele, the impetus for building the reagent tracker actually began with Merck research chief Peter Kim, who said, “You need to talk to Ingrid Akerblom [now Clinical IT] to get this project started.” As Lori Harmon, manager of drug discovery project support, explained, the IT infrastructure was built out in three phases, beginning in 2005 by establishing the requirements for tracking cell lines (initially for the oncology division) and moving on to plasmids and antibodies the following year. The only stipulation was that the back end had to be Oracle. “The front end had to ‘feel good’,” said Harmon.

The formal bidding process involved three commercial and two internal systems. The final decision was to enhance an internal application, based on workflow and functionality, implementation time and cost, and ease of use and deployment. They opted for a distributed model, because of reluctance on behalf of many groups to part with local freezers.

The result, deployed in July 2007, is the MRL BioStore. The web application has an intuitive “drill down” to track freezer inventory by racks, boxes, and individual samples. Users tick checkboxes to check out vials from any freezer. Nomenclature is a challenge: the application uses both a standard Merck dictionary and a BioStore dictionary. The system, which tracks some 3,000 materials, currently has some 500 users across 20 departments, including the new Merck facility in Boston, which uses BioStore to track every cell line.
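The freezer, rack, box, and sample hierarchy lends itself to a simple nested data model. The sketch below is a hypothetical illustration of such a drill-down with vial check-out. It is not BioStore's actual design (which, per the talk, sits on an Oracle back end), and all class names, labels, and sample data are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Vial:
    material: str
    checked_out_by: Optional[str] = None  # None means the vial is in the freezer


@dataclass
class Box:
    vials: Dict[str, Vial] = field(default_factory=dict)  # position -> vial


@dataclass
class Rack:
    boxes: Dict[str, Box] = field(default_factory=dict)  # box label -> box


@dataclass
class Freezer:
    racks: Dict[str, Rack] = field(default_factory=dict)  # rack label -> rack

    def check_out(self, rack: str, box: str, position: str, user: str) -> Vial:
        """Drill down to one vial and mark it as checked out by the user."""
        vial = self.racks[rack].boxes[box].vials[position]
        if vial.checked_out_by is not None:
            raise ValueError(f"vial already checked out by {vial.checked_out_by}")
        vial.checked_out_by = user
        return vial

    def available(self, material: str) -> List[Tuple[str, str, str]]:
        """List (rack, box, position) for vials of a material still in the freezer."""
        return [
            (rack_label, box_label, position)
            for rack_label, rack in self.racks.items()
            for box_label, box in rack.boxes.items()
            for position, vial in box.vials.items()
            if vial.material == material and vial.checked_out_by is None
        ]


# Hypothetical inventory: one rack, one box, two vials of the same cell line.
freezer = Freezer(
    racks={"R1": Rack(boxes={"B1": Box(vials={
        "A1": Vial("cell-line-X"),
        "A2": Vial("cell-line-X"),
    })})}
)
freezer.check_out("R1", "B1", "A1", user="researcher-1")
print(freezer.available("cell-line-X"))
```

Recording who holds each vial, rather than deleting it from the inventory, keeps every sample's location known even after it leaves the freezer.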

No Sacred Cows
Founded in 1993, ArQule went public in 1996, building new technology platforms to identify drug candidates with structure-guided drug design. The company has so far pushed three oncology compounds into the clinic.

According to Mark Ashwell, VP medicinal chemistry, ArQule has assembled over the years “a comprehensive toolkit of IT solutions for problems that are, on the face of it, often un-addressable for a 100-person biotechnology company.” The firm uses a wide variety of third-party tools, including Spotfire and Activity Base from IDBS. But as the company matured, and with everyone wanting a variety of software tools on their desktop, Ashwell said it was imperative for ArQule to carefully review its software needs as the company moved into the clinic while also carefully managing its resources.

ArQule turned to Tessella, a U.K.-based scientific software consultancy, to review its informatics systems. John Whittle, technical manager with Tessella, said the first phase was to identify the risks to existing systems, prioritize mitigation strategies, and identify unmet requirements. In short, what could ArQule do better? “No sacred cows, every system has to justify its existence,” said Whittle. Areas of priority included the workflow around discovery chemistry and the IT infrastructure, which needed to be brought up to current standards regarding data security and protection. One advantage, Whittle said, was that “conceptually, they don’t think in terms of different therapeutic areas.” A key priority for Tessella was not to damage ArQule’s internal “seamless data structure — don’t introduce silos.”

Jerald Schindler, VP late stage clinical development statistics at Merck, delivered a superb overview of adaptive trials, in which data that were unavailable at the launch of a trial are used to adjust it while it is under way. The goal is to maximize information collected on effective drug doses, while minimizing that collected on non-effective doses (see Schindler Adapts to New Trials, Bio•IT World, June 2007).

Under conventional trial designs, clinicians and statisticians don’t learn whether they designed the appropriate trial until the results are decoded. Adaptive trials permit the exploration of additional doses, while the best dose can be selected for Phase III. The result is a merger of Phase I/IIa, focusing on safety and dose response, and Phase IIb/III. This should reduce time in the clinic from typically 5 to 10 years to somewhere between 3 and 7 years. No wonder the industry is excited.

Although the benefits are “really obvious, everyone should be doing it,” Schindler stressed it’s not easy. The ideal eClinical system needs two databases, one for data acquisition, the other for review and submission. The goal is to integrate drug supply, randomization, electronic data capture, and interactive voice response systems (IVRS).

Sidebar: Tracking Genomics Data
The Harvard Medical School-Partners Healthcare Center for Genetics and Genomics (HPCGG) Laboratory of Molecular Medicine tests genetic markers for hearing loss, cancer, and cardiovascular problems, among other things. From the patient to electronic medical record (EMR), the genetic data are gathered, processed, directed to a geneticist, then a clinician, and finally used in treatment decision-making. The workflow presented an IT challenge for Sandy Aronson, director of IT at HPCGG.

Through an “IT lens,” Aronson broke the workflow down into three components. The first “looks a whole lot like a manufacturing process support,” he said. To support the process of gathering samples from patients, processing them, and identifying genetic variants, Aronson and his team developed GIGPAD, or Gateway for Integrated Genomics, Proteomics, Applications, and Data. Like an enterprise LIMS superstructure, GIGPAD manages data from multiple labs through the analysis phase (See Harvard’s Personalized Medicine Gateway, Bio•IT World, Aug. 2005).

The second leg of the workflow, starting with raw data arriving at a geneticist, is knowledge management. Aronson and his team developed a combination of two systems. GeneInsight is the HPCGG database to “store correlations established between genetic variants and clinically relevant facts.” GVIE, the Genetic Variation Interpretation Engine, matches the data gathered via GIGPAD with the information in GeneInsight and generates a default report for the geneticist. Finally, a geneticist approves or adjusts the GVIE-produced report and enters it into the patient’s EMR.
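The matching step can be pictured as a lookup of each identified variant in a knowledge base, with the result drafted as a report that waits for a geneticist's sign-off. The sketch below is a hypothetical illustration of that pattern only. It does not reflect how GVIE or GeneInsight are implemented, and the gene names, variant labels, and interpretation text are placeholders rather than real clinical data.

```python
# Placeholder knowledge base: (gene, variant) -> interpretation text.
# Entries are illustrative only and carry no clinical meaning.
KNOWLEDGE_BASE = {
    ("GENE_A", "variant-1"): "placeholder interpretation for GENE_A variant-1",
}


def draft_report(patient_id, variants, knowledge_base=KNOWLEDGE_BASE):
    """Match identified variants against the knowledge base and draft a report."""
    findings, unmatched = [], []
    for variant in variants:
        interpretation = knowledge_base.get(variant)
        if interpretation is None:
            unmatched.append(variant)  # no stored correlation; needs manual review
        else:
            findings.append((variant, interpretation))
    return {
        "patient_id": patient_id,
        "findings": findings,
        "unmatched": unmatched,
        "status": "draft",
    }


def approve(report, geneticist):
    """Return a copy of the report marked as approved by the named geneticist."""
    approved = dict(report)
    approved["status"] = "approved"
    approved["approved_by"] = geneticist
    return approved


report = draft_report("patient-1", [("GENE_A", "variant-1"), ("GENE_B", "variant-2")])
final = approve(report, geneticist="geneticist-1")
print(final)
```

Keeping unmatched variants in the draft, instead of silently dropping them, gives the reviewing geneticist a list of what the knowledge base could not interpret.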

Even with three custom systems, challenges persist. One of the biggest is evolving technologies. “We really want to make sure we can stand up IT support for technologies like Affymetrix microarrays as soon as possible,” said Aronson. In addition, the continuously decreasing cost of sequencing, and the corresponding increase in genetic variants identified and used in molecular diagnostics, means an ever-expanding target for researchers and clinicians. 

Aronson hopes that pharma may be able to help prepare IT systems for new instruments and tests breaking into the clinical realm, and that collaborations in the future might offer “some assistance with some of our key pain points.”  — Allison Proffitt

