By Catherine Varmazis
March 1, 2008 | In a major collaboration, researchers from the Cancer Institute of New Jersey (CINJ), several U.S. universities, and IBM are creating grid-enabled tools that perform high-throughput analysis of tissue microarrays to dramatically improve the accuracy and speed of cancer diagnoses.
“Years ago, most patients would go through the same treatment: chemo, for example. If it didn’t work they’d move on to drug 1. If that didn’t work they’d try drug 2, and so on,” says David Foran, director of the Center for Biomedical Imaging & Informatics at CINJ and lead investigator for the project. “Now we can bypass all these trials and go directly to what therapy is most appropriate based on [a patient’s] expression signature.”
The project, which received a $2.5 million grant from the National Institutes of Health last October, makes use of tissue microarrays, pattern recognition algorithms, and grid-based supercomputing.
Foran says each tiny tissue plug on a microarray contains different types of tissue, and that the software can distinguish between these heterogeneous bits and detect the presence of a specific cancer biomarker. “If it’s present we have computer vision techniques and software that we’ve developed which will tell us if it’s located in a specific tissue or in a certain sub-cellular compartment, like the nucleus or cytoplasm. All of these things have bearing on the clinical outcome of the specific patient we’re looking at.”
To conduct the proof of concept required for funding the project, CINJ researchers took a set of “retrospective studies” of over 100,000 patient tissues for which the diagnoses were already known, and analyzed them using their specialized software. Programmers from IBM grid-enabled the software and ran the analysis over the World Community Grid (WCG), a virtual supercomputer established by IBM. Computation of this magnitude would have taken a single desktop computer 2,900 years to complete, but it took the WCG less than six months, says Robin Willner, IBM’s vice president of global community initiatives.
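To put those two figures side by side, the short calculation below works out the implied speedup. It is a back-of-the-envelope sketch only: it treats the quoted 2,900 years and six months as round estimates, and the function name `min_speedup` is ours, not part of any CINJ or IBM software.

```python
# Figures quoted in the article, treated here as round estimates.
DESKTOP_YEARS = 2900   # estimated run time on a single desktop computer
GRID_YEARS = 0.5       # "less than six months" on the WCG, so an upper bound


def min_speedup(serial_years: float, grid_years: float) -> float:
    """Return the lower bound on speedup implied by the two run times.

    Because grid_years is an upper bound on the actual grid run time,
    the ratio is a lower bound on the real speedup.
    """
    return serial_years / grid_years


speedup = min_speedup(DESKTOP_YEARS, GRID_YEARS)
print(f"Speedup of at least {speedup:,.0f}x")  # prints: Speedup of at least 5,800x
```

Because the grid run took less than six months, the figure is a floor: the WCG finished the job at least 5,800 times faster than the single-desktop estimate.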
When the analysis was complete, “We were able to compare the signatures we had generated and that we hoped would correlate with different stages and types of disease,” says Foran. “We compared them with the patient outcomes and profiles in terms of diagnosis and histologic types and found there was a very strong correlation.”
Foran now plans to expand the number of disorders being investigated, grow the reference library of expression patterns, and build a clinical decision support system so oncologists at cancer centers around the world can download the CINJ client and analyze their own tissue specimens. The computation will be done on caGRID, an open source software infrastructure that has been developed as the main grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG) program. In addition, IBM is donating a high-performance supercomputer to the CINJ’s new Center for High-Throughput Data Analysis for use in examining the digitally archived cancer specimens and genomic data.
Joel Saltz, professor and chair of the Department of Biomedical Informatics at Ohio State University (OSU), where most of caGRID has been developed, says, “One of our roles in this project is to develop a caGRID-compliant infrastructure that supports the data and algorithms [that Foran’s group developed] so the tissue microarray and virtual slide data can be integrated with other kinds of experiments and translational research data types.”
For data from different data sets to be compatible, there has to be a mechanism for standardizing the naming of biological terms and another for standardizing how complex data structures from different types of experiments are represented in XML schema. Saltz’s group is developing standard data models and well-defined biomedical ontologies that will be harmonized with the caBIG processes, to avoid isolated “information islands.”
“The caGRID infrastructure is designed to connect databases as well as computational procedures, so it’s like having a worldwide programming environment of databases and procedures,” explains Saltz. “But for this environment to work, you need to know... what the query language is, and that’s where all this language and ontology stuff is, because otherwise if I tell you, ‘We’ve got this wonderful tissue microarray environment, feel free to use it.’ You’d say, ‘Well, thanks, but how am I going to find out how to? And what do you have in there?’”
The complexity and scope of this work made multidisciplinary collaboration involving many organizations essential. “A lot of big science today requires a lot of different levels of expertise,” says Foran. “In fact, when we received our critiques from the NIH, they stated explicitly that this group of individuals [involved in the project] is unique in what they bring to the table.”
Although still in the early stages, the tools are already being used by oncologists at CINJ. The plan for the coming year is to have a prototype system up and running that will be deployed at Arizona State University, Rutgers University, the University of Pennsylvania School of Medicine, Ohio State University, and the CINJ. “That will serve as our testbed for iterative prototyping, and then within the next three years, we’d be constantly updating the software as it becomes refined and optimized and we’re hoping we’ll have a product to put out to the research and clinical communities by year 4,” says Foran.
This article appeared in Bio-IT World Magazine.