Harvard and Coriell biorepositories serve as models for biobanks' increasing role in facilitating translational medicine.
By Chris Kronenthal
May 12, 2008 | As personalized medicine emerges as the gold standard in health care, an important but largely overlooked component of the necessary architecture is the biorepository. Used predominantly as specimen storage depots, biorepositories (or "biobanks") are evolving into robust hubs of genetic and genomic discovery, brokers of information, and providers of genetics-based services to health care enterprises.
Personalized medicine aims to provide targeted therapeutics based on knowledge of an individual's environmental and genetic factors, including genetic predispositions and treatment-related facts such as a patient's particular drug response. In principle, this will improve treatment success rates and reduce long-term costs through preventive measures and individually tailored treatments.
A crucial foundation for personalized medicine is the DNA-containing physical specimen from the patient, anything from saliva to a tissue biopsy. Another major building block is the patient's phenotypic and environmental data, including allergy and medication data. As understanding of genetic and environmental interactions increases, so will the value of more detailed phenotypic data.
Somewhat less visible, but equally important, is the largely technical side of personalized medicine: the "knowledge pool" requirement. Ideally, this amalgamated resource provides population-based, contextual evidence for any patient's data. Its key elements include lists of known variants (monogenic and polygenic), genotype-phenotype relationships, and biomarker information. The knowledge pool should also link to popular public databases such as dbSNP and PubMed. At the Harvard-Partners Center for Genomics and Genetics (HPCGG), for example, the knowledge pool has been customized to include pre-configured data sets that help geneticists generate reports for clinicians to use during patient care.
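To make the "knowledge pool" idea concrete, here is a minimal Python sketch of how variant records might be stored and linked out to dbSNP and PubMed. The class and field names are hypothetical, not HPCGG's schema; the URL patterns are NCBI's current public formats, and rs334 (the HBB sickle cell variant) appears only as a familiar example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Public NCBI URL patterns used for outbound links.
DBSNP_URL = "https://www.ncbi.nlm.nih.gov/snp/{rsid}"
PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


@dataclass
class VariantEntry:
    """One knowledge-pool record for a known variant (hypothetical schema)."""
    rsid: str                    # dbSNP reference SNP ID
    gene: str
    inheritance: str             # "monogenic" or "polygenic"
    phenotypes: list[str] = field(default_factory=list)
    biomarkers: list[str] = field(default_factory=list)
    pmids: list[str] = field(default_factory=list)  # supporting PubMed IDs

    def links(self) -> list[str]:
        """Outbound links to dbSNP and to each supporting PubMed record."""
        return [DBSNP_URL.format(rsid=self.rsid)] + [
            PUBMED_URL.format(pmid=p) for p in self.pmids
        ]


class KnowledgePool:
    """Index of known variants, keyed by rsID."""

    def __init__(self) -> None:
        self._entries: dict[str, VariantEntry] = {}

    def add(self, entry: VariantEntry) -> None:
        self._entries[entry.rsid] = entry

    def context_for(self, patient_rsids: list[str]) -> list[VariantEntry]:
        """Return known-variant context for the rsIDs observed in a patient."""
        return [self._entries[r] for r in patient_rsids if r in self._entries]


pool = KnowledgePool()
pool.add(VariantEntry("rs334", "HBB", "monogenic", ["sickle cell disease"]))
hits = pool.context_for(["rs334", "rs_not_in_pool"])
print([(e.gene, e.links()) for e in hits])
```

Variants absent from the pool are simply skipped, so a patient's data is returned with whatever population context is currently known.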
In recent years, several major biorepository initiatives have been launched, including the National Biospecimen Network Blueprint by the NIH (2003), the National Cancer Institute's Office of Biorepositories and Biospecimen Research (2005), and the pilot phase of the UK Biobank (2006).
More recently, one of the oldest and largest biorepositories, the Coriell Institute for Medical Research, entered the personalized medicine arena. Coriell's ambitious Delaware Valley Personalized Medicine Project (DVPMP) will catalogue the phenotype and genotype data of at least 10,000 participants to promote genomic discovery and personalized care. Coriell's president and CEO, Michael Christman, left Boston University's Genetics Center after recognizing the critical niche that biorepositories would fill in personalized medicine. Says Christman, "Biobanking is one of the cornerstones of personalized medicine. If Coriell did not already have an established, high-quality biobank, we would have to build one for the Delaware Valley Personalized Medicine Project."
Specimen Processing & Management
In the era of targeted therapeutics, health care entities will recognize the need to store original biospecimens for use in future discovery. Reanalysis of a particular patient's biospecimen will become increasingly important as researchers seek new genetic variants and genotype-phenotype relationships.
To manage, store, and redistribute physical specimens on the scale required by a large health care enterprise practicing personalized medicine, Coriell and other biorepositories employ bar-coded specimen tracking, robotics, and high-throughput quality control systems. Coriell's cryogenics department, for example, currently manages over 3.5 million frozen biosamples and adds 50,000 new samples per year. Across the field, biorepositories manage hundreds of millions of specimens in over 300 facilities nationally and 60 facilities internationally.
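At its core, bar-coded tracking is a registry keyed by barcode, with every scan appended to a chain-of-custody log. The Python sketch below assumes a hypothetical "freezer/rack/box" location string and hypothetical status values; it illustrates the idea and is not Coriell's actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Specimen:
    barcode: str
    kind: str                    # e.g. "saliva", "tissue biopsy", "DNA"
    location: str                # hypothetical "freezer/rack/box" path
    status: str = "in_storage"
    history: list = field(default_factory=list)  # (event, detail, timestamp)


class SpecimenTracker:
    """Registry of specimens keyed by their scanned barcode."""

    def __init__(self) -> None:
        self._by_barcode: dict = {}

    def register(self, barcode: str, kind: str, location: str) -> Specimen:
        if barcode in self._by_barcode:
            raise ValueError(f"duplicate barcode: {barcode}")
        specimen = Specimen(barcode, kind, location)
        specimen.history.append(("registered", location, _now()))
        self._by_barcode[barcode] = specimen
        return specimen

    def move(self, barcode: str, new_location: str) -> None:
        specimen = self._by_barcode[barcode]   # KeyError for an unknown scan
        specimen.location = new_location
        specimen.history.append(("moved", new_location, _now()))

    def distribute(self, barcode: str, recipient: str) -> None:
        specimen = self._by_barcode[barcode]
        if specimen.status != "in_storage":
            raise ValueError(f"{barcode} is not available ({specimen.status})")
        specimen.status = "distributed"
        specimen.location = recipient
        specimen.history.append(("distributed", recipient, _now()))

    def locate(self, barcode: str) -> str:
        return self._by_barcode[barcode].location


tracker = SpecimenTracker()
tracker.register("CX000123", "DNA", "freezer-04/rack-12/box-3")
tracker.move("CX000123", "freezer-07/rack-01/box-9")
print(tracker.locate("CX000123"))
```

Distributed specimens stay in the registry with a changed status rather than being deleted, so their custody history remains queryable.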
Forecasting the need, biobanking leaders have already established customized, comprehensive biorepository management and clinical data systems. Large-scale systems such as Coriell's "Queue" (Q), Gene Logic's GX TRIMS (Tissue Repository Information Management System), and HPCGG's Gateway for Integrated Genomics-Proteomics Applications and Data (GIGAPAD) have replaced the tried-and-trusted tools researchers typically use for localized biospecimen management, such as Microsoft Access and Excel, and even plain old paper.
The Q system, which has been tracking Coriell's samples for several years, spans diverse operations such as molecular biology, cytogenetics, cell culturing, genotyping, and microarray services, in addition to the essential processes of sample distribution and workflow management. The Q system also contains all the metadata needed to interact with various public-domain databases and ontologies, including the Online Mendelian Inheritance in Man database (OMIM), LocusLink, Gene Ontology (GO), GeneCards, dbSNP, and PubMed. Because it serves as a federated (or virtual) database, the Q system can jump-start the fact-finding chores necessary to provide targeted therapeutics, acting as a knowledge pool for the biospecimens tracked within it.
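A federated (or virtual) database leaves data in its home sources and fans each query out at request time. The Python sketch below shows that pattern with hypothetical in-memory adapters standing in for sources such as OMIM or dbSNP; the annotations are placeholders, and a real deployment would call each source's own service instead.

```python
from typing import Callable, Dict, List

# An adapter takes a gene symbol and returns annotation strings.
Adapter = Callable[[str], List[str]]


def table_adapter(table: Dict[str, List[str]]) -> Adapter:
    """Wrap a dict as a stand-in for a remote source (hypothetical)."""
    return lambda gene: list(table.get(gene, []))


def federated_lookup(gene: str, sources: Dict[str, Adapter]) -> Dict[str, List[str]]:
    """Query every source for `gene` and merge the answers by source name.

    Nothing is copied into a central warehouse, and a failing source is
    reported in the result instead of aborting the whole query.
    """
    merged: Dict[str, List[str]] = {}
    for name, adapter in sources.items():
        try:
            hits = adapter(gene)
        except Exception as exc:
            merged[name] = [f"unavailable: {exc}"]
            continue
        if hits:
            merged[name] = hits
    return merged


def offline_source(gene: str) -> List[str]:
    """Simulates a source that cannot be reached."""
    raise ConnectionError("timeout")


sources = {
    "OMIM": table_adapter({"HBB": ["placeholder OMIM entry"]}),
    "dbSNP": table_adapter({"HBB": ["placeholder rsID list"]}),
    "GeneCards": offline_source,
}
print(federated_lookup("HBB", sources))
```

The design trade-off is freshness versus speed: each answer reflects the source's current state, at the cost of depending on every source at query time.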
The GIGAPAD system at HPCGG (see "Harvard's Gateway," Bio•IT World, August 2005) boasts rapid integration into the clinical realm, where it interacts with other clinical infrastructure components, including a database of genotype-phenotype correlations (GeneInsight) and the Genomic Variant Interpretation Engine (GVIE). By combining GeneInsight and GVIE with GIGAPAD, HPCGG can guide geneticists through receiving a patient's genotypic information, verifying and supplementing that data, and then pushing it into a customized report for the physician.
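The receive, verify, supplement, and report sequence can be sketched as a small Python pipeline. The quality threshold, the dict standing in for a genotype-phenotype database such as GeneInsight, and all variant IDs are hypothetical; this illustrates the flow, not HPCGG's GVIE logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GenotypeCall:
    variant_id: str      # hypothetical variant identifier
    genotype: str        # e.g. "A/G"
    quality: float       # call confidence in [0, 1]


def verify(calls: List[GenotypeCall], min_quality: float = 0.99):
    """Split calls into those passing QC and those flagged for lab review."""
    passed = [c for c in calls if c.quality >= min_quality]
    flagged = [c for c in calls if c.quality < min_quality]
    return passed, flagged


def supplement(calls: List[GenotypeCall], knowledge_base: Dict[Tuple[str, str], str]):
    """Attach an interpretation to each call from a genotype-phenotype lookup."""
    return [
        (c, knowledge_base.get((c.variant_id, c.genotype), "no known association"))
        for c in calls
    ]


def build_report(patient_id: str, annotated, flagged) -> str:
    """Render the physician-facing report as plain text."""
    lines = [f"Genetic report for patient {patient_id}"]
    for call, interpretation in annotated:
        lines.append(f"  {call.variant_id} {call.genotype}: {interpretation}")
    if flagged:
        lines.append(f"  {len(flagged)} call(s) withheld pending lab review")
    return "\n".join(lines)


kb = {("VAR-1", "A/G"): "hypothetical: carrier status"}
calls = [
    GenotypeCall("VAR-1", "A/G", 0.999),
    GenotypeCall("VAR-2", "C/C", 0.995),
    GenotypeCall("VAR-3", "T/T", 0.80),
]
passed, flagged = verify(calls)
print(build_report("P-0001", supplement(passed, kb), flagged))
```

Low-confidence calls are withheld from the report rather than silently dropped, so the physician sees that something still needs review.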
"We are committed to doing all we can to make personalized medicine, in all of its facets, reach its full potential," says Sandy Aronson, HPCGG director of Information Technology. "For the past five years we have been heavily investing in IT support for biorepositories, laboratory processing, secure patient genetic profiles in the electronic health record, and genetic aware clinical decision support infrastructure." In particular, Aronson cites the support received through a collaboration with Hewlett-Packard (HP).
Lacking the dual health care/biobanking environment available to HPCGG, and seeking to understand its upcoming health care software and data-integration challenges more clearly, Coriell's personalized medicine team is collaborating with a major health care institution.
Coriell has established partnerships with hospitals including Cooper University Hospital, Fox Chase Cancer Center, and Virtua Health. With Coriell supplying the biobank and its partners bringing the health care, the consortium hopes to build a large-scale personalized medicine implementation. Says Cooper's CMO Simon Samaha, "The outcomes of this study have the potential to revolutionize the way in which medical care is delivered. Access to important genetic information will improve the delivery of medical treatment while also reducing costs."
Genotyping & Data Management Infrastructure
Most health care providers obviously lack the technical and financial resources of an HPCGG or Coriell. Moreover, every batch of 92 samples genotyped on the latest 1-million-SNP chips produces nearly 100 gigabytes (GB) of raw data, or roughly 1 terabyte (TB) of storage for every 1,000 patients. Extrapolate to the number of patients who visit a major hospital each year, and the numbers become staggering. While much of the raw data can be discarded once analyzed, that "holding tank of data" still needs to be established.
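The storage arithmetic above is easy to check and extend. This Python sketch uses the article's figures (nearly 100 GB per 92-sample batch) and assumes one genotyped sample per patient and 1 TB = 1,000 GB; the 50,000-patient volume is a hypothetical example.

```python
GB_PER_BATCH = 100       # "nearly 100 GB" of raw data per batch (from the text)
SAMPLES_PER_BATCH = 92   # samples per genotyping batch (from the text)


def raw_storage_tb(patients: int) -> float:
    """Raw genotyping storage in TB, assuming one sample per patient."""
    gb_per_sample = GB_PER_BATCH / SAMPLES_PER_BATCH   # about 1.09 GB
    return patients * gb_per_sample / 1000              # 1 TB = 1,000 GB


print(f"{raw_storage_tb(1_000):.2f} TB")    # 1.09 TB, the article's 1,000-patient case
print(f"{raw_storage_tb(50_000):.1f} TB")   # 54.3 TB for a hypothetical hospital volume
```

At roughly 1.09 GB per sample, a hospital genotyping a hypothetical 50,000 patients a year would accumulate about 54 TB of raw data annually before anything is discarded.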
To tackle such problems, Coriell pledged over $5 million toward its personalized medicine project, mostly to establish a state-of-the-art genotyping core and expand its current 30 TB of data storage to 120 TB. The new Genotyping and Microarray Center consists of twelve Affymetrix FS450 fluidics stations and three GCS3000 scanners, allowing it to process up to 2,000 DNA or RNA samples per month.
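Combining the stated throughput with the per-sample figure given earlier (about 100 GB per 92 samples) gives a rough sense of how long the storage expansion lasts. This Python sketch assumes the existing 30 TB is already full, the center runs at its 2,000-sample maximum every month, and every raw file is kept; none of those assumptions comes from Coriell.

```python
GB_PER_SAMPLE = 100 / 92   # ~100 GB per 92-sample batch, from the article
SAMPLES_PER_MONTH = 2_000  # stated maximum throughput
CURRENT_TB, TARGET_TB = 30, 120


def months_to_fill(free_tb: float, samples_per_month: int,
                   gb_per_sample: float = GB_PER_SAMPLE) -> float:
    """Months until `free_tb` of capacity is consumed by raw genotyping data."""
    tb_per_month = samples_per_month * gb_per_sample / 1000   # 1 TB = 1,000 GB
    return free_tb / tb_per_month


runway = months_to_fill(TARGET_TB - CURRENT_TB, SAMPLES_PER_MONTH)
print(f"about {runway:.1f} months")
```

Under these assumptions the 90 TB of new headroom fills in about 41 months, roughly three and a half years, which suggests why discarding raw data once it is analyzed matters.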
To manage the data, Coriell rewired two floors of its building to implement a completely dedicated fiber-optic network that backbones directly into an HP EVA 8000 storage area network (SAN). Why a separate network? The raw genotyping data output would consistently cripple any other network-based applications on the existing 10-Gbps Ethernet network. That performance concern is even more serious in a health care setting, where systems may be providing life-saving information.
Josefina "Fina" Nash, Coriell's head of IT, is in charge of developing the applications and systems necessary to manage and safeguard the data associated with Coriell's personalized medicine project. "Establishing a secure infrastructure has been an early priority," she says. "There are three principal constituents that will utilize the information systems, namely the participants, researchers, and medical professionals. We have to ensure that the appropriate measures are in place to protect the privacy of the participants' genetic data."
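One common way to serve the three constituents Nash names is role-based access, where each role sees a different view of the same record. The Python sketch below is a hypothetical policy, not Coriell's: participants see their own data, clinicians see records of patients they treat, and researchers receive records with direct identifiers replaced by a keyed pseudonym (HMAC-SHA256). Genotype data can itself be identifying, so pseudonymization alone is not full anonymization.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from enum import Enum

PSEUDONYM_KEY = b"replace-with-a-managed-secret"  # hypothetical secret key


class Role(Enum):
    PARTICIPANT = "participant"
    RESEARCHER = "researcher"
    CLINICIAN = "medical professional"


@dataclass
class Record:
    participant_id: str
    name: str
    genotypes: dict
    treating_clinicians: set = field(default_factory=set)


def pseudonym(participant_id: str) -> str:
    """Stable keyed pseudonym; linking it back to an ID requires the key."""
    digest = hmac.new(PSEUDONYM_KEY, participant_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def view(record: Record, role: Role, requester_id: str) -> dict:
    """Return the view `requester_id` may see, or raise PermissionError."""
    if role is Role.PARTICIPANT:
        if requester_id != record.participant_id:
            raise PermissionError("participants may view only their own data")
        return {"name": record.name, "genotypes": record.genotypes}
    if role is Role.CLINICIAN:
        if requester_id not in record.treating_clinicians:
            raise PermissionError("clinician is not treating this participant")
        return {"name": record.name, "genotypes": record.genotypes}
    if role is Role.RESEARCHER:
        # Direct identifiers removed; the pseudonym still lets results be linked.
        return {"pseudonym": pseudonym(record.participant_id),
                "genotypes": record.genotypes}
    raise PermissionError(f"unknown role: {role}")


rec = Record("P-0001", "Jane Doe", {"VAR-1": "A/G"}, {"DR-42"})
print(view(rec, Role.RESEARCHER, "R-7"))
```

Keying the pseudonym with a secret, rather than hashing the ID alone, prevents anyone without the key from re-identifying records by hashing candidate IDs.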
Through its partnership with HP, HPCGG has also upgraded its storage and processing infrastructure, adding a 64-node Cluster Platform 4000 and an XC3000BL Cluster. "Important discoveries are being made and that is why it is important to have a close relationship between medicine and IT," says Raju Kucherlapati, scientific director of HPCGG. HPCGG is also participating in the Informatics for Integrating Biology and the Bedside (i2b2) initiative (www.i2b2.org). Biorepositories will be providers of choice for the raw materials used in such projects.
By outsourcing to a large biorepository, health care entities can share in the benefits of personalized medicine without huge costs. While some may be concerned about exchanging genetic data between a biorepository and another entity, some organizations are already blazing the path. A recent article by IBM's Amnon Shabo and Dolev Dotan describes the advantages of IBM's bleeding-edge Clinical Genomics Level Seven (CGL7) web services technology platform. Based on the Health Level Seven (HL7) Clinical Genomics standard, the proposed technology has already provided at least one solution to the data-exchange challenge.
In this capacity, biorepositories will make two primary contributions. The first, likely industry-changing, will be providing "research in a box." Modern, mature biorepositories have come a long way in streamlining the many processes involved in R&D (materials processing, storage and management, consent management), allowing researchers to focus on tracking their own results. With solid distribution platforms, like Coriell's first-of-its-kind eCommerce catalogue of specimens and data driven by a Google Mini search appliance, researchers can quickly identify the subjects they are interested in, order those samples, and download phenotypic, genotypic, and any other relevant knowledge pool data.
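Catalogue search of this kind rests on keyword indexing. The Python sketch below is a minimal inverted index with AND semantics, a generic stand-in rather than a description of how the Google Mini works internally; the catalogue IDs and descriptions are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CatalogueIndex:
    """Maps each token to the set of catalogue items whose text contains it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)
        self._descriptions = {}

    def add(self, item_id: str, description: str) -> None:
        self._descriptions[item_id] = description
        for token in tokenize(description):
            self._postings[token].add(item_id)

    def search(self, query: str) -> list:
        """Return IDs of items containing every query term (AND search)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty postings for unknown terms
        matches = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(matches)


index = CatalogueIndex()
index.add("HYP-0001", "Fibroblast cell line, cystic fibrosis, genotype data available")
index.add("HYP-0002", "Lymphoblastoid cell line, cystic fibrosis")
index.add("HYP-0003", "DNA sample, healthy control, genotype data available")
print(index.search("cystic fibrosis genotype"))
```

Intersecting posting sets means a query only touches items sharing its terms, which is why keyword search stays fast as a catalogue grows.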
To spur progress by lowering the barriers to distributing research materials, which are too often locked away in individual biobanks, organizations such as Science Commons have recognized the need to standardize two current hurdles: locating specimens across biobanks and authorizing material transfer agreements (MTAs). Standardization would bring accessibility and fluidity to a normally snag-prone process.
John Wilbanks, executive director of Science Commons, touts SPARQL (SPARQL Protocol and RDF Query Language), recently released by the World Wide Web Consortium (W3C), as a solution to these problems. Wilbanks hopes that by establishing semantic networks through SPARQL and by promoting common MTA documents, researchers will gain unprecedented access to samples and data, fueling potential discovery.
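To show what querying such a semantic network involves, the Python sketch below evaluates a SPARQL-style basic graph pattern, the core of a SPARQL SELECT, by joining triple patterns over triples pooled from two hypothetical biobanks. It is a toy: prefixes are not expanded, and FILTER, OPTIONAL, and the SPARQL protocol itself are omitted; a real deployment would use a SPARQL engine. All identifiers, including the `ex:` prefix IRI, are hypothetical.

```python
# Equivalent SPARQL (prefix IRI is hypothetical):
#   PREFIX ex: <http://example.org/biobank#>
#   SELECT ?sample ?bank WHERE {
#     ?sample ex:phenotype     ex:CysticFibrosis .
#     ?sample ex:transferTerms ex:UniformMTA .
#     ?sample ex:heldBy        ?bank .
#   }
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # An already-bound variable must agree with this triple's term.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                m = match(pattern, t, b)
                if m is not None:
                    next_bindings.append(m)
        bindings = next_bindings
    return bindings


# Triples pooled from two hypothetical biobank catalogues.
TRIPLES: List[Triple] = [
    ("bankA:S1", "ex:phenotype", "ex:CysticFibrosis"),
    ("bankA:S1", "ex:transferTerms", "ex:UniformMTA"),
    ("bankA:S1", "ex:heldBy", "ex:BankA"),
    ("bankB:S7", "ex:phenotype", "ex:CysticFibrosis"),
    ("bankB:S7", "ex:transferTerms", "ex:BespokeMTA"),
    ("bankB:S7", "ex:heldBy", "ex:BankB"),
    ("bankB:S9", "ex:phenotype", "ex:CysticFibrosis"),
    ("bankB:S9", "ex:transferTerms", "ex:UniformMTA"),
    ("bankB:S9", "ex:heldBy", "ex:BankB"),
]
QUERY: List[Triple] = [
    ("?sample", "ex:phenotype", "ex:CysticFibrosis"),
    ("?sample", "ex:transferTerms", "ex:UniformMTA"),
    ("?sample", "ex:heldBy", "?bank"),
]
for row in evaluate(QUERY, TRIPLES):
    print(row["?sample"], row["?bank"])
```

Sample bankB:S7 drops out because it is offered only under a bespoke MTA, which mirrors the appeal of a single, common MTA across repositories.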
Wilbanks is clear on the pivotal role that biorepositories will play in furthering research and personalized medicine: "Right now, we're stuck in a pre-industrial culture of tool making and transfer, where scientists have to beg labs to stop doing research and start making tools... It's absurd that tool making is slowing down even a single experiment if there's a way to avoid it. We have the tools, the technologies, and the legal systems to bring all the benefits of eCommerce to biological tool making - it just takes the willpower of [donors] and universities - but the entire system rests on biobanks for fulfillment. Scientists don't get grants for fulfilling orders for cells."
Various biobanking organizations, including Coriell and Addgene, will participate in the Science Commons project by opening their catalogues to the semantic network, allowing materials to be procured from any participating repository under a single MTA.
Publicly funded (and even commercial) biorepositories will also foster growth through knowledge dissemination. Once these biorepositories reach a critical mass of phenotypic and genotypic data, discoveries will accelerate. Indeed, this class of biorepository is ideally positioned to act as an "honest broker" of sorts, especially for verified, actionable genetic variants. The idea is that the faster the information is distributed, the sooner patients will benefit.
Chris Kronenthal is the IT/Application Manager at Coriell. He can be reached at email@example.com.
This article appeared in Bio-IT World Magazine.