gCell Gives Genentech Corporate Memory of Cell Line Experiments

By Bio-IT World Staff

September 15, 2014 | Like a lot of large pharma and biotech companies adapting to the era of big data, Genentech has lately found that its preclinical models, the animals and cell lines in its inventory, don’t scale up as neatly as the information they generate. It’s one kind of challenge to share tox screen results across a large organization, but another issue entirely to keep tabs on thousands of rodents and cancer cells traveling across the facilities.

Genentech, however, has stood apart in the industry for making a concentrated effort to upgrade its legacy systems where they no longer meet the expanding needs of its preclinical studies. “The organization over the last seven or eight years has made leaps and bounds in unifying and streamlining processes that are important for the research pipeline,” says Richard Neve, a scientist in the company’s Discovery Oncology program.

Last year, Bio-IT World awarded Genentech a Best Practices Award for work on a platform that tracks the history and whereabouts of the tens of thousands of animals kept in the company’s Mouse Genetics Department. That project demonstrated that attention to these kinds of mundane processes can yield significant efficiencies and savings, and the company has continued to apply that philosophy in other areas. At this year’s Bio-IT World Conference & Expo in April, Genentech was again recognized, this time for its gCell system, which has revamped the ordering and use of cell lines in basic research. The project, on which Neve took a lead role along with Senior Software Engineer Jean Yuan, won Bio-IT World’s top prize in the category of Knowledge Management at the Best Practices Awards ceremony.

A Sprawling Network of Cell Lines

Genentech is a multi-billion-dollar titan of biotech, but before gCell, its procedures for working with cell lines were not much different from those of the smallest startups. If a project lead needed a new cell line for a particular experiment, she would contact a vendor, have it delivered to her lab, and take responsibility for analysis and storage. Cells might be distributed to other labs, or siloed in a freezer; data might be shared with collaborators, or locked away in project notes.

“There was no formalized process for organizing cell lines in Genentech,” says Neve. “You often found that multiple labs had ordered the same cell line, and were keeping separate stocks.” Worse than the redundancy was the risk of errors creeping into the preclinical pipeline. Left unattended, cell lines have a tendency to quickly rack up mutations as they adapt to the unnatural environment of the lab. They can also pick up some tenacious contaminants; one recent study suggested as many as one in ten projects using cell lines may be affected by contamination with mycoplasma, tiny bacteria that are resistant to most antibiotics. While genetic assays that can identify cell lines by the short tandem repeats (STRs) in their genomes are widely available, Genentech previously had no oversight of whether and how labs made use of them.

Meanwhile, the company’s stock of cell lines continued to grow. Genentech now stores an estimated 90,000 vials of cells, in which over 1,800 lines are represented from different tissues, disease states, and species. In 2009, the company began a project to track cell lines that enter and travel through Genentech facilities in support of a central biobank, and over the past five years, that system has evolved into gCell, which now features not only the bank and tracking measures, but also a robust bioinformatics platform for identifying each cell line and keeping its internal history.

Even identifying the cell lines is more difficult than it sounds. “No one in the field has a standardized nomenclature,” says Neve, so vendors and academic groups use a variety of naming systems when a new cell line is created. Genentech implemented three unique identifiers for each cell line in its stocks: the vendor’s ID, rendered in a uniform syntax; an STR profile; and a single-nucleotide polymorphism (SNP) profile, for faster and less expensive cell line profiling.
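
To make the idea of an STR profile as an identifier concrete, here is a minimal sketch of how two profiles might be compared. The article does not describe how Genentech scores matches, so the Dice-style shared-allele score below is an assumption chosen for illustration, as are the function name and the allele values.

```python
from typing import Dict, FrozenSet

# An STR profile maps a locus name to the set of alleles (repeat counts) seen there.
StrProfile = Dict[str, FrozenSet[float]]


def shared_allele_score(query: StrProfile, reference: StrProfile) -> float:
    """Dice-style similarity (0.0 to 1.0) over loci typed in both profiles.

    Illustrative scoring only; not taken from the article.
    """
    common_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in common_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in common_loci)
    return 2 * shared / total if total else 0.0


# Hypothetical profiles: the allele values are made up for this example.
stock_vial: StrProfile = {
    "D5S818": frozenset({11, 12}),
    "TH01": frozenset({7}),
    "TPOX": frozenset({8, 12}),
}
reference_profile: StrProfile = {
    "D5S818": frozenset({11, 12}),
    "TH01": frozenset({7}),
    "TPOX": frozenset({8, 11}),
}

score = shared_allele_score(stock_vial, reference_profile)
print(f"shared-allele score: {score:.2f}")
```

A score close to 1.0 suggests the vial matches the reference line, while a low score would flag a possible mislabeling for follow-up. Any cutoff would be a policy decision that the article does not specify.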

The gCell platform also introduced a controlled vocabulary for describing the cell lines in its library, including their tissues of origin and diagnoses. “The most important part comes down to the associated metadata, like the pathologic terms that are used,” says Neve. “For example, we found in different vendor databases that there are almost 70 different ways to define adenocarcinoma. We worked with pathologists to define standardized terms, and then we went through three or four thousand cell lines that we have manually curated.”

By describing all cell lines with the same standard terms, gCell makes the Genentech cell bank rapidly searchable. A PI can immediately pull up every cell line in the library representing a particular subset of cancer, or every cell line from the liver, and order vials. The gCell vocabulary also helps with unified data analysis downstream, as large projects pull together datasets from different labs, experiments, and cell sources.
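
The controlled vocabulary and the search it enables can be sketched in a few lines. This is only an illustration under stated assumptions: the synonym table, the `CellLine` record, the cell line names, and the `search` function are all hypothetical, and gCell's actual schema is not described in the article.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical synonym table: vendor spelling -> standardized term.
SYNONYMS: Dict[str, str] = {
    "adenocarcinoma": "adenocarcinoma",
    "adeno carcinoma": "adenocarcinoma",
    "adenoca": "adenocarcinoma",
    "carcinoma, adeno": "adenocarcinoma",
}


def normalize_diagnosis(raw: str) -> str:
    """Map a vendor's free-text diagnosis onto the controlled vocabulary."""
    key = " ".join(raw.lower().split())  # fold case and collapse whitespace
    try:
        return SYNONYMS[key]
    except KeyError:
        # Unknown terms are surfaced for manual curation rather than guessed.
        raise KeyError(f"unmapped term, needs manual curation: {raw!r}") from None


@dataclass(frozen=True)
class CellLine:
    name: str
    tissue: str
    diagnosis: str


def search(
    lines: List[CellLine],
    tissue: Optional[str] = None,
    diagnosis: Optional[str] = None,
) -> List[CellLine]:
    """Return the cell lines matching every filter that was supplied."""
    return [
        line
        for line in lines
        if (tissue is None or line.tissue == tissue)
        and (diagnosis is None or line.diagnosis == diagnosis)
    ]


# Hypothetical library entries with inconsistent vendor spellings.
library = [
    CellLine("LINE-A", "liver", normalize_diagnosis("Adeno  Carcinoma")),
    CellLine("LINE-B", "lung", normalize_diagnosis("adenoca")),
    CellLine("LINE-C", "liver", normalize_diagnosis("ADENOCARCINOMA")),
]
liver_hits = [line.name for line in search(library, tissue="liver")]
print(liver_hits)
```

Because every record carries the same standardized term, an exact-match filter is enough. Without normalization, the same query would need to anticipate every vendor spelling.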

Unique Cell Line Histories

Although core fields in gCell use a rigid set of terms, the profiles are also flexible enough to capture other types of information researchers would want to know. A comments section allows gCell administrators to record things like assay results, drug sensitivities, and phenotypic observations.

“That gives us corporate memory on what has happened to a given cell line,” says Neve. “We can also follow that back based upon the batches and the log profile in the database, to really know what was done with a cell line, what happened to it, and why certain actions were taken.”

Applying that memory requires Genentech to keep tabs on its cell lines at every stage of use. The company assigns a unique barcode to each vial of cells and to each storage location on the corporate campus. By scanning barcodes on the vials of cells and the racks and tanks where they’re stored, researchers keep a real-time record of where cell lines travel. “We have an ongoing project to track a sample from the original inventory down,” says Jean Yuan, a bioinformatics lead on the project. “Because gCell has become a centralized resource, it has become easier to track who ordered a cell line, what kind of assays have been done, and in which labs.” Genentech also uses its cell line IDs to note parental and daughter lines, so researchers can trace back when related cell lines diverged.
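
The barcode scanning and the parent-daughter bookkeeping described above could be modeled roughly as follows. This is a sketch under assumptions, not gCell's design: the `Inventory` class, its method names, and the barcode and line IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class Inventory:
    # vial barcode -> chronological list of (timestamp, location barcode) scans
    scans: Dict[str, List[Tuple[datetime, str]]] = field(default_factory=dict)
    # cell line ID -> parental cell line ID (None for a line with no recorded parent)
    parent_of: Dict[str, Optional[str]] = field(default_factory=dict)

    def scan(self, vial: str, location: str, when: Optional[datetime] = None) -> None:
        """Record that a vial barcode was scanned at a rack or tank barcode."""
        when = when or datetime.now(timezone.utc)
        self.scans.setdefault(vial, []).append((when, location))

    def current_location(self, vial: str) -> Optional[str]:
        """Return the most recently scanned location, or None if never scanned."""
        history = self.scans.get(vial)
        return history[-1][1] if history else None

    def register_line(self, line_id: str, parent_id: Optional[str] = None) -> None:
        """Register a cell line, optionally noting the line it was derived from."""
        self.parent_of[line_id] = parent_id

    def ancestry(self, line_id: str) -> List[str]:
        """Walk from a daughter line back through its recorded parents."""
        chain: List[str] = []
        current: Optional[str] = line_id
        while current is not None:
            if current in chain:
                raise ValueError(f"cycle in lineage at {current!r}")
            chain.append(current)
            current = self.parent_of.get(current)
        return chain


# Hypothetical barcodes and line IDs.
inv = Inventory()
inv.scan("VIAL-0001", "TANK-03")
inv.scan("VIAL-0001", "RACK-12")
inv.register_line("CL-1")
inv.register_line("CL-1-R", parent_id="CL-1")  # e.g. a drug-resistant derivative
print(inv.current_location("VIAL-0001"), inv.ancestry("CL-1-R"))
```

Keeping every scan rather than only the latest location is what preserves the history: the last entry answers "where is it now," and the full list answers "where has it been."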

The volume of preclinical testing at Genentech means that these processes have to be as user-friendly as possible. “We’re supplying around fifty orders a day to research,” says Neve, a pace that has added up to around 40,000 deliveries over the life of the program. While gCell adds some quality control routines to the use of cell lines, it also dramatically speeds up distribution: Genentech now promises next-day delivery to PIs who order frozen vials from the biobank.

In the course of implementing gCell, Genentech has built up a database of STR and SNP profiles that could potentially offer benefits to science well beyond the company’s walls. When the tracking program was first rolled out, Genentech created profiles for every cell line in its possession, in some cases discovering in the process that its stocks were mislabeled or contaminated. It also expanded this reference database to popular cell lines in vendor archives, often improving on the canonical STR profiles, according to Neve. He and his colleagues have now submitted a paper to Nature that details this genetic fingerprinting reference of over 3,000 cell lines and offers recommendations for a standard nomenclature.

“We’re hoping that people will start to use this unified vocabulary and annotation,” he says. “That will make it much easier, not only to search for samples and data externally, but also to integrate that data for large analyses.”

New Efficiencies

It isn’t easy to put a dollar value on gCell’s benefit to Genentech, but the company estimates it is now saving $1.7 million a year simply by eliminating redundancies in ordering cell lines, performing all the freezing and storage in a single location, and reducing the frequency of genetic profiling of cell lines. The gCell platform can also cut down on redundant assays and experiments by tying results permanently to a cell line’s profile and making them accessible to all researchers in the company.

Still, Neve and Yuan say the biggest value of gCell is greater confidence in preclinical studies. By controlling for contamination, and making it easier for labs to check one another’s data, gCell can go a long way toward holding down experimental error. This added security in preclinical testing can make all the difference when it comes time to choose which projects advance to clinical trials and which are left on the cutting room floor.

“In my own lab, we’ve been making some drug-resistant lines,” adds Neve, “and because we’re very alert to this, we’ve been doing the SNP profiling on them. We found a couple of mistakes, and luckily we caught them before we did any important experiments.”

As with other improvements to Genentech’s preclinical pipeline, gCell may seem like little more than an accounting exercise, but an enormous amount of thought goes into making it smoothly functional and delivering a benefit that will be felt by every researcher in the company. As Genentech begins to publicize its system and fingerprinting database over the coming months, one can hope to see other drug companies follow the example of gCell, and ensure that the management of preclinical models keeps pace with discovery pipelines that depend more and more on enormous stores of data.