The $1,000,000 Genome Interpretation

October 1, 2010

Groups of clinicians, academics, and some savvy software companies are crafting the tools and ecosystem to make medical sense of the sequence. 

It is doubtful that the scientists and physicians who first started talking about the $1,000 genome in 2001 could have imagined that we would be on the verge of that achievement within the decade. As the cost of sequencing continues to freefall, the challenge of solving the data analysis and storage problems becomes more pressing. But those issues are nothing compared to the challenge facing the clinical community, which is seeking to mine the genome for clinically actionable information, what one respected clinical geneticist calls “the $1 million interpretation.” Judging from the first handful of published human genome sequences, the size of that task is immense.

Although it is early days, a number of groups are making progress in creating new pipelines and educational programs to prepare a medical ecosystem that is ill-equipped to cope with the imminent flood of personal genome sequencing.

Pathologists’ Clearing House

The pathology department at one of Boston’s most storied hospitals isn’t necessarily the place where one might expect to find the stirrings of a medical genomics revolution, but that’s what’s happening at Beth Israel Deaconess Medical Center (BIDMC) under the auspices of department chairman Jeffrey Saffitz.

“I see this as ground-breaking change in pathology and in medicine,” he says.

Together with Mark Boguski and colleagues, Saffitz has introduced a genomic medicine module for his residents (see “Training Day”). And under the stewardship of applied mathematician Peter Tonellato, he is building an open-source genome annotation pipeline that might pave the way for routine medical inspection once whole-genome sequencing crosses the $1,000 genome threshold.

All well and good: but why pathology? “We are the stewards of tissue and we perform all the clinical laboratory testing. This has been our function historically for many years. But we have a sense that the landscape is changing,” says Saffitz. Genetic testing, he argues, must be conducted under the same type of quality assessment, regulatory oversight, and CLIA certification as provided by the College of American Pathologists (CAP), “and should be done by physicians who are specifically trained to do this. That’s us!”

“The brilliance of that,” says Boguski, a pathologist by training, “is that it removes a lot of the mysticism surrounding genomics and makes it just another laboratory test.” There’s really nothing magical or different about DNA, insists Saffitz. “We regard a file of sequence data as a specimen that you send to the lab, just like a urine specimen!”

BIDMC is a medium-sized hospital that conducts 7 million tests a year. Arriving in Boston five years ago, Saffitz began recruiting visionaries to shape “the future of molecular diagnostics” and help the discipline of pathology become a clearinghouse for genomic medicine in a way that is “going to revolutionize the way we do medicine.”

Boguski is best known as a bioinformatician who spent a decade at the National Center for Biotechnology Information (NCBI). He sums up the genomic medicine informatics challenge thus: “You have 3 billion pieces of information that have to be reduced to six bytes of clinically actionable information. That’s what pathologists do! They take in samples—body fluids and tissues—and we give either a yes/no answer or a very small range of values that allow those clinicians to make a decision.”

Increasingly, he says, pathology will become a discipline that depends on high-performance computing to extract clinically actionable information from genome data. That frightens many physicians, but Boguski cites a precedent. “Modern imaging technology would not be possible were it not for high-performance computing, but it’s built into the machine!” he says. “Most practicing radiologists don’t think about the algorithms for reconstructing images from the X-rays. Most pathologists in the future won’t think about that stuff either—it will just be part and parcel of their trade. Nevertheless, we have to invent those technologies.”

Math Lab

Mathematician Peter Tonellato has a deep interest in software systems for the clinic, and formulated the idea of a whole-genome clinical clearinghouse within pathology. “We have to start thinking about genetics as just another component of data information and knowledge that has to be integrated into the electronic health record. Stop labeling genetics as something different and new and completely outside the mainstream medical establishment and move it back into the fundamental foundational effort of medical activity.”

Come the $1,000 genome, it will simply make sense to sequence everyone’s tumor, he says. Just as pathologists study tissue biopsies under a microscope, “we’re going to be sequencing it in parallel and figuring out which pathways and targets are pertinent to that person’s condition.” Simply doing more specialized tests isn’t the solution. “How many tens of millions of dollars and how many years has it taken to validate [the warfarin] test?” asks Boguski. “Multiply that by 10,000 other genes and it simply doesn’t scale. We’re going to have to look at this in a whole new way.”

Tonellato has been funded by Siemens and Partners HealthCare to construct an open-source, whole-genome analysis pipeline. Although not commercially released, the pipeline is built and being used for some pilot projects. He is also partnering with companies—including GenomeQuest—who want to do the sequencing analysis in a best-of-breed competition to establish the most refined NGS mapping utilities and annotation tools. The goal is to annotate those variants in a clinically actionable way down to Boguski’s six bytes of information and the drug response recommendation. “We think we’re as far forward in terms of doing that in an innovative and pragmatic way as anyone,” says Tonellato.

Using the cloud (Amazon Web Services), his team has lowered the cost of whole-genome annotation to less than $2,000. “Everybody talks about the $1,000 genome, but they don’t talk about the $2,000 mapping problem behind the $1,000 genome,” he says. It takes Tonellato’s group about one week using five nodes for the resequencing, mapping and variant calling, while the medical annotation takes three people about a month. High-quality computer scientists have to be paid too, he says. “You can’t just talk about the sequencing costs.”

Of course, it is most unlikely that hospitals will start running massive NGS and compute centers. “We envision a day where every clinical laboratory in every hospital in this country can do this testing,” says Saffitz. “They’re not going to do the sequencing, but there’ll be a machine where they can basically acquire the data, analyze it, and send a report to the doctor saying, ‘This is what we found, this is what it means, this is what you do.’” Where the sequencing is done isn’t of great concern. “We actually treat sequencing as a black box,” says Boguski. What’s important is that the hospital’s cost requirements and quality standards (and those of the FDA) are met. But Tonellato reckons it would be “very odd to have U.S. samples sent abroad for sequencing to Hong Kong or India... and then sit around and wait for the CLIA-certified, clinically accurate results to come back to us. That may happen in the future, but we have to get our own house in order first.”

Another problem is the current state of the gene variant databases, which Boguski calls “completely inadequate” in terms of clinical grade annotation. Where such a resource belongs is open to debate but Boguski is certain it does not belong with the government. “The government is not a health care delivery organization. Whatever that database is, it needs to operate under the same CLIA standards as the actual tests.”

Pathologists have traditionally interacted with patients when they are sick. “But more and more,” says Saffitz, “we’re going to be analyzing the genomes of people who are well, and I hope assuming a very prominent role in the preservation of health and preempting disease.”

Quake Aftershocks

The most comprehensive clinical genome analysis to date was reported in May 2010 in the Lancet. Stanford cardiologist Euan Ashley and colleagues, including Atul Butte and Russ Altman, Stanford’s chair of bioengineering, appraised the genome of Stephen Quake (see “A Single Man,” Bio•IT World, Sept 2009). “This really needs to be done for a clinical audience to show them what the future is going to be like,” says Altman, who is also director of the biomedical informatics program and chief architect of the PharmGKB pharmacogenomics knowledgebase. The task of interpreting Quake’s genome involved more than 20 collaborators, including bioethicist Hank Greely and genetic counselor Kelly Ormond. When discussions turned to the risk of sudden cardiac arrest (Quake’s family has a history of heart disease), Ormond would invite Quake to leave the room until a consensus was reached.

Altman’s own group was able to predict Quake’s response to about 100 drugs. Some of it was imprecise, but he realized that, “especially for the pharmacogenomics, we are much closer [to clinical relevance] than I realized.” He said he would “bet the house” on the results dealing with statin myopathy, warfarin and clopidogrel dosing. The Stanford team also tried linking environmental and disease risk, but Altman admits that is farther from clinical practice. The Lancet study drew high praise from the BIDMC team. “As good as it gets,” is Tonellato’s verdict. “But go down to some town in the middle of America and say, ‘What are you going to do with this genome dataset for your patient?’... Is medicine ready for genetics yet or not? There is a long way to go.”

Since the publication, Altman has received inquiries from companies interested in doing similar “genomic markups” and licensing his group’s annotations. Altman intends to hire an M.D. curator to complement his Ph.D. curators, someone who can highlight the clinical significance of research data. Altman says he would be happy to have PharmGKB data included “in any and all pipelines.” Meanwhile, Ashley is leading a Stanford program to make a computer pipeline to reproduce the Quake analysis on a larger scale.

In a rational world, Altman says, it seems logical to sequence human genomes at birth and put the data in a secure database, querying it only when you know what you’re going to do with the results. That’s in an ideal world. In the United States, he notes dryly, some people do not trust governmental databases. “I could imagine if it’s cheap enough, that people will actually resequence the genome on a need-to-know basis, simply so they don’t have to store it. I think that’s a little bit silly, but in order to get genomic medicine effected, I’m not going to lose the fight over the database.”

Whoever ends up doing clinical genomic sequencing in the future, Altman says they will have to document high-quality data with a rapid turnaround. “We will then put [the data] through the pipeline—hopefully the Stanford pipeline or whatever pipeline seems to be winning—and then we will query it as needed and as requested by the physicians on a need-to-know basis.”

1,500 Mutations

Genome Commons was established by Berkeley computational biologist Steve Brenner to foster the creation of public tools and resources for personal genome interpretation. He wants to build an open access Genome Commons Database and the Genome Commons Navigator. He is also launching a community experiment called CAGI (The Critical Assessment of Genome Interpretation) to evaluate computational methods for predicting phenotypes from genome variation data.

One notable private effort in clinical genome annotation is that of Omicia, a San Francisco-based software company founded by Martin Reese in 2002.

Omicia is taking genome data and extracting clinical meaning, focusing on DNA variation, rather than gene expression or pathways. “We have one of the best systems for interpreting the genome clinically,” claims Reese. He started with Victor McKusick’s classic Mendelian Inheritance in Man catalogue, which now lives online as OMIM, mapping a “golden set” of disease mutations to the reference genome. Omicia is also developing algorithms to predict the effect of protein-coding variants to better understand which mutations are medically relevant.

Reese sums up the goal: “You have 21,000 protein coding mutations compared to the reference genome. 10,000 of them are non-synonymous. We have 3,500 in disease genes. That’s roughly 15%. So 15% of 10,000 is 1,500 protein coding mutations. The goal is to interpret 1,500 mutations.”
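Reese’s back-of-the-envelope figure can be checked in a few lines. The sketch below assumes, since the quote does not spell it out, that the 3,500 variants in disease genes are counted out of all 21,000 protein-coding variants; the variable names are illustrative only.

```python
# Figures quoted by Reese for a single genome versus the reference.
coding_variants = 21_000    # protein-coding variants
non_synonymous = 10_000     # of which change an amino acid
in_disease_genes = 3_500    # coding variants that fall in disease genes

# Assumption: the "roughly 15%" is 3,500 out of 21,000.
exact_share = in_disease_genes / coding_variants
rounded_share = 0.15        # the figure Reese actually uses

to_interpret = round(non_synonymous * rounded_share)
to_interpret_exact = round(non_synonymous * exact_share)

print(f"{exact_share:.1%}")   # 16.7%
print(to_interpret)           # 1500
print(to_interpret_exact)     # 1667
```

Under that reading the unrounded share is closer to 17%, which would put the workload nearer 1,700 mutations, the same order of magnitude as Reese’s 1,500.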

For the time being, Omicia is offering its services through collaborations. Reese has a three-year collaboration with Applied Biosystems, and was a co-author on the first NGS human genome paper using the SOLiD platform in 2009. Then there is the Genome Cancer Care Alliance, which brings together various medical centers and Life Technologies. “We’re doing their interpretation of these cancer genomes for 100 untreatable cancers,” says Reese.

Presenting the data for a physician is a challenge, says Kiruluta, but not as bad as the scant amount of time a physician has to see a patient. “The reporting is to help a physician make a decision quickly—green light, red light. Then there’s a much more detailed interface behind the scenes,” where other medical professionals can study the patient’s data in more detail.

Reese sees advantages to the commercial approach for genome software compared to academic solutions. “This will be a big play in next few years as people make clinical decisions. So the quality of the software, the QC of the assembly, how transparent you are, the annotation, is critical. It will be a big problem for academia to do that—you know how it is when a postdoc writes something!”

Reese has also been spearheading the effort to develop a new Genome Variation Format with Mark Yandell (University of Utah) and others, which was recently published in Genome Biology.

DNA Partners 

The challenge facing the affable Samuel (Sandy) Aronson, executive director for IT at the Partners HealthCare Center for Personalized Genetic Medicine (PCPGM) and PCPGM’s clinical laboratory director, Heidi Rehm, is to deliver clinically actionable information to physicians in the Partners HealthCare network. “This challenge cannot be entirely solved by a single institution,” Aronson notes. “It takes a network of institutions working together.”

Rehm maintains a knowledge base of 95 genes that are routinely curated by the PCPGM’s Laboratory of Molecular Medicine and supplies information to physicians on the status of those genes in their patients in real time. The PCPGM’s GeneInsight suite, developed by Aronson’s team, has been in use for about seven years. There are two components—one for the laboratory, the other for the clinician. The lab section consists of a knowledgebase—the tests, genes, variants, drug dosing, etc—as well as an infrastructure to generate reports via the Genome Variant Interpretation Engine (GVIE).

On the clinical side is a new entity, the Patient Genome Explorer (PGE), which allows clinicians to receive test results from an affiliated lab and query patient records. “The PGE, without a doubt, is one of its kind,” says Rehm. “There’s no other system out there. There’s a lot of excitement about it. Labs are choosing us for testing because we offer that service.” When an update is made to the PCPGM knowledgebase on a variant that is clinically significant, the PGE proactively notifies the clinicians caring for patients with that variant. If there are 100 clinics with 10 patients each, and Rehm updates the knowledgebase, then 1,000 patient updates are dispatched automatically.

For inherited disease testing, the alert changes the variant from one of five categories to another: 1) pathogenic 2) likely pathogenic 3) unknown 4) likely benign, or 5) benign. The PGE made its debut last summer in the Brigham and Women’s Hospital Department of Cardiology. When the system launched, a dozen “high alerts” (meaning a variant has shifted from one major category to another) were immediately dispatched. The physicians’ response has been really positive, says Aronson. “There’s a significant disconnect between the level of quality of data being used for clinical purposes and the quality of data in the research environment,” says Rehm. “Our hope with the distribution of this infrastructure is to get more data validated for clinical use.”
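The notification flow described above can be sketched in a few lines of Python. This is an illustration only, not GeneInsight’s code: the names `Category`, `PatientRecord`, `Alert`, `reclassify`, and the placeholder variant `VARIANT_X` are all hypothetical, and the sketch treats any change between the five categories as an alert without modeling what makes one “high.”

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five classifications listed in the article."""
    PATHOGENIC = 1
    LIKELY_PATHOGENIC = 2
    UNKNOWN = 3
    LIKELY_BENIGN = 4
    BENIGN = 5


@dataclass(frozen=True)
class PatientRecord:
    clinic: str
    patient_id: str
    variant: str


@dataclass(frozen=True)
class Alert:
    clinic: str
    patient_id: str
    variant: str
    old: Category
    new: Category


def reclassify(knowledgebase, records, variant, new_category):
    """Update one variant and return an alert for every affected patient."""
    old_category = knowledgebase[variant]
    if old_category == new_category:
        return []  # nothing changed, so nobody is notified
    knowledgebase[variant] = new_category
    return [
        Alert(r.clinic, r.patient_id, variant, old_category, new_category)
        for r in records
        if r.variant == variant
    ]


# The article's example: 100 clinics, each with 10 patients carrying the variant.
knowledgebase = {"VARIANT_X": Category.UNKNOWN}
records = [
    PatientRecord(f"clinic-{c}", f"patient-{c}-{p}", "VARIANT_X")
    for c in range(100)
    for p in range(10)
]
alerts = reclassify(knowledgebase, records, "VARIANT_X", Category.LIKELY_PATHOGENIC)
print(len(alerts))  # 1000
```

A single edit to the knowledgebase fans out to every patient record that carries the variant, which is why one curation decision in the lab can reach a thousand charts at once.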

Core Challenges

The Partners effort is a worthy start, but the larger goal is to build a network where labs with expertise in other genetic disorders such as cystic fibrosis contribute their data, perhaps in return for attribution or a nominal transaction fee. “We can’t maintain data on every gene, but we’re willing to establish nodes of expertise,” says Rehm. As for the IT infrastructure, Aronson hopes to enable organizations to create a node on the network, link to the PGEs, and then operate under their own business models, whatever it takes to make the data accessible. The first external partner that linked to GeneInsight was Intermountain Healthcare (IHC) in Utah. “We believe this is the first transfer of fully structured genetic results between institutions so that they got into IHC’s electronic health record and are now available for decision support,” says Aronson.

Aronson anticipates a day when whole-genome sequencing for patients will be a clinical reality. “It’s very much on our radar,” he says, but he doesn’t appear unduly concerned. After all, he says, the PGE is designed to store highly validated clinical information, and he doesn’t expect the millions of variants in a whole genome to contain enough clinically actionable variants to overwhelm the database. The challenge will come in understanding complex/low-penetrance diseases, “where we’re more algorithmically dependent. That will require new infrastructure.”

A bigger problem is facilitating the business models that will solve personalized medicine challenges. “Our goal is to expand networking, adding labs, PGEs and going after a network effect,” says Aronson. “We have a structure that could present an answer to how do you—in a true patient-specific, clinically actionable way that clinicians can use in their workflow—help interpret the data?”

Training Day

It was in 2009 that Mark Boguski conceived the innovative training program for Beth Israel’s pathology residents, and yet remarkably his boss, Jeffrey Saffitz, thinks every pathology department in North America might have introduced a similar program by 2012. The program offers residents the chance to be personally genotyped. It is, after all, a time-honored tradition in medicine for medical students to draw blood from and examine each other, so this is nothing new.

Boguski selected Navigenics for the genotyping service because it focuses on medically actionable conditions and has a built-in genetic counseling component. All but two of the 17 BIDMC residents chose to get tested. Afterwards, Saffitz, Richard Haspel and colleagues published a “call to action” for the pathology community in the American Journal of Clinical Pathology. The response was incredibly positive, and many programs are eagerly signing on.

“We want the residents to see that a typical bioinformatics analysis of a genome — pages and pages of text and references using natural language processing or something — isn’t going to work,” says Saffitz. “We actually use that as a teachable moment to say, ‘this is the state-of-the-art today, but here’s how we have to move it forward.’”

Stanford has introduced a similar course in personal genomics in which students can opt to be genotyped and then analyze their own genomes, calculating their disease risks and optimal drug dosages. “It felt somewhat historic, actually,” says Altman. There were divisions among the faculty as to whether it should be offered, but Altman hopes the course will become a staple in the curriculum. K.D.


This article also appeared in the September-October 2010 issue of Bio-IT World Magazine.