
Genome Analytics for All

Pauline Ng is planning open source, open access analytics for the genomes to come.

By Allison Proffitt

August 2, 2011 | SINGAPORE—Pauline Ng’s office is in the Genome building of the Biopolis science park in Singapore, a fitting home for one of the authors of the first published personal genome, that of J. Craig Venter, published in 2007 while Ng was a senior scientist at the J. Craig Venter Institute.

Now Ng leads an expanding group of three bioinformaticists (she’s hiring!) at the Genome Institute of Singapore (GIS). Before her stint at the Venter Institute, Ng worked for Illumina as well as the Fred Hutchinson Cancer Center in Seattle, where she wrote the powerful SIFT algorithm, a widely used tool to predict the effect of a given amino acid substitution on protein function.

“We put the algorithm on a Web server,” she said. “Ten years ago people would publish their algorithms, but they wouldn’t necessarily put them on a Web server. But my Ph.D. advisors were very emphatic, ‘You need to do this.’ That actually was very informative, because people used it. That opened it up for clinicians and geneticists to use the algorithm, instead of just pure bioinformaticists.”

Ng believes that access is very important. “What’s happening is [sequencing is] accessible to academic institutions like GIS. We can sequence; we can analyze that data. The Broad Institute, University of Washington, Baylor—these are very highly regarded institutions with collaborations with a medical center. But if you’re anyone else, you may not have access to those types of resources.”

In 2009, Ng co-authored a much-discussed Nature commentary outlining an agenda for personalized medicine, in which she and her co-authors compared the results of two commercial consumer genomics tests. They found that the accuracy of raw data in both 23andMe and Navigenics tests was high, but one third of risk predictions (for five anonymous individuals) did not agree between the tests. A disappointing result for Ng. “At that time I thought, wow, there’s something not quite right,” she said.

“When you get a health diagnosis, you don’t consider it a prediction, you expect it to be correct. Just like you go to the doctor and he says, ‘Take this drug because you’re at risk for heart disease,’ or something. But if you went to another doctor and they said something else, it would reduce the credibility overall.”

GIS a Job

Ng moved to Singapore in 2010, but hasn’t quite shaken her discomfort. “All of this together: working on individual genomes, making tools that are accessible to everybody, and just getting exposure to direct-to-consumer [tests]” has shaped what she now hopes to do at GIS: make bioinformatics accessible to everyone. Like SIFT, Ng’s next tools will be open source. “The plan is not to let just doctors access the software, but really anybody.”

She acknowledges that bioinformatics is “a bit specialized,” but also believes that the patient is his own best advocate. She cites Hugh Rienhoff’s work on his daughter’s DNA (see “Hugh Rienhoff’s Voyage Round his Daughter’s DNA,” Bio•IT World, Sept 2010). “There’s someone with a huge self-interest in finding out what is wrong with his daughter. That’s one example, but you can probably imagine all across the world there are families like this where doctors probably don’t have time or resources to do it. But if there truly is a $1,000 genome, that means that for $5,000 they can get the full family sequenced.”

Affordable sequencing is still a limiting factor, but Ng is confident in that progression. And the types of diseases that Ng hopes to address need full genome sequencing. “The 23andMe data, they’ve squeezed as much as they can from it. But the applications—cancer, Mendelian disorders—they’re tailored toward the rare variants or somatic variants which you need [to get] from sequencing.” She expects that to be easy enough to outsource in about two years.

But sequencing and analysis—today at least—cost the same. “The problem is that right now, companies like Knome are actually charging the same amount for bioinformatics as they are for sequencing. If you sequence more individuals, I’d expect the bioinformatics to go down, but it’s the same price. That means the price is double! If we can make these tools online, accessible for free or at least at cost, I think I can get it to a tenth of the cost.”

Ng plans to do the computation on the Amazon Cloud and, at today’s rates, expects a genome analysis to cost $500. She hopes that these price points will enable doctors and individuals to use genomics. “If we could say, OK, outsource [the sequencing] to these companies. You’re going to get a hard disk. Mail it to Amazon and get your results in a week.”

Ng is not promising a magic cure, and doesn’t even think that this model should be the only one. She just hopes to drive prices down and open the market. “There’s never a guarantee of an answer,” she says. “Even with the software we write, there may not be a guarantee of an answer, but at least…” she pauses and begins again, emphatically. “We can definitely give you the basic annotation and provide the tools that everyone uses. And if it doesn’t work, then you go to an expensive company that really uses the same tools as the academics but with a couple of more bells and whistles. If you try our stuff first, at least you’ve invested only $500 instead of $5,000.”   

This article also appeared in the 2011 July-August issue of Bio-IT World.


    Nathan Pearson here (from Knome). Wanted to commend Pauline -- sage and thoughtful, as always -- for her insights on challenges (and promise) inherent to whole-genome analysis.

    Also want to note that these days, Knome (and perhaps our competitors) actually charge our main clients (researchers) much less to analyze genomes than to sequence them. Moving analysis to the cloud, per Pauline's vision, has helped us do so.

    That said, in the long run, genome analysis costs will likely fall slower than sequencing costs, largely because computational analysis:

    a) naturally tracks Moore's Law more closely than do 'Even More's Law'-paced sequencing costs (which are coupled tightly to advances in reagent supply, microfluidics, imaging hardware, etc.);

    b) entails ongoing efforts to efficiently gather and meaningfully interpret ever-growing, ever-changing reference data on genotype-phenotype associations;

    c) will be prognostic (in the context of personal health), not just descriptive -- and so will require slower, more methodologically varied, and potentially costlier, validation than sequencing itself does;


    d) must convey simple, clear insights distilled from complex data -- a task that will require intuitive, engaging software. While we'll see shining examples of both open-source and proprietary tools that do just that, software development costs -- whether paid by public or private money -- tend to fall more slowly than do hardware costs.

    In any case, here's to that prospect of a thorough, insightful $500 analysis -- and beyond!

