Two new papers reveal a further dimension of commercial next-generation sequencing applications – one that could threaten more-established microarray technologies. Using the Genome Analyzer from Illumina/Solexa, two groups working independently have mapped the locations across the genome where a specific DNA-binding protein latches onto the DNA.
The new method is called ChIP-sequencing (ChIPSeq) – a combination of chromatin immunoprecipitation and next-generation, or parallel, sequencing. The feat was performed “with a speed and precision that goes beyond what has been achieved with previous technologies,” comments University of Washington geneticist Stanley Fields, in an accompanying essay in Science.
The precisely choreographed interplay of cellular gene activity is controlled by a vast cast of DNA-binding proteins – mostly transcription factors and enzymes. ChIP is a well-established lab technique for identifying the specific sites where those proteins latch onto the DNA. Cells are treated with a chemical that cross-links the DNA to its bound proteins; the chromatin is then isolated, the DNA is broken into fragments, and the protein of interest is immunoprecipitated along with whatever DNA is attached to it. Finally, the DNA stuck to the protein is released and analyzed. Until now, the most high-throughput version of this technique (ChIP-chip) has relied on microarrays containing thousands of gene spots to identify binding sites for transcription factors and the like.
On and Off
Writing in Science, David Johnson and colleagues at Stanford University use ChIPSeq to identify the binding sites for a transcription factor – NRSF (neuron-restrictive silencer factor), which turns off neuronal genes in non-neuronal cells. The DNA motif that NRSF recognizes consists of a 21-base-pair core fragment. Using the Solexa/Illumina platform, “because high-read numbers contribute to high sensitivity and comprehensiveness in large genomes,” Johnson et al. performed a ChIP experiment in a T-cell line. They sequenced the released DNA fragments – some 2 to 5 million per sample – of which about half were successfully mapped back to the reference genome sequence.
The Stanford group discovered a total of 1946 NRSF-binding locations in the human genome, including DNA motifs controlling more than 100 other transcription factors and 22 micro-RNAs. The most common binding target was identified more than 6700 times in the experiment. Most of the sites were identified as expected, but so too were some previously unrecognized binding motifs that did not fit the previously known rules for NRSF binding.
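For readers who want a feel for how mapped reads become binding locations, here is a minimal sketch in Python. It is not the procedure used by either group: it simply counts mapped read positions in fixed-size genomic windows and reports the windows that collect many reads. The function name, the window size, the read threshold, and the toy coordinates are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def call_enriched_windows(read_positions, window_size=200, min_reads=5):
    """Return sorted (chrom, window_start, read_count) tuples for windows
    that contain at least `min_reads` mapped reads.

    `read_positions` is an iterable of (chromosome, position) pairs.
    Illustrative only: window_size and min_reads are assumed values,
    not parameters taken from the published studies.
    """
    counts = Counter()
    for chrom, pos in read_positions:
        # Assign each read to the fixed window that contains its position.
        window_start = (pos // window_size) * window_size
        counts[(chrom, window_start)] += 1

    enriched = [
        (chrom, start, n)
        for (chrom, start), n in counts.items()
        if n >= min_reads
    ]
    return sorted(enriched)


# Hypothetical mapped reads: a cluster on chr1 plus scattered background.
toy_reads = [("chr1", 1000 + i) for i in range(6)]
toy_reads += [("chr1", 5000), ("chr2", 1010), ("chr2", 1020)]

peaks = call_enriched_windows(toy_reads, window_size=200, min_reads=5)
print(peaks)
```

In this toy data, only the cluster of six reads on chr1 clears the threshold. A real analysis would also need a control sample and a statistical model of background read density, which this sketch leaves out.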
The authors conclude that ChIPSeq is a cost-effective alternative to microarray methods, with a significant upside. “Other ultrahigh-throughput sequencing platforms, such as the one from 454 Life Sciences, could also be used to assay ChIP products, but whatever sequencing platform is used, our results indicate that read number capacity and input ChIP DNA size are key parameters,” Johnson et al. write.
Meanwhile, Gordon Robertson, Steven Jones, and colleagues at the British Columbia Cancer Agency Genome Sciences Centre in Vancouver performed a similar analysis, again using the Illumina Genome Analyzer because of its high throughput. They looked at binding of a transcription factor called STAT1. The Vancouver group generated a total of more than 28 million fragments (in two types of cells), identifying more than 42,000 putative STAT1-binding regions.
The group suggests that ChIPSeq might be an order of magnitude cheaper than microarray alternatives, with the eight flow cell lanes in the Genome Analyzer offering excellent design flexibility. Fewer materials are required, and the method can be applied to any organism – it is not restricted to available gene arrays.
According to Fields, the advantages of ChIPSeq over ChIP-chip include the ability to interrogate the entire genome rather than just the genes represented on a microarray. (For example, Johnson et al. point out that a similar experiment using Affymetrix-style microarrays would require roughly 1 billion features per array.) There is also the benefit of sidestepping known hybridization complications with microarray platforms. “Perhaps most usefully,” writes Fields, “ChIPSeq can immediately be applied to any of those [available] genomes, rather than only those for which microarrays are available.”
Fields anticipates that similar experiments will quickly identify the binding locales of numerous other transcription factors, structural chromatin components, histone proteins, and various enzymes. The addition of ChIPSeq to the next-generation sequencing repertoire, together with the ability to quantify captured gene sequences in a single sample, illustrates the growing breadth of next-generation sequencing applications.
Fields concludes his essay with a provocative thought: “The technology that is most threatened by the widespread adoption of ultrahigh-throughput sequencing? The DNA microarray.”
D. S. Johnson et al. “Genome-wide mapping of in vivo protein-DNA interactions.” Science 316, 1497-1502 (2007).
G. Robertson et al. “Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing.” Nature Methods (published online 071107).