New Type of CRISPR Screen Probes the Regulatory Genome

February 8, 2016

By Aaron Krol

When a geneticist stares down the 3 billion DNA base pairs of the human genome, searching for a clue to what’s gone awry in a single patient, it helps to narrow the field. One of the most popular places to look is the exome, the tiny fraction of our DNA (less than 2%) that actually codes for proteins. For patients with rare genetic diseases, which might be fully explained by one key mutation, many studies sequence the whole exome and leave all the noncoding DNA out. Similarly, personalized cancer tests, which can help bring to light unexpected treatment options, often sequence the tumor exome, or a smaller panel of protein-coding genes.

Unfortunately, we know that’s not the whole picture. “There are a substantial number of noncoding regions that are just as effective at turning off a gene as a mutation in the gene itself,” says Richard Sherwood, a geneticist at Brigham and Women’s Hospital in Boston. “Exome sequencing is not going to be a good proxy for what genes are working.”

Sherwood studies regulatory DNA, the vast segment of the genome that governs which genes are turned on or off in any cell at a given time. It’s a confounding area of genetics; we don’t even know how much of the genome is made up of these regulatory elements. While protein-coding genes can be recognized by the presence of “start” and “stop” codons (sequences of three DNA letters that mark where the cell’s molecular machinery should begin and end the protein encoded by a gene), there are no definite signs like this for regulatory DNA.

Instead, studies to discover new regulatory elements have been somewhat trial-and-error. If you suspect a gene’s activity might be regulated by a nearby DNA element, you can inhibit that element in a living cell, and see if your gene shuts down with it.

With these painstaking experiments, scientists can slowly work their way through potential regulatory regions, but they can’t sweep across the genome with the kind of high-throughput testing that other areas of genetics thrive on. “Previously, you couldn’t do these sorts of tests in a large form, like 4,000 of them at once,” says David Gifford, a computational biologist at MIT. “You would really need to have a more hypothesis-directed methodology.”

Recently, Gifford and Sherwood collaborated on a paper, published in Nature Biotechnology, which presents a new method for testing thousands of DNA loci for regulatory activity at once. Their assay, called MERA (multiplexed editing regulatory assay), is built on the recent technology boom in CRISPR-Cas9 gene editing, which lets scientists quickly and easily cut specific sequences of DNA out of the genome.

So far, their team, including lead author Nisha Rajagopal from Gifford’s lab, has used MERA to study the regulation of four genes involved in the development of embryonic stem cells. Already, the results have defied the accepted wisdom about regulatory DNA. Many areas of the genome flagged by MERA as important factors in gene expression do not fall into any known categories of regulatory elements, and would likely never have been tested with previous-generation methods.

“Our approach allows you to look away from the lampposts,” says Sherwood. “The more unbiased you can be, the more we’ll actually know.”

A New Kind of CRISPR Screen

In the past three years, CRISPR-Cas9 experiments have taken all areas of molecular biology by storm, and Sherwood and Gifford are far from the first to use the technology to run large numbers of tests in parallel. CRISPR screens are an excellent way to learn which genes are involved in a cellular process, like tumor growth or drug resistance. In these assays, scientists knock out entire genes, one by one, and see what happens to cells without them.

This kind of CRISPR screen, however, operates on too small a scale to study the regulatory genome. For each gene knocked out in a CRISPR screen, you have to engineer a strain of virus to deliver a “guide RNA” into the cellular genome, showing the vicelike Cas9 molecule which DNA region to cut. That works well if you know exactly where a gene lies and only need to cut it once—but in a high-throughput regulatory test, you would want to blanket vast stretches of DNA with cuts, not knowing which areas will turn out to contain regulatory elements. Creating a new virus for each of these cuts is hugely impractical.

The insight behind MERA is that, with the right preparation, most of the genetic engineering can be done in advance. Gifford and Sherwood’s team used a standard viral vector to put a “dummy” guide RNA sequence, one that wouldn’t tell Cas9 to cut anything, into an embryonic stem cell’s genome. Then they grew plenty of cells with this prebuilt CRISPR system inside, and attacked each one with a Cas9 molecule targeted to the dummy sequence, chopping out the fake guide.

Normally, the result would just be a gap in the CRISPR system where the guide once was. But along with Cas9, the researchers also exposed the cells to new, “real” guide RNA sequences. Through a DNA repair mechanism called homologous recombination, the cells dutifully patched over the gaps with new guides, whose sequences were very similar to the missing dummy code. At the end of the process, each cell had a unique guide sequence ready to make cuts at a specific DNA locus—just like in a standard CRISPR screen, but with much less hands-on engineering.

By using a large enough library of guide RNA molecules, a MERA screen can include thousands of cuts that completely tile a broad region of the genome, providing an agnostic look at anywhere regulatory elements might be hiding. “It’s a lot easier [than a typical CRISPR screen],” says Sherwood. “The day the library comes in, you just perform one PCR reaction, and the cells do the rest of the work.”

In the team’s first batch of MERA screens, they created almost 4,000 guide RNAs for each gene they studied, covering roughly 40,000 DNA bases of the “cis-regulatory region,” or the area surrounding the gene where most regulatory elements are thought to lie. It’s unclear just how large any gene’s cis-regulatory region is, but 40,000 bases is a big leap from the highly targeted assays that have come before.
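To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the inputs are the article's rounded numbers, and it assumes guides are spread evenly across the region, which real guide libraries are not. The deletion sizes of 10 to 100 bases are the typical range reported for these screens.

```python
def tiling_summary(region_bases, n_guides, min_deletion, max_deletion):
    """Rough tiling density, ASSUMING evenly spaced guides (a simplification)."""
    mean_spacing = region_bases / n_guides  # average bases between neighboring cut sites
    return {
        "mean_spacing_bases": mean_spacing,
        # Would the smallest typical deletion reach the next cut site?
        "min_deletion_covers_spacing": min_deletion >= mean_spacing,
        # Would the largest typical deletion reach the next cut site?
        "max_deletion_covers_spacing": max_deletion >= mean_spacing,
    }


# Rounded figures from the article: ~4,000 guides over ~40,000 bases,
# with typical deletions of 10 to 100 bases.
summary = tiling_summary(40_000, 4_000, 10, 100)
print(summary)
```

Under these assumptions the guides fall about 10 bases apart on average, so even the smaller deletions would leave few stretches of the region untouched.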

“We’re now starting to do follow-up studies where we increase the number of guide RNAs,” Sherwood adds. “Eventually, what you’d like is to be able to tile an entire chromosome.”

Far From the Lampposts

Sherwood and Gifford tried to focus their assays on regions that would be rich in regulatory elements. To that end, they made sure their guide RNAs covered parts of the genome with well-known signs of regulatory activity, like histone markers and transcription factor binding sites. For many of these areas, Cas9 cuts did, in fact, shut down gene expression in the MERA screens.

But the study also targeted regions around each gene that were empty of any known regulatory features. “We tiled some other regions that we thought might serve as negative controls,” explains Gifford. “But they turned out not to be negative at all.”

The study’s most surprising finding was that several cuts to seemingly random areas of the genome shut down gene expression. The authors named these DNA regions “unmarked regulatory elements,” or UREs. They were especially prevalent around the genes Tdgf1 and Zfp42, and in many cases, seemed to be every bit as necessary to gene activity as more predictable hits on the MERA screen.

These results caught the researchers so off guard that it was natural to wonder if MERA screens are prone to false positives. Yet follow-up experiments strongly supported the existence of UREs. Swapping the guide RNAs between a Tdgf1 MERA screen and a Zfp42 screen, for example, produced almost no positive results, indicating that the UREs’ regulatory effects were indeed specific to the genes near them.

In a more specific test, the researchers chose a particular URE connected to Tdgf1, and cut it out of a brand new population of cells for a closer look. “We showed that, if we deleted that region from the genome, the cells lost expression of the gene,” says Sherwood. “And then when we put it back in, the gene became expressed again. Which was good proof to us that the URE itself was responsible.”

From these results, it seems likely that follow-up MERA screens will find even more unknown stretches of regulatory DNA. Gifford and Sherwood’s experiments didn’t try to cover as much ground around their target genes as they might have, because the researchers assumed that MERA would mostly confirm what was already known. At best, they hoped MERA would rule out some suspected regulatory regions, and help show which regulatory elements have the biggest effect on gene expression.

“We tended to prioritize regions that had been known before,” Sherwood says. “Unfortunately, in the end, our datasets weren’t ideally suited to discovering these UREs.”

Getting to Basic Principles

MERA could open up huge swaths of the regulatory genome to investigation. Compared to an ordinary CRISPR screen, says Sherwood, “there’s only upside,” as MERA is cheaper, easier, and faster to run.

Still, interpreting the results is not trivial. Like other CRISPR screens, MERA makes cuts at precise points in the genome, but does not tell cells to repair those cuts in any particular way. As a result, a population of cells all carrying the same guide RNA can have a huge variety of different gaps and scars in their genomes, typically deletions in the range of 10 to 100 bases long. Gifford and Sherwood created up to 100 cells for each of their guides, and sometimes found that gene expression was affected in some but not all of them; only sequencing the genomes of their mutated cells could reveal exactly what changes had been made.

By repeating these experiments many times, and learning which mutations affect gene expression, it will eventually be possible to pin down the exact DNA bases that make up each regulatory element. Future studies might even be able to distinguish between regulatory elements with small and large effects on gene expression. In Gifford and Sherwood’s MERA screens, the target genes were altered to produce a green fluorescent protein, so the results were read in terms of whether cells gave off fluorescent light. But a more precise, though more expensive, approach would be to perform RNA sequencing, to learn which cuts reduced the cell’s ability to transcribe a gene into RNA, and by how much.
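Because cells carrying the same guide can end up with different mutations, a fluorescence readout is naturally summarized per guide rather than per cell. The Python sketch below shows one simple way that tally might look. It is not the authors' analysis pipeline: the function name, the guide labels, and the toy data are all invented for illustration.

```python
from collections import defaultdict


def loss_fraction_by_guide(cells):
    """Return, for each guide, the fraction of its cells that lost fluorescence.

    `cells` is an iterable of (guide_id, gfp_positive) pairs, one per cell.
    This input format is a hypothetical one chosen for the example.
    """
    totals = defaultdict(int)
    losses = defaultdict(int)
    for guide_id, gfp_positive in cells:
        totals[guide_id] += 1
        if not gfp_positive:
            losses[guide_id] += 1
    return {guide_id: losses[guide_id] / totals[guide_id] for guide_id in totals}


# Toy data, NOT results from the study: guide_A disrupts expression in most
# of its cells, guide_B in none.
example_cells = [
    ("guide_A", False), ("guide_A", False), ("guide_A", True), ("guide_A", False),
    ("guide_B", True), ("guide_B", True),
]
fractions = loss_fraction_by_guide(example_cells)
print(fractions)
```

A guide whose cells only sometimes go dark, like guide_A here, is exactly the case where sequencing the individual mutations would be needed to see which changes mattered.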

A MERA screen offers a rich volume of data on the behavior of the regulatory genome. Yet, as with so much else in genetics, there are few robust principles to let scientists know where they should be focusing their efforts. Histone markers provide only a very rough sketch of regulatory elements, often proving to be red herrings on closer examination. And the existence of UREs, if confirmed by future experiments, shows that we don’t yet even know which areas of the genome to rule out in the hunt for regulatory regions.

“Every dataset we get comes closer and closer to computational principles that let us predict these regions,” says Sherwood. As more studies are conducted, patterns may emerge in the DNA sequences of regulatory elements that link UREs together, or reveal which histone markers truly point toward regulatory effects. There might also be functional clues hidden in these sequences, hinting at what is happening on a molecular level as regulatory elements turn genes on and off in the course of a cell’s development.

For now, however, the data is still rough and disorganized. For better and for worse, high-throughput tools like MERA are becoming the foundation for most discoveries in genetics—and that means there is a lot more work to do before the regulatory genome begins to come into focus.

CORRECTED 2/9/16: Originally, this story incorrectly stated that only certain cell types could be assayed with MERA for reasons related to homologous recombination. In fact, the authors see no reason MERA could not be applied to any in vitro cell line, and hope to perform screens in a wide range of cell types. The text has been edited to correct the error.