Machine Learning Predicts How DNA Breaks Under CRISPR

November 8, 2018

By Allison Proffitt

In a study published yesterday in Nature, researchers created a machine-learning model, inDelphi, that predicts how human and mouse cells will respond to CRISPR-induced breaks in DNA. The team discovered that cells often repair broken genes in ways that are precise and predictable, sometimes even restoring mutated genes to their healthy version.

The work was led by David Liu of the Broad Institute, David Gifford of MIT, and Richard Sherwood of Brigham and Women’s Hospital. It suggests that the cell’s own repair mechanisms could one day be combined with CRISPR-based therapies that correct gene mutations by simply cutting DNA precisely and allowing the cell to naturally heal the damage.

When CRISPR makes a targeted break in DNA, the cell can repair it in two ways: with a DNA template, which can be very precise but is not very efficient, or by end-joining, which is efficient but not very controllable.

“End joining can be thought of as a ‘willy-nilly’, no-holds-barred effort to repair the cut into a form that CRISPR can no longer recut,” write the authors on the inDelphi public website. “The results are ‘ugly’ insertions and deletions which are hugely diverse and heterogeneous in different cells.”

And yet templated repair is a “minority product,” explains Rich Sherwood. “Even in the best of cases, it’s happening at maybe 20-30% efficiency. The rest of the cells are doing end-joining repair.”

So how do all the rest of those repairs work? To make sense of it, Sherwood and his colleagues constructed a library of 2,000 Cas9 guide RNAs (gRNAs) paired with DNA target sites, and they observed how cells repaired the breaks at each of the 2,000 sites.

“We find that the average [break] site has well over 200 different repair genotypes, and they’re distributed non-uniformly,” Sherwood said. For instance, a particular break site might be repaired with a 1-bp insertion 12% of the time but with a 1-bp deletion only 3.5% of the time. “We tended to break each site at least 1,000 times to get to that level of confidence.”

The researchers put all of the resulting data into inDelphi, training the algorithm to predict how the cell would respond to cuts at each site. They found that inDelphi could discern patterns at cut sites that predicted what insertions and deletions would be made in the corrected gene. The algorithm was able to predict the heterogeneous (100+ unique) mixture of indels resulting from microhomology-mediated end-joining (MMEJ) and non-homologous end-joining (NHEJ) at a CRISPR-induced cut. Across cell types, they found that three repair classes constitute 80-95% of repair outcomes: 1-bp insertions, microhomology deletions, and microhomology-less deletions. For each cut site, inDelphi assigns a precision score, a microhomology strength score, and a frameshift frequency.
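To make the idea of a frameshift frequency concrete, here is a minimal Python sketch. It is not the inDelphi code or its API. It assumes only the standard definition that an indel shifts the reading frame when its length is not a multiple of three. The function name and the example distribution are hypothetical; the 12% and 3.5% values echo Sherwood's illustration above, and the rest are made up for the example.

```python
def frameshift_frequency(outcomes):
    """Return the share of repair outcomes that shift the reading frame.

    `outcomes` maps a signed indel length in base pairs (positive for
    insertions, negative for deletions) to its observed frequency.
    An indel is counted as a frameshift when its length is not a
    multiple of 3. This is an illustrative helper, not inDelphi's API.
    """
    total = sum(outcomes.values())
    if total <= 0:
        raise ValueError("outcome frequencies must sum to a positive value")
    shifted = sum(freq for length, freq in outcomes.items() if length % 3 != 0)
    return shifted / total


# Hypothetical repair distribution at one cut site (frequencies sum to 1).
example_outcomes = {
    +1: 0.12,    # 1-bp insertion
    -1: 0.035,   # 1-bp deletion
    -3: 0.20,    # in-frame deletion
    -7: 0.30,    # frameshifting deletion
    -12: 0.345,  # in-frame deletion
}

print(f"Frameshift frequency: {frameshift_frequency(example_outcomes):.3f}")
```

In this made-up distribution, the 1-bp insertion, the 1-bp deletion, and the 7-bp deletion shift the frame, while the 3-bp and 12-bp deletions do not.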

To confirm these findings, the team used select gRNAs to correct mutations in cells collected from patients with genetic diseases that result from microduplications: Hermansky-Pudlak syndrome, especially common in Puerto Ricans, which causes blood clotting deficiency and albinism; and Menkes disease, which results in copper deficiency. The team also generated cells with microduplications found in patients with familial hypercholesterolemia, a disease in which LDL cholesterol levels are abnormally high. For all three diseases, delivering the appropriate Cas9 and guide RNA corrected the mutation with high efficiency.

InDelphi is available through a web portal, allowing researchers around the globe to design guide RNAs for making precise edits. Scientists interested in repairing pathogenic mutations can query the site to see where they might be able to cut DNA and get their desired outcomes. Scientists may also use the site to confirm the efficiency of DNA cuts intended to turn genes off, or to determine the end-joining byproducts of a template-driven repair. A Python implementation of the inDelphi model is available for users interested in running inDelphi at a larger scope than supported by the online implementation.

Controlled Breaks

InDelphi has predictive limits. The algorithm was trained on five cell types: mESCs, U2OS, HEK293, HCT116, and K562. “Though we observe moderate to large cell-type variability in our data, we do expect inDelphi to be relevant beyond these five cell types to other mammalian cell types. Human embryonic stem cells, for instance, are likely to have similar repair outcomes as mESCs,” the research team writes. They do not expect inDelphi to generalize well to bacteria, plants, and non-mammalian eukaryotes such as yeast. InDelphi was also trained on Illumina sequencing data, which (at least before last week) could not detect long deletions.

But Sherwood believes even the proof-of-concept will and should cause some paradigm shifts in the way researchers think about CRISPR-Cas9 editing. End-joining was believed to be “causing random changes, causing random mutations,” he said. “When we work with template-free genome editing, people have always thought of that as less predictable and only useful for breaking things. And now we’re showing that it could be useful for gain-of-function or for disease repair.”

InDelphi helps predict which break sites are the best to target regardless of the repair mechanism you are hoping for. At certain genomic sites, one particular mutation dominates; the team used the term "precise-50" to indicate when a single such mutation comprised more than 50% of all major editing products. But only between 5% and 11% of Cas9 guide RNAs met the "precise-50" standard.
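The "precise-50" criterion can be sketched in a few lines of Python. This is only an illustration of the definition as described here (a single outcome making up more than 50% of editing products), not the team's implementation. The function names, guide names, and read counts are all hypothetical.

```python
def top_outcome_share(genotype_counts):
    """Return the fraction of editing products taken by the most common outcome."""
    total = sum(genotype_counts.values())
    if total <= 0:
        raise ValueError("need at least one observed editing product")
    return max(genotype_counts.values()) / total


def is_precise(genotype_counts, threshold=0.5):
    """True when one outcome makes up more than `threshold` of all products.

    threshold=0.5 mirrors the article's "precise-50" description; the
    function itself is a hypothetical helper, not part of inDelphi.
    """
    return top_outcome_share(genotype_counts) > threshold


# Hypothetical read counts per repair genotype for two guide RNAs.
guides = {
    "guide_A": {"1-bp insertion": 62, "2-bp deletion": 20, "9-bp deletion": 18},
    "guide_B": {"1-bp insertion": 30, "4-bp deletion": 35, "7-bp deletion": 35},
}

precise_guides = [name for name, counts in guides.items() if is_precise(counts)]
print("Guides meeting precise-50:", precise_guides)
```

Passing a different threshold, for example `is_precise(counts, threshold=0.9)`, applies a stricter standard under the same assumptions.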

“What we found was that there are different guide RNAs you can use. Some of them give precise outcomes; some of them don’t. Given that information, even if your goal is just to break a gene using CRISPR, if you want to use that therapeutically, controlling the way that gene is broken adds a lot of value.”

The team will now work toward understanding why certain insertions or deletions are so much more common than others, and what impacts precision. Could we raise the bar to a “precise-90” standard? Sherwood also sees value in looking at cell types where “template-directed repair isn’t really applicable, like neurons or muscle cells or liver cells,” he said. He’s eager to move into animal tissue models of disease.

But he emphasized that this type of CRISPR-Cas9 editing is already happening; in fact, it’s in human clinical trials.

“In terms of the disease-repair mechanisms that we’re proposing, we’re far from wanting to test those clinically. At the same time, there are clinical trials already starting based on the exact procedure that we’ve now shown has a dimension people were ignoring,” Sherwood said. “It will be interesting to see how the regulatory agencies deal with this new information.”