By Aaron Krol
January 30, 2014 | This Monday, New York-based startup GenePeeks announced that it has been awarded a patent for a set of algorithms that use genotypes of two parents to create hypothetical “digital children” following natural patterns of inheritance. Last year, Bio-IT World spoke with the founders of GenePeeks, CEO Anne Morriss and CSO Lee Silver, to learn how the company will use these algorithms to improve risk assessment for prospective mothers who plan to use sperm bank services. By genotyping both the mothers and donors, and combining their information into digital children, the company’s Matchright service will be able to flag risks of inheriting rare genetic disorders, and point mothers toward donors with the lowest risks. This week’s news gave Bio-IT World an opportunity to speak to the GenePeeks founders again, and learn more about the computational process at the heart of this model.
Silver, a professor of molecular biology at Princeton, has been developing the Matchright algorithms since 2008. He designed them to be flexible about the type and quantity of genetic information they can handle, so that as GenePeeks scales up its genetic screens, the software will already be in place. “It is scalable to a whole genome,” says Silver. “So it actually could be anything as simple as SNPs from a SNP Chip.” GenePeeks will launch its service with a panel somewhere between these extremes: a targeted exon sequence of around two million bases across 500 genes, all associated with rare genetic disorders.
Since each parent may carry two different alleles for each of these exons, Silver’s algorithms use a Monte Carlo process – repeated random sampling – to move a randomized assortment of alleles from each parent into each digital child. Only after the children are created is any genotype actually tied to disease risk. In other words, the program is uninterested in carrier status: nothing in either parent’s genome is labeled as conferring a risk of passing on a disorder. Instead, the digital children’s genomes are analyzed to see if a disorder would actually be expressed. “So each genome is discrete,” says Silver, “and even though each genome is hypothetical – it doesn’t actually exist – we treat it as a real, individual genome, and we do a disease risk analysis on that individual genome.”
The information linking genetic variants to disease is stored in an in-house database that draws from public sources like ClinVar, OMIM and dbSNP. After calling the disease status of around 1,000 digital children for each parental pair, the program comes up with an overall risk analysis for that combination of parents.
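The sampling step described above can be illustrated with a minimal Python sketch. This is not GenePeeks' patented method: the gene and allele names, the toy recessive rule in `is_affected`, and the assumption that genes assort independently are all hypothetical simplifications chosen for illustration.

```python
import random

def make_digital_child(mother, father, rng):
    """Draw one allele per gene from each parent, independently for each gene."""
    return {gene: (rng.choice(mother[gene]), rng.choice(father[gene]))
            for gene in mother}

def is_affected(child, deleterious):
    """Toy recessive rule (an assumption): affected if both copies of a gene are deleterious."""
    return any(a in deleterious.get(gene, set()) and b in deleterious.get(gene, set())
               for gene, (a, b) in child.items())

def estimate_risk(mother, father, deleterious, n_children=1000, seed=0):
    """Monte Carlo estimate: fraction of simulated children called affected."""
    rng = random.Random(seed)
    affected = sum(
        is_affected(make_digital_child(mother, father, rng), deleterious)
        for _ in range(n_children)
    )
    return affected / n_children

# Hypothetical genotypes: two alleles per gene for each parent.
mother = {"GENE_A": ("ref", "mutA"), "GENE_B": ("ref", "ref")}
father = {"GENE_A": ("mutA", "ref"), "GENE_B": ("ref", "mutB")}
deleterious = {"GENE_A": {"mutA"}, "GENE_B": {"mutB"}}

risk = estimate_risk(mother, father, deleterious)
print(f"Estimated risk across 1,000 digital children: {risk:.3f}")
```

With both parents carrying one copy of `mutA`, the estimate should land near the textbook one-in-four figure for a recessive condition. Note that the parents are never labeled as carriers; the disease call is made only on each simulated child.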
“Right now,” Silver adds, “the way the industry operates is people get carrier tested, and people are told whether they’re carriers or not – even though that information, if they don’t have a partner, is almost irrelevant.” By ignoring carrier status, the Matchright service is more responsive to real-life risk. It also makes it easier to evaluate mutations that haven’t been clinically validated as causes of disease, but that may still confer risk.
The carrier status model essentially treats disease as an “on-off” state, but GenePeeks instead looks at genetic disorders in terms of how much normal protein is expressed. It can even look at novel mutations that have never been seen before, by estimating how those mutations might affect protein synthesis. “Part of our algorithm is the use of molecular biology… We look at how much functional protein is likely to be made by one copy of the gene in the genome, we look at functional protein by the second gene copy, and we add those together,” says Silver. “That’s giving us our first-level risk.”
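The additive idea Silver describes can be sketched roughly as follows. The activity scores, the default for unlisted alleles, and the `min_total` cutoff are hypothetical values invented for this example; the article does not say how GenePeeks scores individual variants or where it draws the line.

```python
def functional_protein(copies, activity, default=1.0):
    """Sum the estimated functional-protein fraction from each of the two gene copies.

    Each allele scores between 0.0 (no functional protein) and 1.0 (normal).
    Alleles missing from the table fall back to `default` (an assumption).
    """
    return sum(activity.get(allele, default) for allele in copies)

def flag_risk(copies, activity, min_total=0.5):
    """Flag a digital child whose combined output falls below a hypothetical cutoff."""
    return functional_protein(copies, activity) < min_total

# Hypothetical per-allele activity estimates.
activity = {"ref": 1.0, "null": 0.0, "partial": 0.25}

print(functional_protein(("ref", "null"), activity))  # one normal copy, one null copy
print(flag_risk(("null", "partial"), activity))       # two impaired copies
```

Scoring on a continuous scale rather than an on-off carrier label is what lets a model like this treat a novel mutation the same way as a validated one, provided its effect on protein output can be estimated.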
This expansive approach to mutation means that GenePeeks takes into account far more of its digital children’s genomes than standard tests. Morriss uses the example of MCAD deficiency, a condition that is particularly important to her because her son is affected. “A typical carrier test [for MCAD deficiency] will look at two or three clinically validated mutations,” she says. “We look at all twelve clinically validated mutations, we look at another twenty to forty known mutations on that gene, and then we look at another 1800 points on the gene where a novel mutation is likely to occur.” While these broad brush strokes would be a poor approach to diagnosis of real people, who stand the risk of being falsely flagged as carriers, it’s a great method for minimizing disease risk when looking at digital children.
“We are overconservative,” says Silver, “and so we’re going to call risk wherever we see risk … And that means that we’re likely to eliminate 10-15% of the donors from a particular woman’s list.”
“We remove the matches where we see a heightened risk for disease,” adds Morriss, so that what customers actually see is “a personalized risk-screened catalogue of donors that’s unique for every client.” With a donor list likely to number in the hundreds, prospective mothers will still have plenty of options even after this very conservative selection process – and because they don’t see the donors who are eliminated, GenePeeks doesn’t have to worry about reporting back carrier status or which specific disorders appeared in the digital children.
The company doesn’t narrow down donors before the Matchright screening, but runs the mother’s genotype against every donor currently available through their sperm bank partners, a process that Silver says will take only “a few hours.”
“We do give people the ability to flag their favorite donors before that process,” Morriss says, “which is consistent with how people in the industry select donors now. But we really want to encourage people to make their selections after this analysis has been run.”
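The screening step, in which every available donor is scored against the mother and flagged matches are dropped from her list, might look something like the sketch below. The donor IDs, the risk values, and the `max_risk` threshold are made up for illustration; the function assumes a per-donor risk has already been computed by some earlier step.

```python
def screened_catalogue(donor_risks, max_risk=0.0):
    """Return the donor IDs whose estimated risk does not exceed max_risk, sorted by ID.

    Donors above the threshold are silently omitted, so the client never
    sees which donors were removed or why.
    """
    return sorted(donor for donor, risk in donor_risks.items() if risk <= max_risk)

# Hypothetical per-donor risk estimates for one client.
donor_risks = {"D001": 0.0, "D002": 0.24, "D003": 0.0, "D004": 0.0}

print(screened_catalogue(donor_risks))
```

Setting the default threshold to zero reflects the conservative stance described in the article: any donor showing a heightened risk in the digital children is removed from that client's list.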
At the moment, the 500 conditions GenePeeks screens for are all simple Mendelian disorders, with a one-to-one relation between genes and phenotypes. Thanks to Silver’s algorithms for creating digital children, however, the company will also be well positioned to consider polygenic traits like autism and schizophrenia, disorders whose genetic causes are complex and still little understood. “Ultimately, the power of the technology will really get expressed when we get to the challenge of estimating disease risk for complex diseases, where clusters of genes are involved,” says Morriss, although she stresses that “there are very powerful and relevant interaction effects even when you’re talking about single locus diseases.”
GenePeeks expects to launch its service this April, and has already established partnerships with two sperm banks: Manhattan Cryobank in New York and the European Sperm Bank USA in Seattle.