New Computational Tool from St. Jude Tackles ‘The Other 98%’ of Human Genome

August 13, 2020

By Deborah Borfitz

St. Jude Children's Research Hospital scientists have developed a computational tool to identify cancer-causing mutations lurking in the vast and largely unexplored regions of the human genome. The cis-expression (cis-X) method is a significant departure from existing approaches that require thousands of tumor samples and only identify noncoding variants that appear frequently, which is rarely the case, according to Jinghui Zhang, Ph.D., chair of the St. Jude department of computational biology. And the algorithm can effectively search not just the simple genomes of leukemia but also the relatively complex ones of solid tumors.

Gene-coding regions constitute a small fraction of the human genome, notes Zhang. But novel pathogenic variants in the regulatory noncoding DNA of patient tumors activate oncogenes. The new cis-X analytic method zeroes in on these mutations by identifying abnormal allele-specific expression of tumor RNA.

A growing body of evidence suggests that the noncoding regions, which constitute 98% of the human genome, may regulate gene expression, says Zhang. These regions can include variants that affect the expression of nearby genes and thereby elevate cancer risk. However, because the discovery of these noncoding variants requires large numbers of tumor samples combined with genome-wide discovery methods, few are known.

In Zhang’s laboratory, genome sequencing data and RNA expression data are linked together in a two-part search. RNA sequencing is performed to identify genes that are expressed at aberrantly high levels on just one allele. The cis-X software then searches for the cause (e.g., alterations such as chromosomal rearrangements and point mutations) in nearby regulatory regions of noncoding DNA.
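The second step of that search can be pictured with a short Python sketch. It is only an illustration and not the cis-X implementation: the `Gene` and `Variant` records, the `nearby_variants` helper, and the one-megabase default window are all assumptions introduced here, and the article does not say how cis-X defines "nearby."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A candidate gene flagged by the RNA step (hypothetical record)."""
    name: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Variant:
    """A somatic alteration found by genome sequencing (hypothetical record)."""
    chrom: str
    pos: int
    kind: str  # e.g. "point_mutation" or "rearrangement"


def nearby_variants(gene, variants, window=1_000_000):
    """Return variants on the gene's chromosome within `window` bases of it.

    The window size is an assumed placeholder, not a cis-X parameter.
    """
    lo = gene.start - window
    hi = gene.end + window
    return [v for v in variants if v.chrom == gene.chrom and lo <= v.pos <= hi]
```

Given a gene flagged by the RNA step, `nearby_variants(gene, variants)` would return the shortlist of alterations worth inspecting as possible causes.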

Gene over-expression has two possible and entirely different causes, explains Zhang. Cis-acting factors are mechanisms that affect gene expression only on the same chromosomal allele, while trans-factors act equally on both alleles.

With “cis” regulation, the allele is identifiable because the gene activation is caused by abnormality in one of the two copies in the tumor genome (assuming diploidy), she says. Only the chromosome with the abnormality will have the signature allowing heightened genetic expression, which is why it makes sense to use cis-X to identify regulatory variants controlling expression of an oncogene. “This way, we bypass the required elements of profiling the DNA [looking for epigenetic changes] … we can directly utilize patient samples to perform this analysis.”
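One common way to check for that one-allele signature, sketched here in Python, is to count RNA reads carrying each allele at a heterozygous site and ask whether the split departs from the 50/50 expected when both copies are equally active. The exact binomial test, the function names, and the 0.01 cutoff are assumptions chosen for illustration; the article does not describe the statistics cis-X actually uses.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so floating-point ties are counted consistently.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def is_allele_specific(ref_reads, alt_reads, alpha=0.01):
    """Flag a heterozygous site whose RNA reads are skewed toward one allele.

    Assumes a diploid locus, so balanced expression means p = 0.5.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return False  # no coverage, no evidence either way
    return binomial_two_sided_p(alt_reads, n) < alpha
```

Under these assumptions a site with 48 reference and 52 alternate reads would not be flagged, while one with 5 reference and 95 alternate reads would.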

Zhang says that researchers in the genomics field have focused on DNA mutations that can cause protein changes and then looked for chemical compounds that would target the protein-coding variants. “We know 98% of the genome does not code for proteins, but that does not mean this is junk DNA.”

Looking for the function of regulatory DNA is difficult because scientists don’t have a “readout” on what may be leading to a disruption of gene regulation in noncoding regions, Zhang continues. The only way to discover cancer-causing mutations in this case, at least before the alternative cis-X approach came along, was to conduct time-consuming experiments, such as those using cell line or patient-derived xenograft models requiring vast quantities of DNA.

In a recently published article in Nature Genetics (DOI: 10.1038/s41588-020-0659-5), Zhang and her colleagues validated the cis-X approach in an analysis of the cancer genomes of 13 T-cell acute lymphoblastic leukemia (T-ALL) patients. The algorithm identified known and novel oncogene-activating noncoding variants as well as a possible new T-ALL oncogene, PRLR.

Importantly, researchers also showed the tool could ascertain cis activation in adult and pediatric solid tumors, including the childhood cancer neuroblastoma. The vast majority of solid tumors include cells with an abnormally high number of unevenly distributed (not necessarily paired) chromosomes, so picking out the allele that has been activated is more mathematically complicated.
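A small Python sketch shows why uneven chromosome counts complicate the arithmetic. If a tumor carries two DNA copies of one allele and one of the other, a 2:1 split in RNA reads is expected even without any cis activation, so the baseline is no longer 50/50. The functions below are hypothetical and assume expression scales with DNA copy number; they are not the model used by cis-X.

```python
def expected_alt_fraction(alt_copies, ref_copies):
    """Expected share of RNA reads from the alternate allele,
    assuming expression simply scales with DNA copy number."""
    total = alt_copies + ref_copies
    if total <= 0:
        raise ValueError("at least one DNA copy is required")
    return alt_copies / total


def excess_over_expected(alt_reads, ref_reads, alt_copies, ref_copies):
    """Observed alternate-allele read fraction minus the copy-number baseline.

    A value near zero means copy number alone explains the imbalance.
    """
    observed = alt_reads / (alt_reads + ref_reads)
    return observed - expected_alt_fraction(alt_copies, ref_copies)
```

With two alternate copies and one reference copy, 90 alternate reads out of 100 still exceed the two-thirds baseline, whereas 67 out of 100 would be explained by copy number alone.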

Now that Zhang’s team has showcased the value of the tool, it is up to other scientists to use it on their datasets to expand the search. Three versions of the cis-X software are publicly available at no cost to researchers through a GitHub repository (code only), St. Jude Cloud (for immediate use after data upload), and Zhang's laboratory page (software download). Zhang is encouraging people to use the St. Jude Cloud, where cis-X has been optimized for users with no formal training in computer science.

Meanwhile, she says, her group is collaborating on a project with other researchers scouting for therapeutic options to address activation of the newly discovered PRLR oncogene.

Cis-X was inspired by previous work by Thomas Look, M.D., of the Dana-Farber Cancer Institute, who is a co-author on the Nature Genetics paper, says Zhang. Working in cell lines, his team identified noncoding DNA variants responsible for abnormal activation of an oncogene (TAL1) that led to T-ALL.

Suppressing activation of the transcription machinery in the noncoding region of the genome is but one research avenue that has opened, says Zhang. Another possibility is to try correcting the noncoding DNA as a strategy for gene therapy in treating cancer. While harder, this therapeutic tactic is already being explored for patients suffering from germline hereditary diseases, including Mendelian disorders caused by a single gene.

So, how soon might all this impact the standard of care at St. Jude? Sooner than you might think, says Zhang, especially if the target is a well-known oncogene that can be suppressed with existing therapies.