By Kevin Davies
June 17, 2004 | Bioinformatics researchers at the University of California at Santa Cruz have discovered hundreds of sizeable tracts of DNA in the human genome that are 100 percent identical in mice and rats, as if frozen across hundreds of millions of years of evolution.
Gill Bejerano, David Haussler, and colleagues have identified 481 ultra-conserved segments, each longer than 200 bases, which are completely identical in humans, mice, and rats. These “ultras” are also 99 percent conserved in dogs, and 95 percent conserved in chickens. A further 5,000 sequence tracts over 100 bases in length are also perfectly conserved in humans and rodents, as are tens of thousands of shorter tracts.
Bejerano says the discovery, published in Science, came as a complete surprise. It’s been known for a year or two that about 5 percent of the human genome is highly conserved compared to the mouse, even though only 1.5 percent codes for proteins. Bejerano was pursuing this by studying a couple of unusually well-conserved segments in an intron (noncoding region) of a gene. “We asked ourselves, ‘How high can we crank up the conservation restriction and still find blocks of meaningful lengths shared between the three species?’” he says.
As High As It Gets
Using the Blastz program, Bejerano’s team analyzed human, mouse, and the recently finished rat genome sequences taken from the UCSC Golden Path portal, genome.ucsc.edu. “And the answer, presented in this paper is, well, as high as it gets!”
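To make the idea of "cranking up" the conservation threshold concrete, here is a minimal Python sketch of scanning a three-way alignment for perfectly identical blocks. It is an illustration only, not the team's pipeline: the function name identical_runs, the gapped-string input format, and the toy sequences are all assumptions made for this example. Only the 200-base cutoff comes from the article.

```python
from typing import Iterator, Tuple


def identical_runs(human: str, mouse: str, rat: str,
                   min_len: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges of an alignment in which
    all three rows carry the same base, with no gaps, for >= min_len columns.

    Hypothetical helper: the rows are assumed to be pre-aligned strings of
    equal length, with '-' marking a gap.
    """
    if not (len(human) == len(mouse) == len(rat)):
        raise ValueError("aligned rows must have equal length")

    start = None  # start column of the run currently being extended
    rows = zip(human.upper(), mouse.upper(), rat.upper())
    for i, (h, m, r) in enumerate(rows):
        if h == m == r and h != "-":
            if start is None:
                start = i
        else:
            # The run (if any) ended at column i; keep it if long enough.
            if start is not None and i - start >= min_len:
                yield (start, i)
            start = None
    # A run may extend to the very end of the alignment.
    if start is not None and len(human) - start >= min_len:
        yield (start, len(human))


# Toy alignment with a lowered threshold so the result is easy to read.
toy = list(identical_runs("ACGTACGTTTGACCA",
                          "ACGTACGATTGACCA",
                          "ACGTACGTTTGAC-A",
                          min_len=5))
print(toy)  # [(0, 7), (8, 13)]
```

Raising min_len is the programmatic equivalent of the question Bejerano describes: the stricter the requirement, the fewer blocks survive.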
The ultra-conserved stretches of 200 bases or more are scattered across all human chromosomes except 21 and Y. The chance of finding even one such ultra-conserved element is put at 1 in 10,000,000,000,000,000,000,000! All told, the aggregated human segments contain only six single nucleotide polymorphisms (SNPs), whereas 119 would have been expected by chance -- a twenty-fold under-representation.
The function of the ultras remains to be determined, but according to Haussler, they “represent evolutionary innovations that must have happened sometime during vertebrate development.” Of the 481 conserved stretches, 111 overlap the RNA transcripts of known protein-coding genes, and are classified as “exonic.” Most of the corresponding “type I” genes encode proteins involved in RNA binding and regulation of RNA splicing.
Of the ultra elements, 256 are non-exonic -- that is, they do not appear to fall in coding regions of any gene -- although 100 of these lie within introns of genes. These associated “type II” genes are enriched in transcription regulation and DNA-binding proteins. Another 140 of the non-exonic elements are found in gene deserts bereft of coding sequences. The remaining 114 stretches are deemed possibly exonic elements, and require further analysis.
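For readers who want to check the arithmetic, the short Python tally below reproduces the figures quoted in this article. The variable names are made up for illustration, and the numbers are taken from the text above rather than from the underlying data set.

```python
# Figures as quoted in the article (not recomputed from genome data).
TOTAL_ULTRAS = 481
categories = {"exonic": 111, "non_exonic": 256, "possibly_exonic": 114}

# The three categories account for every ultra-conserved element.
category_sum = sum(categories.values())
print(category_sum == TOTAL_ULTRAS)  # True

# Within the non-exonic set, the article places 100 in introns and
# 140 in gene deserts; it does not say where the rest fall.
intronic, gene_desert = 100, 140
unassigned_non_exonic = categories["non_exonic"] - intronic - gene_desert
print(unassigned_non_exonic)  # 16

# SNP depletion: 119 expected by chance versus 6 observed.
snp_fold = 119 / 6
print(round(snp_fold, 1))  # 19.8, i.e. roughly twenty-fold
```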
Bejerano and colleagues speculate that these ultras may have been preserved by virtue of a highly elevated negative selection rate, suppressing the acquisition of mutations. Another intriguing possibility is that these regions exhibit a highly reduced mutation rate, either because mutations occur less frequently or because DNA repair is more efficient.
“The only other sequences we know of that are so well conserved (in parts) are the ribosomal-encoded RNAs,” Bejerano says. “These sequences perform roles that are vital to the existence of a living cell. They give us hope that the [ultra-conserved] elements may lead to exciting new discoveries.”
Adds Haussler: “It's extraordinarily exciting to think that there are these ultra-conserved elements, so many of which are near well-studied genes, that weren't noticed by the scientific community before because we didn't have the comparative data that highlighted these regions. The real credit goes to the prodigious efforts in sequencing these multiple genomes, which have given us this tremendous opportunity, opening our eyes to these very unusual genomic elements.”