By Malorye A. Branca
September 15, 2003 | With public celebrations in 2000, 2001, and earlier this year, one could be forgiven for believing that the human genome has been sequenced in its entirety. But much uncharted territory remains in the genome. Although the International Human Genome Sequencing Consortium sequenced 99 percent of the gene-containing regions, fully 400 significant gaps remain in the sequence. Now, a group of Australian mathematicians is proposing a novel solution to that problem, and start-up Combinomics has been formed to plug the gaps.
DNA sequence can prove unreadable by conventional methods for numerous reasons. “The middle of every chromosome is tough, and the ends,” says Ellen Beasley, director of bioinformatics at Celera Genomics. “Then you have patches of tough-to-sequence bits in between.” Typically, these are repetitive stretches of sequence. A high percentage of one or the other set of base pairs -- ATs or CGs -- can also gum up the chemistry of DNA sequencing.
The chromosome ends, or telomeres, of the human genome are slowly being conquered, according to Elaine Mardis, co-director of the Genome Sequencing Center at Washington University Medical School in St. Louis. “But the centromeres [the middle] are still a particular problem.”
Whether any of this intractable sequence is worth anything “is an almost religious question,” Beasley says.
Many people, including Celera’s management, don’t think so; hence, they are concentrating on validating and exploring sequence they already have. But many scientists suspect there might even be genes buried in some of these regions. At the recent International Congress of Genetics in Melbourne, NHGRI director Francis Collins implored scientists to help fill in these gaps.
“In some cases, [intractable sequence] doesn’t matter; in others, it matters a lot,” Mardis says. The Y chromosome, for example, hosts many difficult-to-sequence regions that were critical for understanding its unique structure. Nor is the problem limited to human DNA. Many other organisms’ genomes also contain hard-to-crack sequence.
Error on Base
Combinomics’ approach to this problem arose through “reading a bit about DNA and thinking a bit about mathematics,” according to mathematician Peter Adams, a lecturer at the University of Queensland and head of software development at Combinomics. Adams and colleagues decided to turn the sequencing paradigm on its head. “One of the major goals in sequencing is to eliminate errors,” he says. “Our goal is to introduce changes, so as to make difficult targets easy to clone and sequence.” They collaborated with scientists at the Australian Genome Research Facility to make the technique practical in vitro.
Using their Novoseq platform, the scientists introduce random mutations before sequencing multiple copies of the problem region. The native sequence is then reconstructed using proprietary software.
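The general idea can be illustrated with a toy simulation. The sketch below is not Combinomics' software, which is proprietary and not described here. It assumes substitutions only, copies that are already aligned and of equal length, and a simple per-position majority vote as the reconstruction step. The function names and parameters are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Return a copy of seq in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with one of the three other bases, chosen at random.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def consensus(copies):
    """Majority vote at each position (assumption: copies are aligned, equal length)."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )

def demo(length=200, n_copies=25, rate=0.10, seed=0):
    """Mutate a repetitive tract several times, then rebuild it by majority vote."""
    rng = random.Random(seed)
    original = "AT" * (length // 2)  # stand-in for a hard, repetitive region
    copies = [mutate(original, rate, rng) for _ in range(n_copies)]
    return original, copies, consensus(copies)

if __name__ == "__main__":
    original, copies, rebuilt = demo()
    print("copies differing from original:", sum(c != original for c in copies))
    print("reconstruction matches original:", rebuilt == original)
```

Each mutated copy differs from the original at different positions, so the copies are no longer identical repeats, yet most copies still carry the native base at any given position. A real reconstruction would also have to cope with sequencing errors and alignment, which this sketch ignores.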
To test the approach, Adams and colleagues ran computer simulations in which they resequenced, in silico, challenging regions that others had previously sequenced. The simulations were run on a cluster of 200 Sun and Linux computers. All that computing power is needed only for the test simulations, Adams says; the actual reconstruction step in the technique can be done on a PC.
Having started with small regions, Combinomics is now tackling larger segments. Surprisingly, Adams says, “the coverage we need is very similar to what is required for standard shotgun sequencing.” (Some critics had worried they would need to sequence the regions up to 1,000 times over.) The company says that, with a substitution rate of just 10 percent, Novoseq can reconstruct original sequence with an error rate of less than one per 10,000 bases.
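A back-of-the-envelope calculation shows why modest coverage could plausibly suffice. The sketch below is not the company's analysis. It assumes each read independently carries a substitution at a given position with probability equal to the substitution rate, and that reconstruction is a plain majority vote. Under those assumptions, the chance that at least half of the reads are mutated at a position is an upper bound on the per-base error rate. The function name is hypothetical.

```python
from math import comb

def majority_failure_bound(n_reads, sub_rate):
    """Upper bound on the per-base error of a majority vote.

    The vote can only fail if the original base appears in at most half
    of the reads, i.e. at least ceil(n_reads / 2) reads are mutated.
    Assumes independent substitutions (a simplifying assumption).
    """
    need = n_reads - n_reads // 2  # ceil(n_reads / 2)
    return sum(
        comb(n_reads, k) * sub_rate**k * (1 - sub_rate) ** (n_reads - k)
        for k in range(need, n_reads + 1)
    )

if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        print(f"coverage {n:2d}x: bound {majority_failure_bound(n, 0.10):.2e}")
```

In this simplified model, a 10 percent substitution rate keeps the bound below one in 10,000 at 15-fold coverage but not at 10-fold. The model ignores ordinary sequencing errors and the difficulty of assembling reads, so it only indicates the order of magnitude.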
Ultimately, Combinomics hopes to sell Novoseq kits to “sequencing centers and laboratories around the world -- basically anyone who does DNA sequencing,” says CEO and managing director Peter Devine.