BGI Researchers Publish New Structural Variation Pipeline, Push for De Novo Assembly

July 26, 2011

By Allison Proffitt 

July 26, 2011 | BGI researchers have just released the single-nucleotide resolution structural variations (SVs) of an Asian and an African genome, discovered through de novo genome assembly. The research was published online on Tuesday in Nature Biotechnology.

Speaking about the research earlier this month at BGI’s BioIT APAC conference*, Yingrui Li, director of science and technology at BGI and the co-lead author of the study, explained that identifying structural variations is difficult, and that the problems generally stem from mapping reads to the reference genome.

“It’s very hard to align reads across a structural variation, because structural variation is not substitution of a sequence, it’s a rearrangement of a sequence,… so you have to solve the rearrangement of the structure,” Li said. “This problem can only be solved by assembly, because assembly doesn’t care about the reference genome.”  

The BGI team did just that. Using second-generation sequencing data for the YH (Asian) and NA18507 (African) genomes, the team created a novel pipeline for whole-genome assembly to identify small and intermediate-size homozygous SVs (1-50 kbp), including insertions, deletions, inversions, and complex rearrangements, with precise breakpoints and genotypes that were previously difficult to define by other approaches.

Through this new method, researchers identified and validated 277,243 SVs, ranging from 1 bp to 23 kbp, in assembled regions of both genomes.

“We provide a new method, at a relatively low cost and high speed, to establish in greater detail the presence and patterns of SVs in different genomes, and the results have a high accuracy and a wider range of length spectrum coverage in comparison with previous methods,” said Honglong Wu, a bioinformatician at BGI and a senior author of the study, in a press release.

“Now we have more than one genome assembly and we can have a little bit of insight into the biology of structural variations,” Li said. “Structural variation events are more individual-specific, that means that the structural variations in your body are more likely to be unique to yourself than a SNP would be. SNPs are typically shared among the human population. Structural variations are not. They have a higher impact on the genome,” Li contends. “One structural variation that happens in the coding sequence could be deleterious or much more negative than a typical SNP.”  

The researchers believe the study proves that de novo assembly is crucial for developing more complete personal genomes than resequencing-based mapping can provide, and they hope the pipeline offers a new way to build a more comprehensive SV map of individuals.

“When [the industry in general talks] about the personal genome, they’re talking about resequencing.  They’re talking about SNPs with a little bit of indels and structural variations. They’re not talking about the whole, the real human genome,” Li said at the conference. “Real human genomes are much different from each other. If we really want to have the concept of the personal genome, we have to ask what the personal genome is and what’s the difference between the genomes. We have to ignore the achievements we have made on SNPs; we have to look into the difficult ones: structural variation.”  

*BioIT APAC, Shenzhen, China, July 5-7, 2011