By Kevin Davies
May 14, 2010
Scientists at the Kurchatov Institute in Moscow, the former Soviet atomic research center, have sequenced the first Russian genome. The results were published late last year in Acta Naturae, a new Russian journal, but news has been slow to emerge.
Egor Prokhortchouk, second author on the paper, spoke to Bio-IT World about the study. He speaks fluent English, having spent five years working on DNA methylation with Adrian Bird at the University of Edinburgh.
The first author is Konstantin Skryabin, who has a long-standing interest in DNA sequencing. In 1977, he co-authored some of the first RNA sequences in Nature using Maxam-Gilbert sequencing while a postdoc with Walter Gilbert at Harvard.
The paper’s senior author is the director of the Kurchatov Institute, Mikhail Kovalchuk. “It was his initiative to bring physicists, biologists, informatics people, and cognition scientists all together under one roof,” says Prokhortchouk. “He was the key person in getting the money, the interface between the government and the institute.”
The Russian paper describes the normal genome assembly of a Russian male who happens to have renal cancer (see below). The published sequence is derived from healthy tissue; the results of sequencing the renal tumor have been submitted for future publication.
The sequencing equipment arrived in December 2008: three Illumina GAII machines (due to be upgraded to IIx shortly) and two SOLiD instruments (since upgraded to 4.0). The sequencing was done in the summer of 2009 using both platforms: 60 billion bases on Illumina (14 runs) and about 40 billion on SOLiD (5 runs, dual flow cells). Data generation took about three months; Prokhortchouk admits his team was new to bioinformatics analysis. BGI (Beijing Genomics Institute) provided help, with a visit from BGI director Yang Huanming and a reciprocal visit to Shenzhen. “We spent more time on bioinformatics than on generation of the data,” says Prokhortchouk.
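As a back-of-the-envelope check, the raw yields quoted above can be converted into average sequencing depth. The sketch below assumes a haploid human genome of about 3.1 billion bases, a figure not given in the article, and it ignores unmapped and duplicate reads, so it is an optimistic estimate of raw depth rather than the coverage the team actually achieved.

```python
# Raw yields quoted in the article, in bases.
ILLUMINA_BASES = 60e9
SOLID_BASES = 40e9

# Assumption: approximate haploid human genome size; not stated in the article.
GENOME_SIZE = 3.1e9


def mean_depth(total_bases, genome_size=GENOME_SIZE):
    """Average number of times each base is read if reads are spread evenly."""
    return total_bases / genome_size


raw_depth = {
    "Illumina": mean_depth(ILLUMINA_BASES),
    "SOLiD": mean_depth(SOLID_BASES),
    "combined": mean_depth(ILLUMINA_BASES + SOLID_BASES),
}

for platform, depth in raw_depth.items():
    print(f"{platform}: about {depth:.0f}x raw depth")
```

Under that assumption the two platforms together come to roughly 32-fold raw depth, with Illumina contributing about 19-fold and SOLiD about 13-fold.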
There were advantages to using two systems, says Prokhortchouk, not least help in obtaining competitive pricing. “On the other hand, we wanted to buy more machines, and we still have this intention, but we had to decide which platform we should move further on. An aim was to compare the productivity of the future of these machines to choose the best performance.”
Using Illumina, the Russian team obtained only 75% coverage of the genome, but Prokhortchouk says it discovered about 1.82 million single nucleotide polymorphisms (SNPs). With SOLiD, “we covered 95% of the genome, but found only 410,000 SNPs (covered at least 15 times),” says Prokhortchouk. “But when we combined the data, we ended up with 2.92 million SNPs, because of the complementarity of the data. About 1.47 million SNPs were novel (compared to dbSNP).”
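The combined total is larger than the two platform totals added together, which is what one would expect if some sites only clear a depth threshold once reads from both platforms are pooled. The article does not describe how the data were merged, so the following is only a toy illustration of that idea: the site names and depths are hypothetical, and applying the 15-read threshold quoted for SOLiD to every data set is an assumption.

```python
# SNP counts quoted in the article, in millions.
ILLUMINA_SNPS = 1.82
SOLID_SNPS = 0.41
COMBINED_SNPS = 2.92

# Lower bound on SNPs that appeared only after combining the data
# (a lower bound because the two per-platform sets may overlap).
extra_from_pooling = COMBINED_SNPS - (ILLUMINA_SNPS + SOLID_SNPS)

# Assumption: one depth threshold for all data sets (quoted only for SOLiD).
MIN_DEPTH = 15

# Hypothetical per-site read depths: (Illumina reads, SOLiD reads).
site_depths = {
    "site_a": (20, 3),
    "site_b": (4, 18),
    "site_c": (9, 8),
    "site_d": (2, 5),
}


def callable_sites(depths, min_depth=MIN_DEPTH):
    """Return the sites passing the threshold per platform and after pooling."""
    illumina = {s for s, (i, _) in depths.items() if i >= min_depth}
    solid = {s for s, (_, d) in depths.items() if d >= min_depth}
    pooled = {s for s, (i, d) in depths.items() if i + d >= min_depth}
    return illumina, solid, pooled


illumina, solid, pooled = callable_sites(site_depths)
print(f"at least {extra_from_pooling:.2f} million SNPs came only from pooling")
print(sorted(illumina), sorted(solid), sorted(pooled))
```

In the toy data, `site_c` is too shallow on either platform alone but passes once the reads are pooled, mirroring the at least 0.69 million SNPs that appear only in the combined set.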
The bioinformatics analysis focused on mapping SNPs and small insertions/deletions (indels). The Russians used vendor software (Eland and CoronaLite), tools from BGI, the open-source aligner BWA, and some homegrown tools. (Software from Malaysia-based Novocraft was used for evaluation, but its results were not included in the final paper.) Reads that were not mapped were used for de novo contig assembly. In short, Prokhortchouk says, “It’s not an ideal genome, but of comparable quality with what I’ve seen in the literature.”
The DNA donor was selected from a large pool of more than 1,300 volunteers who are participating in an unpublished genotyping survey of Russia’s ethnic populations. The goal of that study is eventually to analyze 4,000 people from about 40 nationalities living around Russia and surrounding territories, from Uzbekistan in central Asia to the Arctic Ocean in the north, and from the Pacific to Poland in the west (essentially spanning the former Soviet Union).
“Using Principal Component Analysis (PCA), we could put different individuals into various ethnic groups governing this geographical distribution,” says Prokhortchouk. “You can distinguish Russians as an ethnic group from Tatars or Poles, Siberia, and so on.”
Based on the PCA, Patient N was selected as an archetypal Russian genome. “Mathematically speaking, he’s Russian!” says Prokhortchouk. “I don’t know anything about his parents or what language he speaks or where he lives, but I know that under mathematical rules [PCA], he’s Russian!”
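To make the PCA step concrete, here is a minimal sketch on simulated data. It is not the team's pipeline: the genotypes are random, the two groups and their allele frequencies are hypothetical, only the first principal component is computed (by power iteration), and choosing the individual closest to the group's mean score as the "archetype" is an assumption about how such a selection might be made.

```python
import random


def center_columns(matrix):
    """Subtract each column's mean so every SNP has mean zero."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def first_pc_scores(matrix, n_iter=100):
    """Project individuals onto the first principal component (power iteration)."""
    x = center_columns(matrix)
    n_cols = len(x[0])
    v = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        w = [sum(row[j] * s for row, s in zip(x, scores)) for j in range(n_cols)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def most_central(scores, members):
    """Index of the group member whose score is closest to the group mean."""
    centroid = sum(scores[i] for i in members) / len(members)
    return min(members, key=lambda i: abs(scores[i] - centroid))


def simulate_group(rng, allele_freqs, n_people):
    """Genotypes coded 0/1/2: two random allele draws per SNP per person."""
    return [[sum(rng.random() < f for _ in range(2)) for f in allele_freqs]
            for _ in range(n_people)]


rng = random.Random(0)
N_SNPS, N_PEOPLE = 50, 20
# Hypothetical allele frequencies for two made-up populations.
freqs_a = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
freqs_b = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
genotypes = simulate_group(rng, freqs_a, N_PEOPLE) + simulate_group(rng, freqs_b, N_PEOPLE)

scores = first_pc_scores(genotypes)
group_a = list(range(N_PEOPLE))
group_b = list(range(N_PEOPLE, 2 * N_PEOPLE))
archetype_a = most_central(scores, group_a)
print(f"most central individual in group A: {archetype_a}")
```

Because the columns are centered, the two equal-sized groups have mean scores of opposite sign on the first component; when allele frequencies differ this much, that component typically separates the groups, and the "archetype" is simply the individual nearest the middle of his own cluster.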
He is also a renal cancer patient, which provides a further rationale for studying this genome. That work is ongoing. The Prokhortchouk and Skryabin labs are part of the International Cancer Genome Consortium (see Nature, April 2010). Preliminary analysis of the renal cancer focused on SNPs that have been associated with renal cancer in genome-wide association studies (GWAS). “Using linkage disequilibrium, you can trace the SNPs [in the Patient N genome] close to the marker SNPs and go to particular exons in particular genes.” One of those looks particularly interesting, he said, but declined to elaborate.
Although the decision to publish in a little-known Russian journal, which is not yet indexed in PubMed, stifled media attention abroad, the paper attracted considerable attention inside Russia when it was published last year. The big question on reporters’ minds was: Who is Patient N? “They thought it was Prime Minister Putin. It’s not true!” said Prokhortchouk. The speculation was not unreasonable, given that, according to Wikipedia, institute director Kovalchuk’s brother has been described as the “personal banker” to Putin.
“We got informed consent from this individual,” continued Prokhortchouk. “I know his name, but he is not a politically significant person. He’s a normal guy, about 60 years old.” Apparently the donor was eager for some publicity, but the rules of the Russian Oncology Center, under which the sequence data were submitted to NCBI on March 20th, prohibited disclosure.
Prokhortchouk admits that his colleagues could have chosen a better known journal, but decided to support a promising new Russian publication. “The Russian Federation thinks this journal can be a stage for debating science and publishing interesting Russian papers. As we used Russian taxpayer money, not international grants, and because the publishing time was very quick [a few weeks], we decided this should be published, increasing its strength and attractiveness for other people.” Follow-up studies on population genetics and the renal cancer genome have been submitted to a pair of leading journals.
This first Russian genome is but a first step, says Prokhortchouk. “In terms of the infrastructure and capacity of the center, it just says we can do this kind of research in this country. This is just equipment we’re buying from Americans. We can get this equipment to work, connect it to supercomputers, and we can compute and understand the data.”