By Kevin Davies
February 5, 2009 | MARCO ISLAND, Florida – Opening the tenth anniversary of the Advances in Genome Biology and Technology conference, Broad Institute director Eric Lander saluted the “intellectual and commercial ferment in the field” of genome sequencing that is not only generating unprecedented volumes of data, but also revealing unanticipated discoveries in biology.
“It’s breathtaking, an entire community, academic and commercial, coming together,” said Lander.
Today’s next-generation sequencing instruments are capable of producing 2 billion bases of sequence per day, but “those numbers will look puny by the next time we meet at Marco Island,” Lander predicted. “Sequencing will become a general purpose tool for so many problems in molecular biology,” he added. Just as computers used to be thought of as special purpose machines, sequencing is now the de facto tool to study a wide range of biological problems, from epigenomics and transcriptomes to cancer mutations and ancient DNA.
Lander highlighted several prime applications of next-generation sequencing. A key application of sequencing is to document variation in known genomes, and not just humans. “The organism doesn’t even have to exist to be able to do it,” joked Lander, alluding to early efforts in sequencing Neanderthal and the woolly mammoth.
Medical resequencing has been aided by the explosion in genome-wide association studies (GWAS), which have revealed hundreds of complex disease-associated genes, such as the 60 genes implicated in Crohn’s disease. But the effects of these loci can be weak. “We have barely scratched the surface of genes, of biology,” said Lander. Although researchers are doing a great job of finding common polymorphisms, success is only sporadic in the intermediate 0.5-5% range. “Below that, we can’t build complete catalogues. For each disease, we’re going to have to sequence thousands of patients,” Lander said. “We have seen sequences of individuals rolling off... In a couple of years, papers will have hundreds of individuals, thousands of individuals.”
Identifying sequence variants in other species has profound implications for biology and medicine. Work has pinpointed genetic differences that account for the outbreak of drug-resistant tuberculosis in South Africa. Just 40 variants exist between the drug-resistant and -sensitive strains. Lander also noted work in collaboration with Stanford/HHMI’s David Kingsley, on natural selection in the North American stickleback population. By collecting and performing very light sequencing on sticklebacks from 20 different environments, Kingsley’s team has identified shared haplotypes between freshwater and marine populations, documenting the history of selection in the stickleback population.
Lander also discussed applications in cancer, such as the discovery of new genes in glioblastoma. One note of caution was the need to detect very rare signals – 1 per million bases. “99.999% accurate is not good enough,” said Lander. “There’s a big difference between genotyping and finding de novo mutations in the genome.”
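The arithmetic behind that caution can be sketched in a few lines. The snippet below is only an illustration, not a description of any real variant-calling pipeline: it assumes sequencing errors are independent and uniformly distributed, treats 99.999% as a per-base accuracy, and ignores the read depth and consensus calling that real analyses use. The function names are hypothetical.

```python
# Illustrative sketch: compare the expected number of miscalled bases with the
# rare-mutation rate quoted in the talk (about 1 per million bases).
# Assumption: errors are independent and uniform across bases.

def expected_errors_per_megabase(accuracy: float) -> float:
    """Expected miscalled bases per million bases at a given per-base accuracy."""
    return (1.0 - accuracy) * 1_000_000


def signal_to_noise(true_variants_per_megabase: float, accuracy: float) -> float:
    """Ratio of real variants to expected miscalls per million bases."""
    return true_variants_per_megabase / expected_errors_per_megabase(accuracy)


errors = expected_errors_per_megabase(0.99999)  # about 10 miscalls per megabase
ratio = signal_to_noise(1.0, 0.99999)           # about 0.1

print(f"Expected miscalls per million bases: {errors:.1f}")
print(f"True mutations per miscall: {ratio:.2f}")
```

Under these simplifying assumptions, a 99.999% accurate call set would contain roughly ten errors for every genuine mutation occurring at 1 per million bases, which is why that accuracy falls short for finding de novo mutations even though it is adequate for genotyping known variants.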
454 sequencing, with individual DNA reads of 200-500 bases, has become routine for whole genome de novo assembly: “Short-read technologies are beginning to take such data and assemble them into whole genomes,” said Lander. “I have no doubt that over next year or so… short-read genome assembly will become routine.”
A Vignette
Lander concluded by asking whether sequencing will enable biologists to uncover “completely new phenomena” rather than just generating more data. The answer is almost certainly yes. The example he gave comes from the field of gene regulation and epigenomics. The goal, said Lander, is to create “chromatin state-maps across development and tissues. An enormous amount of biology will emerge from that.”
The example Lander discussed involves long intergenic non-coding RNAs (lincRNAs), work led by his colleagues John Rinn and Aviv Regev, published online last week by Nature. Recent studies have revealed remarkable low-level amounts of transcription all over the genome. But does this low-level transcription across non-conserved regions have any physiological relevance, or is it just noise?
Until now, only about a dozen functional large non-coding RNAs (ncRNAs) have been found, such as the XIST gene in X-chromosome inactivation. Rinn and colleagues identified two specific epigenomic modifications, lysine methylations of histone H3 known as K4 (in the promoter region) and K36 (along the transcript). “If you see a K4 and a K36, it marks a gene,” said Lander. By studying DNA regions marked with this K4/K36 pattern, the group found a staggering 1600 novel intergenic transcribed K4-K36 regions.
What is the biological purpose of these lincRNAs? The genes can be associated with known pathways, such as fatty acid metabolism, p53, and developmental processes, e.g. gametogenesis, brain development, immune response, cell cycle. Recently, the group has identified lincRNAs regulated by p53. Lander said that no one has been able to identify the protein presumed to repress genes regulated by the p53 tumor suppressor. But, he said, “there’s a lincRNA, near p21, when you knock it out, it causes up-regulation of those genes that were suppressed in the p53 pathway.”
Like some of the best characterized ncRNAs, more than 50% of lincRNAs have been implicated in chromatin remodeling, a glimpse of a new world of gene regulation, “anti-transcription factors that play distinct roles in shutting down genes as part of pathways.” Interestingly, lincRNAs would be excellent candidates for those GWAS hits that land in gene deserts.
PCAST
In response to a question, Lander briefly reviewed his latest appointment as co-chair of the presidential science council, along with Nobelist Harold Varmus and presidential science advisor John Holdren. In the 1950s, President Eisenhower assembled the President’s Science Advisory Committee. It lapsed during the Nixon administration largely because of conflicts over Vietnam, but returned under Presidents Clinton and Bush as PCAST – the President’s Council of Advisors on Science and Technology.
When Lander met the President-elect shortly before the inauguration, Obama’s first question was: “Tell me about what’s happened since the Human Genome Project?” Lander said: “For that to be the first question in the meeting we had spoke volumes about what at least we hope the next eight (sic) years [will hold]… It’s great to know that what this community has done is on the radar.”
Lander added: “I would say, without being partisan, in the last eight years it’s not been a very effective process, because if the person asking doesn’t really care…” Lander’s comments were drowned out in laughter from the audience. He said the appointment of Nobelist Steve Chu as the president’s energy secretary, as well as the appointment of two biologists as co-chairs of PCAST, spoke volumes about the Obama administration’s approach to science.