April 12, 2007 | Last month, the new CAMERA database officially opened for business. CAMERA stands for the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis, and was developed to store and disseminate the flood of genetic data being generated by work such as J. Craig Venter’s Global Ocean Sampling (GOS) expedition.
The launch of the database coincided with Venter’s long-awaited follow-up publication to the initial GOS study, published in 2004. According to Venter, in a video commentary accompanying the publication of three papers in PLoS Biology last month, CAMERA will “not only house the sequence data but all the metadata associated with it. All the sequences and assemblies will be there, along with geographic data, site coordinates,” and more including satellite photos and high-definition media.
Sea of Sequence
In the new reports, Venter and colleagues describe the results of a 5,000-mile voyage of the Sorcerer II, Venter’s “floating lab,” which began sampling marine microbes in February 2003 along a route from Nova Scotia to French Polynesia. After the Sargasso pilot study (see Venter Makes Waves Again, Bio-IT World, April 2004), Sorcerer II sailed down across the Caribbean, through the Panama Canal, and into the South Pacific. The team dropped anchor every 200 miles to collect ocean samples, including at several locations around the Galapagos Islands.
The results of Venter’s latest metagenomic survey of marine life are mind-boggling. Sequencing 6.3 billion base pairs of DNA (7.7 million sequencing reads), Venter’s team identified 1.2 million new genes. Eighty-five percent of the sequence data could be assembled; of the data that could not be assembled, 57% was essentially unique. In total, researchers identified genes for more than 6 million proteins covering nearly all prokaryotic protein families, with some representing new families.
The team studied 41 samples (including data from the pilot study) of marine planktonic microbiota collected from surface water at sites ranging from about one foot deep in Ecuador to more than 4,500 meters deep off Mexico’s Yucatan Peninsula. The filtered samples were subjected to genome shotgun sequencing and assembled using a modified version of the Celera Assembler program.
“Our results highlight the astounding diversity contained within microbial communities, as revealed through whole-genome shotgun sequencing carried out on a global scale,” wrote Venter and co-authors. “Our ability to make these observations derived from not only the large volumes of data but also from the development of new tools and techniques to filter and organize the information in manageable ways.”
Lead author Douglas Rusch told Bio-IT World, “Some samples were similar to each other but geographically separated.” Some cyanobacteria found in the Caribbean and in the Pacific west of Central America were identical except for differing levels of phosphate-binding proteins. “The Atlantic reads had the higher abundance of phosphate proteins,” says Rusch. “You can’t tell the populations apart from the Atlantic and Pacific except for the phosphate-binding proteins.”
Within the abundant species, the study found many gene-level differences across the data that suggest evolutionary adaptation, as well as genetically isolated populations that show evidence of distinct environmental preferences.
For his next trick, Venter is sailing Sorcerer II back through the Panama Canal and up the west coast of North America to Alaska. The goal, he says, is “to see, by sampling diverse sites, whether we can have the CAMERA database be truly representative of the diversity of life on this planet.”
Sidebar: CAMERA In Focus
The CAMERA database, launched last month, is supported by a sizeable $24.5 million grant from the Gordon & Betty Moore Foundation. The database not only includes sequence data on millions of microbial genes but also records their geographic site of origin. This could be crucial for researchers studying environmental diversity and, in a more practical sense, should certain governments choose to exercise claims on intellectual property.
Venter says another unique feature of CAMERA is “its compute infrastructure in terms of massively parallel computing to allow researchers that log on and use this to do computes that probably would not be possible … from their own institutional computers.”
Venter reasonably predicts that this new GOS dataset, with 1.2 million genes, will become one of the most studied datasets in the life sciences, and that CAMERA “will be the prototype database for the future.”