By Allison Proffitt
May 12, 2014 | Helen Berman, a professor of chemistry and chemical biology at the Center for Integrative Proteomics Research at Rutgers University, was honored with the Benjamin Franklin Award from Bioinformatics.org at the 2014 Bio-IT World Conference & Expo for her work building the Protein Data Bank (PDB). The Franklin Award has recognized contributions to free and open science since 2002.
Berman serves as the director of the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank.
The process of building the PDB took 40 years, Berman said, and it shouldn't have.
"We need to collapse the way we think about things, and go a lot faster... so that all disciplines can open up their data and share data. It's the only way we're going to get advances in science and medicine."
Benjamin Franklin, for whom the award is named, "had huge knowledge about science, about technology, and--what I think is most important--community," said Berman. "Those three areas must come together in order to create an open environment."
For the PDB, the science and technology have evolved together since the crystallization of the first protein in 1934. Scientists began by focusing on RNA, progressed to DNA, and can now characterize large macromolecular machines. On the technology side, X-ray diffraction gave way to electron microscopy, then to high-throughput structural genomics and hybrid methods.
But the key component that led to the success of the PDB, Berman said, was the community that evolved to support the open science effort.
Berman traces the PDB's theoretical genesis back to the work of John Desmond Bernal, who, along with Dorothy Hodgkin, crystallized the first protein.
J. D. Bernal was a charismatic man and social thinker, Berman said. "He influenced the way in which this whole community has operated from the beginning in terms of data sharing, in terms of wanting to be a collaborative community."
In practice, the PDB had its birth in a 1970 petition that Berman and colleagues signed. "I'm not even sure who we thought we were writing this petition for!" Berman said. But a year later, at a Cold Spring Harbor event organized by Jim Watson, Walter Hamilton, a crystallographer at the Brookhaven National Laboratory, agreed to host the Databank.
"The first thing [Hamilton] did after saying, 'Yes, we'll do it at Brookhaven,'" Berman remembers, was, "he then flew over to England and talked to his colleagues at Cambridge, because he knew what very few people knew back then: that you can't do anything unless you make it global and share with other countries."
That Databank was announced in an October 1971 article in a Nature journal called Nature New Biology, and included seven structures. "All that fuss over seven structures," Berman commented, "but it turned out to be very important."
When the PDB launched, there were no requirements that data be submitted to the Databank, Berman said. "But community attitudes from the beginning evolved in such a way that we could create this database and also create the standards and figure out how to operate between data sources."
In the early 1980s, one researcher called it "immoral" not to contribute data to the Data Bank. Later, contribution became mandatory for publication.
From there, the community worked to develop standards and a dictionary; to form an international organizing collaborative; to include experimental data; and most recently to validate and curate incoming data.
The community "self-policed," Berman said, on each of these issues, pushing toward a more open, rich, and organized dataset.
Today the Protein Data Bank comprises nearly 100,000 3-D structures stored in a worldwide archival database accessed by a wide range of researchers, many of whom are not structural biologists. It's interesting to see how the Data Bank is being used, Berman said. "Sometimes it has nothing whatsoever to do with biology! Basically, we just have a really great dataset to do statistical analyses."
The PDB's success lies in the fact that the data is worth keeping and interesting to use and access, and that the data archiving technologies--the underlying IT--are constantly updated.
But most important, Berman said, is understanding the communities of data producers and data users and getting them to work together. While the science and technology are important, engaging the invested communities is essential to achieving open data.