Open Science, Open Data, Open Access

By Matt Luchette

April 16, 2013 | BOSTON–In two compelling presentations at the Bio-IT World Conference* last week, Atul Butte and Steven Salzberg provided formidable advocacy for the virtues of open data and open science.

Salzberg, a computer scientist at Johns Hopkins University, accepted the 2013 Benjamin Franklin Award for Open Access in the Life Sciences for his work promoting “free and open access to the materials and methods used in the life sciences.”

Salzberg is perhaps best known for developing a series of popular open-source software platforms, including Glimmer (a bacterial gene finder) and the Tuxedo suite of next-generation sequencing tools (see "Steven Salzberg on Microbial Genomes, Open Access, Flu Shots, and Gene Patents"). Salzberg insists the software stay open-source because “free software gets used.”

Salzberg has also been a fervent advocate for a more open atmosphere in science for over a decade. In a 2003 letter to the editor of Nature, he asserted that “genome data-collection projects should be freely available to the entire scientific community, immediately and with no restrictions or conditions.” In a paper last year on “the perils of gene patents,” he argued that “gene patents are antithetical to scientific process.”

After accepting the award from Jeff Bizzaro, president of Bioinformatics.org, Salzberg delivered a fast-paced talk on three components of open science he considers essential: free software, open data, and open-access publication. He discussed some of his lab’s accomplishments in encouraging researchers to be more open with their experimental data.

For the Influenza Genome Sequencing Project he co-founded with NCBI’s David Lipman, for example, Salzberg set out to sequence strains of the flu, a project he hoped would help researchers develop therapies and “improve understanding of the overall molecular evolution of influenza.” The project helped sequence more than 10,000 flu genomes. Even more surprisingly, though, the group published the genomes in real time to assist collaborating research teams. Prior to his project, “the community wasn’t doing that at all,” he said.

Salzberg ended on the importance of open-access publications in disseminating knowledge.

“We already write the papers, we already review the papers, why can’t we be the ones who publish?” he asked. The Public Library of Science (PLoS), one of the first open-access publishing projects to employ this model, earned co-founder Michael Eisen the first Benjamin Franklin Award in 2002, but Salzberg hopes the approach will become ubiquitous. He added that it was important for researchers to make the raw data behind their published work available as well. Doing so could give scientists a much wider data pool than they could produce alone in their own labs.

“Open science makes us free!” Salzberg remarked in closing. “It allows us to do our work and not worry about all the restrictions on it.”

Open Science vs Free Tools

Salzberg’s talk echoed many of the themes in the preceding keynote from Stanford University’s Atul Butte, who highlighted how open access to experimental data could democratize science (see "Bits and SNPs: Atul Butte and Medicine in the Era of Big Data").

His talk came just two months after the White House’s Office of Science and Technology Policy released a memorandum directing federal agencies with more than $100 million in R&D expenditures to develop plans for making their experimental data publicly available. With vast, publicly available data libraries, Butte mused, students could perhaps one day create biotech startups “out of their garage,” the way many technology startups began in the past few decades.

“If you’re not going to do this, try and get your kids interested,” Butte challenged the packed audience. He sprinkled his talk with examples of commercial services such as Assay Depot, offering easy access to cell lines, animal models and so on, greatly expediting new experimental ideas.

While the plenary speakers suggested the halcyon days of open access in biomedical research are on the near horizon, the tone was a little different in the exhibit hall downstairs.

Many company representatives at the conference agreed that open-source software and open access to experimental data provide enormous benefits for researchers, but they argued that a "pay for profit" model of resource distribution offers advantages that open-access platforms aren't prepared to match.

Some companies stressed that while publicly available experimental data sets will be invaluable for researchers, not all data are created equal. Thomson Reuters’ program MetaCore, for example, provides its clients access to a “high-quality, manually-curated database” formed from “2,700 scientific journals, reviewed by PhD and M.D. level research professionals.”

“We don't just look at the data,” said one Thomson Reuters representative. “We look at the experimental design for appropriate controls and that the conclusions are valid.”

Source Code

Other companies emphasized the on-demand support network users can turn to when they encounter problems. “When I have a problem with [an open-access coding program like] Python, I can look at the source code or send an email and hope somebody responds,” a representative from Wolfram explained. But when software is purchased, he continued, there’s a team of paid developers you can turn to if things go wrong.

Beyond its support staff and the program’s computational capabilities, Wolfram’s Mathematica, like MetaCore, provides users with “gigabytes of carefully curated and continually updated data” from multiple academic fields.

“It’s fine if you’re just starting out and don’t have a lot of money,” one PerkinElmer representative said about open-source software, “but it isn’t scalable to a larger company with lots of researchers,” where time lost to software issues means lost money.

But for many researchers, the rate at which curated databases are updated or commercial software is upgraded just isn’t fast enough.

“If I’m a researcher, open-source software is always going to be better,” said one representative from Seven Bridges Genomics. Open-source genome analysis programs provide scientists with state-of-the-art computational power, he explained, and are updated faster than many for-profit products, whose distributors must manage licensing and patent logistics before each release. Scientists can also adapt the software to fit their evolving needs whenever they like. What companies like Seven Bridges provide instead is the ability for researchers to customize their experiments and analysis with a team of genomics experts, as well as the infrastructure to run their analysis and store their data.

The advantages and drawbacks of open science versus for-profit research resources have been a popular conversation topic in academic and industry circles recently. And as seen in the Myriad Genetics patent hearing at the US Supreme Court earlier this week, voices on both sides have grown more resolute.

While open-source software has taken firm root in many research fields, and experts like Salzberg and Butte are confident open science could revolutionize the way scientists conduct their research, companies still argue that the cost of for-profit resources is not likely to exceed their value anytime soon.

*Bio-IT World Conference & Expo, Boston, April 9-11, 2013.