By Matt Luchette
May 15, 2014 | In the era of big data, drug developers are pushing for data
analysis to keep up with increased data acquisition. At this April’s Bio-IT
World Conference & Expo, Andrea DeSouza, from the Broad Institute, and Anne
Mai Wassermann, from Novartis, spoke on new technologies they hope will squeeze
meaning from researchers’ mounds of data.
At the Broad Institute, DeSouza, the Director of
Informatics, Data Analysis, and Finance, noticed a bottleneck: scientists
studying molecular probes, compounds that help characterize biological pathways
and spur drug development, were producing data faster than it was being
analyzed. These probes were often assayed at multiple centers, and to share their
data, researchers had to write scripts that would annotate their results,
taking away time they could otherwise use for data analysis.
DeSouza’s goal, she explained, was to help scientists find
“that meaningful data set so you can speed up hypothesis generation.”
To open the bottleneck, DeSouza, with a team from seven
other research institutes, developed the BioAssay Research Database (BARD). The
project, DeSouza explained, had three aims: to understand the data scientists
were producing, clean up the data, and “mask the complexity from the scientists
in the lab.” By annotating the data and making it more accessible, DeSouza
hopes BARD will improve communication within multicenter projects and speed
Starting in early 2012, BARD was built as an open source
program to help scientists query data from the Molecular Libraries Program, an
NIH-funded initiative to accelerate probe discovery. Today, BARD recognizes
over 2,000 assay definitions, houses data from over 3,000 experiments and 100
molecular probes, and supports 15 plug-ins for further data analysis.
A key challenge in developing the program, DeSouza
explained, was handling the vast diversity of terms scientists use to report
their results. While analyzing the PubChem database, for instance, the team
found that scientists used 1,800 different phrases just to represent the terms
“percent inhibition” and “10 uM.” How could the program annotate data, when
everyone reports data differently?
To handle the diversity, BARD controls the vocabulary
scientists use for reporting their results. But it hasn’t been easy, said
DeSouza. “The harder part was getting [scientists] to engage with [BARD] as
they were cleaning up the data.” If scientists were going to use BARD, the
program needed to be nimble and quickly adjust to how scientists wanted to use
it. “Don’t stand in the [scientists’] way of a new term being added to the
system,” DeSouza concluded.
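As a rough illustration of what a controlled vocabulary involves, the sketch below maps free-text result labels onto canonical terms and sets aside labels it does not recognize so they can be reviewed and added later. It is not BARD's implementation: the variant phrases, the `CONTROLLED_TERMS` table, and the `normalize` and `annotate` helpers are all hypothetical names invented for this example.

```python
import re

# Hypothetical controlled vocabulary: canonical term -> free-text variants.
# The variants are invented for illustration; they are not BARD's dictionary.
CONTROLLED_TERMS = {
    "percent inhibition": ["% inhibition", "pct inhibition", "percent inhib"],
    "10 uM": ["10um", "10 micromolar"],
}


def _key(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


# Reverse lookup table: normalized variant -> canonical term.
LOOKUP = {}
for canonical, variants in CONTROLLED_TERMS.items():
    LOOKUP[_key(canonical)] = canonical
    for variant in variants:
        LOOKUP[_key(variant)] = canonical


def normalize(label):
    """Return the canonical term for a label, or None if it is unknown."""
    return LOOKUP.get(_key(label))


def annotate(labels):
    """Split labels into those mapped to canonical terms and those not yet known."""
    mapped, unmapped = {}, []
    for label in labels:
        term = normalize(label)
        if term is None:
            unmapped.append(label)  # queued for review, not rejected
        else:
            mapped[label] = term
    return mapped, unmapped
```

Returning unknown labels in a separate list, rather than raising an error, is one way to avoid standing in the way of a new term: a curator can add it to the vocabulary without blocking the scientist who reported it.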
A Vision for Drug Candidate Analysis
Where BARD aims to improve analysis across multiple
institutions, companies like Novartis have been improving data visualization
in-house. In a talk shortly before DeSouza’s, Dr. Anne Mai
Wassermann, a researcher at Novartis, spoke about the company’s suite of three
visualization tools: HTS-Explorer, Chemotography, and ConTour. The tools work
synergistically to help researchers identify promising drug candidates.
HTS-Explorer, the base of the suite, is aimed at medicinal
chemists, said Wassermann. A chemist testing a compound’s activity against a
specific protein may be interested in how similar compounds have fared in the
past. Explorer lumps Novartis’s compounds into chemical classes, and colors the
compounds based on how they have performed in previous screens against the same
protein family. The program can also be run in Spotfire for further data
“It’s the best of both worlds,” said Wassermann. “Flexible
data visualization from Spotfire, as well as data annotations from HTS-Explorer.”
But the same researcher may also be interested in the
chemical similarity of compounds that affect a particular biological pathway.
Like Explorer, Chemotography encodes chemical similarity by color, but overlays
those colors on the scientist’s target pathway. The tool is meant to illustrate
the diversity of compounds that hit certain pathway elements. Chemical classes
that affect a particular target more than others could point the development
team towards more promising drug candidates.
However, chemical classes aren’t the full picture –
chemically similar compounds may have different biological activities.
Similarly, some compounds may have a similar mechanism of action, without being
in the same chemical class. ConTour clusters compounds based on their
biological activity, and like Chemotography’s chemical classes, shows which
clusters may selectively affect a specific target.
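The talk summary does not say how ConTour forms its clusters, so the following is only a minimal sketch of the general idea: grouping compounds by the similarity of their activity profiles across assays, regardless of chemical class. It assumes Pearson correlation as the similarity measure and simple single-linkage grouping at a fixed threshold. The function names, the threshold, and the compound data are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_a * var_b)


def cluster_by_activity(profiles, threshold=0.9):
    """Single-linkage grouping: a compound joins every cluster that contains
    at least one member whose profile correlates with it at or above the
    threshold, merging those clusters into one."""
    clusters = []
    for name, profile in profiles.items():
        linked = [
            c for c in clusters
            if any(pearson(profile, profiles[m]) >= threshold for m in c)
        ]
        merged = [name]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters


# Hypothetical percent-inhibition values across four assays.
profiles = {
    "cmpd_A": [90, 85, 10, 5],
    "cmpd_B": [80, 88, 12, 8],
    "cmpd_C": [5, 10, 95, 90],
    "cmpd_D": [8, 6, 85, 92],
}
clusters = cluster_by_activity(profiles)
```

With this made-up data, `cmpd_A` and `cmpd_B` fall into one cluster and `cmpd_C` and `cmpd_D` into another, whatever their chemical classes might be.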
Hit assessment isn’t just about picking the most potent
compound from a screen, Wassermann explained. Assays only represent a part of
the biological picture. Novartis’s visualization tools aim to accelerate drug
development by understanding how candidate compounds work in their biological
Both BARD and Novartis’s suite are built to be seamless
tools that help scientists’ data analysis keep pace with data production. But
in building BARD, DeSouza recognized how challenging it is for scientists to
communicate their data outside of their own lab notebooks. Much of BARD’s
initial development depended on student interns who entered data for the
scientists. DeSouza hopes the project helped the students understand the
importance of high-quality, well-annotated, and easily communicated data in
their own careers. “Without the help of the students,” she said, “I’m not sure
we would be where we are today.”