By Aaron Krol
May 14, 2014 | More than a hundred companies set up booths at the 2014 Bio-IT World Conference & Expo to demonstrate their products, but it seems safe to say only one of them
could claim the Miami Heat as a customer.
Ayasdi, whose eclectic client base also includes Merck, General
Electric, Citigroup, and the U.S. Department of Agriculture, has actually published
a paper in the Nature Publishing Group journal Scientific Reports
on the nature of basketball positions, in which the authors propose that the
three classic designations of guard, forward, and center can be further divided
into thirteen subcategories with distinct and definable styles of play. The
company’s flagship product, a data analysis platform, specializes in this kind
of splintering, finding meaningful groups within large cohorts where huge
numbers of variables may be in play. In the basketball study, Ayasdi considered
players’ rebounds, assists, turnovers, steals, blocked shots, personal fouls,
and points scored per minute, but in other fields of study, the datasets can
sprawl to thousands or even millions of variables.
This is especially true in the life sciences, where researchers and
even clinical labs are increasingly looking at patients and tissue samples
through the lens of genetics, RNA expression, and proteomics. This yields
massive numbers of data points that might be key to understanding why some
patients respond better to certain drugs, or have different disease prognoses.
Compute power and storage architectures are largely keeping pace with this explosion of
information, but analytical tools have been slower to adapt. The search for
meaning in biomedical big data tends to be piecemeal, zeroing in on a few of
the most promising leads without taking a holistic approach to similarities among the patients themselves.
Pek Lum, Ayasdi’s Vice President of Solutions and Chief Data
Scientist, believes her company is ready to change that, and came to the conference in Boston to promote Ayasdi’s growing commitment to biomedicine. For ten years, Lum
worked at Stephen
Friend’s company Rosetta Inpharmatics developing pattern recognition
software for in vitro gene expression
data, continuing at Rosetta when the company was acquired by Merck. Since
joining Ayasdi in 2010, Lum has helped the company pivot toward drug
development and human health.
“We really want to make a difference to
healthcare,” she says. “Healthcare, and big pharma and drug discovery, really
needs a revolutionary application, or a whole new way of thinking about data,
and we want to be a part of that.”
The Shape of Data
Ayasdi’s software is based on topological analysis, which visualizes
relationships between data points as 2D shapes made up of interconnected nodes.
The nodes represent the subjects under analysis: basketball players, say, or
patients at a clinic. The most similar nodes will cluster together, with lines
drawn between them, while highly dissimilar nodes will be split into separate
shapes. The idea is that emergent patterns, like forks, loops, or separated
clusters, will reveal meaningfully different subpopulations in the data.
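The idea of nodes, connecting lines, and separated shapes can be illustrated with a toy example. The sketch below is not Ayasdi's algorithm, which the article does not spell out; it simply assumes that similarity is measured as Euclidean distance between feature vectors and that two subjects are linked when they fall within a chosen cutoff. The player names, statistics, and the `max_distance` parameter are all invented for illustration.

```python
from itertools import combinations
from math import dist


def build_network(points, max_distance):
    """Link every pair of subjects whose feature vectors lie within max_distance."""
    edges = set()
    for a, b in combinations(sorted(points), 2):
        if dist(points[a], points[b]) <= max_distance:
            edges.add((a, b))
    return edges


def connected_shapes(nodes, edges):
    """Group nodes into connected components, i.e. the separate 'shapes'."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, shapes = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, shape = [start], set()
        while stack:
            node = stack.pop()
            if node in shape:
                continue
            shape.add(node)
            stack.extend(neighbors[node] - shape)
        seen |= shape
        shapes.append(shape)
    return shapes


# Hypothetical per-minute stats (rebounds, assists, points) for six invented players.
players = {
    "p1": (0.30, 0.05, 0.40),
    "p2": (0.32, 0.06, 0.42),
    "p3": (0.29, 0.04, 0.45),
    "p4": (0.08, 0.25, 0.50),
    "p5": (0.07, 0.27, 0.48),
    "p6": (0.09, 0.24, 0.52),
}
edges = build_network(players, max_distance=0.1)
shapes = connected_shapes(players, edges)
print(shapes)  # two separate groups: p1-p3 and p4-p6
```

With real data the interesting structure is rarely this clean, which is why forks and loops inside a single connected shape matter as much as the outright separations.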
To understand the program in action, consider a study that
Lum has featured in detail on the Ayasdi
Blog. Lum and her colleagues retrieved information on 870 tumors from The
Cancer Genome Atlas (TCGA), an online database. The tumor samples came
from cases of either breast or ovarian cancer, and included both genetic data
and RNA expression levels. Lum’s team wanted to see if previous
analyses of these tumors had missed any clinically relevant details about
certain groups of cancer cases. They input each tumor into Ayasdi as a row, and each
genetic characteristic as a column.
A topological network of 870 TCGA tumor samples generated in Ayasdi software. Three different views highlight mutations to TP53, and expression levels of two related genes. Image credit: Ayasdi
Ayasdi doesn’t use any preformed hypotheses in its analysis
– Lum didn’t, for instance, tell the program in advance which tumors came from
breast cancers, or ask it to arrange samples by the expression level of a
particular gene. Instead, all the columns are sorted into “bins,” based on how
closely they hang together. If the presence of a particular mutation to gene A
correlates closely with expression levels of genes B and C, those three columns
might be binned together. Bins can also overlap, so if the expression level of
B also correlates with a mutation to gene D, then B’s expression may find
itself in two bins.
The bins are a shortcut that spares the program from independently associating every
variable with every other variable, which makes the program very fast – even a
very large dataset, like 870 tumors with thousands of genetic characteristics,
can be resolved within seconds. “We can do up to a couple of million columns,
sometimes in a couple of thousand patients,” Lum told Bio-IT World.
To create its visualizations, Ayasdi sorts the nodes into a
2D network based on their overall similarity, prioritizing the bins that best
differentiate groups. At this point, with a basic topological map of the data,
users can start asking more targeted questions.
“Once you have a network, you can ask what variables best
explain a group of interest,” says Lum. “The software can run through the data
automatically, and suggest features for you to select.” You can also layer
metadata onto the network, coloring different nodes based on certain
attributes. In the TCGA study, one important piece of metadata was which tumors
came from breast cancer, and which from ovarian. Coloring those nodes revealed
that Ayasdi had successfully separated the two types of cancer, using only the
genetic data. Asking the program which features best distinguished between the
breast and ovarian tumors also revealed a number of well-known oncogenes, giving
important validation to the approach.
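Asking which variables best explain a group can be pictured as a ranking problem. The sketch below is one simple way to do it, under the assumption that each column is scored by the gap between the group's mean and everyone else's mean, scaled by the column's overall spread. The article does not say which statistic Ayasdi uses, so this scoring rule, the function name `rank_features`, and the gene names and values are all illustrative stand-ins rather than TCGA data.

```python
from statistics import mean, pstdev


def rank_features(rows, labels, group):
    """Rank columns by how strongly they separate `group` from all other rows.

    Score = |mean inside group - mean outside group| / population std dev of the column.
    Columns with no variation at all score 0.
    """
    inside = [r for r, lab in zip(rows, labels) if lab == group]
    outside = [r for r, lab in zip(rows, labels) if lab != group]
    scores = {}
    for col in rows[0]:
        spread = pstdev([r[col] for r in rows])
        if spread == 0:
            scores[col] = 0.0
            continue
        gap = mean(r[col] for r in inside) - mean(r[col] for r in outside)
        scores[col] = abs(gap) / spread
    return sorted(scores, key=scores.get, reverse=True)


# Invented expression values for three hypothetical genes in four samples.
rows = [
    {"gene_x": 1.0, "gene_y": 5.0, "gene_z": 3.0},
    {"gene_x": 1.2, "gene_y": 4.0, "gene_z": 3.0},
    {"gene_x": 3.0, "gene_y": 5.0, "gene_z": 3.0},
    {"gene_x": 3.2, "gene_y": 4.0, "gene_z": 3.0},
]
labels = ["breast", "breast", "ovarian", "ovarian"]
ranking = rank_features(rows, labels, group="breast")
print(ranking[0])  # gene_x is the only column that differs between the groups
```

A production tool would also need a significance test and a correction for the thousands of columns being compared, which this sketch leaves out.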
More notably, this particular study found a region of overlap between breast and ovarian cancers, which on closer
inspection turned out to be triple-negative breast cancer cases, a clinical subgroup
that has been especially difficult to treat. Probing the program on why it had
placed these tumors in between the other breast cancers and the ovarian
cancers, Lum’s team discovered that mutations to the tumor suppressor gene TP53 were very common in both ovarian
and triple-negative breast cancer, although extremely rare in other breast cancers.
“People have suspected triple-negative may be a little
different, but to show that TP53
mutations are pretty prevalent in triple-negative, which is very similar to the
profile of ovarian cancer, is a very interesting finding,” says Lum. She also
points out that the tumors in TCGA have already been studied exhaustively
through traditional methods, without giving away this potentially actionable
detail – future drug developers might now consider TP53 as a drug target for triple-negative breast cancer.
“Cancer is so heterogeneous that you might need to find more
and more subpopulations to treat it more specifically,” Lum adds. “That’s why
you need game-changing methods to get into it. If you use traditional methods
that are more for looking at big signals, all the low-hanging fruits are
Built for Biology
Ayasdi has a number of clients in financial services, and is
exploring partnerships in retail analysis. The company also has a large base of
government contracts; much of the founders’ early research funding came from
DARPA and the NSF, and big agencies like the FDA and CDC are now regular users.
But the life sciences, and especially drug discovery, require a more
“There are many things specific to biology, like gene
pathways, next-generation sequencing, compounds… which are very specific for
people doing drug discovery, patient segmentation, and biomarker discovery,”
says Lum. At the Bio-IT World Conference, Ayasdi unveiled both a new product
and a major partnership to better tailor their platform to these use cases.
The product is Ayasdi Cure, which received a top prize at the conference’s Best
of Show competition for new products. “Ayasdi Cure is an application on top
of our core platform,” says Lum. “It is made for big pharma and life sciences,
to tackle drug discovery from end to end.” The new application brings in tools
that let users dig deeper into their raw data. A variant viewer lets users
translate data on genetic variants from their topological maps to a Manhattan
plot, quickly identifying the variants most closely associated with a trait of
interest and showing where they occur in the genome. Ayasdi Cure also has
built-in awareness of genetic pathways: rather than just telling users which
individual genes are enriched or suppressed in a subpopulation, it can highlight
an entire pathway whose expression levels are changed.
A network of compounds in Ayasdi Cure. On the right, the chemical structures of selected compounds are visualized onscreen. Image credit: Ayasdi
For drug developers, Ayasdi Cure also incorporates compound
structures, derived from the public database ChEMBL and users’ proprietary
datasets. “Any kind of compound databases will allow us to build a topological
map of millions of compounds,” says Lum, “so you can quickly search for which
compound violates the Lipinski rule, for example, or which compounds have great
PK/PD.” Highlighting a region of the map will bring up the compounds’
structures, so chemists can inspect them manually.
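For readers unfamiliar with the Lipinski rule Lum mentions, it is the "rule of five" heuristic for oral drug-likeness: no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, a molecular weight of no more than 500 daltons, and a logP of no more than 5. The sketch below shows what a search for violators might look like on a table of precomputed descriptors. It says nothing about how Ayasdi Cure performs the search; the compound identifiers, descriptor values, and field names are invented for illustration.

```python
def lipinski_violations(compound):
    """Return the names of the rule-of-five criteria this compound breaks."""
    checks = {
        "molecular_weight": compound["molecular_weight"] > 500,  # daltons
        "log_p": compound["log_p"] > 5,
        "h_bond_donors": compound["h_bond_donors"] > 5,
        "h_bond_acceptors": compound["h_bond_acceptors"] > 10,
    }
    return [name for name, broken in checks.items() if broken]


# Hypothetical compounds with made-up descriptor values.
compounds = {
    "cmpd_001": {"molecular_weight": 320.4, "log_p": 2.1,
                 "h_bond_donors": 2, "h_bond_acceptors": 5},
    "cmpd_002": {"molecular_weight": 612.7, "log_p": 6.3,
                 "h_bond_donors": 3, "h_bond_acceptors": 9},
    "cmpd_003": {"molecular_weight": 488.0, "log_p": 4.9,
                 "h_bond_donors": 6, "h_bond_acceptors": 8},
}

flagged = {}
for name, descriptors in compounds.items():
    broken = lipinski_violations(descriptors)
    if broken:
        flagged[name] = broken

print(flagged)
```

In practice chemists often tolerate a single violation, so the list of broken criteria is returned rather than a bare pass or fail.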
“You can imagine this would be very useful all the way until
you find your lead compound,” adds Lum. “And you still might want to go back
and check how your lead compounds do against another database of compounds.”
Pharmaceutical companies like Boehringer Ingelheim, which gave a collaborative
presentation with Ayasdi at Bio-IT World, have already adopted Ayasdi Cure in
their research pipelines.
Ayasdi’s second announcement at the conference was a
partnership with CLC bio (now a part of
QIAGEN), the world’s leading commercial provider of software for first-tier
bioinformatics. CLC bio specializes in turning raw sequencing reads into useful
data files, offering a software architecture that prepares samples for the more
complex, functional type of analysis that takes place in Ayasdi Cure. “Because they
have the structure already in place at CLC bio, and they also have partners
that use them for mapping VCF files and getting sequences out, it’s a win-win
solution for us,” says Lum. The two companies are now offering customers joint
access to both CLC bio’s platforms and Ayasdi Cure.
“We add a lot of value and become an end-to-end solution,” says Lum.
While these new developments are targeted to basic research,
some users are already moving Ayasdi’s platform into clinical settings. With
unprecedented amounts of data on real patients from electronic medical records
and next-generation sequencing, clinical programs are producing some of
Ayasdi’s most exciting results and innovative projects.
At the Icahn School of Medicine at Mt. Sinai in New York
City, Ayasdi has been deployed to rethink some of the most common chronic
diseases in the U.S., asking if patients can be better sorted into subpopulations for more personalized treatment. Mt. Sinai has released the
results of one study of Type 2 diabetes that answers this question with a
In a study of over 11,000 diabetic patients, Mt. Sinai’s
Director of Biomedical Informatics Joel Dudley and his team input nearly
200,000 columns of both genetic markers and clinical data, such as blood levels and
complicating symptoms, into the Ayasdi platform. Their topological analysis
split patients into three major groups with little overlap: one with complications
from kidney disease, one with cardiovascular and immune disorders, and one characterized
by arterial disease and abnormal cholesterol levels. Representatives from both
Mt. Sinai and Ayasdi have suggested that Type 2 diabetes may not be a meaningful
clinical category at all; instead, patients might be better served by thinking
about three entirely new types of diabetes.
“I think this is something that every researcher in diabetes
has thought about, but it’s really hard to get to this answer,” says Lum. “I’m
not surprised at all [by these results]. In fact, I think that if we increased
the resolution*, we might even find more subpopulations. Diabetes is extremely
heterogeneous. There are probably different co-morbidities that affect
diabetes, and these patients probably respond to different kinds of medication,
and maybe have different adverse events.”
This study won the Icahn School of Medicine the Editors’
Choice Award at the 2014
Bio-IT World Best Practices Awards, making Ayasdi the only company involved
in top prizes in both the Best of Show and Best Practices competitions.
Meanwhile, Joel Dudley’s team has been scaling up the Type 2 diabetes study to
30,000 patients, hoping to find underlying genetic variants or other biomarkers
that could help rapidly sort patients into the appropriate categories, and
possibly one day lead to new treatments. Mt. Sinai is also extending
topological analysis to other chronic disease areas.
Ayasdi is hoping to pull off a similar trick with acute
asthma. U-BIOPRED, an international consortium of academic partners, patient
groups, and pharmaceutical companies, has been gathering very detailed data on
a cohort of patients with asthma, including clinical histories and medications,
but also gene expression in blood and lung cells and blood proteomics.
“Acute asthma is very much like diabetes,” says Lum, who has
been recruited to the statistical advisory board of U-BIOPRED. “They know that
there are probably many types of acute asthma, but the only way to [classify
patients] in the clinic today is to blow into this apparatus, measure lung
capacity, and say this person is mild, moderate, or severe. They know that’s
not good enough.” Using Ayasdi’s platform, among other bioinformatics tools,
U-BIOPRED wants to find subgroups that can be identified with blood markers
and treated with unique medication regimens.
Pek Lum, Ayasdi's Vice President of Solutions and Chief Data Scientist. Image credit: Ayasdi
Ayasdi has also taken on some novel sources of data. The Michael J. Fox Foundation recently partnered with Ayasdi
to analyze data from a cell phone app distributed to people with Parkinson’s disease.
The Foundation was not aware of Ayasdi when it first
undertook a trial study with the app, which collects audio data, gyroscopic
information, and information from an accelerometer during the course of users’ everyday movements. In a limited pilot project,
a small group of Parkinson’s patients and controls were given the app, and readings
were recorded every second for a month. When Ayasdi heard about the project,
they were excited to work with this little-explored longitudinal data, especially with such a high volume of data per subject.
“It’s a fairly new type of data, and it’s very, very noisy,”
says Lum. “Imagine anyone carrying a cell phone – you could drop it, you could
pick it up, you could go to bed or read a book. One of the challenges we talk
about is how our platform approaches this kind of data.”
Ayasdi plotted each subject into a topological map based
only on the data read out from the phones. “Not only did we manage to
differentiate between controls and Parkinson’s, we think there are actually two
subgroups,” says Lum. “Those could be mild and severe – we don’t know at this
point, because we don’t know enough about the patients. But there are
definitely two quite distinct subgroups of patients that have slightly
different movements.” Ayasdi also discovered that their maps could be resolved
using the accelerometer data alone, which, if the observation holds in larger
studies, could make later research into patient movement easier. “You can
imagine, in the future, if we do find that the accelerometer is the best, you
can give them a very nice, small wearable device on their wrists or legs that
only measures acceleration.”
Ayasdi is grateful for the experience working with
longitudinal data, which could eventually become important in clinical practice.
Mapping a patient’s movement over time might help to rapidly place them in an
appropriate subgroup, by superimposing their motion data on a known topological map –
and the same could hold true for other clinical measures. Lum imagines care
centers collecting data from their patients in real time, constantly updating
networks to find the best matches and treatments.
Lum and her colleagues think of Ayasdi’s analysis as
fundamentally a new generation of bioinformatics. Traditional analysis, in her
mind, has reached a sort of roadblock, where the limits are due less to the
amount of available data, and more to an inability to find the right questions
to ask. By visualizing data without hypotheses or suspected correlations,
Ayasdi hopes to find the hidden routes through that barrier.
*One common technique for topological analysis with the
Ayasdi software is to change the “resolution” of the networks. A lower
resolution means that columns are sorted into larger bins, while at a higher
resolution, the bins are made smaller. At the graphic level, this means that a
higher resolution network will preserve only the strongest connections between
nodes – groups that are more weakly associated tend to break apart as the bins
are split. This can be used as quality control, to show only the strongest
associations between variables at high resolutions, or only the most basic
subpopulations at low resolutions.
In one creative project, Ayasdi has even used maps at
different resolutions as a sort of data point in and of themselves. With the
FDA, Ayasdi created a network of bacterial species based on their genetic data.
Decrease the resolution, and you see more ancient taxonomic groups; gradually
increase the resolution, and you start to see the evolutionary history of these
species, as more and more closely related species break apart. In this way,
Ayasdi built a phylogenetic tree of this set of bacteria.
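The effect of changing resolution can be mimicked with a toy network. The sketch below is only an analogy: it assumes each connection carries a numeric strength and treats "raising the resolution" as raising the minimum strength a connection needs to survive, whereas the real software adjusts bin sizes. The species names, strength values, and the `min_strength` parameter are invented for illustration.

```python
def groups_at_resolution(nodes, weighted_edges, min_strength):
    """Keep only edges at least `min_strength` strong, then return connected groups."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the group's representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, strength in weighted_edges:
        if strength >= min_strength:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Four hypothetical species and made-up similarity strengths between them.
species = ["s1", "s2", "s3", "s4"]
links = [("s1", "s2", 0.9), ("s3", "s4", 0.8), ("s2", "s3", 0.4)]

low = groups_at_resolution(species, links, min_strength=0.3)
medium = groups_at_resolution(species, links, min_strength=0.5)
high = groups_at_resolution(species, links, min_strength=0.85)
print(low)     # one broad group
print(medium)  # two groups
print(high)    # the most closely related pair stays together longest
```

Reading the three results from low to high gives a nested series of splits, which is the sense in which a sweep over resolutions can be read as a tree.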