Final Keynotes at Bio-IT World Wrestle with Genetic Privacy

By Aaron Krol

May 9, 2014 | Halfway through the closing keynote session of last week’s Bio-IT World Conference & Expo, one could be forgiven for thinking the speakers were planning an elaborate heist at the J. Craig Venter Institute. Yaniv Erlich, of the Whitehead Institute for Biomedical Research, had played a video of himself breaking into a locked area of an Israeli bank using its intercom system, and proceeded to prove that he could track down J. Craig Venter’s name, address, and even ex-wives using only the noted geneticist’s publicly available genomic data. Just minutes earlier, opening speaker Heather Dewey-Hagborg had unveiled a pair of perfume-like spray bottles that, used together, would eliminate DNA traces left behind on touched objects, and replace them with a fake DNA signature. The keynote’s final speaker, Isaac Kohane of Harvard Medical School, had not yet come onstage, but one could imagine he was waiting in the getaway car.

In fact, the session was addressing the issue of genetic privacy, a subject that has been sensitive since people first started volunteering to make their genomes open to the public in the early 2000s, and has only grown more resonant since. Now in its thirteenth year, the Bio-IT World Conference had for the first time added a track on “Data Security,” and this keynote would ring in the event’s last day by reminding attendees why the topic needed to be addressed.

Invisibility

Heather Dewey-Hagborg is a Brooklyn-based artist whose conceptual series “Stranger Visions” illustrates how casually we treat our DNA traces in a world where this material can increasingly be used to personally identify us, and even reveal sensitive truths like non-paternity or rare hereditary disease. “We’re leaving our DNA around all the time, without giving it a second thought,” she says.

For “Stranger Visions,” Dewey-Hagborg collects items like stray hairs and discarded cigarette butts from around Brooklyn, not knowing who left them behind. She extracts DNA from the samples, and uses it to create face casts that try to reconstruct her subjects’ features. The resulting figures have a hypnotic, frozen quality, with docile expressions that underscore the question of their vulnerability.

It’s a labor-intensive process. Dewey-Hagborg learned to use PCR to identify individual SNPs at a New York community lab called Genspace, and checks around 50 different SNPs for each subject by hand. She focuses on loci listed on SNPedia or the 23andMe website that correlate with physical features. The final product, perhaps, shows the limitations of genetic analysis as much as the perils – Dewey-Hagborg’s self-portrait, which she showed the audience at Bio-IT World, bears a sort of family resemblance to the artist, but certainly couldn’t be used to pick her out of a lineup. Still, she’s very serious about the misuse of genetic data in more sensitive contexts, especially in regard to police action.

“Law enforcement now routinely profile individuals convicted of petty crimes, tending toward the permanent retention of biological samples and profiles, and are collecting profiles from people who are arrested and not even convicted of a crime,” she warns. She points out that DNA evidence is now considered a smoking gun even though it rarely tells a conclusive story and can be faked with increasing ease. “We give way too much weight to DNA evidence alone,” she says, adding that there is a “gaping legislative hole that makes all manner of ethically questionable forensic applications totally legal.”

With this circumstance in mind, Dewey-Hagborg used her keynote address to unveil her latest art project, “Invisible.” This is the pair of DNA obfuscation products in spray bottles, which she calls “Erase” and “Replace,” and which will become commercially available in June. Dewey-Hagborg even prepared a promotional video, which has the tenor of the more sensational sort of political attack ads, complete with dissonant music and short clips of newscasters speaking about police collection of DNA, interspersed with snippets of 23andMe television ads, before cutting to Dewey-Hagborg telling the audience, “Sometimes, I wish I was invisible.” The style is tongue-in-cheek, but the concern is genuine.

Yaniv Erlich turned the conversation from the DNA we accidentally leave behind, to the information we deliberately make public. Hundreds of thousands of people around the world have now had some amount of their DNA sequenced or genotyped for either research projects or commercial services. While only a small minority, including James Watson, the ten founding members of Harvard’s Personal Genome Project, and of course J. Craig Venter, have chosen to make their data fully public, many more make their anonymized data available for research purposes. Erlich has shown repeatedly that, when it comes to genetics, data can never be truly non-identifying.

Erlich once worked in cyber security – hence his bank break-in – and has since turned the same skills toward online genetic data. In his address, Erlich demonstrated how sites like ysearch.org, which assists in genealogy studies by letting users probabilistically link Y chromosome haplotypes to surnames, can dramatically narrow the odds of connecting public DNA data to a single individual. Ysearch.org takes advantage of the fact that Y chromosomes and last names both travel through the male line. In most cases, said Erlich, the surnames retrieved in these searches occur in fewer than one in four thousand people. When combined with non-HIPAA-protected information like age and state of residence, this pool shrinks to just twelve people on average; in J. Craig Venter’s case, which Erlich used as a non-sensitive test run, the Y haplotype, age and state alone are enough to narrow the field to just two individuals who match Venter’s profile.

The Perils and the Promise

Erlich has shared these results before, feats that have earned him the dubious label of “genome hacker.” Just yesterday, he published a commentary in Nature Reviews Genetics, with coauthor Arvind Narayanan, that provides a general overview of routes by which genetic privacy can be compromised, and possible motives for doing so.

Lately, however, Erlich has also been revealing more hopeful results from the intersection of genomics and social media.

This is the contradiction of genetic privacy: work that exposes vulnerabilities in public DNA data, like Dewey-Hagborg’s art projects and Erlich’s widely reported genome re-identification, draws immediate notice and instinctive anxiety. Meanwhile, valuable research that relies on access to partially public genetic datasets can rarely capture the same attention.

The last keynote speaker at the Bio-IT World Conference, Isaac Kohane, worried about what will be lost if well-meaning privacy advocates convince people to withdraw their genetic data from the public sphere. Kohane has first-hand experience with the value of public medical data. At the Partners HealthCare network of Boston-area care centers, Kohane directs the i2b2 program, which consolidates a huge variety of clinical data across points of care, in an anonymized format, to give researchers rapid access to information that is normally too sensitive to view without special permissions.

The i2b2 infrastructure has provided major insights with direct consequences for public health. Kohane recounted one study that examined seasonal flu events segregated by age, and revealed that flu cycles tend to cluster first among very young children, before reaching the elderly, the population where the disease is most likely to be fatal, two weeks later. This study paved the way for Massachusetts to lower the vaccination age to three years old, cutting off this cycle at the start. When it comes to health-related data of all kinds, Kohane said, “Privacy’s at stake, no doubt about it, and we care a lot about privacy. Autonomy’s at stake, and we care a lot about autonomy. But personal and public health is also at stake, and I want to emphasize this.”

Kohane feels strongly that fears about genetic data are out of proportion to its sensitivity. Showing a diagram of the data that people routinely generate – including on social media profiles, through credit card transactions, and in their interactions with the hospital system – he proclaimed that “DNA is the smallest part of all these data.” While acknowledging that nefarious uses of genetic data are possible, Kohane advocated for targeted legislative fixes that would prevent wrongful use without dismantling the public availability of this crucial information for research.

Erlich’s final presentation gave some support to Kohane’s argument. In a more recent project, Erlich’s lab has mined a public genealogy site, geni.com, for answers to a fundamental genetic riddle: are traits additive, meaning that each genetic variant contributes slightly and independently to the trait, or are they epistatic, meaning variants rely on interactions with each other to affect function?

Erlich pointed out that, to test these two hypotheses, one would need a large set of individuals with different, known degrees of genetic relation to one another. Under an additive model, as we move from siblings to aunts and uncles to cousins to second cousins, individuals should grow less similar at a steady rate, as half of the shared variants are lost at each step. Under the epistatic model, however, similarity should fall off far more steeply, as sets of interacting variants are broken up much faster.

Erlich’s lab used longevity as an example trait, and gathered a huge number of profiles – 44 million individuals – from geni.com’s genealogical trees. After cleaning and sorting the data, they were able to resolve these profiles into a small number of very large trees, the largest of which joined 13 million people together in carefully mapped out lines of descent. With this basic dataset, the lab computed how quickly longevity diverged as relationships grew more distant, and found that their data closely fit the additive model, while rejecting a more complex epistatic model. The trend was even more pronounced once data on twins was added from the Danish Twin Registry.
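The contrast between the two models can be made concrete with a small calculation. The sketch below is not the lab’s analysis; it uses the textbook simplification that additive variance is shared between relatives in proportion to their coefficient of relationship r, while variance from pairwise interactions between variants is shared in proportion to r squared. The function names and the 0.3 heritable share are hypothetical values chosen purely for illustration, and dominance and shared environment are ignored.

```python
from fractions import Fraction

# Textbook coefficients of relationship for the relative pairs named above.
RELATEDNESS = {
    "siblings": Fraction(1, 2),
    "aunt/uncle": Fraction(1, 4),
    "first cousins": Fraction(1, 8),
    "second cousins": Fraction(1, 32),
}


def expected_correlation(r, additive_share, pairwise_share=0.0):
    """Expected trait correlation between relatives with relatedness r.

    additive_share: fraction of trait variance from independent variant effects
                    (shared between relatives in proportion to r).
    pairwise_share: fraction of trait variance from pairwise interactions
                    between variants (shared in proportion to r**2).
    Both shares are illustrative assumptions, not estimates from any study.
    """
    r = float(r)
    return r * additive_share + r ** 2 * pairwise_share


def relative_to_siblings(r, additive_share, pairwise_share=0.0):
    """Expected correlation, expressed as a fraction of the sibling value."""
    sibling = expected_correlation(
        RELATEDNESS["siblings"], additive_share, pairwise_share
    )
    return expected_correlation(r, additive_share, pairwise_share) / sibling


if __name__ == "__main__":
    for name, r in RELATEDNESS.items():
        additive = relative_to_siblings(r, additive_share=0.3)
        interacting = relative_to_siblings(r, additive_share=0.0, pairwise_share=0.3)
        print(f"{name:15s} r={str(r):5s} additive={additive:.4f} interacting={interacting:.4f}")
```

Under these assumptions, first cousins retain a quarter of the sibling correlation if effects are purely additive, but only a sixteenth if the trait is driven entirely by pairwise interactions. A gap of that kind is what a dataset of many relative pairs at known distances can distinguish.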

This finding, with large potential implications for our understanding of our basic genetic architecture, would have been impossible without a massive, freely available source of individual data like the geni.com family trees. As a group, the keynote speakers at Bio-IT World struggled to reach consensus on how public genetic resources could be maintained, while offering privacy measures that will protect potential contributors. Kohane hopes small-bore legislative action can protect against malicious use of data, but as an audience member quickly pointed out, Congress has already fallen short in this regard with the Genetic Information Nondiscrimination Act, which failed to prohibit insurance companies from denying life or disability coverage on the basis of genetic information.

It also remains an open question whether legislation could ever be sufficient protection for something as ubiquitous as DNA. “If I want to analyze your DNA, it would be almost impossible to stop me from finding some trace of you that you’ve left behind, and analyzing it in my kitchen,” said Dewey-Hagborg. “So legislation will only take us so far.”
