Taking a World Genography Test

Interview by Kevin Davies

June 14, 2005 | In April, IBM announced that it was partnering with the National Geographic Society to provide the computational and IT infrastructure for the Genographic Project. The 5-year, $40-million program is intended to type DNA samples from indigenous populations around the world, thereby helping to recreate the movement and development of human populations around the globe over the past 100,000 years. The IBM Program Director is Kristopher Lichter. An immunologist by training, Lichter started in biotech before joining IBM six years ago. As Kevin Davies discovered, Lichter hopes his combination of life science and IBM experience will enable him to manage IBM’s multifaceted contributions to the project.

Q: How did IBM get involved in the National Genographic Project?
A: National Geographic approached us. Spencer Wells had finished his pilot project and was looking to open the results to the world and get a larger sample population. They realized that, to be as comprehensive and as scientifically valid as possible, they needed an IT partner — someone with life science expertise who could understand why IT systems could be applicable. Their first choice was to come to us, over a year ago. We had many discussions, but we felt we were compelled to do this. We were instantly excited.

Why so much excitement for such a purely scientific project?
It really represents the type of innovative project that impacts the world in a positive manner — whether it’s through our life science expertise, computational biology center and advanced algorithms, or general IT solutions to make sure the system is as advanced, secure, and flexible as it should be. This project will be evolving over five years. This is exactly what we’re about. It’s in our DNA!

What’s the response been like inside the company?
This is why I work with IBM, to move the dial in a positive way. We’ve received hundreds of e-mails from colleagues. A note went out from [CEO] Sam Palmisano encouraging people, and the response from IBM employees has been overwhelming.

Was there any internal debate about what IBM gets out of this?
Of course, there’s been discussion, but we actually expanded the scope of our participation over the course of that discussion. The National Genographic Project had a less expansive role in mind for us originally, but during that discussion, we said, we have this Computational Biology Center (CBC), and that inspired the internal and external decision-making process. There was never any doubt we would do this. It advances our application of technologies from the CBC. We get to learn, and learning is part of who we are. We can apply these advanced technologies. And you never know what you’ll learn along the way.

What is the nature of IBM’s contribution?
IBM will be participating through IBM Research, Carol Kovac’s Life Sciences organization, and our CIO’s office, which provides the IT design systems. It takes a lot of coordination. We have to work well with the National Geographic Society and principal investigators around the world. The project needs to move forward with the integrity it requires. Also, through the IBM Foundation, we are directly funding the project, along with the IT systems, people, and time to make the project come together: [CBC Director] Ajay Royyuru, the CBC’s time, the technologies, and so forth.
We’re setting up a project that’s at least five years long. We intend to turn this data out to the public; it’s owned by the world at large. People will come along after us. We’re just trying to plant a very strong sapling.

Which IT and informatics systems are you contributing to the project?
We’re developing a state-of-the-art analytical capability, and as the field evolves, we’ll work to make those technologies better. As the IT partner, IBM will always be providing a system one step ahead of where the scientists need to be. We’ll look to apply Blue Gene if it’s relevant. The central repository will house all electronic samples. We have the data collection systems of each of the principal investigators, and the communication protocols to make sure the data are securely transferred and stored at the National Geographic Society headquarters [in Washington, DC]. Those IBM systems include a Linux-based BladeCenter and eServer xSeries linked with DB2 and WebSphere, as well as the MQ product for communication. There’s more than 1 terabyte of attached storage.

What type of DNA data are you collecting?
It’s specifically Y-chromosome and mitochondrial [paternally and maternally inherited] DNA, to make sure we keep to specific nonmedical markers that are strictly built around descent. This helps people understand the scope of the project.

What’s the purpose of the $99.95 public participation kits?
I do think people are fascinated to know about common roots. You can become an associate researcher in this project. It’s a way for the public to get involved while learning about their deep ancestry. The response has been overwhelming. It adds to the sample base and the science. It complements the indigenous populations we’re working with. There’s a growth to the project. There’s also a feeling among the team that the legacy project needed to be done. Proceeds [from sale of the public participation kits] go back to critical infrastructure to help indigenous populations. Everybody wins. It helps the public get involved.

How will the data be shared with the research community?
Once we’ve done the initial scans, the data will be released to the broader community. It’s hard to know when that will be, since it depends on sample collection, analysis, et cetera, but it will become public domain, a commons ownership. We’re not patenting or owning the data. We have anthropologists, paleontologists, and linguists on the advisory board to keep the broader picture in mind. The first results will be published by the Genographic Project team; that will be the first glimpse of the data.

With this mountain of data, isn’t there a temptation to apply it for medical use, in a sort of global biobank?
Absolutely not. The integrity of the [Genographic] project is enough. There’s a tremendous amount to be gained by doing valid science for the world, letting that be owned by everyone. Let it be the legacy for the planet. It [medical use] doesn’t compute.

How are you avoiding the controversy and pitfalls that befell the Human Genome Diversity Project (HGDP) a decade ago?
In general, there are high-level goals that are similar [to the HGDP], using genetics to understand humanity better. That’s common, but there are specific differences. The specifics of this project, the clarity of the mission — we’re just studying the human journey; there’s no medical research, no intention of that at all. We won’t be owning the data.

We have indigenous leaders on the advisory board; it’s very important to be working with them right off the bat. There’s no Genographic Project without the advice and leadership of those leaders. It’s that simple. There was never a point at which we weren’t going to be applying cultural sensitivity. The institutional review board at the University of Pennsylvania, to which we applied, stressed that this was an important part of the project. How we do the sample collection, how we articulate the project (on a strictly voluntary basis), or how the funding gets applied: we look to the indigenous leaders to tell us.

Spencer Wells has done a lot of very positive groundwork in his previous studies — there is trust there. In general, the response has been extremely positive. Scientists in regions not directly involved, in the field, say we’ve done a very solid job of teaching the scope of the project. It’s been extremely positive.

