Eugene Myers prepares to BLAST off again.
By John Russell
January 20, 2009 | The gratuitous use of BLAST here will probably make Eugene (Gene) Myers cringe. Not that the Howard Hughes Medical Institute (HHMI) investigator doesn’t want to repeat his past success, this time in imaging informatics. Myers and colleagues at the NIH developed BLAST in the 1990s to cope with the rising tide of sequence data. After writing an audacious proposal for whole-genome shotgun sequencing of the human genome, he lobbied to join Craig Venter at Celera, where as vice president of informatics research he put his computational chops on the line.
Having returned east from UC Berkeley in 2006 to become one of the founding investigators at HHMI’s Janelia Farm (see “Marshall’s IT Plan for Janelia Farm,” Bio•IT World, October 2006), Myers is again wading into a data flood. But instead of dealing with the puny 3 billion base pairs of human DNA, he is now tackling the 4.2 trillion voxels required to image a mouse’s brain. Cheaper, better imaging technology promises to light up the molecular landscape of living systems, with far-reaching impact on basic research, drug development, and clinical practice.
Myers points to two major technology jumps in microscopy in recent years. First is the ability to capture information digitally. “CCD (charge-coupled device) detectors that are cheap and high resolution and sensitive are a recent development. We’ve only had really great ones for the last five or so years,” he says. Second, he points to the development of genetically encoded fluorophores that cells can express. “We have the entire genomes of many organisms so that we can make any part of the genome glow and any part of protein that gets produced by a genome, we can make glow.”
“Now, we have the opportunity to observe in vivo, in situ, in the cell directly, the expression of genes. This is much different than [microarrays], which just give you a readout for how much there is in some gemisch of cells. Here you can go into individual cells and see where it is, and see how much of it there is, and what the distribution is. This is much higher dimensional data, and qualitatively much more interesting information. Of course it’s harder to get, but it is getting progressively easier to do this.”
The key problem, as Myers puts it, is that, “People have the ability to generate lots of data but they don’t have the software infrastructure for handling it. Basically people are rolling their own and in a lot of cases they’re asking for help.” The situation is not unlike genomics in the ’80s, he says: a rapidly emerging technology with software development playing catch-up.
“That’s kind of why I’ve entered the arena,” says Myers. “There are new computational challenges and problems, some of which can be solved in part by using existing methods in the imaging literature, but they are not solved adequately by those techniques. Moreover, nobody is in a position to deploy these things in the contexts that are arising.”
“I kind of placed my bet. I’ve sat myself down right next to the biologist… that’s really the only way that it works. If you try to do it remotely, it doesn’t work very well. You have to really immerse yourself in the pipelines that are producing the data.”
Myers hopes to hit another home run like BLAST, but the work is challenging. “I’m having more trouble finding the killer app, but I’m convinced that a couple of killer apps will ultimately emerge,” he says. Less certain is whether this field will become as big as genomics, because it is more technology-intensive and cross-disciplinary.
Myers believes imaging will be a bigger driver of knowledge in cell biology, protein interactions, behavior, and systems biology. “It’s going to create opportunities that don’t otherwise exist and create knowledge that it would be very, very hard to get by looking at an array of numbers that tell you kind of generally how [gene] expression went up or down in a collection of thousands of cells.”
Myers knows something about wrestling with data challenges from his Celera days, so he’s not fazed by what lies ahead. “We’re going to generate data sets that are larger than the Celera data set,” he says. The data will fall into three broad categories: 1) images from inside cells, showing proteins and other components; 2) images of collections of cells, showing how they organize and interact; and 3) video or other imaging data capturing the behavior of the resulting systems, such as a mouse moving or a fly flying.
Imaging an entire mouse brain produces 4.2 trillion voxels, or terabytes of raw data to interpret. “That’s a big number,” jokes Myers. Interpreting those data begins with a large Beowulf cluster. “The unusual thing about our cluster, unlike say a Google-type cluster, is that our machines are very high memory. We have more expensive machines and the reason for that is that images are very large and you want to operate on large 3-dimensional arrays. It’s really kind of a convenience; we’re buying our way out of a hole rather than really struggling with it.”
Myers applied a similar buy-their-way-out strategy a decade ago at Celera. “We were buying a 64-gigabyte memory, which was actually one of the largest commercial memories you could buy. Now we routinely have quite a few processors with those big memories on them. The smallest memory on any of our machines is 8 gigs.” Moving the data is another headache. “You’ve got to get 4.2 terabytes out of the disk system to the various processors. So you have to move huge volumes and so distributed file systems are very important to use.”
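The storage figures above can be checked with a little arithmetic. The article equates 4.2 trillion voxels with 4.2 terabytes, which implies one byte (8 bits) per voxel. The sketch below assumes that, along with decimal units (1 GB = 10^9 bytes), and ignores the operating system and the working copies a real algorithm would need. The function names are hypothetical, chosen for illustration, and are not drawn from any Janelia software.

```python
# Back-of-envelope sizing for a whole-mouse-brain image volume.
# Assumptions (not stated in the article): 1 byte per voxel (8-bit intensities)
# and decimal units, 1 GB = 10**9 bytes. All names here are hypothetical.

GB = 10**9
TB = 10**12
MOUSE_BRAIN_VOXELS = 4_200_000_000_000  # 4.2 trillion voxels, from the article


def dataset_bytes(n_voxels: int, bytes_per_voxel: int = 1) -> int:
    """Raw size of an image volume in bytes."""
    return n_voxels * bytes_per_voxel


def nodes_to_hold(total_bytes: int, node_memory_bytes: int) -> int:
    """Minimum number of machines whose combined RAM can hold the volume,
    ignoring the OS and any working copies an algorithm would need."""
    return -(-total_bytes // node_memory_bytes)  # integer ceiling division


def max_cube_edge(budget_bytes: int, bytes_per_voxel: int = 1) -> int:
    """Largest edge length n such that an n x n x n volume fits in budget_bytes."""
    n_voxels = budget_bytes // bytes_per_voxel
    edge = round(n_voxels ** (1 / 3))
    # Correct floating-point error so that edge**3 <= n_voxels < (edge + 1)**3.
    while edge ** 3 > n_voxels:
        edge -= 1
    while (edge + 1) ** 3 <= n_voxels:
        edge += 1
    return edge


if __name__ == "__main__":
    total = dataset_bytes(MOUSE_BRAIN_VOXELS)
    print(f"raw volume: {total / TB:.1f} TB")  # 4.2 TB
    print("64 GB nodes needed:", nodes_to_hold(total, 64 * GB))  # 66
    print("8 GB nodes needed:", nodes_to_hold(total, 8 * GB))  # 525
    print("largest 8-bit cube in 8 GB:", max_cube_edge(8 * GB), "voxels per side")  # 2000
```

Under these assumptions, simply holding the whole volume in RAM would take 66 machines with 64 GB each, while a single 8 GB node fits a cube of about 2,000 voxels per side. That gap helps explain why moving pieces of the volume through a distributed file system matters so much.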
Myers plans to release his tools as open source. “It’s been extremely tricky to go commercial anyway in the scientific enterprise,” he says. “It’s really hard to find an edge where you have exclusivity and customers willing to pay the requisite overhead. I’m more interested in getting stuff out there.”
Beyond his own research, Myers sees huge opportunities for imaging in diagnostics, while stressing that he’s merely a scientist, not a physician. He cites work done a few years ago by Walter Schubert of the Molecular Pattern Recognition Research Group at the Institute of Medical Neurobiology, Otto-von-Guericke University Magdeburg (Germany), who stained cells for molecular targets. “As soon as you put down this marker for basically the presence of an antibody, you could see that the antibody in one case had penetrated into the skin layer and in the other it hadn’t.”
In cancer, Myers says, it will be possible to label tumor tissue with specific reagents and look for markers. “It will be about the distribution of that protein and its presence in certain cells that you can’t get from an expression array. It’s that high-dimensional aspect of actually seeing the distribution and the pattern in an actual histological context that will tell us what the disease will be.”
High-throughput microscopy will enable pharma teams to screen the uptake of a chemical into thousands of cells, with a digital readout of whether each cell is dividing.
Myers’ two major ongoing research projects both involve C. elegans: developing a single-cell expression atlas of the worm and studying the biophysics of mitosis. Further interests include developing a complete light-level atlas of the fly brain, including its complete developmental trajectory. His group is also working on behavioral scans, recording mouse whisker movements while the animal is studied electrophysiologically, and is trying to build a high-throughput microscope that captures high-dimensional images of entire brain volumes, in order to understand, stochastically, the fine-grained flow of neuronal information.
Asked to divulge more about the microscope, Myers demurs. “I want to keep my edge for a while longer.”
This article also appeared in the Jan-Feb 2009 issue of Bio-IT World Magazine.