Eugene Myers Prepares to BLAST Off Again



IMAGING INFORMATICS

By John Russell

Oct. 23, 2008 | OK, the gratuitous use of BLAST here will probably make Eugene (Gene) Myers cringe – not that the Howard Hughes Medical Institute investigator doesn’t want to repeat his past success, this time in imaging informatics. Myers and distinguished colleagues at the NIH invented BLAST to cope with the growing flood of sequence data that was then befuddling researchers. He later served as VP Informatics Research at Celera and spearheaded development of its whole-genome shotgun sequencing protocol and algorithms.

Tough acts to follow, for sure. Today, Myers is again wading into a data flood, but instead of dealing with the puny 3.1 billion base pairs of human DNA, he is tackling the 4.2 trillion voxels that imaging a mouse’s brain will produce – in just one week of the project. Affordable and improved imaging technology is promising to light up the molecular landscape of living systems with far-reaching impact on basic research, drug development, and in the clinic. Myers spoke with Predictive Biomedicine editor John Russell about some of the opportunities and challenges imaging and imaging informatics present.

JR: Maybe you could start with why imaging informatics is becoming more important.

Myers: The imaging stuff is really just that. It’s about microscopy; it’s about what you can see in a microscope at various levels of resolution, either with a light microscope or an electron microscope (EM). A lot of the resurgence, frankly, is with respect to light as opposed to EM, although there are some interesting developments in terms of EM and increasing the throughput with which you can collect data.

We’ve had microscopy forever, right, but what’s changed technologically are a number of factors. One is that you can capture the information digitally. We need to appreciate that. CCD (charge-coupled device) detectors that are cheap and high resolution and sensitive are a recent development. We’ve only had really great ones for the last five or so years.

The other development that’s been very important is, how should one say it, the development of genetic fluorophores that can be expressed and produce fluorescence, and that we have the entire genomes of many organisms so that we can make any part of the genome glow and any part of protein that gets produced by a genome, we can make glow.

So now, we have the opportunity to observe in vivo, in situ, in the cell directly, the expression of genes. This is much different than the chips [microarrays] which just give you a readout for how much there is in some gemisch of cells. Here you can go into individual cells and see where it is, and see how much of it there is, and what the distribution is. This is much higher dimensional data, and qualitatively much more interesting information. Of course it’s harder to get, but it is getting progressively easier to do this.

JR: Are many people actually doing that?

Myers: This is going on all over the place. We have the “recombineering” now so we have ways for producing all these constructs. My friend Tony Hyman did one of the first global surveys with light of the cell division in C. elegans. That was several years ago. Erin O’Shea, who I believe is now at Harvard, did a genome-wide screen of the expression of proteins in yeast. We’re now doing the same thing with the fly here at Janelia [Farm] in terms of the brain, and at Berkeley they are doing the rest of the genome. So these kinds of things are going on and everybody understands it.

So it’s being able to capture the information digitally, it’s being able to literally illuminate molecularly the cells. Basically label things based on molecular markers. And until recently, we may have had GFP [green fluorescent protein] but we couldn’t put the GFP anyplace we wanted to until we had the genomes, right? Really we’ve only had these genomes since 2001. Finally, I think another thing that can’t be overlooked is there are a lot of very interesting developments in terms of optical physics that we can do with light [such as] structured illumination and spinning disk confocal technology. These are all recent new uses of light.

JR:  What are some of the challenges in terms of handling and interpreting all this data?

Myers: That’s exactly the point. People have the ability to generate lots of data but they don’t have the software infrastructure for handling it. Basically people are rolling their own and in a lot of cases they’re asking for help. It’s just the way it was with genomics data in the ’80s, where it was, “holy cow, how do I do anything, how do I compare these sequences, what can I do with these things? Can you tell me if this protein is like that protein?” It was basically an area where the technology was coming fast and the software wasn’t there.

That’s kind of why I’ve entered the arena. There are new computational challenges and problems, some of which can be solved in part by using existing methods in the imaging literature but they are not solved adequately by those techniques; moreover nobody is in a position to deploy these things in the contexts that are arising.

So I kind of placed my bet. I’ve sat myself down right next to the biologist because my experience is that’s really the only way that it works. If you try to do it remotely it doesn’t work very well. You have to really immerse yourself in the pipelines that are producing the data.

JR: Would you expect tools and algorithms to emerge from your work, and would you make those available as open source?

Myers: That’s the plan. I hope we come up with some home runs – like BLAST.

JR: How far along is the work?

Myers: It’s a little bit harder in this case. I’m having more trouble finding the killer app, but I’m convinced that a couple of killer apps will ultimately emerge. One thing that’s not 100 percent certain is [whether] this business is going to get as big as the genomics business. I think it’s going to get very big but I don’t know if it’s going to get as big or as popular because it is somewhat more technology-intensive and cross-disciplinary.

JR:  You mentioned in Toronto (ISMB 2008 keynote, July 2008) that you thought imaging data would become the driver in biomedical research and would overwhelm, at least in quantity, genomics data.

Myers: Well I think it will be a bigger driver of the development of new knowledge, new insights into how cells are working, how proteins are interacting and what they are doing in cells and systems. That’s my bet. It’s going to create opportunities that don’t otherwise exist and create knowledge that it would be very, very hard to get by looking at an array of numbers that tell you kind of generally how [gene] expression went up or down in a collection of thousands of cells.

I think a huge amount of knowledge is going to be generated. I don’t want to say there won’t be important things learned by biochemistry or other methods that are currently under use. We are learning a great deal. I just think that this is going to become a major source of information.

JR: I was thinking about some of the IT challenges and thinking back to Celera where you had all these HP machines humming away in the background trying to crunch through the data. What are the IT challenges here?

Myers:  We’re going to generate data sets that are larger than the Celera data set. You may remember that I gave three levels of the problem [speaking in Toronto]. One class of work is looking inside the cell, to see what proteins and what various elements are in the cell; the other level is looking at collections of cells to see how they are organizing themselves and understand what kind of cell types you have and how are those cell types interacting. Finally the third level is to actually be interpreting video or other imaging data that indicates the behavior of the resulting systems like a mouse moving or a fly flying.

For example one thing is this idea of whole brain imaging in the mouse. That data set, for one brain, is going to be 4.2 trillion voxels. OK, so the human genome was 3 billion. The total amount of data in one of these brain data sets – and we will collect it in less than one week if we hit our target – is 4.2 trillion voxels (about 4.2 terabytes) of raw data that needs to be interpreted. That’s a big number.
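[Editor's note: The scale comparison above can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted in the interview; the one-byte-per-voxel figure is an assumption made here for illustration (it matches the 4.2 terabytes mentioned later in the interview), and the function names are hypothetical.]

```python
# Back-of-the-envelope comparison of the figures quoted in the interview.
# BYTES_PER_VOXEL is an assumption for illustration, not a quoted figure.

VOXELS_PER_BRAIN = 4_200_000_000_000  # 4.2 trillion voxels (quoted)
HUMAN_GENOME_BASES = 3_000_000_000    # "the human genome was 3 billion" (quoted)
BYTES_PER_VOXEL = 1                   # assumed: one 8-bit intensity per voxel


def voxels_per_base(voxels: int, bases: int) -> float:
    """How many voxels one brain data set holds per base pair of the genome."""
    return voxels / bases


def raw_terabytes(voxels: int, bytes_per_voxel: int) -> float:
    """Raw size in decimal terabytes (1 TB = 10**12 bytes)."""
    return voxels * bytes_per_voxel / 1e12


if __name__ == "__main__":
    ratio = voxels_per_base(VOXELS_PER_BRAIN, HUMAN_GENOME_BASES)
    size = raw_terabytes(VOXELS_PER_BRAIN, BYTES_PER_VOXEL)
    print(f"voxels per base pair: {ratio:.0f}")
    print(f"raw size at {BYTES_PER_VOXEL} byte/voxel: {size:.1f} TB")
```

[Under that assumption, one brain holds about 1,400 voxels for every base pair in the human genome. A deeper voxel format would scale the byte count proportionally.]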

How are we going to interpret that? So yeah, we’ve got our Beowulf cluster, a big one, downstairs. And the unusual thing about our cluster, unlike say a Google type cluster, is that our machines are very high memory. We have more expensive machines and the reason for that is that images are very large and you want to operate on large 3-dimensional arrays. It’s really kind of a convenience; we’re buying our way out of a hole rather than really struggling with it. So giving ourselves the ability to handle large object instances by being able to accommodate large memory is a good idea. A lot of our things can be done in linear sweeps, but there’s a lot of data. It’s three dimensions and treating the boundaries on a three-dimensional grid is very difficult.
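[Editor's note: To illustrate why boundaries become the hard part once a volume no longer fits in memory, here is a minimal sketch of slab-wise processing with overlap ("halo") planes. It is not a description of the Janelia software. The filter (a simple mean along the z axis), the function names, and the slab sizes are all assumptions chosen for illustration. Each slab is read with extra planes on either side so that voxels near the slab edge see the same neighbors they would in the full volume.]

```python
# Illustrative only: a z-axis mean filter applied slab by slab, with halo
# planes so that slab boundaries give the same result as the whole volume.
# A "plane" here is a flat list of voxel values; the volume is a list of planes.

def slab_ranges(depth, slab, halo):
    """Yield (core_start, core_stop, read_start, read_stop) along z."""
    for start in range(0, depth, slab):
        stop = min(start + slab, depth)
        yield start, stop, max(0, start - halo), min(depth, stop + halo)


def box_filter_z(planes, radius):
    """Mean over a window of +/- radius planes, truncated at the volume ends."""
    depth = len(planes)
    out = []
    for z in range(depth):
        lo, hi = max(0, z - radius), min(depth, z + radius + 1)
        window = planes[lo:hi]
        out.append([sum(vals) / len(window) for vals in zip(*window)])
    return out


def box_filter_z_in_slabs(planes, radius, slab):
    """Same filter, but holding only one slab plus its halo at a time."""
    out = []
    for start, stop, read_start, read_stop in slab_ranges(len(planes), slab, radius):
        chunk = planes[read_start:read_stop]   # core planes plus halo
        filtered = box_filter_z(chunk, radius)
        # Keep only the core planes; halo planes were read for context only.
        out.extend(filtered[start - read_start:stop - read_start])
    return out


if __name__ == "__main__":
    volume = [[float((z * 7 + i * 3) % 11) for i in range(6)] for z in range(10)]
    whole = box_filter_z(volume, radius=2)
    tiled = box_filter_z_in_slabs(volume, radius=2, slab=3)
    print("slab result matches whole-volume result:", whole == tiled)
```

[The halo must be at least as wide as the filter radius. In three dimensions the same bookkeeping is needed on every face, edge, and corner of each block, which is one reason large-memory machines are the more convenient route.]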

Of course in 1998 that’s what we did at Celera too – we tended to buy our way out, and at that time we were buying a 64-gigabyte memory, which was actually one of the largest commercial memories you could buy. Now we routinely have quite a few processors with those big memories on them. The smallest memory on any of our machines is 8 gigs.

The other aspect from an IT perspective – and I don’t think this is news – but it’s clear that moving the data is a real bottleneck. So we’re talking about just to do a compute on this thing, you’ve got to get 4.2 terabytes out of the disk system to the various processors. So you have to move huge volumes and so distributed file systems are very important to use. We have one here and we use it.
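[Editor's note: A rough sense of the data-movement bottleneck comes from dividing the data set size by the available read bandwidth. In the sketch below only the 4.2-terabyte figure comes from the interview. The per-stream bandwidth and the number of parallel streams are hypothetical values chosen for illustration, and the function name is likewise an assumption.]

```python
# Rough read-time estimate for one pass over the data set.
# Only DATASET_BYTES comes from the interview; the bandwidth and stream
# count used below are hypothetical illustration values.

DATASET_BYTES = 4.2e12  # 4.2 terabytes (quoted), decimal units


def read_time_seconds(total_bytes, bytes_per_second_per_stream, streams=1):
    """Time for one full pass, assuming reads scale perfectly across streams."""
    if bytes_per_second_per_stream <= 0 or streams < 1:
        raise ValueError("bandwidth must be positive and streams >= 1")
    return total_bytes / (bytes_per_second_per_stream * streams)


if __name__ == "__main__":
    per_stream = 100e6  # hypothetical: 100 MB/s sustained per stream
    for streams in (1, 64):  # hypothetical stream counts
        seconds = read_time_seconds(DATASET_BYTES, per_stream, streams)
        print(f"{streams:>3} stream(s): {seconds / 3600:.2f} hours")
```

[With these assumed numbers a single stream needs many hours for one pass, while many parallel streams bring it down to minutes. Perfect scaling is itself an optimistic assumption, which is why a distributed file system matters here.]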

JR: Will you make tools available through open source or commercially?

Myers: I’m probably going to do open source. I think that it’s been extremely tricky to go commercial anyway in the scientific enterprise. It’s really hard to find an edge where you have exclusivity and customers are willing to pay the requisite overhead.  I’m more interested in getting stuff out there.

JR: What are your thoughts on informatics challenges facing next-generation sequencing technologies?

Myers: Most of what’s going on with the next-gen is [that] the instrument itself is producing data at such a high rate that it’s difficult to keep up. So one designs tiered systems and triages very simple algorithms that get the job done. If you have anything that’s really complex, then it’s hard to keep up with the data. It’s literally hard to keep up in real time with the data. On the instrument you’ve got to keep up, and you’ve got to do something in the instrument because you can’t just dump all that data out on a wire or overwhelm the consumers.

So I think that’s one thing. The other thing that’s kind of overwhelming because of the amount of data is to do the basic problems like assembly and analysis. I think those things are getting more and more challenging, and in a way I do think it represents a niche for a really good commercial entity to come in and really engineer and solve those problems. The problem with the academic enterprise is after a certain point there isn’t sufficient reward. So far, it still seems to be going the kind of academic open-source way. If a commercial entity comes in and offers users something that’s of sufficient value, then that’s a good way to go.  

JR: How do you expect imaging will be used in biopharmaceutical research or in healthcare?

Myers: Let me try to answer your question with that caveat about my expertise and focus – I’m not a physician, I am a scientist. I think there’s a huge opportunity in diagnostics, in being able to molecularly mark a tissue sample. For example Walter Schubert, four or five years ago, was staining cells for particular molecular targets and actually taking histological views of those at 63x. I forget the two conditions [but] they were presenting gross symptoms exactly the same; you couldn’t tell any difference. But as soon as you put down this marker for basically the presence of an antibody, you could see that the antibody in one case had penetrated into the skin layer and in the other it hadn’t; so you have a very clear marker and you get it by looking in a microscope at the two samples.

It’s also the case that when you look at cancerous cells, you’ll be able to look at a cancer and mark it with certain reagents or proteins and look for markers, and we won’t do it by doing an expression assay; it will be about the distribution of that protein and its presence in certain cells that you can’t get from an expression array. It’s that high-dimensional aspect of actually seeing the distribution and the pattern in an actual histological context that will tell us what the disease will be, and it’s all going to be with this kind of stuff.

At this point, we can literally watch the uptake of a chemical into a cell and literally screen thousands of cells. So another thing pharmas might end up doing or may already be doing is to use high-throughput microscopy where in 384 wells you have samples you’re applying various chemicals to and can read out digitally whether the cell is dividing or not.

JR: Can you discuss the progress of your group’s projects?

Myers: It’s still early days. Most of them are recent projects that we’ve been working on for maybe a year or a year and a half and the one mature one for maybe three years. The oldest one is developing a single cell expression atlas of the worm, a kind of cellular level project. Another is we’ve been looking at the biophysics of mitosis with Tony Hyman, again, in the first division of C. elegans, although it could pretty much be anything. That’s an example of an intracellular project we’re working on. Here at Janelia the projects include developing a complete light level atlas of the fly’s brain with its complete developmental trajectory.

We’re also working on behavioral scans involving observing the whiskers of a mouse while it’s being recorded electro-physiologically with probes, and then the other thing that we’re doing is obviously we’re trying to build this high-throughput microscope to capture high-dimensional, entire volumes of brains to understand stochastically the fine-grained flow of neuronal information.

JR: What kind of a microscope is that?

Myers: You know I’m not going to say because it’s one of those kinds of things where if I did, it would be pretty obvious to people who know. I want to keep my edge for a while longer.

----------------------------------

This article first appeared in Bio-IT World’s Predictive Biomedicine newsletter.

 
