Earlier this year, David Wishart and colleagues at the University of Alberta published the Human Metabolome Database (HMDB), funded by a multi-million dollar Canadian grant for the "Human Metabolome Project." The database contains a rich array of biological and chemical information on more than 2,000 human metabolites, which could prove invaluable for studying cell biology, disease biomarkers, and much more.
But perhaps because of the work's branding as a logical "big biology" follow-up to the Human Genome Project, it has provoked unusually stern criticism in some quarters. Bio-IT World Editor-in-Chief Kevin Davies caught up with lead investigator David Wishart to see what all the fuss is about.
David, you published the Human Metabolome Database earlier this year in the annual Nucleic Acids Research database issue. How are you following up that work?
WISHART: We're in a phase called "compound validation" -- checking to see if the compounds that have been reported by different groups over the past half century are really there, and if the concentrations are also consistent with what we can measure using more modern techniques.
Since the publication and follow-on stories, we've had many people approach us who are interested in collaborating or using the databases. We're preparing three research papers on the metabolomes of cerebrospinal fluid, serum, and urine. We have also released a new database called FooDB (a database of food components and additives), which complements the Human Metabolome Database and the DrugBank database. The three databases are averaging 200,000 hits per month. We've even had to upgrade our servers to deal with the heavy load! So while there have been a few negative comments from some quarters about our work, I think that people are voting with their feet (or keyboards) and finding these resources to be very useful. We regularly receive a half-dozen complimentary emails a week from people saying "what a great resource" or "thank you for doing this."
You alluded to some criticism leveled about the database, perhaps because it's been portrayed as "The Human Metabolome Project." How do you respond?
WISHART: One of the concerns voiced by some members of the metabolomics community is that our collection of compounds is incomplete or isn't as large as they expected. There are different definitions of the "metabolome." Our definition is not quite as inclusive as others. In particular, we "defined" the metabolome to be the compounds that are primarily endogenous or so common (e.g., caffeine, nicotine, common drugs) that they could be regarded as "almost" endogenous. We also tried to limit our compounds to those that are detectable with reasonable instrumentation or which appear in sufficient concentrations (> 1 µM) to be detectable. We call this the "practical" metabolome.
Other metabolomics specialists believe that any small molecule (observed, predicted) ever made or existing in nature should be included. Others believe all drugs and drug metabolites should be included. Still others believe that all toxins and/or household chemicals should be included. Certainly a detailed listing of all chemicals that can/could or should be found in the body would be useful, but I don't think that this sort of list would really represent the "metabolome" or "the ingredients of life".
We would also like a complete chemical inventory, but we are trying to separate these lists so that toxins are not grouped with drugs, food products are not grouped with common household chemicals, and human metabolites are not grouped with food additives. That's why we've generated three separate databases (HMDB, DrugBank and now FooDB).
DrugBank contains 1,200 drugs, HMDB has 2,500 metabolites, and FooDB has 2,700 food additives/components. We expect ToxDB and DrugMet will have about 1,000 compounds each when they're finally finished. So our total inventory will be ~8,500 compounds when we're finished. However, the total number of "true metabolites" will still probably only be about 2,500, or maybe 2,600, compounds when everything is finalized.
Jeremy Nicholson, from Imperial College in London, has notably downplayed the significance of your project . . .
WISHART: Jeremy Nicholson has called these databases "just lists." I think that's unfair. These databases, if printed off, would be 100,000 pages long. They contain an enormous amount of biological, chemical, clinical, and biochemical data. They are really metabolite/drug/food encyclopedias.
They have pictures, diagrams, descriptions, facts, disease information, medical data, etc. The databases also contain detailed information on metabolite concentrations for different tissues, biofluids, and disease states. Furthermore, they support many kinds of queries, comparisons, and analyses. It would be like saying the Encyclopaedia Britannica is just a list or that GenBank is just a list.
But why has this work provoked such a harsh reaction, do you think?
WISHART: One of the reasons why these databases have led to some negative comments is that they facilitate a form of metabolomics called "bottom-up" or "targeted" metabolic profiling. There are two camps in metabolomics. One camp espouses more top-down or chemometric analysis. In this approach the compounds are not normally identified or quantified. Rather, their spectral profiles (like Rorschach inkblots) are analyzed using statistical or chemometric methods to distinguish healthy from diseased individuals. Jeremy Nicholson has been the primary advocate for this approach and he has done a really superb job at showing the power and utility of these methods.
The other approach, which is called bottom-up or targeted metabolomics (the approach that we have advocated), aims to identify and quantify metabolites by matching spectral profiles to known libraries of compounds or compound spectra. This is a little more akin to traditional proteomics or microarray analyses. Obviously, to do proteomics or transcriptomics you need to know the names and sequences of all the genes in your microarray or your proteome of interest. That's why, for targeted metabolomics methods, it's so critical to have the metabolome fully defined. It's also why we've spent so much time and money collecting standard NMR and MS spectra for pure compounds and adding them to our databases.
The kind of information that is in our databases will certainly make targeted metabolomics much more appealing, but likely at the expense of drawing people away from the more traditional chemometric approaches. That may be one reason why Jeremy has been so dismissive of our efforts. That said, I don't hold any ill will towards Jeremy or his group. There is tremendous admiration for what Jeremy has done for metabolomics and metabonomics, and he's certainly considered to be the founding father of the field. Nevertheless, he's not the only one doing this any more, and I think there will be many other efforts and some pretty interesting results that will come from many new metabolomics labs in the coming one or two years.
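Editor's sketch: the targeted approach Wishart describes, matching a sample's spectral peaks against a library of reference spectra, can be illustrated in a few lines of Python. This is a toy under stated assumptions, not the HMDB group's method: the names `ReferenceCompound`, `match_score`, and `identify` are hypothetical, the peak positions are placeholder numbers in arbitrary units, and real pipelines also use peak intensities and quantify concentrations.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ReferenceCompound:
    """Hypothetical library entry: a name plus reference peak positions."""
    name: str
    peaks: Tuple[float, ...]  # arbitrary units; placeholder values only


def match_score(observed: Sequence[float],
                reference: ReferenceCompound,
                tolerance: float) -> float:
    """Fraction of reference peaks with an observed peak within tolerance."""
    if not reference.peaks:
        return 0.0
    hits = sum(
        1 for ref in reference.peaks
        if any(abs(ref - obs) <= tolerance for obs in observed)
    )
    return hits / len(reference.peaks)


def identify(observed: Sequence[float],
             library: Sequence[ReferenceCompound],
             tolerance: float = 0.02,
             min_score: float = 1.0) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring at least min_score, best first."""
    results = []
    for compound in library:
        score = match_score(observed, compound, tolerance)
        if score >= min_score:
            results.append((compound.name, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Placeholder library and sample; none of these numbers are real spectra.
LIBRARY = [
    ReferenceCompound("compound_A", (1.00, 2.50)),
    ReferenceCompound("compound_B", (3.10, 4.20, 5.00)),
    ReferenceCompound("compound_C", (6.00,)),
]
OBSERVED = [1.01, 2.49, 3.10, 4.21, 7.70]

if __name__ == "__main__":
    print(identify(OBSERVED, LIBRARY))
```

With the default `min_score` of 1.0, only a compound whose every reference peak is found in the sample is reported; lowering it admits partial matches. A compound that is absent from the library can never be identified, however the thresholds are set.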