The Human Metabolome Project Contretemps


By Kevin Davies
Earlier this year, David Wishart and colleagues at the University of Alberta published the Human Metabolome Database (HMDB), funded by a multi-million dollar Canadian grant for the "Human Metabolome Project." The database contains a rich array of biological and chemical information on more than 2,000 human metabolites, which could prove invaluable for studying cell biology, disease biomarkers, and much more.

But perhaps because of the work's branding as a logical big-biology follow-up to the Human Genome Project, it has provoked unusually stern criticism in some quarters. Bio-IT World Editor-in-Chief Kevin Davies caught up with lead investigator David Wishart to see what all the fuss is about.

David, you published the Human Metabolome Database earlier this year in the annual Nucleic Acids Research database issue. How are you following up that work?
WISHART: We're in a phase called "compound validation" -- checking to see if the compounds that have been reported by different groups over the past half century are really there, and if the concentrations are also consistent with what we can measure using more modern techniques.

Since the publication and follow-on stories we've had many people approach us who are interested in collaborating or using the databases. We're preparing three research papers on the metabolomes of cerebrospinal fluid, serum, and urine. We have also released a new database called FooDB (a database of food components and additives), which complements the Human Metabolome Database and the DrugBank database. The three databases are averaging 200,000 hits per month. We've even had to upgrade our servers to deal with the heavy load! So while there have been a few negative comments from some quarters about our work, I think that people are voting with their feet (or keyboards) and finding these resources to be very useful. We regularly receive a half-dozen complimentary emails a week from people saying "what a great resource" or "thank you for doing this."

You alluded to some criticism leveled about the database, perhaps because it's been portrayed as "The Human Metabolome Project." How do you respond?
WISHART: One of the concerns voiced by some members of the metabolomics community is that our collection of compounds is incomplete or isn't as large as they expected. There are different definitions of the "metabolome." Our definition is not quite as inclusive as others. In particular, we "defined" the metabolome to be the compounds that are primarily endogenous or so common (e.g. caffeine, nicotine, common drugs) that they could be regarded as "almost" endogenous. We also tried to limit our compounds to those that are detectable with reasonable instrumentation or which appear in sufficient concentrations (> 1 µM) to be detectable. We call this the "practical" metabolome.

Other metabolomics specialists believe that any small molecule (observed or predicted) ever made or existing in nature should be included. Others believe all drugs and drug metabolites should be included. Still others believe that all toxins and/or household chemicals should be included. Certainly a detailed listing of all chemicals that can, could, or should be found in the body would be useful, but I don't think that this sort of list would really represent the "metabolome" or "the ingredients of life."

We would also like a complete chemical inventory, but we are trying to separate these lists so that toxins are not grouped with drugs, food products are not grouped with common household chemicals, and human metabolites are not grouped with food additives. That's why we've generated three separate databases (HMDB, DrugBank and now FooDB).

DrugBank contains 1200 drugs, HMDB has 2500 metabolites, and FooDB has 2700 food additives/components. We expect ToxDB and DrugMet will have about 1000 compounds each when they're finished, so our total inventory will be ~8500 compounds. However, the total number of "true metabolites" will still probably only be about 2500, or maybe 2600, once everything is finalized.

Jeremy Nicholson, from Imperial College in London, has notably downplayed the significance of your project . . .
WISHART: Jeremy Nicholson has called these databases "just lists." I think that's unfair. These databases, if printed off, would be 100,000 pages long. They contain an enormous amount of biological, chemical, clinical, and biochemical data. They are really metabolite/drug/food encyclopedias.

They have pictures, diagrams, descriptions, facts, disease information, medical data, etc. The databases also contain detailed information on metabolite concentrations for different tissues, biofluids, and disease states. They also support many kinds of queries, comparisons, and analyses. Calling them "just lists" would be like saying the Encyclopaedia Britannica is just a list or that GenBank is just a list.

But why has this work provoked such a harsh reaction, do you think?
WISHART: One of the reasons why these databases have led to some negative comments is that they facilitate a form of metabolomics called "bottom-up" or "targeted" metabolic profiling. There are two camps in metabolomics. One camp espouses more top-down or chemometric analysis. In this approach the compounds are not normally identified or quantified. Rather, their spectral profiles (like Rorschach inkblots) are analyzed using statistical or chemometric methods to distinguish healthy from diseased individuals. Jeremy Nicholson has been the primary advocate for this approach and he has done a really superb job at showing the power/utility of these methods.

The other approach, which is called bottom-up or targeted metabolomics (the approach that we have advocated), aims to identify and quantify metabolites by matching spectral profiles to known libraries of compounds or compound spectra. This is a little more akin to traditional proteomics or microarray analyses. Obviously to do proteomics/transcriptomics you need to know the names/sequences of all the genes in your microarray or your proteome of interest. That's why, for targeted metabolomics methods, it's so critical to have the metabolome fully defined. It's also why we've spent so much time and money collecting standard NMR and MS spectra for pure compounds and adding them to our databases.

The kind of information that is in our databases will certainly make targeted metabolomics much more appealing, but likely at the expense of drawing people away from the more traditional chemometric approaches. That may be one reason why Jeremy has been so dismissive of our efforts. That said, I don't hold any ill will towards Jeremy or his group. There is tremendous admiration for what Jeremy has done for metabolomics and metabonomics, and he's certainly considered to be the founding father of the field. Nevertheless, he's not the only one doing this any more, and I think there will be many other efforts and some pretty interesting results that will come from many new metabolomics labs in the coming one or two years.
