
IBM Provides NIH Free Chemical Compound Database

By Bio-IT World Staff 

December 8, 2011 | IBM, Bristol-Myers Squibb, DuPont, and Pfizer are providing the National Institutes of Health with a database of more than 2.4 million chemical compounds extracted from about 4.7 million patents and 11 million biomedical journal abstracts from 1976 to 2000. The chemical data should help researchers more easily visualize important relationships among chemical compounds to aid in drug discovery and support advanced cancer research. 

IBM announced the database contribution today at a forum on U.S. economic competitiveness in the 21st century. The forum explored how private-sector innovations and investments can be shared more easily in the public domain.

The data was extracted from patents and journal abstracts using the IBM Business Analytics and Optimization Strategic IP Insight Platform (SIIP). SIIP combines data and analytics delivered via the IBM SmartCloud and was developed by IBM Research in collaboration with several major life sciences organizations. The platform uses automated image analysis and enhanced optical recognition of chemical images and symbols to extract information from patents and literature as soon as they are published.

“Rich data and content is often buried in patents, drawings, figures and scholarly articles,” said Steve Heller in a press release. Heller is project director for the InChI Trust, a nonprofit that supports the InChI international standard for representing chemical structures. “This contribution by IBM and its collaborators will make it easier for researchers to use this data, link to other data using the InChI structure representation and derive new insight.”

The data will be contributed to the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), and to the Computer-Aided Drug Design (CADD) Group of the National Cancer Institute (NCI) at the National Institutes of Health. It will be incorporated into NCBI’s PubChem, a public resource that aggregates scientific results for the research community. It will also be added to NCI CADD Group services such as the Chemical Structure Lookup Service and the Chemical Identifier Resolver.

The National Institutes of Health will make the content available on PubChem at 




    Sounds great! How would I go about downloading the entire chemical compound database in 2D so I can build 3D structures for use with virtual screening tools? I used the link in the article and clicked on Structure download, but downloads are limited to 500,000 structures. It suggested I use their FTP site, found at:


For reprints and/or copyright permission, please contact Angela Parsons, 781.972.5467.