IBM Provides NIH Free Chemical Compound Database

December 7, 2011

By Bio-IT World Staff 

December 8, 2011 | IBM, Bristol-Myers Squibb, DuPont, and Pfizer are providing the National Institutes of Health with a database of more than 2.4 million chemical compounds extracted from about 4.7 million patents and 11 million biomedical journal abstracts from 1976 to 2000. The chemical data should help researchers more easily visualize important relationships among chemical compounds to aid in drug discovery and support advanced cancer research. 

IBM announced the database contribution today at a forum on U.S. economic competitiveness in the 21st century, exploring how private sector innovations and investment can be more easily shared in the public domain. 

The data was extracted from patents and journal abstracts using the IBM business analytics and optimization strategic IP insight platform (SIIP, www.ibm.com/gbs/bao/siip), a combination of data and analytics delivered via the IBM SmartCloud. IBM Research developed the platform in collaboration with several major life sciences organizations. It uses automated image analysis and enhanced optical recognition of chemical images and symbols to extract information from patents and literature upon publication.

“Rich data and content is often buried in patents, drawings, figures and scholarly articles,” said Steve Heller, project director for the InChI Trust, a non-profit that supports the InChI international standard for representing chemical structures, in a press release. “This contribution by IBM and its collaborators will make it easier for researchers to use this data, link to other data using the InChI structure representation and derive new insight.”

The data will be contributed to the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), and the Computer-Aided Drug Design (CADD) Group of the National Cancer Institute (NCI) at the National Institutes of Health. It will be incorporated into the NCBI’s PubChem, a public resource that aggregates scientific results for the research community, as well as into NCI CADD Group services such as the Chemical Structure Lookup Service and the Chemical Identifier Resolver.

The National Institutes of Health will make the content available on PubChem at http://pubchem.ncbi.nlm.nih.gov.