Oct. 10, 2007 | Neuroscience involves many sub-disciplines: cell biology, electrophysiology, molecular genetics, chemistry, endocrinology, pathology, pharmacology, imaging, computer science, and so on. In recent years, major advances have been made by coupling knowledge across multiple research areas (Alzheimer’s, depression, schizophrenia, etc.), all enabled by the exchange of data and analyses. As the volume of information surges, researchers have little choice but to rely on information systems to connect and access the relevant items.
Information capture and analysis have always been key ingredients in neuroscience, even more so now with tools such as functional MRI for studying brain activity. The rise of integrative informatics is the reason NIH funded the BIRN (Biomedical Informatics Research Network) project. Conceived in 2001 to help connect large-scale biomedical informatics collaborations, BIRN supports projects that focus on neuroscience at multiple levels (e.g. Morphometry BIRN, Function BIRN, Mouse BIRN). The ability to share analytical tools and data is central to BIRN, and sets a course for the future of all biomedical research.
The term “cyberinfrastructure” was coined in 2003 by an NSF committee, recognizing the need for new mechanisms of information handling and exchange. Data repositories and computational tools need to use the Internet, but currently this is done through web pages that may include manually driven database interfaces. This requires substantial effort to access even the simplest data sets, and only a handful of machine-to-machine data transaction systems exist (for example, Web services).
Today, much of what is being created for systems integration requires the inclusion of semantics. Terminologies must be concisely defined and logically inter-related in areas such as anatomy, molecular biology, diseases, neurochemistry, and others. Without definitions, no amount of computational power can untangle different data descriptions created by different researchers. Consequently, most of the above projects involve ontology development and management. Such ontologies need to be 1) used by all research groups regardless of their locations, and 2) defined in such a way that they can be combined as necessary by inter-disciplinary projects. Both requirements are addressed by Semantic Web standards.
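To make the second requirement concrete, here is a minimal sketch of why shared identifiers let independently produced data sets be combined. It models RDF statements as plain (subject, predicate, object) tuples in Python. Every URI, gene name, and relation below is hypothetical and chosen only for illustration; none is taken from BIRN or from any published ontology.

```python
# Hypothetical namespaces; none of these URIs come from a real ontology.
ANAT = "http://example.org/anatomy#"
GENE = "http://example.org/genes#"
REL = "http://example.org/relations#"

# Each statement is a (subject, predicate, object) triple, as in RDF.
# Lab A records where a gene is expressed.
lab_a = {
    (GENE + "GeneX", REL + "expressedIn", ANAT + "CA1"),
}
# Lab B records anatomy only, but uses the same identifier for CA1.
lab_b = {
    (ANAT + "CA1", REL + "partOf", ANAT + "Hippocampus"),
}


def merge(*graphs):
    """Combine graphs by set union (valid when no blank nodes are involved)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def genes_in_region(graph, region):
    """Genes expressed in `region` or in a structure directly part of it."""
    parts = {s for (s, p, o) in graph if p == REL + "partOf" and o == region}
    parts.add(region)
    return sorted(
        s for (s, p, o) in graph if p == REL + "expressedIn" and o in parts
    )


combined = merge(lab_a, lab_b)
print(genes_in_region(combined, ANAT + "Hippocampus"))
```

Neither data set alone can answer "which genes are expressed in the hippocampus?" The merged one can, but only because both labs used the same identifier for CA1.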
Working with university members of BIRN at Yale, Stanford, Tennessee, San Diego, and Drexel, the World Wide Web Consortium (W3C) Health Care and Life Sciences Interest Group has assimilated many forms of neuroscientific data and structured them using RDF and OWL. This aggregate of neuroscientific knowledge was part of a demo presented at WWW2007 (Banff, Canada) and ISMB 2007 (Vienna). Data sources currently include: BrainPharm, PubMed, Entrez Gene, UniProt, MeSH, BAMS, Reactome, Gene Ontology, Allen Brain Atlas, NeuroCommons Annotations, NeuronDB, AlzGene, SWAN, MammalianPhenotype, PubChem, and HomoloGene.
Text Mining Research
NeuroCommons, a project within Science Commons at MIT, is using text mining to extract neuro-molecular relations from text and represent them as RDF. BrainPharm is a data resource from Yale that supports research on drugs for neurological disorders. The Allen Brain Atlas has assembled multiple gene-probed slices of mouse brain (see “Allen Brain Atlas Updated,” Bio•IT World, September 2007). And SWAN is an NIH-funded project that lets scientists attach annotations directly to findings using RDF.
The demo user interface consists of a SPARQL query page that permits a wide variety of questions regarding genes, neurological diseases, neuroanatomy, and publications. Examples include:
• Find all publications with neural dendrites in their description.
• Show all genes expressed in brain region CA1 involved in signal transduction.
• Find all papers on Parkinson’s Disease that involve gene products localized in the nucleus.
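As a rough illustration, the second question might be phrased in SPARQL along the following lines. This is a sketch under stated assumptions, not the demo's actual query. The ex: vocabulary (expressedIn, involvedIn), the endpoint address, and the helper names are hypothetical. Only the rdfs: namespace and the convention of passing the query text in a "query" URL parameter come from published W3C specifications. The code builds the request URL but does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; not the address of the demo described in the article.
ENDPOINT = "http://example.org/sparql"


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def genes_query(region_label, process_label):
    """Build a SELECT query over a hypothetical ex: vocabulary."""
    region = escape_literal(region_label)
    process = escape_literal(process_label)
    return f"""
PREFIX ex: <http://example.org/neuro#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?gene ?geneLabel
WHERE {{
  ?gene ex:expressedIn ?region ;
        ex:involvedIn ?process ;
        rdfs:label ?geneLabel .
  ?region rdfs:label "{region}" .
  ?process rdfs:label "{process}" .
}}
""".strip()


def request_url(endpoint, query):
    """URL for a SPARQL Protocol GET request (query passed as 'query')."""
    return endpoint + "?" + urlencode({"query": query})


query = genes_query("CA1", "signal transduction")
url = request_url(ENDPOINT, query)
print(query)
```

The point is that the question is stated as a pattern over the graph, so the same query works regardless of which of the underlying sources supplied each statement.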
Results can be formatted as tables or even as RDF graphs. As RDF, additional tools can process the data for enhanced scientific views. Moreover, tools such as Google Maps can also be applied to the output from a query.
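For the tabular case, a short sketch of how query results might be turned into rows. It assumes the endpoint returns the W3C SPARQL Query Results JSON Format; the article does not say which result format the demo uses, and the sample data and function names here are invented for illustration.

```python
import json

# Invented sample in the SPARQL Query Results JSON layout.
SAMPLE = """
{
  "head": {"vars": ["gene", "label"]},
  "results": {"bindings": [
    {"gene": {"type": "uri", "value": "http://example.org/genes#GeneX"},
     "label": {"type": "literal", "value": "Gene X"}},
    {"gene": {"type": "uri", "value": "http://example.org/genes#GeneY"}}
  ]}
}
"""


def to_rows(doc):
    """Return (column names, rows); unbound variables become empty strings."""
    cols = doc["head"]["vars"]
    rows = [
        [binding.get(col, {}).get("value", "") for col in cols]
        for binding in doc["results"]["bindings"]
    ]
    return cols, rows


def format_table(cols, rows):
    """Render the header and rows as left-aligned, space-padded text columns."""
    widths = [max(len(cell) for cell in column) for column in zip(cols, *rows)]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in [cols] + rows
    ]
    return "\n".join(lines)


cols, rows = to_rows(json.loads(SAMPLE))
print(format_table(cols, rows))
```

A variable may be unbound in a given result row, which is why the second row above has no label and is rendered with an empty cell.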
The future of cyberinfrastructure for biomedical research is becoming a reality: a connected research community more effectively utilizing data and computational resources from different areas. Providing a new infrastructure that will connect different forms of knowledge is an essential element of biomedical research (see NIH’s KEBR: http://esi-bethesda.com/ncrrworkshops/kebr/index.aspx), and in the future, these resources will be used by researchers in R&D and health care.
Eric K. Neumann is the Director, Clinical Semantics Group; MIT Fellow, Science Commons; and co-chair, W3C Health Care and Life Sciences Interest Group. E-mail: firstname.lastname@example.org