May 9, 2003 | Some IT innovations need to be invented twice. Such is the case with a community-based, ambitious approach to image storage and retrieval called BioImage, funded by the European Commission.

In the mid-1990s, Ernst Stelzer, a cell biologist at the European Molecular Biology Laboratory in Heidelberg, Germany, and colleagues developed a database for microscopy images while working on a new confocal microscope for the optical development company Carl Zeiss. The microscope and the accompanying database were built, but in 1999 the science-policy winds in the European community shifted, and funding for the database dried up.

The project was revived last year by David Shotton of Oxford University, where the database now resides. "We want to create a unified software environment for the storage, retrieval, analysis, and customization of biological images, in order to facilitate the conversion of raw image data into knowledge," Shotton says. The Web-accessible database holds multidimensional digital images relevant to biology, ranging from confocal microscopy images to time-lapse videos of development to wildlife photos and films. Launched late last year through a partnership with science publishing portal Ingenta, BioImage is part of a European Commission research initiative called ORIEL, for Online Research Information Environment for the Life Sciences. ORIEL is the research arm of a program called E-BioSci, which aims to develop tools and procedures for integrating and retrieving many kinds of biological information, for example by linking genomic and other data with the biomedical literature.

The idea is that a query will yield thumbnail images of visuals from studies published in English-language journals, as well as in Spanish, German, and French ones. The images will be equipped with metadata: information about the experimenter, the sample, and methodological details such as the microscope's aperture and exposure times. This metadata database links to a distributed image archive.

The digital file of the image, which may be protected by copyright, need not reside on the BioImage server (the journal can retain both the file and the copyright), but it can be located. The intent is not to create a central repository for images analogous to GenBank for DNA/protein sequences. "The idea that you can data-warehouse biology, make everyone use the same database, is a forlorn cause," explains Les Grivell, E-BioSci's coordinator at the European Molecular Biology Organization. "Biologists are inventive, creative people, and if they want to do something their way, they will." Taking this culture into account, BioImage will run as a public agency, leaving images parked where they are.
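The split between a central metadata record and an image file that stays with its owner can be illustrated with a minimal Python sketch. It is not BioImage's actual schema: the ImageRecord class, its field names, the search function, and the example.* addresses are all hypothetical, chosen only to mirror the metadata described above.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Central metadata entry; the image file itself stays on its owner's server."""
    image_id: str
    experimenter: str
    sample: str
    method: dict        # methodological details, e.g. aperture, exposure time
    thumbnail_url: str  # small preview returned with search results
    image_url: str      # where the full file lives (hypothetical remote address)
    rights_holder: str  # e.g. the journal that holds the copyright


def search(records, term):
    """Return records whose sample or experimenter text contains the term."""
    needle = term.lower()
    return [r for r in records
            if needle in r.sample.lower() or needle in r.experimenter.lower()]


# Hypothetical entries: only metadata is stored here, never the image bytes.
records = [
    ImageRecord("img-001", "A. Researcher", "Drosophila embryo",
                {"aperture": 1.4, "exposure_ms": 200},
                "https://bioimage.example/thumbs/img-001.png",
                "https://journal.example/figures/img-001.tif",
                "Example Journal"),
    ImageRecord("img-002", "B. Scientist", "HeLa cell",
                {"aperture": 1.2, "exposure_ms": 50},
                "https://bioimage.example/thumbs/img-002.png",
                "https://lab.example/data/img-002.tif",
                "Example Lab"),
]

hits = search(records, "drosophila")
for hit in hits:
    print(hit.image_id, hit.thumbnail_url, hit.image_url)
```

A query returns the thumbnail and a pointer to the remote file, so the full image never has to be copied to the central server.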

Being able to search for images requires getting databases of text and images to talk to one another. This is where the informatics part of the effort comes in, using open-source software components. But making image data searchable requires reworking the language used to store and retrieve the information. "We want to open up the literature so you can search more efficiently," Grivell says.

The plan is to improve interdatabase operability; develop methods of computer-based text analysis; and develop, at the semantic level, knowledge representation systems to clearly organize biological concepts. That language-oriented job means creating ontologies, or hierarchies of meaning. "Ontologies are needed that define these biological concepts, the relations between them, and the logical rules for reasoning about them," Grivell explains. E-BioSci is working on this, as is the Gene Ontology Consortium.

Many data are not easily computer searchable. For example, PDF files are fine for human browsing but not for computerized retrieval. "Insulin" can refer to a gene, a protein, a hormone, or a therapeutic agent, depending on the context. Whimsical gene names (popular in the fruit fly community) such as "disco," "vamp," or "ether-a-go-go" are not easily searchable, either.

E-BioSci is now recruiting images to populate the database. Potential contributors can contact the curators via their Web site.

Putting it on the map: The E-BioSci network will offer multiple entry points to a distributed set of digital resources containing archived documents in different formats. Mapping of documents to specific locations is provided by a set of lookup tables (Doc2Loc) linked to a bibliographic database (initially Medline).
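The lookup described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not E-BioSci's implementation: the identifiers, the dictionary layout, the locate function, and the example.* addresses are hypothetical. Only the idea of a Doc2Loc table keyed to a bibliographic database comes from the text.

```python
# Hypothetical bibliographic database: one entry per known document.
bibliography = {
    "doc-0001": {"title": "Hypothetical paper on confocal imaging", "year": 2002},
    "doc-0002": {"title": "Hypothetical paper on fly development", "year": 2003},
}

# Hypothetical Doc2Loc table: document identifier -> every location holding a copy.
doc2loc = {
    "doc-0001": [
        "https://publisher-a.example/full/0001.pdf",
        "https://archive-b.example/records/0001.xml",
    ],
}


def locate(doc_id):
    """Return the known locations of a document listed in the bibliography."""
    if doc_id not in bibliography:
        raise KeyError(f"unknown document: {doc_id}")
    # A document can be catalogued before any archive reports holding it.
    return doc2loc.get(doc_id, [])


for doc_id in bibliography:
    print(doc_id, bibliography[doc_id]["title"], locate(doc_id))
```

Keeping the location table separate from the bibliographic entries means an archive can add or move a copy without the citation record changing.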

—Vivien Marx

