By Malorye A. Branca
Sept. 9, 2002 | The human genome is just months away from being fully sequenced, but researchers still can’t capitalize on that information efficiently.
“We have a very hard job ahead of us, and we aren’t using the information we have because we are not interoperable,” said Eric Lander in a keynote address at the I3C technical committee meeting in late July in Cambridge, Mass.
“I can think of no more important work that is being done in biology today than what I3C is trying to do,” said Lander, director of the Whitehead Institute Center for Genomic Research.
I3C (Interoperable Informatics Infrastructure Consortium) was founded last year by groups including BIO (Biotechnology Industry Organization), IBM Corp., Millennium Pharmaceuticals Inc., Sun Microsystems Inc., and the Whitehead Institute. The consortium promotes data and software interoperability across the drug discovery and development process, from basic academic research through clinical studies.
Redundancy and inefficiency permeate life science informatics, curtailing progress, Lander explained. “You could not do it [sequence the genome] writing Perl scripts over and over. We would still be back there sequencing,” he said. Once data can be integrated, “you can take huge leaps,” he said. He described how a researcher in his lab toiled for several months to integrate some data, but was then able to analyze that information and discover a disease-related gene in just half a day.
Getting bioinformaticians and biologists to agree on standards will be like the proverbial “cat herding” exercise. Many of them are fanatical about their pet tools. “I love Perl,” one member of the audience said. “I would love to see a Perl solution.”
Academic researchers also have a different set of priorities from industry professionals, including the imperative to “publish or perish.” Numerous standards efforts in this field have failed or just dragged on.
To succeed, I3C will need to “capture their imagination,” Lander advised. “Make it exciting. Sequencing the genome is a boring activity, but we got people interested in it,” he said. “Set audacious goals.”
I3C is trying to pick out the best data exchange protocols already available. The group evaluates proposed standards, incorporating outside reviewers’ input, and will champion those standards that fit the bill. I3C is leaving technical standards up to other groups, such as MGED (Microarray Gene Expression Databases Group).
Implementation of any standards will come about by a networking effect, says Tim Clark, chairman of I3C and vice president of informatics at Millennium Pharmaceuticals. “If people like it, and lots of people are using it, it should catch on.”
According to Clark, Oracle Corp., IBM, and Sun have agreed to provide components for a reference server, to be housed at the Whitehead Institute, which will provide the group with a central location to store and test solutions.
The July meeting featured presentations on several proposed standards, including two for genomic XML. Those were McLean, Va.-based LabBook Inc.’s BSML (Bioinformatics Sequence Markup Language) and AGAVE (Architecture for Genomic Annotation, Visualization and Exchange), which was developed and used by DoubleTwist Inc. and some of its customers.
DoubleTwist went out of business several months ago, but Brian King, who co-designed AGAVE, is serving as Sun’s representative on the I3C technical architecture working group. “By mid-September we’d like to decide whether to adopt one or the other of these as an ‘I3C-branded’ standard,” Clark said.
The consortium has approximately a dozen dues-paying members, and many more who are in the process of signing up. Biogen Inc. signed up just before the meeting. “To me, this seems like an approach that is likely to succeed,” says Rainer Fuchs, vice president of research informatics at Cambridge, Mass.-based Biogen. Fuchs will also serve on the I3C board. “It’s well in line with our own efforts, which include the use of Web services and XML to provide a more scalable system.”
Winning over the broader academic community is also important because right now, I3C is vendor-dominated. The organization is getting supportive feedback from heavyweights such as Cold Spring Harbor Laboratory’s Lincoln Stein, and open source champion Ewan Birney of the European Bioinformatics Institute.
Stein is even considering incorporating I3C’s proposed Life Science Identifier (LSID) standard into his popular DAS (distributed annotation system). “I’m not adopting the LSID proposal as-is, but it is a good start,” Stein says.