Oct. 10, 2007 | Three years into its bioinformatics practice within its life sciences division, Northrop Grumman is working on two data warehousing projects valued at over $47 million for the National Institute of Allergy and Infectious Diseases (NIAID).
“There are similarities between the two engagements,” says Kevin Biersack, bioinformatics program manager for Northrop Grumman. Both data warehousing projects offer one-stop shopping to users and both make use of public data. But the projects’ user communities are different.
For NIAID’s Bioinformatics Resource Center (BRC) for Biodefense and Emerging/Re-emerging Infectious Diseases project, Northrop Grumman developed the BioHealthBase (BHB) system, an integrated source of complex, high-quality genomic, proteomic, and supporting scientific data. Information stored here focuses on microorganisms and pathogens. For NIAID’s Bioinformatics Integration Support Contract (BISC) project, Northrop Grumman developed the Immunology Database and Analysis Portal (ImmPort) system. ImmPort houses data collected by NIAID’s Division of Allergy, Immunology, and Transplantation.
BioHealthBase is open to both researchers and the public. Working with a science partner at the University of Texas Southwestern Medical Center and two subcontractors, Northrop Grumman has developed BHB to include organisms with public health and biodefense implications, including tuberculosis and influenza. Biersack says that the warehouse is a public resource useful for scientific research in support of vaccine development and drug discovery.
A major goal of the project is to support researchers developing rapid, inexpensive, and broad-based diagnostic approaches using genomics and proteomics. From the BHB website (www.biohealthbase.org), searchers can run queries, analyze their findings, and display them visually without even entering an email address.
Open Source Architecture
BHB data are culled from several public sources including National Center for Biotechnology Information databases, GenBank, UniProtKB, and internal sources. “We have firewalls, of course,” Biersack says, to protect the data sources. Los Alamos National Laboratory, for instance, is currently collaborating with the BHB team to integrate their data and move their public influenza site to BHB.
Northrop Grumman curates the data as well. “We add richness,” Biersack says, “by annotating entries, eliminating redundancy, and filling in missing information.” Data are added to the warehouse and updated via scheduled monthly data loads.
The ImmPort system, on the other hand, is different because access is limited to researchers funded by NIAID, Biersack says. “In the future, the public data will be moved ‘out front,’” he says, but for now, ImmPort is a semi-public warehouse.
ImmPort serves as an archive for research results from allergy, immunology, and transplantation projects supported by NIAID. Researchers have access to private storage, as well as the ability to compare their data, if they wish, with other public research data based on the NIH data-sharing policy. “It’s results-oriented storage,” Biersack says, and ImmPort currently boasts terabytes of total storage space.
The data warehouses are web-enabled and browser-based, with quarterly software updates, and they use “mostly open source” software, Biersack says. ImmPort uses Oracle, Linux, Java 2 Enterprise Edition, and Hibernate. Most of the visualization and analytical tools are also open source and have been leveraged from previous NIH-funded grants.
Northrop Grumman’s contracts for the BHB and ImmPort projects expire in 2009 and 2010, respectively, and Biersack says he “anticipates competition” for these renewals. But for now, he’s focused on providing new software functions to support the needs of the user communities, updating the data, adding storage capability, and “enhancing the scientific discovery process through data integration.”