Unleashed Informatics, a spin-off of the
Blueprint Initiative at Mount Sinai Hospital, has announced DogBox,
a self-updating bioinformatics warehouse, in partnership with
Sun Microsystems of Canada.
The DogBox is a hardware/software appliance that includes
the Blueprint SeqHound and other public bioinformatics databases.
SeqHound is a database of biological sequences and
structures. Specifically, the database combines 3D structure, annotation,
sequence, and taxonomy information. SeqHound is updated daily by gathering
information from a number of sources, including the National Center for Biotechnology Information and the Gene
Ontology Consortium. Additional databases included with DogBox are PDB,
GenBank, SwissProt, and several other public databases.
The pre-configured DogBox system includes a Sun Fire V20z
dual-Opteron server with 4 GB of memory and a 3 TB Sun StorEdge FC3511 storage
system. The SeqHound database is automatically updated on a regular basis
(typically, each night after new entries have been added by Blueprint
researchers).
Why Pay?
One obvious question about the DogBox is, why pay for a
database that is publicly available and offered for free?
There are three reasons why a company would consider
a product like DogBox: performance, integration, and security.
Using a dedicated internal device to address performance,
integration, and security issues is an approach increasingly
adopted in many areas of research. A well-known example is the Google
Search Appliance, a dedicated search device offered to companies.
On the performance front, a dedicated, internal device like
the DogBox eliminates Internet-related delays that can occur when a query must
travel over the public network to reach a database server. Additionally, a
public database’s performance is not guaranteed and can be significantly
impacted if it is handling many simultaneous requests. These delays are
eliminated when a dedicated device sits on a company network.
With regard to integration, life science companies often use
the data in a public resource as part of a larger application.
For instance, an informatics application might have a workflow in which
one step uses the results of an experiment to query a database, and the
next step of the computation then uses the answer that the query
returns.
Calls to public databases are commonly incorporated into such
workflows, but the DogBox allows an application to be coupled more tightly to
the SeqHound database. For example, the DogBox includes application programming
interfaces (APIs) that let a company write applications that directly query the
system.
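As a rough illustration of the kind of workflow described above, the following Python sketch chains an experiment's results into a database lookup and passes the answers to a later step. It does not use the actual DogBox or SeqHound APIs, which are not described here. The in-memory LOCAL_WAREHOUSE table, the query_local and run_workflow functions, and the accession identifiers are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a locally hosted sequence warehouse.
# The identifiers and fields are made up; this is NOT the SeqHound schema.
LOCAL_WAREHOUSE: Dict[str, Dict[str, str]] = {
    "ACC_0001": {"taxon": "Homo sapiens", "annotation": "kinase"},
    "ACC_0002": {"taxon": "Mus musculus", "annotation": "transporter"},
}


def query_local(accession: str) -> Dict[str, str]:
    """Look up one record in the local table; return {} if it is absent."""
    return LOCAL_WAREHOUSE.get(accession, {})


def run_workflow(
    experiment_hits: List[str],
    lookup: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Use experiment results as queries and collect the annotated answers."""
    annotated = []
    for accession in experiment_hits:
        record = lookup(accession)
        if record:  # skip identifiers the database does not know
            annotated.append({"accession": accession, **record})
    return annotated


def filter_by_taxon(records: List[Dict[str, str]], taxon: str) -> List[Dict[str, str]]:
    """Next workflow step: keep only records from one organism."""
    return [r for r in records if r["taxon"] == taxon]


if __name__ == "__main__":
    hits = ["ACC_0001", "ACC_0002", "ACC_9999"]  # e.g. output of an experiment
    records = run_workflow(hits, query_local)
    for record in filter_by_taxon(records, "Homo sapiens"):
        print(record["accession"], record["annotation"])
```

Because the lookup is passed in as a function, the same workflow could in principle be pointed at a remote public database or at an internal appliance without changing the later steps.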
The third reason for using an internal appliance is
security. A query sent to a public database could, in theory, be intercepted by
a hacker. A hacker capturing these outbound queries could get information about
the research efforts going on within the company, such as which molecules and
potential drug targets are being investigated.
While this may seem far-fetched, some life science companies
are taking this threat to their intellectual property and research efforts very
seriously. One industry analyst noted that he recently visited a company that
maintains roughly 120 public databases internally to protect information that
might be derived from queries sent to these databases.
The DogBox bioinformatics warehouse appliance was announced
in May and is available now.