Can Your Data Protection Strategy Survive A Pandemic?
Contributed Commentary By Adam Marko
July 8, 2020 | Across the world, scientific laboratories have redirected their research efforts toward the global COVID-19 pandemic. Countless staff and resources are now devoted to finding a solution to this universal threat to our lives and health, shifting both human and infrastructure resources toward understanding and treating SARS-CoV-2. Central to achieving this goal is generating and gathering data. Like any big shift in the way we generate, manage, or use data, this one has implications for the datacenter.
Worldwide case counts and economic upheaval show how critical it is to minimize time to results as scientists search for solutions. A data loss event costs time, and time costs lives, so protecting the research data these efforts generate is key. Infrastructure failures and user error can easily cause data loss, and far too many organizations simply do not back up their research data. This is not unexpected: both the scale of life science data and the rate at which it is generated cause traditional backup solutions to fail. In this moment, it is critical to deploy scalable offsite backup. By selecting a modern option that can back up at the pace of research, IT can mitigate the risk of data loss. With data protection in place, users can be confident that a user or technical issue will not cost them precious research hours.
In addition to new scale, speed, and reliability requirements, the pandemic has made many data management tasks remote. Though some workers have no choice but to be physically onsite, many organizations are currently striving to reduce office staffing, especially in IT. Teleworking is being encouraged wherever possible, and as a result the ability to have staff in the datacenter is reduced or eliminated. With the ability to install, monitor, and administer solutions remotely, employers can add a greater degree of safety for their employees.
A data loss is never an acceptable event, and during a crisis, the effects can be amplified. Organizations must adopt a rapid, scalable, remotely managed solution to protect their data during challenging times like the ones we are facing. Researchers should be free to “just do science,” without worry that their critical data is unsafe.
Fortunately, the vast majority of companies do take disaster recovery initiatives seriously. In fact, 95% of organizations have a disaster recovery plan in place. Though this is a good start, there are many issues regarding the robustness of their plans. According to a recent survey:
- 67% do not have hybrid cloud backup
- 59% do not have failover systems in place
- 58% do not have redundant internet connectivity
- 53% have no backup power supply for their data center
- 23% never test their disaster recovery plan
These numbers do not inspire confidence in an organization's ability to weather a major business interruption. Without the safeguards listed above, it is unlikely that these organizations will survive a real disaster, such as a pandemic. Compounding the gaps in coverage, nearly a quarter of these organizations never test their plan, probably because of the inherent complexity of their systems and heterogeneous infrastructure. Clearly, disaster recovery initiatives are taken seriously, but they are difficult to do well.
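Testing a restore does not have to be a heavyweight exercise. As a minimal sketch only (the function names and directory layout here are hypothetical, not any vendor's API), a scheduled job could restore a sample of files to a scratch area and verify their checksums against the originals:

```python
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Compare every file under source_dir against its restored copy.

    Returns a list of relative paths that are missing from the restore
    or whose contents do not match the original.
    """
    failures = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256(src) != sha256(restored):
            failures.append(str(rel))
    return failures
```

Running a check like this against a rotating sample of datasets turns "we have backups" into a verified claim, at a cost of minutes rather than a full disaster-recovery drill.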
Implementing and testing a plan becomes more difficult as the volume of life sciences datasets continues to grow. Traditional methods, such as those based on tape, were not designed to handle today's scale of scientific data. A regular process of backing up or archiving to tape is a great way to pay a few dollars per terabyte, but it carries a huge time overhead to recover data. Without a rapid search-and-restore-anywhere capability, the reality is that you have effectively spent a few dollars per terabyte to make your data disappear. Tape simply does not scale to the data volumes and level of responsiveness needed in life sciences. Testing a tape-based restore strategy is time consuming and difficult, if possible at all. And in contrast to hybrid or cloud-only solutions, tape requires considerable physical infrastructure that demands an on-site presence.
Ideally, to mitigate a number of these issues, a more robust solution should have:
- Remote install functionality, without having to visit a datacenter to physically install hardware. This is increasingly important as more staff are expected to work remotely, either partly or full time.
- Remote monitoring and 24x7 support. When staff has to be reallocated during a time of crisis, they should not have to worry about supporting their software in the event of a failure. This support should be handled seamlessly and proactively by the vendor.
- Low resource requirements. The software should install on existing VM infrastructure without any special hardware requirements. IT staff have more pressing issues than complicated, time-consuming installs with extensive configuration.
- The ability to write to any combination of local or cloud targets. Organizations may have several cloud vendors in use, or a preferred vendor. Either way, the ability to use the cloud as a destination reduces risk compared to on-premises infrastructure alone.
- Unlimited scalability for file data, to deal with the continued data generation rates of research instrumentation, such as genome sequencers and CryoEM microscopes. Data protection solutions that were designed to handle block data, like VM backups, are not capable of operating at the performance level required for modern scientific file output.
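The "any combination of targets" requirement above amounts to a fan-out write with per-target error handling. As an illustrative sketch only (local directories stand in for cloud buckets, and `backup_to_targets` is a hypothetical helper, not a product API):

```python
import shutil
from pathlib import Path


def backup_to_targets(source: Path, targets: list) -> dict:
    """Copy a file to every configured target, recording per-target success.

    One failed destination should not abort the others, so each copy is
    attempted independently. A real deployment would plug cloud targets
    (e.g., object-store uploads) into the same fan-out loop.
    """
    results = {}
    for target in targets:
        try:
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target / source.name)
            results[target] = True
        except OSError:
            results[target] = False
    return results
```

The design point is that targets are interchangeable and independent: adding a second cloud vendor, or a local staging tier, is a configuration change rather than a new workflow.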
With a comprehensive solution with these features in place, organizations will be better equipped to navigate rapidly changing landscapes like we are facing now.
Scientific data must be protected at every stage of the research process, from the initial output of raw data, to the generation of results files, and finally the retention of key files for compliance or future research. With the unprecedented challenges facing research organizations at this time, having a comprehensive data protection plan in place is more important than ever. By prioritizing remote installation, flexible deployment options, and performance, you can reduce the risk, and stress, of protecting your data during these rapidly changing times.
Adam Marko is the Director of Life Science Solutions for Panasas. In his role, Adam is involved in driving all aspects of market development in Life Sciences including working with field sales, marketing and engineering. Adam has 15+ years of experience as both a researcher and as an IT professional analyzing data and meeting the informatics needs of life sciences organizations. He began his career as an intern at the Pittsburgh Supercomputing Center, and has since held roles involving drug discovery at Pfizer, molecular diagnostics at Asuragen, Research IT consulting at BioTeam, and Life Sciences SME at Igneous. He can be reached at firstname.lastname@example.org.