Your Enterprise Needs an (Updated!) IT Disaster Recovery Plan

It might seem common sense that an IT disaster recovery plan is as essential as fire insurance for any organization. But it’s actually even more crucial. A fire could destroy a facility, but an IT exposure could jeopardize the continuity, viability and even the survival of the entire organization. In the short term, temporary outages disrupt workflows and therefore put operations and revenue at risk. Greater threats and long-term impacts could result from losing access to, or losing information in, the organization’s knowledge base. That accumulated expertise is more difficult to replace than the most seasoned manager or employee.

Types of Exposures

An IT disaster recovery plan must prevent or at least mitigate exposures from the following types of events:

  • Power and communications outages
  • Natural disasters and emergencies, including physical damage to facilities and property
  • Workplace sabotage and violence
  • Cyberattacks, equipment malfunction, system faults, and human error

Reasons to Invest in DR

Small businesses, particularly startups, might not yet have formalized disaster recovery (DR) plans. More mature businesses might have documented plans that have not been updated recently. Routine updates are necessary to ensure that all critical applications are covered, that testing has been kept up, and that responsible personnel are fully trained in DR procedures.

Even if a plan already exists, DR exposures can therefore arise from lack of:

  • Testing
  • Updated documentation
  • DR scenario training

Why Use an IT Consulting Firm?

Engaging a top IT consulting firm to develop, update, implement and test a DR plan can be a prudent and cost-effective investment for several reasons:

  • Consulting firms often specialize in vertical markets and are likely to have extensive knowledge of the latest technological developments and best practices in your industry or sector
  • Objective observers may be more likely to spot exposures in systems that have become familiar and routine to your management or user base
  • The best consulting firms will have proper governance, processes and best practices to offer based on past experience

DR Plan Project Overview

A formal and thorough DR plan engagement will include the following phases:

Pre-planning. At the outset, key questions must be answered about the organization’s business operations, the potential financial impact of an incident, and how much disruption customers will tolerate. The output of this analysis should be:

  • Recovery Point Objective (RPO) – the point in time to which data must be recoverable, which sets the maximum amount of data loss, measured in time, that can be tolerated
  • Recovery Time Objective (RTO) – the maximum length of time an application can be unavailable before the disruption becomes unacceptable
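
As a rough illustration of how these two targets might be checked against real measurements, the Python sketch below compares a backup interval and a measured recovery time with an application's RPO and RTO. It treats RPO as the maximum tolerable data loss expressed as a time window. The names RecoveryObjectives and meets_objectives, and all of the numbers, are assumptions made for this example and do not come from any particular DR tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Hypothetical container for one application's targets."""
    rpo: timedelta  # maximum tolerable data loss, measured as time
    rto: timedelta  # maximum tolerable downtime


def meets_objectives(objectives, backup_interval, measured_recovery_time):
    """Return (rpo_ok, rto_ok) for the given measurements.

    Assumption: worst-case data loss equals the interval between
    backups or replication cycles.
    """
    rpo_ok = backup_interval <= objectives.rpo
    rto_ok = measured_recovery_time <= objectives.rto
    return rpo_ok, rto_ok


# Illustrative values only.
targets = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
result = meets_objectives(
    targets,
    backup_interval=timedelta(hours=1),
    measured_recovery_time=timedelta(hours=3),
)
print(result)  # (False, True)
```

In this example the hourly backup misses the 15-minute RPO even though the three-hour recovery fits within the RTO, which is the kind of gap the pre-planning analysis is meant to surface.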

Phase 1 – Assessment. This phase surveys and evaluates the current system environment, focusing on applications that are crucial to the viability of operations. The analysis covers first business and then technical considerations. An Assessment Document should be created for each application, including the following details:

Phase 1 Assessment – Business Section

  • Application leads
  • Infrastructure leads
  • User base
  • Criticality rating
  • Operating hours needed

Phase 1 Assessment – Technical Section

  • Authentication information
  • Backup strategy
  • Application interdependencies
  • Interface requirements
  • Server information
  • Required database features
  • Application delivery method
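
One possible way to keep Assessment Documents consistent across applications is to capture the fields above in a structured record and flag any that are still empty. The sketch below is only an illustration: the class name AssessmentDocument, the helper incomplete_fields, the field types and the sample values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class AssessmentDocument:
    """Hypothetical record mirroring the two assessment sections."""
    application: str
    # Business section
    application_leads: list[str]
    infrastructure_leads: list[str]
    user_base: str
    criticality_rating: str
    operating_hours: str
    # Technical section
    authentication: str
    backup_strategy: str
    interdependencies: list[str]
    interface_requirements: str
    servers: list[str]
    database_features: str
    delivery_method: str


def incomplete_fields(doc):
    """Return the names of fields left empty (empty string or empty list)."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name)]


# Sample values are invented for illustration.
doc = AssessmentDocument(
    application="Billing",
    application_leads=["A. Lead"],
    infrastructure_leads=[],
    user_base="Finance staff",
    criticality_rating="High",
    operating_hours="24x7",
    authentication="",
    backup_strategy="Nightly full backup",
    interdependencies=["Payments gateway"],
    interface_requirements="Ledger export",
    servers=["billing-app-01"],
    database_features="Replication",
    delivery_method="Web",
)
print(incomplete_fields(doc))  # ['infrastructure_leads', 'authentication']
```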

Phase 2 – Documentation. This effort creates or updates the following formal descriptions:

  • DR failover process flow diagram. This chart uses swim lanes to show the workflow across each departmental and technical component
  • Architecture diagram/infrastructure. To create a comprehensive architecture diagram, the DR planning team must identify the following items:
    • Internet Protocol addresses (IPs)
    • Server names
    • Port communication for network design
    • Core service dependencies
    • Replication method
  • Playbook. This list provides a sequential set of tasks delineating the resources and play-by-play efforts required to validate and test failover.
  • Runbook. This narrative is a guide for people and processes in the event of a failover. It should describe a detailed communication plan between critical stakeholder groups, as well as a list of failover procedures and processes to get each particular application back up and running in a timely manner.
  • Business Impact Analysis. This analysis establishes, for each application, the acceptable length of time it can be unavailable (RTO) and the acceptable extent of data loss (RPO).
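
The dependency information gathered for the architecture diagram can also inform the order of steps in a runbook. As a minimal sketch, assuming the dependencies have been recorded as a simple mapping, the Python example below uses the standard library's graphlib.TopologicalSorter to derive a start-up order in which each service comes after the services it relies on. The service names and the failover_order helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each key lists the services that must
# be running before it can be brought up at the secondary site.
dependencies = {
    "dns": [],
    "authentication": ["dns"],
    "database": ["dns"],
    "billing-app": ["authentication", "database"],
    "web-portal": ["billing-app"],
}


def failover_order(deps):
    """Return a start-up order in which every dependency comes first.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


order = failover_order(dependencies)
print(order)
```

Several valid orders can exist when services do not depend on each other, so the printed sequence is one acceptable option rather than the only one. A circular dependency raises an error, which is itself a useful finding during documentation.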

Phase 3 – Testing. Failover is the ability to switch application processing from a primary to a secondary location. Failback is the process of shifting back to the primary location after a failover. Both capabilities must be tested to ensure continuity of applications in the event of a complete data center outage. Organizations should test the failover and failback procedures and functionality for each application, and troubleshoot any problems the tests uncover.

Phase 4 – Tabletop Simulation. In this final phase, a tabletop exercise, or mock disaster, should be conducted to train the responsible personnel and validate their readiness for a failover. Tabletop DR scenarios should be repeated every 6 to 12 months so that the plan stays current for all applications.
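
The 6 to 12 month cadence is simple to track automatically. The following Python sketch, with hypothetical function names and sample dates, classifies an application's tabletop exercise as current, due or overdue based on the date of the last run.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by whole months, clamping the day
    to the last day of the target month when necessary."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def tabletop_status(last_run, today, min_months=6, max_months=12):
    """Classify the exercise against the 6 to 12 month window."""
    if today > add_months(last_run, max_months):
        return "overdue"
    if today >= add_months(last_run, min_months):
        return "due"
    return "current"


# Sample dates are invented for illustration.
last_run = date(2023, 1, 31)
print(tabletop_status(last_run, date(2023, 5, 1)))  # current
print(tabletop_status(last_run, date(2023, 9, 1)))  # due
print(tabletop_status(last_run, date(2024, 3, 1)))  # overdue
```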

Summary

Engaging a top IT consulting firm to develop, update and implement a formal DR plan can help protect the viability, and the very survival, of the enterprise. The benefits of using an outside firm include objectivity, competitive situational awareness, and technological currency. Any developments within the organizational structure, such as mergers, acquisitions, or new regional or global initiatives, must be addressed and incorporated into the DR strategy. Most importantly, the development of DR plans must be regarded not as a one-time exercise but as an ongoing commitment to due diligence by the organization and its IT department.

Sharp HealthCare Case Study

Learn how Sharp HealthCare and T2 Tech successfully completed the design and implementation of a redundant architecture and disaster recovery site at Sharp’s Rees-Steely Medical Group facility in San Diego, California. Leveraging the skills of Sharp’s adept IT team along with T2 Tech’s effective management staff and methodology, the two organizations efficiently tested failover for Sharp’s key applications without causing any interruption to business.

About the Author:

Kevin Torf
Kevin Torf is an information systems executive with a 30+ year career. In 2012, Kevin became a managing partner of T2 Tech Group after merging the consulting division of Inventtrex into T2 Tech. He specializes in large-scale IT project design, procurement and implementation. He offers experience in executive-level technology consulting involving data centers, server farms, storage and backup systems, security, video messaging and VoIP systems.
