Beyond restoring normal operations and data, what else should be done during the recovery phase?

6. What risk are you exposing your organization to when you contract services from a third party?

The recovery point objective (RPO) is the age of files that must be recovered from backup storage for normal operations to resume if a computer, system or network goes down as a result of a hardware, program or communications failure. The RPO is expressed backward in time -- that is, into the past -- from the instant at which the failure occurs and can be specified in seconds, minutes, hours or days. It is an important consideration in a disaster recovery plan (DRP).

Once the RPO for a given computer, system or network has been defined, it determines the minimum frequency with which backups must be made. This, along with the recovery time objective (RTO), helps administrators choose optimal disaster recovery (DR) technologies and procedures.

For example, if the RPO is one hour, admins must schedule backups at least once per hour. In this case, external, redundant hard drives may prove to be the best disaster recovery platform. If the RPO is five days (120 hours), then backups must happen at intervals of 120 hours or fewer. In that situation, tape or cloud storage may be adequate.
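
The relationship between RPO and backup frequency can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any particular backup product; the function name `schedule_meets_rpo` is hypothetical.

```python
from datetime import timedelta


def schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True when backups run at least as often as the RPO requires."""
    return backup_interval <= rpo


hourly_rpo = timedelta(hours=1)
five_day_rpo = timedelta(days=5)  # 120 hours

print(schedule_meets_rpo(timedelta(minutes=30), hourly_rpo))   # True
print(schedule_meets_rpo(timedelta(hours=2), hourly_rpo))      # False
print(schedule_meets_rpo(timedelta(hours=120), five_day_rpo))  # True
```

A backup interval equal to the RPO is treated as compliant here, matching the "120 hours or fewer" wording in the example.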

How does RPO work?

RPOs work by defining the duration of time that can pass before the volume of data loss exceeds what is allowed as part of a business continuity plan (BCP).

The amount of data loss an RPO allows is known as the enterprise loss tolerance. Depending on the organization and the workload, loss tolerance will vary, which affects what the associated RPO for that workload should be.

An RPO is enforced by setting the data backup frequency so that a backup is always available within the window the loss tolerance allows. Admins can configure an RPO as a policy setting in backup or storage software and cloud services, which then schedule the backups automatically.

Express RPO backward in time from the point or instant when failure happens.

How do you calculate RPO?

Calculating an RPO has several prerequisite steps.

At the most basic level, organizations first need to understand what data they have and where it exists. Understanding how frequently the different data changes as part of normal business operations is another foundational step. Companies must also assess what the value of the data actually is at a given point in time.

With the prerequisite steps in place, administrators will have the information needed to make a policy decision to determine what the RPO should be. So, after understanding how often data changes and what the value of it is, they can calculate RPO as a function of their organization's loss tolerance.

That is, how much data -- as measured by duration of time -- can the company afford to lose and still recover to normal business operations?
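
One way to turn that question into a number is to divide the amount of change the business can afford to lose by how fast the data changes. The sketch below assumes both inputs are already known from the prerequisite steps; the function and parameter names are hypothetical, and real calculations usually also weigh the value of the data.

```python
from datetime import timedelta


def rpo_from_loss_tolerance(tolerable_lost_changes: int,
                            changes_per_hour: float) -> timedelta:
    """Convert a loss tolerance (in changes) into an RPO (in time).

    Assumes data changes at a roughly constant rate.
    """
    if changes_per_hour <= 0:
        raise ValueError("changes_per_hour must be positive")
    return timedelta(hours=tolerable_lost_changes / changes_per_hour)


# Hypothetical workload: 50 changes per hour, up to 200 lost changes tolerated.
print(rpo_from_loss_tolerance(200, 50))  # 4:00:00
```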

Examples of RPOs

Businesses can choose to have any number of different tiers for an RPO based on workload and loss tolerance.

  • Critical data (0-1 hours). For the most valuable data organizations can't afford to lose at all, such as banking transactions, the RPO needs to be set for continuous backup.
  • Semicritical (1-4 hours). For data that is semicritical, which could include data on file servers or chat logs, an RPO of up to 4 hours should be set.
  • Less critical (4-12 hours). Data such as marketing information is often deemed less critical and can tolerate a longer loss window, with an RPO of up to 12 hours.
  • Infrequent (12-24 hours). Infrequently updated data, such as product specifications, can have an RPO of up to 24 hours.

Experts recommend against an RPO of more than 24 hours, as a daily backup is a bare-minimum best practice for nearly all data.
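
The tiers above could be expressed as a simple policy table and used to audit backup schedules. This is a sketch only: the upper bound of each tier is taken from the list, while the workload names and intervals are made up for illustration.

```python
from datetime import timedelta

# Upper RPO bound for each tier described above.
RPO_TIERS = {
    "critical": timedelta(hours=1),
    "semicritical": timedelta(hours=4),
    "less_critical": timedelta(hours=12),
    "infrequent": timedelta(hours=24),
}


def noncompliant_workloads(workloads: dict) -> list:
    """Return names of workloads whose backup interval exceeds their tier's RPO.

    `workloads` maps a name to a (tier, backup_interval) pair.
    """
    return sorted(
        name
        for name, (tier, interval) in workloads.items()
        if interval > RPO_TIERS[tier]
    )


# Hypothetical inventory.
inventory = {
    "payments": ("critical", timedelta(minutes=5)),
    "file_server": ("semicritical", timedelta(hours=6)),
    "marketing_assets": ("less_critical", timedelta(hours=12)),
    "product_specs": ("infrequent", timedelta(hours=24)),
}

print(noncompliant_workloads(inventory))  # ['file_server']
```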

RPO in disaster recovery planning

A DRP is all about having a strategy in place to help recover necessary data and systems after a data loss event or natural disaster.

Unlike scheduled maintenance or downtime, a disaster event is unpredictable. This is why organizations need a DR strategy with a defined RPO and other objectives in place to help limit its impact. With an RPO, an enterprise has defined its loss tolerance, so although the disaster itself is unpredictable, the organization knows ahead of time the maximum amount of data it stands to lose.

For example, take critical data that an organization backs up at least every hour. As part of its business continuity plan, the organization knows that in the worst case a data loss event will cost it one hour's worth of data.
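
The worst-case reasoning can be checked with a short sketch that measures the gap between a failure and the most recent backup before it. The timestamps and the function name `data_loss_window` are illustrative assumptions.

```python
from datetime import datetime


def data_loss_window(backup_times, failure_time):
    """Return the time between the last backup and the failure.

    Returns None when no backup exists at or before the failure.
    """
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        return None
    return failure_time - max(earlier)


# Hypothetical hourly backups, with a failure just before the next one is due.
backups = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 11, 0),
]
failure = datetime(2024, 1, 1, 11, 59)

print(data_loss_window(backups, failure))  # 0:59:00
```

With hourly backups, the loss window approaches but never exceeds one hour, which is what an RPO of one hour promises.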

Differences between RPO and RTO

Recovery point objective is closely related to recovery time objective, which is the maximum length of time computing resources and applications can be down after a failure or disaster. Together, the two approaches enable a BCP and a DR strategy.

Recovery point objective. The RPO determines loss tolerance and how much data can be lost. It is a planning objective that defines how often data needs to be backed up to enable recovery. An organization enables RPOs by having a DR approach in place that backs up data at the right intervals, so the amount of data loss never exceeds its determined loss tolerance.

Recovery time objective. The RTO comes into play after a loss event. It helps organizations answer the question of how quickly they can recover after data loss due to a failure, natural disaster or malfeasance.

The differences between recovery point and recovery time objectives.

RPO and RTO work together in a time sequence, with RPO making sure a business has the right data backup policies in place and RTO ensuring it can recover data backups quickly.
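
To make the time sequence concrete, the sketch below evaluates a single incident against both objectives: RPO looks backward from the failure to the last backup, and RTO looks forward from the failure to the moment service is restored. The `Incident` class and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_backup: datetime
    failure: datetime
    service_restored: datetime


def evaluate(incident: Incident, rpo: timedelta, rto: timedelta) -> dict:
    """Report whether an incident stayed within the RPO and the RTO."""
    data_loss = incident.failure - incident.last_backup      # backward in time
    downtime = incident.service_restored - incident.failure  # forward in time
    return {"rpo_met": data_loss <= rpo, "rto_met": downtime <= rto}


incident = Incident(
    last_backup=datetime(2024, 1, 1, 10, 0),
    failure=datetime(2024, 1, 1, 10, 45),
    service_restored=datetime(2024, 1, 1, 13, 15),
)

# 45 minutes of data lost, 2.5 hours of downtime.
print(evaluate(incident, rpo=timedelta(hours=1), rto=timedelta(hours=2)))
# {'rpo_met': True, 'rto_met': False}
```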

Disaster recovery (DR) is an organization's ability to respond to and recover from an event that negatively affects business operations. The goal of DR methods is to enable the organization to regain use of critical systems and IT infrastructure as soon as possible after a disaster occurs. To prepare for this, organizations often perform an in-depth analysis of their systems and create a formal document to follow in times of crisis. This document is known as a disaster recovery plan.

Read on to learn more about why DR is important, how it works, and the difference between disaster recovery and business continuity. You'll also discover what to include in a disaster recovery plan and the major types of DR, as well as major DR services and vendors.

What is a disaster?

The practice of DR revolves around events that are serious in nature. These events are often thought of in terms of natural disasters, but they can also be caused by systems or technical failure or by humans carrying out an intentional attack. They are significant enough to disrupt or completely stop critical business operations for a period of time. Types of disaster include:

  • Cyber attacks such as malware, DDoS and ransomware attacks
  • Sabotage
  • Power outages
  • Equipment failure
  • Epidemics or pandemics, such as COVID-19
  • Terrorist attacks or threats
  • Industrial accidents
  • Hurricanes
  • Tornadoes
  • Earthquakes
  • Floods
  • Fires

Why is disaster recovery important?

Disasters can inflict many types of damage with varying levels of severity, depending on the scenario. A brief network outage could result in frustrated customers and some loss of business to an e-commerce system. A hurricane or tornado could destroy an entire manufacturing facility, data center or office.

The monetary costs can be significant. The Uptime Institute's Annual Outage Analysis 2021 report estimated that 40% of outages or service interruptions in businesses cost between $100,000 and $1 million, while about 17% cost more than $1 million. A data breach can be more expensive; the average cost in 2020 was $3.86 million, according to the 2020 Cost of a Data Breach Report by IBM and the Ponemon Institute.

Additionally, many businesses are required to create and follow plans for disaster recovery, business continuity and data protection in order to meet compliance regulations. This is particularly important for organizations operating in financial, healthcare, manufacturing and government sectors. Failure to have DR procedures in place can result in legal or regulatory penalties, so understanding how to comply with resiliency standards is important.

Preparing for every potential disaster may seem extreme, but the COVID-19 crisis illustrated that even scenarios that seem farfetched can come to pass. Businesses that had emergency measures in place to support remote work had a clear advantage when stay-at-home orders were enacted.

Thinking about disasters before they happen and creating a plan for how to respond can provide many benefits. It raises awareness about potential disruptions and helps an organization to prioritize its mission-critical functions. It also provides a forum for discussing these topics and making careful decisions about how to best respond in a low-pressure setting.

What is the difference between disaster recovery and business continuity?

On a practical level, DR and business continuity are often combined into a single corporate initiative and even abbreviated together as BCDR, but they are not the same thing. While the two disciplines have similar goals relating to an organization's resilience, they differ greatly in scope.

BC is a proactive discipline intended to minimize risk and help ensure the business can continue to deliver its products and services no matter the circumstances. It focuses especially on how employees will continue to work and how the business will continue operations while a disaster is occurring. BC is also closely related to business resilience, crisis management and risk management, but each of these has different goals and parameters.

DR is a subset of business continuity that focuses on the IT systems that enable business functions. It addresses the specific steps an organization must take to resume technology operations following an event. DR is also a reactive process by nature. While planning for it must be done in advance, DR activity is not kicked off until a disaster actually occurs.

Elements of a disaster recovery strategy

Before an organization can determine its DR strategies, it must first analyze existing assets and priorities. Two different analyses typically factor into DR decision-making:

Risk analysis

Risk analysis or risk assessment is an evaluation of all the potential risks the business could face, as well as their outcomes. Risks can vary greatly depending on the industry the organization is in and its geographic location. The assessment should identify potential hazards, determine who or what these hazards would harm, and use the findings to create procedures that take these risks into account.

Business impact analysis

Business impact analysis (BIA) evaluates the effects of the risks identified above on business operations. A BIA can help predict and quantify costs, both financial and non-financial. It also examines the impact of different disasters on an organization's safety, finances, marketing, business reputation, legal compliance and quality assurance.

Understanding the difference between risk analysis and BIA and conducting the assessments can also help an organization define its goals when it comes to data protection and the need for backup. Organizations generally quantify these using measurements called recovery point objective (RPO) and recovery time objective (RTO).

Recovery point objective

RPO is the maximum age of files that an organization must recover from backup storage for normal operations to resume after a disaster. The RPO determines the minimum frequency of backups. For example, if an organization has an RPO of four hours, the system must back up at least every four hours.

Recovery time objective

RTO refers to the amount of time an organization estimates its systems can be down without causing significant or irreparable damage to the business. In some cases, applications can be down for several days without severe consequences. In others, seconds can do substantial harm to the business.

RPO and RTO are both important elements in disaster recovery, but the metrics have different uses. RPOs are acted on before a disruptive event takes place to ensure data will be backed up, while RTOs come into play after an event occurs.

What's in a disaster recovery plan?

Once an organization has thoroughly reviewed its risk factors, recovery goals and technology environment, it can write a DR plan. The DR plan is the formal document that specifies these elements and outlines how the organization will respond when disruption or disaster occurs. The plan details recovery goals including RTO and RPO as well as the steps the organization will take to minimize the effects of the disaster.

The components of a DR plan should include:

  • A DR policy statement, plan overview and main goals of the plan.
  • Key personnel and DR team contact information.
  • A step-by-step description of disaster response actions immediately following an incident.
  • A diagram of the entire network and recovery site.
  • Directions for how to reach the recovery site.
  • A list of software and systems that staff will use in the recovery.
  • Sample templates for a variety of technology recoveries, including technical documentation from vendors.
  • A communication plan that includes internal and external contacts, as well as boilerplate for dealing with the media.
  • Summary of insurance coverage.
  • Proposed actions for dealing with financial and legal issues.

An organization should consider its DR plan a living document. Regular disaster recovery testing should be scheduled to ensure the plan is accurate and will work when a recovery is required. The plan should also be evaluated against consistent criteria whenever there are changes in the business or IT systems that could affect DR.

How disaster recovery works

DR initiatives are more attainable by businesses of all sizes today due to widespread cloud adoption and the availability of virtualization technologies that make backup and replication easier. However, much of the terminology and many of the best practices developed for DR were based on enterprise efforts to recreate large-scale physical data centers. This involved plans to transfer, or fail over, workloads from a primary data center to a secondary location or DR site in order to restore data and operations.

Disaster recovery sites

An organization uses a DR site to recover and restore its data, technology infrastructure and operations when its primary data center is unavailable. DR sites can be internal, external or cloud-based.

An internal DR site is set up and maintained by the organization itself. Organizations with large information requirements and aggressive RTOs are more likely to use an internal DR site, which is typically a second data center. When building an internal site, the business must consider hardware configuration, supporting equipment, power maintenance, heating and cooling of the site, layout design, location and staff.

An external disaster recovery site is owned and operated by a third-party provider. External sites can be hot, warm or cold.

  • Hot site: A fully functional data center with hardware and software, personnel and customer data, which is typically staffed around the clock and operationally ready in the event of a disaster.
  • Warm site: An equipped data center that doesn't have customer data; an organization can install additional equipment and introduce customer data following a disaster.
  • Cold site: Has infrastructure to support IT systems and data, but no technology until an organization activates DR plans and installs equipment; they are sometimes used to supplement hot and warm sites during a long-term disaster.
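
The practical difference between the three site types is how much work remains before recovery can start. The sketch below models that as a small lookup; it simplifies the descriptions above to two yes/no attributes, and the names `SiteProfile` and `steps_before_recovery` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    has_equipment: bool      # hardware and software already in place
    has_customer_data: bool  # customer data already on site


# Simplified from the site descriptions above.
SITES = {
    "hot": SiteProfile(has_equipment=True, has_customer_data=True),
    "warm": SiteProfile(has_equipment=True, has_customer_data=False),
    "cold": SiteProfile(has_equipment=False, has_customer_data=False),
}


def steps_before_recovery(site_type: str) -> list:
    """List what must still be done at a site before recovery can begin."""
    profile = SITES[site_type]
    steps = []
    if not profile.has_equipment:
        steps.append("install equipment")
    if not profile.has_customer_data:
        steps.append("load customer data")
    return steps


for site in ("hot", "warm", "cold"):
    print(site, steps_before_recovery(site))
# hot []
# warm ['load customer data']
# cold ['install equipment', 'load customer data']
```

Fewer remaining steps generally means a shorter achievable RTO, which is why hot sites suit the most time-sensitive workloads.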

A cloud recovery site is another option. An organization should consider site proximity, internal and external resources, operational risks, service-level agreements and cost when contracting with cloud providers to host their DR assets or outsourcing additional services.

Disaster recovery tiers

In addition to choosing the most appropriate DR site, it may be helpful for organizations to consult the tiers of disaster recovery identified by the Share Technical Steering Committee and IBM in the 1980s. The tiers feature a variety of recovery options organizations can use as a blueprint to help determine the best DR approach depending on their business needs.

Another type of DR tiering involves assigning levels of importance to different types of data and applications and treating each tier differently based on the tolerance for data loss. This approach recognizes that some mission-critical functions may not be able to tolerate any data loss or downtime, while others can be offline for longer or have smaller sets of data restored.

Types of disaster recovery

In addition to choosing a DR site and considering DR tiers, IT and business leaders must evaluate the best way to put their DR plan into action. This will depend on the IT environment and the technology the business chooses to support its DR strategy.

Types of DR can vary, based on the IT infrastructure and assets that need protection as well as the method of backup and recovery the organization decides to use. Depending on the size and scope of the organization, it may have separate DR plans and implementation teams specific to departments such as data centers or networking. Major types of DR include:

Data center disaster recovery

Organizations that house their own data centers must have a DR strategy that considers all the IT infrastructure within the data center as well as the physical facility. Backup to a failover site at a secondary data center or a colocation facility is often a large part of the plan (see "Disaster recovery sites" above). IT and business leaders should also document and make alternative arrangements for a wide range of facilities-related components including power systems, heating and cooling, fire safety and physical security.

Network disaster recovery

Network connectivity is essential for internal and external communication, data sharing and application access during a disaster. A network DR strategy must provide a plan for restoring network services, especially in terms of access to backup sites and data.

Virtualized disaster recovery

Virtualization enables DR by allowing organizations to replicate workloads in an alternate location or the cloud. The benefits of virtual DR include flexibility, ease of implementation, efficiency and speed. Virtualized workloads have a small IT footprint, replication can be done frequently, and failover can be initiated quickly. Several data protection vendors offer virtual backup and DR as a product.

Cloud disaster recovery

The widespread acceptance of cloud services allows organizations that traditionally used an alternate location for DR to host their DR environment in the cloud instead. Cloud DR goes beyond simple backup to the cloud. It requires an IT team to set up automatic failover of workloads to a public cloud platform in the event of a disruption.
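
Automatic failover is commonly driven by health checks. As a rough sketch of the decision logic only, the function below switches to the cloud target after a run of consecutive failed checks; the threshold of three and the names used are assumptions, and real cloud platforms provide their own failover mechanisms.

```python
def choose_target(health_checks, failure_threshold: int = 3) -> str:
    """Return "cloud" once the primary fails enough consecutive checks.

    `health_checks` is an ordered sequence of booleans, True meaning the
    primary site responded normally.
    """
    consecutive_failures = 0
    for healthy in health_checks:
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return "cloud"
    return "primary"


print(choose_target([True, False, False, True]))   # primary
print(choose_target([True, False, False, False]))  # cloud
```

Requiring several consecutive failures avoids failing over on a single dropped check, at the cost of a slightly longer detection time that counts against the RTO.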

Disaster recovery as a service (DRaaS)

DRaaS is the commercially available version of cloud DR. In DRaaS, a third party provides replication and hosting of an organization's physical and virtual servers. The provider assumes responsibility for implementing the DR plan when a crisis arises, based on a service-level agreement.

Disaster recovery services and vendors

Disaster recovery vendors can take many forms, as DR is more than just an IT issue. DR vendors include those selling backup and recovery software as well as those offering hosted or managed services. Because DR is also an element of organizational risk management, some vendors couple disaster recovery with other aspects of security planning, such as incident response and emergency planning. Options include:

  • Backup and data protection platforms
  • DRaaS providers
  • Add-on services from data center and colocation providers
  • Infrastructure as a service providers

Choosing the best option for an organization will ultimately depend on top-level business continuity plans and data protection goals, and which option best meets those needs along with budgetary goals.

Some of the major disaster recovery software and DRaaS providers include, but are not limited to:

  • Acronis
  • Dell EMC
  • Microsoft
  • IBM
  • VMware
  • Veeam
  • Zerto

Emergency communication vendors are also a key part of the recovery process, and include Everbridge Crisis Management, Cisco, Rave Alert, AlertMedia and BlackBerry AtHoc.

While some organizations may find it a challenge to invest in comprehensive disaster recovery planning, none can afford to ignore the concept when planning for long-term growth and sustainability. Additionally, if the worst were to happen, organizations that have prioritized DR will experience less downtime and be able to resume normal operations faster.