Why is the maximum tolerable period of disruption (MTPD) important for planners when developing a disaster recovery plan?

Terminologies relevant to Business Impact Analysis

It is important to understand some of the abbreviations and terms used in business continuity planning from this stage onward. If you would like the definitions provided by ISO 22301:2019, please refer to the standard itself.

Business Impact Analysis (BIA) - the process of analyzing business activities to determine recovery priorities, objectives, business dependencies and targets.

Maximum Tolerable Period of Disruption (MTPD) - the maximum time a business can withstand disruption of a business process without an adverse impact on business operations. It can also be defined as an organization’s maximum acceptable outage (not of IT per se, but of the process as a whole).

Recovery Time Objective (RTO) - the timeline within which a business service or activity and its associated resources must be restored.

Recovery Point Objective (RPO) - the maximum amount of information the business is willing to lose, also known as maximum data loss.

What is a business impact analysis?

The purpose of conducting a BIA is to identify the impact of a disaster on an organization's processes and the supporting resources over a period of time and provide sufficient information required to prioritize business recovery. 

Having identified the teams involved (as detailed in Part 5: Planning), you are now ready to interview the department champions and other subject matter experts (hereafter referred to as SMEs). The interviews identify the business processes and their dependencies, how critical each process is during a disaster, and the order in which the processes must be restored.

Whether an organization should conduct a risk assessment or a business impact analysis first is a much-debated topic. We will provide a companion series in the future that further explores the pros and cons of each approach, but for now we’ll outline the process of starting with a business impact analysis.

Irrespective of whether you conduct risk assessment first or the business impact analysis first, you need to identify the critical processes and the amount of downtime a business can afford in a disaster. If you are conducting a BIA for the first time, the champions will not know what they need to prepare ahead of the interview. Therefore, we suggest hosting workshops to set expectations and give the involved parties an opportunity to ask questions.

Preparing for the business impact analysis

Preparation is important to successfully conduct BIA interviews with the department champions. Below is a list of things you should consider:

  • Prepare a list of champions or SMEs that you will interview

  • Create the spreadsheet template you’ll use to collect information during the interview

  • Create a sample interview data template you can share with the interview attendees

  • Prepare a pre-session presentation to discuss with the selected champions or SMEs 

  • Send the interview schedule to the participants at least two weeks in advance

The spreadsheet template for the interview is crucial, so it must be prepared according to your organizational needs. ISO 22301:2019 provides some guidance on the information that needs to be collected during an interview. In general, the interview template should facilitate the collection of the following information:

  • The list of activities performed by the different departments that support your organization’s vision

  • Quantitative and qualitative scales for each process to determine its criticality 

  • Impacts on the organization over time when the respective activities are not performed 

  • Dependencies in terms of other activities inside your organization and external to your organization

  • Resource dependencies for the identified activities

  • The information systems that are used to support the respective business activities

There are no set rules for the spreadsheet design; just make sure you capture at least the information above. The spreadsheet could consist of a single worksheet or multiple worksheets, each capturing a category of information. Before you design the template, it is important to discuss the scales for MTPD, RTO and RPO with the steering committee, because those scales will be used to capture the information.
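As a rough illustration of the information listed above, one row of such a template could be modelled as a small record. This is only a sketch: the class name, field names and the 1-to-5 criticality scale are assumptions made for the example, not something prescribed by ISO 22301:2019.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActivityRecord:
    """One activity captured in a BIA interview (field names are illustrative)."""

    department: str
    activity: str
    criticality: int  # assumed scale: 1 (low) to 5 (high)
    impact_over_time: Dict[str, str] = field(default_factory=dict)  # time band -> impact
    internal_dependencies: List[str] = field(default_factory=list)
    external_dependencies: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)
    information_systems: List[str] = field(default_factory=list)

    def unanswered(self) -> List[str]:
        """Return the names of the fields that are still empty after the interview."""
        collected = {
            "impact_over_time": self.impact_over_time,
            "internal_dependencies": self.internal_dependencies,
            "external_dependencies": self.external_dependencies,
            "resources": self.resources,
            "information_systems": self.information_systems,
        }
        return [name for name, value in collected.items() if not value]


# Hypothetical interview result for one activity.
record = ActivityRecord(
    department="Finance",
    activity="Payroll run",
    criticality=4,
    impact_over_time={"1 day": "Staff paid late"},
    information_systems=["Payroll system"],
)
print(record.unanswered())
```

An empty field is not necessarily an error (an activity may genuinely have no external dependencies), so the list is best read as a set of follow-up questions for the champion.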

Determining the measuring scales for MTPD, RTO & RPO

MTPD, RTO and RPO can be represented in minutes, hours, days or weeks. Collecting these measurements will help you determine how long a particular business process could withstand a disruption before it becomes a disaster (MTPD), at what point the business expects the activity to be up and running (RTO) and how many hours', days' or weeks' worth of information the business is willing to lose (RPO).

Ideally, once you identify the MTPD, the RTO can be set at half or three quarters of the MTPD, unless such a proposal is not practical for your business. The scales for the above measures can be range-based or absolute. When a range-based scale is selected, the MTPD (and the RTO and RPO) is usually provided as a range, such as:

  • 0-4 hours 

  • 5-12 hours 

  • 13 hours-1 day 

  • 2-4 days

  • 5 days-2 weeks

  • 2 weeks or more
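The range-based scale and the half or three-quarters rule of thumb can be sketched in a few lines. The band limits below mirror the example ranges above; treating each upper limit as inclusive (so that 4.5 hours falls into the 5-12 hours band) and the function names are assumptions made for this example.

```python
from bisect import bisect_left

# Inclusive upper limit of each band, in hours (1 day = 24, 4 days = 96, 2 weeks = 336).
BAND_LIMITS_HOURS = [4, 12, 24, 96, 336]
BAND_LABELS = [
    "0-4 hours",
    "5-12 hours",
    "13 hours-1 day",
    "2-4 days",
    "5 days-2 weeks",
    "2 weeks or more",
]


def band_for(hours: float) -> str:
    """Map a duration in hours to the label of the band that contains it."""
    if hours < 0:
        raise ValueError("duration cannot be negative")
    return BAND_LABELS[bisect_left(BAND_LIMITS_HOURS, hours)]


def suggested_rto(mtpd_hours: float, fraction: float = 0.5) -> float:
    """Derive a starting RTO as a fraction (for example 0.5 or 0.75) of the MTPD."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return mtpd_hours * fraction


mtpd = 24.0  # hypothetical MTPD of one day
rto = suggested_rto(mtpd, fraction=0.75)
print(band_for(mtpd), "|", rto, "|", band_for(rto))
```

The suggested RTO is only a starting point for discussion; the steering committee may still adjust it if the derived value is not practical.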

MTPD, just to remind you, stands for Maximum Tolerable Period of Disruption. It’s a metric that gets less publicity than RTO (recovery time objective) and RPO (recovery point objective) and there is a reason, if not an excuse, for that. Definitions for MTPD usually sound simple: for example, “The maximum time an activity or resource can be unavailable before irreparable harm is caused to the organization.” The more complicated part is in the measurement of MTPD. Gauging irreversible damage in terms of job security of employees, legal liabilities, reputation and/or shareholder value is as much an art as a science. Furthermore, that simple definition may also be hiding more than meets the eye.

The overall concept of MTPD (or MTPoD, if you prefer) is a good one. You need to know MTPD for different key organisational activities, so that you can concentrate on priority items (lowest MTPD, for instance) and select solutions that fix things before the MTPD expires, not after. Business impact analysis (BIA) is typically the way to determine MTPD. Business objectives and risk analysis should also be prime factors in determining RTO and RPO. However, rightly or wrongly, figures for these two metrics are often influenced by existing technology and resources, which set limits on what can be accomplished, making them easier to calculate.

What MTPD does not describe is the amount of damage that may be done before it becomes irreversible. Think about the following two situations. In the first, say a legal requirement for periodic reporting, damage can be considerable if MTPD is exceeded, but almost zero if a solution is implemented before MTPD expires. In the second, a production problem slows output to a crawl. Customers will only wait so long before moving to one of your competitors, and some may move before MTPD expires. While you might be able to win them back in the future, you lose revenue and goodwill, at least in the short term. In summary, MTPD sounds simple, but deserves some thought and scrutiny if you want to get it right.

If you have ever worked to develop, review, or test a Business Continuity or Disaster Recovery Plan (BCP or DRP), you may be familiar with the terms Recovery Point Objective (RPO), Recovery Time Objective (RTO), and Maximum Tolerable Downtime (MTD, essentially the same measure as the MTPD described above). But what do these terms mean, how are they different, and why are they important? Let's take a look.

Defining RPO, RTO, and MTD

Recovery Point Objective (RPO)

Recovery point objectives are about data loss tolerance. RPO is the term used in business continuity to identify the maximum targeted period in which data can be lost without severely impacting the recovery of operations. For example, if a business process could not lose more than one day's worth of data, then the RPO for that information would be 24 hours. RPO is very useful to help determine the frequency of backups for a given system.

Recovery Time Objective (RTO)

Recovery time objectives are about restoration goals. RTO is a term used in business continuity to identify the planned recovery time for a process or system, which should be reached before the business process's maximum tolerable downtime expires. For example, if a business process could not be sustained for more than one day without normal operations, then the RTO should be less than 24 hours. RTOs can be helpful in determining what kind of recovery and/or redundancy may be required.

Maximum Tolerable Downtime (MTD)

Maximum tolerable downtime, also sometimes referred to as Maximum Allowable Downtime (MAD), represents the total amount of downtime that can occur without causing significant harm to the organization's mission. MTD is important to define so continuity planners can select and implement appropriate recovery methods and procedures to ensure downtime does not exceed acceptable levels.

Understanding the value of RPO, RTO, and MTD

RPO, RTO, and MTD are most frequently used in business continuity and are usually defined during the Business Impact Analysis (BIA). They are important measurements to ensure the requirements for a business process or function will be achieved by current systems and procedures. During the BIA, business process owners should be asked to identify their RPO and RTO requirements. Existing systems, processes, and procedures can then be evaluated to ensure the required RPOs and RTOs can be met in the event of a disruption and that RTOs do not exceed the organization's MTD.

RPO, RTO, and MTD in action

Let's look at a few examples.

RPO Example

Let's assume an imaging department receives paper documents they scan and save on server ABC. These paper documents have retention requirements and would be difficult or impossible to reproduce if lost. Further, let's assume the department's current process is to shred the paper documents at 4:00 PM the day after the documents are scanned. In this case, the RPO for server ABC would be 24 hours, since after that time the paper documents would be destroyed and could not be rescanned. Knowing the RPO for the server allows IT to verify that data stored on server ABC is backed up often enough that no more than 24 hours' worth of data would be lost.
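The check IT performs here boils down to comparing the backup interval with the RPO. The sketch below makes a simplifying assumption that is not in the example itself: backups complete instantly and always succeed, so the worst-case data loss equals the time between two backups. The function names are hypothetical.

```python
from datetime import timedelta


def backup_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case data loss (one full interval) stays within the RPO."""
    return backup_interval <= rpo


def rpo_gap(backup_interval: timedelta, rpo: timedelta) -> timedelta:
    """Headroom between the RPO and the backup interval; negative means a shortfall."""
    return rpo - backup_interval


rpo_abc = timedelta(hours=24)  # RPO for server ABC from the example

for interval in (timedelta(hours=12), timedelta(hours=48)):
    status = "meets" if backup_meets_rpo(interval, rpo_abc) else "misses"
    print(f"Backup every {interval} {status} the RPO (gap: {rpo_gap(interval, rpo_abc)})")
```

In practice backup duration and failed jobs reduce the real headroom, so an interval that only just equals the RPO deserves a closer look.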

RTO and MTD Examples

First, let's look at a simple example. In one department, let's assume server XYZ is used to retrieve critical customer or member information and legal requirements state our company must be able to access this information within 4 hours of a customer requesting it. In this case, the MTD would be 4 hours and the RTO for server XYZ would need to be less than 4 hours. Knowing the RTO for the server will allow IT to verify recovery processes are in place to meet these requirements.

Now let's look at a more complicated scenario. In another department, let's assume virtual server VM is running on a physical server SH and is pulling data from server DB. Senior management has determined the MTD for the department's business process is 8 hours. In this case, we must ensure the RTO for the business process, which would include the RTOs for each system, does not exceed the MTD. Let's also assume recovery of virtual server VM is dependent on physical server SH being recovered first. In this case, we cannot start the recovery of VM until SH is recovered. This means we must ensure the RTOs for VM and SH together do not exceed the MTD of 8 hours. Unlike the VM dependency, server DB does not have a dependency, so it can be recovered at the same time as SH.

With these requirements in mind, let's look at two different recovery scenarios based on different system recovery times.

First, let's assume the recovery time for each system is as follows: SH is 4 hours, VM is 2 hours, and DB is 3 hours. In this scenario, the combined RTO for the business process is 6 hours (see figure A). This RTO is within our MTD requirement.

Second, let's assume the recovery time for each system is as follows: SH is 6 hours, VM is 4 hours, and DB is 4 hours. In this scenario, the combined RTO for the business process is 10 hours (see figure B) which would not meet our MTD requirement of 8 hours.
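The arithmetic behind both scenarios is a longest-path calculation: a system can only start recovering once everything it depends on has finished, and independent systems recover in parallel. The following sketch reproduces it under the assumption that there is enough staff to recover independent systems at the same time; the function and variable names are made up for illustration.

```python
from typing import Dict, List


def combined_rto(recovery_hours: Dict[str, float],
                 depends_on: Dict[str, List[str]]) -> float:
    """Hours until every system is recovered, honoring recovery dependencies."""
    finish_time: Dict[str, float] = {}

    def finish(system: str, path: tuple = ()) -> float:
        if system in path:
            raise ValueError(f"circular dependency involving {system}")
        if system not in finish_time:
            # Recovery starts when the slowest prerequisite has finished.
            start = max(
                (finish(dep, path + (system,)) for dep in depends_on.get(system, [])),
                default=0.0,
            )
            finish_time[system] = start + recovery_hours[system]
        return finish_time[system]

    return max((finish(name) for name in recovery_hours), default=0.0)


MTD_HOURS = 8
dependencies = {"VM": ["SH"]}  # VM cannot start until SH is recovered; DB is independent

scenarios = {
    "first": {"SH": 4, "VM": 2, "DB": 3},
    "second": {"SH": 6, "VM": 4, "DB": 4},
}
for name, hours in scenarios.items():
    total = combined_rto(hours, dependencies)
    verdict = "within" if total <= MTD_HOURS else "exceeds"
    print(f"{name} scenario: combined RTO {total} hours, {verdict} the MTD")
```

The first scenario comes out at 6 hours and the second at 10 hours, matching the figures in the text. If recoveries have to be done one after another because of limited staff, the recovery times would need to be summed instead.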

Tips for using your RPOs, RTOs, and MTD to improve your business continuity operations

  1. Work with business process owners at least annually to create or review RTO and RPO requirements as a part of your BIA.
  2. Compare backup frequency with RPO requirements for all systems to ensure backup processes will be able to achieve recovery point objectives.
  3. Confirm the RTO(s) for a given business process will not exceed the organization's MTD.
  4. Plan to verify critical systems can meet RTO and RPO requirements when developing your business continuity exercise and testing plan.
  5. Ensure RTO and RPO metrics are evaluated and documented during BCP exercises.

Using Tandem Business Continuity Plan software to help you understand and evaluate your RPOs, RTOs, and MTD

The Tandem BCP Gap Analysis Excel Spreadsheet is an excellent report for evaluating RTO, RPO, and MTD requirements. Exceptions are highlighted in red so users can easily see gaps and make changes to meet requirements. The BCP Gap Analysis Excel Spreadsheet can be downloaded from two locations within the software:

  • Tandem > Business Continuity Plan > Reports
  • Tandem > Business Continuity Plan > Download Documents

To analyze RPOs, click the System Equipment RPO worksheet tab. This worksheet will show the gap between the shortest system/equipment backup frequency and the maximum tolerable time period for data loss (the RPO) for each business process where it is needed.

To analyze RTOs, click the Business Process RTO worksheet tab. This worksheet will show the gap between business process MTD and RTOs. In addition, the spreadsheet will display dependency RTOs to ensure gaps are addressed across related systems and processes.
