Today, we’re going to talk about Disaster Recovery. RPO, RTO, MTD and WRT… What do all these acronyms mean? And why are they so important to business and service continuity?
The subject of disaster recovery can sometimes give people the jitters, and this is hardly surprising. It can be a little scary to imagine all the ways that things could potentially go wrong, and the subject is filled with enough jargon and acronyms to completely put off novices and lay people.
In this article, we’re going to try to break this topic down and clear up some of the mystery around the key concepts. Then, you’ll be much better equipped to make the right choices to maintain business continuity, whatever might happen in the future.
Disaster Recovery: What are RPO, RTO, WRT and MTD, and why are they important?
Let’s imagine that you work for a normal-size business and everything is running fine. The only problems that you really encounter are fairly minor. One computer might get a virus occasionally, another might need the mouse replaced, one of your printers keeps displaying error messages… These things can happen in any company, and they frequently do.
This is what we call normal operations. There are no interruptions to services and there is no risk to business continuity.
The graph below shows an illustration of the status of operations over time.
However, now let’s imagine that there is a service failure or a disaster.
Clearly, when this happens, we need to get systems back up and running again as quickly as possible and restore normal operations.
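This is where the RPO (Recovery Point Objective) comes into play. It defines the maximum amount of data that the business can afford to lose, expressed as a period of time counted back from the moment of failure. An RPO of four hours, for example, means that the most recent recoverable copy of the data must never be more than four hours old.
The RPO can vary drastically depending on the type of business or the service affected, as well as other factors such as the time at which the disaster occurs.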
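To make this a little more concrete, here is a minimal Python sketch of how you might check a backup schedule against an RPO. The function names, timestamps and RPO values are all invented for illustration; they are assumptions, not part of any particular backup tool.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much recent data is lost if we restore the last backup."""
    if failure < last_backup:
        raise ValueError("failure time must not be earlier than the last backup")
    return failure - last_backup


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True when the lost data stays within the Recovery Point Objective."""
    return data_loss_window(last_backup, failure) <= rpo


# Hypothetical example: last backup at 12:00, failure at 14:30 the same day.
last_backup = datetime(2024, 1, 15, 12, 0)
failure = datetime(2024, 1, 15, 14, 30)

print(data_loss_window(last_backup, failure))               # 2:30:00
print(meets_rpo(last_backup, failure, timedelta(hours=4)))  # True
print(meets_rpo(last_backup, failure, timedelta(hours=1)))  # False
```

In this made-up scenario, two and a half hours of data would be lost. That is acceptable for a service with a four-hour RPO, but not for one with a one-hour RPO, which would need more frequent backups.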
The recovery process covers everything from the moment the failure is detected until the service is restored to a fully operational state. During this time, normal operations will not be possible.
The RTO (Recovery Time Objective) is the maximum amount of time that the business can accept for getting the affected systems recovered. This time is measured starting from the moment the failure occurs.
This period includes all the tasks required to get the system running again and ready to be put back into production.
For example, let’s imagine that our ERP database server fails. To recover this service, you’ll need to deploy a new server, deploy the database and then restore the most recent backup before the server can be put back into production.
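As a rough sketch, the recovery steps above can be added up and compared with the RTO. The durations below, the extra detection step and the four-hour target are purely hypothetical; in practice you would replace them with figures measured in your own recovery drills.

```python
from datetime import timedelta

# Hypothetical durations for each recovery task (assumed values, not measurements).
# Detection is included because the RTO clock starts when the failure occurs.
recovery_steps = [
    ("detect the failure", timedelta(minutes=15)),
    ("deploy a new server", timedelta(minutes=45)),
    ("deploy the database", timedelta(minutes=30)),
    ("restore the most recent backup", timedelta(hours=2)),
]


def total_recovery_time(steps) -> timedelta:
    """Add up the duration of every recovery task."""
    return sum((duration for _, duration in steps), timedelta())


def within_rto(steps, rto: timedelta) -> bool:
    """True when the whole recovery fits inside the Recovery Time Objective."""
    return total_recovery_time(steps) <= rto


rto = timedelta(hours=4)  # assumed target for the ERP database server

print(total_recovery_time(recovery_steps))  # 3:30:00
print(within_rto(recovery_steps, rto))      # True
```

If the total came out higher than the RTO, that would be a sign that one of the steps needs to be made faster, for example by keeping a standby server ready.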
However, we haven’t yet fully restored the service. We’ve merely recovered the server and configured it, ready to go. It’s not yet providing any services to our employees or customers. So, the clock is still ticking and we’re still losing money.
This is where we come to the WRT (Work Recovery Time). This is the time it takes to fully restore the service and involves performing all necessary system testing and putting the service back online.
The sum of the RTO and the WRT is called the MTD (Maximum Tolerable Downtime). This is the maximum amount of time that the service can be down before the consequences for the company become unacceptable.
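The relationship between these three values is simple arithmetic, and a short Python sketch shows it working in both directions. The function names and the example figures are assumptions chosen for illustration only.

```python
from datetime import timedelta


def maximum_tolerable_downtime(rto: timedelta, wrt: timedelta) -> timedelta:
    """MTD is the sum of the recovery time and the work recovery time."""
    return rto + wrt


def available_wrt(mtd: timedelta, rto: timedelta) -> timedelta:
    """Work backwards: the time left for testing once the RTO is used up."""
    if rto > mtd:
        raise ValueError("the RTO cannot be longer than the MTD")
    return mtd - rto


# Hypothetical targets: 4 hours to recover, 2 hours to test and go live.
rto = timedelta(hours=4)
wrt = timedelta(hours=2)

print(maximum_tolerable_downtime(rto, wrt))    # 6:00:00
print(available_wrt(timedelta(hours=8), rto))  # 4:00:00
```

The second function reflects a common situation: the business tells you how long it can survive without the service, and you have to divide that time between recovering the system and verifying it.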
These are the most basic disaster recovery concepts. To develop a proper disaster recovery or business continuity plan, you would need to calculate each of these values and have a clear idea of what the business can handle. Then, you would need to develop the necessary processes to ensure that, whatever happens, these target values will always be met.
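One way to check whether your processes meet the targets is to run a recovery drill and compare the results with the plan. The sketch below assumes you have recorded three measurements during a drill; the class names, field names and all of the figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    """Targets agreed with the business (assumed values in the example)."""
    rpo: timedelta
    rto: timedelta
    wrt: timedelta

    @property
    def mtd(self) -> timedelta:
        return self.rto + self.wrt


@dataclass(frozen=True)
class DrillResult:
    """What was actually measured during a recovery drill."""
    data_lost: timedelta
    recovery_time: timedelta
    work_recovery_time: timedelta

    @property
    def downtime(self) -> timedelta:
        return self.recovery_time + self.work_recovery_time


def missed_targets(targets: RecoveryTargets, result: DrillResult) -> list:
    """Return the names of the targets that the drill failed to meet."""
    checks = [
        ("RPO", result.data_lost, targets.rpo),
        ("RTO", result.recovery_time, targets.rto),
        ("WRT", result.work_recovery_time, targets.wrt),
        ("MTD", result.downtime, targets.mtd),
    ]
    return [name for name, measured, limit in checks if measured > limit]


targets = RecoveryTargets(
    rpo=timedelta(hours=1), rto=timedelta(hours=4), wrt=timedelta(hours=2)
)
drill = DrillResult(
    data_lost=timedelta(minutes=30),
    recovery_time=timedelta(hours=4, minutes=30),
    work_recovery_time=timedelta(hours=1),
)

print(missed_targets(targets, drill))  # ['RTO']
```

In this invented drill, the recovery itself overran its target by half an hour, even though the total downtime stayed within the MTD. That kind of result tells you exactly which part of the process needs attention.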
Conclusion
We hope you have enjoyed this brief introduction to disaster recovery. By now, you should have a clear idea of what RPO, RTO, MTD and WRT are and why they are so important for the survival of your business.
If you’d like to find out more about this topic, check out some of the other articles on our blog.