Nikody Keating

Leading Disaster Recovery: Single region with high availability


When leading teams through disaster recovery (DR) planning, it can be hard to see the repercussions of the decisions you make. This article dives into the first of the five AWS disaster recovery approaches. We'll break the topic into five sections: what it is, the problems with the approach, what you are signing up for, how it impacts the team's mentality, and a conclusion.


What is it?

This DR approach can be simply stated as, “We plan to not put any effort into regional disaster recovery”. This is typically the first approach for any serverless system being developed from scratch, as business value is at the forefront of everyone’s mind when first building these systems. The benefit of this approach is that, since almost all serverless systems are deployed across multiple data centers, the odds of being impacted by the loss of any single data center are slim. If there is an outage, the typical outage for AWS is 6 – 8 hours, which is shorter than most recovery time objectives (RTOs). In this case, by simply deciding to let your service recover when AWS does, you stay focused on business value rather than recovery.


What are the problems with this approach?

The downside here is the risk of extended unavailability if a disaster or outage affects all data centers in a region. In the best-case scenario, your system would come back online when the services in AWS came back online. In the worst-case scenario, all or part of your system's data could be lost.


What are you signing up for?

Minimum Leadership Level: Non-technical leadership
Team Type: Standard development team
Team AWS Skill: Very low
Risk Tolerance Level: Very high
Average AWS Outage: 6 – 8 hours
Regional Recovery Time: N/A
Recovery Requiring Design Changes: Probable
Opportunity Level: None
Up-Front Development Costs: N/A

This approach means that your team isn’t signing up for any extra work to support its production systems. Your customers will need to tolerate outages that can last anywhere from 6 hours to a week, along with the potential for data loss in the worst-case scenario.
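To make that trade-off concrete, here is a minimal sketch that compares a recovery time objective against the outage durations quoted in this article. The names `check_rto` and `RtoCheck` are hypothetical, and the 8-hour and one-week figures are the article's rough planning numbers, not measured AWS statistics.

```python
from dataclasses import dataclass

# Planning inputs taken from the article; treat them as rough assumptions.
TYPICAL_OUTAGE_HOURS = 8           # upper end of the 6 to 8 hour typical outage
WORST_CASE_OUTAGE_HOURS = 7 * 24   # an outage lasting up to a week


@dataclass
class RtoCheck:
    """Result of comparing an RTO against the assumed outage durations."""
    rto_hours: float
    covers_typical: bool
    covers_worst_case: bool


def check_rto(rto_hours: float) -> RtoCheck:
    """Report whether the RTO would still be met if you simply wait for AWS to recover."""
    return RtoCheck(
        rto_hours=rto_hours,
        covers_typical=rto_hours >= TYPICAL_OUTAGE_HOURS,
        covers_worst_case=rto_hours >= WORST_CASE_OUTAGE_HOURS,
    )


if __name__ == "__main__":
    for rto in (4, 24, 200):
        print(check_rto(rto))
```

An RTO of 24 hours, for example, survives a typical outage but not the week-long worst case, which is the gap your customers would need to accept under this approach.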


How does this impact the team’s mentality?

The impact of this approach on the team is that the team will not consider disaster recovery in its development practices. The team will also tend to focus on entry-level knowledge of its systems. This is great if you’re primarily focused on capabilities, as your team will be well suited for that. This type of team will often never evolve to next-level concepts such as CI/CD and Blue/Green deployments in serverless, and in some cases that is completely appropriate.


Over the long term, this often leads to systems that fail to deploy to a new region when attempted, that require significant rework when global compliance standards must be met or disaster recovery becomes a priority, and that are built on technologies which cannot be recovered in the region you’re attempting to recover to. An example of this last one is the AWS service Amazon Timestream, which originally had limited availability in regions around the world and, up until re:Invent 2021, could only write data to its higher-cost in-memory storage. This creates a significant issue if you’re trying to recover a year’s worth of backups.
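One way to catch the regional availability problem early is to check a system's required services against a target recovery region. The sketch below is a minimal illustration: `missing_services` is a hypothetical helper, and `EXAMPLE_AVAILABILITY` is made-up sample data, not real AWS availability. In practice the mapping could be populated from a source such as boto3's `Session.get_available_regions(service_name)`, which reads the endpoint data bundled with the SDK and may lag behind actual availability.

```python
from typing import Dict, List, Set


def missing_services(
    required: List[str],
    target_region: str,
    availability: Dict[str, Set[str]],
) -> List[str]:
    """Return the required services that are not offered in target_region."""
    return [
        service
        for service in required
        if target_region not in availability.get(service, set())
    ]


# Hypothetical availability data for illustration only; NOT real AWS data.
EXAMPLE_AVAILABILITY: Dict[str, Set[str]] = {
    "lambda": {"us-east-1", "us-west-2", "eu-west-1"},
    "dynamodb": {"us-east-1", "us-west-2", "eu-west-1"},
    "timestream-write": {"us-east-1", "eu-west-1"},
}

if __name__ == "__main__":
    required = ["lambda", "dynamodb", "timestream-write"]
    gaps = missing_services(required, "us-west-2", EXAMPLE_AVAILABILITY)
    print("Services unavailable in the recovery region:", gaps)
```

Running a check like this when a new service is adopted is far cheaper than discovering the gap during an actual recovery.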


Conclusion

This form of disaster recovery leverages the existing capabilities built inherently into serverless systems. That means it's readily available and doesn't require any planning or real effort to achieve. While immediately useful in the short term, this approach should not be considered a long-term solution, as a Backup and Restore solution can now leverage the automation of AWS Backup and provides better backup and recovery.

