Sep 13, 2019

What happens when the cloud goes down? AWS outage leads to permanent data loss

In the early morning of August 31st, a data center in the AWS US-EAST-1 region in Northern Virginia experienced a power failure. Backup generators promptly turned on to restore power to the data center. Unfortunately, just a few hours later, those backup generators began to fail, too.

In the immediate aftermath, roughly 7.5% of the EC2 instances and EBS volumes at that facility became unavailable. EBS (Elastic Block Store) provides block storage volumes that persist independently of the EC2 instances they are attached to, so data survives even after an instance is shut down. For some teams, EBS stores mission-critical data for applications and services.

Amazon gradually restored service throughout the morning, but for some customers the data loss was catastrophic. A few days later, AWS began notifying a small percentage of developers that hardware damage to Amazon’s infrastructure meant that some data could not be recovered. Engineering teams would need to either restore their data from a separate backup or write it off as permanently lost.

Despite its strong uptime and data recovery track record, Amazon’s terms state: “We have no liability whatsoever for any damages, liabilities, losses (including any corruption, deletion, or destruction or loss of data, applications or profits), or any other consequences resulting from the foregoing.”

The recent AWS incident highlights the importance of data redundancy and consistent backup creation as core parts of the software development workflow, even if engineering teams are using highly reliable, popular cloud services. Even the cloud, with its sprawling and distributed data centers, is subject to the consequences of unforeseen physical hardware failures.
