In the morning hours of Sunday, something went very wrong at one of Amazon Web Services’ data centers.
At 6 AM ET, error rates for the company’s large NoSQL database, DynamoDB, began to rise in AWS’s US-East region in Virginia, the oldest and largest of its nine regions across the globe. By 7:52 AM ET, AWS had determined the cause of the problem: the way the database manages its metadata was not working as expected, which affected its partition and table services.
Because AWS’s services are so complex and interconnected, the issue quickly spread to affect a total of 34 services (out of 117) that the company monitors on its Service Health Dashboard. Everything from the Elastic Compute Cloud (EC2) virtual machine hosting service to the Glacier storage service to the relational database service was affected. According to media reports, other companies that rely on AWS experienced outages too, from Netflix to IMDB, Tinder, Pocket and Buffer.
By noon Sunday, AWS reported that the issue had been resolved, but not without many complaints and musings on Twitter and elsewhere.
What can we take away from this event? Here are a few thoughts.
Even the big boys fail
Amazon Web Services is the leader of the public IaaS cloud computing market, although Microsoft seems to be giving the company a run for its money. Sunday’s events remind us that even large, established cloud computing providers are still vulnerable to outages.
Prepare for outages
Given that even the most full-featured cloud provider on the market can still have service interruptions lasting six hours or more, customers should be prepared for this eventuality. AWS has long advised customers to architect their systems to handle virtual machines and other services going down.
Netflix, perhaps one of the Amazon cloud’s biggest name-brand clients, said through a spokesman that the impact of the outage on the company’s services was very small because it automatically migrated workloads from the troubled US-East region to another, healthy region after learning of the disruption. Anyone who uses AWS for mission-critical applications should architect their systems with the expectation that the services they run on can fail at any time. Netflix has developed open source tools to help test its system against random failures. Although Netflix did not acknowledge a major problem for customers, third-party outage tracking sites showed higher-than-normal reports of service disruptions from Netflix users Sunday morning. Even the well prepared can be affected by issues like this.
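The idea of moving work from a troubled region to a healthy one can be sketched in a few lines of Python. This is only an illustration of the general pattern, not a description of how Netflix or AWS actually implement failover. The function name, the preference order of regions and the simulated health check are all assumptions made for the example; a real system would replace the health check with actual probes of its own services.

```python
from typing import Callable, Iterable, Optional


def pick_healthy_region(
    regions: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region whose health check passes, or None."""
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A health check that raises is treated like an unhealthy region.
            continue
    return None


# Hypothetical preference order: primary region first, then fallbacks.
PREFERRED_REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]


def simulated_health_check(region: str) -> bool:
    # Stand-in for a real probe; here the primary region is simulated as degraded.
    return region != "us-east-1"


active_region = pick_healthy_region(PREFERRED_REGIONS, simulated_health_check)
print(active_region)  # prints "us-west-2"
```

The point of the design is that the application never assumes its primary region is available: it asks for a healthy region each time, and returns nothing at all (rather than a broken region) when every check fails.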
“I told you so”
A blogger at Forbes said that this outage changes nothing. I basically agree with that. If you are an AWS fanboy, you will say that these outages occur less frequently than they used to, and that if you heed AWS’s best practices, these situations will not affect you.
On the other side of the coin, outages like what happened Sunday will just continue to be fodder for anyone who is wary of sending workloads to the public cloud.
The fact is that outages occur. They occur in the public cloud, at any and all vendors, and they occur in the internal data centers that companies run too. They are just a fact of life in IT.