Cloud Computing isn’t perfect, but it’s far from broken…
The most recent Amazon cloud outage isn’t a herald of worse things to come; it’s more like a learning opportunity
Yes, on October 22, 2012, Amazon’s large Virginia data center experienced technical difficulties that brought down popular sites such as Reddit and Foursquare for a number of hours. This, of course, comes on the heels of the last significant outage, which occurred around five months ago. The question is: are we collectively blowing these events completely out of proportion?
Given the recent turn of events involving AWS, it would appear that some people on the internet have never experienced server or site downtime before (or perhaps just want us to believe that). Fact: plenty of service outages and site disruptions occurred before the rise of cloud computing. In many of those cases, we weren’t looking at mere hours of not being able to reach a site, but days. Some of those older events were also followed by significant loss of user data. Compare and contrast this with cloud computing and its various implementations for backing up user data.
In the past, many critical errors were caused by massive hardware failure. Since non-cloud systems often don’t (or didn’t) provide for contingencies, bringing them back online could mean replacing racks of expensive hardware, which would often have to be ordered and shipped. What makes the cloud so different is that everything can be more easily copied, backed up and moved around. In other words, when a cloud outage occurs, it’s often simply a matter of moving things around and reconfiguring, as opposed to waiting by the loading dock for new hardware to show up.
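To make the “moving things around and reconfiguring” idea concrete, here is a minimal sketch in Python. It assumes a site keeps copies of itself in more than one data center and simply sends traffic to the first copy that is still healthy. The `Replica` class, the `pick_active` function and the region names are all hypothetical and chosen for illustration; they do not describe how AWS or any of the affected sites actually handle failover.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A hypothetical copy of a site running in one data center."""
    region: str
    healthy: bool


def pick_active(replicas):
    """Return the first healthy replica in priority order, or None if all are down."""
    for replica in replicas:
        if replica.healthy:
            return replica
    return None


# Illustrative region names only; not a statement about any real deployment.
replicas = [
    Replica("us-east", healthy=True),
    Replica("us-west", healthy=True),
]

print(pick_active(replicas).region)  # the primary copy serves traffic

replicas[0].healthy = False          # simulate an outage at the primary site
print(pick_active(replicas).region)  # traffic moves to the standby copy
```

The point is not the code itself but the shape of the recovery: no new hardware is ordered, and the fix is a change in where requests are sent.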
Having said all that, it is somewhat troubling that AWS would experience repeated problems at the same data center. But we shouldn’t use these relatively minor events to berate Amazon or its abilities; instead, we should be learning from the mistakes. The simple truth is that there is immense power inherent in AWS; it is a marvelous implementation of cutting-edge technology that not only works, but allows us to achieve some amazing things.
This notion that “if AWS goes down, so does the entire internet” is not only silly, but badly out of step with reality. Sure, if Amazon were to keep growing at its current pace and came to dominate the competition in the long run, it would likely end up running most of the major sites on the planet. However, if such a thing were to happen, wouldn’t Amazon also increase its capabilities and expand its budgets? Moreover, why is the tendency to assume the worst in every single scenario?
For starters, since we’ve experienced at least two events relatively close together in exactly the same location, it’s reasonable to suspect a localized problem. In other words, given that the same data center keeps failing, it’s logical to infer that we’re not dealing with a systemic flaw in cloud computing itself. A good comparison is food production: just because one factory allows tainted spinach or peanut butter to enter circulation doesn’t mean that the same items produced by other companies are also contaminated.
According to recent reports, the AWS outage in question was triggered by a memory leak which led to “cascading failures”. Once again, these types of events are more or less expected when a new technology is run at full scale for long periods of time. Call it “working out the bugs” if you like. If anything, these types of events highlight the need for increased scrutiny, the development of new solutions and, if necessary, better replacement components and technologies.
Although it may sound glib to say so, these types of events are similar to growing pains. People often forget that cloud computing has largely displaced more traditional approaches to computing and networking, such as grid computing, in a relatively short amount of time. Frankly, it’s astounding that cloud technology has been as successful as it has.
Furthermore, the doomsayers have been predicting crippling events lasting for weeks on end ever since the cloud and AWS hit the market. While it would be reckless to assume that these worries are completely without merit (the skeptics do occasionally make excellent points which we should learn from), most people are clearly jumping on the fear bandwagon as if it were a pop culture meme. The fact of the matter is that cloud computing has already worked its way into our lives, digital Armageddon has yet to occur, and we have no logical reason to assume that it is coming as predicted. The threat of solar storms, coronal mass ejections (CMEs) and EMP attacks is often used to undermine confidence in cloud computing and its stability. News flash: no computer system or server farm is fully protected from these events, cloud or otherwise. Need evidence? During the “Carrington Event” of 1859, a solar storm caused telegraph wires to spark and burn!
In short, let’s give cloud computing time to grow, mature and develop before we label it a broken, transitional technology. The internet of today has been greatly empowered by cloud computing. In truth, many of the activities we engage in online every single day wouldn’t be possible without some form of cloud computing to drive them.