IT Guy

    Amazon Web Services Goes Down

    November 3, 2015

    By Per Bauer

    On September 20, Amazon’s cloud computing service suffered a failure that took it offline for several hours, bringing services like Netflix and Tinder down with it. What causes these failures, and how long will customers continue to tolerate them?

    Nowadays, we’re accustomed to easy and fast access to all the data we want, whenever we want it. So it should come as no surprise that when Amazon Web Services (AWS) went down, many took to social media to complain.

    Since many tech companies don’t host their own infrastructure and instead rely on cloud servers, the Amazon failure also took large chunks of the internet down with it. After their online shopping and streaming sessions were interrupted, the masses wanted to know: what caused the failure, and will it happen again?

    As Buzzfeed points out, this is not the first time Amazon has experienced a devastating crash. In 2013, AWS went down for 40 minutes and caused quite a stir, allegedly costing Amazon $1,100 per second in lost revenue.
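
    To put that figure in perspective, here is a back-of-the-envelope sketch of what a 40-minute outage would cost at the reported rate. It assumes the $1,100-per-second loss held constant for the whole outage, which is a simplification; the function name downtime_cost is purely illustrative.

```python
# Rough downtime cost estimate. Assumes a constant loss rate for the
# entire outage, which is a simplification of how revenue is actually lost.
def downtime_cost(seconds: int, loss_per_second: float) -> float:
    """Return the estimated revenue lost over an outage of the given length."""
    return seconds * loss_per_second

outage_seconds = 40 * 60  # the 40-minute outage cited above
estimated_loss = downtime_cost(outage_seconds, 1100)  # $1,100 per second, as alleged
print(f"${estimated_loss:,.0f}")  # prints $2,640,000
```

    By that rough measure, the 2013 outage alone would have cost about $2.6 million.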

    What caused the recent breakdown? The Independent reports that the problem could be traced to a server glitch in Virginia, but any details beyond that are hard to come by.

    Why Big Sites Crash

    For some, tales of server failures and downed websites conjure images of the hacker group Anonymous and their infamous DDoS (distributed denial-of-service) attacks, or of cyber-terrorists using sophisticated algorithms to breach sensitive data. While security is always a priority, the reality is that most server failures, even for large conglomerates like Amazon, are brought on by relatively mundane causes.

    According to cloud computing experts, most server failures stem from a power outage or a severed backup connection. Generally, this is the result of either human miscalculation or natural disasters like floods, fires, and lightning strikes (as TIME reports happened to Google). Of course, spikes in traffic to a site can also cause outages, though one would imagine Amazon has that variable covered after its downtime debacle following the release of Lady Gaga’s album.

    Regardless of the true cause of the failure, there are steps Amazon and other cloud providers can take to avoid similar catastrophes in the future.

    An Ounce of Prevention...

    When it comes to securing cloud services, nobody does virtual infrastructure risk mitigation better than TeamQuest. Their new Performance Software is the first of its kind: it constantly monitors the health of your cloud infrastructure without interfering with your proprietary systems. Instead, it feeds off your operational data, then identifies and reports on potential weaknesses long before they have the chance to cause failure.

    What’s more, TeamQuest’s Predictor and AutoPredict tools are reliable, automated programs. They do not require significant alterations to your current systems, and best of all, they are infinitely scalable. They can learn to interact with your system regardless of size, and will adapt as your needs evolve.

    Considering the roughly $1,100 per second that Amazon purportedly loses during downtime, an investment in software solutions would have paid for itself in a flash. Hindsight is 20/20, of course, so do the smart thing for your IT infrastructure and let us help guide you into the future of cloud computing.

    (Main image credit: Leonardo Rizzi/flickr)

    Category: cloud-computing