How Netflix Came Prepared for the House of Cards Premiere
In 2014, House of Cards fans worried that Frank Underwood would be undermined by slow Netflix services. The streamed premiere, however, came off smoother than a politician.
At blockbuster movie premieres, people wait in line for hours just so they can be among the first to watch. To avoid all the hassle of crowds and waiting, there’s Netflix. But in 2014, there was concern that the surge of users streaming the season two premiere of House of Cards would crash the IT infrastructure supporting Netflix, returning viewers to the Dark Ages of long lines and longer waits.
Netflix streaming accounts for a whopping 37% of internet traffic during peak periods, according to VentureBeat, and when demand spikes for popular shows, it can take a serious toll on IT systems around the world. Unlike a movie theater, the internet can’t simply stop selling tickets once capacity is reached.
Within 24 hours of the premiere, a full 16% of Netflix’s 35 million U.S. subscribers had tuned in, according to Forbes, eight times the number who tuned in for season one. But to viewers’ (and IT technicians’) delight, the premiere went off without a single hiccup. We all wanted to know: how did Netflix prepare for such high demand?
Netflix is able to handle such massive viewership spikes because it intentionally throws the proverbial monkey wrench into its own systems. As TechRepublic reports, Netflix developed proprietary “chaos engineering” software called Chaos Monkey in 2010. Part of the company’s “Simian Army” of tools, the software deliberately triggers random failures in the Netflix IT infrastructure, which lets Netflix find and resolve potentially catastrophic problems before they affect viewers.
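To make the idea of deliberate failure injection concrete, here is a minimal sketch in Python. It is not Netflix's Chaos Monkey code; the Fleet and Instance classes, the min_healthy threshold, and the heal() step are all hypothetical stand-ins for a real deployment. It only shows the basic loop: knock out a random instance, check whether the service can still serve traffic, then recover.

```python
import random


class Instance:
    """A hypothetical server in a simulated fleet."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class Fleet:
    """A simulated group of instances that needs min_healthy of them to serve."""

    def __init__(self, size, min_healthy):
        self.instances = [Instance(f"node-{i}") for i in range(size)]
        self.min_healthy = min_healthy

    def healthy_count(self):
        return sum(1 for inst in self.instances if inst.healthy)

    def can_serve(self):
        return self.healthy_count() >= self.min_healthy

    def heal(self):
        # Stand-in for automated recovery (e.g. replacing a dead instance).
        for inst in self.instances:
            inst.healthy = True


def unleash_monkey(fleet, rng):
    """Mark one randomly chosen healthy instance as failed and return it."""
    candidates = [inst for inst in fleet.instances if inst.healthy]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


def run_experiment(fleet, rounds, seed=0):
    """Inject one failure per round; count rounds the fleet kept serving."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(rounds):
        unleash_monkey(fleet, rng)
        if fleet.can_serve():
            survived += 1
        fleet.heal()
    return survived
```

A fleet with spare capacity, such as Fleet(size=5, min_healthy=4), survives every round of a single failure, while a fleet with no headroom, such as Fleet(size=5, min_healthy=5), fails every round. Surfacing that second case in a test, rather than during a premiere, is the point of the exercise.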
Netflix isn’t immune to service outages, but when they happen, the company consistently restores service with astounding speed. In fact, Netflix is rarely the cause of those outages in the first place. For instance, its service was briefly disrupted last September when Amazon Web Services (AWS, the cloud platform that now hosts the entirety of Netflix) experienced a colossal regional outage, according to The Register. With a ready-made solution in hand, Netflix quickly diverted services through other regional data centers, sidestepping the problem and restoring service.
Netflix’s IT infrastructure is so robust because of a practice called multi-region, active-active replication, which means that the data, storage, and applications required to run Netflix are fully duplicated across other regional sites. Netflix claims that it could, in fact, recover from a systemwide AWS outage in a matter of hours (though it won’t say where it hides the extra capacity), as BGR reports.
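The routing side of an active-active setup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Netflix's actual traffic management: the Region class, the route_request function, and the region names are hypothetical, and the sketch assumes every region holds a full copy of the service so any healthy one can take over.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical regional site holding a full replica of the service."""
    name: str
    healthy: bool = True


def route_request(user_home, regions):
    """Return the name of the region that should serve a request.

    Prefer the user's home region; if it is down, fall back to the first
    healthy region in the list, since each one is assumed to be a full replica.
    """
    by_name = {region.name: region for region in regions}
    home = by_name.get(user_home)
    if home is not None and home.healthy:
        return home.name
    for region in regions:
        if region.healthy:
            return region.name
    raise RuntimeError("no healthy region available")


# Usage: a regional outage diverts traffic instead of dropping it.
regions = [Region("us-east"), Region("us-west"), Region("eu-west")]
normal = route_request("us-east", regions)    # served from us-east
regions[0].healthy = False                    # simulate a regional outage
diverted = route_request("us-east", regions)  # served from us-west
```

The design choice worth noting is that failover here is only a routing decision. That works because the expensive part, keeping data and applications duplicated in every region, has already been paid for in advance.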
But the cloud isn’t the only cause of lagging online videos. As QZ reports, many slowdowns occur between content delivery networks (CDNs, the providers that actually deliver Netflix shows across the web) and internet service providers (ISPs). The two often boost bandwidth by engaging in peering, according to TechTarget, which directly connects their respective infrastructures and enables traffic-sharing -- in theory, a win-win.
However, peering agreements constantly break down into disputes (ISPs feel that CDNs should pay them for the burdensome Netflix traffic that they send), causing slowdowns while the quarreling parties resolve the issue.
Under its own roof, Netflix runs one of the most rigorous capacity management programs in the world. And while it pays extra to duplicate its capacity, every enterprise IT department can afford to ensure that its service performance is rock-solid -- companies live and die by their SLA commitments.
Yet there’s no need to develop proprietary monkey software. TeamQuest’s best-in-breed diagnostic and predictive analytics tools are already used to predict demand spikes and surface issues by some of the largest internet service companies in the world. If there’s one lesson that Netflix has taught, it’s that the cost of preparing for disaster pales in comparison to that of experiencing it. That’s how Netflix is able to support House of Cards and not fold like one.
Want to learn more about best-in-breed capacity management practices? We recommend watching the webinar “Capacity Management Taken to the Next Level,” with George Spalding from Pink Elephant and Jerry Goldman from Law School Admission Council.