3 IT Disaster Scenarios That Call for Real-Time Decision Making

    December 29, 2015

    By Per Bauer

    It’s the worst-case scenario: a company’s capacity is down, and they must decide in a split second which applications to drop so that critical services can remain available. When performed skillfully, these decisions can result in a return to smooth sailing — but in other cases, there’s a momentary but ugly loss in revenue, as well as a rough few weeks of PR work.

    Here are three real-world IT disaster scenarios that separate mature IT strategies from the rest. Of course, these problems are best dealt with through prevention, but most companies are bound to run into bad luck at least once. What would you do if you found yourself in one of these situations?

    The Online Retail Bottleneck

    More and more customers are shopping online — according to Fortune, online Black Friday sales recently outpaced revenue from traditional brick-and-mortar outlets — which places an extraordinary burden on retail servers. The outages that result from such spikes in demand can be harmful during any business period, but the effects are especially severe during a peak shopping event.

    This leaves companies with limited options: momentarily cut other internal applications (and departments) in order to keep online resources available; accept the outage, announce public reparations, and wait; meter traffic to the site so that a smaller customer base can still shop; or some combination of the above.
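
The first and third options can be sketched in code. What follows is a minimal Python illustration under stated assumptions, not a description of any retailer's actual system: each application is given a hypothetical priority tier and a capacity demand in arbitrary units, the most critical applications that still fit are kept, and a separate helper estimates what fraction of visitors could be admitted if traffic were metered. The names `App`, `plan_shedding`, and `admission_fraction`, and all of the numbers, are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    priority: int   # hypothetical tier: 1 = most critical, larger = shed sooner
    demand: float   # capacity units the app currently needs (arbitrary units)


def plan_shedding(apps, capacity):
    """Greedy plan: return (kept, shed) so the kept apps fit within capacity."""
    kept, shed = [], []
    remaining = capacity
    # Consider the most critical tier first; within a tier, smaller demand first.
    for app in sorted(apps, key=lambda a: (a.priority, a.demand)):
        if app.demand <= remaining:
            kept.append(app)
            remaining -= app.demand
        else:
            shed.append(app)
    return kept, shed


def admission_fraction(capacity, incoming_demand):
    """Share of incoming traffic that can be served when metering visitors."""
    if incoming_demand <= 0:
        return 1.0
    return min(1.0, capacity / incoming_demand)


# Illustrative inventory only; real figures would come from monitoring data.
apps = [
    App("storefront", priority=1, demand=60),
    App("payments", priority=1, demand=25),
    App("reporting", priority=3, demand=30),
    App("internal-wiki", priority=4, demand=10),
]
kept, shed = plan_shedding(apps, capacity=100)
print("keep:", [a.name for a in kept])
print("shed:", [a.name for a in shed])
print("admit fraction:", admission_fraction(capacity=100, incoming_demand=250))
```

A greedy pass like this is easy to reason about under pressure, but it is only one possible policy. In this example it sheds the reporting workload while keeping a smaller, lower-priority application that still fits, which may or may not match what the business would actually want.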

    Sometimes there’s little to be done. Retailer Neiman Marcus suffered a massive outage on Black Friday, as CNBC reports, and because systems were offline until the following day, the company was forced to offer an extended “Black Saturday” sale. Customers had trouble accessing the site throughout the weekend, and the retailer doubtless fell short of whatever sales goals it had set for the year’s largest shopping event.

    Likewise, Target suffered an outage on Cyber Monday, causing a rash of social media condemnation. Although Target won points by metering service, according to TechCrunch, this was only the most recent in a spate of outages and other IT blunders that have cost the retailer millions, as RetailMerchandiser reports.

    Financial Floundering

    Financial institutions are even more vulnerable to these risks. A sudden outage not only halts financial transactions and markets, but leaves companies vulnerable to stiff regulatory consequences and potential lawsuits. What if an outage happened at a critical moment?

    Such was the fate of PayPal. Like Target, the e-payment giant’s online transfer services went offline during Cyber Monday. During the worst period, 70% of customers were unable to make online payments, causing a cascading effect for companies that offer PayPal as a payment method. PayPal could only offer public updates while it resolved issues internally.

    Even the largest institutions have found themselves in hot water: the New York Stock Exchange went offline for nearly four hours in July, according to the Washington Post, forcing orders to freeze or else be redirected ad hoc through other exchanges. Traders panicked, undermining confidence that markets can operate securely and consistently. A loss of trust in institutions like these can have a direct, negative effect on the economy generally.

    Information security publisher Dark Reading warns that financial institutions must determine how they will address regulatory and legal compliance, manage the release of court-ordered data, and meet SLAs and other contractual agreements should an outage occur.

    Healthcare Hiccups

    IT outages in healthcare are unique in that they can present threats to human life. While outage-related fatalities are fortunately rare (life-sustaining systems are a first priority for backup generators), they can cause a massive disruption to electronic record-keeping, scheduling, medication delivery, and business operations.

    When data centers go offline, most hospitals must revert to paper systems — are you prepared to rewind the technological clock?

    This past July, BJC Healthcare, the largest hospital group in the St. Louis area, experienced an outage that lasted nearly 20 hours. According to the St. Louis Post-Dispatch, 13 hospitals were unable to access electronic medical records, email, and registration/scheduling systems. While paper systems were adequate, hospital transfers were a nightmare. The same happened to Boston Children’s Hospital back in March, as the Boston Globe reports.

    According to a 2013 Ponemon Institute and Emerson Network Power report, discussed in Healthcare IT News, these outages cost an average of $9,000 per minute — saving human lives should never be that expensive.

    Diagnosing Risk

    As companies come to rely on data and IT systems completely, the consequences of downtime become even more grave. Before a disaster strikes, companies must thoroughly analyze and diagnose their risk. Only by evaluating their IT systems with analytics can companies determine the cost of a worst-case scenario, and therefore the benefit of preventive IT spending.
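
The arithmetic behind that comparison can be sketched as follows. This is a rough back-of-the-envelope model in Python, not a TeamQuest method: it multiplies a per-minute outage cost by an expected number of outage minutes per year, then compares the loss a preventive investment might avoid against what that investment costs. The $9,000-per-minute figure is the average cited earlier in this article; the outage frequency, outage duration, risk reduction, and spending figures are hypothetical placeholders, as are the function names.

```python
def expected_annual_outage_cost(cost_per_minute, outages_per_year, minutes_per_outage):
    """Expected yearly loss = cost per minute x expected outage minutes per year."""
    return cost_per_minute * outages_per_year * minutes_per_outage


def net_benefit_of_prevention(baseline_cost, risk_reduction, annual_spend):
    """Loss avoided by prevention minus what the prevention costs per year.

    risk_reduction is the assumed fraction (0.0 to 1.0) of expected loss avoided.
    """
    avoided_loss = baseline_cost * risk_reduction
    return avoided_loss - annual_spend


# Cost per minute is the average cited in the article; everything else is assumed.
baseline = expected_annual_outage_cost(
    cost_per_minute=9_000,
    outages_per_year=0.5,      # hypothetical: one major outage every two years
    minutes_per_outage=120,    # hypothetical: two hours per outage
)
net = net_benefit_of_prevention(
    baseline_cost=baseline,
    risk_reduction=0.6,        # hypothetical: prevention avoids 60% of the loss
    annual_spend=200_000,      # hypothetical yearly preventive budget
)
print(f"expected annual outage cost: ${baseline:,.0f}")
print(f"net benefit of prevention:   ${net:,.0f}")
```

A positive result suggests the spending pays for itself under these assumptions, and a negative one suggests it does not. The model ignores harder-to-price effects such as reputational damage or regulatory penalties, so it is better treated as a floor than as a full estimate.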

    But when an unavoidable outage occurs, you need information at the ready that points to the best way to minimize damage and protect revenue. Through a strong metrics management strategy and a TeamQuest business value dashboard, you’ll have the visibility across infrastructure and operations needed to make the tough decisions, whenever and wherever they need to be made.

    (Main image credit: Scott Beale/flickr)