Exception Management Is the Engine of Automated Analytics

    February 9, 2016

    By Dino Balafas

    Technology has greatly improved the speed and accuracy with which companies can identify problems. With the right tools, this process can be automatic, with exception management driving the machine.

    Tremendous improvements in IT have enabled companies to create more value than ever from their IT strategies; at the highest end, automated IT systems become a true value-add. Still, many companies maintain immature IT processes and are caught in a reactive cycle where customers, not systems, report the majority of service-level problems.

    What separates these two ends of the spectrum? The reality is that technological add-ons are only effective if they’re supported by a strong metrics management strategy. Automated analytics — which identify the health and risk levels of IT systems and remediate problems — can seem like magic, but they outperform traditional IT processes only because they make better use of a company’s “plumbing.”

    Exception Management and the IT Environment

    Underperforming IT strategies are rarely the victims of incompetence. Rather, the scale of IT tends to grow faster than IT teams can manage. Today's IT systems have reached staggering levels of depth and complexity, so when problems arise, hunting down root causes is too difficult and time-intensive for individual workers to do reliably. The needle becomes harder and harder to find as the haystack becomes not only larger, but spread out across a disparate IT landscape.

    This is why customers tend to spot problems first, and why IT teams struggle to meet the agility needs of the modern IT marketplace. It is also where algorithm-based exception management comes in.

    The idea is that analytic software can identify when abnormal or “exceptional” processes occur, proactively alerting IT workers to their source or resolving the issue automatically. This effectively eliminates the need for IT workers to hunt down problems, or even worry much about them.

    Within the largest IT environments, which encompass thousands of servers with hundreds of thousands of individual entities, exception management with automated analytics is the only plausible way to tackle issues at scale.

    Yet many companies still operate traditional, rules-based exception management programs, which flag any process that deviates from pre-defined thresholds. IT workers are left to adjust these values as they respond to problems. This strategy is vulnerable to problems that spring up in new areas, where thresholds have not yet been set. Put another way, it cannot anticipate problems before they occur.
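
    To make the limitation concrete, here is a minimal sketch of a rules-based check. The metric names and threshold values are hypothetical, chosen only for illustration; they do not come from any particular product.

```python
# Hypothetical hand-set thresholds (metric names and values are illustrative only).
STATIC_THRESHOLDS = {
    "cpu_percent": 90.0,
    "disk_percent": 85.0,
}


def rule_based_exceptions(sample):
    """Return the names of metrics whose value exceeds a pre-defined threshold.

    Metrics with no configured threshold are silently skipped, which is
    the blind spot of a purely rules-based approach.
    """
    flagged = []
    for metric, value in sample.items():
        limit = STATIC_THRESHOLDS.get(metric)
        if limit is not None and value > limit:
            flagged.append(metric)
    return flagged


# "queue_depth" is clearly unusual here, but nobody has set a threshold for it.
sample = {"cpu_percent": 95.0, "disk_percent": 40.0, "queue_depth": 12000}
print(rule_based_exceptions(sample))  # ['cpu_percent']
```

    The runaway queue goes unreported until someone notices it and adds a rule, which is exactly the reactive cycle described above.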

    Tracking Patterns to Self-Define Exception

    The real power of automated analytics stems from the fact that machine learning techniques automatically identify and define patterns of “normalcy” for IT systems, setting thresholds themselves. As a result, automated analytics thresholds can be far more comprehensive, covering many more potential failures than traditional exception management strategies can. Moreover, the thresholds are adjusted automatically on an ongoing basis.
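
    As a rough illustration of a self-set threshold, the sketch below derives a band of "normal" values from a metric's own history using the mean and standard deviation. This is a deliberately simple statistical stand-in, not the technique any specific analytics product uses; the three-sigma band width and the function names are assumptions.

```python
from statistics import mean, stdev


def learn_baseline(history, k=3.0):
    """Derive a (lower, upper) band of 'normal' values from past observations.

    k is an assumed band width, in standard deviations.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_exception(value, band):
    """True when a new observation falls outside the learned band."""
    lower, upper = band
    return value < lower or value > upper


# Illustrative history for one metric; no threshold was ever set by hand.
history = [100, 102, 98, 101, 99, 103, 97, 100]
band = learn_baseline(history)

print(is_exception(101, band))  # False: within the learned band
print(is_exception(150, band))  # True: far outside normal behavior
```

    Re-running `learn_baseline` over a sliding window of recent data is one simple way to keep the band current as the system's behavior drifts.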

    For organizations with larger IT environments, this is the only way to effectively manage the health and risk of IT infrastructure. As a bonus, automated analytics programs display health and risk as single numbers, drastically simplifying problem resolution for IT workers, who can then focus the bulk of their attention on improving the overall performance of IT.

    Of course, exception management runs alongside the need for effective capacity management — as the demands on servers grow, so too must their ability to allocate workloads efficiently. Through similar pattern recognition techniques, automated analytics can simultaneously determine the most efficient operating configurations across services, making adjustments to account for present and predicted demand.
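
    A toy version of the capacity side might look like the following. It assumes a naive linear forecast and a fixed per-server capacity; the 70 percent utilization target and all names are hypothetical, and a real system would use far richer pattern recognition.

```python
import math


def forecast_next(demand_history):
    """Naive forecast: last value plus the average step between samples."""
    steps = [b - a for a, b in zip(demand_history, demand_history[1:])]
    avg_step = sum(steps) / len(steps)
    return demand_history[-1] + avg_step


def servers_needed(demand, capacity_per_server, target_utilization=0.7):
    """Servers required to serve `demand` at or below the target utilization."""
    return math.ceil(demand / (capacity_per_server * target_utilization))


# Illustrative demand samples (for example, requests per second).
history = [400, 450, 500, 550]
predicted = forecast_next(history)

print(servers_needed(history[-1], 100))  # servers for present demand
print(servers_needed(predicted, 100))    # servers for predicted demand
```

    Comparing the two results shows whether capacity should be added before demand arrives rather than after it has already caused a problem.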

    If it seems like CIOs at the most digitally mature companies have more time to focus on agility and meeting customer demands, it is because they do: automated analytics afford them that time. In this way, the difference between mature and lagging IT strategies is often not one of skill, but of scale.

