How do you calculate IT health and risk?
There are different methods you can use, depending on your needs.
The most common methods for determining IT infrastructure health are threshold comparison, enhanced threshold comparison, event detection, variation from normal, allocation comparison, and queuing theory.
On the other hand, the most common methods for calculating IT infrastructure risk are linear trending, enhanced trending, event projection, allocation projection, and queuing theory.
There are pros and cons to each of these methods, and it can be hard to know which one to rely on. So consider this your guide to making health and risk calculations easy.
1. Threshold Comparison
Threshold comparison is all about using measurement statistics to set thresholds.
These measurement statistics typically include things like:
Once you determine your measurement statistics, your next step is setting up thresholds for these measurements. Your IT infrastructure health is determined by how many and which of these thresholds are exceeded.
For instance, a CPU utilization threshold is often set at 70 percent. If that threshold is exceeded, your IT infrastructure health is considered compromised.
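To make that concrete, here is a minimal sketch in Python of a threshold check. The 70 percent CPU figure comes from the example above; the other metric names and limits are illustrative assumptions, not recommendations.

```python
# Minimal threshold-comparison sketch. Thresholds other than the
# 70% CPU figure from the text are illustrative assumptions.
THRESHOLDS = {
    "cpu_utilization_pct": 70,     # from the example above
    "memory_utilization_pct": 85,  # assumed
    "disk_utilization_pct": 90,    # assumed
}

def exceeded_thresholds(measurements: dict) -> list:
    """Return the names of all metrics whose threshold is exceeded."""
    return [name for name, limit in THRESHOLDS.items()
            if measurements.get(name, 0) > limit]

current = {"cpu_utilization_pct": 78,
           "memory_utilization_pct": 60,
           "disk_utilization_pct": 92}
breaches = exceeded_thresholds(current)
print("Unhealthy" if breaches else "Healthy", breaches)
```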
On the pro side, threshold comparisons are:
The cons of threshold comparison are:
If you choose to monitor IT infrastructure health with thresholds, be wary. If you don't set and tune your thresholds carefully, you could waste time on false positives of poor health, or fail to recognize truly unhealthy situations.
2. Enhanced Threshold Comparison
Enhanced threshold comparison is the next step up from threshold comparison.
These thresholds are usually more complicated. Multiple measurement statistics and formulas are used within a given threshold. And threshold severity is often introduced.
You’ll determine IT infrastructure health the same way you do for threshold comparison: by looking at the number and type of thresholds that are exceeded. These thresholds just happen to be more complicated.
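As a rough illustration, the sketch below combines two statistics in a single threshold formula and attaches a severity level. The weighting formula and the severity cut-offs are assumptions made for the example, not a standard.

```python
# Enhanced-threshold sketch: one rule combines multiple statistics
# and maps the result to a severity. The weights and cut-offs are assumed.
def composite_load(cpu_pct: float, run_queue: float) -> float:
    """Blend CPU utilization with run-queue length into one score."""
    return 0.7 * cpu_pct + 0.3 * min(run_queue * 10, 100)

def severity(score: float) -> str:
    if score >= 90:
        return "critical"
    if score >= 75:
        return "warning"
    return "ok"

score = composite_load(cpu_pct=82, run_queue=3)
print(f"score={score:.1f}, severity={severity(score)}")
```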
In the pro column, enhanced threshold comparisons are:
The cons of enhanced threshold comparisons are:
Using enhanced thresholds can minimize some of the inaccuracy of standard thresholds. But you’re still taking a risk with your IT infrastructure health.
3. Event Detection
Event detection uses alarms, alerts, and other techniques to recognize that something noteworthy has occurred (i.e., an event). Thresholds, logs, and other types of data are typically used to recognize events.
You’ll determine IT infrastructure health based on which events have occurred—and how many.
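A very small sketch of the idea: scan log lines for patterns that mark noteworthy events and tally what was found. The patterns and the health rule here are assumed examples.

```python
import re

# Event-detection sketch: recognize noteworthy events in log lines.
# The patterns and the health rule are illustrative assumptions.
EVENT_PATTERNS = {
    "out_of_memory": re.compile(r"OutOfMemory|OOM killer", re.IGNORECASE),
    "disk_error":    re.compile(r"I/O error|disk failure", re.IGNORECASE),
}

def detect_events(log_lines):
    """Return the names of events recognized in the given log lines."""
    events = []
    for line in log_lines:
        for name, pattern in EVENT_PATTERNS.items():
            if pattern.search(line):
                events.append(name)
    return events

logs = ["kernel: OOM killer invoked",
        "app started",
        "sda1: I/O error, sector 123"]
events = detect_events(logs)
print("Unhealthy" if events else "Healthy", events)
```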
On the pro side, event detection is:
The cons of event detection are:
Event detection is a great reactive measure for IT infrastructure health. But being proactive will require a better method.
4. Variation from Normal
Variation from normal is what it sounds like.
You define what normal looks like for IT infrastructure health, typically based on historical data and identification of “normal” behavior and events. Situations that vary from that established normal are considered unhealthy.
It’s a simple concept, but you will need some complex algorithms and the right resources to make these determinations.
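One simple way to express the concept (assuming a basic statistical baseline rather than any particular product’s algorithm): learn the mean and spread of a metric from historical data and flag values that fall too far outside it.

```python
from statistics import mean, stdev

# Variation-from-normal sketch: flag values far from the historical baseline.
# Using a simple z-score here is an assumption; real tools use richer models.
history = [42, 45, 44, 47, 43, 46, 45, 44]  # e.g., past CPU utilization samples

baseline_mean = mean(history)
baseline_sd = stdev(history)

def is_abnormal(value: float, tolerance: float = 3.0) -> bool:
    """True if the value is more than `tolerance` standard deviations from normal."""
    return abs(value - baseline_mean) > tolerance * baseline_sd

print(is_abnormal(46))  # within normal variation
print(is_abnormal(71))  # well outside normal -> unhealthy
```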
On the pro side, variations from normal:
But the cons of the variation from normal method are that it:
Variation from normal can take longer to hit ROI than other methods. But it can also be more precise.
5. Allocation Comparison
Allocation comparison determines health by looking at the available capacity versus the capacity that has been allocated. If your allocated capacity gets close enough to the available capacity of your resources, your infrastructure is considered unhealthy.
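Here is a minimal sketch of that comparison. The 90 percent “too close” margin is an assumption for illustration.

```python
# Allocation-comparison sketch: compare allocated capacity to what is available.
# The 90% "too close" margin is an assumption.
def allocation_health(allocated: float, available: float, margin: float = 0.90) -> str:
    ratio = allocated / available
    return "unhealthy" if ratio >= margin else "healthy"

print(allocation_health(allocated=470, available=512))  # ~92% allocated -> unhealthy
print(allocation_health(allocated=300, available=512))  # ~59% allocated -> healthy
```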
On the pro side, allocation comparison:
But the cons of allocation comparison are:
It might be a good idea to use allocation comparisons if you’re an expert in your capacity already. But this might not be the right method for you if you lack that expertise.
6. Queuing Theory for Health
Queuing theory involves analysis of system utilization, throughput, queue length, and response time. These are typically derived from the amount of work running on your systems and how those systems are configured.
You can use queuing theory to determine IT infrastructure health by measuring these components against your optimal scenario.
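A compact example, assuming the textbook M/M/1 single-server model (a simplification of real systems): derive utilization, average queue population, and response time from arrival and service rates, then compare them against your optimal scenario.

```python
# Queuing-theory sketch using the textbook M/M/1 model (a simplifying assumption).
# lam = arrival rate (requests/sec), mu = service rate (requests/sec).
def mm1_metrics(lam: float, mu: float) -> dict:
    rho = lam / mu                   # utilization
    if rho >= 1:
        raise ValueError("System is saturated: arrivals exceed service capacity")
    avg_in_system = rho / (1 - rho)  # average number of requests in the system
    response_time = 1 / (mu - lam)   # average response time (seconds)
    return {"utilization": rho,
            "avg_in_system": avg_in_system,
            "response_time_s": response_time}

print(mm1_metrics(lam=80, mu=100))  # healthy: 80% utilized, ~0.05 s response
print(mm1_metrics(lam=98, mu=100))  # near saturation: response time balloons
```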
On the pro side, queuing theory for health:
But there are cons to queuing theory, like it:
Queuing theory for health might be a smart choice if you have the right resources to do analysis. (And if you’re more focused on CPU, IO, and system health.)
1. Linear Trending
Linear trending involves using historical data to create a trend line.
In terms of IT infrastructure risk, this means using that line to project future values of your historical data. This gives you a projection of when your established thresholds will be surpassed.
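As a sketch (assuming an ordinary least-squares straight-line fit), you can project a metric forward along its trend line and estimate when it will cross an established threshold. The data points and the 90 percent threshold below are made up.

```python
# Linear-trending sketch: fit a straight line to historical values and
# project when an established threshold will be crossed. Data are made up.
days = [0, 1, 2, 3, 4, 5, 6]
disk_used_pct = [50, 52, 53, 55, 58, 59, 61]
THRESHOLD = 90  # assumed threshold

n = len(days)
mean_x = sum(days) / n
mean_y = sum(disk_used_pct) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, disk_used_pct)) / \
        sum((x - mean_x) ** 2 for x in days)
intercept = mean_y - slope * mean_x

days_until_breach = (THRESHOLD - intercept) / slope
print(f"Threshold projected to be exceeded around day {days_until_breach:.1f}")
```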
On the pro side, linear trending:
But there are cons to linear trending, like:
So using linear trending to predict IT infrastructure risk might be enough if your utilization is moderate-to-low.
2. Enhanced Trending
Enhanced trending uses basic algebraic quadratic functions. These usually take multiple statistics into account in one equation.
You can use enhanced trending to predict IT infrastructure risk: fit a quadratic function to your historical data, project future values from it, and determine when your thresholds will be exceeded.
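A rough illustration of that idea, assuming NumPy is available: fit a quadratic to historical values and solve for when the projected curve reaches the threshold. The data series and the 85 percent threshold are assumptions.

```python
import numpy as np

# Enhanced-trending sketch: fit a quadratic to historical data and solve for
# when the projected curve crosses a threshold. Data and threshold are assumed.
days = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
cpu_pct = np.array([20, 22, 26, 31, 38, 47, 57], dtype=float)
THRESHOLD = 85.0

a, b, c = np.polyfit(days, cpu_pct, deg=2)   # y = a*x^2 + b*x + c
roots = np.roots([a, b, c - THRESHOLD])      # solve a*x^2 + b*x + (c - T) = 0
future = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > days[-1]]
if future:
    print(f"Threshold projected to be exceeded around day {min(future):.1f}")
else:
    print("No crossing projected within the fitted trend")
```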
In the pro column, enhanced trending:
The cons of enhanced trending are that:
Enhanced trending can be a good choice—if you have the strong math resources in your IT department. Otherwise, it can be difficult to do well.
3. Event Projection
Event projection uses historical data about events to project when future events will occur.
You can use this to calculate IT infrastructure risk based on which events will occur and when. In this sense, event projection is similar to variation from normal.
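One way to sketch the projection (assuming a simple average-interval model, which real tools would refine): take the timestamps of past events and project when the next one is likely to occur.

```python
from datetime import datetime, timedelta

# Event-projection sketch: use the average interval between past events to
# project the next occurrence. The simple-average model is an assumption.
past_events = [
    datetime(2024, 1, 3),
    datetime(2024, 1, 17),
    datetime(2024, 2, 1),
    datetime(2024, 2, 14),
]

intervals = [b - a for a, b in zip(past_events, past_events[1:])]
avg_interval = sum(intervals, timedelta()) / len(intervals)

next_event = past_events[-1] + avg_interval
print(f"Next event projected around {next_event.date()} "
      f"(average interval: {avg_interval.days} days)")
```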
On the pro side, event projection:
The cons of event projection are:
Event projection can work—if you’re okay with being reactive, rather than proactive, with IT risk.
4. Allocation Projection
Allocation projection determines risk by looking at the total amount of capacity available versus the allocated capacity.
So when the allocated capacity gets close enough to the available capacity of your resources, risk goes up.
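A minimal sketch, assuming allocation grows at a steady observed rate: estimate when the allocated capacity will reach the capacity that is available. The numbers are made up for illustration.

```python
# Allocation-projection sketch: assuming allocation grows at a steady rate,
# estimate when allocated capacity will reach what is available. Numbers are made up.
available_gb = 2048.0
allocated_gb = 1500.0
growth_gb_per_week = 40.0  # observed allocation growth, assumed constant

weeks_until_full = (available_gb - allocated_gb) / growth_gb_per_week
print(f"Allocated capacity projected to reach available capacity "
      f"in about {weeks_until_full:.1f} weeks")
```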
On the pro side, allocation projection:
The cons to allocation projection are:
If you’re primarily concerned about risk in virtualized environments or disk space, allocation projections might make sense for you.
5. Queuing Theory for Risk
The same metrics used in queuing theory for health—system utilization, throughput, queue length, and response time—are used for risk.
You can determine IT infrastructure risk by comparing the predicted values to what you need to maintain service levels.
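A brief sketch, again assuming the M/M/1 model from the health section: compare the response time predicted at a projected arrival rate against the service-level target. The projected rate and the 100 ms target are assumptions.

```python
# Queuing-theory-for-risk sketch: compare the predicted response time at a
# projected arrival rate with the service-level target. M/M/1 is an assumption.
def predicted_response_time(lam: float, mu: float) -> float:
    """Average M/M/1 response time in seconds; infinite if saturated."""
    return float("inf") if lam >= mu else 1 / (mu - lam)

mu = 100.0                # service rate, requests/sec
projected_lam = 95.0      # projected arrival rate next quarter (assumed)
sla_response_time = 0.10  # service-level target: 100 ms

predicted = predicted_response_time(projected_lam, mu)
at_risk = predicted > sla_response_time
print(f"Predicted response time: {predicted:.3f} s -> "
      f"{'at risk' if at_risk else 'within SLA'}")
```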
On the pro side, queuing theory:
The cons of queuing theory for risk are:
Queuing theory for risk might make sense if you have the right resources to do the calculations.
There are many routes you can take to IT health and risk management. The right one will depend on your organization and your environment.
But it doesn’t need to be difficult to decide the right method.
Capacity management makes it easy to manage IT health and risk. Learn how. Download our white paper, Health and Risk: A New Paradigm for Capacity Management.