If you are tasked with finding the best tool for optimally configuring systems for a project within your IT organization, you have your work cut out for you. There are many companies out there offering "capacity planning" tools, and it isn't obvious based on their marketing literature which ones are really worth considering.
Anybody with a dartboard can claim ownership of a server capacity planning tool. Unfortunately, companies selling dartboards for capacity planning aren't likely to be very honest about the sophistication of their tools. It's caveat emptor: let the buyer beware.
If you have an unlimited IT budget, dartboards are fine capacity planning tools, but the rest of us need accurate predictions of future workload needs. Accuracy in capacity planning helps balance IT risk and health -- too much risk means you could lose revenue and customers from downtime, but you also don’t want to achieve IT health through overspending. A good capacity planning tool will provide the appropriate metrics for both these factors.
So how do you avoid buying a dartboard when what you really need are accurate predictions? What makes a good capacity planning tool? What should a buyer in your particular position be looking for in a capacity planning tool?
Before we answer those questions, let's start by looking at the tools and methods that simply won’t do the job -- what is NOT good enough for capacity planning.
First off, monitoring performance doesn’t actually constitute capacity planning. Whereas planning means proactively provisioning for future demand, an alarm 15 minutes before users complain only lets you react. Monitoring and alarming are essential components of capacity management, but on their own they don’t do enough to be considered an actual form of capacity planning.
Many capacity planners keep historical data and use it to plot trends, hoping those trends can accurately predict future performance. While this strategy is better than nothing, you should beware of companies that tout trending as capacity planning: it’s far from an optimal solution in most situations.
Because this strategy assumes that workloads increase at a steady rate, or at a rate based on some statistical formula that isn’t necessarily accurate, trending often collapses when it comes time to add a new workload or consolidate servers. It also falsely assumes that computer system performance is linear, when in reality, a capacity planning tool needs to assess more than just past system performance in order to make accurate predictions about the future.
In order to account for the fact that performance doesn’t scale linearly, those who use trending as a capacity planning tool simply keep utilization rates far from 100%. This kind of inexact estimation almost guarantees over-provisioning, and companies that do this are often oblivious to the fact that they’re wasting money and hamstringing their IT department.
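To make the trending approach concrete, here is a minimal sketch of what such a tool does under the hood: fit a straight line to historical utilization and extrapolate to a planning threshold. The monthly figures and the 70% threshold are hypothetical illustrations, not data from any real system.

```python
# Minimal sketch of trend-based "capacity planning": fit a straight line
# to historical CPU utilization and extrapolate. Data and threshold are
# hypothetical.

def linear_trend(samples):
    """Least-squares fit y = a + b*x over x = 0..n-1; returns (a, b)."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Monthly average CPU utilization (%), hypothetical.
history = [32, 34, 37, 39, 42, 44]
a, b = linear_trend(history)

# Months until the fitted line crosses an arbitrary 70% planning threshold.
months_to_threshold = (70 - a) / b
print(f"slope: {b:.2f}%/month, ~{months_to_threshold:.0f} months to 70%")
```

Note what the straight line silently assumes: steady growth, no new workloads, and linear performance, which is exactly where trending falls apart.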
You should also be wary of tools that do "capacity planning" for server consolidation by adding together the resource utilization of each of the workloads being considered for consolidation. After normalizing CPU utilization to account for differences in computing capability, the utilization for each workload is added together to determine how much of the target CPU will be utilized after consolidation. A similar calculation is performed for other resources such as memory, I/O, and the network.
This kind of simplistic procedure can be effective enough to find potential candidates for consolidation, but it leaves way too much out of the equation to be solely responsible for the final decision when consolidating important workloads. The indicators of IT health and risk you would use with this method would be artificial and not based on actual behavior -- this will inevitably lead to over-provisioning, outages, and other IT unpleasantness.
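The summing procedure described above amounts to little more than the arithmetic below. The server names and relative capability ratings are hypothetical, included only to show how little the calculation actually considers.

```python
# Naive consolidation math the article warns about: normalize each
# workload's CPU utilization to the target server's capability, then add.
# Names and capability ratings are hypothetical.

# (name, utilization %, relative CPU capability of the current host)
workloads = [
    ("web01", 40.0, 1.0),
    ("db02",  25.0, 2.0),
    ("app03", 60.0, 0.5),
]

target_capability = 2.5  # target server, relative to the 1.0 baseline

# Scale each workload's utilization by its host's capability ratio,
# then sum — no modeling of how the workloads interact.
total = sum(util * cap / target_capability for _, util, cap in workloads)
print(f"Predicted target CPU utilization: {total:.1f}%")
```

A sum like this may be fine for shortlisting candidates, but it says nothing about queuing delays or workload interaction on the consolidated server, which is the point of the caution above.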
You need a tool that understands the details of your server architecture, your applications' use of that architecture, and how workloads will interact when consolidated.
Now that we've said so much about what isn't a good capacity planning tool, you're probably wondering what we would actually recommend.
It’s always been true that the best capacity planning tools use some sort of performance modeling, but due to the time-intensive nature of traditional modeling, capacity planners have generally reserved it for highly critical applications. Thankfully, automated predictive analytics have made it possible to do this kind of advanced capacity planning across your entire infrastructure, not just on the most important servers.
Sometimes when people talk about a "model," they mean a description or diagram. That's not the kind of model we are talking about in this case. For sure, you need a description of the systems involved, but that description is really just one step in a good capacity planning process. What you want is a tool that can look at that description along with information regarding the incoming workloads, and predict how the systems will perform.
Capacity planning tools that predict performance through modeling generally rely on one of two methods: simulation modeling and analytic modeling. A good simulation modeling tool will create a queuing network based on the system being modeled and pretend to run the incoming workloads on that network.
These simulations can be very accurate, but a lot of work is needed to describe the systems in enough detail to produce dependable results. For this reason, it makes plenty of sense to use the flexibility of simulation models to plan for those “what-if” scenarios. For example, you might use one to determine how long a proposed investment in CPU infrastructure will last so that you can construct a business case for management.
This is still the preferred method for networks, but it’s so resource-intensive that it’s practically impossible to use as a capacity planning tool for servers. For your most critical capacity planning needs, you’ll need something that utilizes queuing theory.
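To give a feel for what "pretending to run the workloads" means, here is a toy discrete-event simulation of a single-server queue, the simplest possible one-node queuing network. The arrival and service rates are hypothetical, and a real tool would model many interconnected resources rather than one.

```python
import random

# Toy discrete-event simulation of a single-server queue, in the spirit
# of simulation modeling. Rates are hypothetical: 8 arrivals/sec against
# a 10/sec service rate gives 80% utilization.

random.seed(1)
arrival_rate, service_rate = 8.0, 10.0
t, server_free_at, waits = 0.0, 0.0, []

for _ in range(50_000):
    t += random.expovariate(arrival_rate)   # next job arrives
    start = max(t, server_free_at)          # queue if the server is busy
    waits.append(start - t)                 # time spent waiting in queue
    server_free_at = start + random.expovariate(service_rate)

mean_wait = sum(waits) / len(waits)
print(f"mean queuing delay: {mean_wait * 1000:.0f} ms")
```

Even this trivial model needs tens of thousands of simulated events for a stable answer, which hints at why full-scale simulation is too resource-intensive for everyday server capacity planning.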
[Figure: a simple queuing network]
While analytic modeling also takes queuing into account, it doesn’t pretend to run the incoming workloads on the model. Instead, in a good analytic modeling tool, formulas based on queuing theory are used to mathematically calculate processing times and delays. This type of modeling is much quicker and not nearly as tedious to set up, and the results can be just as accurate as with simulation modeling.
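As a sketch of what "formulas based on queuing theory" means, the standard M/M/1 response-time formula, R = S / (1 - U), is the kind of expression an analytic tool evaluates directly instead of simulating. The 20 ms service time below is a hypothetical example.

```python
# Classic M/M/1 response-time formula from queuing theory: average
# response time R = S / (1 - U), where S is service time and U is
# utilization. The 20 ms service time is hypothetical.

def response_time(service_time, utilization):
    """Average response time for an M/M/1 queue."""
    return service_time / (1.0 - utilization)

s = 0.020  # seconds per request
for u in (0.5, 0.8, 0.9, 0.95):
    print(f"U = {u:.0%}: response time = {response_time(s, u) * 1000:.0f} ms")
```

Note how response time explodes as utilization approaches 100% (40 ms at 50% load, 400 ms at 95%). That non-linearity is exactly what trending misses, and why an analytic model gets an accurate answer in microseconds where a simulation needs thousands of events.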
Still, it is very important to pick the right data to model and to ensure that it represents the appropriate situations. When the process of selecting and contextualizing data isn’t automated, you must rely heavily on the skills of the analyst doing the work and run the risk of making mistakes. Without automation, it becomes extremely easy to miss important data and get inaccurate projections of future needs.
To get the most out of your capacity planning tool, you’ll want it working for you 24/7. Automated predictive analytics are what capacity planners will find most useful in their day-to-day monitoring of applications and systems, keeping a constant eye on the predicted performance of large numbers of infrastructure elements serving applications and business services.
If you really want to cover your bases, get a tool that can do both analytic and simulation modeling. And check that your capacity planning tool vendor isn't misusing the term "analytic" when making claims for their tool. You want to be sure that the tool you pick uses sound methods based on queuing theory to make its calculations, not something more closely resembling the less accurate capacity planning techniques described earlier in this article.
What makes a good capacity planning tool? Any advanced capacity planning method should take a number of complex factors into account, and we’ve summarized a few of the most important criteria in our Buyers Guide: Capacity Planning Tool Checklist. Be sure to check it out!