The Queue Conundrum

    July 13, 2015

    By Vernon Johnson

    I.T. must know when, where and why service levels will degrade in the future.

    It’s the day before your holiday season. You leave work expecting a bit of traffic since everyone is leaving for vacation. But as you slow down to a crawl, you notice that all four lanes of traffic are congested. No one is moving. The road looks like a busy car lot. You wish you knew how long this would last. The way you’re feeling right now is probably the same way some of your customers feel when they experience a slowdown from your IT resources.

    Is There Going to Be a Traffic Jam?

    Well. Yes. But you already knew that. Can you tell when and where the traffic jam will happen in your environment? A good start is to look at your historical data. Surprisingly, only 13% of IT managers surveyed said they make predictions about future demand based on the historical performance of their IT services.

    You can collect historical data at whatever granularity you need to make an accurate prediction. Some industries, such as banking and gaming, need to collect data at one-second intervals to see the peaks and valleys where problems occur. With coarser sampling, those peaks and valleys are usually missed when making these predictions.
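    To see why granularity matters, consider a minimal sketch. The utilization numbers below are made up for illustration, and the downsample helper is a hypothetical name, not part of any product. It averages the same minute of one-second samples at coarser intervals and shows how a short burst fades from view.

```python
# Hypothetical one-second CPU utilization samples (percent) for one minute:
# mostly quiet at 20%, with a five-second burst at 100%.
samples = [20] * 30 + [100] * 5 + [20] * 25


def downsample(values, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, window)
    ]


ten_second_avg = downsample(samples, 10)
one_minute_avg = downsample(samples, 60)

print("peak at one-second granularity:", max(samples))
print("peak at ten-second granularity:", max(ten_second_avg))
print("one-minute average:", round(one_minute_avg[0], 1))
```

    The burst that saturates the system at one-second resolution looks like moderate load at ten seconds and nearly disappears in the one-minute average.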

    Compare your view of traffic to that of a racing video game. Sometimes you can view the entire map, with all the cars shown as bright dots moving through the course. Other times you get the view that shows only what’s directly in front of you. To get a better understanding of your queue, you probably want the arcade view of the entire road map.

    It’s the Queue!

    Predicting the right course of action is important to the business, and over time your accuracy can help build confidence with your customers. Remember the traffic analogy above?

    Simply moving your car from one lane to another may or may not help you get home quicker. More than likely, you’ll just watch other cars pass by after you move five feet in your new lane. Another queue! What’s a good source for accurately predicting or preventing queues?

    If you want to ensure you’re covered, try analytic and simulation modeling. A good simulation modeling tool will create a queuing network based on the system being modeled and pretend to run the incoming workloads on that network. These can be accurate, but a lot of work is necessary to adequately describe the systems with enough detail for the results to be dependable.
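    As a rough illustration of what "pretending to run the workload" means, here is a toy simulation of a single first-come, first-served server. It is only a sketch under strong assumptions (one server, random exponential arrival gaps and service times), and the function name and parameters are invented for this example. A real simulation tool models an entire network of queues in far more detail.

```python
import random


def simulate_mean_wait(arrival_rate, service_rate, n_customers=50_000, seed=1):
    """Simulate one FIFO server and return the average time spent waiting.

    Assumes exponentially distributed gaps between arrivals and
    exponentially distributed service times (illustrative assumptions only).
    """
    rng = random.Random(seed)
    wait = 0.0        # waiting time of the current request (first one waits 0)
    total_wait = 0.0
    for _ in range(n_customers):
        total_wait += wait
        service = rng.expovariate(service_rate)  # time to serve this request
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        # The next request waits for whatever work is still unfinished
        # when it arrives (never less than zero).
        wait = max(0.0, wait + service - gap)
    return total_wait / n_customers


light = simulate_mean_wait(arrival_rate=0.5, service_rate=1.0)
heavy = simulate_mean_wait(arrival_rate=0.9, service_rate=1.0)
print(f"mean wait at 50% load: {light:.2f}")
print(f"mean wait at 90% load: {heavy:.2f}")
```

    Even in this toy model, the wait grows much faster than the load does, which is the behavior that makes queues hard to predict by intuition alone.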

    Analytic modeling takes queuing into account without pretending to run the incoming workloads on the model. Formulas based on queuing theory are used to mathematically calculate processing times and delays.
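    For a sense of what such formulas look like, the sketch below uses the simplest textbook case, a single-server queue with random arrivals and service times (known as M/M/1). This is an assumption chosen for illustration, not a description of the model any particular product uses, and the function name is hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Return steady-state metrics for a textbook M/M/1 queue.

    arrival_rate: average requests arriving per second
    service_rate: average requests the server can complete per second
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue grows without bound: arrivals outpace service")

    utilization = arrival_rate / service_rate
    response_time = 1.0 / (service_rate - arrival_rate)   # waiting + service
    wait_in_queue = utilization / (service_rate - arrival_rate)
    requests_in_system = arrival_rate * response_time     # Little's law
    return {
        "utilization": utilization,
        "response_time": response_time,
        "wait_in_queue": wait_in_queue,
        "requests_in_system": requests_in_system,
    }


# Example: 8 requests per second arriving at a server that can handle 10.
metrics = mm1_metrics(arrival_rate=8.0, service_rate=10.0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    No workload is "run" here. The delays come straight from arithmetic, which is why analytic models are quick to evaluate once the rates are known.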

    Check out "Of Buses and Bunching: Strangeness in the Queue" for greater detail about queuing.

    While it’s great to know about the queue, wouldn’t you like to know how long you’ll be in the queue so you can make an informed decision?

    Accurate Predictions Save Money and Time

    It’s been 30 minutes and the roads aren’t cleared. You’ve zigzagged through four lanes, only to advance two car lengths. Your vacation plans call for a check-in by 4 p.m., reservations at a posh restaurant at 6 p.m., and attending a sold-out performance at 8:30 p.m. It’s already 7:15 p.m. What will this cost you? Better yet, can you identify the source of your problem and how long you’ll have to wait? It’s time to talk about one method to identify the why, where and how long questions of your queue conundrum.

    But what if you knew how long you were going to be in the queue? You could have warned the hotel, the restaurant and even changed your tickets for another show. This one piece of information would have allowed you to make an informed decision about your future.

    The same is true in the IT world. What if you had one impact-based metric that would allow you to make similar decisions?

    You can gather operational data and apply a sophisticated queuing model to generate an analysis of your production systems. TeamQuest refers to this as the TeamQuest Performance Indicator (TPI). It’s for system health, analysis and reporting.

    TPI can help you identify when systems are getting stressed. You get an easy-to-understand indicator of your overall system health. The scale reads as follows:

    100 - You have open roads. 
    50 - A few lanes are busy, but you can maneuver around traffic.
    0 - You’re in your worst traffic-jam nightmare.

    One customer runs TPI reports to identify the top potential issues. Recently, after using TPI, the team noticed that 180 of the 200 virtual machines in a cluster were performing poorly. By reducing the number of underperforming virtual machines to four, they corrected the issue before it dramatically affected their systems.

    Just as you wish you had advance knowledge of your traffic jam, with TPI you can have plenty of time to head off potential bottlenecks or outages. You can even drill down to a predicted components-of-response-time analysis that shows why server health is going to suffer in the future. If you could use this on the road, perhaps you would know that a three-car accident five miles ahead caused the traffic jam you’re in.

    This type of information could help you make alternate plans and have a better understanding of what decisions you should make in your environment … or your vacation plans.

    Predicting the right path to take is part art, part science. Your ability to make the best decision increases your chances of success. Your data center infrastructure, cloud included, is part of the highway you and your customers interact with. If the roads are clear of obstacles, the trip is easy-going and usually this means a positive experience for the trip home. If cluttered roads filled with bottlenecks are part of the experience, the outcome is less than optimal.

    Let us know how we can help you keep your roads open for your customers.