Performance Problem Detection, Diagnosis & Resolution

Answering angry calls about poor response times and firefighting IT service performance issues is a thankless, stressful position to be in. Such is the fate of IT organizations that simply react to issues as they arise and fail to measure and monitor service performance to avoid problems.

A proactive approach is far more efficient than a reactive one: it helps prevent disruptive events, keeps the data center running smoothly, and reduces the time spent fighting performance fires. Being proactive means measuring service performance, watching for trends in usage, and identifying potential problem areas so that they can be resolved before they affect users. Monitoring service performance also helps determine when hardware upgrades are required to maintain service levels.
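To make the idea of watching usage trends concrete, here is a minimal sketch in Python. It is not TeamQuest functionality. It assumes equally spaced utilization samples (for example, weekly averages) and a planning threshold of 80% busy, both of which are hypothetical. It fits a least-squares line to the samples and estimates how many intervals remain before the trend reaches the threshold.

```python
def intervals_until_threshold(samples, threshold=80.0):
    """Estimate how many sample intervals remain until a linear trend
    fitted to `samples` reaches `threshold`.

    `samples` is a list of equally spaced utilization values in percent.
    The 80% default is an illustrative assumption, not a recommendation.
    Returns None when the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x

    # Position on the x axis where the fitted line meets the threshold,
    # measured from the most recent sample (index n - 1).
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (n - 1))


# Hypothetical weekly CPU averages, rising two points per week.
weekly_cpu = [50.0, 52.0, 54.0, 56.0, 58.0]
remaining = intervals_until_threshold(weekly_cpu)
```

A straight-line fit is the simplest possible model. Real workloads are often seasonal or bursty, so a projection like this is a prompt to look closer rather than an upgrade date.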

TeamQuest Software Addresses Problem Detection and Resolution

TeamQuest Analyzer is a highly efficient tool for monitoring the performance of applications and services. You can easily see when an infrastructure component needs attention, and quickly drill down to the fine-grained details needed for root cause analysis.

alarm details screenshot

In this example, the server 'monitor' is in critical status and has generated an alarm. The alarm details show that on 1/29/05 at 8:33 PM, CPU usage spiked to over 90% busy.
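The alarm logic behind a status like this can be sketched as a simple threshold check. The following Python example is an illustration only and does not reflect how TeamQuest Analyzer is implemented or configured. The `Alarm` record, the `check_cpu` function, and the 75% warning level are hypothetical; the 90% critical level is assumed from the example above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alarm:
    server: str
    timestamp: datetime
    cpu_busy: float
    status: str


def check_cpu(server, samples, warning=75.0, critical=90.0):
    """Return an Alarm for every sample above the warning threshold.

    `samples` is an iterable of (timestamp, cpu_busy_percent) pairs.
    Both thresholds are assumptions chosen for illustration.
    """
    alarms = []
    for timestamp, cpu_busy in samples:
        if cpu_busy > critical:
            status = "critical"
        elif cpu_busy > warning:
            status = "warning"
        else:
            continue  # sample is within normal range
        alarms.append(Alarm(server, timestamp, cpu_busy, status))
    return alarms


# Made-up samples around the time of the spike in the example.
samples = [
    (datetime(2005, 1, 29, 20, 30), 42.0),
    (datetime(2005, 1, 29, 20, 33), 93.5),
]
alarms = check_cpu("monitor", samples)
```

Here only the second sample produces an alarm, with status "critical".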

cpu utilization screenshot

Drilling down provides a chart that clearly shows the spike in CPU usage, indicated by the area in green.

process detail screenshot

Right-click the specific area on the chart to drill down into process detail and determine the cause of the spike.

process detail report screenshot

The resulting report shows process detail at the time of the CPU spike. Clearly, the user 'adams' is running two processes that are consuming over 90% of the CPU.
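The reasoning in this step, going from per-process figures to the responsible user, amounts to grouping CPU usage by owner. A minimal Python sketch follows. The record layout, the `cpu_by_user` function, the command names, and all numbers are made up for illustration and are not taken from the report shown here.

```python
from collections import defaultdict


def cpu_by_user(processes):
    """Sum CPU percentage per user and return (user, total) pairs,
    largest consumer first.

    `processes` is an iterable of (user, command, cpu_percent) tuples,
    a hypothetical layout for a process-detail snapshot.
    """
    totals = defaultdict(float)
    for user, _command, cpu_percent in processes:
        totals[user] += cpu_percent
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up snapshot at the time of the spike.
snapshot = [
    ("adams", "job_a", 46.0),
    ("adams", "job_b", 45.5),
    ("root", "sshd", 0.5),
]
ranking = cpu_by_user(snapshot)
top_user, top_cpu = ranking[0]
```

With this snapshot, the first entry in `ranking` is 'adams', which matches the kind of conclusion drawn from the report.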

cpu utilization screenshot

After action was taken (adams was asked to kill his non-critical processes), CPU usage dropped significantly and the server is again performing at acceptable levels.

Next Steps

Interested in problem detection and resolution?
Contact TeamQuest
