Improving Cloud Efficiency Using Model-Based Transaction-Aware Management

    One of the conditions of enterprise application (EA) scalability in the cloud is deployment on a network-like architecture in which each functional service of the EA is hosted on dedicated computers that form a functional cluster (typically called a server farm). In this case, servers were used for analytics, consolidation, data integration, web processing, business logic, data storage, data import/export, and printing. EAs deployed in this way can support practically unlimited growth in the number of business users as they scale. Even though the servers within each farm are functionally identical in the sense that they provide the same “menu” of services, the load across them still has to be balanced to keep system performance acceptable to the business.

    Load balancing algorithms are typically either round robin or based on an assessment of hardware metrics (CPU utilization, etc.). The approach discussed in this paper is based instead on business transaction metrics. It has a major impact on cloud profitability because it minimizes the number of servers assigned to applications without compromising transaction times.
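
    The two conventional policies can be sketched as follows. This is only an illustration: the server names and CPU readings are made up, and neither function represents a specific load balancer product.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless rotation over the servers, ignoring their load."""
    return cycle(servers)


def least_utilized(cpu_utilization):
    """Pick the server whose reported CPU utilization is lowest."""
    return min(cpu_utilization, key=cpu_utilization.get)


# Hypothetical server names, for illustration only.
servers = ["olap-1", "olap-2", "olap-3"]
rotation = round_robin(servers)
first_four = [next(rotation) for _ in range(4)]

# Made-up CPU utilization readings (fraction of capacity in use).
cpu = {"olap-1": 0.72, "olap-2": 0.36, "olap-3": 0.55}
choice = least_utilized(cpu)
```

    Neither policy looks at what kind of transaction is being dispatched, which is the gap the transaction-based approach addresses.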


    There are a number of definitions that have to be understood in this type of work.

    • Transaction - a request from an EA user to be processed by the system.
    • Transaction (response) time - the time the application takes to process a transaction. This is the most important metric for the business because it is the one business users understand best.
    • Transaction rate - the number of transaction requests submitted by one user during one hour.
    • Transaction service demand - the time a transaction spends being processed by a particular component of the infrastructure (network, hardware appliance, hardware server).
    • Transaction profile - the set of service demands a transaction places on the network and on each hardware appliance and server it visits while being served by the application.
    • Workload - a flow of transactions generated by EA users.
    • Workload characterization - a specification of workload that includes a list of business transactions, the transaction rate and the number of users requesting each transaction.
    • Transaction stretch factor - a parameter defined by the formula:


    A scalable system has a stretch factor equal to 1 for all transactions.
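
    As a minimal sketch of how these definitions fit together, assume the stretch factor is the ratio of transaction time to the sum of the service demands in the transaction profile. That ratio is an assumption made for this example, chosen because it equals 1 exactly when a transaction spends no time waiting in queues. The class name, field names, and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    profile: dict          # component name -> service demand, in seconds
    response_time: float   # measured transaction time, in seconds

    def total_service_demand(self):
        """Sum of the service demands over every component visited."""
        return sum(self.profile.values())

    def stretch_factor(self):
        """ASSUMED definition: transaction time / total service demand."""
        return self.response_time / self.total_service_demand()


# Illustrative numbers only; they are not taken from the study.
report = Transaction(
    name="run_report",
    profile={"network": 0.5, "web": 1.0, "app": 1.5, "olap": 7.0},
    response_time=12.5,
)
total = report.total_service_demand()
stretch = report.stretch_factor()
```

    Under this assumption, a value above 1 means part of the transaction time was spent waiting rather than being served.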

    Application Modeling

    The EA in this case had a typical three-tier structure with web, application and database layers. The database tier included on-line analytical processing (OLAP) and relational databases (RDBMS). The customer requested an estimate of the number of hardware servers (and CPUs per server) required on each layer for an anticipated workload of 400 business users. Each server had two 4-core CPUs and plenty of memory. The EA was a financial application running Oracle and making heavy use of an OLAP database. For this customer, the acceptable level of service was established as a transaction time degradation of no more than 7% as the number of users increased to 400.
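
    The 7% criterion can be expressed as a simple check. This is a sketch under one assumption: degradation is measured relative to a baseline transaction time taken at a low user count. The function names and sample times are hypothetical.

```python
def degradation(baseline_s, loaded_s):
    """Relative increase in transaction time compared with the baseline."""
    return (loaded_s - baseline_s) / baseline_s


def meets_service_level(baseline_s, loaded_s, limit=0.07):
    """True if transaction time grew by no more than the agreed limit."""
    return degradation(baseline_s, loaded_s) <= limit


# Illustrative times in seconds; not measurements from the study.
ok = meets_service_level(10.0, 10.6)        # about 6% slower
too_slow = meets_service_level(10.0, 11.8)  # about 18% slower
```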

    Based on this criterion, analysis and modeling began. It is important to characterize the workloads correctly. Some transactions can be as long as 10,000 seconds while others are very short. Similarly, some transactions can be initiated by only one user and others by hundreds. In addition, some transactions follow a complex path as they progress from server to server. All this has to be fully understood in order to build accurate models in TeamQuest Predictor.

    TeamQuest Predictor allows you to rapidly conduct a series of what-if scenarios showing the results when different numbers of servers are placed on each layer.

    All analyzed deployments showed that one web/application server and one RDBMS server were sufficient. The difficulty in this case turned out to be precise modeling of the number of OLAP servers required to meet performance requirements.

    One model featured 40 OLAP servers. This maintained transaction time deterioration for 400 users at less than 7%, but it had a CPU utilization of only 36% for each OLAP server. In an effort to increase utilization, a model was created with 34 OLAP servers. Unfortunately, some transaction times deteriorated by as much as 18%.
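
    A back-of-the-envelope calculation shows why average utilization alone did not predict this result. The sketch assumes the same total CPU work is spread evenly over the smaller farm, which is a simplification and not how the modeling tool computes utilization.

```python
def utilization_after_resize(servers_before, utilization_before, servers_after):
    """Average utilization if the same busy time is spread over fewer servers."""
    busy_server_equivalents = servers_before * utilization_before
    return busy_server_equivalents / servers_after


# 40 servers at 36% utilization, reduced to 34 servers.
estimated = utilization_after_resize(40, 0.36, 34)
```

    Under that assumption the 34-server farm averages only about 42% CPU utilization, yet some transaction times still deteriorated by as much as 18%. Average utilization hides the queuing delays that individual transactions experience.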

    The cause of the degradation of short transactions was found to be waiting in server queues until long transactions (such as an OLAP calculation) released a CPU. This observation led to the hypothesis that dividing transactions into groups by hourly service demand, and processing each group on dedicated OLAP servers, might minimize the total number of OLAP servers required.
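
    The queuing effect can be illustrated with a deliberately simple first-in, first-out model of a single CPU. This is a toy sketch with made-up arrival and service times; it is not the queuing network solved by the modeling tool.

```python
def fifo_response_times(jobs):
    """Response times for jobs served one at a time in arrival order.

    jobs is a list of (arrival_time, service_time) pairs in seconds,
    sorted by arrival time.
    """
    cpu_free_at = 0.0
    response_times = []
    for arrival, service in jobs:
        start = max(arrival, cpu_free_at)   # wait if the CPU is still busy
        cpu_free_at = start + service
        response_times.append(cpu_free_at - arrival)
    return response_times


# One long calculation followed by two 1-second transactions (made-up values).
shared = fifo_response_times([(0.0, 100.0), (1.0, 1.0), (2.0, 1.0)])

# The same short transactions on a CPU that never runs the long calculation.
dedicated = fifo_response_times([(1.0, 1.0), (2.0, 1.0)])
```

    On the shared CPU each 1-second transaction takes 100 seconds to complete, almost all of it waiting; on the dedicated CPU each completes in 1 second.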

    Therefore, Grinshpan divided all transactions into two groups: one with low hourly service demands and one with high hourly service demands. The modeled architecture with a segmented workload consisted of 1 OLAP server processing the low-demand workload and 20 OLAP servers processing the high-demand workload. This configuration delivered the same transaction times as a system with 40 OLAP servers handling the non-segmented workload, and so met the customer’s 7% requirement. The study indicates that transaction-aware cloud management can deliver a significant improvement in cloud profitability without any additional hardware platform investment.
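
    The grouping step can be sketched as follows. The sketch assumes that a transaction's hourly service demand is the number of users multiplied by the transaction rate and by the OLAP service demand per transaction. That formula, the threshold, the transaction names, and all numbers are assumptions for illustration only.

```python
def hourly_service_demand(users, rate_per_user_hour, service_demand_s):
    """ASSUMED formula: seconds of OLAP service requested per hour."""
    return users * rate_per_user_hour * service_demand_s


def segment(workload, threshold_s):
    """Split transaction names into low-demand and high-demand groups."""
    low, high = [], []
    for name, users, rate, demand in workload:
        if hourly_service_demand(users, rate, demand) > threshold_s:
            high.append(name)
        else:
            low.append(name)
    return low, high


# (name, users, transactions per user per hour, OLAP seconds per transaction)
# Hypothetical workload; none of these values come from the study.
workload = [
    ("open_report", 300, 10, 0.5),
    ("save_form", 200, 5, 1.0),
    ("olap_calculation", 20, 1, 3600),
    ("consolidation", 5, 1, 10000),
]
low_group, high_group = segment(workload, threshold_s=10000)
```

    Each group would then be routed to its own set of OLAP servers, so short transactions no longer queue behind long calculations.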


    This method minimizes the number of servers assigned to an application without compromising transaction times. The approach classifies transactions into groups according to their hourly service demand and processes each group on dedicated servers. In this way, transaction-aware cloud management can deliver a significant improvement in cloud profitability without any additional investment in the hardware platform.

    Further research into cloud management based on business transaction metrics is worthy of consideration, as it might bring significant economic benefits to cloud providers and their customers. In this engagement, modeling with TeamQuest Predictor brought the number of OLAP servers required down from 40 to 21 without compromising transaction times.

    To achieve success, however, the capacity planner needs to have accurate application models, workload specifications, transaction profiles, and the right tool to be able to model all these factors. One of the most important steps in model building is the precise specification of the model’s input data. Success or failure of any modeling project is largely defined by input data quality.

    As a model solver, TeamQuest Predictor gave Grinshpan what his client needed. Models can be assembled in very little time. What-if scenarios can be viewed and adjusted in seconds. Additionally, rich hardware libraries are continually updated by TeamQuest, and users can add to them. Another valuable feature is the automatic generation of charts showing modeling results. This is backed up by agile, knowledgeable TeamQuest technical support.

    To learn more about queuing models for enterprise applications, see “Solving Enterprise Applications Performance Puzzles: Queuing Models to the Rescue,” by Leonid Grinshpan. It is available from online bookstores.

    You can also contact Leonid Grinshpan at