Agile Operations for the Software Defined Data Center and the Cloud

    By Bernd Harzog

    Let's take a look at the big picture. Virtualization and cloud computing are combining with the fields of agile development, agile support (DevOps), and agile operations to create significant new challenges. Large management frameworks have failed to keep up with the times. Therefore, IT is faced with a situation where the management stack needs to be completely rebuilt.

    How did this come to be? The demand for business functionality implemented in software is infinite and accordingly, so is the backlog. This is driving rapid innovation in software development tools and platforms which are now being deployed on scaled-out, open source, commodity deployment platforms. In this new world, applications are being distributed across data centers and private/public clouds, and virtualization of business critical and performance critical applications has reached critical mass with most enterprises operating more than one hypervisor.

    On top of this, rapidly changing applications are running on increasingly dynamic platforms. The sheer pace of change at every layer of the stack is turning monitoring and management into a big data problem. But note, too, that the trend toward submerging the infrastructure and managing it through policy-driven automation means that only the applications will be left to manage.

    This convergence of multiple trends has accelerated the failure of existing management tools which will have to be replaced by a new paradigm.

    This becomes all the more apparent as the forthcoming software-defined data center takes the stage.

    So what will the new era be all about? Get ready for rapidly changing apps with the agility built into every step of the software development process. Get used to the fact that diverse languages with diverse runtimes will be running on next-generation deployment platforms. This new architecture is further evolving to one that operates on multiple virtualization platforms, on scaled-out commodity hardware, and is located in multiple clouds with multiple owners.

    Application Performance Management

    When virtualization began, it was all about the virtual server. It provided a neat way to consolidate workloads and cut costs. But virtualization is progressing far beyond that point and the current focus of VMware, for example, is to virtualize business critical apps. The company is making extraordinary progress in this regard. In just the last two years, for instance, SAP and Oracle have gone from 28% virtualized to 53% and 49%, respectively.

    As a consequence of all this change, the management frameworks of the “Big Four” vendors (IBM, BMC, HP, CA) are outdated and, in my opinion, cannot be fixed.

    These frameworks made a promise that cannot be kept – that one product could monitor everything. They are no longer up to date with the current environment, they stand no chance of catching up with the pace of innovation, they can neither develop nor acquire their way out of this mess, and they are stuck in a legacy business model that is out of step with how new constituencies want to buy management software: try it, then buy it.

    These companies face the innovator's dilemma. They have an outdated business model which they have to change. But they have so much vested interest in the old ways that if they created a new offering using a different software model, they would have to sell it at 10% of the price of their existing products.

    CIOs can no longer go on living with what has evolved into a Franken-Monitor. Frameworks resulted in Franken-Monitors because people bought point tools to meet the needs that the framework could not meet. Those who did not buy frameworks ended up assembling their own Franken-Monitors by purchasing too many non-integrated point tools.

    Franken-Monitors cannot cope with the SDDC and the cloud. They will be left behind in the transition to agile development and operations.

    Case in point: I came across a bank in the northeastern United States that ran a cloud with 40,000 virtual machines. That bank had 80,000 more physical servers it needed to virtualize. It was attempting to manage them with over 100 monitoring tools, none of which did the job well.

    Managing the Software-Defined Data Center

    One promising method minimizes the number of servers assigned to an application without compromising transaction times. It classifies transactions into groups by their hourly service demand and processes each group on dedicated servers. In this way, transaction-aware cloud management can deliver significant improvements in cloud profitability without any additional investment in the hardware platform.
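    The grouping-and-allocation idea can be sketched in a few lines of Python. Everything here is illustrative: the demand tiers, the per-server capacity constant, and the field names are assumptions, not part of any vendor's actual method.

```python
import math
from collections import defaultdict

# Hypothetical per-server capacity, in service-demand units per hour.
SERVER_CAPACITY = 10_000


def group_transactions(transactions):
    """Bucket transactions into tiers by their hourly service demand.

    Tier boundaries (low / medium / high) are illustrative assumptions.
    """
    groups = defaultdict(list)
    for txn in transactions:
        demand = txn["hourly_demand"]
        if demand < 100:
            tier = "low"
        elif demand < 1_000:
            tier = "medium"
        else:
            tier = "high"
        groups[tier].append(txn)
    return groups


def servers_per_group(groups):
    """Dedicate the minimum number of servers to each tier.

    Each tier gets just enough servers to absorb its total demand,
    which is how the method avoids over-assigning hardware.
    """
    allocation = {}
    for tier, txns in groups.items():
        total_demand = sum(t["hourly_demand"] for t in txns)
        allocation[tier] = max(1, math.ceil(total_demand / SERVER_CAPACITY))
    return allocation
```

    Keeping tiers on dedicated servers means a burst in high-demand transactions grows only the high tier's allocation, leaving the other tiers' transaction times untouched.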

    What does all this mean for performance management? It is going to be much harder for enterprises to evaluate what is really going on with app performance in the SDDC. After all, they now have the opportunity to mess up badly at the speed of light. In such a climate, change has to be driven by software-based policies and rules. Policy-driven data center automation will make infrastructure latency and throughput your most important metrics. If you don't measure those directly, you have no hope of knowing what is going on.
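    A minimal sketch of what "software-based policies and rules" keyed on latency and throughput might look like. The `Policy` shape, thresholds, and action names are hypothetical; real automation engines are far richer.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """One automation rule keyed on the two metrics that matter most."""
    max_latency_ms: float       # trigger if latency rises above this
    min_throughput_tps: float   # trigger if throughput falls below this
    action: str                 # e.g. "scale_out" -- action names are illustrative


def evaluate(policies, latency_ms, throughput_tps):
    """Return the actions triggered by the current latency/throughput sample."""
    triggered = []
    for p in policies:
        if latency_ms > p.max_latency_ms or throughput_tps < p.min_throughput_tps:
            triggered.append(p.action)
    return triggered
```

    The point of the sketch is the inputs, not the engine: every rule consumes directly measured latency and throughput, which is why those metrics must be collected in the first place.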

    The SDDC and the cloud, therefore, require a brand new management stack. Frameworks and Franken-Monitors will be replaced by ecosystems of cooperating vendors anchored by common big data back ends. No one company has a management software architecture that can encompass all the necessary elements. What that calls for is pulling the data from the various tools and placing it into a common management store. That has to be a big data store, as no relational DB can scale to manage it. It is quite possible that the Hadoop Distributed File System (HDFS) may become that big data store.
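    The shared back end only works if every tool's metrics land in one schema. A toy sketch of that normalize-and-append step, using a newline-delimited JSON file as a stand-in for an HDFS-backed store; the field names are invented for illustration, not any real standard.

```python
import json
import time


def normalize(tool_name, record):
    """Coerce a tool-specific metric record into one shared schema.

    The schema fields below (ts, source, entity, metric, value) are
    illustrative assumptions, not an actual industry standard.
    """
    return {
        "ts": record.get("timestamp", time.time()),
        "source": tool_name,            # which tool produced the sample
        "entity": record["entity"],     # app, VM, LUN, etc.
        "metric": record["metric"],
        "value": float(record["value"]),
    }


def append_to_store(path, records):
    """Append newline-delimited JSON -- a toy stand-in for a big data store."""
    with open(path, "a") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
```

    Once an APM tool and an infrastructure tool write to the same store in the same shape, cross-domain analysis becomes a query rather than an integration project.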

    But what about the rest of the management ecosystem? VMware and its vCloud will certainly play a big part in the evolving management stack. However, VMware has nothing at the Application Performance Management (APM) level and is absent in other key areas of the stack. That's why we need an ecosystem of vendors to fill those gaps.

    At the infrastructure performance layer, companies like TeamQuest and Virtual Instruments offer best-in-class solutions.

    And at the APM layer, AppDynamics, AppEnsure, and New Relic are at the forefront.

    The combined force of these best-in-class management tools working in concert has to solve cloud, mobile, and SDDC infrastructure performance management (IPM) in a scaled-out manner. This architecture must also measure app response time and throughput accurately, require zero configuration, offer broad application support, identify applications automatically, discover application topology automatically, be cloud ready, and supply automated answers to common problems. Further, this toolkit must be able to collect and analyze metrics at 1-second and 5-second intervals. And it has to be available to download and try before you buy.
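    The interval requirement is easy to make concrete. A minimal sketch, assuming raw samples arrive as (timestamp, response time) pairs, that rolls them into fixed 1-second or 5-second windows and derives the two key metrics per window:

```python
from collections import defaultdict


def aggregate(samples, window_s):
    """Roll raw (timestamp, response_time_ms) samples into fixed windows.

    Returns, per window, the average response time and the throughput
    (requests per second) -- the two metrics the text calls essential.
    """
    buckets = defaultdict(list)
    for ts, rt in samples:
        buckets[int(ts // window_s)].append(rt)
    return {
        w: {
            "avg_response_ms": sum(rts) / len(rts),
            "throughput_rps": len(rts) / window_s,
        }
        for w, rts in buckets.items()
    }
```

    Running the same stream through `aggregate(samples, 1)` and `aggregate(samples, 5)` yields both granularities from one collection pass, which is what makes fine-grained intervals feasible at scale.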

    Therefore, if a consultant comes along with the management software, my advice to you is to send it back. You should never need consultants to make it work. The software should be able to discover and reconfigure itself. If it's a monitor, it should show latency and throughput and be able to find all apps and their related elements.

    Building the New Management Stack

    In summary, the new stack must start with a management software architecture, i.e., the pieces have to fit together correctly so we do not end up erecting more Franken-Monitors. Tool vendors should probably be required to put their data into a common big data back end for cross-domain analysis, and every tool must support agile development and DevOps, which means no need for constant manual reconfiguration. Every piece should work in highly dynamic, abstracted, shared, and distributed environments (virtualization, SDDC, public clouds), and provide response time, latency, and throughput (not resource utilization) as the key performance metrics.