October 31st, 2007 by Lou
Consolidation invariably results in running more workloads on fewer machines. For most customers, this implies going from one workload or major application per machine to many workloads or applications per machine. This may imply more than one business unit or portfolio of applications sharing individual machine hardware in order to obtain the desired, cost-effective compression ratios. This has a number of implications for IT Service Management and operational capability. The technical nature of virtual machine technology as a lever with respect to consolidation nuances these issues in several ways.
Regardless of the technological tools leveraged to enable consolidation, there are certain invariant implications for operational capability with respect to consolidation when transitioning from one service per machine to many services per machine architectures. Current service capabilities must be examined, with particular focus on the following ITIL areas:
- Service Level Management
- Capacity Management
- Availability Management
- Service Desk and Incident Management
Capacity Management
The implications of sharing machine resources on a single machine are the most readily recognized of the potential impacts from consolidation. For most customers, the conversations begin and revolve around questions like “How will consolidation impact the performance of my application? How can you make sure that other applications don’t hog the machine?”
The ability to consolidate workloads depends on two key mechanisms:
- The predictable resource requirements of a given set of application workloads.
- The ability to constrain or control resource use by a given application in the workload mix.
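The two mechanisms above can be sketched as a simple planning check. This is a minimal illustration, not a sizing tool: the workload names, core counts, the `headroom` fraction and the `plan_host` helper are all hypothetical, and a real plan would also cover memory, I/O and time-of-day peaks. It compares the predicted peak demand (the first mechanism) and the sum of enforced caps (the second mechanism) against the usable capacity of one machine.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    predicted_peak_cores: float  # predicted peak CPU demand (hypothetical figure)
    cap_cores: float             # enforced ceiling on CPU use (hypothetical figure)


def plan_host(workloads, host_cpu_cores, headroom=0.25):
    """Compare predicted and worst-case CPU demand with usable host capacity.

    `headroom` is the fraction of the machine deliberately left unallocated.
    """
    usable = host_cpu_cores * (1 - headroom)
    # Mechanism 1: predictable requirements give the expected peak load.
    predicted = sum(w.predicted_peak_cores for w in workloads)
    # Mechanism 2: caps bound the load even if every prediction is wrong.
    worst_case = sum(w.cap_cores for w in workloads)
    return {
        "usable_cores": usable,
        "predicted_peak_cores": predicted,
        "worst_case_cores": worst_case,
        "fits_predicted": predicted <= usable,
        "fits_worst_case": worst_case <= usable,
    }


# Illustrative mix of three workloads on one 16-core machine.
mix = [
    Workload("billing", predicted_peak_cores=4, cap_cores=6),
    Workload("reporting", predicted_peak_cores=3, cap_cores=4),
    Workload("intranet", predicted_peak_cores=2, cap_cores=4),
]
plan = plan_host(mix, host_cpu_cores=16)
print(plan)
```

In this made-up mix the predicted peaks fit within the usable capacity but the sum of the caps does not, which is the situation where the caps, and the Capacity Management process that sets them, matter most.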
In the world before consolidation, isolation of workloads is effectively realized by the machine itself. A single application is on a single machine and can’t, for the most part, impact other applications in the data center on other machines. Of course, any resources that are shared, such as network bandwidth and shared SAN I/O, may be impacted across machine boundaries, but even the impact on these resources is effectively capped by an individual application’s available CPU.
In the world after consolidation, this is no longer the case. It is no longer acceptable to allow individual applications to grow to the point where they consume all the resources on a machine over time, since the machine’s CPU, memory and other resources are now very much shared. Furthermore, the Capacity Management processes for multiple applications, which prior to consolidation are generally isolated within the systems management staff of individual applications, must now be coordinated.
To digress a bit, the acceptability of allowing businesses to overrun “their machines” in pre-consolidation operations is another matter, but the prevailing wisdom in many enterprises is to leave it to individual business units to manage use of “their own” computing resources. Note that this attitude and approach reinforces the unfortunate perspective that business units “own” the IT resources and machines they use, which makes driving consolidation initiatives from centralized IT organizations all the more difficult.
However, this also suggests that enhancing Capacity Management of existing machines and services prior to consolidation may increase the perception of IT as at least a partner in owning these resources, and thus may reduce this somewhat irrational barrier to consolidation. If the customer perceives that IT is measuring capacity only in preparation for consolidation, this merely adds fuel to the “leave my machine alone” fire. This may be the case even if IT has proactive monitoring in place, if there has been no periodic, acknowledged reporting and interpretation of capacity metrics with the customer. Capacity measurement without a plan or reporting is not Capacity Management.
The practical implications of increased Capacity Management capability will vary, but the key change is more centralized, periodic management and reporting of Business and Service Capacity requirements and use.
Availability Management
Availability Management processes are responsible for technical architecture and the satisfaction of systemic qualities (“fit for service” in ITIL parlance). This covers areas like reliability, maintainability, serviceability and security as well as the more obvious meaning of “availability”.
The process of designing and implementing new services, and improving the systemic qualities of existing services, changes in a consolidated regime, since the preferred mechanism for implementing new services involves provisioning virtual machines into existing infrastructure, which may or may not involve additional hardware provisioning.
This is in direct contrast to single machine, single service regimes, where the service can be, and often is, designed and provisioned end-to-end with few implementation constraints. New services must conform to documented design constraints to be suitable for provisioning in the consolidated service. This implies potential changes in the way new services are designed after consolidation, as well as clear communication and documentation circumscribing the correct use of the consolidated services. The resulting expectation is that IT should be involved earlier in the application design and procurement processes than may be the case prior to consolidation.
A more obvious availability implication for consolidated services is the impact of individual component outages in a consolidated environment: more services will be impacted by a component failure when more services depend on fewer components. “If you put all your eggs in one basket, it better be a pretty good basket.” This will generally result in a need or desire for higher availability designs for the consolidated regime versus the actual availability of the individual, stand-alone services prior to the consolidation. This will not only result in technical designs with higher availability, it should also result in enhanced operational capabilities and procedures to reduce the extent and duration of service outages when they do occur. The latter changes will, at minimum, affect Service Desk processes, procedures and operational capability.
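The “eggs in one basket” argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the 99% host availability, the service counts and the helper names are hypothetical, and the redundant-pair formula assumes independent failures and instantaneous failover, which real designs only approximate.

```python
HOURS_PER_YEAR = 8760


def downtime_hours(availability):
    """Expected hours of outage per year for a given availability (0..1)."""
    return (1 - availability) * HOURS_PER_YEAR


def services_hit_per_failure(services, hosts):
    """Services affected by one host failure, assuming an even spread."""
    return services / hosts


def redundant_pair_availability(host_availability):
    """Availability of two hosts where either one can carry the load.

    Assumes independent failures and instantaneous failover.
    """
    return 1 - (1 - host_availability) ** 2


# Before: 10 services on 10 stand-alone hosts. After: 10 services on 1 host.
before = services_hit_per_failure(services=10, hosts=10)
after = services_hit_per_failure(services=10, hosts=1)

single = downtime_hours(0.99)
paired = downtime_hours(redundant_pair_availability(0.99))

print(f"services hit per host failure: {before:.0f} before, {after:.0f} after")
print(f"expected downtime: {single:.1f} h/yr single host, {paired:.2f} h/yr redundant pair")
```

With the same per-host availability, each failure in the consolidated case takes out every hosted service at once, which is why the consolidated design tends to need redundancy that the stand-alone services never had.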
Service Desk and Incident Management
Enhanced Availability Management can provide enhanced technical capability for consolidated architectures, but without adequate Service Desk capability, the full benefit of these capabilities will be unrealized. “If a tree falls in the forest and no one hears it, did it make a sound?” Providing the technical capability to recognize component failures without the timely operational capability to act on the event is useless.
Service impacting events in the consolidated service must be recognized by management and monitoring capabilities. The ability to recognize the events must be backed up in two ways: the ability to exercise Incident Management processes to rapidly mitigate the event or restore service, and the ability to adequately communicate the situation to service customers as required. Implementation of consolidated services will certainly affect Service Desk and Incident Management training and documentation, even if the core processes are sound and capable of handling the service management requirements.
Another pitfall to avoid is trading disjointed, diverse and distributed pre-consolidation Service Desk and Incident Management capability for no capability at all: if a physical machine or service has been informally supported by a group separate from the centralized IT Service Desk, that support capability must translate to a more centralized implementation in the consolidated regime. This will result in a shift of organizational roles, responsibility and staffing, and aggregate staffing should be lower than pre-consolidation levels.
Service Level Management
The establishment of a “services not servers” approach is frequently the most disruptive aspect of consolidation. The management of the service as a virtual machine tends to mask some of these implications, but they are not eliminated.
One of the great achievements of virtual machine technology in general is a significant reduction in the requirements for initial consolidation, including tools that can assist with the translation of physical machines into virtual machines in the consolidated environment. This relative transparency extends to the ongoing operation of the individual virtual machines, including such things as isolated, per virtual machine security and configuration.
Nonetheless, much like multi-tenant housing, you can’t operate consolidated infrastructure without a landlord. There must be over-arching IT business management of the consolidated service to coordinate all the other service management processes for the consolidated service, and to proxy the aggregate virtual machine customer demands into a cohesive consolidated service plan. Service Level Management is the interface between the consolidated service IT processes and the consolidated service customers.
The relative impact of this for consolidated services over pre-consolidation administration depends greatly on several factors:
- the relative organizational centralization and homogeneity of the pre-consolidation server administration function
- the relative business acumen and capability of this pre-consolidation function
IT organizations that are exclusively focused on the technical aspects of machine administration are very likely to require additional operational capabilities in many IT service areas coordinated with enhanced Service Level Management processes. This can be generally characterized by being able to operate the consolidated service like a “business within a business”, with strong organizational knowledge of IT customer business requirements, and strong capabilities to translate these demands into useful, well managed consolidated service offerings over time.
A key red flag for trouble to come in the area of IT Service Level Management is any consolidation plan that attempts to leverage virtual machine technology as a “silver bullet” for low utilization rates without requiring clear, committed and continued involvement and participation by the business units and customers affected by the plan.
Extending the landlord metaphor, a consolidated service without adequate Service Level Management will devolve into the IT equivalent of a slum. Whether or not the point of a consolidation is to refresh and update aging technology, particular care should be taken to refresh and update IT Service Management as a key component of the delivery.