Deploying complex solutions into virtualized environments - or "To virtualize or not to virtualize"
A very active topic in the world of private cloud concerns which applications should be virtualized and which need to run in a traditional, siloed datacenter environment. At this time it’s hard to come up with a definitive “right answer” on either side of the debate. However, it’s something you will definitely need to think about as you move to a cloud computing model for current and future application deployments.
To help you understand the salient issues and considerations in choosing the right deployment model, Ulrich Homann, Chief Architect in Microsoft Consulting Services, has written today’s blog post on this subject. Ulrich covers a number of critical topics in this discussion, and I believe you’ll gain a lot of insight from what he has to say. Make sure to leave comments at the bottom of this blog post to keep the conversation running! Thanks! –Tom.
There are abundant and strong opinions when it comes to deploying complex enterprise products, such as Microsoft Exchange or Microsoft SQL Server, into a virtualized environment. Many solution specialists consider it ridiculous, or even a threat to the solution's TCO, to suggest that a homogeneous resource infrastructure (compute, network and storage) exposed through a hypervisor, and not specialized or optimized exclusively for their solution, could host their complex solution with all of its requirements and constraints.
On the other hand, there are a lot of proponents of the notion that even the most complex enterprise products can and should be virtualized, because there are numerous benefits to be had across all applications or solutions deployed on the virtualized environment (uniform hardware deployments, uniform automation of critical management tasks, consistent BC/DR approaches, etc.). As with all strong opinions, there is truth in both arguments. I am not here to defend either position, but would like to provide some thoughts about architecture options and a possible decision framework that can help guide our customers down a solid path.
Think about the type of customer first
Whether or not to virtualize a complex enterprise product has to be decided based upon the customer scenario, the scale of the solution to be deployed, and the customer's overall datacenter strategy. In general, there are two types of solution deployments, or focus points, that I consider in this context:
- The solution is either the sole function of the business or is deployed standalone at such a scale that it uses a significant share of the enterprise's resources. The Server and Tools Business (STB) refers to these kinds of solution providers as cloud service vendors (CSVs). Facebook, Bing, Exchange Online or a consolidated Exchange deployment for the entire US Department of Defense (DoD) are great examples of this kind of solution. In this kind of solution, virtualization yields minimal to zero COGS reduction because the solution already uses its physical hardware fully (most of the time with a 1:1 ratio between the virtual server and a fully dedicated host). The assumption is that the given application workload scales up well on physical hardware and that all, or at least a significant share, of your datacenter capacity is consumed by that application workload.
- The solution is one of many that the given customer has to operate and, while critical to the business, isn't large enough in scale to warrant specialized treatment. Most deployments we are involved in fall into this category. The predominant factor is optimized integration into the overall datacenter environment, minimizing total operating costs across all solutions while still providing an environment capable of hosting the desired set of solutions.
The decision tree that the infrastructure and solution team members walk should start with this step. While it is not the only factor, balancing these two ultimately conflicting goals is part of our job in advising our customers. We should also expect customers to approach us from different angles, as we believe application deployments will more often than not coincide with virtualization or IaaS projects.
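The first step of that decision tree can be sketched as a small classification function. This is a minimal sketch under stated assumptions. The `SolutionProfile` fields and the 50% cut-off for "a significant share" of datacenter capacity are hypothetical, because the post does not quantify them.

```python
from dataclasses import dataclass


@dataclass
class SolutionProfile:
    name: str
    # Fraction of the enterprise's datacenter capacity the solution consumes (0.0-1.0).
    datacenter_share: float
    # True if the solution is the sole function of the business.
    sole_business_function: bool = False


# Hypothetical threshold for "a significant share" of datacenter resources;
# the post does not give a number.
SIGNIFICANT_SHARE = 0.5


def deployment_focus(profile: SolutionProfile) -> str:
    """First step of the decision tree: CSV-scale solution vs. one of many."""
    if profile.sole_business_function or profile.datacenter_share >= SIGNIFICANT_SHARE:
        # Virtualization yields little COGS reduction here; expect roughly
        # a 1:1 ratio between virtual server and dedicated host.
        return "csv-scale"
    # Optimize for integration into the shared, standardized datacenter.
    return "one-of-many"


if __name__ == "__main__":
    print(deployment_focus(SolutionProfile("Exchange Online", 0.9, True)))  # csv-scale
    print(deployment_focus(SolutionProfile("Departmental SharePoint farm", 0.03)))  # one-of-many
```

The later steps of the tree (resource pool category, placement rules) would only be walked in detail for the "one-of-many" branch, where sharing standardized pools is the point.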
What's the deal with virtualization?
Most virtualization scenarios focus on consolidating one or more resource islands onto ever more capable and homogeneous resource pools. While there are times and scenarios where such consolidation within a complex solution would be feasible or even desirable (compare the CSV scenarios above), virtualization enables far more important capabilities by breaking the tight coupling between hardware and software.
This decoupling allows far more dynamic behavior in both normal and emergency operations. Solutions (one or more application workloads) hosted in virtual machines can be moved rapidly and intact to any of the available resources, analogous to the way flat networks enabled any-to-any connectivity. Furthermore, it makes it possible to introduce new and different hardware patterns, as well as new and different software solution patterns, quickly and mostly without impact on each other.
Virtualization platforms have evolved from basic hypervisors providing shared access to hardware into advanced management solutions providing high availability, data protection, disaster recovery and other enterprise-oriented features applied to the virtual machine, in many cases in an OS- and application-agnostic fashion. This gives customers the opportunity to “standardize” on commodity virtualization capabilities (e.g. vMotion/Live Migration) for complex problems like high availability, rather than maintaining N different HA solutions for N applications. Customers, and cloud datacenters, are embracing standardization for a simple reason: each time you need a different flavor of compute, storage, network configuration, OS, etc., you increase initial and ongoing (operational) cost, particularly over the lifetime of the system.
Automated resource provisioning
In addition to the virtues extolled above, more and more capabilities for automatically configuring and managing resources are being provided primarily for virtual rather than physical resources. The emerging ideal in the datacenter world is a fairly static and 'flat' physical architecture for compute, storage and networking, with all the variability and flexibility moved to the virtual level across the entire resource space.
Automation is going to become ever more critical as customers continue their relentless drive towards lower cost of ownership. Recycling physical resources is an expensive exercise that slows the adoption of new technologies. Moving to virtual resource pools and automated provisioning will provide the flexibility needed to upgrade quickly. Furthermore, a virtualized, pooled and automated datacenter environment provides a pathway to deploying and using more solutions rapidly. The traditional time lag between solution design and operational deployment shortens, because hardware acquisition and deployment in the datacenter are decoupled from the deployment of the solution.
Virtualization is also an excellent way to prepare solutions to be moved to the public cloud. A virtualized on-premises solution is only a short step from IaaS on an environment such as Windows Azure, where it can run until it is restructured as a fully cloud-enabled (PaaS or SaaS) solution.
Is there a way to have your cake and eat it too?
Having established that there are very good reasons to drive towards a virtualized datacenter world, even if at first glance it does not appear sensible for complex solutions, it is now time to dive deeper into the considerations and some solution approaches that balance solution requirements against the desire to create an extremely standardized and automated environment.
Resource architecture - standardizing the way we describe resource requirements
Another advancement that virtualization is driving forward is the standardization of key attributes required to design and deploy any solution. Simple things like the number and type of CPUs, network requirements, memory, etc. are now standardized in the form of a virtual machine (VM) resource pool. Important concepts in a shared services or hosting model, such as the tenant, are also being standardized. The SPF (Service Provider Foundation) effort by the System Center team is an important part of that standardization (RESTful/OData interfaces, runbooks to map standard concepts to Microsoft-specific ones, etc.). Any reference architecture work, such as the MCS PLA work, should express resource requirements as VM specifications rather than physical resource descriptions.
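To make "express resource requirements as VM specifications" concrete, the sketch below describes a requirement as a VM specification and checks it against a pool's free capacity. The `VMSpec` and `ResourcePool` types and their fields are hypothetical illustrations. They are not the SPF or SC VMM object model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSpec:
    """A resource requirement expressed as a VM specification, not physical hardware."""
    vcpus: int
    memory_gb: int


@dataclass
class ResourcePool:
    name: str
    free_vcpus: int
    free_memory_gb: int

    def fits(self, spec: VMSpec, count: int = 1) -> bool:
        # Only the standardized VM attributes are checked; the physical
        # hardware behind the pool is irrelevant to the requester.
        return (spec.vcpus * count <= self.free_vcpus
                and spec.memory_gb * count <= self.free_memory_gb)

    def allocate(self, spec: VMSpec, count: int = 1) -> None:
        """Reserve capacity for `count` VMs of the given spec, or raise if it does not fit."""
        if not self.fits(spec, count):
            raise ValueError(f"pool {self.name!r} cannot host {count} x {spec}")
        self.free_vcpus -= spec.vcpus * count
        self.free_memory_gb -= spec.memory_gb * count


if __name__ == "__main__":
    pool = ResourcePool("general-purpose", free_vcpus=64, free_memory_gb=256)
    pool.allocate(VMSpec(vcpus=4, memory_gb=16), count=4)
    print(pool.free_vcpus, pool.free_memory_gb)  # 48 192
```

The requester never names a server model. That is what lets the physical layer stay static and 'flat' while provisioning is automated at the virtual level.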
Service-specific resource architecture & constraints
So far, so good. While standardizing on the construct of one or more "VM" resource pools for server resource requirements is a necessary and important step, it is not sufficient. We need to look at the entire solution across a number of dimensions in order to deploy complex solutions safely and successfully in a virtualization-dominated datacenter world.
However, it would be too complex to introduce solution-specific resource definitions for each and every solution a given customer might deploy. We need to find a workable compromise that allows complex services to benefit from the virtualized and highly automated environment while at the same time ensuring optimal deployment for the solution requirements. After reviewing a number of complex solutions, including SharePoint and Exchange, it appears that the following dimensions have to be expressed and designed into any resource architecture that will host complex services:
- Hypervisor feature support – a better definition might be shared infrastructure: dynamic memory, high-availability and disaster recovery techniques such as live migration.
- Placement rules: certain scenarios, such as Microsoft Exchange, call for a 1:1 deployment between an Exchange server and a physical host. While it is permissible to deploy other workloads to the same physical host, placing another Exchange server on that host is not recommended. Although the product documentation actually supports placing more than one Exchange server on a physical host, the recommended deployment strategy, due to the nature of Exchange's built-in HA/DR architecture, is to deploy no more than one Exchange server per physical host.
- Storage architecture: we need to be able to identify the storage type and architecture, e.g. DAS or SAN, for storage-intensive and sensitive workloads such as Exchange or SQL Server. While this requirement obviously goes against the entire ideal of virtualization and standardization, the real world is unfortunately not quite as advanced today.
- Storage IOPS: we also need to be able to provide storage-sensitive VMs with optimized ways of accessing storage, primarily by guaranteeing IOPS. Hyper-V currently does not provide storage QoS, which would be an elegant way to ensure the right level of IOPS for any given workload.
- Network performance: similarly to storage, complex solutions have very specific requirements on network performance. The good news is that Windows Server 2012 provides ways to manage network performance either through QoS (ideal) or through SR-IOV (high-performance).
- Run-state change management: mixed-state environments are very common in existing complex services, and managing changes is especially complex where stateful and stateless settings are mixed both across VMs (through roles) and within VMs (through files and the registry). This aspect of complex solution management is beyond the scope of this proposal but worth considering: how can the optimizations that IaaS offers be applied to solutions with running state?
With these constraints in mind, six categories or types of solution patterns are emerging from close collaboration between the application workload architects driving the aforementioned PLAs and the infrastructure architects driving and defining the IaaS PLA:
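The Exchange placement rule above is an anti-affinity constraint: at most one VM of a given role per physical host, while VMs of other roles may share that host. A minimal check might look like the following. The `Placement` record, the role names and the `ANTI_AFFINITY_ROLES` set are hypothetical. In practice the rule would be enforced by the placement engine when VMs are deployed rather than audited after the fact.

```python
from collections import Counter
from typing import NamedTuple


class Placement(NamedTuple):
    vm: str
    role: str   # application role, e.g. "exchange" or "file-server"
    host: str   # physical host the VM runs on


# Roles whose VMs must not share a physical host (hypothetical rule set).
ANTI_AFFINITY_ROLES = {"exchange"}


def anti_affinity_violations(placements: list[Placement]) -> list[tuple[str, str]]:
    """Return (host, role) pairs where more than one VM of an anti-affinity role shares a host."""
    counts = Counter((p.host, p.role) for p in placements if p.role in ANTI_AFFINITY_ROLES)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    placements = [
        Placement("ex01", "exchange", "host-a"),
        Placement("fs01", "file-server", "host-a"),  # other workloads may share the host
        Placement("ex02", "exchange", "host-b"),
        Placement("ex03", "exchange", "host-b"),     # violates the one-per-host rule
    ]
    print(anti_affinity_violations(placements))  # [('host-b', 'exchange')]
```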
- The Messaging-category: Messaging is a major workload in most enterprises and has a number of constraints and rules when deploying in an IaaS-type environment.
· Hypervisor features: dynamic memory and hypervisor HA/DR features disabled
· Placement: 1:1 Exchange server and physical host deployment, VM's of other application types are OK
· Storage: DAS is preferred because of presumed cost and data segmentation advantages
· Network QoS required
- The SQL OLTP-category: OLTP workloads will be a major category, as many Microsoft and custom LOB solutions depend on a well-functioning SQL Server backend infrastructure. SQL Server 2012 is very supportive of virtualization and generally assumes that deployments are virtual even for the heaviest workloads (in combination with Windows Server 2012 capabilities). While support has markedly improved, real-world SQL Server deployments still place constraints on the resource architecture:
· Storage IOPS exposure and guarantee
· Network QoS management required for high-end solutions such as SAP or SharePoint; usage of network offload technologies like SR-IOV highly recommended
· SQL Server-specific HA/DR approach (AlwaysOn) requires a virtual (guest) cluster setup but not host clustering
- The App Server category: in many ways, the SQL OLTP requirements are a superset of the application server requirements. SharePoint app servers for scenarios such as Search, Profile Import and Distributed Cache are as resource-intensive as a SQL Server environment. They differ in their HA/DR approach, which is either based on being stateless (i.e. multiple VMs capable of hosting the same application workload) or built into the application itself (e.g. Distributed Cache, Search).
- The RDS/VDI category: RDS and VDI are both workloads with extreme burst IO patterns for both storage and networks. While no specific constraints are known, large-scale deployments would do well to provide an IO-optimized environment for this workload. This application workload might also require specialized hardware, such as RemoteFX-capable video cards, in the compute infrastructure.
- The File storage/VM category: the typical design patterns for virtualization should do fine for file storage and VM scenarios.
- The Media Server-category: Lync Media Control Units represent a major category of very network and CPU intensive workloads that require special consideration at the network layer.
· Network QoS required
· Hypervisor features: dynamic memory and hypervisor HA/DR features disabled
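One way to make these categories actionable is to encode each one as a set of capabilities it requires from a resource pool, and then test which categories a given pool qualifies for. The capability tag names below are hypothetical shorthand for the bullets above. For simplicity, preferences such as DAS for messaging are treated as hard requirements. The App Server and RDS/VDI categories are omitted because the post lists no hard constraints for them.

```python
# Capabilities each category requires from a resource pool
# (hypothetical tags summarizing the constraints listed above).
CATEGORY_REQUIREMENTS: dict[str, frozenset[str]] = {
    "messaging": frozenset({"dynamic_memory_off", "hypervisor_ha_off", "das_storage", "network_qos"}),
    "sql_oltp": frozenset({"iops_guarantee", "network_qos", "guest_clustering"}),
    "media_server": frozenset({"dynamic_memory_off", "hypervisor_ha_off", "network_qos"}),
    "file_storage_vm": frozenset(),  # typical virtualization design patterns suffice
}


def qualifying_categories(pool_capabilities: set[str]) -> list[str]:
    """Categories whose every required capability is offered by the pool."""
    return sorted(
        category
        for category, required in CATEGORY_REQUIREMENTS.items()
        if required <= pool_capabilities  # subset test
    )


if __name__ == "__main__":
    sql_pool = {"iops_guarantee", "network_qos", "guest_clustering", "sr_iov"}
    print(qualifying_categories(sql_pool))  # ['file_storage_vm', 'sql_oltp']
```

Keeping the table small reflects the goal of consolidating the classes to a minimum. Every new category adds another flavor of pool to build and operate.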
How would one represent or deploy a solution such as Microsoft SharePoint? SharePoint in this environment would be a composite solution deployed on two different resource pools – the web front-end and SharePoint application tiers would be deployed on a resource pool categorized as File storage/VM and the SQL Server tier would be deployed on a resource pool categorized as SQL OLTP.
The two resource pools must be connected so that the SQL OLTP resource pool is part of the same virtual network as the File storage/VM resource pool. If that kind of partitioning is too complicated, the deployment can be simplified by placing the entire solution on the more capable resource pool (in this case, SQL OLTP), although that might drive costs up.
There will be other workloads, such as big data or data warehousing, that require specialization due to the nature of their access to resources. The key will be to identify the key constraints and consolidate the classes to the minimum possible, in order to share potentially expensive resources across the largest possible set of applications.
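The composite SharePoint deployment and its single-pool simplification can be sketched as a tier-to-category mapping. The capability ranking, the tier keys and the virtual network names are hypothetical. The ranking simply encodes the post's point that the SQL OLTP pool is the more capable, and potentially more expensive, one.

```python
# Tier -> resource pool category for the composite SharePoint deployment.
SHAREPOINT_TIERS = {
    "web_front_end": "file_storage_vm",
    "application": "file_storage_vm",
    "sql": "sql_oltp",
}

# Hypothetical ordering: higher means more capable (and likely more costly).
CAPABILITY_RANK = {"file_storage_vm": 0, "sql_oltp": 1}


def plan_deployment(tiers: dict[str, str], single_pool: bool = False) -> dict[str, str]:
    """Map each tier to a pool category, optionally collapsing onto the most capable one."""
    if not single_pool:
        return dict(tiers)
    most_capable = max(tiers.values(), key=lambda category: CAPABILITY_RANK[category])
    return {tier: most_capable for tier in tiers}


def shares_virtual_network(plan: dict[str, str], pool_networks: dict[str, str]) -> bool:
    """True if every pool used by the plan is attached to the same virtual network."""
    return len({pool_networks[category] for category in plan.values()}) == 1


if __name__ == "__main__":
    networks = {"file_storage_vm": "vnet-sp", "sql_oltp": "vnet-sp"}
    print(shares_virtual_network(plan_deployment(SHAREPOINT_TIERS), networks))  # True
    print(plan_deployment(SHAREPOINT_TIERS, single_pool=True))
    # {'web_front_end': 'sql_oltp', 'application': 'sql_oltp', 'sql': 'sql_oltp'}
```

The split plan keeps front-end and application VMs on cheaper capacity. The single-pool plan trades that saving for a simpler deployment.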
Integrating this approach with System Center Virtual Machine Manager (SC VMM)
A key component in the emerging solution is SC VMM. SC VMM provides a couple of very useful features and concepts that allow us to describe the capabilities of the resource pools and isolate complex solutions for management and operations.
Host groups and meta-data/tags
Resource pools are a way to group and expose resources with similar capabilities. SC VMM allows you to add metadata to the resource pool definition. Automation solutions can use that metadata to enforce constraints such as the capability profiles of the underlying hosts, support for workload-specific features such as RemoteFX GPUs, etc.
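As an illustration of how automation might use such metadata, the sketch below selects the pool that carries all required tags and has the most free memory. It works on plain Python objects and does not call the SC VMM API. The `TaggedPool` type, the tag strings and the "most free memory" tie-breaker are hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaggedPool:
    name: str
    tags: set[str] = field(default_factory=set)  # metadata attached to the pool / host group
    free_memory_gb: int = 0


def select_pool(pools: list[TaggedPool], required_tags: set[str]) -> Optional[TaggedPool]:
    """Pick the pool that carries all required tags and has the most free memory."""
    candidates = [p for p in pools if required_tags <= p.tags]
    return max(candidates, key=lambda p: p.free_memory_gb, default=None)


if __name__ == "__main__":
    pools = [
        TaggedPool("hg-general", {"category:file_storage_vm"}, 512),
        TaggedPool("hg-vdi-gpu", {"category:rds_vdi", "gpu:remotefx"}, 256),
    ]
    print(select_pool(pools, {"gpu:remotefx"}).name)  # hg-vdi-gpu
    print(select_pool(pools, {"category:sql_oltp"}))  # None
```

Returning `None` rather than falling back to any pool is deliberate. A workload whose constraints no pool satisfies should be flagged, not silently placed.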
SC VMM Cloud
The SC VMM cloud concept is primarily demonstrated as a business unit IT (BUIT) solution providing delegated control over virtualized resources. While that is a key scenario, complex solutions (as outlined above) follow their own rules and should be administered by specialists who can enforce those rules and constraints. It would be useful to think through using the cloud concept for this scenario, associating properly categorized resource pools with a specific cloud. This would provide the balance between standardization and control that most application owners desire.
Ultimately any datacenter effort is all about the solution, as our System Center marketing so nicely points out in its "it's all about the app" theme. However, we also have to take into account that customers run many solutions in their datacenters, and it is our job to think through how we can deploy solutions to their full capability while at the same time driving down the cost of ownership of the portfolio of solutions any customer operates.
As always, share your thoughts or examples of where you have already followed a similar approach. Looking forward to the conversation.
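The idea of a specialist-administered cloud can be modeled as a scope: a set of categorized pools plus the people allowed to deploy into them. This is a hypothetical sketch of the concept only. `SpecialistCloud` and `authorize` are illustrative names and do not represent SC VMM's cloud or user role objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecialistCloud:
    """A cloud that scopes categorized pools to the specialists who own their rules."""
    name: str
    pools: frozenset[str]
    administrators: frozenset[str]


def authorize(cloud: SpecialistCloud, user: str, target_pool: str) -> bool:
    """Allow a deployment only by the cloud's administrators and only onto its pools."""
    return user in cloud.administrators and target_pool in cloud.pools


if __name__ == "__main__":
    messaging = SpecialistCloud(
        "messaging",
        pools=frozenset({"pool-messaging-das"}),
        administrators=frozenset({"exchange-admins"}),
    )
    print(authorize(messaging, "exchange-admins", "pool-messaging-das"))  # True
    print(authorize(messaging, "exchange-admins", "pool-general"))        # False
```

Standardization comes from the shared, categorized pools. Control comes from limiting who may deploy into them.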
--- Uli & team
I hope you enjoyed Ulrich’s discussion and that you’ll weigh in on this topic in the comments section. I’ll make sure to forward these to the contributors – so comment early and often!