Azure VMware Solution private cloud and cluster concepts

Azure VMware Solution provides VMware-based private clouds in Azure. The private cloud hardware and software deployments are fully integrated and automated in Azure. Deploy and manage the private cloud through the Azure portal, CLI, or PowerShell.

A private cloud includes clusters with:

  • Dedicated bare-metal server hosts provisioned with VMware ESXi hypervisor
  • VMware vCenter Server for managing ESXi and vSAN
  • VMware NSX software-defined networking for vSphere workload VMs
  • VMware vSAN datastore for vSphere workload VMs
  • VMware HCX for workload mobility
  • Resources in the Azure underlay (required for connectivity and to operate the private cloud)

Private clouds are installed and managed within an Azure subscription. The number of private clouds within a subscription is scalable. Initially, there's a limit of one private cloud per subscription. There's a logical relationship between Azure subscriptions, Azure VMware Solution private clouds, vSAN clusters, and hosts.

The following diagram describes the architectural components of the Azure VMware Solution.

Diagram illustrating a single Azure subscription containing two private clouds for development and production environments.

Each Azure VMware Solution architectural component has the following function:

  • Azure Subscription: Provides controlled access, budget, and quota management for the Azure VMware Solution.
  • Azure Region: Groups data centers into Availability Zones (AZs) and then groups AZs into regions.
  • Azure Resource Group: Places Azure services and resources into logical groups.
  • Azure VMware Solution Private Cloud: Offers compute, networking, and storage resources using VMware software, including vCenter Server, NSX software-defined networking, vSAN software-defined storage, and Azure bare-metal ESXi hosts. Azure NetApp Files, Azure Elastic SAN, and Pure Cloud Block Store are also supported.
  • Azure VMware Solution Resource Cluster: Provides compute, networking, and storage resources for customer workloads by scaling out the Azure VMware Solution private cloud using VMware software, including vSAN software-defined storage and Azure bare-metal ESXi hosts. Azure NetApp Files, Azure Elastic SAN, and Pure Cloud Block Store are also supported.
  • VMware HCX: Delivers mobility, migration, and network extension services.
  • VMware Site Recovery: Automates disaster recovery and storage replication services with VMware vSphere Replication. Third-party disaster recovery solutions Zerto Disaster Recovery and JetStream Software Disaster Recovery are also supported.
  • Dedicated Microsoft Enterprise Edge (D-MSEE): Router that connects Azure cloud and the Azure VMware Solution private cloud instance.
  • Azure Virtual Network (VNet): Connects Azure services and resources together.
  • Azure Route Server: Exchanges dynamic route information with Azure networks.
  • Azure Virtual Network Gateway: Connects Azure services and resources to other private networks using IPSec VPN, ExpressRoute, and VNet to VNet.
  • Azure ExpressRoute: Provides high-speed private connections between Azure data centers and on-premises or colocation infrastructure.
  • Azure Virtual WAN (vWAN): Combines networking, security, and routing functions into a single unified Wide Area Network (WAN).

Hosts

Azure VMware Solution clusters are based upon hyper-converged infrastructure. The following table shows the CPU, memory, disk and network specifications of the host.

Host Type | CPU (Cores/GHz) | RAM (GB) | vSAN Cache Tier (TB, raw) | vSAN Capacity Tier (TB, raw) | Regional availability
AV36 | Dual Intel Xeon Gold 6140 CPUs (Skylake microarchitecture) with 18 cores/CPU @ 2.3 GHz, Total 36 physical cores (72 logical cores with hyperthreading) | 576 | 3.2 (NVMe) | 15.20 (SSD) | Selected regions (*)
AV36P | Dual Intel Xeon Gold 6240 CPUs (Cascade Lake microarchitecture) with 18 cores/CPU @ 2.6 GHz / 3.9 GHz Turbo, Total 36 physical cores (72 logical cores with hyperthreading) | 768 | 1.5 (Intel Cache) | 19.20 (NVMe) | Selected regions (*)
AV52 | Dual Intel Xeon Platinum 8270 CPUs (Cascade Lake microarchitecture) with 26 cores/CPU @ 2.7 GHz / 4.0 GHz Turbo, Total 52 physical cores (104 logical cores with hyperthreading) | 1,536 | 1.5 (Intel Cache) | 38.40 (NVMe) | Selected regions (*)
AV64 | Dual Intel Xeon Platinum 8370C CPUs (Ice Lake microarchitecture) with 32 cores/CPU @ 2.8 GHz / 3.5 GHz Turbo, Total 64 physical cores (128 logical cores with hyperthreading) | 1,024 | 3.84 (NVMe) | 15.36 (NVMe) | Selected regions (**)

An Azure VMware Solution cluster requires a minimum of three hosts. You can only use hosts of the same type in a single Azure VMware Solution private cloud. Hosts used to build or scale clusters come from an isolated pool of hosts. Those hosts passed hardware tests and had all data securely deleted before being added to a cluster.

All of the host types above have 100 Gbps network interface throughput.

(*) Details are available via the Azure pricing calculator.

(**) AV64 Prerequisite: An Azure VMware Solution private cloud deployed with AV36, AV36P, or AV52 is required prior to adding AV64.
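As a rough illustration of how the host table feeds into sizing, the following Python sketch multiplies per-host figures by a host count to get raw cluster totals. The names `HostSpec`, `HOST_SPECS`, and `raw_cluster_totals` are hypothetical helpers introduced here, not part of any Azure tool. The numbers are copied from the table above and are raw values only, so they don't account for vSAN overhead, storage policy, or management appliances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    physical_cores: int      # physical cores per host
    ram_gb: int              # RAM per host, in GB
    raw_capacity_tb: float   # raw vSAN capacity tier per host, in TB


# Per-host values copied from the host table in this article.
HOST_SPECS = {
    "AV36": HostSpec(36, 576, 15.20),
    "AV36P": HostSpec(36, 768, 19.20),
    "AV52": HostSpec(52, 1536, 38.40),
    "AV64": HostSpec(64, 1024, 15.36),
}

MIN_HOSTS_PER_CLUSTER = 3  # minimum stated in this article


def raw_cluster_totals(host_type: str, host_count: int) -> dict:
    """Return raw (pre-overhead) totals for a cluster of identical hosts."""
    if host_count < MIN_HOSTS_PER_CLUSTER:
        raise ValueError(
            f"A cluster needs at least {MIN_HOSTS_PER_CLUSTER} hosts"
        )
    spec = HOST_SPECS[host_type]
    return {
        "physical_cores": spec.physical_cores * host_count,
        "ram_gb": spec.ram_gb * host_count,
        "raw_capacity_tb": round(spec.raw_capacity_tb * host_count, 2),
    }


if __name__ == "__main__":
    print(raw_cluster_totals("AV36P", 3))
```

Treat the result as an upper bound: usable capacity is lower once the storage policy and management overhead are applied.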

Azure Region Availability Zone (AZ) to SKU mapping table

When planning your Azure VMware Solution design, use the following table to understand what SKUs are available in each physical Availability Zone of an Azure region.

Important

This mapping is important for placing your private clouds in close proximity to your Azure native workloads, including integrated services such as Azure NetApp Files and Pure Cloud Block Store (CBS).

The Multi-AZ capability for Azure VMware Solution Stretched Clusters is also tagged in the following table. Customer quota for Azure VMware Solution is assigned by Azure region, and you are not able to specify the Availability Zone during private cloud provisioning. An auto selection algorithm is used to balance deployments across the Azure region. If you have a particular Availability Zone you want to deploy to, open a Service Request with Microsoft requesting a "special placement policy" for your subscription, Azure region, Availability Zone, and SKU type. This policy remains in place until you request it be removed or changed.

SKUs marked in bold are of limited availability due to customer consumption, and quota might not be available upon request. Use the AV64 SKU instead when AV36, AV36P, or AV52 SKUs are limited.

AV64 SKUs are available per Availability Zone; the following table lists the Azure regions that support this SKU. RAID-6 FTT2 and RAID-1 FTT3 storage policies need six and seven Fault Domains (FDs), respectively. The FD count for each Azure region is listed in the "AV64 FDs Supported" column.

Azure region | Availability Zone | SKU | Multi-AZ SDDC | AV64 FDs Supported
Australia East | AZ01 | AV36P, AV64 | Yes | 5 (7 Planned H2 2024)
Australia East | AZ02 | AV36 | No | N/A
Australia East | AZ03 | AV36P, AV64 | Yes | 5 (7 Planned H2 2024)
Australia South East | AZ01 | AV36 | No | N/A
Brazil South | AZ02 | AV36 | No | N/A
Canada Central | AZ02 | AV36, AV36P | No | N/A
Canada East | N/A | AV36 | No | N/A
Central India | AZ03 | AV36P | No | N/A
Central US | AZ01 | AV36P | No | N/A
Central US | AZ02 | AV36 | No | N/A
Central US | AZ03 | AV36P | No | N/A
East Asia | AZ01 | AV36 | No | N/A
East US | AZ01 | AV36P | Yes | N/A
East US | AZ02 | AV36P, AV64 | Yes | 7
East US | AZ03 | AV36, AV36P, AV64 | Yes | 7
East US 2 | AZ01 | AV36, AV64 | No | 5 (7 Planned H2 2024)
East US 2 | AZ02 | AV36P, AV52, AV64 | No | 5 (7 Planned H2 2024)
France Central | AZ01 | AV36 | No | N/A
Germany West Central | AZ01 | AV36P | Yes | N/A
Germany West Central | AZ02 | AV36 | Yes | N/A
Germany West Central | AZ03 | AV36, AV36P | Yes | N/A
Italy North | AZ03 | AV36P | No | N/A
Japan East | AZ02 | AV36 | No | N/A
Japan West | AZ01 | AV36 | No | N/A
North Central US | AZ01 | AV36, AV64 | No | 5 (7 Planned H2 2024)
North Central US | AZ02 | AV36P, AV64 | No | 5 (7 Planned H2 2024)
North Europe | AZ02 | AV36, AV64 | No | 5 (7 Planned H2 2024)
Qatar Central | AZ03 | AV36P | No | N/A
South Africa North | AZ03 | AV36 | No | N/A
South Central US | AZ01 | AV36, AV64 | No | 5 (7 Planned H2 2024)
South Central US | AZ02 | AV36P, AV52, AV64 | No | 5 (7 Planned H2 2024)
South East Asia | AZ02 | AV36 | No | N/A
Sweden Central | AZ01 | AV36 | No | N/A
Switzerland North | AZ01 | AV36, AV64 | No | 7
Switzerland North | AZ03 | AV36P | No | N/A
Switzerland West | AZ01 | AV36, AV64 | No | 7
UAE North | AZ03 | AV36P | No | N/A
UK South | AZ01 | AV36, AV36P, AV52, AV64 | Yes | 7
UK South | AZ02 | AV36, AV64 | Yes | 7
UK South | AZ03 | AV36P, AV64 | Yes | 7
UK West | AZ01 | AV36 | No | N/A
West Europe | AZ01 | AV36, AV36P, AV52, AV64 | Yes | 5 (7 Planned H2 2024)
West Europe | AZ02 | AV36, AV64 | Yes | 7
West Europe | AZ03 | AV36P, AV64 | Yes | 5 (7 Planned H2 2024)
West US | AZ01 | AV36, AV36P | No | N/A
West US 2 | AZ01 | AV36 | No | N/A
West US 2 | AZ02 | AV36P | No | N/A
West US 3 | AZ01 | AV36P | No | N/A
US Gov Arizona | AZ02 | AV36P | No | N/A
US Gov Virginia | AZ03 | AV36 | No | N/A
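The Fault Domain requirement described before the table can be expressed as a simple lookup. The following Python sketch is illustrative only: `REQUIRED_FDS`, `AV64_FDS`, and `supported_policies` are hypothetical names, and `AV64_FDS` holds just three rows copied from the table above using the currently supported count (not the planned one). In practice you would load the full table.

```python
# Minimum Fault Domains per storage policy, as stated in this article.
REQUIRED_FDS = {
    "RAID-6 FTT2": 6,
    "RAID-1 FTT3": 7,
}

# A small sample of (region, zone) -> "AV64 FDs Supported" values,
# copied from the mapping table. Rows with N/A are simply omitted.
AV64_FDS = {
    ("East US", "AZ02"): 7,
    ("Australia East", "AZ01"): 5,
    ("UK South", "AZ01"): 7,
}


def supported_policies(region: str, zone: str) -> list:
    """List the FD-dependent storage policies an AV64 cluster can use.

    Returns an empty list when the zone is not in the sample data or
    has too few Fault Domains for either policy.
    """
    fd_count = AV64_FDS.get((region, zone))
    if fd_count is None:
        return []
    return sorted(
        policy
        for policy, needed in REQUIRED_FDS.items()
        if fd_count >= needed
    )


if __name__ == "__main__":
    for key in AV64_FDS:
        print(key, supported_policies(*key))
```

With five Fault Domains, neither policy qualifies, which is why the planned increase to seven matters for those zones.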

Clusters

For each private cloud created, there's one vSAN cluster by default. You can add, delete, and scale clusters. The minimum number of hosts per cluster, and for the initial deployment, is three.

You use vCenter Server and NSX Manager to manage most aspects of cluster configuration and operation. All local storage of each host in a cluster is under the control of VMware vSAN.

The Azure VMware Solution management and control plane have the following resource requirements that need to be accounted for during solution sizing of a standard private cloud.

Area | Description | Provisioned vCPUs | Provisioned vRAM (GB) | Provisioned vDisk (GB) | Typical CPU Usage (GHz) | Typical vRAM Usage (GB) | Typical Raw vSAN Datastore Usage (GB)
VMware vSphere | vCenter Server | 8 | 28 | 915 | 1.1 | 3.9 | 1,854
VMware vSphere | vSphere Cluster Service VM 1 | 1 | 0.1 | 2 | 0.1 | 0.1 | 5
VMware vSphere | vSphere Cluster Service VM 2 | 1 | 0.1 | 2 | 0.1 | 0.1 | 5
VMware vSphere | vSphere Cluster Service VM 3 | 1 | 0.1 | 2 | 0.1 | 0.1 | 5
VMware vSphere | ESXi node 1 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSphere | ESXi node 2 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSphere | ESXi node 3 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSAN | vSAN System Usage | N/A | N/A | N/A | N/A | N/A | 5,458
VMware NSX | NSX Unified Appliance Node 1 | 12 | 48 | 300 | 2.5 | 13.5 | 613
VMware NSX | NSX Unified Appliance Node 2 | 12 | 48 | 300 | 2.5 | 13.5 | 613
VMware NSX | NSX Unified Appliance Node 3 | 12 | 48 | 300 | 2.5 | 13.5 | 613
VMware NSX | NSX Edge VM 1 | 8 | 32 | 200 | 1.3 | 0.6 | 409
VMware NSX | NSX Edge VM 2 | 8 | 32 | 200 | 1.3 | 0.6 | 409
VMware HCX (Optional Add-On) | HCX Manager | 4 | 12 | 65 | 1 | 2.5 | 140
VMware Site Recovery Manager (Optional Add-On) | SRM Appliance | 4 | 12 | 33 | 1 | 1 | 79
VMware vSphere (Optional Add-On) | vSphere Replication Manager Appliance | 4 | 8 | 33 | 1 | 0.6 | 75
VMware vSphere (Optional Add-On) | vSphere Replication Server Appliance | 2 | 1 | 33 | 1 | 0.3 | 68
Total | | 77 vCPUs | 269.3 GB | 2,385 GB | 30 GHz | 50.4 GB | 10,346 GB (9,032 GB with expected 1.2x Data Reduction ratio)

The Azure VMware Solution management and control plane have the following resource requirements that need to be accounted for during solution sizing of a stretched clusters private cloud. VMware SRM isn't included in the table since it currently isn't supported.

Area | Description | Provisioned vCPUs | Provisioned vRAM (GB) | Provisioned vDisk (GB) | Typical CPU Usage (GHz) | Typical vRAM Usage (GB) | Typical Raw vSAN Datastore Usage (GB)
VMware vSphere | vCenter Server | 8 | 28 | 915 | 1.1 | 3.9 | 3,708
VMware vSphere | vSphere Cluster Service VM 1 | 1 | 0.1 | 2 | 0.1 | 0.1 | 5
VMware vSphere | vSphere Cluster Service VM 2 | 1 | 0.1 | 2 | 0.1 | 0.1 | 5
VMware vSphere | vSphere Cluster Service VM 3 | 1 | 0.1 | 2 | 0.1 | 0.1 | 5
VMware vSphere | ESXi node 1 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSphere | ESXi node 2 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSphere | ESXi node 3 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSphere | ESXi node 4 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSphere | ESXi node 5 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSphere | ESXi node 6 | N/A | N/A | N/A | 5.1 | 0.2 | N/A
VMware vSAN | vSAN System Usage | N/A | N/A | N/A | N/A | N/A | 10,722
VMware NSX | NSX Unified Appliance Node 1 | 12 | 48 | 300 | 2.5 | 13.5 | 1,229
VMware NSX | NSX Unified Appliance Node 2 | 12 | 48 | 300 | 2.5 | 13.5 | 1,229
VMware NSX | NSX Unified Appliance Node 3 | 12 | 48 | 300 | 2.5 | 13.5 | 1,229
VMware NSX | NSX Edge VM 1 | 8 | 32 | 200 | 1.3 | 0.6 | 817
VMware NSX | NSX Edge VM 2 | 8 | 32 | 200 | 1.3 | 0.6 | 817
VMware HCX (Optional Add-On) | HCX Manager | 4 | 12 | 65 | 1 | 2.5 | 270
Total | | 67 vCPUs | 248.3 GB | 2,286 GB | 42.3 GHz | 49.1 GB | 20,036 GB (17,173 GB with expected 1.2x Data Reduction ratio)

These resource requirements only apply to the first cluster deployed in an Azure VMware Solution private cloud. Subsequent clusters only need to account for the vSphere Cluster Service, ESXi resource requirements and vSAN System Usage in solution sizing.

The virtual appliance Typical Raw vSAN Datastore Usage values account for the space occupied by virtual machine files, including configuration and log files, snapshots, virtual disks and swap files.

The VMware ESXi nodes have compute usage values that account for the vSphere VMkernel hypervisor overhead, vSAN overhead, and NSX distributed router, firewall, and bridging overhead. These are estimates for a standard three-host cluster configuration. The storage requirements are listed as not applicable (N/A) since a boot volume separate from the vSAN Datastore is used.

The VMware vSAN System Usage storage overhead accounts for vSAN performance management objects, vSAN file system overhead, vSAN checksum overhead and vSAN deduplication and compression overhead. To view this consumption, select the Monitor, vSAN Capacity object for the vSphere Cluster in the vSphere Client.

The VMware HCX and VMware Site Recovery Manager resource requirements are optional Add-ons to the Azure VMware Solution service. Discount these requirements in the solution sizing if they aren't being used.

The VMware Site Recovery Manager Add-On has the option of configuring multiple VMware vSphere Replication Server Appliances. The previous table assumes one vSphere Replication Server appliance is used.

Sizing an Azure VMware Solution is an estimate; the sizing calculations from the design phase should be validated during the testing phase of a project to ensure the Azure VMware Solution is sized correctly for the application workload.
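To show how the optional add-ons change the management overhead, here is a small Python sketch that sums the provisioned vCPU, provisioned vRAM, and raw vSAN datastore columns from the standard private cloud table. The `Appliance` tuple and `management_overhead` function are hypothetical names used only for this illustration. ESXi node rows are left out because those three columns are N/A for them, and the typical-usage columns are not modeled.

```python
from typing import NamedTuple


class Appliance(NamedTuple):
    name: str
    count: int           # number of identical rows in the table
    vcpus: int           # provisioned vCPUs per instance
    vram_gb: float       # provisioned vRAM per instance, in GB
    raw_vsan_gb: int     # typical raw vSAN datastore usage per instance
    optional: bool = False


# Values copied from the standard private cloud table in this article.
STANDARD_PRIVATE_CLOUD = [
    Appliance("vCenter Server", 1, 8, 28, 1854),
    Appliance("vSphere Cluster Service VM", 3, 1, 0.1, 5),
    Appliance("vSAN System Usage", 1, 0, 0, 5458),
    Appliance("NSX Unified Appliance Node", 3, 12, 48, 613),
    Appliance("NSX Edge VM", 2, 8, 32, 409),
    Appliance("HCX Manager", 1, 4, 12, 140, optional=True),
    Appliance("SRM Appliance", 1, 4, 12, 79, optional=True),
    Appliance("vSphere Replication Manager Appliance", 1, 4, 8, 75, optional=True),
    Appliance("vSphere Replication Server Appliance", 1, 2, 1, 68, optional=True),
]


def management_overhead(rows, include_optional=True):
    """Sum provisioned vCPUs, vRAM, and raw vSAN usage over the rows."""
    used = [r for r in rows if include_optional or not r.optional]
    return {
        "vcpus": sum(r.count * r.vcpus for r in used),
        "vram_gb": round(sum(r.count * r.vram_gb for r in used), 1),
        "raw_vsan_gb": sum(r.count * r.raw_vsan_gb for r in used),
    }


if __name__ == "__main__":
    print("with add-ons:   ", management_overhead(STANDARD_PRIVATE_CLOUD))
    print("without add-ons:", management_overhead(STANDARD_PRIVATE_CLOUD, False))
```

With every add-on included, the sums match the table's totals for these three columns (77 vCPUs, 269.3 GB, 10,346 GB). Dropping the add-ons shows how much can be discounted when HCX and Site Recovery Manager aren't in use.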

Tip

You can always extend the cluster and add additional clusters later if you need to go beyond the initial deployment number.

The following table describes the maximum limits for Azure VMware Solution.

Resource | Limit
vSphere clusters per private cloud | 12
Minimum number of ESXi hosts per cluster | 3 (hard-limit)
Maximum number of ESXi hosts per cluster | 16 (hard-limit)
Maximum number of ESXi hosts per private cloud | 96
Maximum number of vCenter Servers per private cloud | 1 (hard-limit)
Maximum number of HCX site pairings | 25 (any edition)
Maximum number of HCX service meshes | 10 (any edition)
Maximum number of Azure VMware Solution ExpressRoute linked private clouds from a single location to a single Virtual Network Gateway | 4. The virtual network gateway used determines the actual max linked private clouds. For more information, see About ExpressRoute virtual network gateways. If you exceed this threshold, use Azure VMware Solution Interconnect to aggregate private cloud connectivity within the Azure region.
Maximum Azure VMware Solution ExpressRoute port speed | 10 Gbps (use Ultra Performance Gateway SKU with FastPath enabled). The virtual network gateway used determines the actual bandwidth. For more information, see About ExpressRoute virtual network gateways.
Maximum number of Azure Public IPv4 addresses assigned to NSX | 2,000
Maximum number of Azure VMware Solution Interconnects per private cloud | 10
Maximum number of Azure ExpressRoute Global Reach connections per Azure VMware Solution private cloud | 8
vSAN capacity limits | 75% of total usable (keep 25% available for SLA)
VMware Site Recovery Manager - Maximum number of protected Virtual Machines | 3,000
VMware Site Recovery Manager - Maximum number of Virtual Machines per recovery plan | 2,000
VMware Site Recovery Manager - Maximum number of protection groups per recovery plan | 250
VMware Site Recovery Manager - RPO Values | 5 min or higher * (hard-limit)
VMware Site Recovery Manager - Maximum number of virtual machines per protection group | 500
VMware Site Recovery Manager - Maximum number of recovery plans | 250

* For information about Recovery Point Objective (RPO) lower than 15 minutes, see How the 5 Minute Recovery Point Objective Works in the vSphere Replication Administration guide.

For other VMware-specific limits, use the VMware configuration maximum tool.
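A planned layout can be checked against the cluster and host limits in the table with a few comparisons. The following Python sketch is a hypothetical pre-flight check, not an Azure API: `check_design` and `capacity_within_sla_tb` are names made up for this example, and the constants are copied from the limits table above.

```python
# Limits copied from the table in this article.
MAX_CLUSTERS_PER_PRIVATE_CLOUD = 12
MIN_HOSTS_PER_CLUSTER = 3
MAX_HOSTS_PER_CLUSTER = 16
MAX_HOSTS_PER_PRIVATE_CLOUD = 96
VSAN_MAX_UTILIZATION = 0.75  # keep 25% available for SLA


def check_design(hosts_per_cluster: list) -> list:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(hosts_per_cluster) > MAX_CLUSTERS_PER_PRIVATE_CLOUD:
        problems.append(
            f"{len(hosts_per_cluster)} clusters exceeds the limit of "
            f"{MAX_CLUSTERS_PER_PRIVATE_CLOUD}"
        )
    for index, hosts in enumerate(hosts_per_cluster, start=1):
        if not MIN_HOSTS_PER_CLUSTER <= hosts <= MAX_HOSTS_PER_CLUSTER:
            problems.append(
                f"cluster {index} has {hosts} hosts; allowed range is "
                f"{MIN_HOSTS_PER_CLUSTER}-{MAX_HOSTS_PER_CLUSTER}"
            )
    total_hosts = sum(hosts_per_cluster)
    if total_hosts > MAX_HOSTS_PER_PRIVATE_CLOUD:
        problems.append(
            f"{total_hosts} hosts exceeds the private cloud limit of "
            f"{MAX_HOSTS_PER_PRIVATE_CLOUD}"
        )
    return problems


def capacity_within_sla_tb(total_usable_tb: float) -> float:
    """Capacity you can consume while staying at or below 75% utilization."""
    return total_usable_tb * VSAN_MAX_UTILIZATION


if __name__ == "__main__":
    print(check_design([3, 16, 8]))
    print(check_design([2, 17]))
    print(capacity_within_sla_tb(100.0))
```

Note that seven full 16-host clusters already exceed the 96-host private cloud limit, so the per-cluster and per-private-cloud limits need to be checked together.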

VMware software versions

Microsoft is a member of the VMware Metal-as-a-Service (MaaS) program and uses the VMware Cloud Provider Stack (VCPS) for Azure VMware Solution upgrade planning.

The VMware solution software versions used in new deployments of Azure VMware Solution private clouds are:

Software | Version
VMware vCenter Server | 7.0 U3o
VMware ESXi | 7.0 U3o with TianfuCup HotPatch
VMware vSAN | 7.0 U3
VMware vSAN on-disk format | 15
VMware vSAN storage architecture | OSA
VMware NSX | 4.1.1
VMware HCX | 4.8.2
VMware Site Recovery Manager | 8.7.0.3
VMware vSphere Replication | 8.7.0.3

The current running software version is applied to new clusters added to an existing private cloud.

Host maintenance and lifecycle management

One benefit of Azure VMware Solution private clouds is that the platform is maintained for you. Microsoft is responsible for the lifecycle management of VMware software (ESXi, vCenter Server, and vSAN) and NSX appliances. Microsoft is also responsible for bootstrapping the network configuration, like creating the Tier-0 gateway and enabling North-South routing. You're responsible for the NSX SDN configuration: network segments, distributed firewall rules, Tier-1 gateways, and load balancers.

Note

A Tier-0 gateway is created and configured as part of a private cloud deployment. Any modification to that logical router or the NSX edge node VMs could affect connectivity to your private cloud and should be avoided.

Microsoft is responsible for applying any patches, updates, or upgrades to ESXi, vCenter Server, vSAN, and NSX in your private cloud. The impact of patches, updates, and upgrades on ESXi, vCenter Server, and NSX has the following considerations:

  • ESXi - There's no impact to workloads running in your private cloud. Access to vCenter Server and NSX isn't blocked during this time. While the upgrade is in progress, we recommend you don't plan other activities in your private cloud, such as scaling up the private cloud, scheduling or initiating active HCX migrations, or making HCX configuration changes.

  • vCenter Server - There's no impact to workloads running in your private cloud. During this time, vCenter Server is unavailable and you can't manage VMs (stop, start, create, or delete). We recommend you don't plan other activities in your private cloud, such as scaling up the private cloud or creating new networks. If you use the VMware Site Recovery Manager or vSphere Replication user interfaces, we recommend you don't configure vSphere Replication, or configure or execute site recovery plans, during the vCenter Server upgrade.

  • NSX - Workloads are impacted. When a particular host is being upgraded, the VMs on that host might lose connectivity for 2 seconds to 1 minute, with any of the following symptoms:

    • Ping errors

    • Packet loss

    • Error messages (for example, Destination Host Unreachable and Net unreachable)

    During this upgrade window, all access to the NSX management plane is blocked. You can't make configuration changes to the NSX environment for the duration. Your workloads continue to run as normal, subject to the upgrade impact previously detailed.

    During the upgrade, we recommend you don't plan other activities in your private cloud, such as scaling up the private cloud. Other activities can prevent the upgrade from starting or could have adverse impacts on the upgrade and the environment.

You're notified through Azure Service Health, and the notification includes the timeline of the upgrade. It also provides details on the upgraded component, its effect on workloads, private cloud access, and other Azure services. You can reschedule an upgrade as needed.

Software updates include:

  • Patches - Security patches or bug fixes released by VMware

  • Updates - Minor version change of a VMware stack component

  • Upgrades - Major version change of a VMware stack component

Note

Microsoft tests a critical security patch as soon as it becomes available from VMware.

Documented VMware workarounds are implemented in lieu of installing a corresponding patch until the next scheduled updates are deployed.

Host monitoring and remediation

Azure VMware Solution continuously monitors the health of both the VMware components and underlay. When Azure VMware Solution detects a failure, it takes action to repair the failed components. When Azure VMware Solution detects a degradation or failure on an Azure VMware Solution node, it triggers the host remediation process.

Host remediation involves replacing the faulty node with a new healthy node in the cluster. Then, when possible, the faulty host is placed in VMware vSphere maintenance mode. VMware vSphere vMotion moves the VMs off the faulty host to other available servers in the cluster, potentially allowing zero downtime for live migration of workloads. If the faulty host can't be placed in maintenance mode, the host is removed from the cluster. Before the faulty host is removed, the customer workloads are migrated to a newly added host.
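The remediation flow described above can be summarized as a short decision sequence. The following Python sketch is only a reading aid under stated assumptions: `remediation_steps` is a hypothetical function, the step wording paraphrases this article, and the final removal step in the maintenance-mode path is inferred from "replacing the faulty node" rather than stated explicitly. It does not reflect how the Azure VMware Solution control plane is actually implemented.

```python
def remediation_steps(can_enter_maintenance_mode: bool) -> list:
    """Return the ordered remediation steps for a faulty host.

    The branch depends on whether the faulty host can still be placed
    in vSphere maintenance mode.
    """
    steps = ["add a new healthy host to the cluster"]
    if can_enter_maintenance_mode:
        steps.append("place the faulty host in maintenance mode")
        steps.append("vMotion VMs to other available hosts in the cluster")
    else:
        steps.append("migrate customer workloads to the newly added host")
    # Assumption: the faulty host leaves the cluster in both paths.
    steps.append("remove the faulty host from the cluster")
    return steps


if __name__ == "__main__":
    for healthy_enough in (True, False):
        print(f"maintenance mode possible: {healthy_enough}")
        for number, step in enumerate(remediation_steps(healthy_enough), 1):
            print(f"  {number}. {step}")
```

In both paths the replacement host joins the cluster first, so capacity is not reduced while workloads are moved.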

Tip

Customer communication: An email is sent to the customer's email address before the replacement is initiated and again after the replacement is successful.

To receive emails related to host replacement, you need to be added to any of the following Azure RBAC roles in the subscription: 'ServiceAdmin', 'CoAdmin', 'Owner', 'Contributor'.

Azure VMware Solution monitors the following conditions on the host:

  • Processor status
  • Memory status
  • Connection and power state
  • Hardware fan status
  • Network connectivity loss
  • Hardware system board status
  • Disk errors on a vSAN host
  • Hardware voltage
  • Hardware temperature status
  • Hardware power status
  • Storage status
  • Connection failure

Note

Azure VMware Solution tenant admins must not edit or delete the previously defined VMware vCenter Server alarms because they are managed by the Azure VMware Solution control plane on vCenter Server. These alarms are used by Azure VMware Solution monitoring to trigger the Azure VMware Solution host remediation process.

Backup and restore

Azure VMware Solution private cloud vCenter Server and HCX Manager (if enabled) configurations are on a daily backup schedule and NSX configuration has an hourly backup schedule. The backups are retained for a minimum of three days. Open a support request in the Azure portal to request restoration.

Note

Restorations are intended for catastrophic situations only.


Next steps

Now that you've covered Azure VMware Solution private cloud concepts, you might want to learn about: