Share an Azure managed disk

Azure shared disks is a new feature for Azure managed disks that allows you to attach a managed disk to multiple virtual machines (VMs) simultaneously. Attaching a managed disk to multiple VMs allows you to either deploy new or migrate existing clustered applications to Azure.

How it works

VMs in the cluster can read or write to their attached disk based on the reservation chosen by the clustered application using SCSI Persistent Reservations (SCSI PR). SCSI PR is an industry standard leveraged by applications running on a Storage Area Network (SAN) on-premises. Enabling SCSI PR on a managed disk allows you to migrate these applications to Azure as-is.

Shared managed disks offer shared block storage that can be accessed from multiple VMs; these disks are exposed as logical unit numbers (LUNs). LUNs are then presented to an initiator (VM) from a target (disk). These LUNs look like direct-attached storage (DAS) or a local drive to the VM.

Shared managed disks do not natively offer a fully managed file system that can be accessed using SMB/NFS. You need to use a cluster manager, like Windows Server Failover Cluster (WSFC) or Pacemaker, that handles cluster node communication and write locking.

Limitations

Enabling shared disks is only available to a subset of disk types. Currently, only ultra disks and premium SSDs can enable shared disks. Each managed disk that has shared disks enabled is subject to the following limitations, organized by disk type:

Ultra disks

Ultra disks have their own separate list of limitations, unrelated to shared disks. For ultra disk limitations, refer to Using Azure ultra disks.

When sharing ultra disks, note the following:

Shared ultra disks are available by default in all regions that support ultra disks, and do not require you to sign up for access to use them.

Premium SSDs

  • Currently limited to Azure Resource Manager or SDK support.
  • Can only be enabled on data disks, not OS disks.
  • ReadOnly host caching is not available for premium SSDs with maxShares > 1.
  • Disk bursting is not available for premium SSDs with maxShares > 1.
  • When using Availability sets and virtual machine scale sets with Azure shared disks, storage fault domain alignment with virtual machine fault domain is not enforced for the shared data disk.
  • When using proximity placement groups (PPG), all virtual machines sharing a disk must be part of the same PPG.
  • Only basic disks can be used with some versions of Windows Server Failover Cluster; for details, see Failover clustering hardware requirements and storage options.
  • Azure Backup and Azure Site Recovery support is not yet available.

Regional availability

Shared premium SSDs are available in all regions where managed disks are available.

Operating system requirements

Shared disks support several operating systems; see the Windows and Linux sections below for details.

Disk sizes

For now, only ultra disks and premium SSDs can enable shared disks. Different disk sizes may have different maxShares limits, which you cannot exceed when setting the maxShares value. For premium SSDs, only disk sizes P15 and greater support sharing.

For each disk, you can define a maxShares value that represents the maximum number of nodes that can simultaneously share the disk. For example, if you plan to set up a 2-node failover cluster, you would set maxShares=2. The maximum value is an upper bound. Nodes can join or leave the cluster (mount or unmount the disk) as long as the number of nodes does not exceed the specified maxShares value.

Note

The maxShares value can only be set or edited when the disk is detached from all nodes.
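Picking up the 2-node example above, here is a minimal sketch of creating a shared premium SSD data disk with the azure-mgmt-compute Python SDK. The subscription ID, resource group, disk name, region, and size are placeholders, not recommendations; adjust them to your environment.

```python
# Minimal sketch: create a shared premium SSD data disk with maxShares=2
# using the azure-mgmt-compute Python SDK. All names and values below are
# placeholders for illustration only.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

credential = DefaultAzureCredential()
client = ComputeManagementClient(credential, "<subscription-id>")

disk = client.disks.begin_create_or_update(
    "myResourceGroup",
    "mySharedDisk",
    {
        "location": "westcentralus",   # placeholder region
        "sku": {"name": "Premium_LRS"},
        "disk_size_gb": 1024,          # a P30; must be P15 or greater to share
        "max_shares": 2,               # 2-node cluster; set while detached
        "creation_data": {"create_option": "Empty"},
    },
).result()

print(disk.name, disk.max_shares)
```

Remember that maxShares can only be set or changed while the disk is detached from all nodes, as the note above states.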

Premium SSD ranges

The following table illustrates the allowed maximum values for maxShares by premium disk sizes:

| Disk sizes | maxShares limit |
| --- | --- |
| P15, P20 | 2 |
| P30, P40, P50 | 5 |
| P60, P70, P80 | 10 |

The IOPS and bandwidth limits for a disk are not affected by the maxShares value. For example, the max IOPS of a P15 disk is 1100 whether maxShares = 1 or maxShares > 1.
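For illustration only, the table above can be expressed as a small lookup with a hypothetical helper that rejects out-of-range maxShares requests; neither the dictionary nor the function is part of any Azure SDK.

```python
# Illustrative lookup of the maxShares limits from the table above.
PREMIUM_MAX_SHARES = {
    "P15": 2, "P20": 2,
    "P30": 5, "P40": 5, "P50": 5,
    "P60": 10, "P70": 10, "P80": 10,
}

def validate_max_shares(tier: str, requested: int) -> None:
    """Raise if the requested maxShares is invalid for the given tier."""
    limit = PREMIUM_MAX_SHARES.get(tier)
    if limit is None:
        raise ValueError(f"{tier} does not support shared disks (P15 and greater only)")
    if not 1 <= requested <= limit:
        raise ValueError(f"{tier} supports maxShares between 1 and {limit}")

validate_max_shares("P30", 5)   # OK
validate_max_shares("P15", 2)   # OK; requesting 3 here would raise
```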

Ultra disk ranges

The minimum maxShares value is 1, while the maximum maxShares value is 5. There are no size restrictions on ultra disks; an ultra disk of any size can use any value for maxShares, up to and including the maximum value.

Sample workloads

Windows

Azure shared disks are supported on Windows Server 2008 and newer. Most Windows-based clustering builds on WSFC, which handles all core infrastructure for cluster node communication, allowing your applications to take advantage of parallel access patterns. WSFC enables both Cluster Shared Volumes (CSV) and non-CSV-based options depending on your version of Windows Server. For details, refer to Create a failover cluster.

Some popular applications running on WSFC include:

  • SQL Server Failover Cluster Instances (FCI)
  • Scale-out File Server (SoFS)
  • SAP ASCS/SCS
  • File servers for general use

Linux

Azure shared disks are supported on several enterprise Linux distributions; supported distributions and versions are listed in the Azure documentation.

Linux clusters can leverage cluster managers such as Pacemaker. Pacemaker builds on Corosync, enabling cluster communications for applications deployed in highly available environments. Some common clustered filesystems include ocfs2 and gfs2. You can use SCSI Persistent Reservation (SCSI PR) and/or STONITH Block Device (SBD) based clustering models for arbitrating access to the disk. When using SCSI PR, you can manipulate reservations and registrations using utilities such as fence_scsi and sg_persist.
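As an illustration of SCSI PR at the utility level, the following sketch drives sg_persist (from sg3_utils) from Python to register a key and take a Write Exclusive reservation. The device path and reservation key are placeholders, and the exact flags should be verified against your sg3_utils version.

```python
# Sketch: register a key and take a Write Exclusive reservation with
# sg_persist, driven from Python. Placeholders throughout.
import subprocess

DEV = "/dev/sdc"      # the shared Azure data disk as seen by this node
KEY = "0xabc1"        # this node's reservation key (placeholder)

def sg(*args: str) -> None:
    subprocess.run(["sg_persist", *args, DEV], check=True)

# Register this node's key with the disk (PR OUT: REGISTER).
sg("--out", "--register", f"--param-sark={KEY}")

# Take a Write Exclusive reservation (PR OUT: RESERVE, type 1):
# only the reservation holder may write; any initiator may still read.
sg("--out", "--reserve", f"--param-rk={KEY}", "--prout-type=1")

# Inspect current registrations and the active reservation (PR IN).
sg("--in", "--read-keys")
sg("--in", "--read-reservation")
```

In production clusters, a fence agent such as fence_scsi typically performs these operations for you rather than hand-rolled scripts.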

Persistent reservation flow

The following diagram illustrates a sample 2-node clustered database application that leverages SCSI PR to enable failover from one node to the other.

Diagram: a two-node cluster; an application running on the cluster handles access to the disk.

The flow is as follows:

  1. The clustered application running on both Azure VM1 and VM2 registers its intent to read or write to the disk.
  2. The application instance on VM1 then takes an exclusive reservation to write to the disk.
  3. This reservation is enforced on your Azure disk and the database can now exclusively write to the disk. Any writes from the application instance on VM2 will not succeed.
  4. If the application instance on VM1 goes down, the instance on VM2 can now initiate a database failover and take over the disk (see the sketch after this list).
  5. This reservation is now enforced on the Azure disk and the disk will no longer accept writes from VM1. It will only accept writes from VM2.
  6. The clustered application can complete the database failover and serve requests from VM2.
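To connect these steps to concrete SCSI PR operations, here is a hedged sketch of step 4, in which VM2 (already registered in step 1) preempts VM1's reservation. The keys and device path are placeholders, and the flags should be checked against your sg3_utils version.

```python
# Sketch of the failover in step 4: VM2 preempts VM1's Write Exclusive
# reservation (PR OUT: PREEMPT). All values are placeholders.
import subprocess

DEV = "/dev/sdc"
VM2_KEY = "0xabc2"    # VM2's own registered key
VM1_KEY = "0xabc1"    # the key held by the failed VM1

subprocess.run(
    [
        "sg_persist", "--out", "--preempt",
        f"--param-rk={VM2_KEY}",      # the preempting node's key
        f"--param-sark={VM1_KEY}",    # the key to preempt (VM1's)
        "--prout-type=1",             # keep Write Exclusive semantics
        DEV,
    ],
    check=True,
)
# After this, writes from VM1 are rejected (step 5) and VM2 can complete
# the database failover and serve requests (step 6).
```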

The following diagram illustrates another common clustered workload consisting of multiple nodes reading data from the disk for running parallel processes, such as training of machine learning models.

Diagram: a four-node VM cluster; each node registers its intent to write, and the application takes an exclusive reservation to properly handle write results.

The flow is as follows:

  1. The clustered application running on all VMs registers the intent to read or write to the disk.
  2. The application instance on VM1 takes an exclusive reservation to write to the disk while leaving the disk open for reads from the other VMs (see the sketch after this list).
  3. This reservation is enforced on your Azure disk.
  4. All nodes in the cluster can now read from the disk. Only one node writes back results to the disk, on behalf of all nodes in the cluster.
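A sketch of this single-writer, multi-reader pattern, again with placeholder keys and device path: every node registers a key, but only VM1 takes the Write Exclusive (type 1) reservation, which leaves reads open to all initiators.

```python
# Sketch: all nodes register, one node reserves. Placeholders throughout.
import subprocess

DEV = "/dev/sdc"

def register(key: str) -> None:
    subprocess.run(["sg_persist", "--out", "--register",
                    f"--param-sark={key}", DEV], check=True)

def reserve_write_exclusive(key: str) -> None:
    subprocess.run(["sg_persist", "--out", "--reserve",
                    f"--param-rk={key}", "--prout-type=1", DEV], check=True)

# Steps 1-2: each of the four nodes registers its own key (in practice,
# each node runs its register locally), then only VM1 reserves.
for node_key in ("0x1", "0x2", "0x3", "0x4"):
    register(node_key)
reserve_write_exclusive("0x1")  # VM1 becomes the sole writer
```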

Ultra disk reservation flow

Ultra disks offer an additional throttle, for a total of two throttles. Because of this, the ultra disk reservation flow can work as described in the earlier section, or it can throttle and distribute performance more granularly.

Diagram: a table depicting `ReadOnly` or `Read/Write` access for the reservation holder, registered initiators, and all others.

Performance throttles

Premium SSD performance throttles

With premium SSDs, the disk IOPS and throughput are fixed; for example, the IOPS limit of a P30 is 5000. This value is the same whether the disk is shared across two VMs or five VMs. The disk limits can be reached from a single VM or divided across two or more VMs.

Ultra disk performance throttles

Ultra disks have the unique capability of letting you set your own performance by exposing modifiable performance attributes. By default, there are two modifiable attributes; shared ultra disks have two additional attributes.

| Attribute | Description |
| --- | --- |
| DiskIOPSReadWrite | The total number of IOPS allowed across all VMs mounting the shared disk with write access. |
| DiskMBpsReadWrite | The total throughput (MB/s) allowed across all VMs mounting the shared disk with write access. |
| DiskIOPSReadOnly* | The total number of IOPS allowed across all VMs mounting the shared disk as ReadOnly. |
| DiskMBpsReadOnly* | The total throughput (MB/s) allowed across all VMs mounting the shared disk as ReadOnly. |

* Applies to shared ultra disks only

Because these performance attributes are user modifiable, the following rules constrain the values you can set (a validation sketch in code follows the list):

  • DiskIOPSReadWrite/DiskIOPSReadOnly:
    • IOPS limits of 300 IOPS/GiB, up to a maximum of 160K IOPS per disk
    • Minimum of 100 IOPS
    • DiskIOPSReadWrite + DiskIOPSReadOnly is at least 2 IOPS/GiB
  • DiskMBpsReadWrite/DiskMBpsReadOnly:
    • The throughput limit of a single disk is 256 KiB/s for each provisioned IOPS, up to a maximum of 2000 MBps per disk
    • The minimum guaranteed throughput per disk is 4 KiB/s for each provisioned IOPS, with an overall baseline minimum of 1 MBps
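The rules above can be captured in a short validation sketch. This function is illustrative, not part of any Azure SDK, and it assumes each throughput attribute is bounded by its corresponding IOPS attribute; it also treats MBps and MiB/s interchangeably, as the list above does.

```python
# Sketch of the provisioning rules above for a shared ultra disk.
def validate_ultra(size_gib: int, iops_rw: int, iops_ro: int,
                   mbps_rw: float, mbps_ro: float) -> None:
    # IOPS rules: 100 minimum per attribute, 300 IOPS/GiB cap,
    # a 160K per-disk ceiling, and a combined floor of 2 IOPS/GiB.
    iops_cap = min(300 * size_gib, 160_000)
    for iops in (iops_rw, iops_ro):
        if not 100 <= iops <= iops_cap:
            raise ValueError(f"IOPS must be between 100 and {iops_cap}")
    if iops_rw + iops_ro < 2 * size_gib:
        raise ValueError("combined IOPS must be at least 2 IOPS/GiB")

    # Throughput rules: 256 KiB/s per provisioned IOPS up to 2000 MBps,
    # with a floor of 4 KiB/s per provisioned IOPS and at least 1 MBps.
    for iops, mbps in ((iops_rw, mbps_rw), (iops_ro, mbps_ro)):
        ceiling = min(iops * 256 / 1024, 2_000)   # KiB/s -> MiB/s
        floor = max(iops * 4 / 1024, 1)
        if not floor <= mbps <= ceiling:
            raise ValueError(f"MBps must be between {floor:.2f} and {ceiling:.0f}")

# The shared disk from the pricing example later in this article passes:
validate_ultra(1024, iops_rw=10_000, iops_ro=100, mbps_rw=600, mbps_ro=1)
```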

Examples

The following examples depict a few scenarios that show how throttling works specifically with shared ultra disks.

Two-node cluster using Cluster Shared Volumes

The following is an example of a 2-node WSFC using Cluster Shared Volumes. With this configuration, both VMs have simultaneous write access to the disk, which results in the ReadWrite throttle being split across the two VMs and the ReadOnly throttle not being used.

Diagram: a two-node CSV cluster splitting the ReadWrite throttle of a shared ultra disk.

Two-node cluster without Cluster Shared Volumes

The following is an example of a 2-node WSFC that isn't using Cluster Shared Volumes. With this configuration, only one VM has write access to the disk. This results in the ReadWrite throttle being used exclusively by the primary VM and the ReadOnly throttle being used only by the secondary.

Diagram: a two-node cluster without CSV; the primary VM uses the ReadWrite throttle and the secondary uses the ReadOnly throttle.

Four-node Linux cluster

The following is an example of a 4-node Linux cluster with a single writer and three scale-out readers. With this configuration, only one VM has write access to the disk. This results in the ReadWrite throttle being used exclusively by the primary VM and the ReadOnly throttle being split across the secondary VMs.

Diagram: a four-node cluster with one writer using the ReadWrite throttle and three readers sharing the ReadOnly throttle.
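As a worked illustration of that split, with assumed provisioned values that are placeholders rather than Azure defaults:

```python
# Worked illustration of the four-node scenario. The single writer can
# consume the full ReadWrite throttle, while the three readers share the
# ReadOnly throttle. Provisioned values below are assumptions.
disk_iops_read_write = 10_000   # assumed provisioning for the writer path
disk_iops_read_only = 9_000     # assumed provisioning for the reader path

writer_iops = disk_iops_read_write           # VM1: up to 10,000 IOPS
per_reader_iops = disk_iops_read_only // 3   # VM2-VM4: up to 3,000 IOPS each,
                                             # if their load is evenly spread
print(writer_iops, per_reader_iops)
```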

Ultra pricing

Ultra shared disks are priced based on provisioned capacity, total provisioned IOPS (diskIOPSReadWrite + diskIOPSReadOnly), and total provisioned throughput in MBps (diskMBpsReadWrite + diskMBpsReadOnly). There is no extra charge for each additional VM mount. For example, an ultra shared disk with the following configuration (diskSizeGB: 1024, DiskIOPSReadWrite: 10000, DiskMBpsReadWrite: 600, DiskIOPSReadOnly: 100, DiskMBpsReadOnly: 1) is billed for 1024 GiB, 10100 IOPS, and 601 MBps regardless of whether it is mounted to two VMs or five VMs.
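Spelled out, the billed quantities in this example are just sums of the provisioned attributes:

```python
# The billing arithmetic from the example above: capacity plus the summed
# IOPS and MBps attributes, independent of how many VMs mount the disk.
disk_size_gb = 1024
billed_iops = 10_000 + 100   # DiskIOPSReadWrite + DiskIOPSReadOnly = 10,100
billed_mbps = 600 + 1        # DiskMBpsReadWrite + DiskMBpsReadOnly = 601
print(disk_size_gb, billed_iops, billed_mbps)
```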

Next steps

If you're interested in enabling and using shared disks for your managed disks, proceed to our article, Enable shared disk.