Virtual machine and disk performance

This article explains how disk performance works when you combine Azure Virtual Machines and Azure disks. It also describes how you can diagnose disk IO bottlenecks and the changes you can make to optimize performance.

How does disk performance work?

Azure virtual machines have input/output operations per second (IOPS) and throughput performance limits based on the virtual machine type and size. OS disks and data disks can be attached to virtual machines. The disks have their own IOPS and throughput limits.

Your application's performance gets capped when it requests more IOPS or throughput than is allotted to the virtual machine or its attached disks. When capped, the application experiences suboptimal performance, which can lead to negative consequences such as increased latency. Let's run through a couple of examples to clarify this concept. To make these examples easy to follow, we'll only look at IOPS, but the same logic applies to throughput.

Disk IO capping

Setup:

  • Standard_D8s_v3
    • Uncached IOPS: 12,800
  • E30 OS disk
    • IOPS: 500
  • E30 data disks × 2
    • IOPS: 500

Diagram showing disk level capping.

The application running on the virtual machine makes a request that requires 10,000 IOPS. The VM allows all of them because the Standard_D8s_v3 virtual machine can execute up to 12,800 IOPS.

The 10,000 IOPS request is broken down into three requests, one per disk:

  • 1,000 IOPS are requested to the operating system disk.
  • 4,500 IOPS are requested to each data disk.

All attached disks are E30 disks and can only handle 500 IOPS each, so each one responds with 500 IOPS. The application's performance is capped by the attached disks, and it can only process 1,500 IOPS. The application could work at peak performance at 10,000 IOPS if better-performing disks were used, such as Premium SSD P30 disks.
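
The disk-level capping above can be sketched in a few lines of Python. This is only an illustrative model, not an Azure API: the function name `disk_capped_iops` and the disk labels are hypothetical, and it assumes that each disk simply serves the smaller of what was requested and what it is provisioned for.

```python
def disk_capped_iops(requested, limits):
    """Return the IOPS each disk serves: min(requested, provisioned limit)."""
    return {disk: min(iops, limits[disk]) for disk, iops in requested.items()}


# Values from the example: a 10,000 IOPS request spread over three E30 disks.
requested = {"os": 1_000, "data1": 4_500, "data2": 4_500}
limits = {"os": 500, "data1": 500, "data2": 500}  # E30: 500 IOPS each

served = disk_capped_iops(requested, limits)
total_served = sum(served.values())
print(served, total_served)
```

With the example's numbers, every disk is capped at 500 IOPS, so the total served is 1,500 IOPS even though the VM would have allowed all 10,000.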

Virtual machine IO capping

Setup:

  • Standard_D8s_v3
    • Uncached IOPS: 12,800
  • P30 OS disk
    • IOPS: 5,000
  • P30 data disks × 2
    • IOPS: 5,000

Diagram showing virtual machine level capping.

The application running on the virtual machine makes a request that requires 15,000 IOPS. Unfortunately, the Standard_D8s_v3 virtual machine is only provisioned to handle 12,800 IOPS. The application is capped by the virtual machine limits and must work within the allotted 12,800 IOPS.

Those 12,800 IOPS are broken down into three requests, one per disk:

  • 4,266 IOPS are requested to the operating system disk.
  • 4,267 IOPS are requested to each data disk.

All attached disks are P30 disks that can handle 5,000 IOPS each, so each one serves its requested amount.
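
As a rough sketch of the VM-level capping above, the following Python models the VM limit as a simple ceiling applied before any disk sees the request. The function name `vm_capped_iops` is hypothetical, and the even spread across the three disks is an assumption made for illustration.

```python
def vm_capped_iops(requested_iops, vm_uncached_limit):
    """Return the IOPS the VM lets through: min(requested, VM limit)."""
    return min(requested_iops, vm_uncached_limit)


# Values from the example: 15,000 IOPS requested on a Standard_D8s_v3.
allowed = vm_capped_iops(15_000, 12_800)

# Assumption: the allowed IOPS are spread evenly over the three P30 disks.
per_disk = allowed / 3
disk_is_bottleneck = per_disk > 5_000  # P30: 5,000 IOPS each

print(allowed, round(per_disk), disk_is_bottleneck)
```

Here the VM passes only 12,800 IOPS, and each disk's share stays below its 5,000 IOPS limit, so the VM is the only bottleneck.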

Virtual machine uncached vs cached limits

Virtual machines that are enabled for both premium storage and premium storage caching have two different storage bandwidth limits. Let's look at the Standard_D8s_v3 virtual machine as an example. Here is the documentation on the Dsv3-series and the Standard_D8s_v3:

Chart showing Dsv3 specifications.

  • The max uncached disk throughput is the default storage maximum limit that the virtual machine can handle.
  • The max cached storage throughput limit is a separate limit when you enable host caching.

Host caching works by bringing storage closer to the VM, where it can be read from or written to quickly. The amount of storage that is available to the VM for host caching is listed in the documentation. For example, you can see the Standard_D8s_v3 comes with 200 GiB of cache storage.

You can enable host caching when you create your virtual machine and attach disks. You can also turn host caching on and off for the disks on an existing VM.

Screenshot showing host caching.

You can adjust the host caching to match your workload requirements for each disk. You can set your host caching to be:

  • Read-only: For workloads that only do read operations
  • Read/write: For workloads that do a balance of read and write operations

If your workload doesn't follow either of these patterns, we don't recommend that you use host caching.

Let's run through a couple of examples of different host cache settings to see how they affect the data flow and performance. In this first example, we'll look at what happens with IO requests when the host caching setting is set to Read-only.

Setup:

  • Standard_D8s_v3
    • Cached IOPS: 16,000
    • Uncached IOPS: 12,800
  • P30 data disk
    • IOPS: 5,000
    • Host caching: Read-only

When a read is performed and the desired data is available on the cache, the cache returns the requested data. There is no need to read from the disk. This read is counted toward the VM's cached limits.

Diagram showing a read host caching read hit.

When a read is performed and the desired data is not available on the cache, the read request is relayed to the disk. Then the disk surfaces it to both the cache and the VM. This read is counted toward both the VM's uncached limit and the VM's cached limit.

Diagram showing a read host caching read miss.

When a write is performed, the write has to be written to both the cache and the disk before it is considered complete. This write is counted toward the VM's uncached limit and the VM's cached limit.

Diagram showing a read host caching write.

Next let's look at what happens with IO requests when the host cache setting is set to Read/write.

Setup:

  • Standard_D8s_v3
    • Cached IOPS: 16,000
    • Uncached IOPS: 12,800
  • P30 data disk
    • IOPS: 5,000
    • Host caching: Read/write

A read is handled the same way as with read-only caching. Writes are the only thing that's different with read/write caching. When host caching is set to Read/write, a write only needs to be written to the host cache to be considered complete. The write is then lazily written to the disk as a background process. This means that a write counts toward cached IO when it is written to the cache. When it is lazily written to the disk, it counts toward uncached IO.

Diagram showing read/write host caching write.
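
The accounting rules described for the two host caching settings can be summarized as a small lookup function. This is a sketch of the rules as stated in this article, not of how Azure implements them; the function name `limits_charged`, the mode strings, and the returned dictionary shape are all hypothetical.

```python
def limits_charged(op, caching, cache_hit=False):
    """Return which VM limits an IO counts toward, and when.

    op:      "read" or "write"
    caching: "ReadOnly" or "ReadWrite" (hypothetical labels for the settings)
    """
    if caching not in ("ReadOnly", "ReadWrite"):
        raise ValueError(f"unsupported caching mode: {caching!r}")
    if op == "read":
        # Reads behave the same in both modes: a hit is served from the
        # cache only, a miss also goes to the disk.
        now = {"cached"} if cache_hit else {"cached", "uncached"}
        return {"at_completion": now, "deferred": set()}
    if op == "write":
        if caching == "ReadOnly":
            # Written to both cache and disk before it completes.
            return {"at_completion": {"cached", "uncached"}, "deferred": set()}
        # Read/write: completes in the cache, lazily written to disk later.
        return {"at_completion": {"cached"}, "deferred": {"uncached"}}
    raise ValueError(f"unsupported operation: {op!r}")


print(limits_charged("write", "ReadOnly"))
print(limits_charged("write", "ReadWrite"))
```

The only difference between the two modes is the timing of the uncached charge for writes: immediate with Read-only, deferred with Read/write.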

Let's continue with our Standard_D8s_v3 virtual machine, except this time we'll enable host caching on the disks. With caching enabled, the VM's IOPS limit is now 16,000 IOPS. Attached to the VM are three underlying P30 disks that can each handle 5,000 IOPS.

Setup:

  • Standard_D8s_v3
    • Cached IOPS: 16,000
    • Uncached IOPS: 12,800
  • P30 OS disk
    • IOPS: 5,000
    • Host caching: Read/write
  • P30 data disks × 2
    • IOPS: 5,000
    • Host caching: Read/write

Diagram showing a host caching example.

The application uses a Standard_D8s_v3 virtual machine with caching enabled. It makes a request for 15,000 IOPS. The requests are broken down as 5,000 IOPS to each underlying disk attached. No performance capping occurs.

Combined uncached and cached limits

A virtual machine's cached limits are separate from its uncached limits. This means you can enable host caching on some disks attached to a VM while leaving it disabled on others. This configuration allows your virtual machine to get a total storage IO of the cached limit plus the uncached limit.

Let's run through an example to help you understand how these limits work together. We'll continue with the Standard_D8s_v3 virtual machine and premium disks attached configuration.

Setup:

  • Standard_D8s_v3
    • Cached IOPS: 16,000
    • Uncached IOPS: 12,800
  • P30 OS disk
    • IOPS: 5,000
    • Host caching: Read/write
  • P30 data disks × 2
    • IOPS: 5,000
    • Host caching: Read/write
  • P30 data disks × 2
    • IOPS: 5,000
    • Host caching: Disabled

Diagram showing a host caching example with remote storage.

In this case, the application running on a Standard_D8s_v3 virtual machine makes a request for 25,000 IOPS. The request is broken down as 5,000 IOPS to each of the attached disks. Three disks use host caching and two disks don't use host caching.

  • Since the three disks that use host caching are within the cached limits of 16,000, those requests are successfully completed. No storage performance capping occurs.
  • Since the two disks that don't use host caching are within the uncached limits of 12,800, those requests are also successfully completed. No capping occurs.
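
The two checks above can be expressed as a short Python sketch. It follows the simplification used in this example: IO to disks with host caching counts only against the cached limit, and IO to disks without it counts only against the uncached limit. Cache misses and lazy writes, which also touch the uncached limit, are deliberately ignored. The function name `check_vm_limits` and the dictionary keys are hypothetical.

```python
def check_vm_limits(disks, cached_limit, uncached_limit):
    """Sum demand per pool and report whether either VM limit is exceeded."""
    cached = sum(d["iops"] for d in disks if d["host_caching"])
    uncached = sum(d["iops"] for d in disks if not d["host_caching"])
    return {
        "cached_demand": cached,
        "uncached_demand": uncached,
        "cached_capped": cached > cached_limit,
        "uncached_capped": uncached > uncached_limit,
    }


# Values from the example: five P30 disks at 5,000 IOPS each.
disks = (
    [{"iops": 5_000, "host_caching": True}] * 3     # OS disk + two data disks
    + [{"iops": 5_000, "host_caching": False}] * 2  # two uncached data disks
)
result = check_vm_limits(disks, cached_limit=16_000, uncached_limit=12_800)
print(result)
```

With these inputs, the cached demand is 15,000 IOPS and the uncached demand is 10,000 IOPS, so neither limit is exceeded.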

Disk performance metrics

Azure provides metrics that give insight into how your virtual machines and disks are performing. You can view these metrics in the Azure portal or retrieve them through an API call. Metrics are calculated over one-minute intervals. The following metrics give insight into VM and disk IO and throughput performance:

  • OS Disk Queue Depth: The number of current outstanding IO requests that are waiting to be read from or written to the OS disk.
  • OS Disk Read Bytes/Sec: The number of bytes that are read in a second from the OS disk.
  • OS Disk Read Operations/Sec: The number of read operations that are performed in a second on the OS disk.
  • OS Disk Write Bytes/Sec: The number of bytes that are written in a second to the OS disk.
  • OS Disk Write Operations/Sec: The number of write operations that are performed in a second on the OS disk.
  • Data Disk Queue Depth: The number of current outstanding IO requests that are waiting to be read from or written to the data disk(s).
  • Data Disk Read Bytes/Sec: The number of bytes that are read in a second from the data disk(s).
  • Data Disk Read Operations/Sec: The number of read operations that are performed in a second on the data disk(s).
  • Data Disk Write Bytes/Sec: The number of bytes that are written in a second to the data disk(s).
  • Data Disk Write Operations/Sec: The number of write operations that are performed in a second on the data disk(s).
  • Disk Read Bytes/Sec: The number of total bytes that are read in a second from all disks attached to a VM.
  • Disk Read Operations/Sec: The number of read operations that are performed in a second on all disks attached to a VM.
  • Disk Write Bytes/Sec: The number of bytes that are written in a second to all disks attached to a VM.
  • Disk Write Operations/Sec: The number of write operations that are performed in a second on all disks attached to a VM.

Storage IO utilization metrics

Metrics that help diagnose disk IO capping:

  • Data Disk IOPS Consumed Percentage: The percentage calculated as the data disk IOPS completed over the provisioned data disk IOPS. If this amount is at 100%, your application is IO capped by your data disk's IOPS limit.
  • Data Disk Bandwidth Consumed Percentage: The percentage calculated as the data disk throughput completed over the provisioned data disk throughput. If this amount is at 100%, your application is IO capped by your data disk's bandwidth limit.
  • OS Disk IOPS Consumed Percentage: The percentage calculated as the OS disk IOPS completed over the provisioned OS disk IOPS. If this amount is at 100%, your application is IO capped by your OS disk's IOPS limit.
  • OS Disk Bandwidth Consumed Percentage: The percentage calculated as the OS disk throughput completed over the provisioned OS disk throughput. If this amount is at 100%, your application is IO capped by your OS disk's bandwidth limit.

Metrics that help diagnose VM IO capping:

  • VM Cached IOPS Consumed Percentage: The percentage calculated as the total IOPS completed over the max cached virtual machine IOPS limit. If this amount is at 100%, your application is IO capped by your VM's cached IOPS limit.
  • VM Cached Bandwidth Consumed Percentage: The percentage calculated as the total disk throughput completed over the max cached virtual machine throughput. If this amount is at 100%, your application is IO capped by your VM's cached bandwidth limit.
  • VM Uncached IOPS Consumed Percentage: The percentage calculated as the total IOPS on a virtual machine completed over the max uncached virtual machine IOPS limit. If this amount is at 100%, your application is IO capped by your VM's uncached IOPS limit.
  • VM Uncached Bandwidth Consumed Percentage: The percentage calculated as the total disk throughput on a virtual machine completed over the max provisioned virtual machine throughput. If this amount is at 100%, your application is IO capped by your VM's uncached bandwidth limit.
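
Each of these metrics is a ratio of completed IO to a provisioned limit. A minimal sketch of that calculation follows. The function name `consumed_percentage` and the sample inputs are hypothetical and chosen only for illustration; the exact aggregation Azure applies within each one-minute interval is not described here.

```python
def consumed_percentage(completed, provisioned):
    """Return completed IO as a percentage of the provisioned limit."""
    if provisioned <= 0:
        raise ValueError("provisioned limit must be positive")
    return 100.0 * completed / provisioned


# Illustrative inputs: a disk provisioned for 5,000 IOPS completing 4,500,
# and a VM with a 12,800 IOPS uncached limit completing 12,800.
disk_pct = consumed_percentage(4_500, 5_000)
vm_pct = consumed_percentage(12_800, 12_800)

# A value at 100% indicates that the corresponding limit is capping IO.
print(disk_pct, vm_pct, vm_pct >= 100.0)
```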

Storage IO utilization metrics example

Let's run through an example of how to use these new Storage IO utilization metrics to help us debug where a bottleneck is in our system. The system setup is the same as the previous example, except this time the attached OS disk is not cached.

Setup:

  • Standard_D8s_v3
    • Cached IOPS: 16,000
    • Uncached IOPS: 12,800
  • P30 OS disk
    • IOPS: 5,000
    • Host caching: Disabled
  • P30 data disks × 2
    • IOPS: 5,000
    • Host caching: Read/write
  • P30 data disks × 2
    • IOPS: 5,000
    • Host caching: Disabled

Let's run a benchmarking test on this virtual machine and disk combination that creates IO activity. To learn how to benchmark storage IO on Azure, see Benchmark your application on Azure Disk Storage. From the benchmarking tool, you can see that the VM and disk combination can achieve 22,800 IOPS:

Screenshot of fio output showing r=22.8k highlighted.

The Standard_D8s_v3 can achieve a total of 28,800 IOPS (16,000 cached plus 12,800 uncached). Using the metrics, let's investigate what's going on and identify our storage IO bottleneck. On the left pane, select Metrics:

Screenshot showing Metrics highlighted on the left pane.

Let's first take a look at our VM Cached IOPS Consumed Percentage metric:

Screenshot showing VM Cached IOPS Consumed Percentage.

This metric tells us that 61% of the 16,000 IOPS allotted to the cached IOPS on the VM is being used. This percentage means that the storage IO bottleneck isn't with the disks that are cached because it isn't at 100%. Now let's look at our VM Uncached IOPS Consumed Percentage metric:

Screenshot showing VM Uncached IOPS Consumed Percentage.

This metric is at 100%. It tells us that all of the 12,800 IOPS allotted to the uncached IOPS on the VM are being used. One way we can remediate this issue is to change the size of our VM to a larger size that can handle the additional IO. But before we do that, let's look at the attached disks to find out how many IOPS they are seeing. Check the OS disk by looking at the OS Disk IOPS Consumed Percentage:

Screenshot showing OS Disk IOPS Consumed Percentage.

This metric tells us that around 90% of the 5,000 IOPS provisioned for this P30 OS disk is being used. This percentage means there's no bottleneck at the OS Disk. Now let's check the data disks that are attached to the VM by looking at the Data Disk IOPS Consumed Percentage:

Screenshot showing Data Disk IOPS Consumed Percentage.

This metric tells us that the average IOPS consumed percentage across all the attached data disks is around 42%. This percentage is calculated based on the IOPS that are used by the disks and that aren't being served from the host cache. Let's drill deeper into this metric by applying splitting and splitting by the LUN value:

Screenshot showing Data Disk IOPS Consumed Percentage with splitting.

This metric tells us that the data disks attached on LUNs 2 and 3 are using around 85% of their provisioned IOPS. Here is a diagram of what the IO looks like from the VM and disks architecture:

Diagram of Storage IO metrics example.
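
The walkthrough above amounts to scanning the consumed-percentage metrics for any value at 100%. The following sketch applies that rule to the readings from this example. The function name `find_bottlenecks`, the threshold parameter, and the dictionary layout are hypothetical; in practice the values would come from the Azure portal or a metrics API call.

```python
def find_bottlenecks(metrics, threshold=100.0):
    """Return the names of metrics at or above the capping threshold."""
    return [name for name, pct in metrics.items() if pct >= threshold]


# Approximate readings from the example walkthrough.
metrics = {
    "VM Cached IOPS Consumed Percentage": 61,
    "VM Uncached IOPS Consumed Percentage": 100,
    "OS Disk IOPS Consumed Percentage": 90,
    "Data Disk IOPS Consumed Percentage (LUN 2)": 85,
    "Data Disk IOPS Consumed Percentage (LUN 3)": 85,
}

bottlenecks = find_bottlenecks(metrics)
print(bottlenecks)
```

Only the VM's uncached IOPS limit is at 100%, which matches the conclusion that the VM, not any individual disk, is capping the workload.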