Windows 10 VMs generating storage errors on 2019 Hyper-V cluster

Question

We have a cluster of four Windows Server 2019 Hyper-V hosts. The hosts are attached to FC storage and use CSVs formatted with NTFS. Networking is 10 GbE using SET teams. This cluster hosts almost 200 VMs, of which about 45 are Windows 10 generation 2 VMs. The rest are Windows Server 2012 R2, 2016, or 2019.

We recently started monitoring the Microsoft\Windows\Hyper-V-StorageVSP\Admin event log on each of the host servers and noticed that we were getting a lot of error events like the one below:

Log Name: Microsoft-Windows-Hyper-V-StorageVSP/Admin
Source: Hyper-V-StorageVSP
Event ID: 8
Level: Error
User: SYSTEM
Message: Failed to map guest I/O buffer for write access with status 0xC0000044. Device name = C:\ClusterStorage\CSV1\WIN10VM1\Virtual Hard Disks\WIN10VM1.vhdx

If we look at the System event log inside the VM WIN10VM1, we see many of the following warnings, with timestamps that match the events above on the host server:

Log Name: System
Source: disk
Event ID: 153
Level: Warning
User: N/A
Message: The IO operation at logical block address 0x1a751f9 for Disk 0 (PDO name: \Device\0000002a) was retried.

This happens multiple times a day on all of the Windows 10 VMs. Across roughly 45 VMs, each guest's System log contains many event 153 disk warnings, and each host logs many of the StorageVSP errors shown above. This appears to have been going on for as long as the logs go back, and it only seems to affect the Windows 10 VMs. None of the server VMs seem to generate these warnings or cause any of the host-level errors. As far as we can tell, it is not causing any functional problems, but it is disconcerting to see disk-level errors where we would not otherwise expect them.

If anyone has any ideas, it would be helpful to know what's going on. Thanks.

Answer

Do you see .rct files in the folder on the Hyper-V host where the VM's .vhdx files are stored?
If so, that suggests RCT (resilient change tracking) is active as part of DPM or Veeam host-level backups, which can also have a negative impact on the VM's disk write performance.
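
To check a whole CSV volume rather than one folder at a time, something like the following could be used. This is only a minimal sketch: the function name find_tracking_files is made up for illustration, and it also looks for .mrt files, which Hyper-V creates alongside .rct files when change tracking is enabled.

```python
from pathlib import Path

# Extensions of the tracking files Hyper-V keeps next to a VHDX when
# resilient change tracking is enabled (.mrt accompanies .rct).
TRACKING_SUFFIXES = {".rct", ".mrt"}


def find_tracking_files(root):
    """Return every tracking file found under root, sorted by path.

    find_tracking_files is a hypothetical helper, not part of any
    Hyper-V tooling.
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in TRACKING_SUFFIXES
    )
```

Calling find_tracking_files(r"C:\ClusterStorage\CSV1") on a host and printing the result would list the VMs that currently have change tracking files; an empty list would suggest RCT is not in play on that volume.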

Answer

Hi,

Thanks for your feedback. For the error reported in event 8, I found the following information:

https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
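
For reference, the status 0xC0000044 from the host event can be split into the NTSTATUS fields described in MS-ERREF. The snippet below is a rough sketch assuming that bit layout (2-bit severity, customer bit, 12-bit facility, 16-bit code); decode_ntstatus is an illustrative name, and the one-entry lookup reflects the name that table lists for this value as far as I know, so please verify it against the page above.

```python
def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    return {
        "severity": (value >> 30) & 0x3,   # 0=success, 1=info, 2=warning, 3=error
        "customer": (value >> 29) & 0x1,   # 0 means Microsoft-defined
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
    }


# Deliberately tiny lookup: only the value seen in this thread.
KNOWN_STATUS = {0xC0000044: "STATUS_QUOTA_EXCEEDED"}

if __name__ == "__main__":
    status = 0xC0000044
    print(decode_ntstatus(status), KNOWN_STATUS.get(status, "unknown"))
```

So the value is an error-severity, Microsoft-defined status with code 0x44, which points at a quota or resource limit being hit when the host tried to map the guest buffer rather than at a media fault.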

However, there's limited information about how to deal with it. Since the issue may be complex, I recommend opening a case with Microsoft for in-depth troubleshooting, using the link below:

https://support.microsoft.com/en-us/gp/customer-service-phone-numbers

Thanks for your time!
Best Regards,
Anne


Answer

Hi,

For event 153, please check the event's details. The details section of the log record shows which error caused the retry and whether the request was a read or a write. The following article explains how to interpret the details output:

https://learn.microsoft.com/en-us/archive/blogs/ntdebugging/interpreting-event-153-errors
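
As a rough illustration of what that article describes, the binary data of event 153 can be decoded with a few lines of script. This sketch assumes the offsets given there as I recall them (byte 29 = SCSI status, byte 30 = SRB status, byte 31 = SCSI opcode), so please check them against the article. The name decode_event_153 is made up, and the lookup tables are intentionally incomplete.

```python
SCSI_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION",
               0x08: "BUSY", 0x18: "RESERVATION CONFLICT"}
SRB_STATUS = {0x01: "SUCCESS", 0x04: "ERROR", 0x05: "BUSY",
              0x09: "TIMEOUT", 0x0A: "SELECTION_TIMEOUT", 0x0E: "BUS_RESET"}
SCSI_OPCODE = {0x28: "READ(10)", 0x2A: "WRITE(10)",
               0x88: "READ(16)", 0x8A: "WRITE(16)"}


def decode_event_153(data):
    """Decode the three bytes of interest from event 153 binary data.

    Offsets 29, 30 and 31 are an assumption taken from the linked article.
    """
    if len(data) < 32:
        raise ValueError("expected at least 32 bytes of event data")
    scsi_status = data[29]
    # The top two bits of the SRB status are flags (autosense valid,
    # queue frozen), so mask them off before the lookup.
    srb_status = data[30] & 0x3F
    opcode = data[31]
    return {
        "scsi_status": SCSI_STATUS.get(scsi_status, hex(scsi_status)),
        "srb_status": SRB_STATUS.get(srb_status, hex(srb_status)),
        "opcode": SCSI_OPCODE.get(opcode, hex(opcode)),
    }


if __name__ == "__main__":
    # Made-up sample bytes, not taken from a real event.
    sample = bytes(29) + bytes([0x00, 0x09, 0x2A])
    print(decode_event_153(sample))
```

A TIMEOUT SRB status on writes would fit with the host-side event 8, which also refers to write access, but the actual bytes from the affected VMs are needed to confirm that.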

Also, please check whether there are any other errors in the cluster logs, such as CSV event 5120.

Based on my experience, if there's no issue at the operating system level, please check whether the FC HBA firmware and drivers are up to date. If not, please update them.

Thanks for your time!
Best Regards,
Anne

