Windows 10 VMs generating storage errors on 2019 Hyper-V cluster

Joe Bandura 31 Reputation points
2021-02-10T15:28:26.043+00:00

We have a cluster of four Windows Server 2019 Hyper-V hosts. The hosts are attached to FC storage and use CSVs formatted with NTFS. Networking is 10 GbE using SET teams. This cluster hosts almost 200 VMs, of which about 45 are Windows 10 generation 2 VMs. The rest are Windows Server 2012 R2, 2016, or 2019 VMs.

We recently started monitoring the Microsoft\Windows\Hyper-V-StorageVSP\Admin event log on each of the host servers and noticed that we were getting a LOT of error events like the one below...

Log Name: Microsoft-Windows-Hyper-V-StorageVSP/Admin
Source: Hyper-V-StorageVSP
Event ID: 8
Level: Error
User: SYSTEM
Message: Failed to map guest I/O buffer for write access with status 0xC0000044. Device name = C:\ClusterStorage\CSV1\WIN10VM1\Virtual Hard Disks\WIN10VM1.vhdx
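
For reference, the status value in this message can be split into the standard NTSTATUS bit fields. The sketch below is a minimal illustration, assuming the layout documented in MS-ERREF (2-bit severity, customer bit, reserved bit, 12-bit facility, 16-bit code). The function name `decode_ntstatus` and the one-entry lookup table are hypothetical helpers, not part of any Windows API. In the MS-ERREF list, 0xC0000044 is STATUS_QUOTA_EXCEEDED ("Insufficient quota exists to complete the operation"); which quota is meant in the StorageVSP context is not stated in the event.

```python
# Decode the bit fields of a 32-bit NTSTATUS value.
# Assumed layout (MS-ERREF): Sev(2) | C(1) | N(1) | Facility(12) | Code(16).
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Hypothetical lookup holding only the value seen in this thread;
# it is not a complete NTSTATUS table.
KNOWN_CODES = {0xC0000044: "STATUS_QUOTA_EXCEEDED"}


def decode_ntstatus(value):
    """Return the severity, customer flag, facility, code and known name."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],
        "customer_defined": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
        "name": KNOWN_CODES.get(value, "unknown"),
    }


if __name__ == "__main__":
    print(decode_ntstatus(0xC0000044))
```

The top two bits being set marks the value as an error rather than a warning, which matches the "Error" level of the host event.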

In the System event log of the VM WIN10VM1, we see many of the following warnings, logged at the same times as the above events on the host server...

Log Name: System
Source: disk
Event ID: 153
Level: Warning
User: N/A
Message: The IO operation at logical block address 0x1a751f9 for Disk 0 (PDO name: \Device\0000002a) was retried.

This happens multiple times a day on all of the Windows 10 VMs. Across about 45 VMs, each System log accumulates a lot of disk event 153 warnings, and each host accumulates a lot of the StorageVSP event 8 errors shown above. This appears to have been going on for as long as we have logs, and it only seems to affect the Windows 10 VMs. None of the server VMs seem to be generating these warnings or causing any of the host-level errors. As far as we can tell, it's not causing any functional problems, but it's very disconcerting to see disk-level errors where I wouldn't otherwise expect them.
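
To check more systematically that the guest warnings line up with the host errors, the timestamps from both logs can be compared with a small tolerance. This is only a rough sketch: it assumes the event times have already been exported from both logs and parsed into `datetime` objects, and that the host and guest clocks are reasonably in sync. The function name `match_events`, the 5-second tolerance, and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta


def match_events(host_times, guest_times, tolerance=timedelta(seconds=5)):
    """Map each host event time to the guest event times within `tolerance`."""
    guest_sorted = sorted(guest_times)
    matches = {}
    for host_time in sorted(host_times):
        matches[host_time] = [
            g for g in guest_sorted if abs(g - host_time) <= tolerance
        ]
    return matches


if __name__ == "__main__":
    # Invented sample data, not taken from the real logs.
    host = [datetime(2021, 2, 10, 9, 15, 2), datetime(2021, 2, 10, 13, 40, 30)]
    guest = [datetime(2021, 2, 10, 9, 15, 3), datetime(2021, 2, 10, 11, 0, 0)]
    for host_time, nearby in match_events(host, guest).items():
        shown = [g.isoformat() for g in nearby] or "no guest event nearby"
        print(host_time.isoformat(), "->", shown)
```

Host events with no nearby guest warning (or the reverse) would be worth a closer look, since they would suggest the two event types are not always tied together.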

If anyone has any ideas, it would be helpful to know what's going on. Thanks.

Windows Server 2019
Hyper-V
Windows Server Clustering

3 answers

  1. Xiaowei He 9,871 Reputation points
    2021-02-11T05:09:02.23+00:00

    Hi,

    For event 153, please check the event's details. The details section of the event log record shows which error caused the retry and whether the request was a read or a write. Below is an example of the details output:

    66816-image.png

    https://learn.microsoft.com/en-us/archive/blogs/ntdebugging/interpreting-event-153-errors

    Besides, please check whether there are any other errors in the cluster logs, such as CSV event 5120.

    Based on my experience, if there's no issue at the operating system level, please check whether the FC HBA cards are up to date (firmware and drivers); if not, please update them.

    Thanks for your time!
    Best Regards,
    Anne

    -----------------------------

    If the Answer is helpful, please click "Accept Answer" and upvote it.

    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  2. Xiaowei He 9,871 Reputation points
    2021-02-23T05:39:00.73+00:00

    Hi,

    Thanks for your feedback. For error 8, I found the following information:

    70878-image.png

    https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55

    However, there's limited information about how to deal with it. Since the issue may be complex, it's recommended to open a case with Microsoft for in-depth troubleshooting.

    Below is the link to open a case with MS:

    https://support.microsoft.com/en-us/gp/customer-service-phone-numbers

    Thanks for your time!
    Best Regards,
    Anne



  3. stephc_msft 21 Reputation points
    2022-10-25T10:25:59.597+00:00

    Do you see .RCT files in the folder on the Hyper-V host where the VM's VHDX files are stored?
    If so, that suggests RCT (Resilient Change Tracking) is active as part of DPM or Veeam host-level backups,
    which can also have a negative impact on the VM's disk write performance.
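
The check suggested in the last answer can be scripted. The following is a minimal sketch, assuming it is run on a Hyper-V host with read access to the VM's storage folder; the function name `find_rct_files` is a hypothetical helper, and the folder path is simply the one from the event message in the question.

```python
from pathlib import Path


def find_rct_files(vm_folder):
    """Return every file ending in .rct under vm_folder (searched recursively)."""
    root = Path(vm_folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".rct"
    )


if __name__ == "__main__":
    # Path taken from the host event in the question; adjust per VM.
    folder = r"C:\ClusterStorage\CSV1\WIN10VM1\Virtual Hard Disks"
    for path in find_rct_files(folder):
        print(path)
```

If this lists files for the Windows 10 VMs but not for the server VMs, that difference would be worth following up; if both have them, RCT alone is unlikely to explain why only the Windows 10 guests log event 153.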