Pero-7573 asked:

Hyper-V failover cluster VM resource stuck in a loop: pending/failure/starting

Hello all,

We are facing a very frustrating issue with our Hyper-V cluster. It happens randomly, and we cannot get a handle on it.

Out of nowhere, the cluster resource of a random VM (or of several VMs) goes into what I can only call a LOOP state. The screenshots show it better than I can describe it.

[Screenshots: c1.png, c2.png, c3.png, c4.png, c5.png, c6.png]


What you see in the first four screenshots cycles roughly 100 times per minute. If you click on the resource, the cluster management console crashes.

On the other hand, the VM can still be managed through the Hyper-V console, but it did reset a few times while this error was occurring.

When I go through the logs, I cannot figure out what causes this, because the VM is actually working (until it gets reset).

Log entries from "Applications and Services Logs\Microsoft\Windows\Hyper-V-StorageVSP": the last 9 of 1,800 entries, all logged within one second:


1.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' received a recovery status notification. Current device state = Recoverable Error Detected, Last status = No Errors, New status = Disconnected.
2.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' changed recovery state. Previous state = Recoverable Error Detected, New state = Recoverable Error Detected.
3.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' received a recovery status notification. Current device state = Recoverable Error Detected, Last status = Disconnected, New status = No Errors.
4.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' changed recovery state. Previous state = Recoverable Error Detected, New state = No Errors.
5.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' received an IO failure with error = SRB_STATUS_ERROR_RECOVERY. Current device state = No Errors, New state = Recoverable Error Detected, Current status = No Errors.
6.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' received a recovery status notification. Current device state = Recoverable Error Detected, Last status = No Errors, New status = Disconnected.
7.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' changed recovery state. Previous state = Recoverable Error Detected, New state = Recoverable Error Detected.

8.)An I/O request for device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' took 1216203 miliseconds to complete. Operation code = READ16, Data transfer length = 512, Status = SRB_STATUS_ABORTED. (At this point the VM quit unexpectedly.)

9.)Storage device '\\?\UNC\NAMEsofs01\Cluster\VMs\XXX-PROD-WEB02\VHDs\XXX-prod-web02_sys.vhdx' received a recovery status notification. Current device state = Shutting Down, Last status = Disconnected, New status = No Errors.

This entry is timestamped 1:51:46, which is when the problem started. There are no later entries of this kind, but the cluster resource was still looping as shown in the first four screenshots, and there was no way to stop that loop.
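To see how often the disk flaps between recovery states, exported StorageVSP messages can be tallied with a short script. This is only an illustrative sketch: it assumes the events have already been exported as plain message strings, and the regular expressions simply match the wording quoted above, which may differ on other builds. The helper name `summarize_storagevsp` is hypothetical. It also converts the slow-I/O latency to minutes: 1216203 ms is about 20.3 minutes for a single 512-byte READ16.

```python
import re
from collections import Counter

# Hypothetical helper: patterns match the Hyper-V-StorageVSP wording quoted above.
STATE_CHANGE = re.compile(
    r"changed recovery state\. Previous state = (?P<prev>[^,]+), "
    r"New state = (?P<new>[^.]+)\."
)
# Accepts both "miliseconds" (as quoted in the log) and "milliseconds".
SLOW_IO = re.compile(r"took (?P<ms>\d+) mil+iseconds to complete")


def summarize_storagevsp(messages):
    """Count recovery-state transitions and return the slowest I/O in minutes."""
    transitions = Counter()
    slowest_ms = 0
    for msg in messages:
        change = STATE_CHANGE.search(msg)
        if change:
            transitions[(change["prev"], change["new"])] += 1
        slow = SLOW_IO.search(msg)
        if slow:
            slowest_ms = max(slowest_ms, int(slow["ms"]))
    return transitions, slowest_ms / 60_000  # milliseconds -> minutes


if __name__ == "__main__":
    sample = [
        "Storage device 'web02_sys.vhdx' changed recovery state. "
        "Previous state = Recoverable Error Detected, New state = No Errors.",
        "An I/O request for device 'web02_sys.vhdx' took 1216203 miliseconds "
        "to complete. Operation code = READ16, Data transfer length = 512, "
        "Status = SRB_STATUS_ABORTED.",
    ]
    transitions, minutes = summarize_storagevsp(sample)
    for (prev, new), count in transitions.items():
        print(f"{prev} -> {new}: {count}")
    print(f"slowest I/O: {minutes:.1f} min")
```

A high transition count within a single second would match the 1,800-entry burst described above; the latency figure puts a number on how long the aborted read hung before the VM quit.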


Log entries from Hyper-V worker:

1.)'name-prod-web02' was resumed from critical error. (Virtual machine ID 08BFD5A3-AF52-4F66-BD01-C635FED8F87A)
2.)'name-prod-web02' was paused for critical error. (Virtual machine ID 08BFD5A3-AF52-4F66-BD01-C635FED8F87A)
3.)'name-prod-web02' was resumed from critical error. (Virtual machine ID 08BFD5A3-AF52-4F66-BD01-C635FED8F87A)

and so on, in a loop: 2,000 entries within one second at 1:51:46.
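The Hyper-V worker entries can be counted per VM to show how many pause/resume cycles a machine went through. This is a hypothetical sketch: it assumes the events are already available as strings and that their text matches the three entries quoted above; the function name `count_critical_cycles` is made up for illustration.

```python
import re
from collections import Counter

# Hypothetical helper: matches the Hyper-V worker wording quoted above, e.g.
# "'vm' was paused for critical error. (Virtual machine ID 08BF...)"
CRITICAL = re.compile(
    r"was (?P<action>paused for|resumed from) critical error\. "
    r"\(Virtual machine ID (?P<vm_id>[0-9A-Fa-f-]+)\)"
)


def count_critical_cycles(messages):
    """Return a Counter keyed by (vm_id, 'paused' or 'resumed')."""
    counts = Counter()
    for msg in messages:
        match = CRITICAL.search(msg)
        if match:
            action = "paused" if match["action"] == "paused for" else "resumed"
            # Normalize the GUID so upper/lower-case variants are counted together.
            counts[(match["vm_id"].upper(), action)] += 1
    return counts


if __name__ == "__main__":
    vm = "08BFD5A3-AF52-4F66-BD01-C635FED8F87A"
    sample = [
        f"'name-prod-web02' was resumed from critical error. (Virtual machine ID {vm})",
        f"'name-prod-web02' was paused for critical error. (Virtual machine ID {vm})",
        f"'name-prod-web02' was resumed from critical error. (Virtual machine ID {vm})",
    ]
    for (vm_id, action), n in sorted(count_critical_cycles(sample).items()):
        print(vm_id, action, n)
```

Keying on the VM ID rather than the display name keeps the count stable even if a VM is renamed.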


Can you please give me some ideas, directions, anything? The problem hits Windows and Linux VMs at random, on random nodes.

I will gladly provide any additional information; I simply have nothing more to give from the logs.

Pero


Tags: windows-server-hyper-v, windows-server-clustering
XiaoweiHe-MSFT answered:

Hi,

According to your description, please check the following things:

  1. Check whether the CSV volumes that store the VMs are running out of space.

  2. Open Computer Management on each cluster node and check whether the CLIUSR account is disabled or locked out:


[Screenshot: 36694-image.png]
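For the first check, free space can also be read programmatically on a node. This is a sketch under stated assumptions: Python is available on the node, the paths you pass (for example a CSV mount point such as C:\ClusterStorage\Volume1, or the SOFS share holding the VHDX files) are placeholders for your own, and the 10% threshold is an arbitrary example, not a Microsoft recommendation. The function `free_space_report` is hypothetical; `shutil.disk_usage` is the standard-library call that does the actual work.

```python
import shutil
import sys

GIB = 2**30


def free_space_report(paths, min_free_fraction=0.10):
    """Return (path, free_gib, free_fraction, is_low) for each path.

    min_free_fraction is a hypothetical threshold; pick your own.
    """
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        fraction = usage.free / usage.total
        report.append((path, usage.free / GIB, fraction, fraction < min_free_fraction))
    return report


if __name__ == "__main__":
    # Pass CSV mount points or share paths on the command line; defaults to ".".
    for path, gib, frac, low in free_space_report(sys.argv[1:] or ["."]):
        flag = "LOW" if low else "ok"
        print(f"{path}: {gib:.1f} GiB free ({frac:.0%}) {flag}")
```

Usage might look like `python csv_space.py C:\ClusterStorage\Volume1`; whether a UNC share path works as an argument on your hosts is worth confirming before relying on it.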

Thanks for your time!
Best Regards,
Anne



Thank you for the response.

  1. This is definitely not the case.

  2. All of them are OK right now, but I will have to check this the next time it happens.


If you have any other ideas, I would be grateful.




Hi,

In my experience, random issues like this are usually caused by one of the reasons above. If neither applies to you and the issue recurs, I would suggest opening a case with Microsoft for in-depth troubleshooting, since random issues are hard to troubleshoot on the forum. Thanks for your understanding.

Below is the link to open a case with MS:

https://support.microsoft.com/en-us/gp/customer-service-phone-numbers

Best Regards,
Anne


Thank you,

I understand it is hard to troubleshoot. I just wanted to know whether anyone else has had a similar issue.

Best regards,
Pero

Eimantas-6223 answered:

Hi,


I have the same problem on a Hyper-V failover cluster with three Windows Server 2019 Standard nodes. I think it happens after a virtual machine disk is resized: some time later, the VM shows up in the LOOP state. Pero-7573, have you contacted Microsoft about this issue? If you get an answer from them, please post it here. Thanks.


Hi Eimantas,

It could be. It happened again yesterday with our Exchange VM. The hosts run Windows Server 2019, and it was the day after a virtual disk resize. In the end we migrated the healthy VMs to another host and then killed the affected host. After that, the cluster resource was down and the VM simply disappeared from Hyper-V (the VHDs did not; they were fine).

But I never contacted Microsoft about it.
