PavelNovak-0431 asked jiayaozhu-MSFT commented

Windows Server 2019 Failover Cluster

Hi,
I have a problem with a 2-node cluster (Windows Server 2019 fully patched, VMware 6.7, configured according to best practices; storage is from a Unity 400 over FC, presented as RDM disks with physical sharing on the VMware Paravirtual SCSI bus). Cluster validation passes.
I tried completely destroying the cluster, building a new one, and installing SQL. Everything looks fine until the VMs restart. After that, I get a lot of errors in the cluster log again.
I can move roles from one node to the other without trouble. The only visible problem is a lot of errors and critical errors in the log, many times a day.
Any ideas, please?
Regards,
Pavel

Common log errors:
The cluster service encountered an unexpected problem and will be shut down. The error code was '5050'.

Cluster node 'XXX-NEW' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wiza...

The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.

Cluster physical disk resource encountered an error while attempting to terminate.
Phsyical Disk Resource Name: XXX_XXX_F
Device Number: 3
Device Guid: {73990d27-a8ab-7cd3-2a27-f1cc6215dfcf}
Error Code: 2
Reason: OpenPartitionFailure

Cluster physical disk resource encountered an error while attempting to terminate.
Phsyical Disk Resource Name: XXX_XXX_I
Device Number: 6
Device Guid: {794ff3af-ac70-a968-5b4c-ff7e3c7ee6aa}
Error Code: 1168
Reason: ReleaseDiskPRFailure


windows-server-clustering

Hi,

Thanks for posting on our forum!

Based on your error messages, I think some of the resources on node 'XXX-NEW' have failed, and this failure caused the loss of communication between your node and the RHS/cluster service. That is what the first three error messages convey to me. As for the storage-related error codes, my research suggests they may be related to your VMware settings. Here is a thread I found in the VMware community that describes a situation similar to yours:
https://communities.vmware.com/t5/Availability-HA-FT-Discussions/Does-enabling-HA-reboots-VMs/td-p/2824826

Please note: Information posted in the given link is hosted by a third party. Microsoft does not guarantee the accuracy and effectiveness of information.

As a result, you will need to contact VMware support directly.

In addition, from Microsoft's perspective, we recommend that you check the logs on node 'XXX-NEW', especially the storage-related logs. Go to %SystemRoot%\System32\winevt.
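
When reading through those logs, it can help to translate the numeric codes into their symbolic names. The sketch below is only an illustration: it assumes the codes quoted in the question (5050, 2, 1168) are standard Win32 system error codes, and the function name `explain_codes` and the regular expression are made up for this example rather than taken from any Microsoft tool.

```python
import re

# Win32 system error codes that appear in the question's log excerpts.
# Assumption: the cluster events report plain Win32 codes (names per winerror.h).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    1168: "ERROR_NOT_FOUND",
    5050: "ERROR_CLUSTER_NODE_DOWN",
}

# Matches both "The error code was '5050'" and "Error Code: 1168".
CODE_PATTERN = re.compile(r"error code(?: was)?:?\s*'?(\d+)'?", re.IGNORECASE)


def explain_codes(log_text):
    """Return (code, symbolic name) pairs for every error code found in log_text."""
    results = []
    for match in CODE_PATTERN.finditer(log_text):
        code = int(match.group(1))
        # Codes missing from the small table above are reported as "unknown".
        results.append((code, WIN32_ERRORS.get(code, "unknown")))
    return results


if __name__ == "__main__":
    sample = (
        "The cluster service encountered an unexpected problem and will be "
        "shut down. The error code was '5050'.\n"
        "Error Code: 2\n"
        "Error Code: 1168\n"
    )
    for code, name in explain_codes(sample):
        print(code, name)
```

Read this way, 5050 points at a node being treated as down, while 2 and 1168 on the disk resources suggest the disk or its reservation could not be found when the resource was terminated, which would be consistent with a storage-path issue on the VMware side.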

Thanks for your support!

BR,
Joan


0 Answers