Windows Server 2019
new cluster; was getting a lot of event 1237; got my AD admin to allow the cluster DNN to dynamically update its IP address; the event errors went away
but then my four nodes dropped like hot potatoes hours later
any cluster cmdlet (like get-clusterNode or even stop-cluster or remove-cluster) now throws the error "The remote server has been paused or is in the process of being started." with "FullyQualifiedErrorId : ClusterSharingPaused"
get-wmiobject mscluster_resourcegroup -namespace "ROOT\MSCluster"
same error
netsh advfirewall firewall show rule name="windows management instrumentation (async-in)"
netsh advfirewall firewall show rule name="windows management instrumentation (wmi-out)"
netsh advfirewall firewall show rule name="windows management instrumentation (wmi-in)"
netsh advfirewall firewall show rule name="windows management instrumentation (dcom-in)"
all showed not enabled, so:
netsh advfirewall firewall set rule group="windows management instrumentation (wmi)" new enable=yes
but that didn't help, so:
net stop winmgmt /y
winmgmt /resetRepository
restart-computer
did this on all four nodes, but I still can't even rebuild the cluster (remove-cluster fails)
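For anyone wanting to confirm the firewall state after the steps above, a rough sketch of checking the WMI rule group on every node at once could look like this. The node names are placeholders (not my real ones), and it assumes PowerShell remoting (WinRM) still works between the nodes, which may not be the case here:

```powershell
# Hypothetical node names; replace with the real four nodes.
$nodes = 'node1', 'node2', 'node3', 'node4'

Invoke-Command -ComputerName $nodes -ScriptBlock {
    # List every rule in the WMI firewall group with its direction and state.
    # Values are converted to strings so they survive remoting unchanged.
    Get-NetFirewallRule -DisplayGroup 'Windows Management Instrumentation (WMI)' |
        Select-Object DisplayName,
            @{ Name = 'Direction'; Expression = { "$($_.Direction)" } },
            @{ Name = 'Enabled';   Expression = { "$($_.Enabled)" } }
} |
    Sort-Object PSComputerName, DisplayName |
    Format-Table PSComputerName, DisplayName, Direction, Enabled -AutoSize
```

This only reads the rules; it changes nothing, so it should be safe to run repeatedly while comparing nodes.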
when I could still run "get-clusterNetwork", it showed two 10 Gb networks with role ClusterOnly and one 2 Gb NIC team with role None (side note for those who know: should the team be set to "2", the unsupported client-only value?); the hidden Microsoft Failover Cluster Virtual Adapter is up:
get-netAdapter -includeHidden | where {$_.interfaceDescription -match 'failover'}
but then I noticed that one node had a different name for that virtual adapter, so tried:
get-netAdapter -includeHidden -name "Local Area Connection* 11" | rename-netAdapter -newName "Local Area Connection* 1"
but got an error saying the name already exists, even though it doesn't seem to (this returns nothing):
get-netAdapter -includeHidden -name "Local Area Connection* 1"
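Since the `*` in that adapter name can also be read as a wildcard, a sketch that lists the hidden adapters with a regex match instead might make it easier to spot which adapter is holding the name. This is only a listing under that assumption, not a fix:

```powershell
# List every adapter (hidden ones included) whose name starts with
# "Local Area Connection" or whose description mentions "failover".
Get-NetAdapter -IncludeHidden |
    Where-Object {
        $_.Name -match '^Local Area Connection' -or
        $_.InterfaceDescription -match 'failover'
    } |
    Sort-Object Name |
    Format-Table Name, InterfaceDescription, Status, ifIndex -AutoSize
```

Running it on each node and comparing the output side by side would show whether the virtual adapter name really differs on only one node.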
unsure if that matters; any help or a pointer in the right direction would be appreciated (again, I suspect my trouble began when the AD DNS permission to update the IP was added: it solved my event 1237, but may have caused inter-node communication problems, since no nodes seem to receive heartbeats now, i.e. a lot of event 1650, oscillating between missed heartbeat, established UDP connection, and lost UDP connection)
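A minimal sketch of a basic reachability check between nodes, run locally on one node, could look like the following. The peer names are placeholders, and note the limitation: Test-NetConnection only tests TCP, so a success on port 3343 does not prove the UDP heartbeats on the same port get through:

```powershell
# State of the local Cluster service.
Get-Service -Name ClusSvc | Select-Object Name, Status, StartType

# Hypothetical names of the other three nodes; replace with the real ones.
$peers = 'node2', 'node3', 'node4'

foreach ($p in $peers) {
    # TCP-only probe of the cluster port; UDP 3343 is NOT covered by this test.
    $r = Test-NetConnection -ComputerName $p -Port 3343 -WarningAction SilentlyContinue
    [pscustomobject]@{
        Peer          = $p
        RemoteAddress = $r.RemoteAddress
        Tcp3343       = $r.TcpTestSucceeded
    }
}
```

If TCP 3343 fails to a peer, that would at least narrow things down to name resolution, routing, or firewall rather than the cluster database itself.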