question

hunter3740-3805 asked Dexter-5952 commented

all cluster cmdlets throw error "The remote server has been paused or is in the process of being started."

server 2019

new cluster; was getting a lot of event 1237; got my AD admin to allow the cluster DNN to dynamically update its IP address; the event errors went away

but then my four nodes dropped like hot potatoes hours later

any cluster cmdlet (like get-clusterNode or even stop-cluster or remove-cluster) now throws the error "The remote server has been paused or is in the process of being started." with "FullyQualifiedErrorId : ClusterSharingPaused"

 get-wmiobject mscluster_resourcegroup -namespace "ROOT\MSCluster"

same error

 netsh advfirewall firewall show rule name="windows management instrumentation (async-in)"
 netsh advfirewall firewall show rule name="windows management instrumentation (wmi-out)"
 netsh advfirewall firewall show rule name="windows management instrumentation (wmi-in)"
 netsh advfirewall firewall show rule name="windows management instrumentation (dcom-in)"

all showed not enabled, so:

 netsh advfirewall firewall set rule group="windows management instrumentation (wmi)" new enable=yes

but that didn't help, so:

 net stop winmgmt /y
 winmgmt /resetRepository
 restart-computer

did this on all four nodes, still not able to even redo the cluster (remove-cluster fails)

when I could still run "get-clusterNetwork", it showed two 10 Gbps networks set to cluster only and one 2 Gbps NIC team set to none (side note for those who know: should this be set to "2", the unsupported client-only role?); and the hidden Microsoft Failover Cluster Virtual Adapter is up:

 get-netAdapter -includeHidden | where {$_.interfaceDescription -match 'failover'}

but then I noticed that one node had a different name for that virtual adapter, so tried:

 get-netAdapter -includeHidden -name "Local Area Connection* 11" | rename-netAdapter -newName "Local Area Connection* 1"

but got an error saying the name already exists, even though it doesn't (this returns nothing):

 get-netAdapter -includeHidden -name "Local Area Connection* 1"

unsure if that matters; any help or a pointer in the right direction would be appreciated. again, I suspect my trouble began when the AD DNS permission to update the IP address was added: it solved my event 1237, but it may have caused inter-node communication problems, since now no node receives heartbeats (i.e. a lot of event 1650 now, oscillating between missed heartbeat, established UDP connection, and lost UDP connection)
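To see how the heartbeat events line up across nodes, it can help to pull them from every node in one pass. The following is only a minimal sketch: the node names are placeholders, the function name is made up for illustration, and it assumes the events are written to the System log by the Microsoft-Windows-FailoverClustering provider and that remote event log access is allowed.

```powershell
# Hypothetical helper: collect recent failover clustering events from each node.
# Assumption: the events are in the System log under the
# Microsoft-Windows-FailoverClustering provider; adjust if yours land elsewhere.
function Get-ClusterHeartbeatEvent {
    param(
        [string[]]$ComputerName = @('one', 'two', 'three', 'four'),  # placeholder node names
        [int[]]$Id = @(1237, 1650),                                   # event IDs mentioned in this post
        [int]$MaxEvents = 20
    )
    foreach ($node in $ComputerName) {
        $filter = @{
            LogName      = 'System'
            ProviderName = 'Microsoft-Windows-FailoverClustering'
            Id           = $Id
        }
        # Get-WinEvent writes an error when nothing matches, so silence that case.
        Get-WinEvent -ComputerName $node -FilterHashtable $filter -MaxEvents $MaxEvents -ErrorAction SilentlyContinue |
            Select-Object MachineName, TimeCreated, Id, Message
    }
}

# Usage: newest events first, across all nodes
# Get-ClusterHeartbeatEvent | Sort-Object TimeCreated -Descending | Format-Table -Wrap
```

Sorting the combined output by time makes it easier to tell whether all nodes lost heartbeats at the same moment or one after another.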

windows-server-clustering

also FYI, getting a lot of event IDs 1092 and 1146 on two nodes, and 1573 and 5398 on the other two nodes (and small amounts of 1070)

my witness shares are up (I have two; wondering if that was a mistake, but I can't remove one of them right now, so moot); so I tried to start one of the nodes (one with event 1573) with forced quorum:

 start-clusterNode -FQ

and at least it didn't throw the error "The remote server has been paused or is in the process of being started."; the result showed state "Joining" (yeah, hope), but then it must have gone down, as running "get-clusterNode" afterwards returned my paused/starting error again; doh! I then ran "start-cluster", and it warned about one of the nodes, e.g. "three": WARNING: Cannot start service clussvc on computer 'three'. and then eventually gave my dreaded paused/starting error


the idea after "start-clusterNode -FQ" (fixQuorum) would then be to run "start-clusterNode -PQ" (preventQuorum) on the other three nodes (but since FQ didn't work, I didn't get around to PQ)


For anyone who might read this later on like I did: I had the same events in a test cluster running on just two VMs with no quorum disk. Rebooting both nodes at the same time destroys the cluster, and it gets stuck with those events.

  • Shut node2 down.

  • On node1 run:
    net stop clussvc
    net start clussvc /fixquorum

  • Boot node2 back up again.

That seemed to fix it; I found the PowerShell start-ClusterNode -FQ commands etc. didn't work.

Then I guess you troubleshoot the quorum disk to see what's going on.
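The node1 part of the steps above could be wrapped up roughly like this. It is a sketch only: the function name is hypothetical, the net.exe command is the one quoted in the comment, and the final check assumes the failover clustering PowerShell module is installed on the node.

```powershell
# Hypothetical wrapper for the node1 steps above; run it locally on node1
# while node2 is shut down.
function Start-NodeWithFixQuorum {
    # Stop the cluster service if it is still running.
    Stop-Service -Name ClusSvc -Force -ErrorAction SilentlyContinue

    # Start the service with the fix-quorum switch, as in the comment.
    net.exe start clussvc /fixquorum
    if ($LASTEXITCODE -ne 0) {
        throw "net start clussvc /fixquorum exited with code $LASTEXITCODE"
    }

    # If the node came up, this lists node states instead of
    # returning the paused/starting error.
    Get-ClusterNode | Format-Table Name, State
}

# Usage (on node1):
# Start-NodeWithFixQuorum
```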







1 Answer

hunter3740-3805 answered XiaoweiHe-MSFT commented

just redo the cluster (that fixed it for me); say my nodes are called "one", "two", "three", and "four":

I ran these from my quorum file share machine to do the uninstall/install in one fell swoop; my four nodes already had these two cmdlets run:

 set-item WSman:\localhost\client\trustedHosts -concatenate -value "myQuorumShareMachine"
 enable-psremoting

 invoke-command -computerName one,two,three,four {remove-windowsFeature failover-clustering -restart}
 icm one,two,three,four {install-windowsFeature failover-clustering -includeManagementTools -restart}
 icm one,two,three,four {clear-clusterNode}

then on any node, in dsa.msc, remove my DNN computer object (call it "myCluster"), add it back, and disable it (side note: have the AD DNS admin give it rights to dynamically update its IP address), and while in there, delete myCluster-CAU; note my intranet is the pair of non-DHCP 10 Gbps connections (and my quorum share already allows computer object "myCluster" to write)

 new-cluster -name myCluster -node one,two,three,four -noStorage -ignoreNetwork 192.168.0.0/24,192.168.1.0/24 -managementPointNetworkType distributed
 add-CAUclusterRole -daysOfWeek saturday -weeksOfMonth 1 -requireAllNodesOnline -maxFailedNodes 1 -enableFirewallRules -CAUpluginName Microsoft.WindowsUpdatePlugin -virtualComputerObjectName myCluster-CAU -groupName myCluster-CAU
 set-clusterQuorum -fileShareWitness \\myQuorumShareMachine\witness$ -credential $(get-credential)
 get-clusterNetwork | ft name,address,role
 # set my two intranet 10 Gbps networks to cluster only, and my 2 Gbps LACP NIC team to none (no cluster traffic, just client traffic; unsure about using "unsupported" "2" for client only)
 (get-clusterNetwork "Cluster Network 1").role = 1
 (get-clusterNetwork "Cluster Network 2").role = 1
 (get-clusterNetwork "Cluster Network 3").role = 0
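For reading the role numbers back, a small lookup can make the output easier to check. This is a sketch, not part of the original fix: the function and variable names are made up, and only the documented values 0 (none), 1 (cluster only), and 3 (cluster and client) are mapped. Value 2 is deliberately left unmapped because the post itself is unsure about it.

```powershell
# Documented cluster network role values; 2 is intentionally not mapped.
$clusterNetworkRoleNames = @{ 0 = 'None'; 1 = 'Cluster'; 3 = 'ClusterAndClient' }

# Hypothetical helper: translate a role number into a readable name.
function Get-ClusterNetworkRoleName {
    param([int]$Role)
    if ($clusterNetworkRoleNames.ContainsKey($Role)) {
        $clusterNetworkRoleNames[$Role]
    }
    else {
        "Unknown ($Role)"
    }
}

# Usage on a cluster node:
# Get-ClusterNetwork | Select-Object Name, Address, @{ n = 'RoleName'; e = { Get-ClusterNetworkRoleName ([int]$_.Role) } }
```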

now get back my data volumes

 get-clusterAvailableDisk | add-clusterDisk
 get-clusterResource | where {$_.ownerGroup -eq "Available Storage" -and $_.name -ne "Cluster Virtual Disk (ClusterPerformanceHistory)"} | add-clusterSharedVolume

and if we're going down the rabbit hole, note I also cleaned up some leftover artifacts

 get-clusterResource
 # the only cluster disk that has ownerGroup as "Cluster Group" is the old cluster performance history disk (and per get-virtualDisk, there will be two ClusterPerformanceHistory disks, but the old one will have opStatus detached and healthStatus unknown)
 remove-clusterResource "Cluster Disk 20"
 get-virtualDisk "ClusterPerformanceHistory" | where {$_.healthStatus -eq "unknown"} | remove-virtualDisk
 # and per get resources above, cluster pool 1 is owned by SID/GUID
 move-clusterResource "Cluster Pool 1" -group "Cluster Group"
 remove-clusterGroup "51620a48-3f0c-4175-8ac5-7f3839e39a0a"
 # again per get resource, dnn isn't right
 (get-clusterResource "Cluster Name").name = "myCluster"
 get-clusterSharedVolume
 # they're all "Cluster Disk X", so match those up to old names:
 get-clusterSharedVolume | ft name,sharedVolumeInfo
 (get-clusterSharedVolume "Cluster Disk 1").name = "Cluster Virtual Disk (originalName)"
 # and the shares to those are gone, so get them back (note I create a main folder in each csv, for easy change of permissions and so peeps don't see sys vol info and recycle bin)
 new-SMBshare -name originalName -path c:\clusterStorage\originalName\subfolder -fullAccess "myDomain\my ou admins",builtin\administrators -changeAccess "myDomain\originalName users"

some of the virtualDisks were inService, so here is a handy function that keeps an eye on "get-storageJob" (the disks stayed accessible, so users won't know); just press ctrl + "c" to exit the function:

 function refreshVDSJST () { while($true) {get-virtualDisk | where {$_.healthStatus -eq "warning"} | ft; get-storageJob; icm nb-s2d4 `
 {get-scheduledTask -taskName "Data Integrity Scan for Crash Recovery" | where state -eq running} | ft; sleep -s 420; clear-host;} }
 refreshVDSJST


hope this helps someone!


Appreciate your sharing! Thanks.

Best Regards,
Anne
