I have a failover cluster with some Cluster Shared Volumes (CSVs).
The vSAN backing the shared volumes was rebooted and was back up and running in about 1-2 minutes.
Naturally the cluster detected an issue and tried failing over various resources, but couldn't, because they all need storage from the rebooting vSAN.
Eventually I see this in the cluster log:
INFO [RCM] Will retry online from long delay restart of Cluster Disk 4 in 900000 milliseconds
Sure enough, 15 minutes later the disk was retried and came back online in the cluster.
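To double-check the arithmetic, here is a minimal Python sketch that pulls the resource name and the delay out of a line like the one above and converts the delay to minutes. The regular expression is an assumption based only on this single log line, and `parse_retry_delay` is a hypothetical helper name, not part of any cluster tooling.

```python
import re

# Assumed pattern, derived only from the one log line quoted in this post.
RETRY_RE = re.compile(
    r"Will retry online from long delay restart of (?P<resource>.+?) "
    r"in (?P<ms>\d+) milliseconds"
)


def parse_retry_delay(line):
    """Return (resource name, delay in minutes), or None if the line does not match."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    # 60,000 milliseconds per minute.
    return match.group("resource"), int(match.group("ms")) / 60_000


line = (
    "INFO [RCM] Will retry online from long delay restart of "
    "Cluster Disk 4 in 900000 milliseconds"
)
print(parse_retry_delay(line))  # ('Cluster Disk 4', 15.0)
```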
In another datacentre running the same setup, a similar operation was carried out: a reboot of the vSAN that provides Cluster Shared Volumes to another cluster. In that case, however, the cluster recovered by itself in about 1m20s.
So, questions:
Why would the "same" action in two different DCs that are running essentially the same setup result in one having a minor blip and the other waiting 15 minutes to recover?
Is this 900000 ms (= 15 minutes) delay configurable somewhere?