Part 2: Datacenter Activation Coordination and the File Share Witness

In Part 1 of this series (http://blogs.technet.com/b/timmcmic/archive/2012/05/21/my-databases-do-not-automatically-mount-after-i-enabled-datacenter-activation-coordination.aspx), I provided a high-level overview of how Datacenter Activation Coordination (DAC) mode works and how enabling it affects the database mount process on startup.

 

Remember, with DAC mode enabled, different rules apply for mounting databases on startup. The starting DAG member must be able to participate in a cluster that has quorum, and it must either be able to communicate with another DAG member that has a DACP value of 1 or be able to communicate with all DAG members on the StartedMailboxServers list.
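
The startup rule above can be read as a boolean expression. The following is only a minimal Python sketch of that reading; the names PeerState and may_mount_on_startup are hypothetical and are not Exchange APIs, and the way reachability is modeled is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PeerState:
    """What the starting member learned about another DAG member (hypothetical model)."""
    reachable: bool
    dacp_bit: int = 0  # only meaningful when reachable is True


def may_mount_on_startup(cluster_has_quorum: bool,
                         peers: Dict[str, PeerState],
                         started_mailbox_servers: Iterable[str],
                         local_name: str) -> bool:
    """Return True when the DAC-mode startup rule described above is satisfied."""
    # The member must participate in a cluster that has quorum.
    if not cluster_has_quorum:
        return False
    # Condition A: some reachable peer already has a DACP value of 1.
    if any(p.reachable and p.dacp_bit == 1 for p in peers.values()):
        return True
    # Condition B: every other server on the started servers list is reachable.
    others = [s for s in started_mailbox_servers if s != local_name]
    return all(s in peers and peers[s].reachable for s in others)
```

In this sketch, a started servers list that contains only the local server makes condition B trivially true, which is the single-node situation the rest of this post deals with.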

 

When a datacenter switchover has been performed and DAC mode is enabled, the standby datacenter may contain only a single DAG member (and therefore a single-node cluster). This raises two interesting conditions:

 

  • A single node cluster always has quorum
  • There could be a single server on the started servers list

 

If the primary datacenter were restored to service without connectivity to the standby datacenter, this configuration could result in a split brain condition. To protect against this, we use an independent arbitrator to assist in determining the DACP bit: the boot time of the witness server.

 

When DAC mode is enabled, a DAG member will record two values in the registry at HKEY_LOCAL_MACHINE\Software\Microsoft\ExchangeServer\v14\Replay\Parameters:

 

  • BootTimeCookie = the boot time of the DAG member
  • BootTimeFSWCookie = the boot time of the witness server (which we obtain using WMI)
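
If you want to compare these values by hand, note that WMI returns date and time values as CIM datetime strings. The post does not say which WMI class or property Exchange queries; the sketch below assumes the common Win32_OperatingSystem.LastBootUpTime property, and the sample string (including its UTC offset) is made up for illustration. parse_cim_datetime is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def parse_cim_datetime(value: str) -> datetime:
    """Parse a CIM datetime string of the form yyyymmddHHMMSS.mmmmmm+UUU.

    The trailing signed number is the offset from UTC in minutes.
    """
    stamp, offset = value[:21], value[21:]
    naive = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
    minutes = int(offset)  # for example "-240" or "+000"
    return naive.replace(tzinfo=timezone(timedelta(minutes=minutes)))


# Illustrative value only; the offset is an assumption, not taken from the traces.
print(parse_cim_datetime("20120527162812.000000-240"))
```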

 

When DAC mode is enabled, there are different mount-on-startup rules that apply when only a single DAG member remains:

 

  • If the bootTimeCookie equals the boot time of the DAG member <and> the bootTimeFSWCookie does not equal the boot time of the witness server, then the DACP bit is set to 1.
  • If the bootTimeFSWCookie equals the boot time of the witness server <and> the bootTimeCookie does not equal the boot time of the DAG member, then the DACP bit is set to 1.
  • If the bootTimeFSWCookie equals the boot time of the witness server <and> the bootTimeCookie equals the boot time of the DAG member, then the DACP bit is set to 1.
  • If the bootTimeFSWCookie is not equal to the boot time of the witness server <and> the bootTimeCookie is not equal to the boot time of the DAG member, then the DACP bit remains at 0.
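
The four rules above reduce to a single test: the DACP bit is set to 1 unless both recorded cookies differ from the current boot times. A minimal Python sketch of that truth table follows; the function name and parameters are hypothetical and only mirror the wording of the rules, not Exchange's actual implementation.

```python
from datetime import datetime


def single_node_dacp_bit(boot_time_cookie: datetime,
                         member_boot_time: datetime,
                         boot_time_fsw_cookie: datetime,
                         witness_boot_time: datetime) -> int:
    """Return the DACP bit for the single-remaining-member case described above."""
    member_unchanged = boot_time_cookie == member_boot_time
    witness_unchanged = boot_time_fsw_cookie == witness_boot_time
    # Only the case where BOTH machines have rebooted leaves the bit at 0.
    return 1 if (member_unchanged or witness_unchanged) else 0
```

Put another way, in this sketch a match on either cookie is enough for the bit to become 1.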

 

In the following examples, a two-member DAG was configured, and a datacenter switchover was performed resulting in a single-node cluster. The specific test, with tracing data, is provided for each example.

 

Example #1

 

In this example, the Microsoft Exchange Replication service on the single surviving node is restarted. Neither the DAG member itself nor the witness server was restarted. The bootTimeFSWCookie equals the boot time of the witness server <and> the bootTimeCookie equals the boot time of the DAG member, resulting in a DACP bit of 1.

 

438 00000000 5264 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for dc.exchange.msft is 05/27/2012 16:28:12.
439 00000000 5264 Cluster.Replay ActiveManager DetermineAutomountConsensus: checking if the replay service has restarted since the MommyMayIMount bit was set.
455 00000000 5264 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for MBX-2.exchange.msft is 05/27/2012 18:02:49.
456 00000000 5264 Cluster.Replay ActiveManager DetermineAutomountConsensus: WMI says the boot time is 05/27/2012 18:02:49, and the boot time when the Mount bit was set was 05/27/2012 18:02:49.
457 00000000 5264 Cluster.Replay ActiveManager DetermineAutomountConsensus found matching boot timestamps, assuming that the replay service has restarted since setting the bit.
458 00000000 5264 Cluster.Replay ActiveManager AllowAutoMount called: Found matching boot timestamps, assuming that the replay service has restarted since setting the bit.
460 00000000 5264 Cluster.Replay ActiveManager RefreshConfigInternal: The Automount consensus is true.

Example #2

 

In this example, the remaining DAG member was restarted and the witness server remained running. The bootTimeFSWCookie equals the boot time of the witness server <and> the bootTimeCookie does not equal the boot time of the DAG member, resulting in a DACP bit of 1.

 

85 00000000 2996 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for dc.exchange.msft is 05/27/2012 16:28:12.
86 00000000 2996 Cluster.Replay ActiveManager DetermineAutomountConsensus: checking if the replay service has restarted since the MommyMayIMount bit was set.
87 00000000 2996 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for MBX-2.exchange.msft is 05/27/2012 19:11:49.
88 00000000 2996 Cluster.Replay ActiveManager DetermineAutomountConsensus: WMI says the boot time is 05/27/2012 19:11:49, and the boot time when the Mount bit was set was 05/27/2012 18:58:31.
89 00000000 2996 Cluster.Replay ActiveManager DetermineAutomountConsensusUnanimity: There is only one node in the cluster -- this is not sufficient to allow mounts!
90 00000000 2996 Cluster.Replay ActiveManager DetermineAutomountConsensusForSingleMachine: checking if the file share witness has restarted since the MommyMayIMount bit was set.
91 00000000 2996 Cluster.Replay ActiveManager DetermineAutomountConsensusForSingleMachine: WMI says the boot time for the FSW server is 05/27/2012 16:28:12, and the boot time when the Mount bit was set was 05/27/2012 16:28:12.
92 00000000 2996 Cluster.Replay ActiveManager DetermineAutomountConsensusForSingleMachine found matching boot timestamps, assuming that only this computer has restarted since setting the bit.

93 00000000 2996 Cluster.Replay ActiveManager AllowAutoMount called: Found matching FSW boot timestamps, assuming that only this computer has restarted since setting the bit.
94 00000000 2996 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for MBX-2.exchange.msft is 05/27/2012 19:11:49.
95 00000000 2996 Cluster.Replay ActiveManager DetermineAutomountConsensusUnanimity is returning True.
96 00000000 2996 Cluster.Replay ActiveManager RefreshConfigInternal: The Automount consensus is true.

Example #3

 

In this example, the witness server was rebooted and the Microsoft Exchange Replication service on the DAG member was restarted. The bootTimeFSWCookie does not equal the boot time of the witness server <and> the bootTimeCookie does equal the boot time of the DAG member, resulting in a DACP bit of 1.

 

263 00000000 1552 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for dc.exchange.msft is 05/27/2012 19:36:51.
264 00000000 1552 Cluster.Replay ActiveManager DetermineAutomountConsensus: checking if the replay service has restarted since the MommyMayIMount bit was set.
265 00000000 1552 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for MBX-2.exchange.msft is 05/27/2012 19:27:30.

266 00000000 1552 Cluster.Replay ActiveManager DetermineAutomountConsensus: WMI says the boot time is 05/27/2012 19:27:30, and the boot time when the Mount bit was set was 05/27/2012 19:27:30.
267 00000000 1552 Cluster.Replay ActiveManager DetermineAutomountConsensus found matching boot timestamps, assuming that the replay service has restarted since setting the bit.
268 00000000 1552 Cluster.Replay ActiveManager AllowAutoMount called: Found matching boot timestamps, assuming that the replay service has restarted since setting the bit.
269 00000000 1552 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for MBX-2.exchange.msft is 05/27/2012 19:27:30.
270 00000000 1552 Cluster.Replay ActiveManager RefreshConfigInternal: The Automount consensus is true.

Example #4

 

In this last example, both the witness server and the remaining single DAG member were restarted. Thus, the bootTimeFSWCookie does not equal the boot time of the witness server <and> the bootTimeCookie does not equal the boot time of the remaining DAG member. As such, the DACP bit remains at 0.

 

76 00000000 3664 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for dc.exchange.msft is 05/27/2012 19:47:49.
77 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensus: checking if the replay service has restarted since the MommyMayIMount bit was set.
78 00000000 3664 Cluster.Replay ActiveManager GetBootTimeWithWmi: WMI says that the boot time for MBX-2.exchange.msft is 05/27/2012 19:55:40.
79 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensus: WMI says the boot time is 05/27/2012 19:55:40, and the boot time when the Mount bit was set was 05/27/2012 19:27:30.
80 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensusUnanimity: There is only one node in the cluster -- this is not sufficient to allow mounts!
81 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensusForSingleMachine: checking if the file share witness has restarted since the MommyMayIMount bit was set.
82 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensusForSingleMachine: WMI says the boot time for the FSW server is 05/27/2012 19:47:49, and the boot time when the Mount bit was set was 05/27/2012 19:36:51.
83 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensusUnanimity is returning False.
84 00000000 3664 Cluster.Replay ActiveManager Automount consensus not reached, going to Unknown AM role.
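
When reading traces like the ones in these examples, the two "WMI says the boot time ..." comparison lines carry the information that matters. The following Python sketch pulls both timestamps out of each such line and reports whether they match. The regular expression is based only on the line formats shown in this post, and boot_time_matches is a hypothetical helper, not part of any Exchange tooling.

```python
import re
from datetime import datetime

_TS = r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
_PATTERN = re.compile(
    r"(DetermineAutomountConsensus(?:ForSingleMachine)?): WMI says the boot time "
    r"(?:for the FSW server )?is " + _TS +
    r", and the boot time when the Mount bit was set was " + _TS
)
_FMT = "%m/%d/%Y %H:%M:%S"


def boot_time_matches(trace_text: str) -> dict:
    """Map 'member' / 'witness' to True when the current boot time equals the cookie."""
    result = {}
    for source, current, cookie in _PATTERN.findall(trace_text):
        # The ForSingleMachine lines compare the witness (FSW) boot time.
        key = "witness" if source.endswith("ForSingleMachine") else "member"
        result[key] = datetime.strptime(current, _FMT) == datetime.strptime(cookie, _FMT)
    return result


# Trace lines 79 and 82 from Example #4.
EXAMPLE_4 = """\
79 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensus: WMI says the boot time is 05/27/2012 19:55:40, and the boot time when the Mount bit was set was 05/27/2012 19:27:30.
82 00000000 3664 Cluster.Replay ActiveManager DetermineAutomountConsensusForSingleMachine: WMI says the boot time for the FSW server is 05/27/2012 19:47:49, and the boot time when the Mount bit was set was 05/27/2012 19:36:51.
"""

print(boot_time_matches(EXAMPLE_4))  # {'member': False, 'witness': False}
```

Both comparisons fail for Example #4, which is why automount consensus is not reached in that trace.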

 

After a datacenter switchover that leaves only a single node in the cluster supporting the DAG, any set of reboots that changes both the boot time of the witness server and the boot time of the DAG member will prevent databases from mounting automatically. If the reboots were necessary and valid operations, administrators can force the databases online without causing split brain.

 

========================================================

Datacenter Activation Coordination Series:

 

Part 1:  My databases do not mount automatically after I enabled Datacenter Activation Coordination (https://aka.ms/F6k65e)
Part 2:  Datacenter Activation Coordination and the File Share Witness (https://aka.ms/Wsesft)
Part 3:  Datacenter Activation Coordination and the Single Node Cluster (https://aka.ms/N3ktdy)
Part 4:  Datacenter Activation Coordination and the Prevention of Split Brain (https://aka.ms/C13ptq)
Part 5:  Datacenter Activation Coordination:  How do I Force Automount Consensus? (https://aka.ms/T5sgqa)
Part 6:  Datacenter Activation Coordination:  Who has a say?  (https://aka.ms/W51h6n)
Part 7:  Datacenter Activation Coordination:  When to run start-databaseavailabilitygroup to bring members back into the DAG after a datacenter switchover.  (https://aka.ms/Oieqqp)
Part 8:  Datacenter Activation Coordination:  Stop!  In the Name of DAG... (https://aka.ms/Uzogbq)
Part 9:  Datacenter Activation Coordination:  An error caused a change in the current set of domain controllers (https://aka.ms/Qlt035)

========================================================