HowardGyton-6093 asked · ZhengqiLou-MSFT commented

Exchange 2016 DAG okay, but rebuilt node missing in Failover Cluster

Hi,

We recently rebuilt one of our Exchange servers and have come across an issue with Windows Failover Clustering, rather than with the Exchange side of things. Once the server had been rebuilt, we added that node back into the DAG via the Exchange console. We then re-seeded the passive database copies. All of that worked okay, but we get failures when we test the replication health.

It looks like the process of adding the server to the DAG installed the clustering service, but we were not told that it was waiting for a server restart to complete, so we didn't restart. I suspect that is why Windows Failover Clustering only shows a single node. When I attempt to add the newly built node to that cluster, it fails, stating that the node is already part of the cluster.
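
One way to check the state described above is to look at the Failover Clustering feature, the Cluster service, and a pending-restart marker on the rebuilt node. This is only a minimal sketch, assuming an elevated PowerShell session on SERVER1. `ClusSvc` is assumed to be the Cluster service name, and the registry key shown is one common indicator of a pending restart, not an exhaustive check.

```powershell
# Run on the rebuilt node (SERVER1 in this thread), elevated.

# Is the Failover Clustering feature fully installed, or still pending?
Get-WindowsFeature -Name Failover-Clustering |
    Select-Object Name, InstallState

# Current state and startup type of the Cluster service (name assumed: ClusSvc).
Get-Service -Name ClusSvc |
    Select-Object Name, Status, StartType

# One common pending-restart indicator; other indicators exist.
Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending'
```

An `InstallState` other than `Installed`, or `True` from the last command, would be consistent with the missed restart suspected here.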

Running the following command shows:

cluster /cluster:DAG02 /add /node:SERVER1

Configuring node SERVER1
---------------------------------------
12% Validating cluster state on node SERVER1.This phase encountered an error for Cluster object 'Node SERVER1 appears to be a member of a cluster. It is either a member of an existing cluster or the node was not cleaned up after being evicted from a cluster. If you are sure this is not a member of a cluster run the Remove-ClusterNode cmdlet with the -Force parameter to clean up the cluster information from the node and then try to add it to the cluster again.' but will continue. The error status is 5065 (0x000013C9).
This phase has failed for Cluster object 'SERVER1' with an error status of 5065 (0x000013C9).
This phase has failed for Cluster object 'SERVER1' with an error status of 5065 (0x000013C9).
Cleaning up SERVER1.

System error 5065 has occurred (0x000013c9).
The cluster node is already a member of the cluster.

cluster node

Listing status for all available nodes:

Node            Node ID  Status
--------------  -------  ---------------------
SERVER2         2        Up


Checking the database copy status on SERVER1:

Get-MailboxDatabaseCopyStatus -Server SERVER1

Name               Status   CopyQueue  ReplayQueue  LastInspectedLogTime  ContentIndex
                            Length     Length                             State
----               ------   ---------  -----------  --------------------  ------------
EDB AC 01\SERVER1  Healthy  0          0            16/03/2021 09:50:05   Healthy
EDB DG 01\SERVER1  Healthy  0          0            16/03/2021 09:50:21   Healthy
EDB HJ 01\SERVER1  Healthy  0          0            16/03/2021 09:49:47   Healthy
EDB KM 01\SERVER1  Healthy  0          0            16/03/2021 09:49:11   Healthy
EDB NR 01\SERVER1  Healthy  0          0            16/03/2021 09:47:09   Healthy
EDB SZ 01\SERVER1  Healthy  0          0            16/03/2021 09:49:48   Healthy


And on SERVER2:

Get-MailboxDatabaseCopyStatus -Server SERVER2

Name               Status   CopyQueue  ReplayQueue  LastInspectedLogTime  ContentIndex
                            Length     Length                             State
----               ------   ---------  -----------  --------------------  ------------
EDB DG 01\SERVER2  Mounted  0          0                                  Healthy
EDB AC 01\SERVER2  Mounted  0          0                                  Healthy
EDB HJ 01\SERVER2  Mounted  0          0                                  Healthy
EDB KM 01\SERVER2  Mounted  0          0                                  Healthy
EDB NR 01\SERVER2  Mounted  0          0                                  Healthy
EDB SZ 01\SERVER2  Mounted  0          0                                  Healthy

I'm not sure how to proceed here.

I don't know whether it would be safe to run the suggested command, "Remove-ClusterNode SERVER1 -Force", to clean up the metadata and then attempt to re-join the node to the failover cluster without upsetting anything else on the Exchange side.

I also don't know whether running "Clear-ClusterNode" on the affected node would help and allow me to add this node back into the "DAG02" cluster.

office-exchange-server-administration windows-server-2016 windows-server-clustering

HowardGyton-6093 answered

It looks like it was much simpler than we thought. For some reason, when I added the rebuilt server into the DAG, it was not automatically joined to the Failover Cluster as you suggested it should be. While I was looking for the cause and came across the message about the pending reboot, I noticed that the clustering service was set to Disabled. I switched it to Automatic after the reboot, but then found that manually adding the node to the cluster failed.

A colleague found that if you switch the service back to Disabled, the node can then join the Failover Cluster! It looks like the message I saw about it being a member of an existing cluster is misleading; it really should report that the service is not set to Disabled.
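
The fix described above could be sketched roughly as follows. The thread does not say which tool was used for the final join, so `Add-ClusterNode` is used here only as an assumed PowerShell counterpart of the `cluster /add` command from the question, and `ClusSvc` is assumed to be the Cluster service name. Lou's comments in this thread suggest that Exchange normally performs the join itself when a server is added to the DAG, so treat this as an illustration rather than a recommended procedure.

```powershell
# On SERVER1, elevated: set the Cluster service back to Disabled,
# which is the state that allowed the join in this thread.
Set-Service -Name ClusSvc -StartupType Disabled

# Retry the join (assumed equivalent of: cluster /cluster:DAG02 /add /node:SERVER1).
Add-ClusterNode -Cluster DAG02 -Name SERVER1

# Confirm that both nodes are now listed by the cluster.
Get-ClusterNode -Cluster DAG02 | Select-Object Name, State
```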



ZhengqiLou-MSFT answered · ZhengqiLou-MSFT commented

Hi @HowardGyton-6093 ,

Good day!

Please run the following cmdlets to check the DAG membership. If Server1 is listed, remove it and then add it again; if it is not listed, simply try adding it.

 Get-DatabaseAvailabilityGroup
 Remove-DatabaseAvailabilityGroupServer -Identity "DAGName" -MailboxServer Server1
 Add-DatabaseAvailabilityGroupServer -Identity "DAGName" -MailboxServer Server1

If this doesn't work, you should run Remove-ClusterNode. You don't have to worry about data loss: this command only removes the node from the cluster, much like removing a member from a DAG.

I think you will be able to add the server after removing the node.

Regards,
Lou




Hi Lou,

Thanks for responding!

Get-DatabaseAvailabilityGroup

Name   Member Servers      Operational Servers
----   --------------      -------------------
DAG02  {SERVER2, SERVER1}

I think the Exchange side is fine, and fully replicated.
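
To compare what Exchange and the cluster each believe about membership, something like the following may help. It is a sketch only, using the DAG02 name from this thread. Note that `Get-DatabaseAvailabilityGroup` leaves real-time properties such as `OperationalServers` empty unless `-Status` is supplied, which would explain the blank column above.

```powershell
# Exchange Management Shell: -Status populates real-time properties
# such as OperationalServers.
Get-DatabaseAvailabilityGroup -Identity DAG02 -Status |
    Format-List Name, Servers, OperationalServers

# FailoverClusters module: the nodes the cluster itself knows about.
Get-ClusterNode -Cluster DAG02 | Select-Object Name, State
```

A server that appears under `Servers` but is missing from the `Get-ClusterNode` output would match the mismatch described in the question.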

Essentially, the server was rebuilt after I had removed all the passive copies from it and then removed it from the DAG. But I did not remove it from the Failover Cluster, so I suspect it is that metadata that needs clearing up, as the cluster still believes the server is a member.

I think I will try your suggestion of either Remove-ClusterNode or Clear-ClusterNode, and see if it lets me add the node back to the cluster.

I'll respond with my results.

Thanks again.



Hi Howard,

Thanks for sharing that information!

From the result we can see that SERVER1 is still in the DAG. Because the DAG relies on Windows Failover Clustering underneath, I suggest removing SERVER1 from the DAG and then trying to re-join it.

Remove-ClusterNode is another way out, but you don't have to configure anything in Failover Cluster Manager or other tools: after joining the DAG, the member server will be added to the cluster automatically.

Regards,
Lou

HowardGyton-6093 answered · ZhengqiLou-MSFT commented

Both the DAG and the failover cluster are now healthy!
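
For anyone wanting to confirm the same result, a check along these lines should work from the Exchange Management Shell. It is a sketch using the server names from this thread, and the property selection is just one reasonable choice.

```powershell
# Replication health checks for each DAG member.
Test-ReplicationHealth -Identity SERVER1
Test-ReplicationHealth -Identity SERVER2

# Copy status for the rebuilt node, reduced to the key columns.
Get-MailboxDatabaseCopyStatus -Server SERVER1 |
    Select-Object Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState
```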


Great!

Nothing could be better than fixing this issue so quickly!
You can mark your answer as accepted so that it helps others who have the same issue.

Best regards,
Lou
