CatherineJaszewski-5685 asked CatherineJaszewski-5685 commented

What causes a mailbox database copy to randomly go into the DisconnectedAndHealthy state (MSExchangeRepl error 2153)?

I have an Exchange 2019 environment with one DAG containing two Mailbox servers. One Mailbox server hosts the active database copy; the other holds the passive copy. Every 15 minutes the passive copy goes into the DisconnectedAndHealthy state and logs MSExchangeRepl error 2153. I restarted the Mailbox server that holds the passive copy, but this did not resolve the issue. Test-ReplicationHealth passes, and Get-MailboxDatabaseCopyStatus -ConnectionStatus looks fine as well. I have also verified that the firewalls are off.

Does anyone know why the passive database copy goes into the DisconnectedAndHealthy state with MSExchangeRepl error 2153?

Please advise.
Thank you,

office-exchange-server-administration

AndyDavid answered

Do the events eventually clear? How often do you back up, and when? It sounds like the network connection is being overwhelmed.

AndyDavid answered ZhengqiLou-MSFT commented

Hey there, I'm thinking network issues first.
Then antivirus, if the network is ruled out.

Hard to know however at this point. When did this start?


Hi Andy,

Always good to hear from you. I'm not sure when it started. We have some errors in Failover Cluster Manager suggesting something happened late Sunday night (Sept 12th). The Mailbox server with the active databases couldn't communicate with the other nodes, but Failover Cluster Manager is showing all nodes up, and the DAG as well. I will have our network engineer look at the network (tomorrow, as he is gone for the day), and I will start looking at the AV component.
Thank you for your response. You are always so helpful. I will let you know how we progress. Thank you,

ZhengqiLou-MSFT replied to CatherineJaszewski-5685:

Andy is right. The status DisconnectedAndHealthy means:

The mailbox database copy is no longer connected to the active database copy, and it was in the Healthy state when the loss of connection occurred. This state represents the database copy with respect to connectivity to its source database copy. It may be reported during DAG network failures between the source copy and the target database copy.
https://docs.microsoft.com/en-us/exchange/high-availability/manage-ha/monitor-dags?view=exchserver-2019#get-mailboxdatabasecopystatus-cmdlet

You could run Get-MailboxDatabaseCopyStatus | FL to check the details; hopefully you will find something useful there.
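
As a rough illustration of that check, here is a minimal sketch for the Exchange Management Shell. The server name `MBX2` is a hypothetical placeholder for the server holding the passive copy; the property names in the table are standard copy-status fields, but which ones matter will depend on your environment.

```powershell
# Hypothetical server name; replace with the server that holds the passive copy.
$server = 'MBX2'

# Full detail for every database copy on that server (FL is the alias for Format-List).
Get-MailboxDatabaseCopyStatus -Server $server | Format-List

# Narrower view: status plus copy and replay queue lengths.
Get-MailboxDatabaseCopyStatus -Server $server |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, LastInspectedLogTime -AutoSize

# Include details about the replication networks currently in use.
Get-MailboxDatabaseCopyStatus -Server $server -ConnectionStatus | Format-List
```

Running the narrower view a few times while the 2153 event is firing should show whether the copy queue grows during the disconnect and drains again afterwards.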

Best regards,
Lou

CatherineJaszewski-5685 answered CatherineJaszewski-5685 commented

Good Morning Andy,

We suspect the issue is caused by backups that are running (Backup Exec). The 2153 error appears a minute or two after a backup of a server (NAS) starts and then keeps appearing every 15-20 minutes.
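
One way to check that timing is to list the 2153 events with their timestamps and compare them against the backup job start times. This is a minimal sketch, assuming the events are written to the Application log and using `MBX2` as a hypothetical name for the passive-copy server; the one-day window is arbitrary.

```powershell
# Hypothetical server name; replace with the server that logs the 2153 events.
$server = 'MBX2'

# Pull event ID 2153 from the Application log for the last 24 hours.
# Filtering by ID only; ProviderName is shown in the output so the source can be confirmed.
Get-WinEvent -ComputerName $server -FilterHashtable @{
    LogName   = 'Application'
    Id        = 2153
    StartTime = (Get-Date).AddDays(-1)
} |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, ProviderName, Message |
    Format-List
```

If the timestamps line up with the backup window and stop once the job finishes, that supports the backup-load explanation.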

My new questions are these:
Is there a setting I can tweak to prevent the DisconnectedAndHealthy state on the passive copy from happening?
Because we are not losing any emails and the copy queue recovers, is this an event we can live with? Or is there potential for corrupting the passive database?

Please advise.

Thank you,


Hi @CatherineJaszewski-5685 ,

And how long does the backup take?
I think the backup process is taking up most of your network bandwidth or disk I/O. If this error doesn't happen again after the backup finishes, and the copy status returns to Healthy, I think you may ignore it.

As I posted above, DisconnectedAndHealthy means the database copy was healthy when it lost its connection to the active copy. So you don't have to worry about it.

Best regards,
Lou


That is what I needed to hear, Lou (and Andy).

Yes. The backup team admits there is an issue with the current backup jobs taking up bandwidth. We are not sure how long the backup will take. So my job is to perform a quick risk assessment, and I think I have my answer.
Yes. The passive database copy reconnects quickly (within seconds) and reverts to a Healthy state. And we are getting no complaints from our end users except the occasional "it's slow". I continue to monitor Server Health, Health Report, and Replication Health, and everything is as it should be.

I am leaning towards just waiting for the backup team to resolve their issue.

Thank you for all your help and comments. It is always appreciated.
