question

TRDx2-0049 asked ·

What is causing EventID 2153 MSRepl

I was reviewing our Application logs on our 3 Exchange 2016 servers and came across the following error message.

The log copier was unable to communicate with server 'Exchange1.Domain.com'. The copy of database 'MailDB03\Exchange1' is in a disconnected state. The communication error was: An error occurred while communicating with server 'Exchange1'. Error: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine. The copier will automatically retry after a short delay.

Our current setup is three Exchange 2016 CU16 servers. Exchange1 and Exchange2 are in our primary datacenter site. Exchange3 is in our backup datacenter site. All databases are active on Exchange1. I don't see the errors in the Application logs on either of the two nodes in our primary datacenter, only on Exchange3.

When I run the following commands I get:

Get-MailboxDatabaseCopyStatus *: All 5 databases across the three nodes are healthy. CopyQueueLength and ReplayQueueLength are 0, although ReplayQueueLength occasionally shows 1 on either of the two passive nodes (Exchange2 and Exchange3).

Get-MailboxDatabaseCopyStatus -ConnectionStatus | FT Identity,IncomingLogCopyingNetwork on Exchange2 shows

MailDB01\Exchange2 {Exchange1,MapiDagNetwork} for all five databases

On Exchange3 (DR)

MailDB01\Exchange3 {Exchange1\MapiDagNetwork, An error occurred while communicating with server 'Exchange1'. Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.}

I get the above message on all 5 databases on Exchange3.

Test-ReplicationHealth on all nodes passes all tests.

I suspect this may be related to the fact that Exchange1 and Exchange2 are on the same network while Exchange3 is on a separate network in the backup datacenter. All servers can ping each other.

I am going to do some failover testing tonight to see if there is any impact but if anyone has any ideas what this is or how to correct it please let me know.

Thanks

Tags: office-exchange-server-administration, office-exchange-server-ha

KaelYao-MSFT answered ·

Hi @TRDx2-0049,

Thanks for the detailed information.
Since "Get-MailboxDatabaseCopyStatus -ConnectionStatus | FT Identity,IncomingLogCopyingNetwork" on Exchange3 shows the error on all 5 databases, the problem may be caused by network issues between your primary and DR sites.
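A successful ping only shows that ICMP gets through, so it may be worth testing the replication TCP port itself from Exchange3. The following is a minimal sketch, assuming the DAG still uses the default replication port 64327 (not confirmed in this thread) and that Exchange1.Domain.com is the source server. The sample count and sleep interval are arbitrary illustrative values.

```powershell
# Run from Exchange3 (the DR node that logs the errors).
# Assumption: default DAG replication port. Verify with:
#   Get-DatabaseAvailabilityGroup | Format-List Name,ReplicationPort
$source = 'Exchange1.Domain.com'
$port   = 64327

# Probe the port repeatedly, since the error reportedly returns gradually
# rather than being permanent.
1..10 | ForEach-Object {
    $result = Test-NetConnection -ComputerName $source -Port $port -WarningAction SilentlyContinue
    [pscustomobject]@{
        Time      = Get-Date -Format 'HH:mm:ss'
        Reachable = $result.TcpTestSucceeded
        RemoteIP  = $result.RemoteAddress
    }
    Start-Sleep -Seconds 30
}
```

Intermittent failures here would point toward a firewall, WAN optimizer, or idle-timeout device between the sites rather than Exchange itself.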

Besides failover testing, please also try removing a database copy from Exchange3 and reseeding it as a test, if possible.

In addition, is Exchange3 using the same hardware as Exchange1 and Exchange2?
According to this case, a disk I/O problem could also be the cause.


If the response is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


TRDx2-0049 answered ·

I paused replication on each database and then resumed it, and the copies caught up without any real issues. I created a new database and seeded it to the node in question without issue. I activated the databases on Exchange3 and there were no noticeable issues. When I put Exchange3 in maintenance mode and restart it, the error goes away for about an hour. Then it slowly comes back, one database at a time.

To answer your question about hardware: all Exchange servers are virtualized. Exchange1 and Exchange2 are on the same hardware. I am not sure what hardware Exchange3 is on.

I have also seen the article you referenced. I would like to know which perfmon counters the original poster used to identify the issue. When I monitor disk read and write queue length, I don't see anything that would indicate an I/O problem. Are there other counters that would be more useful for identifying a disk issue?


"Should there be other counters that would be more useful to identify a disk issue?"
You may also check these counters:
Avg. Disk sec/Read (average disk seconds per read)
Avg. Disk sec/Write (average disk seconds per write)
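For reference, these two counters can also be sampled from PowerShell instead of the perfmon GUI. This is a rough sketch, assuming the server name Exchange3 from this thread. The 5-second interval and 12 samples are arbitrary illustrative values, and the millisecond conversion is only for readability.

```powershell
# Sample disk latency on the DR node for about one minute.
$counters = '\LogicalDisk(*)\Avg. Disk sec/Read',
            '\LogicalDisk(*)\Avg. Disk sec/Write'

Get-Counter -ComputerName 'Exchange3' -Counter $counters -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object { $_.CounterSamples } |
    Where-Object { $_.InstanceName -ne '_total' } |
    Group-Object Path |
    Select-Object Name,
        # CookedValue is in seconds; convert to milliseconds.
        @{ n = 'AvgMs'; e = { [math]::Round(($_.Group | Measure-Object CookedValue -Average).Average * 1000, 2) } },
        @{ n = 'MaxMs'; e = { [math]::Round(($_.Group | Measure-Object CookedValue -Maximum).Maximum * 1000, 2) } }
```

As a rule of thumb (an assumption here, not something stated in this thread), sustained averages above roughly 20 ms on the database or log volumes are usually worth a closer look. Sampling over a longer window that covers the time the error reappears would be more telling than a one-minute snapshot.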

Hi @TRDx2-0049,
I am writing to check how things are going now.
Please let us know if you would like further assistance.


TRDx2 replied to KaelYao-MSFT ·

Thanks for the follow-up. I have been monitoring disk performance and don't see anything that would indicate a disk issue. I opened a case with Microsoft on Tuesday and have yet to speak with an engineer about the issue. Once I find out what the issue is, I will post back with what we did to resolve it.

TRDx2 answered ·

As a follow-up, I ended up contacting Microsoft about this issue. To resolve it, we suspended the database copy and resumed it. That seems to have fixed it for now, but I am continuing to monitor.

The two errors I was getting after running:


Get-MailboxDatabaseCopyStatus * | ft Name,Status,CopyQueuelength,ContentIndexState,IncomingLogCopyingNetwork -AutoSize

Were:
{Exchange1,MapiDagNetwork,An error occurred while communicating with server 'Exchange1'. Error: The requested address is not valid in its context}
and:
{Exchange1,MapiDagNetwork,An error occurred while communicating with server 'Exchange1'. Error: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine.}

We ran:
Suspend-MailboxDatabaseCopy MailDB01\Exchange3
Resume-MailboxDatabaseCopy MailDB01\Exchange3
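Since the error appeared on all five copies on Exchange3, the same suspend and resume can be applied to every passive copy on that server in one pass. The loop below is a sketch only, not the exact procedure Microsoft support walked through. The suspend comment text is a made-up example, and the filter assumes that any copy not in the Mounted state is passive.

```powershell
# Suspend and resume every passive copy hosted on Exchange3.
Get-MailboxDatabaseCopyStatus -Server 'Exchange3' |
    Where-Object { $_.Status -ne 'Mounted' } |      # skip active copies
    ForEach-Object {
        Suspend-MailboxDatabaseCopy -Identity $_.Name -SuspendComment 'Reset log copier connection' -Confirm:$false
        Resume-MailboxDatabaseCopy -Identity $_.Name
    }

# Re-check the connection status afterwards.
Get-MailboxDatabaseCopyStatus -Server 'Exchange3' -ConnectionStatus |
    Format-Table Name,Status,CopyQueueLength,IncomingLogCopyingNetwork -AutoSize
```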

Hopefully that helps someone else with considerably fewer headaches.




Thanks for sharing!

I have converted this comment to an answer.
Please feel free to mark it as the answer to the question.
It may highlight the answer and be helpful to other community members.
Thanks for your understanding.

