JasonChapman-8201 asked joyceshen-MSFT commented

Microsoft Exchange Transport Service won't start (keeps saying starting)

… it says starting, but fails and keeps retrying & we are panicking!

Environment

  • Windows Server 2012 R2 (Version 6.3 Build 9600) running Exchange Server 2013 (Version 15.00.1497.023) virtualised in Hyper-V.

  • This server runs all of the Exchange services

  • Server also runs WSUS

  • PDC is on another VM - all running fine

  • DNS Server is on another VM - all running fine

  • 50 Users all using Outlook (2013 or 2016), none using it at the moment

  • Processor - Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 2098 Mhz, 6 Core(s), 6 Logical Processor(s)

  • Total Physical Memory - 22.0 GB

  • Available Physical Memory - 4.15 GB

  • Server is normally rebooted monthly with updates and normally takes 10 minutes to come back to life completely

  • All external e-mail goes in and out via Mimecast cloud services

  • We run ESET Mail Security for Microsoft Exchange Server v7.1.10009.0

  • There is no inbound access to the Exchange server from the internet, only from the LAN and Mimecast

Symptom
Last week:
Whilst doing host (not the VM) updates, we saved and then resumed all servers. The "Microsoft Transport Service" showed as Starting and no e-mail was flowing. This was odd, as a save / resume shouldn't have stopped any services. We don't have amazing notes, but we restarted all servers and saw the same result. Then, about 30 minutes later, the service just seemed to start and was OK.

Today:
We were installing OMSA on the host and, as a precaution, saved the VMs (Exchange followed by the other servers).
After the VM state was Saved and then Started again, the "Microsoft Exchange Transport" service was showing as 'Running', but we received an error in the Exchange mail queue saying "Exchange can't connect to the Microsoft Exchange Transport Service". We then rebooted the server VM and the service status changed to "Starting", where it gets stuck for a period of time before showing no status and then starting again. Each time, the process ID for MSExchangeTransport.exe changes.

Event Log of failure:
14001 - MSExchangeTransport - "The worker process with process ID 20024 is not responding and will be forced to shut down."

What we've tried

  • Rebooting the server again.

  • Disabling the service so that it stopped on its next restart attempt (it looks like the service attempts to restart every 15 mins or so). We then moved all of the contents out of the ..\TransportRoles\data\Queue folder before changing the service back to Automatic and starting it. It attempted to restart, but again got stuck on Starting. The Queue folder is now populated with the same files as we removed (although mail.que is smaller).

  • Checking the Event Logs. We can't see anything in particular that may be causing the service to not start properly.

So we are stuck. Really could do with some help here.

Thanks in Advance - Jason





AndyDavid answered JasonChapman-8201 commented

"we run ESET Mail Security for Microsoft Exchange Serverv.7.1.10009.0"

Are the Exchange dirs excluded? Does it start if you disable or remove the anti-malware software?

Hi, thanks for reading.

Both mail integration and protection have been suspended, with no change.

The only real error we get is: (full below)

"AD Configuration Readers" & "Microsoft.Exchange.Transport.PoisonMessage", but I think this may be a red herring (both show an elapsed time of 14s)

Provider [Name]: MSExchangeTransport
EventID: 7010 [Qualifiers: 32772]
Level: 3
Task: 6
Keywords: 0x80000000000000
TimeCreated [SystemTime]: 2021-10-17T08:16:14.000000000Z
EventRecordID: 21599799
Channel: Application
Computer: DPExch01.XXXXX(removed).local
Security

The activation of modules are taking longer than expected to complete. Current state of components:
<LoadTimings>
<Component Name="TransportConfiguration" Elapsed="00:00:00.1201929" />
<Component Name="Microsoft.Exchange.Transport.SystemCheckComponent" Elapsed="00:00:00.0299731" />
<Component Name="Certificate, RemoteDelivery and Database components" Elapsed="00:00:00.3008514">
 <Component Name="Database and dependents" Elapsed="00:00:00.3007596">
   <Component Name="MessagingDatabase" Elapsed="00:00:00.1835170" />
   <Component Name="ResourceManager" Elapsed="00:00:00.0148119" />
   <Component Name="MessageDepot" Elapsed="00:00:00.0015975" />

Snipped, as the forum won't let me paste more than 1600 characters.
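For anyone reading a long LoadTimings dump like this, a small script can rank the components by elapsed time. The following is only a rough sketch in Python, not an Exchange tool: the names `FRAGMENT`, `parse_components` and `slowest` are made up for illustration, and it uses a regex rather than an XML parser because the pasted text is cut off and is not well-formed XML. Note that the visible fragment only contains sub-second entries; the 14s entries mentioned above were in the part that was snipped. Parent components appear to include the time of their children, so nested values should not be added together.

```python
import re
from datetime import timedelta

# Fragment copied from the event 7010 text in this thread. It is cut off
# mid-document, so it is not well-formed XML and a regex is used instead
# of an XML parser.
FRAGMENT = """
<Component Name="TransportConfiguration" Elapsed="00:00:00.1201929" />
<Component Name="Microsoft.Exchange.Transport.SystemCheckComponent" Elapsed="00:00:00.0299731" />
<Component Name="Certificate, RemoteDelivery and Database components" Elapsed="00:00:00.3008514">
 <Component Name="Database and dependents" Elapsed="00:00:00.3007596">
   <Component Name="MessagingDatabase" Elapsed="00:00:00.1835170" />
   <Component Name="ResourceManager" Elapsed="00:00:00.0148119" />
   <Component Name="MessageDepot" Elapsed="00:00:00.0015975" />
"""

# Matches a Component tag and captures its name plus the hh:mm:ss.fraction value.
COMPONENT_RE = re.compile(
    r'<Component Name="([^"]+)" Elapsed="(\d+):(\d+):(\d+)\.(\d+)"'
)


def parse_components(text):
    """Return (name, elapsed) pairs in the order they appear in the text."""
    pairs = []
    for name, hours, minutes, seconds, fraction in COMPONENT_RE.findall(text):
        # The fraction has 7 digits here; timedelta only keeps microseconds,
        # so the 7th digit is dropped.
        micros = int(fraction[:6].ljust(6, "0"))
        elapsed = timedelta(hours=int(hours), minutes=int(minutes),
                            seconds=int(seconds), microseconds=micros)
        pairs.append((name, elapsed))
    return pairs


def slowest(text, top=3):
    """Return the `top` components with the largest elapsed time."""
    return sorted(parse_components(text), key=lambda pair: pair[1], reverse=True)[:top]


for name, elapsed in slowest(FRAGMENT):
    print(f"{elapsed.total_seconds():10.3f}s  {name}")
```

The same functions can be pointed at the full event text exported from Event Viewer, which should surface the entries that are still running or that took many seconds.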



joyceshen-MSFT answered joyceshen-MSFT commented

Hi @JasonChapman-8201

I found an article that seems to discuss an issue similar to yours. Please check whether its troubleshooting steps help in your scenario.

MSExchangeTransport 7004
This event indicates that an issue may exist that prevents the Microsoft Exchange Transport service (MSExchangeTransport.exe) from starting in a timely manner. You may experience this issue in one of the following scenarios.

  • The SenderReputation database takes a long time to replay log files for a large information store database. This may indicate that the SenderReputation database is corrupted. In this scenario, MSExchangeTransport Event ID 14001 may be logged every five minutes. Additionally, a SenderReputation database event for successfully replaying log files is never displayed.

  • You apply an Exchange Update Rollup package to a computer on which the local computer account does not have Internet access. The binary files in the Update Rollup packages are digitally signed. This requires the computer to perform certificate validation checks to verify the packages. If the local computer account does not have direct access to the Internet, the certificate verification check must time-out. This issue may occur when the computer's default gateway does not allow for Internet access or when the computer uses an authenticating proxy server for Internet access.

  • An e-mail client that does not recognize the global message size restrictions is used. This may include earlier versions of Microsoft Outlook such as Microsoft Outlook 2003 SP1 and earlier versions. In this scenario, an e-mail client that does not recognize the global message size restrictions could submit excessively large messages for processing.

  • Exchange is installed on a domain controller.

  • Exchange is installed on a computer that has a slow disk subsystem.

  • An outdated version of an antivirus software is installed on the Exchange server.



Hi @joyceshen-MSFT, I'm really sorry, I replied earlier, but the response didn't seem to post:

"The SenderReputation database takes a long time to replay log files"

I need to look into this, but I don't know what a successful replay event looks like.

None of the other scenarios apply, and I uninstalled the AV today as MS requested. Still no solution and no real help from MS support today (other than being told to uninstall the AV).

What Event IDs should I be looking for?
14001 is only logged once, as I have set the service not to attempt to restart. E.g. tonight I asked it to start the service at 19:56:50.

At 19:57:06 I got a 7010, and the long delays in the detail were: <Component Name="Microsoft.Exchange.Transport.PoisonMessage" Elapsed="00:00:14.4387799" IsRunning="true" > & <Component Name="AD Configuration Readers" Elapsed="00:00:14.4388831" IsRunning="true">
Both of these have appeared before. Then came the 14001 (process terminated as it isn't responding) at 20:11:50.
But then there was a 7004 on MSExchangeTransport at 20:36:50:
The activation of all modules took longer than expected to complete. Total Load Time: 00:24:58.9219603 Total Start
<Component Name="AD Configuration Readers" Elapsed="00:24:58.1901082">
<Component Name="Microsoft.Exchange.Transport.PoisonMessage" Elapsed="00:24:58.1891092" />

No AV, No connectors active
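The timestamps in this post can be lined up with a little arithmetic. The sketch below is in Python and uses only the times quoted above; the variable names are made up for illustration, and the date is left out because the post only gives times of day. It shows that the 14001 arrived exactly 15 minutes after the start request (which matches the "every 15 mins or so" restart interval noted in the original question), and that subtracting the 7004's Total Load Time from its timestamp gives a load start roughly one second after the 14001. One possible reading is that the 7004 describes a load that began right after the first worker process was terminated, but that is an inference from one-second-resolution timestamps and is not confirmed anywhere in this thread.

```python
from datetime import datetime, timedelta


def at(hms):
    """Parse a time of day such as '19:56:50' (the date is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


# Times quoted in the post above.
events = [
    ("service start requested", at("19:56:50")),
    ("7010 warning", at("19:57:06")),
    ("14001 worker terminated", at("20:11:50")),
    ("7004 load time reported", at("20:36:50")),
]

# Offset of each event from the moment the start was requested.
start = events[0][1]
offsets = {label: timestamp - start for label, timestamp in events}

# "Total Load Time: 00:24:58.9219603" from the 7004 text (7th digit dropped).
total_load = timedelta(minutes=24, seconds=58, microseconds=921960)

# When the load reported by the 7004 would have begun, and how long after
# the 14001 that was.
load_began = at("20:36:50") - total_load
gap_after_14001 = load_began - at("20:11:50")

for label, offset in offsets.items():
    print(f"{label:26s} +{offset}")
print(f"7004 load began at {load_began.time()} ({gap_after_14001} after the 14001)")
```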


Hi,

Please try using Get-ServerHealth and Get-HealthReport to get more information about your server's health. See:
Manage health sets and server health in Exchange Server



Hi,

Is there any update on this issue so far?

AndyDavid answered JasonChapman-8201 commented

"50 Users all using Outlook (2013 or 2016), none using it at the moment"

Does this mean the server is not in production?

Can you build a new server and move mailboxes? Maybe use this to upgrade to 2016/2019?


Hi @AndyDavid

Does this mean the server is not in production?

No it means I have 50 very unhappy users! They are able to use Mimecast Continuity, which is a kind of stripped down webmail, but all e-mails are waiting to be journalled (not sure of term) into exchange when it comes back to life. They are able to use Outlook to see e-mails / public folders prior to Saturday.

It is a single Exchange server sitting on a host with a couple of other Windows servers and not a lot of headroom for CPU / RAM / disk. I don't think we have the experience / confidence to spin up another one or migrate / update to 2016/2019 just yet. Ideally we get this back to life and then look at 2016/2019 or Office 365, but most users have a 5+GB mailbox and we have 11 "mailboxes" making up, I think, about 800GB of public folders.

JAC