TSchneider-3686 asked Awalk-1363 commented

Health service stops on 2012R2 servers on Log end to end workflow with access denied

For a couple of weeks now we have been seeing the following new behavior:

On our Windows Server 2012R2 systems the Health Service stops with the following error:

 The System Center Management Health Service 75BEBE6D-7C3B-362D-3AC7-2613679FB06F running on host JTA23007Pxxxxt and serving management group with id {A9D908C8-532E-C695-796F-F5EAF0453908} is not healthy. Some system rules failed to load.

On the affected system the following entry is in the event log:

 Failed to create process due to error '0x80070005 : Access is denied.
 ', this workflow will be unloaded. 
    
 Command executed: "C:\windows\system32\windowspowershell\v1.0\powershell.exe" -ExecutionPolicy Unrestricted -Command "& '"C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 8112\2021\LogEndToEndEvent.ps1"'"
 Working Directory: C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 8112\2021\ 
    
 One or more workflows were affected by this.  
    
 Workflow name: Microsoft.SystemCenter.AgentManagement.LogEndToEndEvent 

Out of our 850 2012R2 systems this happens every day on around 5 of them. Restarting the SCOM agent solves the issue. We have all flavours of Windows Servers (2008R2, 2016, 2019) but it only happens on the 2012R2 systems.

The issue occurs on both of our SCOM environments, running 2019 UR1 and 2019 UR3, so it is most likely not related to the SCOM agent.
I suspect that it might be caused by a recent Windows patch, as I seem to remember having seen it happen first in our dev environment.

So far I cannot recognize a pattern in the errors, and it does not recur on the same system regularly enough for me to investigate any further.


Has anybody seen something similar?

Thanks
Thorsten










msc-operations-manager

Crystal-MSFT answered Crystal-MSFT edited

@TSchneider-3686, the error message indicates that access is denied. Please go to "C:\windows\system32\windowspowershell\v1.0" and locate powershell.exe. Check whether the account that runs the Operations Manager agent (for example, Local System) has Read & execute rights on it. If it does not, grant the permission and see whether that resolves the issue.

If the issue still persists, please clear the Health Service cache to see whether that fixes it:
https://docs.microsoft.com/en-us/system-center/scom/manage-clear-healthservice-cache?view=sc-om-2019

In addition, I see that you suspect the issue may be related to Windows updates. To confirm this, you can uninstall the recent patches one by one to find the one responsible.

Hope this helps.



CyrAz answered Crystal-MSFT commented

LogEndToEndEvent.ps1 is a very basic script that uses a momapi (SCOM agent) function to log an event to the OperationsManager event log:
https://systemcenter.wiki/?GetElement=Microsoft.SystemCenter.AgentManagement.LogEvent&Type=WriteActionModuleType&ManagementPack=Microsoft.SystemCenter.2007&Version=10.19.10505.0

However, it looks like the script is not even starting ("failed to create process"), which is quite weird, and it does not even fail every time, which is weirder still.
Under what account is your agent running?
Do you somehow restrict the modification of the ExecutionPolicy to prevent it from being set to Unrestricted?
Do you see corresponding events in the Security event log?


Hi CyrAz,

This particular rule runs every 15 minutes. It works most of the time and only fails on around four servers once a day, so it cannot be permission or cache related. We are not restricting the ExecutionPolicy in any way, and all of the other monitors and rules continue to work.

The agent runs under the SYSTEM account. There are no "deny" entries in the Security log at around that time.

Because it does not happen regularly on the same servers, it is really hard to track down: I have not yet come across a single system that was affected twice. As I said, we have around 850 servers running 2012R2, and which ones are affected is random every day.
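For anyone who wants to check the "never affected twice" observation against exported alert data, a minimal Python sketch could look like the one below. It assumes the failures have already been exported as (date, host) pairs; that input format and the host names in the usage example are hypothetical, not something SCOM produces in this shape.

```python
from collections import defaultdict


def hosts_hit_on_multiple_days(events):
    """Return the hosts that appear on more than one distinct day.

    events: iterable of (date_string, host_name) pairs, one per failure.
    This input shape is an assumption made for illustration.
    """
    days_by_host = defaultdict(set)
    for day, host in events:
        # Host names are compared case-insensitively.
        days_by_host[host.lower()].add(day)
    return sorted(h for h, days in days_by_host.items() if len(days) > 1)


# Made-up sample data; replace with the real export.
sample = [
    ("2021-03-01", "SRV-A"),
    ("2021-03-01", "SRV-B"),
    ("2021-03-02", "srv-a"),
]
print(hosts_hit_on_multiple_days(sample))  # ['srv-a']
```

An empty result over several weeks of data would support the impression that the failures are spread randomly across the 2012R2 servers.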

Thanks
Thorsten


@TSchneider-3686 Just as CyrAz mentioned, the issue is strange. Could you check whether any antivirus software is installed on these machines?


@Crystal-MSFT There is no Windows Defender on those machines, only CrowdStrike Falcon. There are no other AV products.

Thanks


CyrAz answered

If it always happens around the same time, you could take a workflow trace: https://monitoringguys.com/2020/12/15/tracing-scom-workflows-with-powershell/
They usually provide a lot of useful information!

Awalk-1363 answered Awalk-1363 commented

@TSchneider-3686 Hello, I am dealing with exactly the same issue, and we have the same setup as far as AV goes. Were you able to find a resolution to this?
Any help would be greatly appreciated!


I was able to find the solution, for those interested. There is a setting in CrowdStrike's Windows prevention policy under "Sensor visibility" --> "Additional User Mode Data" (AUMD). This can occasionally cause these errors because CS is inspecting the thread injections. We noticed this primarily as an issue with Windows 2012 and COM objects.
You can toggle this off to test it out. Once the setting is toggled, you will need to restart any machines tied to that policy for the change to take effect.

Note that turning it off will limit much of the product's capabilities, so automating a restart of SCOM on those devices is the optimal solution in my opinion.
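As a rough illustration of what such an automated restart might look like, here is a minimal Python sketch. It assumes the agent service is registered as "HealthService" and that the event text has already been read by some other means; the function names are made up for this example, and how the check gets triggered (scheduled task, recovery task, and so on) is left open.

```python
import subprocess

# Fragments taken from the event text quoted earlier in this thread.
WORKFLOW = "Microsoft.SystemCenter.AgentManagement.LogEndToEndEvent"
ERROR_CODE = "0x80070005"

# Assumption: the SCOM agent service is named "HealthService".
SERVICE_NAME = "HealthService"


def needs_restart(event_message: str) -> bool:
    """True if the event describes the access-denied unload of the workflow."""
    return ERROR_CODE in event_message and WORKFLOW in event_message


def restart_agent(run=subprocess.run) -> None:
    """Stop and then start the agent service with the Windows `net` command.

    `run` is injectable so the function can be exercised without touching
    a real service; check=True raises if either command fails.
    """
    run(["net", "stop", SERVICE_NAME], check=True)
    run(["net", "start", SERVICE_NAME], check=True)
```

Calling `restart_agent()` only when `needs_restart(message)` is true keeps the restart limited to machines that actually logged the failure, and it has to run from an elevated context to be allowed to control the service.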

Good luck and hope that helps!
