JohnpCurtiss asked Crystal-MSFT commented

scom not reporting high cpu

This happens all the time, and has happened across several versions of SCOM and several versions of Windows. An agent server goes from 15% CPU to 100% CPU in about 3 seconds, which is super unhealthy. But SCOM never notices, because the agent server is too busy to let the agent tell SCOM about it, I guess? So the monitor never changes state, no alerts are generated, and no recoveries are started. Has anybody else seen this, and how do you handle it?


Here's a box that was at 100% CPU for over two days straight. No alert. Not until somebody manually went in and restarted a runaway service this morning did SCOM start seeing CPU readings again.

86979-servercpu.jpg


msc-operations-manager

Crystal-MSFT answered Crystal-MSFT edited

@JohnpCurtiss, I did some research and found a blog post from Kevin Holman that describes this situation. It seems that the monitor runs every 15 minutes and evaluates after 3 samples. The samples are not consecutive samples; they are AVERAGE samples.

Before the monitor changes state, all of the thresholds must be met. This means that even if the server is stuck at 100% CPU utilization, it will not generate an alert most of the time. More details are in the following link:
https://kevinholman.com/2017/05/13/how-does-cpu-monitoring-work-in-the-windows-server-2016-management-pack/
Note: This is a non-Microsoft link, provided for reference only.

Hope it can help.





RogerXue-3369 answered

Issue
1) an agent server goes from 15% cpu to 100% cpu in about 3 seconds and no SCOM alert

By default, SCOM uses the "Total CPU Utilization Percentage" monitor to detect high CPU utilization, but this monitor only generates an alert when both the CPU Queue Length and the CPU utilization are higher than their thresholds. So high CPU utilization alone does not trigger the alert.
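
As a rough illustration of the behaviour described above, the monitor's decision can be sketched in Python. This is not the management pack's actual implementation: the function name, the strict `>` comparisons, and the default thresholds of 95 and 15 are assumptions made for illustration. Only the averaging over 3 samples and the requirement that both CPU utilization and queue length exceed their thresholds come from this thread.

```python
from statistics import mean


def cpu_monitor_is_unhealthy(cpu_samples, queue_samples,
                             cpu_threshold=95.0,    # assumed default, overridable
                             queue_threshold=15.0,  # assumed default, overridable
                             num_samples=3):
    """Hypothetical sketch: unhealthy only when BOTH averages exceed their thresholds."""
    # Without enough samples the monitor cannot evaluate, so its state stays as it was.
    if len(cpu_samples) < num_samples or len(queue_samples) < num_samples:
        return False
    # The most recent samples are averaged rather than checked one by one.
    avg_cpu = mean(cpu_samples[-num_samples:])
    avg_queue = mean(queue_samples[-num_samples:])
    return avg_cpu > cpu_threshold and avg_queue > queue_threshold


# CPU pegged at 100% but a short processor queue: no state change.
print(cpu_monitor_is_unhealthy([100, 100, 100], [2, 3, 1]))
# CPU pegged at 100% and a long processor queue: state changes.
print(cpu_monitor_is_unhealthy([100, 100, 100], [20, 30, 25]))
```

In this sketch, a server that is too busy to deliver any samples at all also never changes state, because the function has nothing to average.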


Roger


JohnpCurtiss answered Crystal-MSFT commented

That's not it. My queue length has been set to zero via override for a very long time. My interval is also ten minutes, and samples is set to two. I get cpu alerts all the time when a server sits at 96% for ten minutes. This is a separate problem.


@JohnpCurtiss, from your description, I understand the queue length is set to 0. But for almost two days we didn't get any CPU data or alerts from the affected agent. Could you let us know whether any other alerts were raised for this agent during those two days?


@JohnpCurtiss, hope things are going well. I am writing to check whether there were any alerts during the two days. Please share that information so we can investigate further.

Thanks and have a nice day!

CyrAz answered

Don't take my word for it, but if the CPU is so high that the SCOM agent can't even collect perf metrics, it would make sense that it can't send an alert about those metrics either.
However, there may have been alerts about failed WMI queries, failed scripts, failed perf counter collection, etc., either in SCOM itself or in the Operations Manager event log.
