SaiyadRahim-8697 asked:

SCOM Ping Monitor

Hi All,

Someone in our environment rebooted a critical production server, and no alert was received from SCOM.

Management is out for blood, wanting to know why and to have it fixed.
The issue, as far as I can tell, is that the server is a VM and it went down and came back up within 6 seconds.

SCOM is too slow to detect this failure: it only polls every 60 seconds and only alerts on a failure at the 4th polling interval.
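To illustrate why a 60-second poll can miss a 6-second reboot entirely, here is a minimal Python sketch that simulates fixed-interval polling. It assumes the monitor alerts only after a given number of consecutive failed polls; the threshold of 3 used below is an illustrative assumption, not a value read from this environment.

```python
def alert_raised(outage_start, outage_len, interval, misses_to_alert, horizon=3600):
    """Return True if fixed-interval polling sees `misses_to_alert`
    consecutive failed polls during a single outage.

    All values are whole seconds (an assumption of this sketch).
    Polls happen at t = 0, interval, 2*interval, ... up to `horizon`.
    The server is down for outage_start <= t < outage_start + outage_len.
    """
    consecutive_misses = 0
    for k in range(horizon // interval + 1):
        t = k * interval
        down = outage_start <= t < outage_start + outage_len
        # A successful poll resets the counter; only consecutive misses count.
        consecutive_misses = consecutive_misses + 1 if down else 0
        if consecutive_misses >= misses_to_alert:
            return True
    return False


# A 6-second reboot that starts between two 60-second polls is never seen.
print(alert_raised(outage_start=100, outage_len=6, interval=60, misses_to_alert=3))
# Even if the reboot straddles a poll, a single missed poll is not enough.
print(alert_raised(outage_start=118, outage_len=6, interval=60, misses_to_alert=3))
```

Both calls print `False`: unless the outage spans several consecutive polls, a polling monitor cannot alert on it, no matter how reliable the poll itself is.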

Has anyone been able to find a suitable fix for this, apart from setting up Event ID monitors for shutdown, startup, etc.?

I have also tried the OpsLogix Ping Monitor, but found that while it is effective at alerting, I cannot customise the alert descriptions shown in the console, which is a significant drawback for alerts going to my Level 1 support.

I am thinking of a PowerShell monitor that runs independently of the SCOM Agent, because if the SCOM Agent service stops for any reason (for example, shutting a server down stops the Agent service), I might not receive any alert from the script.
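One way to keep such a check independent of the monitored server's agent is to run it from a different machine. Below is a minimal Python sketch of the probe itself (Python rather than the PowerShell mentioned above, purely for illustration). It uses a TCP connection attempt instead of ICMP, because raw ICMP sockets usually need elevated privileges; the host name and port 3389 in the usage line are hypothetical assumptions.

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and name-resolution failures (socket.gaierror).
        return False


# Hypothetical target: a production server's RDP port, probed from another host.
# print(is_reachable("prod-sql01.example.local", 3389))
```

Running this every second or two from a separate host removes the dependency on the monitored server's agent, but the probe still only sees the outage if the server is down at the moment a probe runs.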

Does anyone out there have any good ideas, or such a script, that could help save my bacon?


I'd rather use a startup/shutdown PowerShell script. You can set this up via a domain (AD) or local Group Policy setting.
Within the script you could send an email to 1st-level support, noting that the system is about to shut down or has just started.
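The notification part of such a script can be sketched in Python with the standard `email` and `smtplib` modules. This is only an illustration of the idea, not the PowerShell Group Policy script described above; the SMTP relay host, port, and addresses are hypothetical assumptions.

```python
import smtplib
from email.message import EmailMessage


def build_notice(hostname, phase, sender, recipient):
    """Build a plain-text notice that `hostname` is shutting down or has started."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{phase.upper()}] {hostname}"
    msg.set_content(f"{hostname} reported a {phase} event via its Group Policy script.")
    return msg


def send_notice(msg, smtp_host, smtp_port=25):
    # Hypothetical relay: adjust host, port and authentication to your mail setup.
    with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
        smtp.send_message(msg)
```

A shutdown script would call something like `send_notice(build_notice("PRODSRV01", "shutdown", "scom@example.com", "l1-support@example.com"), "smtp.example.com")`, and a startup script the same with `"startup"`. The 10-second timeout is there so an unreachable relay does not hold up the shutdown for long.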

The system waits until the script has finished before it shuts down.

Additionally, you can create an override of the Agent heartbeat settings for the critical production server and lower the heartbeat interval to as little as 5 seconds.
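Lowering the interval helps, but a heartbeat monitor still needs several consecutive misses before it alerts. A back-of-the-envelope Python sketch, assuming an alert fires after N consecutive missed heartbeats (N is an assumption here; check the missed-heartbeat setting in your own management group):

```python
def worst_case_detection_s(heartbeat_interval_s, misses_to_alert):
    """Outage length that is guaranteed to be flagged, whatever its phase
    relative to the heartbeats: it must cover `misses_to_alert` beats."""
    return heartbeat_interval_s * misses_to_alert


def outage_guaranteed_detected(outage_s, heartbeat_interval_s, misses_to_alert):
    return outage_s >= worst_case_detection_s(heartbeat_interval_s, misses_to_alert)


print(worst_case_detection_s(60, 3))         # 180
print(worst_case_detection_s(5, 3))          # 15
print(outage_guaranteed_detected(6, 5, 3))   # False
```

Even at a 5-second interval, a 6-second reboot is shorter than the guaranteed-detection window, which is why an event-based or script-based approach is still worth having alongside the heartbeat override.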


RogerXue-3369 answered:

SCOM uses a heartbeat to determine whether an agent is up. If the heartbeat interval is set too short, there is a high chance of false alerts caused by network or communication issues. If you want to detect a server going down and coming back up within a short period such as 6 seconds, you should instead create event alert rules for server reboots.

You may create an event alert rule for each of the following events:

Event ID 41: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event ID 1074: Logged when an app (e.g., Windows Update) causes the system to restart, or when a user initiates a restart or shutdown.
Event ID 6006: Logged as a clean shutdown. It gives the message "The Event log service was stopped".
Event ID 6008: Logged as a dirty shutdown. It gives the message "The previous system shutdown at time on date was unexpected".
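How such rules might map these event IDs to alert severities can be sketched as a lookup table in Python. In SCOM the rules themselves are authored in the console or a management pack, not in Python; the severity mapping below ("unexpected" events as Critical, planned or clean ones as Warning) is a hypothetical choice for illustration.

```python
# Event IDs from the answer, with the provider that normally logs each one.
REBOOT_EVENTS = {
    41: ("Kernel-Power", "unexpected", "System rebooted without cleanly shutting down first"),
    1074: ("User32", "planned", "Restart or shutdown initiated by an app or a user"),
    6006: ("EventLog", "clean", "Event log service was stopped (clean shutdown)"),
    6008: ("EventLog", "unexpected", "Previous system shutdown was unexpected"),
}


def classify(event_id):
    """Return (severity, summary) for a tracked reboot event ID, or None otherwise."""
    entry = REBOOT_EVENTS.get(event_id)
    if entry is None:
        return None
    _provider, kind, summary = entry
    # Hypothetical policy: unexpected shutdowns page someone, planned ones only warn.
    severity = "Critical" if kind == "unexpected" else "Warning"
    return severity, summary
```

Note that these events are read from the server's own event log, so they are typically reported after the server is back up, not while it is down.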



Roger


Hi Roger,

Is there any way to get an alert from SCOM when the server actually goes down (i.e., within 3-4 seconds), rather than when the server is back up and running?

RogerXue-3369 answered:

You may consider using the free OpsLogix Ping Management Pack to ping the target host and check whether it is up or down. You can configure how many seconds to wait before sending the next ICMP ping and how many ping replies can be missed before an alert is raised.

https://www.opslogix.com/ping-management-pack
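The "missed replies before alerting" setting behaves like a consecutive-miss counter. The Python sketch below is a hypothetical illustration of that logic (not OpsLogix's implementation; the class name is made up). It raises an alert once more than the allowed number of consecutive replies are missed and resolves it on the next successful reply.

```python
class PingMonitor:
    """Tracks consecutive missed ping replies and reports health-state changes."""

    def __init__(self, misses_allowed):
        self.misses_allowed = misses_allowed
        self.consecutive_misses = 0
        self.alerting = False

    def record(self, replied):
        """Feed one ping result; return "raise", "resolve" or None."""
        if replied:
            self.consecutive_misses = 0
            if self.alerting:
                self.alerting = False
                return "resolve"
            return None
        self.consecutive_misses += 1
        # Alert only once, when the allowance of missed replies is exceeded.
        if not self.alerting and self.consecutive_misses > self.misses_allowed:
            self.alerting = True
            return "raise"
        return None


monitor = PingMonitor(misses_allowed=2)
print([monitor.record(r) for r in [True, False, False, False, False, True]])
# [None, None, None, 'raise', None, 'resolve']
```

With a wait of W seconds between pings and M missed replies allowed, the outage has to span more than M consecutive pings. For a 6-second reboot, both W and M therefore have to be small.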

Roger


Agreed, especially because it uses a managed module, so the workflow will be noticeably lighter than, for example, a scripted monitor.


Thanks, guys.

I do have that installed, and I agree it is the closest I have got to what I need.

However, in OpsLogix I can't find a way to target specific groups instead of adding hosts manually. Also, is there a way to override the alert description in the SCOM console with a custom message?

Customising the alert description in the console is important so that support teams can read and follow the instructions for each type of server when it goes down, and escalate to the correct team accordingly.


No, you can't target groups with this MP (and, more generally, you can't target groups when you create monitors or rules in SCOM).
You can, however, override the alert names and descriptions: https://kevinholman.com/2020/08/02/how-to-override-the-alert-name-and-alert-description-of-a-sealed-monitor/


Hi Cyril,

I have seen this article but have yet to try it.
What I am not sure about is how to set different descriptions, for example one for a File Server alert and a different one for SQL Server alerts.

From reading the article, it seems the same description text would apply to every server type that generates a Ping Lost alert... close, but no cigar.
Hence, if I could target this monitor to individual groups, I could override the text and target it to the appropriate groups.

It just seems like OpsLogix needs to take those extra few steps to complete this monitor, or offer a Pro version with all the features an MP requires.

SaiyadRahim-8697 answered:

How though?

There is no override option in the GUI, so it will need to be done in the XML, right?

But how do I craft the XML so that it uses "Text Description A" where the server name matches %filesvr% and "Text Description B" where the server name matches %SQL%? Is this the logic to use, or something else?
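The selection logic described in the question ("first name pattern that matches wins") can be sketched in Python. This only illustrates the matching rule. It is not SCOM override XML, and whether a SCOM override can be scoped by a name pattern in this way is not confirmed here. The rule list and the default description are hypothetical, and only the `%` wildcard is translated.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, checked in order; SQL LIKE-style % wildcards as in the question.
DESCRIPTION_RULES = [
    ("%filesvr%", "Text Description A"),
    ("%SQL%", "Text Description B"),
]
DEFAULT_DESCRIPTION = "Server is not responding to ping."  # hypothetical fallback


def like_to_glob(pattern):
    """Translate % (any run of characters) into the glob wildcard *."""
    return pattern.replace("%", "*")


def description_for(server_name, rules=DESCRIPTION_RULES, default=DEFAULT_DESCRIPTION):
    """Return the description of the first rule whose pattern matches (case-insensitive)."""
    name = server_name.lower()
    for pattern, description in rules:
        if fnmatchcase(name, like_to_glob(pattern).lower()):
            return description
    return default


print(description_for("PROD-FILESVR01"))  # Text Description A
print(description_for("prodsql02"))       # Text Description B
```

Because the first match wins, rule order matters when a name could match more than one pattern (for example, a server named `sqlfilesvr01`).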
