Someone in our environment rebooted a critical production server and no alert was received from SCOM.
Management is out for blood: they want to know why, and they want it fixed.
The issue I found is that the server is a VM, and it went down and came back up within 6 seconds.
SCOM is too slow to detect this failure, as it only polls every 60 seconds and only raises an alert on the 4th failed poll.
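To put rough numbers on that, here is a small illustrative sketch in Python. It treats the 60-second interval and a threshold of 4 consecutive failed polls as assumed inputs, not confirmed SCOM settings, and the function names are made up for this example.

```python
import math


def failed_polls(outage_start: float, outage_seconds: float,
                 poll_interval: float = 60.0) -> int:
    """Count polls (at t = 0, interval, 2*interval, ...) that land inside the outage."""
    outage_end = outage_start + outage_seconds
    first = math.ceil(outage_start / poll_interval)    # first poll at or after the outage starts
    last = math.ceil(outage_end / poll_interval) - 1   # last poll strictly before the outage ends
    return max(0, last - first + 1)


def would_alert(outage_start: float, outage_seconds: float,
                poll_interval: float = 60.0, failures_needed: int = 4) -> bool:
    """True if enough consecutive polls fail to cross the (assumed) alert threshold."""
    return failed_polls(outage_start, outage_seconds, poll_interval) >= failures_needed
```

With these assumptions a 6-second outage fails at most one poll, and usually none, so it never gets near the alert threshold. The outage would have to last roughly three minutes or more before an alert fires.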
Has anyone been able to find a suitable fix for this, apart from setting up Event ID monitors for shutdown, startup, etc.?
I have also tried the OpsLogix Ping Monitor. It is effective at alerting, but I cannot customise the alert descriptions shown in the console, which is a significant drawback for alerts going to my Level 1 support.
I am thinking of a PowerShell monitor that runs independently of the SCOM agent. If the script depends on the agent and the SCOM agent service stops for any reason (for example, when the server is being shut down), the script dies with it and I might not receive any alert.
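Roughly what I have in mind is sketched below, in Python rather than PowerShell just to show the idea. It assumes the watcher runs on a different machine from the server being watched, and that a TCP port on the server (3389 here, purely a placeholder) is a fair stand-in for "the server is up". The host name and the alert action are hypothetical too.

```python
import socket
import time
from typing import Callable, List, Tuple


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(probe: Callable[[], bool], checks: int,
          interval: float = 1.0) -> List[Tuple[float, bool]]:
    """Run the probe `checks` times and return (timestamp, is_up) for every state change."""
    changes: List[Tuple[float, bool]] = []
    previous = None  # unknown until the first probe completes
    for _ in range(checks):
        current = probe()
        if previous is not None and current != previous:
            changes.append((time.time(), current))
            # Placeholder: a real watcher would raise its alert here (email, ticket, etc.)
            print("server is", "UP" if current else "DOWN")
        previous = current
        time.sleep(interval)
    return changes


# Example call (host name and port are placeholders, not values from my environment):
# watch(lambda: is_reachable("prodserver01", 3389), checks=3600, interval=1.0)
```

Probing every second or so with a 1-second timeout should catch most 6-second outages, and because nothing here runs on the monitored server, it keeps working when the agent service is stopped. The catch is that it only proves the port answers, not that the server is healthy.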
Does anyone out here have any good ideas, or a script like this, that could help save my bacon?