SCOM generates a false positive when a container volume is renamed after container redeployment or restart

py 206 Reputation points
2020-08-20T06:51:22.317+00:00
Operations Manager

Accepted answer
  1. SChalakov 10,261 Reputation points MVP
    2020-08-24T07:47:10.26+00:00

    Hi @py ,

    This is really strange. I can tell from the screenshots that the monitor does indeed target the "Logical Disk" class.
    Can you please run a simple test? In the last screenshot, "Override properties", what happens if you set Enabled to False and then tick the "Enforced" option to enforce the override?
    Does the monitor still fire for this particular disk then?

    Thanks and Regards,
    Stoyan


4 additional answers

  1. CyrAz 5,181 Reputation points
    2020-08-24T11:58:26.873+00:00

    (Continuing from the old thread on the TechNet forum)

    The proper way to handle this is not an override on a dynamically populated group. As I explained in the TechNet thread, what you really want is to prevent these volumes from being discovered in the first place, rather than discovering them and then disabling their monitoring.

    You answered that you have neither the ExcludeFileSystemName nor the ExcludeFileSystemType override for the "Discover Universal Linux Logical Disks" discovery.
    That is likely because you are still running an older version of the Microsoft.Linux.Universal.Monitoring MP: these overrides only became available with version 7.6.1064.0 (which was released with SCOM 2016 UR1, if I'm not mistaken).

    Could you check which version of the MP you are running, and update it if it's too old?
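
For the version check, a minimal sketch in Python, assuming you copy the installed MP version string from the console by hand. The function names and the sample version strings other than 7.6.1064.0 are made up for illustration; this is not a SCOM API.

```python
from typing import Tuple

# Minimum version of Microsoft.Linux.Universal.Monitoring that, according to
# the answer above, provides the ExcludeFileSystemName and
# ExcludeFileSystemType overrides.
MIN_VERSION = "7.6.1064.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '7.6.1064.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_exclude_overrides(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Return True when the installed version is at least the minimum.

    Assumes both strings have the same number of dotted parts.
    Comparing integer tuples avoids the string-comparison trap where
    '7.10' would sort before '7.6'.
    """
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Hypothetical installed versions, for illustration only.
    for version in ("7.5.1005.0", "7.6.1064.0", "7.10.0.0"):
        print(version, has_exclude_overrides(version))
```

If the function returns False for your installed version, the two Exclude overrides will probably not show up on the discovery until the MP is updated.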


  2. SChalakov 10,261 Reputation points MVP
    2020-08-20T07:55:03.21+00:00

    Hi py,

    I had the same challenge with Kubernetes containers, where many pods had exactly the same type of disk.
    First things first: there is only one community MP that monitors containers, and it is pretty specific, so there is a good chance that you cannot use it.
    That being said, there is currently no other MP for SCOM that can help you monitor containers.

    Now to the question: How to solve the issue with the disks and their dynamic names? This is what I did:

    • Create a dynamic group in SCOM containing the Linux disks whose name contains "overlay" (in my case it was "kubelet").
    • Create an override on the alerting monitor for that group (Enabled=False). This will stop the monitor from alerting.
    • Do the same for the Linux logical disk discovery: override it for the same group as well.

    This will stop the disks from being discovered and of course will stop the alerts.
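
To preview which disks a "name contains" rule would catch before building the group, here is a rough Python sketch. It only imitates the membership rule outside SCOM; the function name, the sample mount points, and the case-insensitive matching are assumptions, not SCOM behaviour.

```python
from typing import Iterable, List

# Substrings that mark container-backed volumes. "overlay" and "kubelet"
# are the examples from this thread; adjust them to your environment.
CONTAINER_MARKERS = ("overlay", "kubelet")


def select_container_disks(
    disk_names: Iterable[str],
    markers: Iterable[str] = CONTAINER_MARKERS,
) -> List[str]:
    """Return the disk names that contain at least one marker substring.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    lowered = tuple(marker.lower() for marker in markers)
    return [
        name
        for name in disk_names
        if any(marker in name.lower() for marker in lowered)
    ]


if __name__ == "__main__":
    # Hypothetical disk names, for illustration only.
    discovered = [
        "/",
        "/var",
        "/var/lib/docker/overlay2/3f9a/merged",
        "/var/lib/kubelet/pods/1234/volumes/data",
    ]
    for name in select_container_disks(discovered):
        print(name)
```

Running it against an exported list of discovered disk names gives a quick idea of whether the chosen substring is too broad or too narrow.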

    Hope I could clarify this for you!


    (If the reply was helpful please don't forget to accept as answer, thank you)
    Regards,
    Stoyan


  3. SChalakov 10,261 Reputation points MVP
    2020-08-21T06:52:43.95+00:00

    Hi py,
    Actually, if your group is populated with the disks, then the group configuration is fine.
    Now we need to check why the rule is still alerting. Most probably the rule target is different from the objects in the group.
    Can you please do me a favour and check the following:

    • In the SCOM console, go to the alert, right-click it and select Overrides.
    • Then select "Override rule", and you will get the following options:

    For all objects of class: <<Class name>>
    For a specific object of class: <<Class name>>

    I need to know exactly which class the GUI shows in these options. Is it "Logical Disk" or something else?

    Thanks


    (If the reply was helpful please don't forget to accept as answer, thank you)
    Regards,
    Stoyan


  4. py 206 Reputation points
    2020-08-24T07:32:39.59+00:00

    Hello @SChalakov

    Thanks.

    Here is another, more recent example from a different environment.

    Group

    19842-1.png

    Discovered object

    19748-1.png

    Alert

    19843-1.png

    Override

    19851-1.png

    Override properties

    19749-1.png

    Kindly advise
