SCOM Linux Platform unknown after discovery

Patrick Gemme 1 Reputation point
2020-08-18T20:31:08.583+00:00

I have a handful of Linux servers that get added to SCOM just fine and are monitored without much issue, but they stay in the Platform: Unknown and Version: Unknown group. They are a seemingly random subset running the same OSes as other servers I already have that are working fine.

If I watch the Task Status of the WSMan Probe Task during the import I can see that the OS version is discovered correctly:
OSName: Ubuntu
OSType: Linux
OSVersion: 18.04

I've reinstalled the agent and tried different agent versions with the same result. I wouldn't be concerned, but this seems to be the same group of servers whose disks aren't being monitored. When I check the Health Explorer, under Availability, the Hardware Availability Rollup shows an empty green circle.

Operations Manager

6 answers

  1. AlexZhu-MSFT 5,551 Reputation points Microsoft Vendor
    2020-08-19T08:30:03.427+00:00

    Hi,

    The problem may be related to the management packs. We can try purging all the Linux management packs and re-importing them. The following PowerShell command removes them in a batch:
    Get-SCOMManagementPack | Where-Object { $_.Name -like '*linux*' } | Remove-SCOMManagementPack
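
    Before removing anything, it may be worth previewing what the filter matches and keeping a copy. The sketch below assumes the OperationsManager module is available on a management server and that `C:\Temp\LinuxMPBackup` is a hypothetical, already existing folder; `-WhatIf` only reports what would be removed and changes nothing.

    ```powershell
    Import-Module OperationsManager

    # List the management packs the wildcard filter would match.
    $linuxMps = Get-SCOMManagementPack | Where-Object { $_.Name -like '*linux*' }
    $linuxMps | Sort-Object Name | Format-Table Name, Version, Sealed -AutoSize

    # Export copies first (hypothetical backup folder; it must already exist).
    $linuxMps | Export-SCOMManagementPack -Path 'C:\Temp\LinuxMPBackup'

    # Dry run: -WhatIf reports what would be removed without removing it.
    $linuxMps | Remove-SCOMManagementPack -WhatIf
    ```

    If the dry-run output looks right, the same pipeline can be run without `-WhatIf`, ideally in a test environment first.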


    For more details, we may refer to the below thread:
    https://social.technet.microsoft.com/Forums/exchange/en-US/d82d5169-a5fd-4fe1-ac21-3d4893bc8c76/scom-2016-platform-unknown-and-version-unknown-after-sucessful-agent-deployment?forum=operationsmanagerunixandlinux

    Hope the above information helps.

    Regards,

    Alex Zhu


  2. SChalakov 10,266 Reputation points MVP
    2020-08-19T09:05:11.007+00:00

    Hi,
    I would use only the latest MP versions and make sure that the latest Update Rollup (UR) for your SCOM version is also applied.
    If that is already the case, I would check in SCOM whether there are any failed workflows on those agents. I would also look for related events in the Operations Manager event log on all management servers that are part of your Linux monitoring resource pool.
    Please take a look here and try to run the tests Steve Weber suggests:

    scom-linux-logical-disk-health-monitor-not-working

    There are also other options for workflow debugging, but I would start with this.
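
    For the event log check mentioned above, a minimal sketch that could be run on each management server in the resource pool. The hostname `linuxhost01` and the one-day window are placeholders chosen for illustration, not values from this thread.

    ```powershell
    # Look back one day; adjust to cover the time of the last discovery.
    $since = (Get-Date).AddDays(-1)

    Get-WinEvent -FilterHashtable @{
        LogName   = 'Operations Manager'
        Level     = 2, 3          # 2 = Error, 3 = Warning
        StartTime = $since
    } -ErrorAction SilentlyContinue |
        Where-Object { $_.Message -match 'linuxhost01' } |   # hypothetical agent hostname
        Select-Object TimeCreated, Id, LevelDisplayName, ProviderName, Message |
        Format-List
    ```

    Filtering on the agent's hostname keeps the output small; dropping the `Where-Object` stage shows every warning and error in the window.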


    (If the reply was helpful please don't forget to accept as answer, thank you)
    Regards,
    Stoyan


  3. Patrick Gemme 1 Reputation point
    2020-08-19T21:00:27.267+00:00

    These servers pass the Steve Weber tests you suggested, and the tests are able to see the filesystems on them. I haven't found anything related in the Operations Manager event log on any of the SCOM servers. I'm currently on SCOM 1807 and running the 7.7.1136.0 MPs for Unix/Linux. I'm now trying TraceConfig and reading through some of those logs.

    I'm a bit hesitant to do what Alex suggests, as I have a lot of production servers there, so I'll test that technique before proceeding. I will also see what is involved in upgrading to SCOM 2019, which was on my to-do list anyway.


  4. AlexZhu-MSFT 5,551 Reputation points Microsoft Vendor
    2020-08-20T07:48:53.35+00:00

    Hi,

    Thank you for the update. Yes, for a production environment, we should take a full backup and test before making any changes.

    If we want to upgrade to 2019, we may check the following step-by-step guide.

    https://thesystemcenterblog.com/2019/03/15/upgrading-to-scom-2019-step-by-step/
    Note: this is not from MS, just for your reference.

    Hope the above information helps.

    Regards,
    Alex Zhu


  5. Patrick Gemme 1 Reputation point
    2020-09-07T15:47:31.19+00:00

    @AlexZhu-MSFT I've been able to fully upgrade to 2019 and get to the latest MPs for Linux (10.19.1082.0) but I still have the same issue with a handful of machines.

    I've upgraded the Linux agent on some of them (those show up as 1.6.4-7). For some I've uninstalled and re-discovered them. I've checked their DNS, certificate names, reverse DNS, nameservers, and hostnames, and also compared those to other production servers running the same Linux version that are working fine.

    Still no luck: many Linux servers refuse to move out of the 'unknown' platform. Any other ideas?
