SCOM 2012 R2 - Linux Servers going offline

Anchal gupta 21 Reputation points
2020-09-24T06:57:28.45+00:00

Hi Team,

Last week we migrated our management servers and database servers to a new datacenter. After the migration we noticed something very strange with Linux server monitoring: some of the servers went offline.

We did some troubleshooting and found no issue with the firewall or ports, since we were able to reinstall the agent on those Linux servers.

The main issue is this: on the servers that are offline, reinstalling the agent succeeds and the server turns healthy, but after a few minutes it goes offline again after raising a heartbeat failure alert.

We checked Event Viewer and saw some events related to a profile:
[Screenshot of the event: 27876-image.png]

After seeing this event, we created a group of all the affected servers and distributed the account to them again. The event no longer appears in Event Viewer, but that made no difference to the problem: the servers are still offline.

I need urgent help on this.

Thanks to all of you in advance.

Operations Manager

2 answers

  1. SChalakov 10,276 Reputation points MVP
    2020-09-24T08:10:47.363+00:00

    Hi @Anchal gupta,

    Can you please check again how your profiles are configured, and whether all the Linux-related profiles are mapped to the proper account and targeted correctly? I have seen this error multiple times, and it has always been related to account targeting and distribution. So please ensure that:

    • the Linux accounts are distributed to the members of your Linux monitoring resource pool (account distribution security: More secure)
    • the accounts are mapped to the proper profiles under Run As Profiles.
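    The two checks above boil down to set arithmetic: every Linux profile needs an account, and every account's distribution list must cover every member of the Linux monitoring pool. Below is a minimal Python sketch of that cross-check, assuming you copy the data by hand from the console (Administration > Run As Configuration). The account names, server names (MS01, MS02) and dictionaries are hypothetical, and the sketch ignores per-class or per-group targeting of the profile mappings.

```python
"""Sketch: cross-check Run As profile mappings against account distribution.

All input data below is HYPOTHETICAL and must be copied from the SCOM
console by hand. Profile targeting (class/group/object) is not modelled.
"""

# Profile name -> Run As account mapped to it (None if nothing is mapped).
PROFILE_MAPPINGS = {
    "UNIX/Linux Action Account": "linux-monitor",
    "UNIX/Linux Privileged Account": "linux-priv",
    "UNIX/Linux Agent Maintenance Account": None,
}

# Run As account -> servers it is distributed to ("More secure" distribution).
ACCOUNT_DISTRIBUTION = {
    "linux-monitor": {"MS01", "MS02"},
    "linux-priv": {"MS01"},
}

# Members of the resource pool that monitors the Linux agents (hypothetical).
LINUX_POOL_MEMBERS = {"MS01", "MS02"}


def find_problems(mappings, distribution, pool_members):
    """Return one human-readable line per mapping or distribution gap."""
    problems = []
    for profile, account in sorted(mappings.items()):
        if account is None:
            problems.append(f"{profile}: no Run As account mapped")
            continue
        # Pool members the account has NOT been distributed to.
        missing = pool_members - distribution.get(account, set())
        for server in sorted(missing):
            problems.append(f"{profile}: account {account} not distributed to {server}")
    return problems


if __name__ == "__main__":
    for problem in find_problems(PROFILE_MAPPINGS, ACCOUNT_DISTRIBUTION, LINUX_POOL_MEMBERS):
        print(problem)
```

    With the sample data it reports the unmapped maintenance profile and the privileged account missing from MS02; an empty result means mapping and distribution look consistent for the pool.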

    Regards,
    Stoyan


  2. AlexZhu-MSFT 5,551 Reputation points Microsoft Vendor
    2020-09-25T05:11:11.067+00:00

    Hi,

    Since the problem started after the migration and no Operations Manager settings were changed, it may be related to the network.

    Besides the SSH port (22 by default), the Linux agent also requires TCP port 1270.

    On the Linux side, we can check whether omiserver is listening on port 1270 and whether the firewall allows it; see the example below.

    [Screenshot: 28253-scom-linux-wsman.png]
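    The listening check on the Linux side can also be scripted. This is a minimal Python sketch, assuming a Linux host, that reads the kernel socket tables /proc/net/tcp and /proc/net/tcp6 and reports whether any socket is in LISTEN state on port 1270. It only shows that the port is listening, not which process owns it; `ss -ltnp` or `netstat -ltnp` run as root show whether the owner is omiserver. The function names are illustrative, not part of any SCOM tooling.

```python
"""Sketch: check on a Linux agent whether anything listens on TCP 1270.

Linux only: parses /proc/net/tcp and /proc/net/tcp6. Does not identify
the owning process (use `ss -ltnp` as root for that).
"""
from pathlib import Path

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp


def listening_ports(table_text):
    """Return the set of local ports in LISTEN state from /proc/net/tcp text."""
    ports = set()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == TCP_LISTEN:
            # local_address is "HEXADDR:HEXPORT"; the port is after the last colon.
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def is_listening(port, tables=("/proc/net/tcp", "/proc/net/tcp6")):
    """True if any existing socket table shows `port` in LISTEN state."""
    for table in tables:
        path = Path(table)
        if path.exists() and port in listening_ports(path.read_text()):
            return True
    return False


if __name__ == "__main__":
    print("port 1270 listening:", is_listening(1270))
```

    If this prints False, the agent service is not listening and the problem is on the Linux host; if it prints True, the next step is the firewall and the network path.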

    If port 1270 is listening, we can use Test-NetConnection on the management server to check whether we can connect to it. If the connection fails, ask the network team for help. See the example below.

    [Screenshot: 28272-scom-linux-wsman-01.png]
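    Test-NetConnection with `-Port` essentially attempts a TCP handshake. The following minimal Python sketch does the same from the management server against both ports mentioned above, so several agents can be checked in a loop. The host name is hypothetical, and a successful connect only proves the path is open; it does not verify WS-Management authentication or certificates.

```python
"""Sketch: TCP reachability test from a management server to a Linux agent.

Roughly what `Test-NetConnection <host> -Port 1270` reports as
TcpTestSucceeded. The host name in __main__ is HYPOTHETICAL.
"""
import socket

SCOM_LINUX_PORTS = {22: "SSH", 1270: "WS-Management (omiserver)"}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False


if __name__ == "__main__":
    host = "linuxserver01.example.com"  # hypothetical agent host name
    for port, purpose in sorted(SCOM_LINUX_PORTS.items()):
        state = "open" if tcp_port_open(host, port) else "NOT reachable"
        print(f"{host}:{port} {purpose}: {state}")
```

    Run it from each member of the Linux monitoring resource pool, since after a datacenter move the path may be open from one management server and blocked from another.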

    Hope the above information helps.

    Alex Zhu

