How do I fix a grey agent where Health Service Watcher is Not Monitored?

Pauciloquent
2022-05-06T03:13:03.737+00:00

Hi

One of my new computers in AWS is greyed out: as soon as I approve the agent, it turns grey. This is what its Health Explorer looks like:

[Screenshot 1: Health Explorer (199389-1.png)]

[Screenshot 2 (199320-2.png)]

TCP port 5723 is open, but ping is blocked. I reinstalled the agent (version 10) and tried almost everything in the article on fixing grey agents, without success.
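
Because ping is blocked, ICMP says nothing about the path the agent actually uses. A more direct test is whether the agent can open a TCP connection to its management server on 5723. Below is a minimal Python sketch of that check, meant to be run on the agent and pointed at the management server. The script name and function name are hypothetical, and it only proves the TCP handshake succeeds, not that the management server accepts the agent.

```python
import socket
import sys

# TCP port the agent uses to reach its management server (the port reported as open above).
SCOM_AGENT_PORT = 5723


def port_is_reachable(host: str, port: int = SCOM_AGENT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Unlike ping (ICMP), this exercises the same TCP path the agent uses.
    DNS failures, refused connections, and timeouts all raise OSError subclasses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_5723.py <management-server-fqdn>
    if len(sys.argv) > 1:
        print(port_is_reachable(sys.argv[1]))
```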

What I have noticed is that Health Service Watcher is Not Monitored, and I'm not sure why.

Can anyone please help me in this regard?

@SChalakov @CyrAz


4 answers

  1. CyrAz
    2022-05-06T13:31:07.04+00:00

    Is there any chance the agent is running on a domain controller? If so, have a look at HSLockdown: https://kevinholman.com/2016/11/04/deploying-scom-2016-agents-to-domain-controllers-some-assembly-required/
    Actually, check it even if it's not a DC.


  2. Andrew Tabar
    2022-05-09T15:41:34.987+00:00

    Stop the Microsoft Monitoring Agent service.
    Go into C:\Program Files\Microsoft Monitoring Agent\Agent and rename or delete the Health Service State directory (this is called "clearing the cache").
    Restart the Microsoft Monitoring Agent service.
    Watch the Operations Manager event log; if the agent is approved, you should start to see event ID 1201 events.
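
    The clear-cache steps can be sketched in Python as below. This is a minimal sketch, not an official tool: it assumes Python 3 on the agent in an elevated prompt, the default install path from the answer, and `HealthService` as the service name behind "Microsoft Monitoring Agent" (verify with `sc query HealthService`). It renames the folder to a timestamped backup rather than deleting it, so the old state can be restored if needed.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import Optional

# Default agent folder quoted in the answer.
AGENT_DIR = Path(r"C:\Program Files\Microsoft Monitoring Agent\Agent")
STATE_DIR_NAME = "Health Service State"
# Assumption: "HealthService" is the service name behind the
# "Microsoft Monitoring Agent" display name.
SERVICE_NAME = "HealthService"


def rename_state_dir(agent_dir: Path, suffix: Optional[str] = None) -> Optional[Path]:
    """Rename the Health Service State folder (kept as a backup instead of deleted).

    Returns the backup path, or None if there was no state folder to rename.
    """
    state = agent_dir / STATE_DIR_NAME
    if not state.is_dir():
        return None
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = agent_dir / f"{STATE_DIR_NAME}.old-{suffix}"
    state.rename(backup)
    return backup


def set_service(action: str, name: str = SERVICE_NAME) -> None:
    """Stop or start the agent service with `net` (Windows only, needs an elevated prompt)."""
    if action not in ("stop", "start"):
        raise ValueError(f"unsupported action: {action}")
    subprocess.run(["net", action, name], check=True)


def clear_agent_cache(agent_dir: Path = AGENT_DIR) -> Optional[Path]:
    """Stop the service, move the state folder aside, then start the service again."""
    set_service("stop")
    try:
        return rename_state_dir(agent_dir)
    finally:
        # Restart even if the rename failed, so the agent is not left stopped.
        set_service("start")


if __name__ == "__main__" and sys.platform == "win32":
    print(clear_agent_cache())
```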

    If it's not working, check the Microsoft Monitoring Agent certificate.
    Start MMC
    Add/Remove Snap-in, and select Certificates (Computer account).
    Look in the Microsoft Monitoring Agents - Certificate folder. Make sure the certificate name exactly matches the FQDN of the server (the server name plus the "Primary DNS Suffix" shown by ipconfig /all). If it doesn't, fix the Primary DNS Suffix, delete the certificate, and clear the cache.
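
    The name comparison in this step can be sketched in Python. This is only an illustration, not the agent's own validation logic, and the function names are hypothetical. It assumes you have copied the certificate's subject name, the host name, and the Primary DNS Suffix by hand; it ignores a leading `CN=` and a trailing dot, and it compares case-insensitively because DNS names are case-insensitive (an assumption about how strict "exactly matches" needs to be).

```python
def expected_fqdn(hostname: str, primary_dns_suffix: str) -> str:
    """Join the host name and the Primary DNS Suffix (as shown by ipconfig /all)."""
    host = hostname.strip().rstrip(".")
    suffix = primary_dns_suffix.strip().strip(".")
    # With no Primary DNS Suffix, the "FQDN" is just the bare host name.
    return f"{host}.{suffix}" if suffix else host


def cert_name_matches(cert_subject_name: str, hostname: str, primary_dns_suffix: str) -> bool:
    """Return True if the certificate name equals the expected FQDN.

    Assumption: case-insensitive comparison, ignoring a leading "CN=" and a trailing dot.
    """
    name = cert_subject_name.strip()
    if name.upper().startswith("CN="):
        name = name[3:]
    return name.rstrip(".").lower() == expected_fqdn(hostname, primary_dns_suffix).lower()
```

    A certificate issued for `web01.corp.example.com` on a server whose Primary DNS Suffix is empty would not match, which is the situation the answer says to fix before re-issuing the certificate.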


  3. CyrAz
    2022-05-06T09:23:56.81+00:00

    Is there anything interesting in the agent's Operations Manager event log?


  4. SChalakov (MVP)
    2022-05-06T10:24:09.643+00:00

    Hi @Pauciloquent,

    I agree with CyrAz: check the event log immediately after restarting the service. What events are being logged?
    Another thing: in the screenshot I can see that the agent also changes state to Warning. What is the State Change Context for that? What details are logged there?
    You could also enable verbose agent tracing, but extracting useful information from it requires advanced SCOM knowledge.

    I would also filter the event log on the agent's primary management server to check for any clues there.
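
    Filtering the Operations Manager event log, on the agent right after a service restart or on the primary management server, can be scripted around Windows' built-in `wevtutil qe` command. The sketch below is illustrative only and the helper names are hypothetical. It defaults to errors and warnings (Level 2 and 3), `within_ms` limits results to the last N milliseconds, and passing `event_ids=[1201], levels=()` gives the event ID 1201 check from the clear-cache answer.

```python
import subprocess
import sys
from typing import Iterable, List, Optional


def build_wevtutil_query(
    log: str = "Operations Manager",
    event_ids: Iterable[int] = (),
    levels: Iterable[int] = (2, 3),  # 2 = Error, 3 = Warning
    within_ms: Optional[int] = None,
    max_events: int = 50,
) -> List[str]:
    """Build a wevtutil command returning the newest matching events as text."""
    clauses = []
    ids = [f"EventID={int(i)}" for i in event_ids]
    if ids:
        clauses.append("(" + " or ".join(ids) + ")")
    lvls = [f"Level={int(lvl)}" for lvl in levels]
    if lvls:
        clauses.append("(" + " or ".join(lvls) + ")")
    if within_ms is not None:
        # Only events created within the last `within_ms` milliseconds.
        clauses.append(f"TimeCreated[timediff(@SystemTime) <= {int(within_ms)}]")
    xpath = ("*[System[" + " and ".join(clauses) + "]]") if clauses else "*"
    # /rd:true lists newest events first; /f:text gives readable output.
    return ["wevtutil", "qe", log, f"/q:{xpath}", f"/c:{int(max_events)}", "/rd:true", "/f:text"]


def run_query(cmd: List[str]) -> str:
    """Run the query (Windows only) and return wevtutil's text output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__" and sys.platform == "win32":
    # Errors and warnings from the last 10 minutes, e.g. right after restarting the agent service.
    print(run_query(build_wevtutil_query(within_ms=10 * 60 * 1000)))
```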

    Regards,
    Stoyan Chalakov