question

0 Votes"
Pauciloquent asked AndrewTabar-2806 answered

How do I fix a grey agent where Health Service Watcher is Not Monitored?

Hi

One of my new computers in AWS is greyed out. As soon as I approve the agent, it turns grey. This is what its Health Explorer looks like:

199389-1.png


199320-2.png



TCP port 5723 is open. Ping is blocked. I have reinstalled the agent (agent v10). I have tried almost everything in the article on how to fix grey agents, but in vain.

What I have noticed is that the Health Service Watcher is Not Monitored, and I am not sure why.

Can anyone please help me in this regard?


@StoyanChalakov @CyrAz

msc-operations-manager

0 Votes"
CyrAz answered Pauciloquent commented

Anything interesting in the agent's Operations Manager event log?


Nope. All looks good. It’s mind boggling.

0 Votes 0 ·
CyrAz avatar image CyrAz Pauciloquent ·

Even right after an agent restart? There almost always is something in the event viewer....
Or sometimes on the management server event viewer

0 Votes 0 ·

Hi @CyrAz

I have restarted the HealthService.

I can see event 1210 in the agent's event log. There are also the following two warnings:

Event ID: 26002
Level: Warning

Description:

The Windows Event Log Provider was unable to open the OperationsManager event log on computer 'scomagent.abc.com' for reading. The provider will retry opening the log every 30 seconds.

Most recent error details: The specified channel could not be found. Check channel configuration.

One or more workflows were affected by this.

Workflow name: Windows.WPM.WMIError.ID10401.Cluster.Event
Instance name: scomagent.abc.com

2nd Warning:

Event ID: 10409
Level: Warning

Object enumeration failed

Query 'select Name from Win32_Directory where name = 'Directory location'

Details: Invalid query

On the assigned management server, there are multiple events with ID 20000:

Source: OpsMgr Connector
Level: Information

Description:

A device which is not part of this management group has attempted to access this Health Service.
Requesting Device Name:

The affected device is not listed in any of these events, though.

0 Votes 0 ·
0 Votes"
StoyanChalakov answered Pauciloquent commented

Hi @Pauciloquent,

Agree with CyrAz, you need to check the event log immediately after the service restart. What events are being logged?
Another thing: in the screenshot I can see that the agent also changes state to Warning. What is the State Change Context for that? What details are logged there?
You could do verbose agent tracing, but it requires advanced SCOM knowledge to extract useful information from it.

I would also filter the event log on the agent's primary management server to check whether there are any clues there.
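
As a rough illustration of that filtering, here is a minimal Python sketch. It assumes the event 20000 descriptions have already been exported from the management server as plain text (the export step is not shown), and that each description carries a "Requesting Device Name" line like the one quoted in this thread. The function names are hypothetical.

```python
import re

# Matches the "Requesting Device Name : <name>" line of an event 20000
# description. Only spaces/tabs are skipped after the colon, so an empty
# value does not accidentally pick up text from the next line.
DEVICE_RE = re.compile(r"Requesting Device Name[ \t]*:[ \t]*(\S+)", re.IGNORECASE)


def requesting_devices(descriptions):
    """Collect the device names found in exported event 20000 descriptions."""
    names = set()
    for text in descriptions:
        match = DEVICE_RE.search(text)
        if match:
            names.add(match.group(1).rstrip(".").lower())
    return names


def agent_was_rejected(descriptions, agent_fqdn):
    """True if the agent's FQDN shows up as a requesting device."""
    return agent_fqdn.lower() in requesting_devices(descriptions)
```

If the agent's FQDN never shows up in the result, the 20000 events are probably about other machines and not the cause of the grey state.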

Regards,


(If the reply was helpful please don't forget to upvote and/or accept as answer, thank you)

Regards
Stoyan Chalakov


Hi

Thanks heaps for the reply.

Regarding restarting the health service and the events, I commented on CyrAz's reply.

Interestingly, I can see the performance view of the affected agent. Even the Availability report is green.

199702-image.png


199606-image.png



But the agent is greyed out. We are running SCOM 2019. The same thing is now happening on two other agents that I just approved with a different primary management server. All three affected agents are in different time zones than their primary management servers, though.

0 Votes 0 ·
1 Vote"
CyrAz answered CyrAz edited

Any chance the agent is running on a domain controller? In that case, have a look at hslockdown : https://kevinholman.com/2016/11/04/deploying-scom-2016-agents-to-domain-controllers-some-assembly-required/
Have a look at it even if it's not a DC, actually


Hi @CyrAz and @StoyanChalakov

So, I fixed it. One of the management servers was greyed out. As soon as I fixed that management server, the three greyed-out agents turned green right away. I would call it an accidental fix, but I want to know the reason. That greyed-out MS wasn't the primary management server of any of the affected agents, so how could it affect the status of other agents in the console? Any idea?

1 Vote 1 ·
CyrAz avatar image CyrAz Pauciloquent ·

I can only make assumptions, but I guess that the affected agents had failed over to that MS at some point.
I'm not too sure what the criteria are for an agent to fail over; maybe TCP 5723 being unresponsive, which could explain why the agents didn't fail over to another MS despite this one being greyed out.
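
To see whether TCP 5723 on a given management server answers from the agent's side, something like the following Python sketch could be used. It only tests TCP reachability, so a management server can pass this check and still be greyed out, which would fit the guess above. The function name and the default timeout are illustrative choices.

```python
import socket


def tcp_port_open(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `tcp_port_open("ms01.abc.com")` (a made-up server name) from the agent machine for each management server in the failover list shows which ones still accept connections on 5723.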

0 Votes 0 ·
1 Vote"
AndrewTabar-2806 answered

Stop the Microsoft Monitoring Agent service.
Go to C:\Program Files\Microsoft Monitoring Agent\Agent and rename or delete the Health Service State directory
(this is called "clearing the cache").
Restart the Microsoft Monitoring Agent service.
Watch the Operations Manager event log; you should start to see event ID 1201 events if the agent is approved.
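
The cache-clearing steps above could be scripted roughly as follows. This is only a sketch: it assumes the default install path mentioned above, that the Microsoft Monitoring Agent service has the short name HealthService, and that the script runs elevated on the agent. It renames the directory instead of deleting it, so the old cache can still be inspected. The function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path

# Assumptions: default install path and the service short name "HealthService".
AGENT_DIR = Path(r"C:\Program Files\Microsoft Monitoring Agent\Agent")
SERVICE_NAME = "HealthService"


def _net(action: str) -> None:
    """Run 'net stop' or 'net start' for the agent service (needs elevation)."""
    subprocess.run(["net", action, SERVICE_NAME], check=True)


def clear_agent_cache(agent_dir=AGENT_DIR,
                      stop_service=lambda: _net("stop"),
                      start_service=lambda: _net("start")):
    """Stop the agent, rename 'Health Service State', start the agent again.

    Returns the path of the renamed directory, or None if there was no cache.
    """
    state_dir = Path(agent_dir) / "Health Service State"
    backup = None
    stop_service()
    try:
        if state_dir.exists():
            stamp = time.strftime("%Y%m%d-%H%M%S")
            backup = state_dir.with_name(f"Health Service State.old-{stamp}")
            state_dir.rename(backup)
    finally:
        # Start the service again even if the rename failed.
        start_service()
    return backup
```

The stop and start actions are passed in as callables so the rename logic can be tried against a scratch directory before it is pointed at a real agent.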

If that doesn't work, check the Microsoft Monitoring Agent certificate.
Start MMC.
Add/Remove Snap-in, select Certificates - Computer Account.
Look in the Microsoft Monitoring Agents - Certificate folder. Make sure the certificate name exactly matches the FQDN of the server (the server name plus the "Primary DNS Suffix" as shown by ipconfig /all). If it doesn't, fix the Primary DNS Suffix, delete the cert, and clear the cache.
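
The name comparison in that last step can be expressed as a small Python check. It is a sketch that takes the certificate name, host name and Primary DNS Suffix as strings you have already read off the MMC snap-in and ipconfig /all; it does not read the certificate store itself. Comparing case-insensitively is an assumption made here, and the function names are hypothetical.

```python
def expected_fqdn(hostname: str, primary_dns_suffix: str) -> str:
    """Build the FQDN from the host name and the Primary DNS Suffix."""
    host = hostname.strip().rstrip(".")
    suffix = primary_dns_suffix.strip().strip(".")
    fqdn = f"{host}.{suffix}" if suffix else host
    return fqdn.lower()


def cert_matches_fqdn(cert_name: str, hostname: str, primary_dns_suffix: str) -> bool:
    """Compare the certificate name (with or without 'CN=') to the FQDN."""
    name = cert_name.strip()
    if name.upper().startswith("CN="):
        name = name[3:]
    # Assumption: DNS names are compared case-insensitively.
    return name.strip().rstrip(".").lower() == expected_fqdn(hostname, primary_dns_suffix)
```

For example, `cert_matches_fqdn("CN=scomagent.abc.com", "SCOMAGENT", "abc.com")` is true, while a certificate issued to the short name `scomagent` would not match.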
