Troubleshooting Cluster Discovery in Operations Manager

In this post, we'll focus on cluster discovery issues. Troubleshooting most discovery issues in SCOM is fairly easy: we look up the discovery in the management pack (MP), find its data source (usually a VBScript), and try running it manually. Cluster discovery is not that straightforward. It is carried out by cluster modules built into SCOM, and unlike the discovery logic in management packs, the inner workings of these modules are not exposed. Hence we have to rely on the events logged in ETL traces.

The most common symptom I hear from customers is that the SQL instances hosted in their cluster instances aren't discovered. When troubleshooting these cases, I've found that the root cause usually turns out to be cluster discovery.

Two types of cluster discovery issues are common:

  1. There are many cluster instances on the server, but only one shows up as discovered.
  2. None of the cluster instances on the server is discovered.

(The procedure to check for each has been covered below.)

Before we go through either scenario, there are a few common troubleshooting steps to follow. Start by checking the following:

  1. Do the nodes that host the cluster have SCOM agents installed?
  2. Do they show up as monitored in the SCOM console? (Healthy, Warning, or Critical is OK; greyed out is not OK.)
  3. Do you have the latest Windows Core OS (Base OS) management packs?
  4. Is Agent Proxy enabled for the nodes that host the cluster? (See: https://technet.microsoft.com/en-us/library/hh264858.aspx?f=255&MSPPError=-2147217396)

Great! We should be all set now. Open the Agentless Managed view in the Administration pane and look for the cluster you are trying to discover in the list displayed there. Do you see your cluster? If not, go back through the four checks above and make sure all of them are met. If you have only just completed them and your cluster is not listed yet, wait a few more minutes and refresh the view.

If the cluster shows up under Agentless Managed, next check whether the individual cluster instances (services) are discovered. Navigate to the Monitoring pane and choose Discovered Inventory. In the right pane, choose Change Target Type, select Windows Cluster Service (For Virtual Server), and click OK. Now type in the cluster name and note how many instances show up. If all your cluster services are listed, the steps above did the trick for you. If one or more of them are missing, read on.

From here, we split into two tracks, one for each scenario.

SCENARIO 1: I see only one cluster instance in the Discovered Inventory (Windows Cluster Service for Virtual Server) view

As I mentioned, this is a very common scenario, and the troubleshooting is not complicated. Follow these steps:

  1. Open the SCOM console and navigate to the Authoring pane.
  2. Select Object Discoveries under Management Pack Objects.
  3. If you find that the view has already been scoped, remove the scope.
  4. Now, look for the following discovery: Windows Clustering Discovery
  5. If you find more than one, choose the one listed under Windows Cluster or Virtual Server.
  6. Right-click the discovery and choose Overrides > Override the Object Discovery.
  7. The next context menu option is up to you. To override the discovery for all servers, choose the first option; for a more granular setting, choose whichever option suits your environment.
  8. Three overridable parameters are shown, and Multiple Servers Discovery is one of them.
  9. Enable the override for this parameter and change its value from False to True.
  10. Scroll to the far right and select the Enforced option.
  11. Click Apply, then OK.

From here on, it can take a few minutes. Grab some coffee; when you come back, the other cluster instances should be discovered as well. If you are waiting for the SQL objects that reside in the cluster, give it some more time.

SCENARIO 2: I don’t see any cluster instances showing up in the view

To troubleshoot this, we need to collect ETL traces.

  1. Log on to one of the nodes that host the cluster and open an elevated command prompt.
  2. Run the following commands at the command prompt:

cd C:\Program Files\<operations manager folder \server>\tools

StopTracing.cmd

StartTracing.cmd VER

  3. Restart the Microsoft Monitoring Agent service from Services.msc.
  4. Let the trace run for 15 minutes, then run the following commands:

StopTracing.cmd

FormatTracing.cmd

  5. Once formatting completes, you will find the logs under C:\Windows\Logs\OpsMgrTrace.
  6. Look for the file TracingGUIDSNative.log.
  7. Open the log and search for the following line:

[ModulesClusterLibrary][][Error][][CMOMClusterMonitorDiscovery::NotificationImpl]      [MOMClusterMonitorDiscovery_cpp292]Some of the Virtual server critical (or key) properties is (are) NULL
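If you need to repeat this capture on several nodes, the procedure can be scripted. What follows is a minimal Python sketch, not an official tool. `trace_commands`, `run_trace`, `decode_log`, `find_cluster_errors` and `scan_log` are hypothetical helper names. The tools folder must be passed in, because the post leaves that path as a placeholder. The sketch swaps the Services.msc restart for `net stop`/`net start HealthService` (HealthService is the service name of the Microsoft Monitoring Agent). It also assumes the formatted log is either UTF-16 with a byte-order mark or UTF-8 text.

```python
import subprocess
import time
from pathlib import Path

# Message text copied from the error line quoted above.
CLUSTER_ERROR_TEXT = "Some of the Virtual server critical (or key) properties is (are) NULL"

# Output location named in the steps above.
DEFAULT_LOG = Path(r"C:\Windows\Logs\OpsMgrTrace\TracingGUIDSNative.log")


def trace_commands(tools_dir):
    """Return (start_steps, stop_steps) mirroring the manual procedure (hypothetical helper)."""
    tools = Path(tools_dir)
    start_steps = [
        [str(tools / "StopTracing.cmd")],          # clear any trace that is already running
        [str(tools / "StartTracing.cmd"), "VER"],  # start a verbose trace
        ["net", "stop", "HealthService"],          # restart the Microsoft Monitoring Agent
        ["net", "start", "HealthService"],
    ]
    stop_steps = [
        [str(tools / "StopTracing.cmd")],
        [str(tools / "FormatTracing.cmd")],        # produces the readable .log files
    ]
    return start_steps, stop_steps


def _run_all(steps, cwd):
    """Run each command in order; report non-zero exit codes instead of raising."""
    for cmd in steps:
        result = subprocess.run(cmd, cwd=cwd)  # cwd mirrors the `cd` into the tools folder
        if result.returncode != 0:
            print(f"warning: {cmd[0]} exited with code {result.returncode}")


def run_trace(tools_dir, capture_minutes=15):
    """Capture a verbose trace on a cluster node (Windows, elevated session assumed)."""
    start_steps, stop_steps = trace_commands(tools_dir)
    _run_all(start_steps, tools_dir)
    time.sleep(capture_minutes * 60)  # let the agent re-run its discoveries
    _run_all(stop_steps, tools_dir)


def decode_log(raw):
    """Decode log bytes: UTF-16 if a byte-order mark is present, else UTF-8 (assumption)."""
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return raw.decode("utf-16", errors="replace")
    return raw.decode("utf-8-sig", errors="replace")


def find_cluster_errors(lines):
    """Return the lines that contain the cluster discovery error message."""
    return [line.rstrip("\r\n") for line in lines if CLUSTER_ERROR_TEXT in line]


def scan_log(path=DEFAULT_LOG):
    """Read the formatted trace file and return any matching error lines."""
    return find_cluster_errors(decode_log(Path(path).read_bytes()).splitlines())
```

On a node, calling `run_trace(...)` with your tools folder (it blocks for the capture window) and then `print(scan_log())` reproduces the manual capture and search; an empty list means the error line was not found in that trace. Matching on the message text rather than the whole line keeps the search independent of the whitespace between the bracketed fields.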

This line indicates that cluster discovery ran, but while parsing the cluster resources it encountered one or more orphan entries whose key properties are NULL. To resolve the discovery issue, clear the orphaned Windows Cluster entries from the registry.
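Before clearing anything, it can help to list which resource entries look incomplete. The sketch below is read-only and rests on assumptions this post does not state. First, it assumes the cluster configuration is visible under `HKEY_LOCAL_MACHINE\Cluster\Resources`, with one subkey per resource carrying `Name` and `Type` values. Second, it treats an entry missing either value as worth a closer look. That heuristic is hypothetical and is not the check SCOM's cluster module actually performs. `incomplete_resources` and `read_cluster_resources` are illustrative names.

```python
# Hypothetical heuristic: resource entries lacking these values are flagged for review.
REQUIRED_VALUES = ("Name", "Type")


def incomplete_resources(resources, required=REQUIRED_VALUES):
    """Return IDs of resource entries where any required value is missing or empty."""
    return sorted(
        res_id
        for res_id, values in resources.items()
        if any(not values.get(name) for name in required)
    )


def read_cluster_resources():
    """Read HKLM\\Cluster\\Resources into {subkey: {value_name: data}} (Windows only, read-only)."""
    import winreg  # imported here so the pure helper above also works on other platforms

    resources = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"Cluster\Resources") as root:
        subkey_count = winreg.QueryInfoKey(root)[0]
        for i in range(subkey_count):
            res_id = winreg.EnumKey(root, i)
            values = {}
            with winreg.OpenKey(root, res_id) as key:
                value_count = winreg.QueryInfoKey(key)[1]
                for j in range(value_count):
                    name, data, _value_type = winreg.EnumValue(key, j)
                    values[name] = data
            resources[res_id] = values
    return resources
```

On a node, `incomplete_resources(read_cluster_resources())` returns the subkey names to inspect by hand; nothing in the registry is modified. Keeping the check separate from the registry read means the heuristic can be adjusted without touching the code that accesses the cluster hive.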

Hope that helps!

Let me know in the comments below if you run into any issues implementing this.

- Sanjeev