SNMP v3 discovery not working

T_Schneider 186 Reputation points
2020-10-13T11:02:26.893+00:00

Hi all,

Some information up front:

  • SNMP v1/v2 is working OK (APC UPS units)
  • the firewall is turned off on the management servers
  • SNMP v3 is properly configured on the target system and can be queried successfully using Paessler SNMP Tester

All of our APC UPS units are monitored via SCOM 2019 UR1 using SNMP v1/v2. The devices are now to be reconfigured for v3 to make them more secure. However, SCOM is not able to discover them. As mentioned, the credentials and setup have been confirmed with SNMP Tester, run from the same machine that runs the discovery.

I can see 120xx events in the OpsMgr event log showing that the discovery is running:

Event 12008

Discovery Completed   
Discovery type: FullDiscovery   
Devices discovered: 0   
Windows computers filtered: 0   
Devices in pending list: 1   
Devices excluded: 0   
Duration Total (sec): 23   
Duration of Probing (sec): 23   
Duration of Processing (sec): 0   
  
Workflow name: discovery09f29983.328e.48b1.9878.bec925cf1530   
Instance name: APCv3   
Instance ID: {823D56A6-EBA9-72AB-F1EA-1525D0AB636C}   

I've enabled the OpsMgr traces but couldn't find anything relevant in them. Microsoft Network Monitor shows that packets are sent to the APC device, but no response is received:
[Screenshot: Network Monitor trace of the SCOM discovery, showing requests sent to the APC device and no reply]

Using SNMP Tester, I can see the following exchange (captured on the same management server):
[Screenshot: trace of the SNMP Tester session, showing requests and responses]

I've tried a lot but am currently at a loss. Has anybody run into a similar issue?

Thanks
Thorsten


Accepted answer
  1. T_Schneider 186 Reputation points
    2021-11-25T09:54:29.027+00:00

    We finally got it working!

    After working with Microsoft engineers for many months, they have now provided us with a new sm-snmp.dll that is able to discover and monitor APC SNMPv3 devices.
    The problem was that SCOM did not meet all of the requirements of RFC 3414: it failed to populate some fields and put wrong characters in another. Whether discovery succeeds therefore depends on how strictly the target system enforces the RFC. Apparently the APC devices are strict, while Cisco switches are more relaxed.

    Most likely this fix will be included in SCOM 2019 UR4.

    Thorsten


4 additional answers

  1. SChalakov 10,261 Reputation points MVP
    2020-10-13T11:38:30.553+00:00

    Hi Thorsten,

    Are there any other related events on your management servers? Please check all servers that are part of the network device monitoring resource pool.
    The firewall is always a suspect, but surely not in your case, so good call disabling it :)
    I would also check the SCOM console: under "Pending Management" you should be able to see some kind of reason for this. Can you please check it?
    The other thing to double-check is your Run As accounts and the requirements for them:

    Run As accounts for network monitoring in Operations Manager

    Last but not least, according to:

    Monitoring networks by using Operations Manager

    after the ping and the initial SNMP contact, SCOM should send an SNMP GET request. Do you see such a request in the network traces?

    I am just thinking out loud, hoping you might recall something that went forgotten.

    Regards,

    Stoyan


  2. T_Schneider 186 Reputation points
    2020-10-13T12:58:54.173+00:00

    Hi Stoyan,

    Thanks for your suggestions. I went through them and can confirm that everything is configured as it should be. There are no additional messages in the event log. In the Network Pending view, the device to be monitored is listed as "No Response SNMP".

    The network trace shows no traffic apart from the four attempts in the first screenshot I posted. That is exactly the problem: we see no response from the device. Using SNMP Tester from the same management server, we can see the communication going back and forth.

    I have no idea why the APC unit would not send a response back to the discovery server. Even if some parameter were incorrect, it should send something back. Since the SNMP Tester sits directly on the network port, it would see all packets before they reach SCOM.
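A well-formed SNMPv3 discovery request should normally be answered with a Report PDU that carries usmStatsUnknownEngineIDs and the agent's snmpEngineID. To check whether the device answers a request built strictly along the lines of RFC 3414 section 4 (noAuthNoPriv, zero-length msgUserName and msgAuthoritativeEngineID, empty varbind list), independently of both SCOM and SNMP Tester, a minimal Python sketch could hand-encode the BER message and send it over UDP. It assumes IPv4, UDP port 161, msgID/request-id 1 and a msgMaxSize of 65507; all function names are hypothetical, and this is not what SCOM or SNMP Tester actually send.

```python
import socket
from typing import Optional


def encode_length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Encode one tag-length-value triplet."""
    return bytes([tag]) + encode_length(len(value)) + value


def ber_int(value: int) -> bytes:
    """Non-negative INTEGER; adds a leading 0x00 when the high bit is set."""
    return tlv(0x02, value.to_bytes((value.bit_length() + 8) // 8, "big"))


def octet_string(value: bytes) -> bytes:
    return tlv(0x04, value)


def sequence(*items: bytes) -> bytes:
    return tlv(0x30, b"".join(items))


def usm_security_parameters() -> bytes:
    """UsmSecurityParameters for discovery: every field empty or zero."""
    return sequence(
        octet_string(b""),  # msgAuthoritativeEngineID: zero-length (not yet known)
        ber_int(0),         # msgAuthoritativeEngineBoots
        ber_int(0),         # msgAuthoritativeEngineTime
        octet_string(b""),  # msgUserName: zero-length
        octet_string(b""),  # msgAuthenticationParameters: zero-length (noAuth)
        octet_string(b""),  # msgPrivacyParameters: zero-length (noPriv)
    )


def build_discovery_request(msg_id: int = 1, request_id: int = 1,
                            max_size: int = 65507) -> bytes:
    """SNMPv3Message carrying an empty GetRequest for engine-ID discovery."""
    header = sequence(
        ber_int(msg_id),
        ber_int(max_size),
        octet_string(b"\x04"),  # msgFlags: reportable, noAuthNoPriv
        ber_int(3),             # msgSecurityModel: 3 = USM
    )
    get_request = tlv(0xA0,                     # GetRequest-PDU
                      ber_int(request_id)
                      + ber_int(0)              # error-status
                      + ber_int(0)              # error-index
                      + sequence())             # empty variable-bindings
    scoped_pdu = sequence(octet_string(b""),    # contextEngineID
                          octet_string(b""),    # contextName
                          get_request)
    return sequence(ber_int(3),                 # msgVersion: SNMPv3
                    header,
                    octet_string(usm_security_parameters()),
                    scoped_pdu)


def probe(host: str, port: int = 161, timeout: float = 3.0) -> Optional[bytes]:
    """Send one discovery request over UDP/IPv4; return the raw reply or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_discovery_request(), (host, port))
        try:
            reply, _ = sock.recvfrom(65535)
        except socket.timeout:
            return None
    return reply
```

For example, `probe("192.0.2.10")` (a placeholder documentation address) returns the raw reply bytes, or `None` if nothing arrives within the timeout. If this probe gets an answer while the SCOM discovery does not, the difference lies in how the request is encoded rather than in the network path.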

    Thanks
    Thorsten


  3. SChalakov 10,261 Reputation points MVP
    2020-10-14T08:11:27.947+00:00

    Hi Thorsten,

    Very odd indeed. I'm not quite sure how I would troubleshoot this. Such devices usually have a configuration option that restricts which hosts they send SNMP responses to, but that shouldn't be the issue in your case, because you mentioned they have been monitored using SNMPv1/v2.
    Is there a way to involve the vendor? If the devices are not responding to SCOM, it could be some SNMPv3-related configuration on the APC side.

    Regards,
    Stoyan


  4. T_Schneider 186 Reputation points
    2021-03-05T09:05:24.597+00:00

    We've been going back and forth between Microsoft support and APC support. We generated lots of network traces and even built a temporary SCOM environment on the same subnet as the APC device to rule out network-related issues.

    It looks like the issue is that SCOM does not implement SNMPv3 fully according to RFC 3414. During the discovery phase, traffic is neither encrypted nor authenticated, so msgAuthenticationParameters should be set to a zero-length value. The network trace shows that SCOM fills that parameter with 12 octets of zeroes, which is technically not a zero-length value. The APC device does not accept this and therefore does not respond.
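The difference is easy to see at the byte level. The Python sketch below BER-encodes msgAuthenticationParameters both ways: as the zero-length OCTET STRING expected on an unauthenticated message, and as the 12 zero octets seen in the trace. Twelve zero octets are also the placeholder RFC 3414 uses (sections 6.3 and 7.3) while computing an HMAC-MD5-96 or HMAC-SHA-96 digest for authenticated messages, which may be where the value came from; that is an assumption, not something confirmed in this thread. The `strict_agent_accepts` check is hypothetical and only models the behaviour reported here for the APC units.

```python
def encode_octet_string(value: bytes) -> bytes:
    """BER OCTET STRING (tag 0x04) using the short length form."""
    if len(value) >= 0x80:
        raise ValueError("this sketch only handles values under 128 bytes")
    return bytes([0x04, len(value)]) + value


# Expected for a noAuth message: a zero-length OCTET STRING
compliant = encode_octet_string(b"")
# What the trace in this thread showed SCOM sending: 12 octets of 0x00
observed = encode_octet_string(bytes(12))


def strict_agent_accepts(auth_params: bytes, auth_flag_set: bool) -> bool:
    """Hypothetical strict agent: noAuth messages must carry no auth parameters."""
    if not auth_flag_set:
        return len(auth_params) == 0
    # RFC 3414's HMAC-MD5-96 and HMAC-SHA-96 both produce a 12-octet MAC
    return len(auth_params) == 12


print(compliant.hex(" "))                  # 04 00
print(observed[:2].hex(" "), "+ 12 x 00")  # 04 0c + 12 x 00
```

A lenient agent can simply ignore the field when the auth flag is clear, which would be consistent with the Cisco switches working while the APC units stay silent.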

    And now we are stuck in between, unable to solve it. Neither vendor seems willing to adjust its code.

    Let's see if we can get some traction on this case.

    Thorsten
