Alert suppression (troubleshooting)

 

Unfortunately, there has been no way to troubleshoot whether alert suppression works as expected, especially when a dynamic value is used to suppress against. By dynamic value I mean the replacement that the runtime must attempt when $Data/<xpath expression to retrieve value of data type property>$ is used inside a “SuppressionValue” tag.

We recognized this problem and tried to address it in the SP1 release of Operations Manager 2007.

A brief reminder of how suppression works: the runtime retrieves the values defined inside the “SuppressionValue” tags and calculates a hash based on the combination of those values and the workflow ID. This hash later determines whether the alert history count is incremented (which means the alert is suppressed) or a brand new alert is raised.
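To make the idea concrete, here is a minimal sketch in Python of how such a suppression key could behave. It is not the runtime's actual implementation: the hash algorithm (SHA-1 here), the separator, and the names suppression_key and raise_or_suppress are all assumptions made for illustration.

```python
import hashlib


def suppression_key(workflow_id, suppression_values):
    """Combine the workflow ID and the suppression values into one hash.

    SHA-1 and the separator are stand-ins; the real algorithm is not
    described in this post.
    """
    parts = [workflow_id] + list(suppression_values)
    joined = "\x1f".join(parts)  # unit separator keeps the parts distinct
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def raise_or_suppress(counts, workflow_id, suppression_values):
    """Increment the count for a known key, otherwise record a new alert."""
    key = suppression_key(workflow_id, suppression_values)
    if key in counts:
        counts[key] += 1
        return "suppressed"
    counts[key] = 1
    return "new alert"


if __name__ == "__main__":
    counts = {}  # hypothetical alert store: key -> history count
    workflow = "EventBased.Test.AlertFromEvent"
    print(raise_or_suppress(counts, workflow, ["1001"]))
    print(raise_or_suppress(counts, workflow, ["1001"]))
    print(raise_or_suppress(counts, workflow, ["1002"]))
```

The second call reuses the key of the first, so only the count grows. The third call has a different suppression value, so it produces a different key and a new alert.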

Again, in the case of $Data/<xpath expression to retrieve value of data type property>$ the runtime evaluates the XPath expression against the XML representation of the data type that serves as input to the alert-generating module. If the expression yields a result, that value is added to the hash calculation. If the XPath expression doesn’t yield any result, the runtime has to use the complete string “ $Data/<xpath expression to retrieve value of data type property>$ ” as the value for the hash calculation. Such a string is a constant, so it contributes nothing to the hash that would distinguish one alert from another. This of course causes unexpected behavior, especially when the XPath was supposed to yield a unique value for finer-grained alert suppression.
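The fallback can be illustrated with a small Python sketch. It assumes a simplified, hypothetical data item XML with a single EventDisplayNumber element, and the function name resolve_replacement is invented for this example. The real runtime's XPath handling is certainly richer than ElementTree's subset.

```python
import xml.etree.ElementTree as ET


def resolve_replacement(token, data_item_xml):
    """Return the value a $Data/<xpath>$ token selects, or the token itself."""
    token = token.strip()
    if not (token.startswith("$Data/") and token.endswith("$")):
        return token  # not a dynamic value, use it as a constant
    xpath = token[len("$Data/"):-1]
    root = ET.fromstring(data_item_xml)
    value = root.findtext(xpath)  # None when nothing matches
    if value is None:
        return token  # replacement failed: the literal string is used
    return value


if __name__ == "__main__":
    # Hypothetical, simplified data items for two different events
    event_a = "<DataItem><EventDisplayNumber>1001</EventDisplayNumber></DataItem>"
    event_b = "<DataItem><EventDisplayNumber>1002</EventDisplayNumber></DataItem>"

    good = "$Data/EventDisplayNumber$"
    bad = "$Data/UnreachableEventDisplayNumber$"

    # A working expression yields a distinct value per event
    print(resolve_replacement(good, event_a), resolve_replacement(good, event_b))
    # A failing expression yields the same constant for both events
    print(resolve_replacement(bad, event_a) == resolve_replacement(bad, event_b))
```

With the failing expression both events produce the same suppression value, so alerts that were meant to stay separate would be folded into one.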

To help with troubleshooting, SP1 introduced an event (5402) that is written to the “Operations Manager” event log. This event carries information that should help determine whether property value replacement failed to yield a result because of a typo, a non-existent property, or some other reason. An alert is also raised so that the OpsMgr operator can recognize this problem without inspecting the event log.

Sample event:

Event Type: Warning
Event Source: HealthService
Event Category: Health Service
Event ID: 5402
Date: 10/29/2007
Time: 9:26:27 AM
User: N/A
Computer: CUPIDDP13D

Description:
Parameter replacement during creation of the alert failed causing unexpected suppression used.

Alert: B5615F97-1D44-433F-D25C-2E8916D13498
Workflow: EventBased.Test.AlertFromEvent
Instance: sampleInstance
Instance ID: {BEC075CC-4008-5A4F-9D8D-6BC9C1012D36}
Management Group: sampleGroup

Failing replacement: $Data/UnreachableEventDisplayNumber$
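If you collect these events in bulk, the fields needed for troubleshooting can be pulled out of the description text. The following Python sketch parses the sample description shown above. The function name parse_event_5402 is made up for this example, and the "Name: value" layout is assumed from the single sample in this post.

```python
SAMPLE_DESCRIPTION = """\
Parameter replacement during creation of the alert failed causing unexpected suppression used.

Alert: B5615F97-1D44-433F-D25C-2E8916D13498
Workflow: EventBased.Test.AlertFromEvent
Instance: sampleInstance
Instance ID: {BEC075CC-4008-5A4F-9D8D-6BC9C1012D36}
Management Group: sampleGroup

Failing replacement: $Data/UnreachableEventDisplayNumber$
"""


def parse_event_5402(description):
    """Collect 'Name: value' lines of the event description into a dict."""
    fields = {}
    for line in description.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            fields[name.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    fields = parse_event_5402(SAMPLE_DESCRIPTION)
    # The instance ID is wrapped in braces in the event text
    instance_guid = fields["Instance ID"].strip("{}")
    print(fields["Workflow"])
    print(instance_guid)
    print(fields["Failing replacement"])
```

The workflow name, the instance ID without braces, and the failing replacement are the three values used in the troubleshooting and resolution steps.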

How to troubleshoot:

1. Open Windows PowerShell (the Operations Manager Command Shell, so that the OpsMgr cmdlets are available)

2. Get and store the monitoring object

For the sample event above, run:

$mo = Get-MonitoringObject -Id "BEC075CC-4008-5A4F-9D8D-6BC9C1012D36"

3. Get the rule to identify which management pack to change

For the sample event above, run:

$rule = Get-Rule -MonitoringObject $mo -Criteria "Name = 'EventBased.Test.AlertFromEvent'"

4. Retrieve the management pack

For the sample event above, run:

$rule.GetManagementPack()

Resolution:

If the management pack is sealed, disable the alert-generating rule identified by the “Workflow” value in the event description. Then contact the management pack developer and request a fix for the alert suppression.

If it is your own custom management pack, correct the XPath expression used for alert suppression. The faulty suppression value is identified by “Failing replacement”, inside the rule identified by “Workflow” in the event description. Then increase the version number and re-import the corrected management pack.

If the rule was created in the Authoring part of the UI, edit the rule properties and correct the XPath expression on the suppression tab. The incorrect XPath is again identified by “Failing replacement” in the event description.