Modifying single-value monitors to allow for multiple possible return codes

I recently got a question in the Cross Platform Forums from a customer who was seeing problems with the monitors for their network cards. The question came up on Solaris, but it applies to any OS. The customer used a dual-NIC failover setup in which one NIC is a backup for the other: if the first NIC fails, the second is activated and takes over until the first can be replaced. The catch is that the backup NIC, while in standby, is not “plumbed”; that is, it has no assigned IP address and is reported as enabled but offline. When the network provider of the cross platform agent gathers data, it polls the attached NICs and returns a code based on the status of each one: “2” for an active NIC and “6” for the offline NIC. The default monitor compares that status code against a single value (2) to decide whether the NIC is OK. So even though an admin considers “6” a perfectly good value for a standby NIC, Operations Manager’s monitor configuration reports it as an error.

So how do you get OpsMgr to accept multiple values as a “success” code? This is where I had to brush up on management packs and on replacing monitors that ship in existing MPs. The approach is to replace the existing monitor with a new one that uses a different evaluation type. Looking through the list of available expression evaluators, I found a “RegExExpression” evaluation type, which is exactly what I needed. So I built a barebones MP containing just enough to define the replacement monitor (the code is shown below). After importing the MP, I disabled the existing monitor for the class I wanted to change. That’s it! Now I had a monitor that treats any of several return codes as success, rather than being forced to accept only one.


In the example below, I use “MatchesRegularExpression” to compare the return code against the regular expression “2|6”, which (for those of you not familiar with regular expressions) means “2 or 6”. I could just as easily use a more complex expression to check for more return codes, or even text strings. The nice part is that I don’t have to write a separate “not” regular expression as the opposite of the success evaluation: I use “DoesNotMatchRegularExpression” with the same pattern, and the failure case is taken care of for me.
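One caveat worth sketching, on the assumption that the pattern is evaluated as an ordinary unanchored regular expression: “2|6” would also match any longer value that merely contains a 2 or a 6, such as “12” or “26”. If the provider could ever return such a code, an anchored variant of the same element (my suggestion, not part of the original MP) matches only the exact values:

```xml
<!-- Anchored variant of the success pattern: matches exactly "2" or "6", -->
<!-- and rejects values such as "12" or "26" that only contain those digits. -->
<SuccessRegExp>^(2|6)$</SuccessRegExp>
```

For the status codes discussed in this post, both forms should behave the same way; the anchors only matter if other codes containing those digits show up.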


So here are the exact steps (modify as necessary for other operating systems):

  1. Save the following code as Microsoft.Linux.Solaris.Overrides.xml (the file name has to match the ID in the MP's manifest)
  2. Import the MP into OpsMgr.
  3. Go to Authoring > Monitors > Solaris 10 Network Adapter > Entity Health > Availability
  4. You should see two monitors there: the original one and the custom one you just imported
  5. Right-click on "Network Adapter Health" (the original monitor) and select Overrides > Disable the Monitor > For all objects of class: Solaris 10 Network Adapter
  6. Click Yes to accept

Your monitors should now accept 2 or 6 as valid result codes.
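If you would rather not disable the original monitor by hand in the console, the same thing can probably be expressed as an override inside the MP itself. The fragment below is only a sketch under assumptions: the override ID is a name I made up, and ORIGINAL_MONITOR_ID is a placeholder, because the ID of the built-in monitor is not given here and has to be looked up in the Solaris 10 MP. The fragment would sit inside the Monitoring element, after the closing Monitors tag.

```xml
<!-- Sketch only: place inside <Monitoring>, after </Monitors>. -->
<Overrides>
  <!-- ORIGINAL_MONITOR_ID is a placeholder, not a real ID. Replace it with the
       ID of the built-in "Network Adapter Health" monitor from the Solaris 10 MP. -->
  <MonitorPropertyOverride
      ID="Microsoft.Linux.Solaris.Overrides.DisableOriginalNicMonitor"
      Context="Solaris10!Microsoft.Solaris.10.NetworkAdapter"
      Enforced="false"
      Monitor="Solaris10!ORIGINAL_MONITOR_ID"
      Property="Enabled">
    <Value>false</Value>
  </MonitorPropertyOverride>
</Overrides>
```

This is the XML form of what step 5 does through the console, so use one or the other, not both.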

<ManagementPack xsi:noNamespaceSchemaLocation="..\..\..\..\ext\MPSchema\ManagementPackSchema.xsd" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Manifest>
    <Identity>
      <ID>Microsoft.Linux.Solaris.Overrides</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>_Solaris Overrides for NIC Failover</Name>
    <References>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7023.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="SystemHealth">
        <ID>System.Health.Library</ID>
        <Version>6.1.7023.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Unix">
        <ID>Microsoft.Unix.Library</ID>
        <Version>6.1.7000.256</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Linux">
        <ID>Microsoft.Linux.Library</ID>
        <Version>6.1.7000.256</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Solaris10">
        <ID>Microsoft.Solaris.10</ID>
        <Version>6.1.7000.256</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
  <TypeDefinitions>
    <MonitorTypes>
      <!-- Two-state monitor type: the status value is tested against a regular expression. -->
      <UnitMonitorType ID="Microsoft.Linux.Solaris.Overrides.WSMan.Status.RegExFiltered.MonitorType" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="StatusOK"/>
          <MonitorTypeState ID="StatusFailed"/>
        </MonitorTypeStates>
        <Configuration>
          <xsd:element name="TargetSystem" type="xsd:string" />
          <xsd:element name="Uri" type="xsd:string" />
          <xsd:element name="Filter" type="xsd:string" minOccurs="0" maxOccurs="1" />
          <xsd:element name="SplitItems" type="xsd:boolean"/>
          <xsd:element name="Interval" type="xsd:unsignedInt" />
          <xsd:element name="SyncTime" type="xsd:string" minOccurs="0" maxOccurs="1" />
          <xsd:element name="InstanceName" type="xsd:string" />
          <xsd:element name="InstanceProperty" type="xsd:string" />
          <xsd:element name="Status" type="xsd:string" />
          <xsd:element name="SuccessRegExp" type="xsd:string" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="Interval" ParameterType="int" Selector="$Config/Interval$"/>
          <OverrideableParameter ID="SyncTime" ParameterType="string" Selector="$Config/SyncTime$"/>
          <OverrideableParameter ID="SuccessRegExp" ParameterType="string" Selector="$Config/SuccessRegExp$"/>
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource ID="DS" TypeID="Unix!Microsoft.Unix.WSMan.TimedEnumerator.Filtered">
              <TargetSystem>$Config/TargetSystem$</TargetSystem>
              <Uri>$Config/Uri$</Uri>
              <Filter>$Config/Filter$</Filter>
              <SplitItems>true</SplitItems>
              <Interval>$Config/Interval$</Interval>
              <SyncTime>$Config/SyncTime$</SyncTime>
              <InstanceName>$Config/InstanceName$</InstanceName>
              <InstanceProperty>$Config/InstanceProperty$</InstanceProperty>
            </DataSource>
            <!-- Unhealthy when the status does not match the success pattern. -->
            <ConditionDetection ID="ErrorFilter" TypeID="System!System.ExpressionFilter">
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">$Config/Status$</XPathQuery>
                  </ValueExpression>
                  <Operator>DoesNotMatchRegularExpression</Operator>
                  <Pattern>$Config/SuccessRegExp$</Pattern>
                </RegExExpression>
              </Expression>
            </ConditionDetection>
            <!-- Healthy when the status matches the success pattern. -->
            <ConditionDetection ID="SuccessFilter" TypeID="System!System.ExpressionFilter">
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">$Config/Status$</XPathQuery>
                  </ValueExpression>
                  <Operator>MatchesRegularExpression</Operator>
                  <Pattern>$Config/SuccessRegExp$</Pattern>
                </RegExExpression>
              </Expression>
            </ConditionDetection>
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="StatusOK">
              <Node ID="SuccessFilter">
                <Node ID="DS"/>
              </Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="StatusFailed">
              <Node ID="ErrorFilter">
                <Node ID="DS"/>
              </Node>
            </RegularDetection>
          </RegularDetections>
        </MonitorImplementation>
      </UnitMonitorType>
    </MonitorTypes>
  </TypeDefinitions>
  <Monitoring>
    <Monitors>
      <UnitMonitor ID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.Monitor" Accessibility="Public" Target="Solaris10!Microsoft.Solaris.10.NetworkAdapter" TypeID="Microsoft.Linux.Solaris.Overrides.WSMan.Status.RegExFiltered.MonitorType" Enabled="true" ParentMonitorID="SystemHealth!System.Health.AvailabilityState">
        <Category>PerformanceHealth</Category>
        <AlertSettings AlertMessage="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.AlertMessage">
          <AlertOnState>Warning</AlertOnState>
          <AutoResolve>true</AutoResolve>
          <AlertPriority>Normal</AlertPriority>
          <AlertSeverity>Warning</AlertSeverity>
          <AlertParameters>
            <AlertParameter1>$Data/Context/Value$</AlertParameter1>
          </AlertParameters>
        </AlertSettings>
        <OperationalStates>
          <OperationalState HealthState="Success" MonitorTypeStateID="StatusOK" ID="StatusOK"/>
          <OperationalState HealthState="Warning" MonitorTypeStateID="StatusFailed" ID="StatusFailed"/>
        </OperationalStates>
        <Configuration>
          <TargetSystem>$Target/Host/Property[Type="Unix!Microsoft.Unix.Computer"]/NetworkName$</TargetSystem>
          <Uri>http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_IPProtocolEndpoint?__cimnamespace=root/scx</Uri>
          <Filter/>
          <SplitItems>false</SplitItems>
          <Interval>300</Interval>
          <InstanceName>$Target/Property[Type="Unix!Microsoft.Unix.LogicalDevice"]/DeviceID$</InstanceName>
          <InstanceProperty>//*[local-name()="Name"]</InstanceProperty>
          <Status>//*[local-name()="EnabledState"]</Status>
          <!-- 2 = active NIC, 6 = standby NIC (enabled but offline). -->
          <SuccessRegExp>2|6</SuccessRegExp>
        </Configuration>
      </UnitMonitor>
    </Monitors>
  </Monitoring>
  <Presentation>
    <StringResources>
      <StringResource ID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.AlertMessage" />
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.AlertMessage">
          <Name>Network Adapter Connection Health</Name>
          <Description>Monitors the network adapter connection's health.</Description>
        </DisplayString>
        <DisplayString ElementID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.Monitor">
          <Name>Solaris 10 Network Adapter Health</Name>
          <Description>Network Adapter Health</Description>
        </DisplayString>
        <DisplayString ElementID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.Monitor" SubElementID="StatusOK">
          <Name>Network Adapter Health is OK</Name>
          <Description>Network Adapter Health is OK</Description>
        </DisplayString>
        <DisplayString ElementID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.Monitor" SubElementID="StatusFailed">
          <Name>Network Adapter Health is not OK</Name>
          <Description>Network Adapter Health is not OK</Description>
        </DisplayString>
      </DisplayStrings>
      <KnowledgeArticles>
        <KnowledgeArticle ElementID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.Monitor" Visible="true">
          <MamlContent>
            <maml:section xmlns:maml="http://schemas.microsoft.com/maml/2004/10">
              <maml:title>Summary</maml:title>
              <maml:para>This monitor generates an alert when the agent detects that the network adapter has been disconnected from the network and no longer has network connectivity.</maml:para>
              <maml:para>If the computer has multiple network adapters, the alert may arrive before network connectivity for the affected adapter has been reestablished. However, remote clients and applications may still have difficulty accessing resources on the computer despite the other adapter or adapters. In addition, the local computer may not be able to access some network resources.</maml:para>
              <maml:para />
            </maml:section>
            <maml:section xmlns:maml="http://schemas.microsoft.com/maml/2004/10">
              <maml:title>Causes</maml:title>
              <maml:para>Your computer's network adapter lost its connection to the network.</maml:para>
              <maml:para>The adapter's connection to the network can be lost if you remove a network cable from your network adapter or if you are roaming between wireless access points with a mobile system. Other possible causes include network issues, firewall issues, or a malfunction of the network adapter or its driver.</maml:para>
              <maml:para />
            </maml:section>
            <maml:section xmlns:maml="http://schemas.microsoft.com/maml/2004/10">
              <maml:title>Resolutions</maml:title>
              <maml:para>If your computer is connected to the network by cable, confirm that the cable is plugged in properly. If you have a wireless network connection, confirm that you have a signal and the proper credentials for the wireless network.</maml:para>
              <maml:para>If the network connection is working properly, check the following possible causes and take corrective action:</maml:para>
              <maml:list>
                <maml:listItem>
                  <maml:para>The network is down. Try to ping or traceroute to the host.</maml:para>
                </maml:listItem>
                <maml:listItem>
                  <maml:para>The firewall on your computer is blocking network broadcast traffic.</maml:para>
                </maml:listItem>
                <maml:listItem>
                  <maml:para>Your computer's network adapter or driver is not functioning correctly.</maml:para>
                </maml:listItem>
              </maml:list>
              <maml:para />
            </maml:section>
          </MamlContent>
        </KnowledgeArticle>
      </KnowledgeArticles>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>
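
Because the monitor type declares SuccessRegExp as an overrideable parameter, the pattern should be adjustable later without editing the monitor definition. The fragment below is a hedged sketch of such an override: the override ID is a name I invented for illustration, and the value shown is just one possible pattern. It reuses the monitor and class IDs from the MP above and would sit inside the Monitoring element, after the closing Monitors tag.

```xml
<!-- Sketch only: place inside <Monitoring>, after </Monitors>. -->
<Overrides>
  <!-- The override ID is a hypothetical name; any unique ID should do. -->
  <MonitorConfigurationOverride
      ID="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.SuccessRegExp.Override"
      Context="Solaris10!Microsoft.Solaris.10.NetworkAdapter"
      Enforced="false"
      Monitor="Microsoft.Linux.Solaris.Overrides.NetworkAdapter.Health.Monitor"
      Parameter="SuccessRegExp">
    <!-- Example value: accept exactly "2" or "6". -->
    <Value>^(2|6)$</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

The same change can also be made from the console through the monitor's Overrides menu, where SuccessRegExp should appear alongside Interval and SyncTime.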