OpsMgr: Calculating the Overall Availability of Distributed Applications in Percentage

This post demonstrates how to calculate the percentage of availability of distributed application managed objects over time in System Center Operations Manager, based on the state of a specific monitor, using a PowerShell script that leverages the OpsMgr cmdlets.

The following picture shows an example of an OpsMgr native dashboard that consists of two widgets, created from the Speedometer Gauge Widget Template and the PowerShell Text Display Widget Template, displaying the percentage of availability of the Operations Manager Management Group distributed application object in two different ways. Both widgets are PowerShell-driven and use the same PowerShell script, Sample Script 1 below.
 

The following picture shows an example of an OpsMgr native dashboard with one widget, created from the Blue Bar State Widget Template, displaying the percentage of availability and the number of state changes of all the distributed application objects in the management group within the last 15 hours. This widget is also PowerShell-driven and uses the PowerShell script from Sample Script 2 below.


The monitor used to calculate the percentage of availability of the Operations Manager Management Group distributed application object over the last X hours is the Entity Health (System.Health.EntityState) aggregate rollup monitor targeted to it. Entity Health is the root rollup monitor for every managed object: it determines the overall health state of the object from the worst state of any of its members (the default policy), which can be unit, aggregate rollup, or dependency monitors running against that managed object.

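The worst-of rollup policy described above can be sketched in a few lines. This is only an illustration of the policy, not how OpsMgr implements it: Get-WorstOfState is a hypothetical function name, and the sketch ranks just Success < Warning < Error, ignoring any other state (such as Uninitialized), which OpsMgr may treat differently.

```powershell
# Sketch of a worst-of rollup: the parent takes the worst state of its members.
# Assumption: only Success, Warning and Error are ranked; any other state
# (for example Uninitialized) is ignored by this sketch.
function Get-WorstOfState
{
    param([string[]] $MemberStates)

    $rank  = @{ 'Success' = 0; 'Warning' = 1; 'Error' = 2 }
    $worst = 'Success'
    foreach ($s in $MemberStates)
    {
        # Unknown states give $null, and $null -gt <number> is False, so they are skipped
        if ($rank[$s] -gt $rank[$worst]) { $worst = $s }
    }
    return $worst
}

Get-WorstOfState -MemberStates 'Success', 'Warning', 'Success'   # Warning
Get-WorstOfState -MemberStates 'Success', 'Error', 'Warning'     # Error
```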
Sample Script 1 calculates the percentage of availability of the Operations Manager Management Group distributed application object over X hours by summing up the total duration, in seconds, that its Entity Health rollup monitor was in the Healthy state between “now” and X hours ago, dividing that sum by X * 3600 (the length of the window in seconds), and multiplying the result by 100.
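The same calculation written as a formula. The symbols are introduced here for illustration only: T_healthy corresponds to $TotalTimeDiff in the script, X to $backdateHour, and each interval [t_i^start, t_i^end] is a period during which Entity Health was Healthy, clipped to the window from X hours ago until now.

```latex
A_{\%} = \frac{T_{\text{healthy}}}{3600\,X} \times 100,
\qquad
T_{\text{healthy}} = \sum_{i} \left( t_i^{\text{end}} - t_i^{\text{start}} \right)\ \text{seconds}
```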

Here is an example of how Sample Script 1 identifies the state changes of the Entity Health rollup monitor, calculates the duration in seconds of each period during which the monitor was Healthy (from the start of the window, or from a change into the Healthy state, until the next change out of the Healthy state or until now) between “now” and 15 hours ago, and sums these durations to get the percentage of time the Operations Manager Management Group distributed application object was in the Healthy state over the last 15 hours.
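To make the arithmetic concrete, consider a purely hypothetical 15-hour window (times in hours from the start of the window): Healthy to Error at 2, Error to Healthy at 3, Healthy to Warning at 10, and Warning to Healthy at 12. The Healthy periods are 0 to 2, 3 to 10, and 12 to 15 (now):

```latex
\text{healthy time} = (2 - 0) + (10 - 3) + (15 - 12) = 12\ \text{h} = 43200\ \text{s}
\qquad
\text{availability} = \frac{43200}{15 \times 3600} \times 100 = 80\%
```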


Sample Script 1:  

Sample Script 1 takes into account the following conditions:

  • If the initial state of the monitor is Healthy and no state changes have been identified over the last X hours, then it was available 100% of the time.
  • If the initial state of the monitor is Not Healthy (Warning, Error or Uninitialized) and no state changes have been identified over the last X hours, then it was available 0% of the time.
  • If the first state change is a change from a Healthy to a Not Healthy state, add the time difference between the start of the window and the time of that state change to the total time the monitor was in the Healthy state.
  • If the last state change is a change from a Not Healthy to a Healthy state, add the time difference between the time of that state change and the current time (now) to the total time the monitor was in the Healthy state.
 
function New-Collection ( [type] $type ) 
{
   $typeAssemblyName = $type.AssemblyQualifiedName;
   $collection = new-object "System.Collections.ObjectModel.Collection``1[[$typeAssemblyName]]";
   return ,($collection);
}

$newline = "`r`n" 

#Last x hours UTC
$backdateHour  = 15  #NOTE: Replace with a numerical value for the time window in hours.

#Get Entity Health Monitor
$monitorObject = Get-SCOMMonitor -Name System.Health.EntityState

#Get the Distributed Application object. NOTE: -DisplayName can match more than one object; use the object ID (Get-SCOMClassInstance -Id) for more accuracy.
$DistributedApp = get-scomclassinstance -displayname "Operations Manager Management Group"

#Add monitor to collection
$monitorCollection = new-collection $monitorObject.GetType()
$monitorCollection.Add($monitorObject)

#Get health state information of the Entity Health Monitor running against the Distributed Application object
$monitoringStates = $DistributedApp.GetMonitoringStates($monitorCollection)

#Get all state changes of the Entity Health Monitor running against the Distributed Application object
$stateChanges = $monitoringStates.GetMonitoringStateChangeEvents()

#Get start and end points of date time range
#Get current date time in UTC and date time X hours ago
$aggregationInterval = $backdateHour
$dt = New-TimeSpan -hour $aggregationInterval
$nowlocal = Get-Date
#Convert local time to UTC time
$now = $nowlocal.ToUniversalTime()
$from = $now.Subtract($dt)

#Get all state changes between date time range and sort based on state change time
#@() guarantees an array, so .count and indexing work even when only one state change matches
$stateChangesSortedx = @($stateChanges | where{$_.TimeGenerated -gt $from} | sort-object @{Expression={$_.TimeGenerated}; Ascending=$true})

$index = 0
$TotalTimeDiff = 0

#If state changes exist ...
If($stateChangesSortedx.count -ne 0)
{

    foreach($stateChangesSorted in $stateChangesSortedx)
    {    
        if($index -lt 1)
        {
            If($stateChangesSorted.OldHealthState -eq "Success")
            {
                #Healthy from the start of the window until the first state change
                $TimeDiff = $stateChangesSorted.TimeGenerated - $from
                $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
            }
            else  #Old state is Not Healthy
            {
                if($stateChangesSortedx.count -eq 1 -and $stateChangesSorted.NewHealthState -eq "Success") #Only 1 single state change, into Healthy
                {
                    #Healthy from this state change until now
                    $TimeDiff = $now - $stateChangesSorted.TimeGenerated
                    $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
                }
            }

        }
        else
        {
            If($stateChangesSorted.OldHealthState -eq "Success")
            {
                #Healthy between the previous state change and this one
                $TimeDiff = $stateChangesSorted.TimeGenerated-$stateChangesSortedx[$index-1].TimeGenerated
                $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
            }
            else
            {
                #Last state change is into Healthy: Healthy from then until now
                if($index -eq $stateChangesSortedx.count-1 -and $stateChangesSorted.NewHealthState -eq "Success")
                {
                    $TimeDiff = $now - $stateChangesSorted.TimeGenerated
                    $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
                }
            }
        }
        $index++
    }
}
else # if no state change exists
{
    if($monitoringStates.HealthState -eq "Success")
    {$TotalTimeDiff = $backdateHour*3600}
    else
    {$TotalTimeDiff=0}

}

#Calculate percentage of availability of the distributed application object
$PercentAvailable = [Math]::Round(($TotalTimeDiff/(3600*$backdateHour))*100,2)

if($monitoringStates.HealthState -eq "Success")
{
    $HealthState1 = "Healthy"
}
else
{
    $HealthState1 = $monitoringStates.HealthState
}

$result = "Current Health State of "+ $DistributedApp.displayName + " is " + $HealthState1 + ", was available " + [String]$PercentAvailable  +"% of the time over the last " + $backdateHour + " hour(s)." + $newline + $newline + "Number of State Changes: " + $stateChangesSortedx.count + $newline

$result
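The New-Collection helper at the top of both scripts returns ,($collection) rather than $collection. The leading unary comma wraps the collection in a one-element array, so when PowerShell enumerates the function's output, the typed Collection object itself is emitted instead of its (here empty) contents, and that typed collection is what gets passed to GetMonitoringStates. A minimal sketch of the difference, using [string] instead of the monitor type so it runs without OpsMgr; New-CollectionNoComma is a hypothetical counter-example, not part of the scripts:

```powershell
# The helper as used in the sample scripts
function New-Collection ( [type] $type )
{
   $typeAssemblyName = $type.AssemblyQualifiedName;
   $collection = new-object "System.Collections.ObjectModel.Collection``1[[$typeAssemblyName]]";
   # The unary comma protects the collection from output enumeration
   return ,($collection);
}

# Hypothetical variant without the comma, for comparison only
function New-CollectionNoComma ( [type] $type )
{
   $typeAssemblyName = $type.AssemblyQualifiedName
   $collection = New-Object "System.Collections.ObjectModel.Collection``1[[$typeAssemblyName]]"
   # The empty collection is enumerated on output, so nothing is returned
   return $collection
}

$wrapped  = New-Collection ([string])
$unrolled = New-CollectionNoComma ([string])

$wrapped -is [System.Collections.ObjectModel.Collection[string]]   # True
$null -eq $unrolled                                                 # True
```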

 

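The four conditions listed for Sample Script 1 can also be expressed as a single walk over the sorted state changes: the state in force at the start of the window is the first change's OldHealthState (or the current state when there are no changes), and after each change its NewHealthState applies until the next change or until now. The sketch below is an alternative formulation of the same rules, not the author's script. Get-HealthyPercent is a hypothetical function name, and the [pscustomobject] inputs are stand-ins for the OpsMgr state change events, assumed to expose TimeGenerated, OldHealthState and NewHealthState as the sample scripts use them.

```powershell
# Sketch: "state walk" over state changes sorted by TimeGenerated.
# Assumes 'Success' is the Healthy state, as in the sample scripts.
function Get-HealthyPercent
{
    param(
        [datetime] $From,          # start of the window (UTC)
        [datetime] $Now,           # end of the window (UTC)
        [object[]] $StateChanges,  # changes inside the window, sorted ascending
        [string]   $CurrentState   # only used when there are no state changes
    )

    # Drop $null entries so a missing or empty input yields a zero-length array
    $changes = @($StateChanges | Where-Object { $null -ne $_ })
    $healthySeconds = 0

    if ($changes.Count -eq 0)
    {
        # Conditions 1 and 2: the current state covers the whole window
        if ($CurrentState -eq 'Success') { $healthySeconds = ($Now - $From).TotalSeconds }
    }
    else
    {
        # The state at the start of the window is the first change's old state
        $state = $changes[0].OldHealthState
        $intervalStart = $From
        foreach ($change in $changes)
        {
            # Condition 3 and every middle interval: count time spent Healthy
            if ($state -eq 'Success') { $healthySeconds += ($change.TimeGenerated - $intervalStart).TotalSeconds }
            $state = $change.NewHealthState
            $intervalStart = $change.TimeGenerated
        }
        # Condition 4: the state after the last change lasts until now
        if ($state -eq 'Success') { $healthySeconds += ($Now - $intervalStart).TotalSeconds }
    }

    return [Math]::Round($healthySeconds / ($Now - $From).TotalSeconds * 100, 2)
}

# Hypothetical 15-hour window: Healthy -> Error at 2 h, back to Healthy at 3 h,
# Healthy -> Warning at 10 h, back to Healthy at 12 h
$from = [datetime]'2024-01-01 00:00:00'
$now  = $from.AddHours(15)
$changes = @(
    [pscustomobject]@{ TimeGenerated = $from.AddHours(2);  OldHealthState = 'Success'; NewHealthState = 'Error' }
    [pscustomobject]@{ TimeGenerated = $from.AddHours(3);  OldHealthState = 'Error';   NewHealthState = 'Success' }
    [pscustomobject]@{ TimeGenerated = $from.AddHours(10); OldHealthState = 'Success'; NewHealthState = 'Warning' }
    [pscustomobject]@{ TimeGenerated = $from.AddHours(12); OldHealthState = 'Warning'; NewHealthState = 'Success' }
)
Get-HealthyPercent -From $from -Now $now -StateChanges $changes -CurrentState 'Success'   # 80
```

In a real run, the arguments would come from the same places the sample scripts use: $from and $now from the UTC window, the filtered and sorted state change events, and $monitoringStates.HealthState as the current state.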
The following picture shows another example of an OpsMgr native dashboard with the Speedometer and Text Display widgets, displaying the percentage of availability of the Operations Manager Management Group distributed application object in two different ways; this time, the value is above the 80% threshold.


Sample Script 2:  

Sample Script 2 is similar to Sample Script 1. The only difference is that it returns all distributed application objects under the System.Service class and loops through them to calculate the overall percentage of availability of each one, producing the following result:


 
function New-Collection ( [type] $type ) 
{
   $typeAssemblyName = $type.AssemblyQualifiedName;
   $collection = new-object "System.Collections.ObjectModel.Collection``1[[$typeAssemblyName]]";
   return ,($collection);
}

$newline = "`r`n" 

#Last x hours UTC
$backdateHour  = 15  #NOTE: Replace with a numerical value for the time window in hours.

$monitorObject = Get-SCOMMonitor -Name System.Health.EntityState

#Get parent Class for all Distributed Applications
$class = get-scomclass -Name System.Service
#Return all Distributed Application Objects
$DistributedApps = Get-SCOMClassInstance -Class $class

#Add monitor to collection
$monitorCollection = new-collection $monitorObject.GetType()
$monitorCollection.Add($monitorObject)

ForEach($DistributedApp in $DistributedApps)
{
#Get health state information of the Entity Health Monitor running against the Distributed Application object
$monitoringStates = $DistributedApp.GetMonitoringStates($monitorCollection)

#Get all state changes of the Entity Health Monitor running against the Distributed Application object
$stateChanges = $monitoringStates.GetMonitoringStateChangeEvents()

#Get start and end points of date time range
#Get current date time in UTC and date time X hours ago
$aggregationInterval = $backdateHour
$dt = New-TimeSpan -hour $aggregationInterval
$nowlocal = Get-Date
#Convert local time to UTC time
$now = $nowlocal.ToUniversalTime()
$from = $now.Subtract($dt)

#Get all state changes between date time range and sort based on state change time
#@() guarantees an array, so .count and indexing work even when only one state change matches
$stateChangesSortedx = @($stateChanges | where{$_.TimeGenerated -gt $from} | sort-object @{Expression={$_.TimeGenerated}; Ascending=$true})

$index = 0
$TotalTimeDiff = 0

#If state changes exist ...
If($stateChangesSortedx.count -ne 0)
{

    foreach($stateChangesSorted in $stateChangesSortedx)
    {    
        if($index -lt 1)
        {
            If($stateChangesSorted.OldHealthState -eq "Success")
            {
                #Healthy from the start of the window until the first state change
                $TimeDiff = $stateChangesSorted.TimeGenerated - $from
                $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
            }
            else  #Old state is Not Healthy
            {
                if($stateChangesSortedx.count -eq 1 -and $stateChangesSorted.NewHealthState -eq "Success") #Only 1 single state change, into Healthy
                {
                    #Healthy from this state change until now
                    $TimeDiff = $now - $stateChangesSorted.TimeGenerated
                    $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
                }
            }

        }
        else
        {
            If($stateChangesSorted.OldHealthState -eq "Success")
            {
                #Healthy between the previous state change and this one
                $TimeDiff = $stateChangesSorted.TimeGenerated-$stateChangesSortedx[$index-1].TimeGenerated
                $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
            }
            else
            {
                #Last state change is into Healthy: Healthy from then until now
                if($index -eq $stateChangesSortedx.count-1 -and $stateChangesSorted.NewHealthState -eq "Success")
                {
                    $TimeDiff = $now - $stateChangesSorted.TimeGenerated
                    $TotalTimeDiff = $TotalTimeDiff + $TimeDiff.TotalSeconds
                }
            }
        }
        $index++
    }
}
else # if no state change exists
{
    if($monitoringStates.HealthState -eq "Success")
    {$TotalTimeDiff = $backdateHour*3600}
    else
    {$TotalTimeDiff=0}
}

#Calculate percentage of availability of the distributed application object
$PercentAvailable = [Math]::Round(($TotalTimeDiff/(3600*$backdateHour))*100,2)

#Map the current health state to a display name
if($monitoringStates.HealthState -eq "Success")
{
    $HealthState1 = "Healthy"
}
else
{
    $HealthState1 = $monitoringStates.HealthState
}

$result = "Current Health State of "+ $DistributedApp.displayName + " is " + $HealthState1 + ", was available " + [String]$PercentAvailable  +"% of the time over the last " + $backdateHour + " hour(s)." + $newline + $newline + "Number of State Changes: " + $stateChangesSortedx.count + $newline

$result
}
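Both scripts rely on the count of the filtered, sorted state changes and on indexing them with [$index-1]. When a pipeline emits exactly one object, PowerShell assigns that object itself rather than a one-element array, and on PowerShell 2.0 a single object has no count property, so checks such as count -eq 1 would not behave as intended. A minimal sketch of the difference, using hypothetical [pscustomobject] stand-ins for the OpsMgr state change events, shows how wrapping the pipeline in the array subexpression operator @() keeps the result an array:

```powershell
# Hypothetical stand-in for one OpsMgr state change event inside the window
$now    = (Get-Date).ToUniversalTime()
$from   = $now.AddHours(-15)
$events = [pscustomobject]@{ TimeGenerated = $now.AddHours(-1); OldHealthState = 'Success'; NewHealthState = 'Error' }

# Without @(): a single match is returned as the object itself, not an array
$scalar = $events | Where-Object { $_.TimeGenerated -gt $from }
$scalar -is [array]        # False

# With @(): always an array, so .Count and [$index] behave consistently
$sorted = @($events | Where-Object { $_.TimeGenerated -gt $from } | Sort-Object TimeGenerated)
$sorted.Count              # 1
$sorted[0].NewHealthState  # Error

# No match yields a zero-length array rather than $null
$none = @($events | Where-Object { $_.TimeGenerated -gt $now })
$none.Count                # 0
```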

IMPORTANT NOTE about Sample Script 1 and Sample Script 2:

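Both sample scripts assume that the distributed application objects are never in a greyed-out state, and therefore also assume that the management server(s) managing them are always available and running. The scripts only look at the state change events of the Entity Health monitor, so any period during which an object is greyed out is counted according to its last recorded state.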

Disclaimer:
All information on this blog is provided on an as-is basis with no warranties and for informational purposes only. Use at your own risk. The opinions and views expressed in this blog are those of the author and do not necessarily state or reflect those of my employer.