Downtime, SLA, and outages workbook

This workbook provides a simple way to calculate and report on your service-level agreement (SLA) for web tests through a single pane of glass across your Application Insights resources and Azure subscriptions. The downtime and outage report provides prebuilt queries and data visualizations that help you understand your customers' connectivity, typical application response time, and experienced downtime.

The SLA workbook template is accessible through the workbook gallery in your Application Insights resource or through the availability tab by selecting SLA Reports at the top.

Screenshot of availability tab with SLA Reports highlighted.

Screenshot of the workbook gallery with downtime and outages workbook highlighted.

Parameter flexibility

The parameters set in the workbook influence the rest of your report.

 Screenshot of outage/maintenance parameters tab in the downtime and outages workbook.

Subscriptions, App Insights Resources, and Web Test parameters determine your high-level resource options. These parameters are based on Log Analytics queries and are used in every report query.

Failure Threshold and Outage Window let you define your own criteria for a service outage. For example, you can mirror the criteria for an App Insights availability alert, which is based on the count of failed locations over a chosen period. The typical threshold is three locations over a five-minute window.
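The threshold-and-window criterion can be illustrated with a minimal sketch. The `AvailabilityResult` type, the function names, and the "at least as many failed locations as the threshold" comparison are assumptions made for illustration. They are not the workbook's actual query or schema.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class AvailabilityResult(NamedTuple):
    """One web test run (hypothetical shape, not the workbook's schema)."""
    timestamp: datetime
    location: str
    success: bool


def failed_location_count(
    results: Iterable[AvailabilityResult],
    window_end: datetime,
    outage_window: timedelta,
) -> int:
    """Count distinct locations with at least one failed run in the window."""
    window_start = window_end - outage_window
    failed = {
        r.location
        for r in results
        if not r.success and window_start < r.timestamp <= window_end
    }
    return len(failed)


def is_outage(
    results: Iterable[AvailabilityResult],
    window_end: datetime,
    failure_threshold: int = 3,                     # typical: three locations
    outage_window: timedelta = timedelta(minutes=5),  # typical: five minutes
) -> bool:
    """Assumed rule: an outage exists when the number of failed locations
    in the window reaches the failure threshold."""
    count = failed_location_count(results, window_end, outage_window)
    return count >= failure_threshold
```

With the defaults, failures from three different locations inside the same five minutes count as an outage, while repeated failures from a single location do not.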

Maintenance Period enables you to select your typical maintenance frequency, and Maintenance Window is a datetime selector for an example maintenance period. All data that occurs during the identified period is ignored in your results.
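One way to picture how an example maintenance window combines with a frequency is the sketch below. It assumes the frequency is a fixed interval such as daily or weekly, and that the window is no longer than that interval. The function name and parameters are hypothetical, and the workbook's own queries may handle recurrence differently.

```python
from datetime import datetime, timedelta
from typing import Optional


def in_maintenance(
    timestamp: datetime,
    window_start: datetime,
    window_end: datetime,
    period: Optional[timedelta] = None,
) -> bool:
    """Return True if timestamp falls inside the example maintenance
    window or any repetition of it.

    period is an assumed fixed recurrence interval (for example, one
    week). With period=None the window is treated as a one-off.
    """
    if period is None:
        return window_start <= timestamp < window_end
    duration = window_end - window_start
    # Python's modulo on timedeltas is non-negative for a positive period,
    # so this also covers timestamps that precede the example window.
    offset = (timestamp - window_start) % period
    return offset < duration
```

Results whose timestamps return True would be dropped before the SLA figures are calculated.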

Availability Target 9s specifies your availability objective, from two 9s (99%) to five 9s (99.999%).
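To make the "9s" notation concrete, the following sketch converts a count of 9s into a percentage target and into the downtime that target allows over a reporting period. The function names are illustrative and are not part of the workbook.

```python
from datetime import timedelta


def target_percent(nines: int) -> float:
    """Convert a number of 9s into a percentage, e.g. 3 -> 99.9."""
    if not 2 <= nines <= 5:
        raise ValueError("nines must be between 2 and 5")
    return 100 - 100 / 10 ** nines


def allowed_downtime(nines: int, period: timedelta) -> timedelta:
    """Downtime budget for the period at the given availability target."""
    if not 2 <= nines <= 5:
        raise ValueError("nines must be between 2 and 5")
    # The unavailable fraction is 1 / 10**nines of the period.
    return period / 10 ** nines


if __name__ == "__main__":
    month = timedelta(days=30)
    for n in range(2, 6):
        print(n, target_percent(n), allowed_downtime(n, month))
```

For a 30-day period, three 9s leaves a budget of 43 minutes and 12 seconds of downtime.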

Overview page

The overview page contains high-level information about your total SLA (excluding maintenance periods, if defined), end-to-end outage instances, and application downtime. An outage instance runs from the point a test starts to fail until it succeeds again, based on your outage parameters. If a test starts failing at 8:00 AM and succeeds again at 10:00 AM, that entire period of data is considered the same outage.
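The outage-instance definition can be sketched as a simple pass over a test's results in time order. This is an illustration under assumptions, not the workbook's query: each sample is a hypothetical `(timestamp, succeeded)` pair, and the failure threshold and outage window are assumed to have been applied already.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[datetime, bool]                  # (timestamp, succeeded)
Outage = Tuple[datetime, Optional[datetime]]    # (start, end or None)


def outage_instances(samples: Iterable[Sample]) -> List[Outage]:
    """Group consecutive failures into outages.

    An outage starts at the first failing sample and ends at the next
    successful one. An outage still open at the end of the data is
    returned with an end of None.
    """
    outages: List[Outage] = []
    start: Optional[datetime] = None
    for timestamp, succeeded in sorted(samples):
        if not succeeded and start is None:
            start = timestamp                    # the test starts to fail
        elif succeeded and start is not None:
            outages.append((start, timestamp))   # the test recovers
            start = None
    if start is not None:
        outages.append((start, None))            # never recovered in range
    return outages
```

For the example above, failures at 8:00 and 9:00 followed by a success at 10:00 yield a single outage from 8:00 to 10:00, or two hours of downtime.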

 GIF of overview page showing the overview table by test.

You can also investigate the longest outage that occurred during your reporting period.

Some tests link back to their Application Insights resource for further investigation, but only for workspace-based Application Insights resources.

Downtime, outages, and failures

The Outages and Downtime tab has information on total outage instances and total downtime, broken down by test. The Failures by Location tab has a geo-map of failed testing locations to help identify potential problem connection areas.

 GIF of Outages and Downtime tab and Failure by Location tab in the downtime and outages workbook.

Edit the report

You can edit the report like any other Azure Monitor Workbook. You can customize the queries or visualizations based on your team's needs.

 GIF of selecting the edit button to change the visualization to a pie chart.

Log Analytics

The queries can all be run in Log Analytics and used in other reports or dashboards. To reuse a core query elsewhere, remove the parameter restriction.

 GIF of log query.

Access and sharing

The report can be shared with your teams and leadership or pinned to a dashboard for further use. The user needs read permission to the Application Insights resource where the actual workbook is stored.

 Screenshot of share this template.
