Downtime, SLA, and outages workbook
This workbook provides a simple way to calculate and report SLA (service-level agreement) for web tests through a single pane of glass across your Application Insights resources and Azure subscriptions. The Downtime and Outage report provides prebuilt queries and data visualizations to enhance your understanding of your customer's connectivity, typical application response time, and experienced downtime.
The parameters set in the workbook influence the rest of your report.
App Insights Resources and Web Test parameters determine your high-level resource options. These parameters are based on Log Analytics queries and used in every report query.
Failure Threshold and Outage Window allow you to determine your own criteria for a service outage. An example is the criteria for an Application Insights availability alert, which is based on a count of failed locations over a chosen period. The typical threshold is three locations over a five-minute window.
Maintenance Period enables you to select your typical maintenance frequency, and Maintenance Window is a datetime selector for an example maintenance period. All data from the identified period is ignored in your results.
Availability Target % specifies your target objective and accepts custom values.
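To illustrate how Failure Threshold, Outage Window, and Maintenance Window might interact, here is a minimal Python sketch. It is not the workbook's actual query logic, which runs as Log Analytics queries. The function name `is_outage`, the tuple layout of the test results, and the location names are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

def is_outage(results, window_end, outage_window, failure_threshold, maintenance=None):
    """Return True if enough distinct locations failed inside the outage window.

    results: iterable of (timestamp, location, success) tuples (assumed layout).
    maintenance: optional (start, end) pair; results in [start, end) are ignored.
    """
    window_start = window_end - outage_window
    failed_locations = set()
    for ts, location, success in results:
        # Skip data recorded during the maintenance window.
        if maintenance is not None and maintenance[0] <= ts < maintenance[1]:
            continue
        # Count each failing location once within (window_start, window_end].
        if window_start < ts <= window_end and not success:
            failed_locations.add(location)
    return len(failed_locations) >= failure_threshold

# Hypothetical web test results: three locations fail within five minutes.
results = [
    (datetime(2024, 1, 1, 8, 1), "us-west", False),
    (datetime(2024, 1, 1, 8, 2), "eu-north", False),
    (datetime(2024, 1, 1, 8, 3), "asia-east", False),
    (datetime(2024, 1, 1, 8, 4), "us-east", True),
]
window_end = datetime(2024, 1, 1, 8, 5)
print(is_outage(results, window_end, timedelta(minutes=5), failure_threshold=3))
```

With the typical threshold of three locations over a five-minute window, the sample data counts as an outage. Passing a maintenance window that covers the same period would exclude those results.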
The overview page contains high-level information about your total SLA (excluding maintenance periods, if defined), end-to-end outage instances, and application downtime. Based on your outage parameters, an outage instance runs from the moment a test starts to fail until it succeeds again. If a test starts failing at 8:00 AM and succeeds again at 10:00 AM, that entire period of data is considered the same outage.
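The outage-instance definition can be sketched in Python as follows. This is an illustration under stated assumptions, not the workbook's own implementation: the helper names `outage_instances` and `availability_percent` are hypothetical, the samples are assumed to be sorted by time, and availability is assumed to be time-based (uptime divided by the reporting period).

```python
from datetime import datetime

def outage_instances(samples):
    """Group consecutive failing samples into (start, end) outage instances.

    samples: list of (timestamp, is_down) pairs sorted by timestamp (assumed).
    An outage that is still open at the end of the data is not returned.
    """
    outages = []
    start = None
    for ts, is_down in samples:
        if is_down and start is None:
            start = ts                    # first failure opens an outage
        elif not is_down and start is not None:
            outages.append((start, ts))   # first success closes it
            start = None
    return outages

def availability_percent(outages, period_start, period_end):
    """Time-based availability over the reporting period, in percent."""
    total = (period_end - period_start).total_seconds()
    down = sum((end - start).total_seconds() for start, end in outages)
    return 100.0 * (total - down) / total

# The test starts failing at 8:00 AM and succeeds again at 10:00 AM.
samples = [
    (datetime(2024, 1, 1, 7, 55), False),
    (datetime(2024, 1, 1, 8, 0), True),
    (datetime(2024, 1, 1, 9, 0), True),
    (datetime(2024, 1, 1, 10, 0), False),
    (datetime(2024, 1, 1, 10, 5), False),
]
outages = outage_instances(samples)
sla = availability_percent(outages, datetime(2024, 1, 1), datetime(2024, 1, 2))
print(len(outages), round(sla, 2))
```

In this sketch the two failing samples collapse into a single two-hour outage, which over a 24-hour reporting period leaves 22 of 24 hours available. The resulting percentage could then be compared against the Availability Target % value.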
You can also investigate the longest outage that occurred during your reporting period.
Some tests link back to their Application Insights resource for further investigation, but that is possible only for workspace-based Application Insights resources.
Downtime, outages, and failures
The Outages and Downtime tab has information on total outage instances and total downtime, broken down by test. The Failures by Location tab has a geo-map of failed testing locations to help identify potential problem connection areas.
Edit the report
You can edit the report like any other Azure Monitor Workbook. You can customize the queries or visualizations based on your team's needs.
The queries can all be run in Log Analytics and used in other reports or dashboards. Remove the parameter restriction and reuse the core query.
Access and sharing
The report can be shared with your teams and leadership, or pinned to a dashboard for further use. The user needs read permission or access to the Application Insights resource where the actual workbook is stored.