OPS116: Monitoring & Responding to alerts in hybrid environments using Azure Monitor

A deep dive into the framework Microsoft Retail has used over the last 3-4 years to monitor all of its on-prem systems, including in-store video walls, among others. It is built on publicly available Azure technologies: Application Insights, OMS (and SCOM), Log Analytics, Azure Storage (Blobs/Tables), Azure Automation, and PowerShell.

The goal is to provide an extensible framework that automates alert resolution through Azure Automation and hybrid technologies. In this session we will go over how you can respond to an alert. When an alert is triggered (from Log Analytics via App Insights, OMS, SCOM, or external data), it triggers an action, defined by the action group, that calls the framework entry point (basically a webhook or normal URL). The framework looks up metadata for the component that triggered the alert and calls the user-defined self-healing PowerShell script. If the self-healing script fails or doesn't exist, a support ticket is created (using your own ticketing system) for the owner of the component, and all details of the alert, along with any root cause data gathered by the self-healing script, are added to the ticket.
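
As a rough illustration of the flow described above, here is a minimal sketch in Python. It is not the framework's actual code (which the session describes as PowerShell running under Azure Automation); the names Component, Result, handle_alert, and create_ticket, as well as the shape of the alert payload, are all hypothetical and chosen only to show the order of the steps: look up component metadata, try the self-healing script, and fall back to a ticket.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Component:
    """Hypothetical metadata record for a monitored component."""
    name: str
    owner: str
    # Optional self-healing hook: receives the alert and a notes list it can
    # append diagnostics to, and returns True when the issue is resolved.
    self_heal: Optional[Callable[[dict, list], bool]] = None


@dataclass
class Result:
    component: str
    resolved: bool
    ticket: Optional[dict] = None
    notes: list = field(default_factory=list)


def handle_alert(alert: dict, registry: dict, create_ticket: Callable) -> Result:
    """Entry point an action group's webhook would call (assumed shape)."""
    # 1. Look up metadata for the component that raised the alert.
    component = registry[alert["component"]]
    notes: list = []
    resolved = False

    # 2. Run the user-defined self-healing script, if one is registered.
    if component.self_heal is None:
        notes.append("no self-healing script registered")
    else:
        try:
            resolved = bool(component.self_heal(alert, notes))
        except Exception as exc:  # a crashing script counts as a failure
            notes.append(f"self-healing script raised: {exc}")

    if resolved:
        return Result(component.name, True, None, notes)

    # 3. Otherwise open a ticket for the owner, attaching everything gathered.
    ticket = create_ticket(component.owner, alert, notes)
    return Result(component.name, False, ticket, notes)
```

In this sketch create_ticket stands in for whatever ticketing system you use; it receives the owner, the original alert, and the notes collected during the self-healing attempt.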

The framework logs the call and all actions taken to its own App Insights instance, so at the end of the day you can report on how many alerts fired and how many the framework resolved or alerted on. The goal is to help customers develop a problem management/SRE mindset to identify and address issues via automation and prioritization.
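
The end-of-day report mentioned above amounts to a simple tally over the logged events. The sketch below assumes each logged event carries an "outcome" field of either "resolved" or "ticketed"; that field name, the summarize function, and the in-memory list standing in for the App Insights log are all hypothetical.

```python
from collections import Counter


def summarize(events):
    """Count how many alerts fired and how many were resolved automatically.

    `events` is any iterable of dicts with an assumed 'outcome' key.
    """
    counts = Counter(event["outcome"] for event in events)
    fired = sum(counts.values())
    resolved = counts.get("resolved", 0)
    return {
        "fired": fired,
        "resolved": resolved,
        "ticketed": counts.get("ticketed", 0),
        # Share of alerts closed without a human, as a percentage.
        "resolved_pct": round(100 * resolved / fired, 1) if fired else 0.0,
    }
```

Tracking the resolved percentage over time is one way to see whether new self-healing scripts are actually reducing the ticket load.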

Free up time by automating the little things so you can focus on the larger priorities.

✔️ Resources:

  • IT Ops Talks Hybrid Event: https://aka.ms/ITOpsTalks
  • IT Ops Talk Community Chat: https://aka.ms/ops116-chat
  • Azure Alerting: https://aka.ms/ops116-AzureAlerts
  • Azure Action Groups: https://aka.ms/ops116-AzureActionGroups
  • Azure Automation: https://aka.ms/ops116-AzureAutomation
  • Azure Hybrid Workers: https://aka.ms/ops116-HybridWorker
  • Azure Functions: https://aka.ms/ops116-AzureFunctions

🔴 To watch more sessions from the IT Ops Talks: All Things Hybrid event, check out our playlist: https://www.youtube.com/playlist?list=PLjt5SKzX1iI8k8_I80quMWgeNdSGyXzaF

🔖 Chapters:

  • 0:00 Introduction
  • 1:33 Agenda
  • 2:00 History
  • 5:15 Design
  • 6:55 Overview of the response framework
  • 10:45 Implementation
  • 14:40 Step through Explanation
  • 30:59 Real World Example
  • 49:00 Additional uses/POCs
  • 58:30 Wrap Up