Monitoring SQL Backup Failures with Azure Operational Insights Search and Dashboards
Besides the SQL Server Assessment Intelligence Pack, you can also use Log Management functionality to collect the ‘Application’ event log from your SQL Servers and monitor activity from it with search.
For example, this is my simple recipe for finding SQL Backup Failures
Type=Event EventID=3041 EventLog=Application Source=MSSQLSERVER
That’s it. I can see in my workspace I had a few of those back in January:
I can slightly massage the output to get better readability by selecting only certain fields piping my results into the SELECT command
Type=Event EventID=3041 EventLog=Application Source=MSSQLSERVER | Select Computer,RenderedDescription
In my case I am investigating an occurrence in the past, so I want to understand why the failure occurred, so the next logical step would be to look for the time when the backup failure occurred, by interacting with the time bars on the top left under ‘narrow your results’ – clicking on the bars and narrowing down till we see the few individual events and a narrow enough time period
and now I want to see WHAT ELSE was happening on the same machine, so I add a filter for the ‘Computer’ field, and I remove everything else from my search:
This shows me WHAT ELSE I know about that machine around that time frame, and I see, among other things that are always found (i.e. events, securityevents don’t come at regular intervals, but all the times), I see there is ONE configuration change – interesting
Let’s drill into that… and we see that someone had installed a Defragmentation tool on the server around the same time as the backup failures!
I think we now understand why the backup failed that day…Operational Insights made it super easy to get to the root cause!