Our webapp started throwing SqlExceptions over the weekend, which quickly escalated this morning once the bulk of our users started using the app again. I managed to track the issue down to 1 db, which was pegged at 100% CPU. My first attempt to resolve it was to scale it up from a Basic to S0, which didn't help. I then tried stopping the webapp and restoring a copy of the DB to see if a new instance of the DB would be free from these issues - it didn't help. Then in a blind panic, I kept scaling up in the hope that at some point the issue would resolve. It finally did at S3.
So thankfully my webapp is now working again, but I don't want to be paying for an S3 DB when I clearly shouldn't be at that level, so I'd like to figure out what the cause of the issue is, and how to properly resolve it so that I can scale back down.
Under Diagnose, there is this message:
Active: At 02:26 AM, Saturday, 27 February 2021 UTC, the Azure monitoring system received the following information regarding your SQL database resource tht-uks1/tht-main:: We're sorry your SQL database is experiencing transient login failures. Currently, Azure shows the impacted time period for your SQL database resource at a two-minute granularity. The actual impact is likely less than a minute – average is 2s. We're working to determine the source of the problem
I don't know if this is related or not, but figured it was worth mentioning.
Can anyone point me in the right direction for understanding what went wrong?