Query Store - unable to get rid of last force failure status

flakey321 21 Reputation points
2021-09-27T13:58:52.437+00:00

Hi,

I've had a Query Store issue that i've not come across before with regard to the last_force_failure reason continuing to be reported long after the original issue has disappeared.

So I have a forced plan which for one short interval was bypassed and a worse plan was chosen but then the query went back to using it's original plan.
135520-image.png

It displayed the reason for failure as 'GENERAL_FAILURE'. Even though the query was now running on its original forced plan again it repeatedly reported (we have alerting set up for any plan forcing failures) that previously there had been a failure. I originally removed the plan from query store via sp_query_store_remove_plan and unforced/reforced the plan. The Stored Proc didn't run for a while, but the next time it did the error was back however in each run interval the stored proc had successfully used the forced plan i.e. it was still reporting the 'reason it last failed' rather than 'it's failed again this is the reason'.

I then took this a step further which i was sure would fix it. As well as unforcing the plan, i removed the actual query from query store using sp_query_store_remove_query. I waited until the next time the Stored Proc ran via the application and then forced again and could see it was using a totally different query_id and plan_id. I thought this was all resolved, but the last_failure_reason was populated again as soon as the stored proc ran again with the forced plan (and again i can see it was choosing the forced plan each time, no reoccurrence of it using a non-forced plan).

So for now i've left the plan unforced as it's actually choosing an optimal plan each time but i'm keen to get to the bottom of why this last_failure_reason keeps repopulating.

Any pointers appreciated.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,759 questions
0 comments No comments
{count} votes

4 answers

Sort by: Most helpful
  1. Erland Sommarskog 101.4K Reputation points MVP
    2021-09-28T19:13:56.13+00:00

    First of all, what does "SELECT @@version" report?

    Next what do you get from this query:

    SELECT * FROM sys.query_store_plan WHERE last_force_failure_reason <> 0
    

    You said you removed the plan, but the only place I can find last_force_failure_reason is in sys.query_store_plan. So, one of these things must have happened:

    1. The failure occurred again.
    2. You removed a different plan.

    What is a little tricky with plan forcing is that the actually plan may be a different plan, one which is "morally equivalent" to the original plan. This can lead to some confusion.

    0 comments No comments

  2. flakey321 21 Reputation points
    2021-09-29T08:18:51.157+00:00

    Hi Erland

    Thanks for taking the time out to respond. We're on SQL2017 CU22, running the query

    SELECT * FROM sys.query_store_plan WHERE last_force_failure_reason <> 0

    yields no results which adds to the mystery, and i agree the only place i could find last_force_failure_reason was in sys.query_store_plan.

    I recorded the plan i removed as below which matches the plan_id in the screenshot in my post, so i'm confident the correct plan was removed:

    EXEC sp_query_store_remove_plan 24040434

    and when i did my initial unforce/force after removing the plan I could see it had disappeared and only the original forced plan was there

    136233-image.png

    and checking the DMVs for all runtime intervals and the tracked queries GUI it consistently showed the forced plan used everytime, I didn't see any reoccurrence of the query using anything apart from the forced plan

    I'm a real fan of Query Store and generally know my way round it with regard to failed forced plans but this one has me stumped


  3. Erland Sommarskog 101.4K Reputation points MVP
    2021-10-06T20:59:49.637+00:00

    But isn't what is happening that plan forcing keeps on failing every now and then? You removed the bad plan, but it seems that you kept the good plan.

    What if leave things like they are for a while, but add force_failure_count to your query and see if it increments?

    0 comments No comments

  4. flakey321 21 Reputation points
    2021-10-08T06:56:55.987+00:00

    That's the strange thing, whenever i've queried the DMVs i haven't seen it fail to force it again in any runtime intervals - just that one initial time. I had removed both the plan and the query from query store and it still alerted even though it was using totally new query and plan ids. Certainly the logical explanation is that it's failed to use the forced plan again but i can't see any evidence of this happening.

    For now i've left the plan unforced as the 'suboptimal' plan was still well in the bounds of acceptability, and its actually used the optimal plan each time anyway without being forced so far. I suspect initially this was a stored procedure that autotune had picked up rather than one ourselves. I'll leave it like that unless we see issues with it again and then take it from there.

    Thank you Erland for all the time you've taken out to look into this, it's appreciated.

    0 comments No comments