question

flakey321 avatar image
0 Votes"
flakey321 asked flakey321 answered

Query Store - unable to get rid of last force failure status

Hi,

I've had a Query Store issue that i've not come across before with regard to the last_force_failure reason continuing to be reported long after the original issue has disappeared.

So I have a forced plan which for one short interval was bypassed and a worse plan was chosen but then the query went back to using it's original plan.
135520-image.png



It displayed the reason for failure as 'GENERAL_FAILURE'. Even though the query was now running on its original forced plan again it repeatedly reported (we have alerting set up for any plan forcing failures) that previously there had been a failure. I originally removed the plan from query store via sp_query_store_remove_plan and unforced/reforced the plan. The Stored Proc didn't run for a while, but the next time it did the error was back however in each run interval the stored proc had successfully used the forced plan i.e. it was still reporting the 'reason it last failed' rather than 'it's failed again this is the reason'.

I then took this a step further which i was sure would fix it. As well as unforcing the plan, i removed the actual query from query store using sp_query_store_remove_query. I waited until the next time the Stored Proc ran via the application and then forced again and could see it was using a totally different query_id and plan_id. I thought this was all resolved, but the last_failure_reason was populated again as soon as the stored proc ran again with the forced plan (and again i can see it was choosing the forced plan each time, no reoccurrence of it using a non-forced plan).

So for now i've left the plan unforced as it's actually choosing an optimal plan each time but i'm keen to get to the bottom of why this last_failure_reason keeps repopulating.

Any pointers appreciated.

sql-server-general
image.png (41.8 KiB)
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered

First of all, what does "SELECT @@version" report?

Next what do you get from this query:

SELECT * FROM sys.query_store_plan WHERE last_force_failure_reason <> 0

You said you removed the plan, but the only place I can find last_force_failure_reason is in sys.query_store_plan. So, one of these things must have happened:
1. The failure occurred again.
2. You removed a different plan.

What is a little tricky with plan forcing is that the actually plan may be a different plan, one which is "morally equivalent" to the original plan. This can lead to some confusion.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

flakey321 avatar image
0 Votes"
flakey321 answered flakey321 commented

Hi Erland

Thanks for taking the time out to respond. We're on SQL2017 CU22, running the query

SELECT * FROM sys.query_store_plan WHERE last_force_failure_reason <> 0

yields no results which adds to the mystery, and i agree the only place i could find last_force_failure_reason was in sys.query_store_plan.

I recorded the plan i removed as below which matches the plan_id in the screenshot in my post, so i'm confident the correct plan was removed:

EXEC sp_query_store_remove_plan 24040434

and when i did my initial unforce/force after removing the plan I could see it had disappeared and only the original forced plan was there

136233-image.png



and checking the DMVs for all runtime intervals and the tracked queries GUI it consistently showed the forced plan used everytime, I didn't see any reoccurrence of the query using anything apart from the forced plan

I'm a real fan of Query Store and generally know my way round it with regard to failed forced plans but this one has me stumped


image.png (19.7 KiB)
· 4
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

So where do you see this "last forced failure" status? Is that in the UI? I rarely use the UI myself, so I don't really find my way there. Could you post a screenshot of where you see this?

0 Votes 0 ·
flakey321 avatar image flakey321 ErlandSommarskog ·

Hi Erland sorry for late reply have been off work for a few days.

So the last forced failure reason I got from a query i have that joins several DMVs, with the the failure reason coming from sys.query_store_plan The original issue I had was on the 21st September. The usual plan it should run as per my original screenshot was 24039277 and the one it briefly ran when it couldn't force (and caused the original error) was 24040434.

After I'd removed the plan from QS and unforced/forced the plan my DMV query now showed no failure reason against 24039277 for the historical runs as below:

137717-image.png

However the next time i had a batch of the Stored Proc running on the 23rd, it reverted to the previous failure reason for the forced plan 24039277 despite the fact it hadn't failed to use the plan, and no plan apart from 24039277 had been used since the original issue on the 21st

137813-image.png


0 Votes 0 ·
image.png (27.9 KiB)
image.png (12.3 KiB)

So the last forced failure reason I got from a query i have that joins several DMVs, with the the failure reason coming from sys.query_store_plan

As you might have guess, I am now going to ask if you can share that query? Else I have even less idea of what we are looking at.

0 Votes 0 ·
Show more comments
ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered

But isn't what is happening that plan forcing keeps on failing every now and then? You removed the bad plan, but it seems that you kept the good plan.

What if leave things like they are for a while, but add force_failure_count to your query and see if it increments?

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

flakey321 avatar image
0 Votes"
flakey321 answered

That's the strange thing, whenever i've queried the DMVs i haven't seen it fail to force it again in any runtime intervals - just that one initial time. I had removed both the plan and the query from query store and it still alerted even though it was using totally new query and plan ids. Certainly the logical explanation is that it's failed to use the forced plan again but i can't see any evidence of this happening.

For now i've left the plan unforced as the 'suboptimal' plan was still well in the bounds of acceptability, and its actually used the optimal plan each time anyway without being forced so far. I suspect initially this was a stored procedure that autotune had picked up rather than one ourselves. I'll leave it like that unless we see issues with it again and then take it from there.

Thank you Erland for all the time you've taken out to look into this, it's appreciated.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.