question

UdhamSingh-6769 avatar image
0 Votes"
UdhamSingh-6769 asked UdhamSingh-6769 commented

Does AlwaysOn setup adds to row versioning on Primary replica

Hi,

We have 4 replica set of an AOAG in SQL server 2016 Enterprise edition, 2 sync and 2 async. We have notice that on primary replica we are getting "NonSnapshot Version Transactions" counter value in sys.dm_os_performance_counters. And hence sometime it is triggering the "Longest Transaction Running Time" alert threshold > 50.

Nowhere we are using\changing the isolation level to SNAPSHOT or READ_COMMITED_ISOLATION level, it is the default one i.e. READ COMMITTED for every transaction that is happening.

Any idea how row versioning is happening on Primary replica.

Regards,
Udham Singh

sql-server-generalsql-server-transact-sql
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered UdhamSingh-6769 commented

I assume that at least one of your secondary replicas is a readable secondary? That will indeed triggering vesion-store handling on the primary, in the same manner as if you had enabled READ_COMMITTED_SNAPSHOT.

· 5
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Yes my secondary replicas are Readable. Well that make sense.
But still I don't understand it completly, could you share any blog post\link, where this is explained in detail.

0 Votes 0 ·

This was my top hit on Google when I searched for "Readable secondary enables version store". I did not read it myself, but since I know the author well, I have faith in that it will be helpful to you. Dmitri knows his stuff.
https://www.apress.com/de/blog/all-blog-posts/readable-secondaries-in-sql-server/16064064

1 Vote 1 ·

But I have understood from above shared blog and other blogs is:

When SI and/or RCSI are not enabled on the primary replica, but the secondary replica is enabled for read workload.
The row version is only generated on the secondary replica; because RCSI or SI is not enabled on the primary replica, there is really no need to create row versions there.
The row versions need to be generated on the secondary replica, which means that the 14-byte overhead needs to be added to the new and modified rows on the primary, because the primary and secondary replicas must be physically identical. "


So if row versioning is not happening on Primary replica then why "Longest Transaction Running Time" counter is showing the activity on primary replica ?

0 Votes 0 ·

Today I had an occurrence of this alert, we had and TRNSACTION running on Primary replica under "BEGIN TRAN" for longer duration. On Secondary also same alert triggered at same time. Please see the details below.

From Primary:
tran_elapsed_time_ss : 876
WaitType : MISCELLANEOUS
open_tran :1
status : sleeping

From Secondary:
CMD : DB STARTUP
TRANSACTION_ISOLATION_LEVEL : READ COMMITTED
WaitType : REDO_THREAD_PENDING_WORK
open_tran :0
status : background
tran_elapsed_time_ss : 890
blocked: 0

0 Votes 0 ·

This is what I have got. Any idea if Row versioning is not happening on Primary then why "Longest Running Tran TIme" counter is showing the activity.


135846-alwayson-row-versioning.gif


0 Votes 0 ·
Yufeishao-msft avatar image
0 Votes"
Yufeishao-msft answered UdhamSingh-6769 commented

Hi @UdhamSingh-6769

This counter only shows activity when the database is under committed snapshot isolation level, it does not log any activity if the database is in any other isolation level.

But all queries that run against the secondary databases are automatically mapped to snapshot isolation transaction level, even when other transaction isolation levels are explicitly set.
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/active-secondaries-readable-secondary-replicas-always-on-availability-groups?view=sql-server-ver15

· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

"But all queries that run against the secondary databases are automatically mapped to snapshot isolation transaction level, even when other transaction isolation levels are explicitly set."

On which replica, primary or secondary?

0 Votes 0 ·

secondary replica

0 Votes 0 ·

okay, so if row versioning is not happening on Primary replica then why "Longest Transaction Running Time" counter is showing the activity on primary replica ? And ending up in triggering alert threshold > 50.

0 Votes 0 ·
DmitriKorotkevitch-5392 avatar image
1 Vote"
DmitriKorotkevitch-5392 answered UdhamSingh-6769 commented

Hi All,

First of all - let me apologize regarding slight error in Apress blog post content. This is the old post and I corrected it in my upcoming troubleshooting and performance tuning book. :)

Udham is indeed correct - row versioning is not used on primary when it is not enabled. With readable secondaries, you would get 14-byte pointer added (beware of fragmentation impact). You would also have ghost cleanup (and version store cleanup when used) deferred based on oldest active transactions across all nodes. But SQL Server would not use version store on primary unless it is enabled or used by other processes (triggers, MARS, online index rebuild, etc).

Speaking of Longest Transaction Running Time - unfortunately, I don't know the answer. I cannot duplicate the problem in simple setup - active explicit transaction on secondary does not increase this counter on primary in my setup (would be easy to check in your environment with perf monitor, I suppose). I don't want to question monitoring strategy but I'd be a bit reluctant to use this performance counter to drive conclusions. It is not sampled frequently enough and it may be hard to correlate the information with the other sources. It is entirely possible that long running transaction on primary is gone by the time you received the alert.

So bottom line - why do you think long running transaction alert is triggered by secondaries activity instead of legit long running transaction on primary?

I would look at sys.dm_tran_database_transactions and other transactions DMV to confirm that the problem exists in the first place.

Sincerely,
Dmitri

· 7
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Really appreciate your response.
Since the environment where this alert is getting triggered is production so not really possible to replicate scenario. And don't have test setup handy at this movement.
If you can give a try on your test setup, below is the scenario when I am seeing the row versioning happening on Primary node and Longest Transaction Running Time counter is showing the activity on primary replica, hence ending up in triggering alert threshold less than 50. Same sessions are seen in dm_tran_active_snapshot_database_transactions DMVs.


There is a functionality that is considered as single transaction. So they begin the code execution under single transaction.

Select the data from a table based on the specific conditions lets say ID less than 100000.

Start the transaction - BEGIN TRAN
BEGIN THE LOOP lets say for 10000 times - assume we have 10K records to process
Based on each loop perform the DML operation (insert or update or delete) through SPs.
END LOOP

COMMIT or ROLLBACK

In the dm_tran_active_snapshot_database_transactions DMV, DML operation is showing getting captured. The scenario details I have collected from the DEV team after discussing the DML procedure code.

0 Votes 0 ·

I can't avoid the feeling that that loop should be replaced with a single MERGE statement. Or at least three set-based INSERT, UPDATE and DELETE statements.

0 Votes 0 ·

I got to replicate the scenario. And yes it is creating the row version on the Primary Replica and hence "Longest Transaction Running Time" counter is showing the activity on primary replica.

Environment details:
SQL version on all 4 nodes:
Microsoft SQL Server 2016 (SP2-CU17) (KB5001092) - 13.0.5888.11 (X64)
Mar 19 2021 19:41:38
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows Server 2016 Datacenter 10.0 <X64> (Build 14393: )
2 nodes are in SYNC and 2 are in ASYNC.
Isolation level remain the default i.e. READ COMMITED for the session under which I am executing the transaction.

Transaction as follow:

Select From table1
Select
From table2
Begin TRAN
Declare @ID int
Set @ID = 1
While @ID < 10000
BEGIN
Exec Updatetable1SP @ID
Exec DeleteInsertTable1SP @ID
Exec DeleteInsertTable2SP @ID
Set @ID = @ID + 1
END
Commit

I am not sure why row versioning is happening on Primary Replica under READ COMMITED Isolation level. I may connect with Microsoft Support team to understand this behavior or this could be some bug.

0 Votes 0 ·

By "primary replica" you mean master, don't you? I assume so because you are trying to update data.

Do you have any triggers on the tables you are modifying? Those triggers may use row versioning with inserted/deleted virtual tables.

0 Votes 0 ·
Show more comments