Previously committed rows might be missed if NOLOCK hint is used
I received a question from one of our customers about using the NOLOCK hint: can it cause missing rows in scans even if the rows were committed well before my SELECT with NOLOCK starts?
The NOLOCK hint is employed by many users to avoid contention on tables where updates are performed concurrently with selects. The popular belief is that the only side effect of using the NOLOCK hint is that it may cause reading uncommitted rows. These would be the rows that may be either rolled back or are part of larger concurrent transaction from which some rows may not be retrieved because they had not been written to the pages at the time the NOLOCK SELECT has been scanning them. It is much less obvious that even rows committed a long time before my NOLOCK transaction started might be skipped in the scan. (After I asked several SQL Server experts around me, I found that no one except for the development lead who owns the component in question knew this.) Additionally, a scenario theoretically exists (I don’t have a repro yet; read below) where the same record might be accounted for twice if SELECT reads a table with a NOLOCK hint.
This blog contains a simple repro – two scripts that you have to run on two connections concurrently showing double-counting rows. The third script is used to identify cases when the count of rows in the table decreased in the subsequent scan despite the fact we are only inserting rows into the table.
I will then show how you can realize the same concurrency benefit in SQL Server 2005 that is provided by NOLOCK while maintaining transactional consistency, and without using the NOLOCK hint. Later, I will explain which table scans might have this problem and which will not (even with NOLOCK), and how you can distinguish the scans by investigating their showplans. Then, I put up a challenge – create a repro that shows double-counting previously committed rows with the NOLOCK hint.
First, you should understand that SQL Server is using two kinds of scans for reading data from tables and indexes: allocation scans and range scans. An allocation scan is scanning the pages in physical (allocation) order as they are laid down on the disk. In this situation, no particular logical order of rows is guaranteed. If the optimizer requires rows to be scanned in a certain order, then a range scan is employed. This is usually when there is a predicate on the leading column of an index or if the optimizer requires visiting the rows in the logical order (for example, because of aggregation or ORDER BY in the query).
Next, you should know that if a new row is inserted into an index (clustered or not), it might cause page split. This happens if the existing page is full and the new row belongs between two existing rows (because of the key value of the new row is between the key values of the two rows) and it cannot fit physically into the same page. The page is split into two (approximately ½ empty each), and the new row is inserted where it belongs.
Here is a simple scenario that shows that the NOLOCK hint might miss some rows when concurrently executed updates (INSERTs in our case) are causing page splits. The INSERTs in script #2 will cause page splits because the table has a clustered key. First, you insert two rows, one with key value 1, and another with key value 10000. Then, you insert rows with key values 2, 9999, 3, 9998, etc. When the first page is full, it is split as described above. However, if the concurrently performed SELECT is using the NOLOCK hint on the same table, and the optimizer had chosen the allocation scan, the scan is moving only forward. Therefore, it might miss the ½ page of rows if the newly allocated page falls behind the scan.
Here are the scripts:
Script #1 – This script is trying to count the rows in the table t1. You can run three versions; see the “activate only one…” comment in the script. The first version has no hint, the second has the NOLOCK hint, and the third has both the NOLOCK hint and a BETWEEN predicate. The predicate causes the optimizer to select the range scan instead of the unordered scan in the previous two cases. You start this script and let it loop on the while statement until 10000 rows are inserted into the table t1 by the second script.
drop table tcnt;
create table tcnt(num int, cnt int);
drop table t1;
create table t1 (a int primary key, b char(500));
declare @i int, @tcnt1 int, @tcnt2 int;
insert into tcnt values (0,0);
-- Activate only one of the following three SELECT statements and compare the behaviors
-- select @tcnt2=count(*) from t1; --- this will perform the correct count
-- select @tcnt2=count(*) from t1 with (NOLOCK); -- this one might skip some previously committed records
--select @tcnt2=count(*) from t1 with (NOLOCK) where a between 0 and 10001; -- this one will work correctly because the predicate is causes the optimizer to select a range scan
insert into tcnt values (@i,@tcnt2);
Script #2 – You activate this script from another connection against the same database where you ran the first one.
declare @imin int, @imax int;
insert into t1 values (@imin,'x');
insert into t1 values (@imax,'y');
Script #3 – You run this one to investigate the result after the two previous scripts are finished. It will show all cases when the count(*) value decreased after additional committed inserts were performed. You can observe this only in the second scenario in script #1, when you run the query with NOLOCK hint and without the BETWEEN predicate. If NOLOCK is not used or the NOLOCK hint is used in conjunction with the BETWEEN predicate, the following query returns an empty result:
select * from tcnt t1, tcnt t2 where t1.num<t2.num and t1.cnt>t2.cnt;
If the statement select @tcnt2=count(*) from t1 with (NOLOCK); is executed in script #1, the result of the above select * is non-empty. The following shows some of many rows I see in my output:
num cnt num cnt
----------- ----------- ----------- ----
622 1189 624 1187
623 1191 624 1187
687 1310 690 1309
688 1312 690 1309
689 1313 690 1309
1044 2004 1051 2003
1045 2006 1051 2003
1046 2008 1051 2003
1047 2010 1051 2003
1048 2012 1051 2003
1049 2014 1051 2003
1050 2015 1051 2003
1460 2817 1463 2815
The “num” column is the sequence number of the count(*) against the table where we inserted rows. The “cnt” column is the result of the same count(*). Therefore, one can expect that if my count(*) #622 counted 1189 rows, count #624 must logically see more rows because we are only inserting. However, we see only 1187 rows. This shows that scan #624 must have missed some rows we have encountered in scan #622. An important detail is that the insert transaction of each single row was committed, which is the default behavior.
In SQL Server 2005, no-blocking scans can be performed by using alter database <dbname>set READ_COMMITTED_SNAPSHOT ON (for more information, see (http://msdn2.microsoft.com/en-us/library/ms189050.aspx). This is the best way to get rid of the NOLOCK statements, achieve greater concurrency, and return consistent data. But be aware that there can be negative performance implications – for example, the READ_COMMITTED_SNAPSHOT will force using range scans while the NOLOCK hint is using allocation scan. Paul Randal’s blog at http://blogs.msdn.com/sqlserverstorageengine/archive/2006/11/09/when-can-allocation-order-scans-be-used.aspx explains many cases when SQL Server might use allocation scans. However, the blog does not mention cases when the allocation scan is replaced by range scan even if NOLOCK (or TABLOCK or READ UNCOMMITED SNAPSHOT isolation) is used. You can find out if range scan is used with NOLOCK hint by investigating the graphical showplan. There you will see the following Seek Predicate property (at least one of the Start or End Range must be present for the scan to be “range scan”):
Start Range: [test].[dbo].[t1].a >= CONVERT_IMPLICIT(int,[@1],0), End Range: [test].[dbo].[t1].a <= CONVERT_IMPLICIT(int,[@2],0)
Similarly, in the XML showplan, you will find either something similar to StartRange ScanType="GE” or EndRange ScanType="LE" (note the GE is for the >= , and LE is for the <= predicate type, respectively).
I’m challenging readers to find similar reproductions showing twice-scanned rows. Please submit your solution here, or submit feedback to http://blogs.msdn.com/sqlcat/contact.aspx. I will reward the first correct solution by sending the author one of the books from the “Inside Microsoft® SQL Server™ 2005” series (http://www.microsoft.com/mspress/findabook/list/series_SH.aspx).
After I finished the above investigation I searched the Web and found Tony Rogerson’s blog with the same theme – skipping and double-counting previously committed rows when NOLOCK hint is used. The blog is located at http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx. Maybe I should have searched first; but this way, I enjoyed the discovery, and creating the repro. I decided to post my own blog for several reasons - the repro is a bit different, I explain the discrepancy between the range and allocation scans, and I show how you can find out if your NOLOCK is returning unexpected row sets. Most importantly, I think there are still many users using the NOLOCK hint without realizing the potentially serious consequences that both blogs describe.
SQL Server Development