SetFileIoOverlappedRange Can Lead to Unexpected Behavior for SQL Server 2008 R2 or SQL Server 2012 (Denali)

You should be aware the Windows bug with API = SetFileIoOverlappedRange . This is used by SQL Server (2008 R2 and SQL 2012) only when Locked Pages has been enabled and is in use by the SQL Server.

SQL Server Support has NOT encountered this in a production environment. The issue was found during Denali (SQL 2012) testing. This is for information purposes only.   WORKAROUND: Disable Locked Pages for the SQL Server instance.

Problem

Calls to the SetFileIoOverlappedRange can corrupt the OVERLAPPED structures of SQL Server. This can lead to incorrect offsets, wrong timings, early completion, etc…

This is called every time we open an FCB (SQL Server's user mode File Control Block Structure). For example:

· Open of current database files (MDF, NDF, LDF) - standard recovery

· Open of new database files (MDF, NDF, LDF) - create database (Testing shows for a create database this is called 4 times to get everything going)

· Auto close opens (MDF, NDF, LDF) - use on closed database

· DBCC Snapshot opens (MDF, NDF)

· Snapshot database file opens (MDF, NDF)

· Restore/Log Shipping – When it opens the FCB to do the restore to *** Backup to disk does NOT use this code path. ***

 

Symptoms

Wide ranging in SQL from invalid write location, lost read or write, early access to a page that is not yet fully in memory, I/O list damage such as AVs, incorrect timing reports, and many others. You may not even see the situation until days later.

 

Bug Id #818757

This is a race condition trigged when using SQL 2008 R2 and (Denali (SQL 2012) before RC1). If you are running with Locked Pages In Memory there is a possibility of corrupting outstanding I/O requests when reads/writes are outstanding AND concurrently another file is being opened.

Unfortunately memory corruption (scribbler behavior) may happen in the different parts of the data structure. In the majority of the cases we’ve seen so far we corrupt I/O queue pointers, thus causing SQL Server instance to produce infinite amount of dumps (looping).

We have also seen in-memory corruption, calls to I/O completion when the I/O is not finished. Write/read at incorrect offset(s) within the file is also possible but unlikely.

Windows 8 contains the API fix that is being back ported to previous version(s) of Windows.

 

The SQL Server 2012 and the SQL Server 2008 R2 CU based fixes introduce trace flag 8903. These builds will not use the API unless the trace flag is enabled.

 

Windows 2008 / Vista

Windows 2008 R2 / Windows 7

(Windows 2008 R2 / Windows 7) + QFE*

Windows 8

SQL 2008 R2

Exposed

Exposed

Fixed

Fixed

SQL 2008 R2 – PCU2**

Fixed

Fixed

Fixed

Fixed

SQL 2008 R2 – PCU2 –T8903 enabled

Exposed

Exposed

Fixed

Fixed

SQL 2012

Fixed

Fixed

Fixed

Fixed

SQL 2012 – T8903 enabled

Exposed

Exposed

Fixed

Fixed

*The Windows fixes are slated to be included in a Windows Update package. Reference Bug #s 395823 (KB 2679255 with SP2 Target), 395948 (KB 2679255 with SP3 Target),

** The SQL Server issue is fixed in SQL 2012 RTM and with the bug # 870529 for SQL Server 2008 R2 PCU2 release.

API Reference: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365540(v=vs.85).aspx

When the fixes are available I will be updating the matrix above and Microsoft will be creating the proper System Center Advisor rules to detect exposure.

Bob Dorr - Principal SQL Server Escalation Engineer