SQL Server Instant File Initialization: SetFileValidData (Windows) vs fallocate (Linux)
SQL Server Books Online documents Instant File Initialization and the associated security considerations. This blog highlights the underlying file system implementations and differences in behavior between Windows and Linux.
SQL Server performs the following API calls when creating or extending (growing) data and log files.
- CreateFile – Create or open a file
- SetEndOfFile– Establish the file size and acquire space from the I/O device
- SetFileValidData – Establish the valid data size
- If the file is a log file (LDF) SQL Server always writes a known pattern (other than zeros) to the allocated space.
- If the file is a data file (MDF/NDF) SQL Server checks the instant file initialization and trace flag 1805 settings to decide if a pattern should be written to the allocated space.
Trace flag: 1805 (undocumented in SQL Server Books Online): Disables instant file initialization for data files.
Note: The stamping is usually performed in 4MB chunks for optimal performance and alignment with both Windows and Linux file system page and block size alignments.
Windows (ntfs and refs) file systems have two members key to instant file initialization.
EOF – End Of File Location
VDL – Valid Data Length Location
Linux support file allocation with the fallocate system call (ABI.) The Windows API calls are mapped to Linux ABI calls as follows.
The main difference between Windows and Linux file systems is the tracking of extents, not a valid data length (VDL.) On Linux each extent contains a flag indicating if it has been written to the I/O device.
|Empty FileWhen the file is first created the EOF = 0 and the covering extent is set to (N) for not written.A read of a not-written extent always returns zeros on Linux. Linux does not engage the I/O device but simply fills the return buffer with zeros for the space tracked by the not-written extent.Hint: Check the documentation of the Linux file system for extent sizes and adjustments. The default size usually aligns with a memory page size boundary (often 4K) which aligns well with SQL Server 8K pages and 64K extent boundaries for optimal performance.|
|SetEndOfFileFile size growth takes place with an fallocate invocation. Linux acquires space from the I/O device and establishes the EOF as well as the tracking extent meta data, indicating not-written. The fallocate acquires space just as SetEndOfFile acquires space for a Windows file system, allowing fast creation of large files. The difference is the SetFileValidData. Linux does not provide the ability to set the extent tracking to ‘written’ without an actual write.Reference unimplemented Linux File System request to mark extents as written (security concerns): https://www.spinics.net/lists/linux-ext4/msg31637.htmlPerformance Consideration: If fallocate is not supported for the target file system, SQL Server uses ftruncate. Contrary to its name, the ftruncate ABI can be used to grow a file but is a thin-provisioned operation (space is not acquired only the metadata updated.) When ftruncate is required SQLPAL write zeros to the file to acquire the physical space and provide fallocate, zero read behavior to SQLPAL processes.|
|WriteWhen the first write takes place the the metadata data for the extent(s) are also updated. The extent may be split for tracking written and not-written data or the write extended, by the Linux kernel, to write zeros to the space on disk so the entire extent can be marked as written.Note: On most Linux file systems this results in two (2) write requests but may be more depending on the write size and offset alignment. 1 request for the file data and 1 request for the metadata change.|
Pay Me Now Or Later Decision
On Windows the SetFileValidData is a single metadata operation. Once the VDL is established a write (sequential or random) does not require additional metadata updates as the VDL == EOF. On Linux the write requires the extent update(s) requiring a write for the data and a write for the metadata.
You can create and grow a file as quickly on Linux as you can on Windows. However, the first write on Linux results in the metadata maintenance.
- If database creation speed is a concern use Instant File Initialization and allow the first writes to encounter the additional overhead.
Note: The overhead is often hidden by SQL Server as most writes take place with background processes such as Checkpoint or Lazy Writer. Bulk loading is an exception as write can occur on the active SQL Server session.
- If the creation speed can be elongated you can enable –T1805 causing the database, data file space to be stamped during creation and growth. The stamping is optimized in large chunks and becomes the write path encountering the first write data and metadata operations. Once the location has been written (stamped) the writes no longer require the additional metadata write.
Note: If the file system does not support fallocate the file is stamped with zeros by SQLPAL. The log file (LDF) always stamps a known pattern on top of the zeros and you can safely retain Instant File Initialization for the data files as the metadata has already been updated when SQLPAL zeroed the space.
Bob Dorr - Principal Software Engineer SQL Server