How It Works: SQL Server - VDI (VSS) Backup Resources
Last week I worked on an interesting case. The issue was a NT Backup against a single volume containing 500 SQL Server databases. It is rare to have so many databases on a single volume but it is possible and we had a case on this very issue.
When a VDI backup (BACKUP ... with SNAPSHOT to DEVICE ...) is issues it currently requires 3 threads to complete the backup. The controlling thread that accepts the BACKUP command and 2 helpers to handle VDI completion and meta data streaming activities.
The formula is quite simple. (Databases with files on the volume) * 3 = Required Workers.
For the issue I worked, 500 databases results in a requirement of 1,500 threads to complete the activity.
Step #1: SQLWriter (VSS Object) is loaded.
Step #2: SQLWriter enumerates all database file locations and determines the list of databases with files on the volume.
Step #3: SQLWriter issues the BACKUP ... with SNAPSHOT .. TO DEVICE command for all databases in the list.
BACKUP .. with SNAPSHOT will FREEZE the I/O for the database and report to the VDI consumer that the database I/O is frozen. At this point the VDI consumer can do whatever action required to secure the volume snapshot. Some solutions enable copy on write behavior, others split a physical mirror and other techniques can be employed by the hardware vendors.
Once the consumer has secured the data from the volume it uses the VDI interface to tell SQL Server that I/O can resume. SQL Server THAWs the frozen database and completes the backup command.
Review the SQL Server Errorlog to see the series of freeze and thaw activities that take place.
Understanding that ALL databases on the volume have to be frozen before the disk volume is considered safe from writes is a key to understanding the issue I worked. Since all 500 databases have to be frozen the required overhead for SQL Server is 1500 threads.
SQL Server optimizes this by leaving closed databases in a closed state. There is no reason to open, recovery and freeze I/O for a database that is currently closed. You should also be aware that SQL Server does not prevent the database from opening during the FREEZE / THAW window. Do not rely on this as workaround to the issue.
This brought up quite a bit of discussion on how to change the design of SQLWriter and BACKUP ... with SNAPSHOT to reduce thread resource consumption. Several design changes are under evaluation for future versions of SQL Server.
Until then you may need to increase your max worker thread setting to use VSS backup with large numbers of databases on the same volume.
SQL Server Principal Escalation Engineer