INFORMATIONAL: SHEDDING LIGHT on VSS & VDI Backups in SQL Server

Right, ever heard or wondered about snapshot backups in SQL Server? You might have come across the Volume Shadow Service (VSS) backup option in SQL Server. In this post, I will try to explain what they are, how they work and what to check for when you run into issues with VSS backups failing.

Before we get into the nitty-gritty of VSS, let me lay down some of the terms you will need to know,

VSS

VSS is basically a framework that consists of a set of functions that enable applications to perform backups of volumes.

Writer

Writers are applications that store information on disk and that work together with providers through the VSS interface.

SQL Server has 2 writers available:-

1) MSDE Writer - MSDEWriter works with SQL Server 7.0, SQL Server 2000, and SQL Server 2005.

2) SQL Writer - SqlServerWriter only works with instances of SQL Server 2005.

Provider

Providers own the shadow copy data and instantiate the shadow copies.

VDI

SQL Server provides application programming interfaces (API's) that enable an ISV to integrate SQL Server into its products. These specifications are available publicly for third-party vendors to develop backup solutions. Some popular ones include Symantec NetBackup, SQL Litespeed, Legato etc.

You must be thinking why I need this backup solution when we have the option of Native SQL Backups. Well, the difference with SQL backups and VSS backups is that, through VSS we can backup the database files while SQL Server is running (and has a handle to the DB Files). The primary difference is that these are "Snapshot backups" i.e. Backup of a file or folder on a specific volume at a specific point in time. So applications on a system can continue to write to the volumes while we do the backup.

Now you must be thinking how we can backup the file at a point in time when sqlserver itself has a handle to the file. Well, that’s the magic of VSS a.k.a Shadow Copy :)

How does it work?

One of the first phases of a snapshot creation process is "Backup Initialization." During this phase, the backup application does the following:

The backup application initializes the IVssBackupComponents interface.

The backup application calls the IVssGatherWriterMetadata API to perform metadata enumeration.

Here is where the default writer comes in. Whichever is the default Writer for SQLServer, that is used to connect to the instance and enumerate all the databases & gather its metadata (The default writer for SQL Server connects to *ALL* instances of SQL Server that are started on the local system).

The metadata is then returned to the backup application.

So potential issues that you could run into are when the connecting to the instance fails, the enumeration fails and hence you won’t be able to select any database for backups. Also if the SQL VSS writer service is not running, the SQLWriter is not listed and hence no SQL database files can be backed up.

Next up, from the metadata we form a "backup component" which is basically a list of what all to backup along with backup options. After this comes the Snapshot Initialization and finally the Snapshot Creation

In the snapshot creation phase, we do the following:

a) SQL Writer talks to SQL Server to prepare for a snapshot backup

b) Then all I/O for the database being backed up is frozen, and then we create the snapshot.

c) Once this is done, the I/O is resumed. This process is called "thaw".

We then take the snapshot and move it to a backup media if necessary. This phase is outside of SQL Server and in no way interferes with normal data operation. This is between the Backup Application and VSS. Once this is complete, we send a backup termination signal and the snapshot is deleted.

Here is a pictorial representation of a VDI Backup cycle (courtesy sqlbackuprestore.com )

You do have some options when doing VDI backups.

BLOCKSIZE: specifies the physical block size (bytes)

MAXTRANSFERSIZE: Unit of transfer (bytes) to be used between SQL Server and the backup media. The minimum is 64KB up to 4MB.

BUFFERCOUNT: Number of I/O buffers to be used for backup operations.

So the calculation for the amount of memory used in SQL for a backup operation = Maxtransfersize * BufferCount. This memory comes from the non-Bpool region (popularly known as MTL J )

E.g. maxtransfersize = 65536 and BufferCount = 40 ==> 2.5 MB

Below is a sample snippet using VDI interface to do backups.

clip_image004

<Code snippet>

backup->Devices->AddDevice(deviceSet->Name, DeviceType::VirtualDevice);

backup->Complete += gcnew ServerMessageEventHandler(this, &BackupFileContext::CompletionHandler);

backup->Information += gcnew ServerMessageEventHandler(this, &BackupFileContext::InformationHandler);

backup->NextMedia += gcnew ServerMessageEventHandler(this, &BackupFileContext::NextMediaHandler);

backup->PercentComplete += gcnew PercentCompleteEventHandler(this, &BackupFileContext::PercentageHandler);

backup->Database = this->databaseName;

backup->Action = (this->level == 'I')? BackupActionType::Log: BackupActionType::Database;

backup->Incremental = (this->level == 'D');

// backup->BlockSize = 512 * 128;

dbg(1, __LINE__, DBG_TEXT("database '{0}' checksum {1}"), this->databaseName, backup->Checksum);

try {

backup->SqlBackupAsync(this->server);

}

</Code snippet>

We are making use of VDI with SMO to perform backup.

Specifications here: http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.smo.backup_properties(SQL.90).aspx

One thing to keep in mind when you are writing using SMO is that the SMO’s default Statement Timeout is 10 minutes. You might want to set this value higher or to infinite (0), especially for large databases. If your database backup doesn’t complete in 10 minutes, you will get a timeout & the backup operation is aborted.

<snippet>

using Microsoft.SqlServer.Management.Smo;

ServerConnection server = new ServerConnection(serverName) /* suppose this is your connection object */

server.ConnectionContext.StatementTimeout = 0 //infinite statement timeout. Needed for large databases.

Try

{

backup->SqlBackupAsync(this->server);

}

</snippet>

You also have options of excluding some databases to be excluded from backups. By default databases that are in restoring/loading/suspect are excluded. Also excluded are tempdb and mssqlsystemresource databases. This "exclusion list" is one of the metadata that is gathered by the writer.

One way that you can control this is using the registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\BackupRestore\FilesNotToBackup

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\BackupRestore\FilesNotToSnapshot

To get a list of available writers run this from a command prompt:

vssadmin list providers

To get detailed writer metadata information, you can use the VShadow.exe that is available with the Platform SDK. Run this from a command prompt once you have VSSSDK installed

vshadow.exe –wm2

Some things to keep in mind:-

1) If the connection to SQL Server fails, the enumeration doesn’t happen so the Writer is not listed and the backups fail. Even NTBackup when not backing up a SQL database tries to connect if SQLWriter is available. So make sure the connectivity is working.

2) There is a sample tool available with SQL Server 2000 Sample which illustrates VDI backups. It’s called simple.exe. This can be used to test if the SQL VSS component is functioning.

3) Anytime a VDI backup fails, you will get an event log entry like the following,

MSSQLSERVER Error 18210 SERVERNAME BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device 'RVX-{58702BA0-C160-4125-952E-CC21954404B7}'.

Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.)

Make careful note of the HResult code returned by the OS. (Error 995 in this case). The SignalAbort function is called to terminate and it gives the reason for abnormal termination. Also note where the abort was initiated from - client (backup app) or the server (SQL server).

So when the server receives an error code from IClientVirtualDevice::CompleteCommand, it will terminate the command and close the Virtual Device. Any buffers which have I/O being performed return VD_E_ABORT status this is what we see in the event log entry. All the subsequent commands are cancelled and VDI backup set is closed by invoking "CloseDevice"

OR

2005-07-26 16:00:05.49 backup BACKUP failed to complete the command BACKUP log [AON_IDB_POSTDTS] TO VIRTUAL_DEVICE='GSB100'

WITH BLOCKSIZE = 65536 , BUFFERCOUNT = 40 , MAXTRANSFERSIZE = 262144 ,stats

You might get above message in the SQL Errorlog. Again this will also contain the HResult code returned by the backup operation.

I hope this was useful in de-mystifying some of the mysteries of VSS/VDI backups in SQL Server. As always stay tuned for more.

Regards,
Sudarshan N

Technical Lead | Microsoft SQL Server