Disaster Recovery Part 2

Okay, so some people noticed that I deleted pretty much all of the content from an earlier post on DR because, well, it was no longer valid. Most of it was sample scripts that were now replaced by supported shipping versions of those scripts. However, I guess there was some information that was still useful so I will put back a few comments here and address a question that was raised about the correctness of changing the Recovery Mode for our database / when can you do this.

 So first, let me say that the online documentation should be extremely good for our Disaster Recovery at this point. Michael McConnell, Shu Zhang and I spent a good bit of time making sure we covered every detail we needed and tried to order it so that it was more readable. If you have any complaints about the docs, please send feedback via this blog posting. I will get it to the appropriate person (Michael) to update the docs if necessary. You can find these docs at http://msdn2.microsoft.com/en-us/library/aa562140.aspx.

I will not repeat everything which is in the docs here, but most of the docs are spent explaining how to setup our backup jobs, how to configure log shipping, and how to restore the system in the event of a disaster. A number of companies have had their DBAs complain about having to use our backup process versus using their standard process which is to backup a database at perhaps 1am every day and then take periodic log backups. This link (http://msdn2.microsoft.com/en-us/library/aa577848.aspx), a sub topic of the above link, gives a good explanation of what we are doing ... transactional log marking. The problem with what the DBAs are doing is that they are treating each db as a completely separate entity. BizTalk, though, will at times use Distributed Transactions to ensure reliability of actions across databases. If you treat each db as a separated entity, then if DB1 and DB2 participate in a distributed transaction together and I do a log backup of DB1 before the transaction commits and a log backup of DB2 right after the transaction commits, then when I restore the two of them my environment will be in an inconsistent state. For that purpose we use transaction log marking with "restore to mark" for all the databases in our environment (as described in the previous link and in many online sql docs).

There is no magic here. This is all standard TSql. You can read through our code and figure out everything that we are doing. We never install custom components on database servers for doing external procedure calls and we never use xp_cmdshell so this is all code that any database developer could have written, it is just that most never seem to care about DTC (even if maybe they should).

The question has come up as to are there any alternatives. The quick answer is that if you want to use log backups, then no, there are no other alternatives. You would have to write all the same code that we do so you need to use the code that we wrote. However, if you don't really care so much about the data in your system and are okay with, say, 24 hour data loss windows (your data loss window with log backups is the interval between log backups), then you could optionally just have a period of time every day where you shut the entire system down and then just do full backups. Since there is no activity going on in the system when it is totally down, then you don't need to worry about distributed transactions. To shut the entire system down, you need to shutdown all BizTalk Host Instances, make sure no traffic is coming in via Isolated Hosts (for example http / soap receive adapters), shutdown the sql agent processes on the various databases, and make sure that no other administrative tasks are on going (like changing configuration, redeploying, adding a new sendport, ...). You should be able to verify that nothing is happening on the system via SQL profiler or some other device monitoring the databases. Once the whole system is down, you could take a backup of all of the databases in the system. To get a list of all of the databases which the BizTalk backup process was going to backup, you just need to "SELECT * FROM admv_BackupDatabases" from the BizTalkMgmtDb database. That is a view which all of our backup code uses during our DR process. If you add custom databases to our backup process (http://msdn2.microsoft.com/en-us/library/aa561198.aspx) then it will also show up in this list of databases. It is important to note that our log shipping story also copies sql agent jobs out of msdb when you configure the destination system and restores them when you finally restore to mark on the destination system. If you chose to not use our log shipping story and do the system shutdown full backup approach, you need to make sure you handle backing up and restoring msdb with the sql agent jobs.

At this point, if you are chosing to have a 24 hour data loss window and use scheduled, daily downtime to create a complete set of full backups for recovery, then you should also change the recovery mode on the databases to SIMPLE. This will automatically manage the size of the transaction log by essentially doing "truncate log on checkpoint". By default, our databases are all set to FULL recovery mode which will only truncate the log when the user performs a backup of log or when the user manually forces the log to truncate (ie BACKUP LOG WITH TRUNCATE ONLY type command). There are lots of docs out there on this, just search for SQL Server Recovery Models. The key is that if you leave it in FULL recovery mode and then don't do log backups (with our job), then your log will grow unbounded and you will eventually run out of disk space.

Hope that helps to clear up some confusion.



 Oh yeah ... for those of you who always ask about database mirroring ... I am really sorry that we can't support it. If you check out http://msdn2.microsoft.com/en-us/library/ms366279.aspx you will find a write up by the SQL team on issues with mirroring and distributed transactions. Even if you put everything in one database, we still use DTC transactions occassionally (we did not optimize to detect everything in one database and it is only with the latest System.Transaction work is their automatic support for upgrading from local to distributed transactions). Thx.