Backup solutions for Exchange 2007... (4)

Continuing on from the previous blogs in this series ('Backup solutions for Exchange 2007', 'Backup solutions for Exchange 2007... (2)' and 'Backup solutions for Exchange 2007... (3)'), in which I discussed some of the options for backing up standalone mailbox role servers with or without LCR and CCR enabled, I now need to move on to the options for the last design, which is as follows:

Design 4 - Multiple 2 node MNS Cluster with CCR plus SCR

...so just to reiterate (AGAIN) " ..in the context of data protection I have to mention the importance of Service Level Agreements. Before we can even start designing a backup solution it is vital that we have a good understanding of what our recovery objectives are. We really need to try and pin down firstly whether Outlook is a critical application in terms of our ability to communicate via email (so can we use a dial-tone recovery?) and secondly whether the data held within our databases is critical to the business (so we need to plan for a standard database recovery?). If it is a yes to both then we need to understand how long our business can be without access to Outlook and our Exchange data. Ok so this is a very difficult exercise and if the business will not dictate this then we should be directing the business by coming up with a number ourselves and then obtaining their agreement on this. Once we've got a number, like (at its most basic) 4 hours to restore the full service including all data, then we have something to aim for... "

The main options are as follows. (These are similar to the previous design but with an additional option, which is not to back up at all...)

  1. Traditional streaming backup to tape
  2. Traditional streaming backup to disk and then to tape
  3. Snapshot backups based on the Volume Shadow Copy Service
  4. Do not backup Exchange data and rely on native Exchange Server technology

Traditional streaming backup to tape

This is the standard form of backup solution that all Exchange Administrators will be familiar with in some form or another. It is made possible through the ESE API and is supported by NTBackup and the new System Center Data Protection Manager 2007 (DPM), as well as numerous third-party products from our partners. There are a number of advantages and disadvantages to using traditional streaming backup to tape when you are using standalone mailbox role servers. These are summarized as follows:

Advantages:

  • Mature technology with numerous options in terms of software and hardware
  • Can run backups against multiple storage groups concurrently (NTBackup would require multiple backup jobs to do this)
  • Generally simple to set up

Disadvantages:

  • Will impact the performance of the server during the course of the backup, so needs to be considered particularly with companies providing a 24 hour service
  • Need to be aware of its impact on IS Maintenance. Your backup window should be staggered to avoid the IS Maintenance period
  • Can be relatively slow, particularly when compared to VSS snaps or streaming backup to disk
  • Can be relatively expensive in terms of the number of tapes that are required
  • Full backup each night is generally recommended to be able to meet most recovery objectives (alternative could be weekly fulls and daily differentials)
  • Often restricted to relatively small databases in order to meet our recovery SLAs
  • Must be run against the active database only

Traditional streaming backup to disk and then to tape

The advantages and disadvantages are very similar to those above; the differences are that any backup is going to be faster, and therefore the impact of your backup on performance and IS Maintenance will be minimized. It is also likely that if you need to restore your database from last night it will still be on disk, so an offline restore to a new storage group becomes an option (making use of database portability). (*Be careful with public folders though, as these are not 'portable'.) A traditional restore from disk is also likely to be relatively fast, especially when compared to a restore from tape. The pros and cons are as follows:

Advantages:

  • Mature technology with numerous options in terms of software and hardware
  • Can run backups against multiple storage groups concurrently (NTBackup would require multiple backup jobs to do this)
  • Generally simple to set up
  • Generally faster than streaming backups to tape**

Disadvantages:

  • Will impact the performance of the server during the course of the backup, so needs to be considered particularly with companies providing a 24 hour service
  • Need to be aware of its impact on IS Maintenance. Your backup window should be staggered to avoid the IS Maintenance period
  • Can be relatively slow, particularly when compared to VSS snaps**
  • Can be relatively expensive in terms of the number of tapes that are required
  • Full backup each night is generally recommended to be able to meet most recovery objectives (alternative could be weekly fulls and daily differentials or even incrementals)
  • Often restricted to relatively small databases in order to meet our recovery SLAs
  • Requires additional disk space
  • Must be run against the active database only

**The speed of your backup and restore will be determined by a number of factors including network, tape device, RAID type, backup software and so on. To give you an idea, MSIT used to use NTBackup to back up their Exchange 2003 data to disk and then tape and achieved the following:

  • "Individual backup throughput per storage group can be sustained at approximately 1.2 GB per minute
  • Total throughput can be sustained at approximately 4.8 GB per minute per Exchange virtual server with four concurrent backups running.
  • Restore rates can be achieved in the range of 2 GB per minute for a disk-to-disk-based restoration. This throughput is achievable once the disks being written to are not under any form of production load."

This information was taken from a 'Note on IT' article.
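To relate these throughput figures back to a recovery SLA, here is a rough back-of-the-envelope sketch in Python. The rates are the MSIT numbers quoted above; the 200 GB database size is purely hypothetical, and the calculation only covers the raw data copy (it ignores log replay, mounting and any operator time), so treat the results as a lower bound rather than a prediction.

```python
# Rough sizing sketch: how long does it take to move a database at a sustained rate,
# and how large can a database be before the copy alone exceeds the SLA?

# Rates quoted above from the MSIT Exchange 2003 deployment (GB per minute).
BACKUP_GB_PER_MIN_PER_SG = 1.2
RESTORE_GB_PER_MIN = 2.0


def minutes_to_transfer(size_gb: float, rate_gb_per_min: float) -> float:
    """Minutes needed to copy size_gb at a sustained rate_gb_per_min."""
    if rate_gb_per_min <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_min


def max_database_gb(sla_minutes: float, rate_gb_per_min: float) -> float:
    """Largest database whose raw copy time still fits inside sla_minutes."""
    if sla_minutes < 0:
        raise ValueError("SLA must not be negative")
    return sla_minutes * rate_gb_per_min


if __name__ == "__main__":
    db_gb = 200          # hypothetical database size, not from the article
    sla_minutes = 4 * 60  # the "4 hours" example SLA used earlier in this series

    print("Backup minutes :", round(minutes_to_transfer(db_gb, BACKUP_GB_PER_MIN_PER_SG), 1))
    print("Restore minutes:", round(minutes_to_transfer(db_gb, RESTORE_GB_PER_MIN), 1))
    print("Max DB size (GB) for SLA at restore rate:", max_database_gb(sla_minutes, RESTORE_GB_PER_MIN))
```

Your own rates will differ, so measure them in your environment and substitute them before drawing any conclusions about database size limits.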

Snapshot backups based on the Volume Shadow Copy Service

The third option, which many administrators might not be so familiar with, is to take snapshot ('point in time') backups of your Exchange data. Snapshots are supported against the active copy of a storage group, although continuous replication now enables us to offload snaps to the replica database. It is not currently possible (or at least not supported) to back up the SCR target databases, but in a deployment using CCR, VSS snaps can be taken of the replica database rather than the active database, which reduces the impact of the snap on the active database and on the active node. Products such as DPM will also take care of transaction log truncation on the active database even when the snap is operating against the replica, and will 'follow' the replica database. So in the event of a failover, when the replica becomes the active, DPM will then protect the formerly active database, which is now the replica. This is configurable in DPM, so administrators can choose to override this behaviour.

Support for VSS has been in place since Exchange 2003 but in my experience has not been widely adopted. (Indeed NTBackup does not provide support for 'Exchange aware' snaps.) VSS allows files to be backed up while they are still open, essentially by pausing disk I/O. On an Exchange Server a read-only copy of the Exchange data is copied to disk, which will typically take a couple of seconds and will almost imperceptibly interrupt Outlook if run against the active database. There will be no impact to clients if snaps are taken against the replica database. We can take snaps every night, for example, alongside transaction log synchronisations every 15 minutes, and so will be able to restore to multiple points in time using a combination of the last snap and multiple transaction log synchronisations. A good explanation of how this works in detail can be found here. Exchange 2007 has improved support for VSS including, for example, the ability to restore VSS backups to alternative locations (database portability again), but the technology is essentially the same.
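To illustrate the "last snap plus log synchronisations" idea, here is a deliberately simplified Python sketch. It assumes that log synchronisations happen at a fixed interval counted from the time of the last snap, and that you can only recover to a completed synchronisation. The function name and the dates are hypothetical, and this is a model of the concept rather than how any particular backup product schedules its work.

```python
from datetime import datetime, timedelta


def latest_recovery_point(target: datetime,
                          last_snap: datetime,
                          sync_interval: timedelta = timedelta(minutes=15)) -> datetime:
    """Return the most recent recoverable time at or before `target`.

    Simplifying assumption: log synchronisations complete exactly every
    `sync_interval`, starting from `last_snap`.
    """
    if sync_interval <= timedelta(0):
        raise ValueError("sync_interval must be positive")
    if target < last_snap:
        # The requested time predates this snap, so an earlier snap is needed.
        raise ValueError("target is earlier than the last snap")
    completed_syncs = (target - last_snap) // sync_interval  # whole intervals elapsed
    return last_snap + completed_syncs * sync_interval


if __name__ == "__main__":
    snap = datetime(2008, 1, 1, 1, 0)       # hypothetical nightly snap at 01:00
    wanted = datetime(2008, 1, 1, 10, 40)   # the point in time we would like to restore to
    print("Closest recovery point:", latest_recovery_point(wanted, snap))
```

Under these assumptions the worst-case data loss is bounded by the synchronisation interval, which is why the 15 minute figure matters when you are agreeing recovery objectives with the business.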

Again there are numerous partner products that can provide you with the ability to take snapshots but DPM is the product which I think will really interest administrators who want to re-evaluate their backup solution.

DPM's approach is described as follows:

"DPM uses a combination of transaction log replication and block-level synchronization in conjunction with the Exchange VSS Writer to help ensure your ability to recover Exchange Server databases. After the initial baseline copy of data, two parallel processes enable continuous data protection with integrity:

· Transaction logs are continuously synchronized to the DPM server, as often as every 15 minutes.

· An “express full” uses the Exchange Server VSS Writer to identify which blocks have changed in the entire production database, and send just the updated blocks or fragments. This provides a complete and consistent image of the data on the DPM 2007 server. DPM 2007 maintains up to 512 shadow copies of the full Exchange Server database(s) by storing only the differences between any two images.

Assuming one “express full” per week, stored as one of 512 shadow copy differentials between one week and the next, plus seven days x 24 hours x 4 (every 15 minutes), DPM 2007 provides over 344,000 data consistent recovery points for Exchange."
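The arithmetic behind the "over 344,000" figure in that quote is easy to check. The short Python sketch below simply multiplies the numbers given in the quote (512 shadow copies, one per week, and a log synchronisation every 15 minutes); the function name is my own.

```python
def recovery_points(shadow_copies: int = 512, sync_interval_minutes: int = 15) -> int:
    """Recovery points available with one express full per week and regular log syncs."""
    if sync_interval_minutes <= 0:
        raise ValueError("sync interval must be positive")
    minutes_per_week = 7 * 24 * 60
    syncs_per_week = minutes_per_week // sync_interval_minutes  # 672 at 15 minutes
    return shadow_copies * syncs_per_week


if __name__ == "__main__":
    print(recovery_points())  # 512 * 672 = 344064
```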

Using VSS in an environment with CCR obviously has a number of advantages and disadvantages:

Advantages:

  • Backup can be offloaded to the replica database, reducing the impact on the active database and clients alike**
  • Might be able to eliminate or at least significantly reduce any reliance on tape based backups
  • Very fast backup (after the 1st)
  • Potentially very fast recovery
  • Only one backup per storage group, but with E2K7 a 1:1 ratio of databases:storage groups is recommended and you can run multiple VSS snaps in parallel
  • Faster backup and recovery times mean that databases can be larger, so fewer servers might be required
  • IS Maintenance will not be interrupted, as snaps take far less time than traditional streaming backups
  • Aside from the first full backup there is little performance impact for clients
  • A solution like DPM means that most backups and recoveries are controlled by the messaging team and not by a separate team, which can confuse and delay recoveries**

Disadvantages:

  • Recovering historical data from a point in time prior to my first snap means I need to retain my tape devices - say beyond 7 days and up to 7 years
  • If I am mandated to keep data offsite I may need to retain my tape devices or replicate my backups offsite
  • Might require large amounts of additional disk space
  • Often a little more complex to design and configure

**Depends on the solution that is deployed as to whether you can take advantage of this.

In an Exchange Server 2007 deployment with CCR, the solution that is the easiest to manage and potentially the least expensive is the use of VSS through something like System Center Data Protection Manager. I particularly like the fact that it can easily be managed from within the messaging team. In my experience numerous disaster recovery situations have taken longer to resolve than they should have, due to miscommunication or a lack of knowledge between the messaging teams and the teams responsible for the backup solution. Another huge advantage is that the VSS requestor can operate against the replica database and therefore affect neither the performance of the active node nor the service to the user community. Because only changes are taken and transaction logs are continuously synchronised, the administrator not only has numerous recovery options but can also expect both snaps and restores to be very fast relative to more traditional backup methods.

Do not backup Exchange data and rely on native Exchange Server technology

The 4th option that I want to discuss in this blog is the idea that there is a case for not taking backups at all, because SCR introduces the option to have multiple targets, and therefore additional copies of our Exchange databases, and provides further resilience to scenarios such as logical database corruption. I have blogged about this before, and there is a lot more detail in that previous blog than will be discussed here. I am certainly not advocating that any administrator abandon their backup solution without considerable thought and planning, but I believe there is a case for reconsidering why we back up. The exercise might only serve to reaffirm why you back up Exchange data in your particular deployment, but nonetheless it might still be worth doing.

Advantages:

  • Potentially significant reduction in costs associated with a backup solution
  • Eliminate the performance and service impact of backups
  • Reduction in required software stack and hardware eases administrative burden and potentially improves resilience

Disadvantages:

  • No backup! In many companies this will not be tolerated politically (and perhaps even legally)
  • No long term retention of data beyond that residing in the database instances
  • Removing one of the possible recovery paths increases the dependency on a well managed & monitored environment

It is worth considering what the alternatives to backup are and which scenarios your business requires you to be able to recover from. One concern that has arisen in the past is that physical damage to a database, caused by hardware issues for example, might go undetected, and often does until the backup software checksums each page it touches. If we are not backing up our databases then there is a greater risk of damage going undetected until it causes unscheduled downtime. SP1 introduces the option to turn on checksumming during the online maintenance period to protect against this. There is more information about these changes in SP1 in the article 'Online Maintenance Database Scanning in Exchange 2007 SP1'. The other main concern is logical corruption, but there are a number of options to recover from this situation, only one of which is to restore from backup (others include using database repair tools, moving mailboxes to a newly created database, or resorting to the SCR target before the corruption is replayed into it, i.e. within the replay lag time). Leaving aside legal requirements or industry guideline constraints, traditionally one of the main reasons to back up was to be able to take this data 'offsite' in order to be able to recover from a flood or fire in your data centre. Replicating to a secondary data centre using SCR perhaps provides sufficient resilience, although of course this requires a second data centre and all the operational and administrative complexity that this introduces.
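The replay lag point deserves a small illustration. The Python sketch below models only the timing argument: if the logs containing the corruption were generated more recently than the replay lag, they should not yet have been replayed into the SCR target. It assumes that a log is never replayed sooner than the configured lag after it was generated, and the 24 hour lag and the times used are hypothetical values chosen for the example, not recommendations.

```python
from datetime import datetime, timedelta


def scr_target_still_clean(corruption_time: datetime,
                           now: datetime,
                           replay_lag: timedelta) -> bool:
    """True if logs written at corruption_time should not yet be replayed into the target.

    Simplifying assumption: a log is replayed no sooner than `replay_lag`
    after it was generated on the source.
    """
    if now < corruption_time:
        raise ValueError("corruption_time is in the future")
    return (now - corruption_time) < replay_lag


if __name__ == "__main__":
    lag = timedelta(hours=24)             # hypothetical replay lag
    now = datetime(2008, 3, 10, 12, 0)

    noticed_quickly = datetime(2008, 3, 10, 6, 0)  # corruption introduced 6 hours ago
    noticed_late = datetime(2008, 3, 9, 6, 0)      # corruption introduced 30 hours ago

    print("Target usable (6 hours ago): ", scr_target_still_clean(noticed_quickly, now, lag))
    print("Target usable (30 hours ago):", scr_target_still_clean(noticed_late, now, lag))
```

The practical consequence is that the lag you configure has to be longer than the time it realistically takes you to notice logical corruption, which is another reason why this option depends on a well monitored environment.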

My advice, for what it's worth, is to revisit your recovery and data retention requirements and, if a backup is required, then as discussed previously something like SCDPM protecting the CCR replica database offers the best solution for protecting your Exchange data...

I hope this quick series of blogs was useful. If you have any comments about the series or any other blogs I have submitted please let me know.