Offline Defrag And DAG Databases, Oh My!
Even though some very old KBs, which now refer to unsupported products, state that taking databases offline to run periodic offline defragmentation with ESEUTIL is not recommended, some folks in the field still want to do this.
Previously, when there was only a single copy of a database, the main cost of running offline defragmentation was the time the process took, which could be several hours or longer depending on database size and disk throughput. This changes when we have multiple copies of a database in a Database Availability Group (DAG).
So you may be wondering how best to defragment Exchange 2010 databases that are in a DAG, as people often look at the white space in a database and seek to reclaim it immediately.
In short, this is not a good idea for several reasons:
- Defragmenting DAG databases leads to more work
- Mailboxes are offline while the defragmentation completes
- This is generally a short-sighted view, as white space will be re-used
Please note that we are discussing offline defragmentation via ESEUTIL /D, and not the online maintenance routines that run 24x7 in newer versions of Exchange and within online maintenance windows in previous versions.
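Before reaching for ESEUTIL, it is worth looking at how much white space a database actually holds. The following is a minimal sketch, assuming it is run from the Exchange Management Shell with rights to read database status:

```powershell
# -Status is required; without it the size properties come back empty.
# AvailableNewMailboxSpace is the white space Exchange can re-use for new data.
Get-MailboxDatabase -Status |
    Sort-Object Name |
    Format-Table Name, DatabaseSize, AvailableNewMailboxSpace -AutoSize
```

Comparing the two size columns over a few weeks usually shows the white space being consumed again as mailboxes grow.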
What happens when an Exchange database is defragmented using ESEUTIL /D? The defragmentation process copies the valid pages of ESE data from the old database file into a new database file, leaving the white space behind because it contains no data. You will note that I specifically said new database. It has a different GUID from the original database. A database with the same name but a different GUID means that Exchange sees them as different databases, not as multiple copies of the same database.
Because the databases are no longer copies of one another, errors will follow. These include, but are not limited to:
- An Active Manager operation failed. Error Operation failed with message: MapiExceptionJetErrorAttachedDatabaseMismatch: Unable to mount database. (hr=0x80004005, ec=-1216)
- The Exchange store database <databasename> copy on this server appears to be inconsistent with the active database copy or is corrupted. For more details about the failure, consult the Event log on the server
- Event ID 494: Database recovery failed with error -1216 because it encountered references to a database, 'database path', which is no longer present
- Event ID 454: Information Store (PID) <databasename> : Database recovery/restore failed with unexpected error -1216
- Event ID 9519: The following error occurred while starting database <databasename> : 0xfffffb40. Failed to configure MDB.
Let’s look at an example of the impact caused by running offline defrag against a database that is replicated in a DAG.
Defragmenting Exchange 2010 DAG Database
We shall defragment database DB01. Our starting configuration has two copies of this database, and all is currently running well.
So let’s dismount DB01, and then validate that the two mailbox servers have the same GUID for DB01. We are using ESEUTIL /MH to dump out the header from the database.
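The header check can be sketched as follows. The .edb path is an assumption made purely for illustration; substitute the real path, and run the ESEUTIL line locally on each mailbox server that holds a copy.

```powershell
# Dismount the database so that its header can be read (prompts for confirmation).
Dismount-Database -Identity DB01

# Hypothetical path; the EdbFilePath property of Get-MailboxDatabase DB01 shows the real one.
$edb = 'D:\DB01\DB01.edb'

# Dump the header and keep only the signature lines (one for the database, one for the logs).
eseutil /mh $edb | Select-String -Pattern 'Signature'
```

Filtering on 'Signature' keeps both the DB Signature and Log Signature lines, which makes it easier to confirm that the right Rand is being compared.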
On the first mailbox server we see a Rand of 2733649. The GUID is displayed in the ‘DB Signature’ line and is the ‘Rand’ value. Be sure to look at the correct signature, as there is one signature for the logs and another for the database. It is expected that the Rand values in these two lines will differ.
On the second mailbox server we see the same Rand of 2733649. You can see the server name in the title bar of the PowerShell window.
We have shown that the same database is present on both servers, i.e. both copies have the same Rand of 2733649.
Let’s now defragment DB01 on the first server, then see what happens...
Then let’s check the Rand to see if the old value of 2733649 is still present:
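As a rough sketch of those two steps, assuming the database is still dismounted and reusing the same hypothetical path as an illustration:

```powershell
# Hypothetical path, for illustration only.
$edb = 'D:\DB01\DB01.edb'

# Offline defragmentation. ESEUTIL builds a new database file, so the volume
# needs free space roughly equal to the size of the database while it runs.
eseutil /d $edb

# Re-read the header and show only the database signature line.
eseutil /mh $edb | Select-String -Pattern 'DB Signature'
```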
Nope, it’s not. The Rand is now 143007541. Same name, but this is a different database.
Trying to activate the database copy on another server will create a sea of red in the application event log. You will receive the errors listed above, and the most descriptive is Event ID 4807:
Recovering From Defragmenting DAG Database
At this point, since the databases are no longer copies of one another, we will have to re-seed the copy of the database. Depending upon database size, disk throughput and network capacity, this can take an extended period of time. Let’s use PowerShell to re-seed the database copy:
Update-MailboxDatabaseCopy -DeleteExistingFiles -Identity DB01\Consea-MB2
This will have to be repeated for every copy of the database in question. If there are multiple copies over a WAN link, then it would be a good idea to manually specify the seeding source using the -SourceServer switch. That way one copy can be seeded over the WAN, and the other copies can then use that as a local source, thereby minimising WAN traffic and reducing the overall seeding time.
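Seeding a further copy from a nearby source might look like the sketch below. Consea-MB3 is a hypothetical third DAG member, assumed to sit in the same remote site as Consea-MB2; it does not appear in the lab used for this post.

```powershell
# A copy must be suspended before it can be seeded.
Suspend-MailboxDatabaseCopy -Identity DB01\Consea-MB3 -Confirm:$false

# Seed from Consea-MB2, which was already re-seeded over the WAN,
# instead of pulling the database across the WAN a second time.
Update-MailboxDatabaseCopy -Identity DB01\Consea-MB3 -SourceServer Consea-MB2 -DeleteExistingFiles
```

Unless -ManualResume is specified, replication for the copy resumes automatically once seeding completes.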
Note that there are multiple options worth checking out with Update-MailboxDatabaseCopy. They include options to explicitly choose a network, encryption and compression. Chances are that if you used Exchange 2010 RTM then you are quite adroit at using the -CatalogOnly switch!
When the seeding task completes, we can check that the database copies are OK.
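One way to run that check from the Exchange Management Shell is sketched here, using the DB01 name from this example:

```powershell
# Expect Mounted for the active copy and Healthy for each passive copy,
# with copy and replay queue lengths at or near zero.
Get-MailboxDatabaseCopyStatus -Identity DB01 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```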
Checking the Rand on the updated copy of the database, we can see that it has been updated and now has the same Rand which was generated by the defrag, 143007541.
Having to take a database offline for hours to defragment, and then manually reseeding all of its database copies is pretty painful. Is there a better way to do this?
There certainly is!
A New Hope
Since Exchange 2010 introduced the online mailbox move feature, it is now pretty seamless to move mailboxes to a new mailbox database and, when the old database is empty, simply delete it! This process can be made even better with the SuspendWhenReadyToComplete parameter. As an example:
New-MoveRequest -Identity 'User-21' -TargetDatabase DB01 -SuspendWhenReadyToComplete
This copies the vast majority of the mailbox content and then pauses. The administrator then manually resumes the move request using Resume-MoveRequest. This means we can copy mailbox content through the day with no user impact, and after hours the suspended move can be rapidly completed. This has to be one of my favourite Exchange 2010/2013 features!
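Scaled up to a whole database, the approach could be sketched like this. It assumes DB01 is the database being emptied and DB02 is a hypothetical new target database created for the purpose.

```powershell
# Queue every mailbox on DB01 for a move to DB02 (hypothetical target).
# Each move copies its data and then parks itself in the AutoSuspended state.
Get-Mailbox -Database DB01 -ResultSize Unlimited |
    New-MoveRequest -TargetDatabase DB02 -SuspendWhenReadyToComplete

# During the day: see how the copies are progressing.
Get-MoveRequest -TargetDatabase DB02 | Get-MoveRequestStatistics |
    Format-Table DisplayName, Status, PercentComplete -AutoSize

# After hours: release the parked moves so that they complete.
Get-MoveRequest -MoveStatus AutoSuspended | Resume-MoveRequest
```

Before the old database can be removed, any arbitration mailboxes it hosts (listed with Get-Mailbox -Database DB01 -Arbitration) need to be moved as well.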
The same logic can also be applied to a mailbox database that must be evacuated for other reasons. This may be necessary if file system AV has scanned the database, as the database will then be in an unknown and thus unsupported state.
Note that the Mailbox Replication Service (MRS) is throttled, and if you wish to apply a little accelerando to the move process then you will need to take a look at the throttling configuration.
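As a starting point, the current limits can be read straight from the MRS configuration file. This is a sketch that assumes a default installation, where the file lives in the Exchange Bin folder on each server running MRS; the setting names matched below are the concurrency caps such as MaxActiveMovesPerSourceMDB.

```powershell
# $env:ExchangeInstallPath is set by Exchange setup on servers with Exchange installed.
$config = Join-Path $env:ExchangeInstallPath 'Bin\MSExchangeMailboxReplication.exe.config'

# Show the lines that cap concurrent moves per database, per server and per MRS instance.
Select-String -Path $config -Pattern 'MaxActiveMoves', 'MaxTotalMoves'

# After editing the file, the service must be restarted for new limits to apply:
# Restart-Service MSExchangeMailboxReplication
```

Raising these values increases load on the source and target databases, so it is safer to change them in small steps and watch disk and network performance.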