Managing Local Continuous Replication

Microsoft Exchange Server 2007 will reach end of support on April 11, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Applies to: Exchange Server 2007, Exchange Server 2007 SP1, Exchange Server 2007 SP2, Exchange Server 2007 SP3

In addition to the tasks for day-to-day management and administration of an Exchange organization, there are tasks that are specific to local continuous replication (LCR). Generally, the administrative tasks for LCR are:

  • Configuring disk storage for LCR and managing disk volumes.

  • Enabling and disabling LCR.

  • Monitoring replication activity.

  • Mounting, dismounting, creating, and removing databases.

  • Moving the location for storage of storage group or database files when a storage group is LCR enabled.

  • Viewing status and configuration information for an LCR-enabled storage group or database.

  • Verifying the health of an active copy or passive copy of the LCR data.

  • Managing replication and replay activity.

  • Activating the passive copy.

Configuring Disk Storage for Local Continuous Replication

LCR does not require specially configured disk storage. We recommend that you isolate the copies from each other as much as practical. LCR requires storage that provides adequate performance and storage capacity. Equivalent storage solutions should be configured for both copies of storage groups and databases that are enabled for LCR. We also recommend that you follow the configuration procedures provided by your storage vendor to complete the configuration.

Managing Disk Volumes

While managing an LCR environment, it may be necessary to manage disk volumes that are connected to your Exchange server. For example, the volume may need to be temporarily detached from the system for maintenance or other reasons. If maintenance needs to be performed on the disk volume containing the active copy of the storage group, the database in the active copy of the storage group should be dismounted. If maintenance needs to be performed on the disk volumes containing the passive copy of the storage group, all input/output (I/O) to the volume should be stopped by halting replication. For more information about managing disk volumes, see How to Prepare for Disk Management Activities for an LCR Copy.

Enabling Local Continuous Replication

Using LCR begins with enabling a storage group for LCR. You can accomplish this task by using the Exchange Management Console or the Exchange Management Shell.

Note

When a storage group has been enabled for LCR, a second copy of the database in the storage group is created and automatically maintained in the location specified for the LCR copy.

Important

Before enabling LCR, make sure that you have sufficient disk space to store the LCR copy.

For detailed steps about how to enable an existing storage group for LCR, see How to Enable Local Continuous Replication for an Existing Storage Group. For detailed steps about how to create a new LCR-enabled storage group, see How to Enable Local Continuous Replication for a New Storage Group.

Disabling Local Continuous Replication

You can disable LCR for a storage group using either the Exchange Management Console or the Exchange Management Shell. For detailed steps about how to disable LCR, see How to Disable Local Continuous Replication.

Important

Deleting a storage group that contains an LCR copy deletes the LCR copy and the production copy.

Tuning the Default Configuration of the Transport Dumpster

The transport dumpster is a feature of the Hub Transport server role that submits recently delivered mail after an unscheduled outage. The Hub Transport server maintains a queue of mail that was recently delivered to a mailbox:

  • In a clustered mailbox server in a CCR environment

  • In a storage group that is enabled for LCR

The transport dumpster should always be turned on when using cluster continuous replication (CCR) or LCR. The transport dumpster is enabled organization-wide by setting the amount of storage available per storage group and setting the time to retain mail in the transport dumpster.

You can use the Set-TransportConfig cmdlet to change the default configuration settings for the transport dumpster, which are applied at the storage group level.

We recommend configuring the MaxDumpsterSizePerStorageGroup parameter, which specifies the maximum size of the transport dumpster queue for each storage group, to a size that is 1.5 times the size of the maximum message that can be sent. For example, if the maximum size for messages is 10 megabytes (MB), you should configure the MaxDumpsterSizePerStorageGroup parameter with a value of 15 MB.
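The sizing rule above is simple arithmetic, and a minimal Python sketch makes it easy to check for other message size limits. The function name is hypothetical and is not part of any Exchange tool; it only applies the 1.5 multiplier stated above.

```python
def recommended_dumpster_size_mb(max_message_size_mb):
    """Return the suggested MaxDumpsterSizePerStorageGroup value in MB.

    Applies the guidance of 1.5 times the largest message that can be sent.
    """
    if max_message_size_mb <= 0:
        raise ValueError("maximum message size must be positive")
    return max_message_size_mb * 1.5


# A 10 MB message limit gives a 15 MB dumpster per storage group.
print(recommended_dumpster_size_mb(10))  # prints 15.0
```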

We also recommend configuring the MaxDumpsterTime parameter, which specifies how long an e-mail message should remain in the transport dumpster queue, to a value of 7.00:00:00, which is seven days. This should be sufficient time to allow for an extended outage to occur without loss of e-mail messages. Messages are removed from the transport dumpster when the size specified by MaxDumpsterSizePerStorageGroup is reached or, if that size is not reached, when the time specified by the MaxDumpsterTime parameter has elapsed.

When using the transport dumpster feature, additional disk space will be needed on the Hub Transport server to host the transport dumpster queues. The amount of storage space required is approximately equal to the value of MaxDumpsterSizePerStorageGroup multiplied by the number of storage groups on all clustered mailbox servers in a CCR environment and all LCR-enabled storage groups in the Active Directory directory service site containing the Hub Transport server. In a CCR environment, request for redelivery from the transport dumpster on all Hub Transport servers in the site is performed automatically. In an LCR environment, the request for redelivery from all Hub Transport servers in the site occurs as part of the Restore-StorageGroupCopy task.
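As a rough planning aid, the disk space estimate described above can be sketched in Python. The function and parameter names are hypothetical; the calculation is only the multiplication stated in the text, so treat the result as an approximation.

```python
def dumpster_disk_space_mb(max_dumpster_size_mb, ccr_storage_groups, lcr_storage_groups):
    """Estimate transport dumpster disk space on a Hub Transport server, in MB.

    ccr_storage_groups: storage groups on all clustered mailbox servers
        (CCR) in the Active Directory site.
    lcr_storage_groups: LCR-enabled storage groups in the same site.
    """
    if min(max_dumpster_size_mb, ccr_storage_groups, lcr_storage_groups) < 0:
        raise ValueError("inputs must not be negative")
    return max_dumpster_size_mb * (ccr_storage_groups + lcr_storage_groups)


# 15 MB per storage group, 20 CCR storage groups and 4 LCR-enabled ones.
print(dumpster_disk_space_mb(15, 20, 4))  # prints 360
```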

For detailed steps about how to enable and configure the transport dumpster, see How to Configure the Transport Dumpster. For more information about the Restore-StorageGroupCopy cmdlet, see Restore-StorageGroupCopy.

Monitoring Replication Activity

The passive copy of a database is only useful if it is kept current. Although LCR does not require any special monitoring, we do recommend regularly monitoring each storage group to verify that it is replicating log files correctly. The Microsoft Exchange Server 2007 Management Pack for Microsoft Operations Manager 2005 includes alerts for several critical problems related to LCR environments:

  • The Microsoft Exchange Replication service is not running. Note that the event that generates this alert does not repeatedly appear after the service is stopped, so any alert associated with it would be lost if it were cleared.

  • The passive copy is in a failed state.

  • The passive copy is in a healthy state, but it is significantly behind in log copying or replay.

Any of the preceding alerts generated by the Exchange 2007 Management Pack should be investigated and resolved as quickly as possible.

An alternative to using the Exchange 2007 Management Pack for Microsoft Operations Manager 2005 is to regularly run a script that executes the Get-StorageGroupCopyStatus cmdlet in the Exchange Management Shell. The Get-StorageGroupCopyStatus cmdlet gives queue lengths that incorporate the number of logs generated by the active copy. For performance reasons, the queue length performance counters only report information that is known to the Microsoft Exchange Replication service. Under very rare conditions, this can be inconsistent with the state on the active copy. For more information about the Get-StorageGroupCopyStatus cmdlet, see "Viewing Status Information" later in this topic.

Mounting, Dismounting, Creating, and Removing Databases

It may occasionally be necessary to mount or dismount databases in an LCR environment, for example to reconfigure a storage group or database, or to correct issues with the server or database. While such maintenance is in progress, you must block the services that interact with the storage group and the database. When the database is dismounted, it is frozen from further changes: neither the database nor the log files are changed while the database is dismounted.

You may want to add a database to a storage group that is enabled for LCR. The process is similar to that used to add a database in a stand-alone configuration, except that an additional path, for the LCR copy of the database, must be provided.

You may want to remove a database from a storage group that is enabled for LCR. The process is identical to that used to remove a database in a stand-alone configuration except that there are two copies of the data to remove: the active copy of the database and the passive copy of the database. For detailed steps about how to remove a database from a storage group that is enabled for LCR, see How to Remove a Database from a Storage Group Enabled for Local Continuous Replication.

Moving the Location of Storage Group and Database Files

You can use both the Exchange Management Shell and the Exchange Management Console to change the location of a database in an LCR-enabled storage group. In an LCR configuration, there are two database files, one for each copy. The locations for both copies can be changed independently or in tandem.

Note

The database file names and file paths must be the same for the active and passive copies.

Similar procedures are used to reconfigure the location of the storage group log and system files and the location of the database files in an LCR environment. For detailed steps about how to change the location of log files and system files for an LCR-enabled storage group, see How to Move a Storage Group in a Local Continuous Replication Environment. For detailed steps about how to change the location of database files in an LCR environment, see How to Move a Database in a Local Continuous Replication Environment.

Important

Databases cannot be placed at the root of a volume.

Viewing Status Information

After LCR has been enabled for a storage group, you can use the Exchange Management Console or the Exchange Management Shell to view the LCR-specific configuration settings for the storage group and its database.

Status Information for LCR

Exchange 2007 publishes a variety of status information for LCR copies. The following table describes the status information that is available for LCR-enabled storage groups, listing the properties in the order that they appear in the full output of the Get-StorageGroupCopyStatus Exchange Management Shell cmdlet. For detailed steps that explain how to obtain status information, see How to View the Status of a Local Continuous Replication Copy.

Status information available for LCR-enabled storage groups

Property Description

Identity

Server and name of the queried storage group.

StorageGroupName

Name of the queried storage group.

SummaryCopyStatus

Current overall status of the LCR copy. Possible values are:

  • Not Supported   The current configuration does not support continuous replication.

  • Disabled   The storage group and its database object have HasLocalCopy set to 0.

  • Failed   Verification failed (database or logs were incompatible with each other), or the storage group is improperly configured for LCR.

  • Seeding   Database seeding is in progress.

  • Suspended   Transaction log copying and replay are stopped.

  • Healthy   Status is healthy and normal, and nothing is blocking or blocked.

Microsoft Exchange Server 2007 Service Pack 1 (SP1) adds two additional status values:

  • Initializing   No log files have been closed and the Microsoft Exchange Replication service is waiting for a closed log file to replicate.

  • Service Down   The Microsoft Exchange Replication service is not running or cannot be contacted.

Failed

Verification of the database or logs identified an inconsistency that prevents replication. Alternatively, there is a configuration or access problem with the active or passive copy. Possible values are True and False.

FailedMessage

Textual message that identifies the condition that caused replication to fail. It may not be the only replication problem area.

Seeding

Seeding in progress. Possible values are True and False.

Suspend

Replication (and replay) halted for the passive copy. This prevents the database from advancing and logs from being copied. Possible values are True and False.

SuspendComment

Optional administrator comment providing a reason or note as to why replication activity was halted.

CopyQueueLength

Number of transaction log files waiting to be copied to the passive copy log file folder. A copy is not considered completed until it has been checked for corruption.

ReplayQueueLength

Number of transaction log files waiting to be replayed into the passive copy.

LatestAvailableLogTime

Time stamp on the source storage group of the most recently detected new transaction log file.

LastCopyNotificationedLogTime

Time associated with the last new log generated by the active storage group and known to the copy.

LastCopiedLogTime

Time stamp on the source storage group of the last successful copy of a transaction log file.

LastInspectedLogTime

Time stamp on the target storage group of the last successful inspection of a transaction log file.

LastReplayedLogTime

Time stamp on the target storage group of the last successful replay of a transaction log file.

LastLogGenerated

Last log generation number that was known to be generated on the active copy of the storage group.

LastLogCopied

Last log generation number that was successfully copied to the passive copy log folder.

LastLogNotified

Last log generation number generated by the active storage group and known to the copy.

LastLogInspected

Last log generation number that was inspected for consistency and corruption.

LastLogReplayed

Last log generation number that was successfully replayed into the passive copy of the storage group.

LatestFullBackupTime

Time of the last full backup.

LatestIncrementalBackupTime

Time of the last incremental backup.

SnapshotBackup

Backup taken using legacy streaming APIs or Volume Shadow Copy Service (VSS). Possible values are True and False.

You can quickly assess the health of an LCR copy by looking at the values for SummaryCopyStatus, CopyQueueLength, ReplayQueueLength, and LastInspectedLogTime. These properties show whether the LCR copy is functioning correctly, and whether the LCR copy is relatively current in both copying and replaying logs. If the following conditions occur, you should determine the cause and correct the problem:

  • The copy is spending significant time in a state other than healthy.

  • The copy queue length is more than 5.

  • The replay queue length is more than 20.

  • The last inspected log time does not show a current time. There are two likely causes: either the storage group is not experiencing much change, or the Microsoft Exchange Replication service is stopped.
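The checks in the list above can be expressed as a small Python function. This is a sketch under stated assumptions: it does not call Exchange, and it expects the caller to supply a dictionary whose keys mirror the Get-StorageGroupCopyStatus property names. The 30-minute staleness window is an illustrative assumption, because the text does not define what counts as a current time. A single sample also cannot measure how long a copy has been unhealthy, so the status check here flags any non-healthy sample.

```python
from datetime import datetime, timedelta

COPY_QUEUE_LIMIT = 5     # "more than 5" from the guidance above
REPLAY_QUEUE_LIMIT = 20  # "more than 20" from the guidance above


def assess_lcr_copy(status, now, max_inspect_age=timedelta(minutes=30)):
    """Return a list of findings for one status sample (empty if none)."""
    findings = []
    if status["SummaryCopyStatus"] != "Healthy":
        findings.append("copy status is " + status["SummaryCopyStatus"])
    if status["CopyQueueLength"] > COPY_QUEUE_LIMIT:
        findings.append("copy queue length exceeds 5")
    if status["ReplayQueueLength"] > REPLAY_QUEUE_LIMIT:
        findings.append("replay queue length exceeds 20")
    # max_inspect_age is an assumed threshold, not an Exchange setting.
    if now - status["LastInspectedLogTime"] > max_inspect_age:
        findings.append("last inspected log time is not current")
    return findings


sample = {
    "SummaryCopyStatus": "Healthy",
    "CopyQueueLength": 7,
    "ReplayQueueLength": 3,
    "LastInspectedLogTime": datetime(2008, 1, 1, 11, 55),
}
print(assess_lcr_copy(sample, now=datetime(2008, 1, 1, 12, 0)))
```

A stale last inspected log time on a quiet storage group is not necessarily a problem, so a finding from this check is a prompt to investigate rather than proof of a fault.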

The copy queue length and replay queue length values are also available as the CopyQueueLength and ReplayQueueLength performance counters under the MSExchange Replication performance object. For details about monitoring performance counters for LCR, see How to View Performance Counters for Local Continuous Replication.

There are some rare scenarios where the replication status can be misleading. The following is a list of those scenarios:

  • A storage group that is not active (that is, not changing) can report as being healthy when it might not be healthy. This situation could occur because the unhealthy condition could not be detected until a log is replayed.

  • During replication initialization, the replication status is being evaluated and may not be accurate. When the initialization completes, the status is updated.

  • The value of the LastLogGenerated field can be wrong when a database is dismounted. However, all logs with end user content are replicated if the storage group copy is replicating.

  • When there are one or more missing logs in the middle of a log stream, the passive copy continues to try to recover. In doing so, the replication status switches between failed and healthy states. The replay and copy queues will continue to grow.

  • In some very rare conditions, a log can be successfully verified but it can still fail to replay. In this situation, the system will alternate between failed and healthy states as it attempts to recover. The replay and copy queues will continue to grow.
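The last two scenarios share a recognizable pattern: the status alternates between failed and healthy while the queues keep growing. If you collect status samples over time, a heuristic along the following lines might help spot it. This is a hypothetical sketch, not an Exchange feature; the tuple layout and the default of three transitions are assumptions chosen for illustration.

```python
def looks_like_flapping(samples, min_transitions=3):
    """Detect Failed/Healthy alternation combined with growing queues.

    samples: chronological list of tuples
        (summary_status, copy_queue_length, replay_queue_length).
    """
    transitions = 0
    for previous, current in zip(samples, samples[1:]):
        # Count only switches between the Failed and Healthy states.
        if {previous[0], current[0]} == {"Failed", "Healthy"}:
            transitions += 1
    if transitions < min_transitions:
        return False
    first_total = samples[0][1] + samples[0][2]
    last_total = samples[-1][1] + samples[-1][2]
    return last_total > first_total


history = [
    ("Healthy", 1, 2),
    ("Failed", 3, 2),
    ("Healthy", 4, 5),
    ("Failed", 6, 9),
]
print(looks_like_flapping(history))  # prints True
```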

Note

In Exchange 2007 SP1, you can also use a new cmdlet called Test-ReplicationHealth to verify the health and status of storage groups enabled for continuous replication. For more information about the Test-ReplicationHealth cmdlet, see Test-ReplicationHealth and the "Test-ReplicationHealth Cmdlet" section in Monitoring Continuous Replication.

Viewing Configuration Information

You can view configuration information for LCR-enabled storage groups and databases by using the Exchange Management Console and the Exchange Management Shell. Configuration information includes:

  • Storage groups   The location of the LCR transaction log files and LCR system files.

  • Databases   The location of the LCR database copy.

Additionally, you can determine if a storage group or database is configured to have an LCR copy. For detailed steps about viewing LCR configuration settings, see How to View Local Continuous Replication Configuration Settings.

Verifying the Integrity of the Passive Copy

When you use LCR, we recommend that you verify the integrity of the passive copy periodically by running a physical consistency check against the database and transaction log files. A physical consistency check examines the transaction logs and database files for corruption. You can perform the check by using the Exchange Server Database Utilities tool (Eseutil.exe). For detailed steps about how to use Eseutil to check the transaction logs and database files for physical corruption, see How to Verify a Local Continuous Replication Copy.

Note

Before you run a physical consistency check against a database, you must temporarily suspend all replication activity against the storage group. You can suspend replication activity by using the Suspend-StorageGroupCopy cmdlet in the Exchange Management Shell or suspend replication activity through the Exchange Management Console. When the consistency check has completed, you can resume transaction log replay activity by using the Resume-StorageGroupCopy cmdlet. We recommend that you perform verification during non-production hours and minimize the amount of time that replay activity is suspended. This is because suspending the storage group copy halts all updates to the LCR copy, thus causing some content to be vulnerable to a failure.
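The order of operations in this note (suspend, check, resume) matters, and replication should be resumed even if the check fails partway through. The following Python sketch shows one way to guarantee that ordering in an automation script. The suspend, check, and resume callables are hypothetical placeholders that you would supply yourself, for example as wrappers that invoke Suspend-StorageGroupCopy, Eseutil, and Resume-StorageGroupCopy; nothing here calls Exchange directly.

```python
from contextlib import contextmanager


@contextmanager
def replication_suspended(suspend, resume):
    """Suspend replication for the duration of a with-block, then resume."""
    suspend()
    try:
        yield
    finally:
        # Runs even when the check raises, to keep the suspension short.
        resume()


def run_consistency_check(suspend, check, resume):
    """Run check() while replication is suspended and return its result."""
    with replication_suspended(suspend, resume):
        return check()


# Demonstration with stand-in callables that only record the call order.
calls = []
run_consistency_check(
    suspend=lambda: calls.append("suspend"),
    check=lambda: calls.append("check"),
    resume=lambda: calls.append("resume"),
)
print(calls)  # prints ['suspend', 'check', 'resume']
```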

Managing Replication and Replay

Managing log file replication and replay in an LCR environment involves the following main activities:

  • Halting replication to the storage group copy

  • Restarting replication to the storage group copy

Halting and Restarting Changes to the Storage Group Copy and its Database

It may be necessary to halt and restart transaction log replication activity. Transaction log replication (including replay) is controlled at the storage group level. Because a storage group can contain only one database, replication is localized to one database. Transaction log replication occurs when the Microsoft Exchange Replication service is running, a storage group has been enabled for LCR, and both the active copy and passive copy are operational. If either the active copy or passive copy becomes unavailable, you must stop replication. In addition, some administrative tasks, such as seeding, require a storage group that is enabled for LCR to suspend replication. If you need to stop all access to the passive copy's data files, you must suspend replication.

It may occasionally be necessary to control the activities of the passive copy. This could be required to perform a reconfiguration, or to correct issues with the server or the database. Halting log replay is also required to perform a physical consistency check of the passive copy. When it is necessary to control database copy updates, replication must be halted for the storage group copy. Replication may also need to be halted when the passive copy's logs are being manipulated. Because a storage group can only contain a single database, actions that affect replay behavior are controlled at the storage group level.

We recommend that all replication activity be halted when the location of the storage group or database is being changed.

For more information about halting replication changes to LCR copies, see How to Halt Replication for a Storage Group Enabled for Local Continuous Replication. For more information about restarting replication changes to LCR copies, see How to Restart Replication for a Storage Group That is Enabled for LCR. For more information about performing an integrity check on the passive copy's transaction logs and database file, see How to Verify a Local Continuous Replication Copy.

Activating the Passive Copy

LCR enables you to recover from corruption of the active copy of a storage group by activating the passive copy of the storage group. If the transaction logs in the active copy of the storage group are not corrupted, no data loss should occur. If the transaction logs from the active copy of the storage group are not available, the recovery can only bring the storage group back to a point in time that is consistent with the last non-corrupted set of changes that the passive copy received. An additional constraint is that there cannot be any missing or corrupted production transaction log files earlier than that point.

Recovery from production storage group corruption is easiest when NTFS file system volume mount points are used for storing the LCR copy. By using volume mount points, you can graft, or mount, a target partition into a folder on another physical disk. Volume mount points are transparent to programs, including Exchange 2007.

Corruption of a transaction log or database file that is part of an LCR copy can be detected either by errors produced through a replay operation or a consistency check. The corrective action to be taken, if any, depends on the nature of the corruption:

  • If the corruption occurs in a log file that has already been replayed, the corrupt log file can be safely ignored. However, if you are taking file system-based backups of the LCR copy, you should first delete all log files that have been replayed.

  • If the corruption is in an active copy's log file that has not been replayed, Exchange attempts to copy the log file again when it detects the corruption. If the automatic copy does not resolve the corruption, you must reseed the LCR storage group. Additionally, we recommend that you verify the integrity of the source transaction logs and database file. Verifying Exchange data files requires that the files be offline and unavailable for access by users.

  • If the database is corrupted, you must reseed the storage group.
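The three cases above form a small decision table, sketched here in Python. The enumeration, function, and parameter names are hypothetical and exist only to summarize the guidance; they do not correspond to any Exchange API.

```python
from enum import Enum


class Action(Enum):
    IGNORE_LOG = "ignore the corrupted log file"
    DELETE_REPLAYED_LOGS = "delete replayed log files before a file system backup"
    NO_ACTION = "automatic re-copy resolved the corruption"
    RESEED = "reseed the storage group"


def corrective_action(corrupt_item, log_replayed=False,
                      recopy_succeeded=False, file_system_backups=False):
    """Map a corruption scenario to the action described in the list above.

    corrupt_item: "log" or "database".
    """
    if corrupt_item == "database":
        return Action.RESEED
    if corrupt_item != "log":
        raise ValueError("corrupt_item must be 'log' or 'database'")
    if log_replayed:
        # A replayed log can be ignored unless file system backups need it clean.
        return Action.DELETE_REPLAYED_LOGS if file_system_backups else Action.IGNORE_LOG
    # Log not yet replayed: Exchange retries the copy first.
    return Action.NO_ACTION if recopy_succeeded else Action.RESEED


print(corrective_action("log", log_replayed=True).value)
```

When the result is a reseed caused by an unreplayed log, the text also recommends verifying the source transaction logs and database file, which this sketch does not model.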

For detailed steps that explain how to activate the passive copy of a database, see How to Activate the Passive Copy of a Database.

Assessing Replication Status at the Time of Corruption

After a failure or corruption of a database copy, you need to assess if you want to immediately continue operation using the passive copy. LCR provides key pieces of information to aid in this decision:

  • Health of the copy at the time of failure

  • Replay and copy queues at the time of the failure

  • Last inspected log time at the time of the failure

You can obtain the information by using the Get-StorageGroupCopyStatus cmdlet. For detailed steps about how to obtain this information, see How to View the Status of a Local Continuous Replication Copy.

Note

The last inspected log time provides information about the most recently seen changes from the active copy. Because the queue lengths are inaccurate when the Microsoft Exchange Replication service is stopped, this information can help you detect failures that occur while the service is not running.

The copy queue length includes the best available information of the active copy at the time of failure. Based on this information and your assessment of the recovery time of the failed database, you must decide if the available copy is to be mounted:

  • If the replay queue length is significant, recovery might take time, but this does not indicate that significant data loss will occur.

  • If the copy queue length is significant, that means a significant number of logs have been lost. If the database is mounted, it will be restored to a time frame of approximately the last copied log (also provided by the Get-StorageGroupCopyStatus cmdlet).

  • If the last inspected log time is significantly prior to the time of the failure, it is likely that the Microsoft Exchange Replication service is stopped and other queue information is inaccurate.
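To make the reasoning above concrete, the following Python sketch turns the three indicators into plain-language notes. It is illustrative only. The text does not define "significant", so the thresholds here (5 and 20 logs, borrowed from the health guidance earlier in this topic, and a 15-minute gap) are assumptions to adjust for your environment, and all names are hypothetical.

```python
from datetime import datetime, timedelta


def assess_activation(copy_queue_length, replay_queue_length,
                      last_inspected_log_time, failure_time,
                      copy_limit=5, replay_limit=20,
                      stale_after=timedelta(minutes=15)):
    """Return notes to weigh before mounting the passive copy."""
    notes = []
    if replay_queue_length > replay_limit:
        notes.append("recovery may take time; not a sign of major data loss")
    if copy_queue_length > copy_limit:
        notes.append("logs were not copied; expect loss back to about the last copied log")
    # A large gap suggests the replication service was stopped before the failure.
    if failure_time - last_inspected_log_time > stale_after:
        notes.append("replication service was likely stopped; queue values may be inaccurate")
    return notes


failure = datetime(2008, 1, 1, 12, 0)
for note in assess_activation(2, 35, failure - timedelta(minutes=1), failure):
    print(note)
```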

Note

Due to latencies and communication failures, it is possible for copy queue length to be inaccurate, because the current state of the active copy is asynchronously updated. In general, the inaccuracy is limited to activities around a minute before and after the failure.

Note

A failed database cannot be used to seed a passive copy.