Cleaning lingering objects across the forest with ReplDiag.exe [Part 2 of 4]


Hi, Rob here, with part 2 of 4 in the series on ReplDiag from CodePlex. Last month we looked at replication troubleshooting; now we can begin the actual cleanup of lingering objects. This pass covers every partition in the forest, hence “forest wide”. We will look at cleaning a single partition (or NC, “naming context”) in part 4 of the series.

Before we get started, please note that Enterprise Administrator privileges are required to perform cleanup. Also, every DC in the forest must be running Windows Server 2003 or later, and all of them must be online. The .NET Framework v2.0 SP1 must also be installed on the machine you run the cleanup from. That machine does not have to be a domain controller; running from a DC is always good, but not required.

Let’s dig in. A simple ReplDiag /? to display the syntax is a good place to start:

------------------------------------------------------------------------
Version:  2.0.3397.24022
------------------------------------------------------------------------

Command Line Options:  ReplDiag [/Save] [/CheckForStableReplTopology] [/RemoveLingeringObjects] [/ImportData:<FileName.XML>] [/ShowTestCases] [/OverrideDefaultReferenceDC:"dc=namingcontext,dc=com"]

/UseRobustDCLocation -Query each and every DC for a list of DCs in forest.  Ensures replication instability does not cause any to be missed.
/Save -Save out the data from the current environment to XML.  File is named "ReplicationData.xml" and is located in the current directory.
/ImportData -Import the XML that was saved during a prior execution of this utility.  Run one of the other options to do something with the data.
/ShowTestCases -Show detail about test cases.

Lingering Object Cleanup:
/RemoveLingeringObjects -Use the current forest topology to clean all the NCs in the forest. WILL NOT CLEAN WINDOWS 2000 SYSTEMS!!!
/AdvisoryMode -Check for lingering objects only, do not clean. Must be used with /RemoveLingeringObjects.
/OverrideDefaultReferenceDC -Specify reference DC for a naming context when when removing lingering objects, can be used multiple times for different NCs. Only functional if using /RemoveLingeringObjects.
/OutputRepadminCommandLineSyntax -Output the command line syntax for repadmin. Only active in conjunction with /RemoveLingeringObjects.

Example syntax:
ReplDiag /Save
- Collect the AD replication topology from the environment and save it.
ReplDiag /ImportData:"ReplicationData.xml"
- Load in previously collected data and check replication status.
ReplDiag /RemoveLingeringObjects /OverrideDefaultReferenceDC:"cn=Configuration,dc=contoso,dc=com" /OverrideDefaultReferenceDC:"dc=contoso,dc=com"


You will notice that ReplDiag has several switches. The most important for now are /RemoveLingeringObjects and /AdvisoryMode. The remaining switches enable advanced functionality, which we will discuss in parts 3 and 4 of this series.

The output of an advisory mode run will look similar to this:

C:\ReplDiag>ReplDiag.exe /removelingeringobjects /advisorymode
Replication topology analyzer.  Written by
Version:  2.0.3397.24022
Command Line Switch:  /removelingeringobjects
Command Line Switch:  /advisorymode

Enumerating Forest:
        Forest Functional Level:  Windows 2000
Enumerating Domain:  - Found 1 DCs.
        Domain Functional Level:  Windows 2003
Enumerating Domain:        - Found 1 DCs.
        Domain Functional Level:  Windows 2003
Enumerating Domain:
Data collection duration:  0 seconds

Number Complete,Status,Server Name,Naming Context,Reference DC,Duration,Error Code,Error Message
Reference NCs cleaned in 0h:0m:0s.  Cleaning everything else against reference NCs.

The tool behaves exactly like repadmin /removelingeringobjects, which calls DsReplicaVerifyObjects. The logic it uses to build the cleanup topology follows the steps outlined in Glenn LeCheminant’s blog here.
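For comparison, here is a minimal sketch of what one manual pass looks like when you assemble the repadmin command yourself. The repadmin syntax (destination DC, reference DC's DSA GUID, naming context, optional /advisory_mode) is the documented one; the helper name, the server name and the GUID placeholder are made up for illustration and are not part of ReplDiag.

```python
def build_repadmin_command(dest_dc, reference_dc_guid, naming_context, advisory=True):
    """Return the argument list for one repadmin lingering-object pass.

    dest_dc            DC to be checked or cleaned (hypothetical name below)
    reference_dc_guid  DSA object GUID of the reference DC (placeholder below)
    naming_context     distinguished name of the partition
    """
    argv = ["repadmin", "/removelingeringobjects",
            dest_dc, reference_dc_guid, naming_context]
    if advisory:
        # Report only; nothing is deleted on the destination DC.
        argv.append("/advisory_mode")
    return argv


# Hypothetical values, for illustration only.
cmd = build_repadmin_command("dc2.contoso.com",
                             "<reference-dc-dsa-guid>",
                             "dc=contoso,dc=com")
print(" ".join(cmd))
```

Doing this by hand means one such command per DC per partition, which is exactly the bookkeeping ReplDiag takes off your hands.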

We are often asked how long this will take to run. Unfortunately, that depends on the number of instances of writable partitions, the size of the partitions, and the speed of the links connecting the DCs. ReplDiag optimizes this as much as possible by multi-threading the cleanup, which is an additional advantage over doing the work manually: it cleans multiple partitions at one time. Because the reference DC has to be scrubbed first, the tool initially runs one thread per partition to clean the reference DC. Once that is complete for ALL partitions, it runs multiple threads concurrently for all partitions against all DCs until all are cleaned.
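The two-phase ordering described above can be sketched roughly as follows. This is not ReplDiag's source code, only an illustration of the scheduling idea under stated assumptions: the function names, the callbacks and the DC names are all hypothetical, and the callbacks merely record what would be cleaned instead of calling the real API.

```python
from concurrent.futures import ThreadPoolExecutor


def two_phase_cleanup(partitions, scrub_reference, clean_member, max_workers=8):
    """partitions maps each NC to (reference_dc, [other DCs hosting that NC]).

    scrub_reference and clean_member are hypothetical callbacks standing in
    for the real lingering-object calls.
    """
    # Phase 1: one worker per partition, each scrubbing that partition's
    # reference DC.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(scrub_reference, nc, ref)
                   for nc, (ref, _members) in partitions.items()]
        for future in futures:
            future.result()  # wait, and surface any exception

    # Phase 2 starts only once phase 1 has finished for ALL partitions:
    # every remaining DC is cleaned against its partition's reference DC.
    jobs = [(nc, dc, ref)
            for nc, (ref, members) in partitions.items()
            for dc in members]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(clean_member, nc, dc, ref) for nc, dc, ref in jobs]
        for future in futures:
            future.result()


# Hypothetical two-partition, three-DC forest.
events = []
partitions = {
    "dc=contoso,dc=com": ("dc1", ["dc2", "dc3"]),
    "cn=Configuration,dc=contoso,dc=com": ("dc1", ["dc2", "dc3"]),
}
two_phase_cleanup(
    partitions,
    scrub_reference=lambda nc, ref: events.append(("reference", nc, ref)),
    clean_member=lambda nc, dc, ref: events.append(("member", nc, dc, ref)),
)
print(len(events), "cleanup jobs recorded")
```

The point of the barrier between the two phases is that nothing is compared against a reference DC until that reference is itself known to be clean.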

Each DC will log a series of NTDS Replication events telling you what the lingering object cleanup API (the DsReplicaVerifyObjects function) is doing. This API is also the reason the tool does not support Windows 2000: it was not yet available in that now unsupported operating system. We recommend upgrading any remaining Windows 2000 DCs, even if you do have lingering objects, before enabling strict replication consistency, so that the entire forest can be cleaned.

In this particular run, Event IDs 1938 and 1942 were logged, reporting what took place. To determine where lingering objects were found, you will have to scour the event logs with your favorite tool and aggregate the data, alongside the second section of the output. That section lists the status, server name, NC, reference DC used, duration of the check, and any error code or message for each NC on each DC that was scrubbed. In large environments the number of lingering objects cleaned may be significant enough to overrun the Directory Service logs, so if collecting this information is important, it may be a good idea to increase the size of the logs beforehand.
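If you save the second section of the output to a file, a few lines of script can pull out the rows that need attention. The sketch below is only an illustration: the column names come from the header shown earlier, but the sample rows, the status text and the server names are invented, and it assumes that naming contexts containing commas are quoted and that an error code of 0 (or an empty field) means the pass succeeded. Check both assumptions against your own output before relying on it.

```python
import csv
import io
from collections import defaultdict

# Header as printed by the tool; the data rows are invented sample data.
SAMPLE = "\n".join([
    "Number Complete,Status,Server Name,Naming Context,Reference DC,"
    "Duration,Error Code,Error Message",
    '1,Done,dc2.contoso.com,"dc=contoso,dc=com",dc1.contoso.com,0h:0m:2s,0,',
    '2,Done,dc3.contoso.com,"dc=contoso,dc=com",dc1.contoso.com,0h:0m:3s,'
    '8453,Replication access was denied.',
])


def failed_rows(text):
    """Group rows with a non-zero error code by server name."""
    failures = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: "0" or an empty field means the pass succeeded.
        if row["Error Code"].strip() not in ("", "0"):
            failures[row["Server Name"]].append(
                (row["Naming Context"], row["Error Code"], row["Error Message"])
            )
    return dict(failures)


result = failed_rows(SAMPLE)
for server, problems in result.items():
    for nc, code, message in problems:
        print(f"{server}: {nc} failed with {code} ({message})")
```

The servers this surfaces are the ones whose Directory Service event logs are worth reading first.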

Now that we have completed a “safe pass”, known as an advisory mode run, we can begin cleanup by simply removing the /advisorymode switch. The process is similar to using the repadmin command, but much simpler and more automated. However, there may be cases where further interaction by the administrator is necessary, hence the additional switches and perhaps even a few hidden ones too. We’ll discuss more advanced use of the tool in part 3 of this series.

At this point, you can test replication again using ReplDiag, as described in part 1 of this series.

Next month, we’ll take a look at part 3 in the series: “Why does ReplDiag error out with the message that the topology isn’t stable?”

See you soon!