Friday Mail Sack: Drop the dope, hippy! edition
Hi all, Ned here again with an actual back to back mail sack. This week we discuss:
- Running out of USNs and Versions
- DFSR RDC LAN WAN FWIW AOK
- NPS and dotted NetBIOS domain names
- USMT and the case of the failing sourcepriority
- Revisiting NIC teaming
- Weird DFSR files
- MaxConcurrentAPI in depth (elsewhere)
- KB2663685 DFSR goodness
- Other stuff
I was reading an article that showed how to update the computer description every time a user logs on. A commenter mentioned that people should be careful as the environment could run out of USNs if this was implemented. Is that true?
This was a really interesting question. The current USN is a 64-bit counter maintained by each Active Directory domain controller as the highestCommittedUsn attribute on rootDSE. Being an unsigned 64-bit integer, that means 264-1, which is 18,446,744,073,709,551,615 (i.e. 18 quintillion). Under normal use that is never going to run out. Even more, when AD reaches that top number, it would restart at 1 all over again!
Let's say I want to run out of USNs though, so I create a script that makes 100 object write updates per second on at DC. It would take me 54 days to hit the first 1 billionth USN. At that rate, this means I am adding ~6.5 billion USN changes a year. Which means at that rate, it would take just under 3 billion years to run out on that DC. Which is probably longer than your hardware warranty.
My further thought was around Version metadata, which we don't document anywhere I can find. That is an unsigned 32-bit counter for each attribute on an object and again, so huge it is simply not feasible that it would run out in anything approaching normal circumstances. If you were to update a computer’s description every time a user logged on and they only had one computer, at 232-1 that means they have to logon 4,294,967,295 times to run out. Let’s say they logon in the morning and always logoff for bathroom, coffee, meetings and lunch breaks rather than locking their machines – call it 10 logons a day and 250 working days a year. That is still 1.7 million years before they run out and you need to disjoin, rename, and rejoin their computer so they can start again.
That said - the commenter was a bit off about the facts, but he had the right notion: not re-writing attributes with unchanged data is definitely a good idea. Less spurious work is always the right answer for DC performance and replication. Figure out a less invasive way to do this, or even better, use a product like System Center Config Manager; it has built in functionality to determine the “primary user” of computers, involving auditing and some other heuristics. This is part of its “Asset Intelligence” reporting (maybe called something else in SCCM 2012).
Interesting side effect of this conversation: I was testing all this out with NTDSUTIL auth restores and setting the version artificially high on an object with VERINC. Repadmin /showmeta gets upset once your version crosses the 231 line. :) See for yourself (in a lab only, please). If you ever find yourself in that predicament, use LDP's metadata displayer, it keeps right on trucking.
I find replication to be faster with RDC disabled on my LAN connected servers (hmmm, just like your blog said), so I have it disabled on the connections between my hub servers and the other servers on the same LAN. I have other servers connected over a WAN, so I kept RDC enabled on those connections.
By having some connections with RDC enabled and others disabled, am I making my hub server do ‘twice’ the work? Would it be better if I enabled it on all connections, even the LAN ones?
You aren’t making your servers do things twice, per se; more like doing the same things, then one does a little more.
Consider a change made on the hub: it still stage the same file once, compresses it in staging once, creates RDC signatures for it once, and sends the overall calculated SHA-1 file hash to each server once. The only difference will be that one spoke server then receives the whole file and the other spoke does the RDC version vector and signature chunk dance to receive part of the file.
The non-RDC LAN-based communication will still be more efficient and fast within its context, and the WAN will still get less utilization and faster performance for large files with small changes.
I'm trying to get Network Policy Server (RADIUS) to work in my environment to enable WPA-2 authentication from a slick new wireless device. I keep getting the error "There is no domain controller available for domain CONTOSO.COM" in the event log when I try to authenticate, which is our legacy dotted NetBIOS domain name. On a hunch, I created a subdomain without a dot in the NetBIOS name and was able to authenticate right away with any user from that subdomain. Do you have any tricks or advice on how to deal with NPS in a dotted domain running in native Windows 2008 R2 mode other than renaming it (yuck).
I don't even know how to spell NPS (it's supported by our Networking team) but I found this internal article from them. You are not going to like the answer:
Previous versions of IAS/NPS could not perform SPN lookups across domains because it treated the SPN as a string and not an FQDN. Windows Server 2008 R2 corrected that behavior, but now NPS is treating a dotted NetBIOS name as a FQDN and NPS performs a DNS lookup on the CONTOSO.COM name. This fails because DNS does not host a CONTOSO.COM zone.
That leaves you with three main solutions:
There might be some other workaround - this would be an extremely corner case scenario and I doubt we've explored it deeply.
The third solution is an ok short-term workaround, but Win2008 isn’t going to be supported forever and you might need some R2 features in the meantime. The first two are gnarly, but I gotta tell ya: no one is rigorously testing dotted NetBIOS names anymore, as they were only possible from NT 4.0 domain upgrades and are as rare as an honest politician. They are ticking time bombs. A variety of other applications and products fail when trying to use dotted NetBIOS domain names and they might not have a workaround. A domain rename is probably in your future, and it's for the best.
We are using USMT 4.0 to migrate data with the merge script sourcepriority option to always overwrite data on the destination with data from the source. No matter what though, the destination always wins and the source copy of the file is renamed with the losing (1) tag.
This turned out to be quite an adventure.
We turned on migdiag logging using SET MIG_ENABLE_DIAG=migdiag.xml in order to see what was happening here; that's a great logging option for figuring out why your rules aren’t processing correctly. When it got to the file in question during loadstate, we saw this weirdness:
<Pattern Type="File" Path="C:\Users\someuser\AppData\Local\Microsoft\Windows Sidebar [Settings.ini]" Operation="DynamicMerge,<unknown>"/>
Normally, it should have looked like:
<Pattern Type="File" Path="C:\Users\someuser\AppData\Roaming\Microsoft\Access\* [*]" Operation="DynamicMerge,CMXEMerge,CMXEMergeScript,MigXmlHelper,SourcePriority"/>
More interestingly, none of us could reproduce the issue here using the customer's exact same XML file. Finally, I had him reinstall USMT from a freshly downloaded copy of the WAIK, and it all started working perfectly. I've done this a few times in the past with good results for these kinds of weirdo issues; since USMT cannot be installed on Windows XP, it just gets copied around as folders. Sometimes people start mixing in various versions and DLLS, from Beta, RC, and hotfixes, and you end up with something that looks like USMT - but ain't.
Is teaming network adapters on Domain Controllers supported by Microsoft? I found KB http://support.microsoft.com/kb/278431.
(Updated) Maybe! :-D We're still in beta and need to get a final word. Sharp-eyed readers know I was already asked this before. However, I have a new answer for Windows Server: yes, if you use Windows Server "8" Beta.
Whoa, we joined the 1990s! Seriously though, NIC teaming is the bane of our Networking Support group's existence, so hopefully by creating and implementing our own driver system, we stop the pain customers have using third party solutions of variable quality. At least we'll be able to see what's wrong now if it doesn’t work.
For a lot more info, grab the whitepaper. I'm confirming the whole DC-specific aspect here as well. I have heard several stories now and I want to be nice and crisp; check back later. :)
What are the DFSR files $db_dirty$ , $db_normal$ , and $db_lost$ mentioned in the KB article Virus scanning recommendations for Enterprise computers that are running currently supported versions of Windows ? I only see $db_normal$ on my servers (presumably that's a good thing).
$Db_dirty$ exists after a dirty database shutdown and acts as a marker of that fact. $Db_normal$ exists when there are no database issues and is renamed to $db_lost$ if the database goes missing, also acting as a state marker for DFSR between service restarts.
Where is the best place to learn more about MaxConcurrentAPI?
Right here, and only quite recently:
Not a question (new DFSR functionality in KB 2663685)
If you missed it, we released a new hotfix for DFSR last month that adds some long-sought functionality for file server administrators: the ability to prevent DFSR from non-authoritatively synchronizing replicated folders on a volume where the database suffered a dirty shutdown:
Changes that are not replicated to a downstream server are lost on the upstream server after an automatic recovery process occurs in a DFS Replication environment in Windows Server 2008 R2 - http://support.microsoft.com/kb/2663685
DFSR now provides the capability to override automatic replication recovery of dirty shutdown-flagged databases. By default, the following registry DWORD value exists:
StopReplicationOnAutoRecovery = 1
If set to 1, auto recovery is blocked and requires administrative intervention. Set it to 0 to return to the old behavior.
DFSR writes warning 2213 event to the DFSR event log:
The DFS Replication service stopped replication on volume %2.
This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication.
1. Back up the files in all replicated folders on the volume. Failure to do so may result in data loss due to unexpected conflict resolution during the recovery of the replicated folders.
2. To resume the replication for this volume, use the WMI method ResumeReplication of the VolumeConfig class.
For example, from an elevated command prompt, type the following command:
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="%1" call ResumeReplication
For more information, see http://support.microsoft.com/kb/2663685.
You must then make a decision about resuming replication. You must weigh your decision against the environment:
- Are there originating files or modifications on this server? You can use the DFSRDIAG BACKLOG command with this server as the sending member and each of its partners as the receiving member to determine if this server had any pending outbound replication.
- Do you need an out of band backup? You can check you latest backup logs and compare to file contents to see if you should first backup the RFs.
- Are the replicated folders read-only? If so, there is little reason to examine the server further and you can resume replication. It is impossible for the RO RFs to have originated changes in that case.
You then have several options:
- Resume replication. By executing the WMI method listed in the event, the database rebuild commences for all Replicated Folders on that volume. If the database cannot be rebuilt gracefully, DFSR deletes the database and performs initial non-authoritative sync. All data local in those replicated folders is fenced to lose conflict resolutions. Any files that do not match the SHA1 hash of upstream servers move to the circular ConflictAndDeleted folder and, potentially, lost forever.
Wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid=" <some GUID> " call ResumeReplication
- Reconfigure replication on RFs to be authoritative. If the data is more up to date on the non-replicating RFs or the RFs are designed to originate data (such as Branch servers replicating back to a central hub for backups), you must manually reconfigure replication to force them to win.
Pandora.com is a great way to find new music; I highly recommend it. It can get a little esoteric, though. Real radio will never find you a string duo that plays Guns and Roses songs, for example.
AskDS reader Joseph Moody sent this along to us:
"Because I got tired of forwarding the Accelerating Your IT Career post to techs in our department, we just had it printed poster size and hung it on an open wall. Now, I just point to it when someone asks how to get better."
My wife wanted to be a marine biologist (like George Costanza!) when she was growing up and we got on a killer whale conversation last week when I was watching the amazing Discovery Frozen Planet series. She later sent me this tidbit:
"First, the young whale spit regurgitated fish onto the surface of the water, then sank below the water and waited.
If a hungry gull landed on the water, the whale would surge up to the surface, sometimes catching a free meal of his own. Noonan watched as the same whale set the same trap again and again. Within a few months, the whale's younger half brother adopted the practice.
Eventually the behavior spread and now five Marineland whales supplement their diet with fresh fowl, the scientist said."
Have you ever wanted to know what AskDS contributor Rob Greene looks like when his manager 'shops him to a Shrek picture? Now you can:
Have a nice weekend folks,