Friday Mail Sack: Now with 100% more words

Hi folks, Ned here again. It’s been nearly a month since the last Mail Sack post so I’ve built up a good head of steam. Today we discuss FRS, FSMO, Authentication, Authorization, USMT, DFSR, VPN, Interactive Logon, LDAP, DFSN, MS Certified Masters, Kerberos, and other stuff. Plus a small contest for geek bragging rights.

Clickity Clackity Clack.

Question

I’ve read TechNet articles stating that the PDC Emulator is contacted when authentication fails, in case a newer password is available that the PDCE would know about. What isn't stated explicitly is whether the client contacts the PDCE directly or the current DC contacts it on behalf of the client. This is important to us because our clients won’t always have a routable connection to the PDCE, but our DCs will; basically a DMZ/perimeter network scenario.

Answer

Excellent question! We document the password and logon behaviors here rather loosely: https://msdn.microsoft.com/en-us/library/cc223752(PROT.13).aspx. Specifically for the “bad password, let’s try the PDCE” piece, it works like this:

  • I have two DCs and a client.
  • The PDCE is named 2008r2-srv-01 (10.70.0.101).
  • The other DC is named 2008r2-srv-02 (10.70.0.102).
  • The client is named 7-x86-sp1-01 (10.70.0.111).
  • I configured the PDCE firewall to block ALL traffic from the client IP address. The PDCE can only hear from the other DC, like in your proposed DMZ. The non-PDCE and client can talk without restriction.

1. I use some bad credentials on my Windows 7 client (using RunAs to start notepad.exe as my Tony Wang account).

[Screenshot: RunAs launching notepad.exe with the bad credentials]

2. Then we see this conversation:

[Network capture: the Kerberos conversation described below]

a. Frame 34, the client contacts his 02 DC with a Kerberos Logon request as Twang in the Contoso domain.

b. Frame 40, DC 02 knows the password is bad, so he then forwards the same Kerberos Logon request to the PDCE 01.

c. Frame 41, the PDCE 01 responds back to the 02 DC with KDC error 24 (KDC_ERR_PREAUTH_FAILED, i.e. “bad password”).

d. Frame 45, the DC 02 responds back to the client with “bad password”.

3. User now gets:

[Screenshot: the resulting error shown to the user]
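The forwarding in step 2 can be sketched as a toy model. This is a hedged illustration, not the real KDC: the `DomainController` class, the `logon` function, the plain-text password dictionaries and the trace strings are all hypothetical. Only the order of the hops and error 24 come from the capture above.

```python
from dataclasses import dataclass, field

KDC_ERR_NONE = 0             # RFC 4120: no error
KDC_ERR_PREAUTH_FAILED = 24  # RFC 4120: pre-authentication failed ("bad password")


@dataclass
class DomainController:
    """Hypothetical DC; 'passwords' holds each user's password as this DC currently knows it."""
    name: str
    passwords: dict = field(default_factory=dict)
    is_pdce: bool = False


def logon(client_dc, pdce, user, password, trace):
    """Toy AS exchange: the client only ever talks to client_dc."""
    trace.append(f"client -> {client_dc.name}: logon request for {user}")
    if client_dc.passwords.get(user) == password:
        trace.append(f"{client_dc.name} -> client: success")
        return KDC_ERR_NONE
    if not client_dc.is_pdce:
        # Local check failed, so the DC (not the client) retries against the PDCE,
        # which may already hold a newer password.
        trace.append(f"{client_dc.name} -> {pdce.name}: forwarded request for {user}")
        if pdce.passwords.get(user) == password:
            trace.append(f"{pdce.name} -> {client_dc.name}: success")
            trace.append(f"{client_dc.name} -> client: success")
            return KDC_ERR_NONE
        trace.append(f"{pdce.name} -> {client_dc.name}: KDC error 24")
    trace.append(f"{client_dc.name} -> client: KDC error 24")
    return KDC_ERR_PREAUTH_FAILED


if __name__ == "__main__":
    pdce = DomainController("2008r2-srv-01", {"twang": "N3wPassword"}, is_pdce=True)
    dc02 = DomainController("2008r2-srv-02", {"twang": "0ldPassword"})
    trace = []
    print(logon(dc02, pdce, "twang", "typo", trace))
    print("\n".join(trace))
```

No line in the trace ever goes from the client to 2008r2-srv-01, which is why blocking the client at the PDCE firewall made no difference.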

I described the so-called “urgent replication” here: https://blogs.technet.com/b/askds/archive/2010/08/18/fine-grained-password-policy-and-urgent-replication.aspx. That covers how account lockout and password change processing work (that’s DC to PDCE too, so no worries there for you).

Question

Can you help me understand cached domain logons in more detail? At the moment I have many Windows XP laptops for mobile users. These users log on to the laptops using cached domain logons and afterwards establish a VPN connection to the company network. We have some third-party software and group policies that don’t work in this scenario, but work perfectly if the user logs on while connected to our corporate network instead of over the VPN, using the exact same laptop.

Answer

We don’t do a great job in documenting how the cached interactive logon credentials work. There is some info here that might be helpful, but it’s fairly limited:

How Interactive Logon Works
https://technet.microsoft.com/en-us/library/cc780332(v=WS.10).aspx

But from hearing this scenario many times, I can tell you that you are seeing expected behavior. In your scenario, the user logs on interactively with cached credentials (stored in encrypted form here: HKEY_LOCAL_MACHINE\Security\Cache) while no DC is reachable, and only afterwards creates a network connection and accesses resources. So anything that only happens at the interactive logon phase is not going to work. For example, logon scripts delivered by AD or group policy. Or security policies that apply when the computer starts back up (and won’t apply for another 90-120 minutes while VPN connected, which may never happen if the user only starts the VPN for short periods).

I made a hideous flowchart to explain this better. It works (very oversimplified) like this:

[Flowchart: interactive logon with cached credentials, followed by a VPN connection]

As you can see, with a VPN not yet running, it is impossible to access a number of resources at interactive logon. So if your application’s “resource authentication” only works at interactive logon, there is nothing you can do unless the app changes.

This is why we created VPN at Logon and DirectAccess – there would be no reason to make use of those technologies otherwise.
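To make the flowchart’s point concrete, here is a deliberately oversimplified Python model. The `LogonOutcome` fields, `interactive_logon` and `policy_catches_up` are hypothetical simplifications, not Windows APIs; the 90-120 minute window is the background policy refresh window mentioned above.

```python
from dataclasses import dataclass


@dataclass
class LogonOutcome:
    validated_by: str
    logon_scripts_ran: bool
    logon_time_policy_applied: bool


def interactive_logon(dc_reachable: bool) -> LogonOutcome:
    """Very oversimplified: logon-time work needs a DC at the moment of logon."""
    if dc_reachable:
        return LogonOutcome("domain controller", True, True)
    # Offline logon: cached credentials are verified locally, and anything that
    # only happens at interactive logon is simply missed.
    return LogonOutcome(r"cached credentials (HKLM\Security\Cache)", False, False)


def policy_catches_up(vpn_minutes: int, refresh_after_minutes: int) -> bool:
    """Does a background policy refresh happen before the user disconnects the VPN?"""
    if not 90 <= refresh_after_minutes <= 120:
        raise ValueError("background refresh is modeled as 90-120 minutes")
    return vpn_minutes >= refresh_after_minutes


if __name__ == "__main__":
    # Plain VPN started after logon: no DC was reachable when the user logged on.
    print(interactive_logon(dc_reachable=False))
    # VPN at logon or DirectAccess: the network exists before logon completes.
    print(interactive_logon(dc_reachable=True))
    # A 30-minute VPN session never reaches a 90-minute background refresh.
    print(policy_catches_up(vpn_minutes=30, refresh_after_minutes=90))
```

The only input that changes the outcome is whether a DC is reachable at logon time, which is exactly what VPN at Logon and DirectAccess provide.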

How to configure a VPN connection to your corporate network in Windows XP Professional
https://support.microsoft.com/kb/305550

Where Is “Logon Using Dial-Up Connections” in Windows Vista?
https://blogs.technet.com/b/grouppolicy/archive/2007/07/30/where-is-logon-using-dial-up-connections-in-windows-vista.aspx

DirectAccess
https://technet.microsoft.com/en-us/network/dd420463.aspx

If you have a VPN solution that doesn’t allow XP to create the “dial-up network” at interactive logon, that’s something your remote-access vendor has to fix. Nothing we can do for you I’m afraid.

Question

Can DFSR use security protocols other than Kerberos? I see that it has an SPN registered but I never see that SPN used in my network captures or ticket cache.

[Screenshot: the DFSR SPN registered for the server]

Answer

DFSR uses Kerberos auth exclusively. The DFSR client’s TGS request does not contain the DFSR SPN, only the computer’s HOST SPN. So the special-looking DFSR SPN is... pointless. It’s one of those “almost implemented” features you occasionally see. :)

Let’s look at this in action.

Two DFSR servers (06 and 07) doing initial sync, talking to their DC (01). The TGS requests and responses use only the computers’ HOST SPNs:

[Network capture: TGS requests and responses using only HOST SPNs]

Then the DFSR service opens RPC connections between the servers and uses Kerberos to encrypt the RPC traffic with RPC_C_AUTHN_LEVEL_PKT_PRIVACY, using RPC_C_AUTHN_GSS_NEGOTIATE and requiring RPC_C_QOS_CAPABILITIES_MUTUAL_AUTH. Since NTLM doesn’t support mutual authentication, DFSR can only use Kerberos:

[Network captures: the DFSR RPC connections using Kerberos with mutual authentication and packet privacy]

If you block Kerberos from working (TCP/UDP 88), DFSR falls over and the service won’t start:

Event 1202
"Failed to contact domain controller..." with an extended error of "160 - the parameter is incorrect"
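Why blocking port 88 is fatal rather than just slower can be shown in a few lines. This is a hedged sketch of Negotiate’s choice under a simplified rule set; `negotiate_package` and `SUPPORTS_MUTUAL_AUTH` are hypothetical, and real SSPI negotiation has more inputs. DFSR’s bind asks for RPC_C_QOS_CAPABILITIES_MUTUAL_AUTH, which the sketch treats as `require_mutual_auth=True`.

```python
SUPPORTS_MUTUAL_AUTH = {"Kerberos": True, "NTLM": False}


def negotiate_package(kerberos_reachable: bool, require_mutual_auth: bool) -> str:
    """Simplified Negotiate: prefer Kerberos, fall back to NTLM only if allowed."""
    candidates = ["Kerberos", "NTLM"] if kerberos_reachable else ["NTLM"]
    for package in candidates:
        if require_mutual_auth and not SUPPORTS_MUTUAL_AUTH[package]:
            continue  # NTLM cannot prove the server's identity, so it is skipped
        return package
    raise PermissionError("no security package satisfies mutual authentication")


if __name__ == "__main__":
    print(negotiate_package(kerberos_reachable=True, require_mutual_auth=True))
    try:
        # TCP/UDP 88 blocked: only NTLM is left, and it is not acceptable.
        negotiate_package(kerberos_reachable=False, require_mutual_auth=True)
    except PermissionError as err:
        print("DFSR-style bind fails:", err)
```

Without the mutual authentication requirement the same sketch would quietly fall back to NTLM; with it, there is nothing left to fall back to.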

Question

I am using the USMT scanstate /P option to get a size estimate of a migration. But I don’t understand the output. For example:

4096    434405376
0    426539816
512    427467776
1024    428611584
2048    430821376
4096    434405376
8192    446136320
16384    467238912
32768    512098304
65536    587988992
131072    812908544
262144    1266679808
524288    2189426688
1048576    4041211904

Answer

USMT is giving you a size estimate for each possible NTFS cluster size. So the 4096 row means that with 4096-byte clusters, the store will take 434405376 bytes (about 414 MB) uncompressed. Starting in USMT 4.0, though, the /p option was extended and now allows you to specify an XML output file. It’s a little more readable and includes temporary space needs:

scanstate c:\store /o /c /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml

<?xml version="1.0" encoding="UTF-8"?>

<PreMigration>

  <storeSize>

    <size clusterSize="4096">72669229056</size>

  </storeSize>

  <temporarySpace>

    <size>151299104</size>

  </temporarySpace>

</PreMigration>

scanstate c:\store /o /c /nocompress /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml

<?xml version="1.0" encoding="UTF-8"?>

<PreMigration>

  <storeSize>

    <size clusterSize="4096">92731744256</size>

    <size clusterSize="0">92511635806</size>

    <size clusterSize="512">92538449408</size>

    <size clusterSize="1024">92565861376</size>

    <size clusterSize="2048">92620566528</size>

    <size clusterSize="4096">92731744256</size>

    <size clusterSize="8192">92958539776</size>

    <size clusterSize="16384">93413900288</size>

    <size clusterSize="32768">94341398528</size>

    <size clusterSize="65536">96226705408</size>

    <size clusterSize="131072">100214767616</size>

    <size clusterSize="262144">108447399936</size>

    <size clusterSize="524288">125118185472</size>

    <size clusterSize="1048576">159657230336</size>

  </storeSize>

  <temporarySpace>

    <size>158364704</size>

  </temporarySpace>

</PreMigration>

Sheesh, 72GB compressed. I need to do some housecleaning on this computer…
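If you want to compare estimates across many computers, both formats are easy to parse. A minimal Python sketch, assuming the output looks like the two samples above; `parse_text_estimate`, `parse_xml_estimate` and `mib` are hypothetical helpers, not part of USMT.

```python
import xml.etree.ElementTree as ET


def parse_text_estimate(text: str) -> dict:
    """Parse plain '/p' output ('<cluster size> <bytes>' per line) into {cluster: bytes}."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            cluster, size = (int(f) for f in fields)
            sizes[cluster] = size  # the repeated 4096 row just overwrites itself
    return sizes


def parse_xml_estimate(xml_text: str):
    """Parse '/p:file.xml' output into ({cluster: bytes}, temporary-space bytes)."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    store = {int(s.get("clusterSize")): int(s.text) for s in root.iterfind("storeSize/size")}
    temp = int(root.find("temporarySpace/size").text)
    return store, temp


def mib(n: int) -> float:
    return n / 2**20


if __name__ == "__main__":
    compressed = """<?xml version="1.0" encoding="UTF-8"?>
<PreMigration>
  <storeSize><size clusterSize="4096">72669229056</size></storeSize>
  <temporarySpace><size>151299104</size></temporarySpace>
</PreMigration>"""
    store, temp = parse_xml_estimate(compressed)
    print(f"store at 4 KB clusters: {store[4096] / 1e9:.1f} GB, temp: {mib(temp):.0f} MiB")
    print(f"text row 4096: {mib(parse_text_estimate('4096    434405376')[4096]):.0f} MiB")
```

Keying by cluster size also quietly absorbs the repeated 4096 entry that appears in both formats.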

Question

I was poking around with DFSRDIAG.EXE DUMPMACHINECFG and I noticed these polling settings. What are they?

[Screenshot: DFSRDIAG DUMPMACHINECFG output showing the polling settings]

Answer

Good eye. DFSR uses LDAP to poll Active Directory in two ways in order to detect changes to the topology:

1. Every five minutes (a hard-coded wait time), a light poll checks whether subscriber objects have changed under the computer’s Dfsr-LocalSettings container. If not, it waits another five minutes and tries again. If there is something new, it does a full LDAP lookup of all the settings in the Dfsr-GlobalSettings container and its own Dfsr-LocalSettings container, slurps down everything, and acts upon it.

[Screenshots: light polling in action]

2. Every sixty minutes (a configurable wait time), it slurps down everything just like a light poll that detected changes, whether or not a change was detected. Just to be sure.

Want to skip these timers and go for an update right now? DFSRDIAG.EXE POLLAD.
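The interaction of the two timers can be simulated roughly in Python. This sketch assumes, hypothetically, that the hourly timer runs independently of change-triggered reads, and it ignores AD replication latency entirely; `simulate_polling` and its parameters are illustrative names, not DFSR settings.

```python
def simulate_polling(duration_min, change_times, light_interval=5, full_interval=60):
    """Return (minute, reason) for each full read of the DFSR configuration from AD."""
    full_reads = []
    last_light_poll = 0
    for t in range(1, duration_min + 1):
        light_due = t % light_interval == 0
        full_due = t % full_interval == 0
        # A light poll only notices changes made since the previous light poll.
        changed = light_due and any(last_light_poll < c <= t for c in change_times)
        if light_due:
            last_light_poll = t
        if full_due:
            full_reads.append((t, "scheduled"))
        elif changed:
            full_reads.append((t, "change detected"))
    return full_reads


if __name__ == "__main__":
    # A subscription edited at minute 12 is picked up by the light poll at minute 15.
    print(simulate_polling(120, change_times=[12]))
```

In this model a change waits at most one light interval before being noticed; DFSRDIAG.EXE POLLAD is the way to skip even that.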

Question

While reviewing FRS KB266679 I noted:

"The current VV join is inherently inefficient. During normal replication, upstream partners build a single staging file, which can source all downstream partners. In a VV join, all computers that have outbound connections to a new or reinitialized downstream partner build staging files designated solely for that partner. If 10 computers do an initial join from \\Server1, the join builds 10 files in stage for each file being replicated."

Is this true? Even if the file is identical, FRS makes that many copies? What about DFSR?

Answer

It is true. On the FRS hub server you need staging at least 15 times the size of the largest file (if you have 15 or more spokes), or you end up becoming rather ‘single threaded’: a big file goes in, gets replicated to one server, then tossed. Then the same file goes in, gets replicated to the next server, gets tossed, etc.

Here I create a 1 GB file with my staging folder set to 1.5 GB (hub and 2 spokes):

[Screenshot: the staging folder while the 1 GB file replicates]

Note how the file name and modified time change in staging as FRS works through one partner at a time, since that’s all that can fit. If I made the staging 3 GB, I’d be able to get both downstream servers replicating at once, but there would definitely be two identical copies of the same file:

[Screenshots: two identical staged copies of the same file]

Luckily, you are not using FRS to replicate large files anymore, right? Just SYSVOL, and you’re planning to get rid of that also, right? Riiiiiiiiggghhhht?

DFSR doesn’t do this: one staged file gets used for all the connections, in order to save IO and staging disk space. As long as you don’t hit quota cleanup, a staged file will stay there until doomsday and be reused indefinitely. So when DFSR works on, say, 32 files at once, they are all different files.
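The staging arithmetic in this answer can be sketched in Python. This is a simplification under stated assumptions: it ignores staging compression and overhead and DFSR’s own concurrency limits, the function names are hypothetical, and the “times the spoke count when there are fewer than 15 spokes” part of `frs_hub_staging_gb` is an extrapolation of the x15 guidance rather than documented behavior.

```python
import math


def staged_copies(engine: str, downstream_partners: int) -> int:
    """Staging files needed to send one file to all partners at the same time."""
    if engine == "FRS":
        return downstream_partners  # a separate staging file per downstream partner
    if engine == "DFSR":
        return 1  # one staged file is reused for every connection
    raise ValueError(f"unknown engine: {engine}")


def concurrent_sends(staging_gb: float, file_gb: float, engine: str, partners: int) -> int:
    """How many partners can receive the file at once within the staging quota."""
    copies_that_fit = math.floor(staging_gb / file_gb)
    if engine == "DFSR":
        return partners if copies_that_fit >= 1 else 0
    return min(copies_that_fit, partners)


def frs_hub_staging_gb(largest_file_gb: float, spokes: int) -> float:
    """Rule of thumb: largest file x15 (assumed: x spokes when there are fewer than 15)."""
    return largest_file_gb * min(spokes, 15)


if __name__ == "__main__":
    # The 1 GB file, hub and 2 spokes example from above.
    for staging in (1.5, 3.0):
        print(staging, "GB staging:",
              "FRS", concurrent_sends(staging, 1.0, "FRS", 2),
              "DFSR", concurrent_sends(staging, 1.0, "DFSR", 2))
```

With 1.5 GB of staging the model gives FRS one partner at a time and DFSR both, which matches the behavior in the screenshots.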

Question

Are there any DFSR registry tuning options in Windows Server 2003 R2? This article only mentions Win2008 R2.

Answer

No, there are none. All of the OS-independent recommendations listed there are still valuable, though:

  • Consider multiple hubs
  • Increase staging quota
  • Latest QFE and SP
  • Turn off RDC on fast connections with mostly smaller files
  • Consider and test anti-virus exclusions
  • Pre-seed the data when setting up a new replicated folder
  • Use 64-bit OS with as much RAM as possible on hubs
  • Use the fastest disk subsystem you can afford on hubs
  • Use reliable networks <-- this one is especially important on 2003 R2 as it does not support asynchronous RPC

Question

Is there a scriptable way to do what DFSUTIL.EXE CLIENT PROPERTY STATE ACTIVE or the Set Active button on the DFS tab in Windows Explorer does? Perhaps with PowerShell?

[Screenshot: the DFS tab in Windows Explorer]

Answer

In theory, a script could do what DfsShlEx.dll does in Windows Explorer, which is to call:

NetDfsSetClientInfo

It’s not a cmdlet (not even .NET), but it could eventually be exposed through .NET’s DllImport and thus to PowerShell. Which sounds really, really gross to me.

Or just drive DFSUTIL.EXE in your code. I hesitate to ask why you’d want to script this. In fact, I don’t want to know. :)
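For the record, here is roughly what the “gross” route looks like, using Python’s ctypes as a stand-in for .NET DllImport. It is a minimal sketch only: the `set_active_target` wrapper and the example namespace path are hypothetical, while `NetDfsSetClientInfo`, level 101, `DFS_INFO_101` and `DFS_STORAGE_STATE_ACTIVE` (0x4) come from the Win32 DFS API (lmdfs.h, netapi32.dll). It only does anything on Windows.

```python
import ctypes
import sys

DFS_STORAGE_STATE_ACTIVE = 0x00000004  # lmdfs.h: mark this target as the active one
NERR_SUCCESS = 0


class DFS_INFO_101(ctypes.Structure):
    # typedef struct _DFS_INFO_101 { DWORD State; } DFS_INFO_101;
    _fields_ = [("State", ctypes.c_uint32)]


def set_active_target(dfs_entry_path: str, server: str, share: str) -> int:
    """Call NetDfsSetClientInfo at level 101; returns the NET_API_STATUS code."""
    if sys.platform != "win32":
        raise OSError("NetDfsSetClientInfo is a Windows-only API (netapi32.dll)")
    netapi32 = ctypes.WinDLL("netapi32")
    func = netapi32.NetDfsSetClientInfo
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_wchar_p,
                     ctypes.c_uint32, ctypes.POINTER(DFS_INFO_101)]
    func.restype = ctypes.c_uint32
    info = DFS_INFO_101(State=DFS_STORAGE_STATE_ACTIVE)
    return func(dfs_entry_path, server, share, 101, ctypes.byref(info))


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical namespace path and target; substitute your own.
    status = set_active_target(r"\\contoso.com\public\tools", "fileserver02", "tools")
    print("OK" if status == NERR_SUCCESS else f"NetDfsSetClientInfo failed: {status}")
```

Driving DFSUTIL.EXE is still less code.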

Question

Are there problems with a user logging on to their new destination computer before USMT loadstate is run to migrate their profile?

Answer

Yes. If they then start Office 2007/2010 apps like Word, Outlook, or Excel, portions of their Office migration will not work. Office relies heavily on reusing its own built-in ‘upgrade’ code:

https://support.microsoft.com/kb/2023591

Note To migrate application settings, you must install applications on the destination computer before you run the loadstate command. For Office installations, you must run the LoadState tool to apply settings before you start Office on the destination computer for the first time by using a migrated user. If you start Office for a user before you run the LoadState tool, many settings of Office will not migrate correctly.

Other applications may be similarly affected; Office is just the one we know about and harp on.

Question

I very often see a process named DFSFRSHOST.EXE using 10-15% CPU while the LAN is pretty busy. Some servers have it and some don’t. After the server is rebooted, it doesn’t appear for several days.

Answer

Someone is running DFSR health reports on some servers and not others; that process is what gathers DFSR health data on a server. It could be that someone has configured scheduled reports to run with DFSRADMIN HEALTH, or is just running them using DFSMGMT.MSC and isn’t telling you. If you have an enormous number of files being replicated, the report can definitely run for a long time and consume some resources; best to schedule it off hours if you’re in “millions of files” territory, especially on older hardware and slower disks.

Question

FRS replication is not working for SYSVOL in my domain after we started adding our new Win2008 R2 DCs. I see this endlessly in my NTFRS debug logs:

Cmd 0039ca50, CxtG c2d9eec5, WS ERROR_INVALID_DATA, To   DC2.mydomain.contoso.com  Len:  (436) [SndFail - rpc call]

Is FRS compatible between Win2003 and Win2008 R2 DCs?

Answer

That type of error makes me think you have some intrusion protection software installed (perhaps on the new servers, in a different version than on the other servers) or something is otherwise altering data on the network (such as when going through a packet-inspecting firewall).

We only ever see that issue when it’s caused by a third party. There are no problems with FRS on 2003, 2008, and 2008 R2 talking to each other. The FRS RPC code has not changed in many years.

You should get double-sided network captures and see if something is altering the traffic between the two servers. Everything RPC should look identical in both captures, down to the payload level. You should also try *removing* any security software from the two DCs and retesting (not just disabling it; that does nothing for most security products, since their drivers are still loaded when their services are stopped).
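Once the RPC payloads from each side’s capture are exported in order, comparing them is mechanical. A minimal Python sketch, assuming the payloads are already reassembled and matched one to one (the export step depends on your capture tool and is not shown; `first_divergence` is a hypothetical helper):

```python
def first_divergence(sender_payloads, receiver_payloads):
    """Index of the first RPC payload that differs between the two capture sides, or None."""
    for i, (sent, received) in enumerate(zip(sender_payloads, receiver_payloads)):
        if sent != received:
            return i  # something in the path altered this payload
    if len(sender_payloads) != len(receiver_payloads):
        # One side has extra or missing payloads after a common prefix.
        return min(len(sender_payloads), len(receiver_payloads))
    return None


if __name__ == "__main__":
    sent = [b"\x05\x00\x0b\x03", b"payload-A", b"payload-B"]
    seen = [b"\x05\x00\x0b\x03", b"payload-A", b"payload-X"]
    print(first_divergence(sent, seen))
```

Any non-None result points at the first payload to inspect for tampering by whatever sits between the two DCs.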

Question

When I run USMT 4.0 scanstate using /nocompress I see a catalog.mig created. It seems to vary in size a lot between various computers. What is that?

Answer

It contains all the non-file goo collected during the gather; mainly the migrated registry data.

 

Other Stuff

James P Carrion has been posting a very real look into the MS Certified Masters program as seen through the eyes of a student working towards his Directory Services cert. If you’ve thought about this certification, I recommend you read on. It’s fascinating stuff. Start at the oldest post and work forward; you can actually see his descent into madness…

----------

Microsoft uses a web-based system for facilities requests. The folks that run that department are excellent and the web system usually works great. Every so often though, you get something interesting like this…

[Screenshot: an interesting facilities request]
Uuuhhh, I guess I can wait to see how that pans out.

-----------

And finally here is this week’s Stump the Geek contest picture:

[Image: this week’s Stump the Geek picture]

Name both movies in which this picture appears. The first correct reply in the Comments gets the title of “Silverback Alpha Geek”. And nothing else… it’s a cruel world.

Have a good weekend folks.

- Ned “hamadryas baboon” Pyle