Slow SMB/CIFS performance

Willem Kokke 21 Reputation points
2020-09-17T23:28:11.027+00:00

I've been struggling with the following issue for a few days now and I'm running out of ideas. Any help or ideas what to look at next would be greatly appreciated.

We're a small company (15-20 people) doing high-end computer graphics. Our infrastructure is as follows:

Windows 2016 Essentials server which is our domain server, fully up to date.

A Linux-based 32 TB NAS server which:

  • supports SMB protocol 3.0.0
  • is joined to the domain
  • has a share mapped to the Z: drive on all client machines via a GPO
  • has been rock solid.

Various Win 10 client machines (mostly 1909, some upgraded to 2004, fully up to date)

All are networked via a 1Gbps network.

I recently added 3 new machines (HP Z4 workstations) to the network. The only difference with the older machines that I can think of is that they were upgraded to 2004 BEFORE I joined them to the domain.

After putting them in production I noticed that some of our applications performed really badly on anything that accessed our projects on the Z: drive.

After some experimentation I narrowed it down to file enumeration of projects (mostly up to 15 files per folder, very occasionally up to 200, but with a reasonably deep hierarchy of 4 or 5 levels).

I wrote a simple console program in C++ that calls PathIsDirectory() from Shlwapi.h 500 times on a known folder on the Z: drive (\\server\company\projects\test). The test folder has about 40 sibling folders.

On the new machines this takes 1.7 seconds, and on all other machines it takes about 0.20 seconds.
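
The test described above could look roughly like the following. This is a minimal sketch rather than the original program: the names `is_directory_probe`, `ProbeResult` and `time_probes` are hypothetical, and the `std::filesystem` fallback for non-Windows builds is an assumption added only so that the sketch compiles anywhere.

```cpp
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <system_error>

#ifdef _WIN32
#include <windows.h>
#include <shlwapi.h>
#pragma comment(lib, "Shlwapi.lib")  // MSVC: link against Shlwapi
#endif

// Returns true if `path` names an existing directory.
// On Windows this calls PathIsDirectoryW, as in the test described in the post.
// Elsewhere it falls back to std::filesystem (an assumption, for portability only).
inline bool is_directory_probe(const std::filesystem::path& path) {
#ifdef _WIN32
    return PathIsDirectoryW(path.c_str()) != FALSE;
#else
    std::error_code ec;  // non-throwing overload: errors simply yield false
    return std::filesystem::is_directory(path, ec);
#endif
}

struct ProbeResult {
    std::size_t hits;  // number of calls that reported "is a directory"
    double seconds;    // wall-clock time for all calls together
};

// Calls the probe `iterations` times on the same path and times the whole loop.
inline ProbeResult time_probes(const std::filesystem::path& path,
                               std::size_t iterations) {
    using clock = std::chrono::steady_clock;
    std::size_t hits = 0;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        if (is_directory_probe(path)) {
            ++hits;
        }
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return {hits, elapsed.count()};
}
```

Calling `time_probes(R"(\\server\company\projects\test)", 500)` from `main` and printing `seconds` would give a figure comparable to the timings quoted above.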

This was the same whether I used the mapped Z: drive, \\server\, \\server.domain.local\ or \\xxx.xxx.xxx.xxx\, which rules out mapped drive or DNS issues.

Transferring large files is equally quick on both types of machine and maxes out the 1 Gbps connection.

After searching around I've excluded SMB1 issues (disabled on all machines), SMB Redirector Cache issues (which mostly came up in regard to folders with many files, which I don't have anyway) and driver issues.

I've compared the registry entries of the LanmanWorkstation hive and any other network settings I could think of, but they all match. I've checked all the access permissions I could think of as well, even though there are no errors functionality-wise. The event logs in Event Viewer also don't report anything out of the ordinary.

I did a Wireshark trace on both types of system and checked it against the protocol specification. Everything is perfectly in sync and on time throughout the Protocol Negotiation, Session Setup, Tree Connect to "\\server\company" and FSCTL_VALIDATE_NEGOTIATE_INFO stages of the protocol. Everything is exactly according to the SMB2 dialect 3.0.0 spec, and according to the features advertised by the server.

Then the deviation starts.

The older (correctly functioning) machines issue a Create Request File command to "projects\test", with the create options set to 0x00200000 (Open Reparse Point only). This succeeds and returns the file attributes with only DIRECTORY set to true, and that's it. The other 499 invocations are answered client side by the SMB Redirector cache and never hit the wire, which I verified by setting the registry key DirectoryCacheLifetime to 0 and rerunning the trace. Disabling the cache shows the 500 individual invocations, and the duration goes up to 0.5 seconds (still much faster than the 1.7 seconds on the new machines).
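
The flag values seen in the trace can be decoded with a few lines of code. The sketch below is illustrative only: the helper names (`describe_create_options`, `has_directory_attribute`) are hypothetical, and the table covers just three create options, using the numeric values defined in the Windows headers for FILE_DIRECTORY_FILE, FILE_NON_DIRECTORY_FILE, FILE_OPEN_REPARSE_POINT and FILE_ATTRIBUTE_DIRECTORY.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

struct NamedFlag {
    std::uint32_t bit;
    const char* name;
};

// A small subset of the create option bits (values as in the Windows headers).
constexpr NamedFlag kCreateOptions[] = {
    {0x00000001, "FILE_DIRECTORY_FILE"},
    {0x00000040, "FILE_NON_DIRECTORY_FILE"},
    {0x00200000, "FILE_OPEN_REPARSE_POINT"},
};

// FILE_ATTRIBUTE_DIRECTORY bit in the returned file attributes.
constexpr std::uint32_t kFileAttributeDirectory = 0x00000010;

// Renders the known bits by name; bits not in the table are shown as hex.
inline std::string describe_create_options(std::uint32_t options) {
    std::ostringstream out;
    std::uint32_t remaining = options;
    bool first = true;
    for (const NamedFlag& flag : kCreateOptions) {
        if (options & flag.bit) {
            out << (first ? "" : " | ") << flag.name;
            first = false;
            remaining &= ~flag.bit;
        }
    }
    if (remaining != 0 || first) {
        out << (first ? "" : " | ") << "0x" << std::hex << remaining;
    }
    return out.str();
}

// True if the attribute word marks the object as a directory.
inline bool has_directory_attribute(std::uint32_t attributes) {
    return (attributes & kFileAttributeDirectory) != 0;
}
```

With this, the create options value 0x00200000 from the trace decodes to FILE_OPEN_REPARSE_POINT alone, matching what Wireshark displays.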

Having seen the normal one, I now look at the slow one.

Instead of a Create Request File command to "projects\test", it issues one to "projects", the parent of the folder I'm interested in.
It then does a Find Request File with Pattern "*", which returns all sibling projects as well as the "test" folder we're interested in.
Then another Create Request File is issued with no file name specified, which returns the final File Attributes with DIRECTORY set to 1.

This then repeats another 499 times, the only difference being that the Find Request File returning all the sibling folders is no longer issued because its result is cached (once again verified by setting DirectoryCacheLifetime to 0, which takes the runtime up to 3.7 seconds).

The closest I can work out from this is that on the machines that have issues, PathIsDirectory must take a different path through the networking stack, in combination with the FileInfoCacheLifetime not working properly on the new machines.

I have verified that the networking DLLs used on both machines are bytewise identical, so if that is the case it must be a configuration setting somewhere that I'm unaware of.

I'm going to give it a while and look at it with fresh eyes. Once again, happy with any and all suggestions!

Windows 10 Network

Accepted answer
  1. Gary Nebbett 5,721 Reputation points
    2020-09-25T21:26:10.223+00:00

    Hello Willem,

    I could not respond directly to your response because there is a 1000 character limit on that type of message (which I exceeded) so I am responding via a new "answer".

    I use a variety of tools - you might recognize one of the screenshots that I used as being Microsoft Message Analyzer (now sadly discontinued). The other two screenshots are of tools that I developed myself: one is specialized for file and registry events (which essentially just correlates I/O request and completion events and maintains FileObject/FileKey/FileName relationships) and the other just exploits as much event metadata as possible to informatively present any type of event.

    If you have a copy of Microsoft Message Analyzer then that would be the best tool to use.

    I used the stack tracing functionality of ETW to capture the stack when PathIsDirectory "creates" (opens) the file, when the CSC namespace is first referenced and when the SMB deferred open occurs. Here they are:

    fileinfo!FIETWLogFileCreate+0x173
    fileinfo!FIPreCreateCallback+0x2e7b
    fltMgr!FltpPerformPreCallbacks+0x2fd
    fltMgr!FltpPerformPreCallbacks+0x2fd
    fltMgr!FltpCreate+0x2f3
    ntoskrnl!IofCallDriver+0x59
    ntoskrnl!IoCallDriverWithTracing+0x34
    ntoskrnl!IopParseDevice+0x62b
    ntoskrnl!ObpLookupObjectName+0x78f
    ntoskrnl!ObOpenObjectByNameEx+0x201
    ntoskrnl!NtQueryAttributesFile+0x1e6
    ntoskrnl!KiSystemServiceCopyEnd+0x25
    ntdll!NtQueryAttributesFile+0x14
    KernelBase!GetFileAttributesW+0x85
    shlwapi!PathIsDirectoryW+0x52

    csc!CscStorepLowIoCreateFile+0x135:
    csc!CscStorepLowIoCreateFilePostedRoutine+0x88:
    csc!CscStorepLowIoPost+0x68:
    csc!CscEnpFindChild+0x166:
    csc!CscEnpFindOrCreateEntryEx+0x3e0:
    csc!CscEnFindOrCreateEntry+0x56:
    csc!CscStorepFindOrCreateEntryEx+0x1a5:
    csc!CscStoreFindEntryEx+0x46:
    csc!CscCreate+0x5f3b:
    rdbss!RxCollapseOrCreateSrvOpen+0x24d:
    rdbss!RxCreateFromNetRoot+0x7fc:
    rdbss!RxCommonCreate+0x143:
    rdbss!RxFsdCommonDispatch+0x5f4:
    rdbss!RxFsdDispatch+0x86:
    mrxsmb!MRxSmbFsdDispatch+0xf8
    ntoskrnl!IofCallDriver+0x59
    mup!MupiCallUncProvider+0xb8
    mup!MupStateMachine+0x59
    mup!MupCreate+0x1cf
    ntoskrnl!IofCallDriver+0x59
    fltMgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x15e
    fltMgr!FltpCreate+0x307
    ntoskrnl!IoCallDriverWithTracing+0x34
    ntoskrnl!IopParseDevice+0x62b
    ntoskrnl!ObpLookupObjectName+0x78f
    ntoskrnl!ObOpenObjectByNameEx+0x201
    ntoskrnl!NtQueryAttributesFile+0x1e6
    ntoskrnl!KiSystemServiceCopyEnd+0x25
    ntdll!NtQueryAttributesFile+0x14
    KernelBase!GetFileAttributesW+0x85
    shlwapi!PathIsDirectoryW+0x52

    mrxsmb20!Smb2AttemptDeferredOpen+0x104c2
    mrxsmb20!MRxSmb2Create+0x20b
    mrxsmb!SmbpShellCreateWithNewStack+0x22
    ntoskrnl!KxSwitchKernelStackCallout+0x2e
    ntoskrnl!KiSwitchKernelStackContinue
    ntoskrnl!KiExpandKernelStackAndCalloutOnStackSegment+0x18e
    ntoskrnl!KeExpandKernelStackAndCalloutInternal+0x33
    ntoskrnl!KeExpandKernelStackAndCallout+0x15
    mrxsmb!SmbShellCreate+0x20
    csc!CscCreate+0x6c36
    rdbss!RxCollapseOrCreateSrvOpen+0x24d:
    rdbss!RxCreateFromNetRoot+0x7fc:
    rdbss!RxCommonCreate+0x143:
    rdbss!RxFsdCommonDispatch+0x5f4:
    rdbss!RxFsdDispatch+0x86:
    mrxsmb!MRxSmbFsdDispatch+0xf8
    ntoskrnl!IofCallDriver+0x59
    mup!MupiCallUncProvider+0xb8
    mup!MupStateMachine+0x59
    mup!MupCreate+0x1cf
    ntoskrnl!IofCallDriver+0x59
    fltMgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x15e
    fltMgr!FltpCreate+0x307
    ntoskrnl!IoCallDriverWithTracing+0x34
    ntoskrnl!IopParseDevice+0x62b
    ntoskrnl!ObpLookupObjectName+0x78f
    ntoskrnl!ObOpenObjectByNameEx+0x201
    ntoskrnl!NtQueryAttributesFile+0x1e6
    ntoskrnl!KiSystemServiceCopyEnd+0x25
    ntdll!NtQueryAttributesFile+0x14
    KernelBase!GetFileAttributesW+0x85
    shlwapi!PathIsDirectoryW+0x52
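
When comparing stacks like these between a fast and a slow machine, it can help to reduce each frame to its module name and look at which components appear. The following is a minimal sketch under the assumption that every frame has the form "module!Function+0xOFFSET" as in the listings above; the function names `module_of` and `modules_in_stack` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Extracts the module name from one frame such as "fltMgr!FltpCreate+0x2f3".
// Leading spaces or tabs are skipped; returns an empty string if there is no '!'.
inline std::string module_of(const std::string& frame) {
    const std::size_t start = frame.find_first_not_of(" \t");
    const std::size_t bang = frame.find('!');
    if (start == std::string::npos || bang == std::string::npos || bang <= start) {
        return {};
    }
    return frame.substr(start, bang - start);
}

// Returns the distinct modules of a stack in order of first appearance.
inline std::vector<std::string> modules_in_stack(
        const std::vector<std::string>& frames) {
    std::vector<std::string> modules;
    for (const std::string& frame : frames) {
        std::string name = module_of(frame);
        if (!name.empty() &&
            std::find(modules.begin(), modules.end(), name) == modules.end()) {
            modules.push_back(std::move(name));
        }
    }
    return modules;
}
```

Running both machines' stacks through `modules_in_stack` and comparing the resulting lists is one way to notice a component that is present on only one of them.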

    Gary


3 additional answers

  1. Andy YOU 3,071 Reputation points
    2020-09-18T07:01:24.66+00:00

    Hi,
    For a slow SMB performance issue like this, it is hard for us to analyze the cause at the forum support level. If this issue is urgent, I would suggest you open a case with Microsoft, where a more in-depth investigation can be done so that you get a more satisfying explanation and solution to this issue.

    You may find phone number for your region accordingly from the link below:

    https://support.microsoft.com/en-us/help/4051701/global-customer-service-phone-numbers

    Best Regards,

    Candy


  2. Gary Nebbett 5,721 Reputation points
    2020-09-21T10:27:51.547+00:00

    Hello Willem,

    To get more insight into the reasons for the differences in behaviour, you could try using Event Tracing for Windows (ETW).

    I tried calling PathIsDirectory in a loop whilst collecting trace data from the Microsoft-Windows-Kernel-File and Microsoft-Windows-SMBClient providers. The behaviour that I observed matched that of your older machines.

    The recurring pattern in the Microsoft-Windows-Kernel-File data for each PathIsDirectory was:

    26104-z9.jpg

    One can see checks for Client Side Caching (CSC) in this trace. One can also collect stack traces for each of the events to get even more insight into what is happening.

    This screenshot gives a flavour of the data captured by the Microsoft-Windows-SMBClient provider:

    26088-z9.jpg

    And finally, this screenshot shows what the "unprocessed" data looks like (merged provider data, I/O requests and completions not correlated, etc.):

    26171-z9.jpg

    Gary


  3. Willem Kokke 21 Reputation points
    2020-11-23T14:34:48.05+00:00

    Thanks to @GaryNebbett-6715’s excellent help, I finally managed to track this down!

    It turned out to be caused by HP Sure Sense, a new security scanner application that came pre-installed on the new machines, but not on the older ones.

    Uninstalling it immediately fixed it, and all is well with the world.

    I hope your cause is the same @SterchiAndrPhilipp-9830 cause this wasn’t a lot of fun! 😂
