Timeouts or delays connecting to WinRM
Ran into an interesting issue the other day where WinRM connections were taking 15-20s per connection to the domain controllers and due to many connections from an automation system, connections would backup and then fail.
The only real hint we had was from the WinRM logs:
logman.exe start winrm -p Microsoft-Windows-Winrm -o winrm.etl –ets
logman.exe stop winrm -ets
tracerpt.exe winrmtrace.etl -of XML -o winrmtrace.xml
tracerpt.exe winrmtrace.etl -of CSV -o winrmtrace.csv
01F8.0548::2014-01-09 20:43:29. 781 [Microsoft-Windows-WinRM]Authorizing the user
01F8.0548::2014-01-09 20:43:47.717 [Microsoft-Windows-WinRM]The authorization of the user was done successfully
18 seconds in between this connection. After taking a look at some network traces and a debugging the svchost instance with wsmsvc.dll loaded ( the winRM service modeule)
tlist -m wsmsvc.dll
c:\windows\system32\wsmsvc.dll - 856 svchost.exe
We can see a call to LookupAccountName()
Child-SP RetAddr Call Site
00000000`022ae698 000007fe`ebfd31cd advapi32!LookupAccountNameW
00000000`022ae6a0 000007fe`ebfd3087 wsmsvc!Catalog::AddWmiGroupAce+0x101
00000000`022ae750 000007fe`ec076c78 wsmsvc!Catalog::CheckWmiGroupTimestamp+0x5b
In this case, it is looking for the group “WinRMRemoteWMIUsers__ " See http://msdn.microsoft.com/en-us/library/aa384295(v=vs.85).aspxfor more information:
Default Group Access
During setup, WinRM creates the local group WinRMRemoteWMIUsers__. WinRM then restricts remote access to any user that is not a member of either the local administration group or the WinRMRemoteWMIUsers__ group. You can add a local user, domain user, or domain group to WinRMRemoteWMIUsers__ by typing net localgroup WinRMRemoteWMIUsers__ /add <domain>\<username> at the command prompt. Optionally, you can use the Group Policy to add a user to the group.
The code here passes an unqualified name “WinRMRemoteWMIUsers” to advapi32!LookupAccountNameW which will then cause the delay because it fails to find it in the local domain and proceeds to try all the trusted domains – there were about a dozen trusted domains to enumerate and it did find it one eventually – it actually makes two calls, one to determine the amount of memory needed to allocate and another after the memory is allocated/
Resolution: created the WinRMRemoteWMIUsers group as a domain group and the delays went away.
In order to avoid situations like this where some code makes a plea for a name resolution without specifying any domain or machine – customers should implement LsaLookupRestrictIsolatedNameLevel - see http://support.microsoft.com/default.aspx?scid=kb;EN-US;818024
In addition, customers should consider implementing the ‘NeverPing’ value in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters
Theory on how it happened? WinRM was installed on a member server and the local group created – then the member server was promoted and the local group was lost as it became a DC.
EDIT: Thanks to another engineer I was made aware that this can also cause hi CPU in LSASS.EXE
Another error which you can hit is this one:
System.Management.Automation.Remoting.PSRemotingTransportException: Connecting to remote server SERVERFOO.CONTOSO.CON failed with the following error message : The WinRM client received an HTTP server error status (500), but the remote service did not include any other information about the cause of the failure. For more information, see the about_Remote_Troubleshooting Help topic.
This is because it could randomly write another ACE for the SID resolved from WinRMRemoteWMIUsers__
The process is something like this:
- Check for group - get SID
- Check security descriptor for the SID
- For WMI provider: CD to WSMan:\localhost\Plugin\WMI Provider
- SID not exist -> go write the ACE
- Next time - repeat since the SID returned is random based on SID lookup outside of the forest it may try to write yet another ACE w/ "new" SID for WinRMRemoteWMIUsers__
- Buffer allocated for SD is too small and we get back 0xc0000023 - return in the ETL trace shows "The authorization of the user failed with error 122"
01FC.14AC::08/04/2014-17:01:12.065 [Microsoft-Windows-WinRM/Analytic ] User CONTOSO\Administrator authenticated successfully using CredSSP authentication 01FC.14AC::08/04/2014-17:01:12.065 [Microsoft-Windows-WinRM/Analytic ] Authorizing the user 01FC.14AC::08/04/2014-17:01:24.246 [Microsoft-Windows-WinRM/Analytic ] An error was encountered while processing an operation. Error Code: 122 Error String:<f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="122" Machine="FOOSERVER.CONTOSO.COM"><f:Message>The data area passed to a system call is too small. </f:Message></f:WSManFault> EXTRADATA: 122, <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="122" Machine="FOOSERVER.CONTOSO.COM"><f:Message>The data area passed to a system call is too small. </f:Message></f:WSManFault>, , , , 01FC.14AC::08/04/2014-17:01:24.246 [Microsoft-Windows-WinRM/Operational ] The authorization of the user failed with error 122 01FC.14AC::08/04/2014-17:01:24.246 [Microsoft-Windows-WinRM/Analytic ] Sending HTTP error back to the client due to a transport failure. The HTTP status code is 500
keyword: LookupAccoutName slow NeverPing LsaLookupRestrictIsolatedNameLevel