Kerberos Authentication Problem with Active Directory
Recently I had to work with Kerberos and we faced the following problems.
1. Kerberos consistently does NOT work for some user(s), who get a “400 Bad Request” error.
2. Kerberos works intermittently. In other words, a user is not authenticated with Kerberos (and falls back to NTLM) for 5 minutes or so (there is no definite period), and then Kerberos automatically starts working for that user again. However, this does not happen for all users at the same time.
3. “temp-id” always works for everybody.
Due to this problem, applications that depend heavily on Kerberos, such as Federated Search, SAP Integration, and RSS Viewer Web Parts, fail when Kerberos authentication falls back to NTLM.
After research, we found that the problem is in AD (Active Directory): the affected users belong to many groups.
1. Joe Doe (for the purpose of this blog, I will go with this id) is a member of 123 groups in Active Directory.
2. Other users’ regular ids were members of 10 groups in Active Directory.
a. Note: These 10 groups have nested groups – so it is not really 10 groups, it could be more.
3. “temp-id”, which always works for everybody including Joe Doe, is a member of 2 groups (Domain Users and SharePoint developers).
Scenario 1: Kerberos consistently NOT working for some user(s) throwing “400 Bad Request” error
Why Joe Doe’s regular id did not work
Joe Doe’s regular id did not work because of his token size and header size. His token size was about 11K, whereas the token size of his “temp-id” is 200 bytes. Hence, the problem lies with token size and header size. In the sections below, I will try to explain what that means and how it relates to the Kerberos problem.
What does header size mean
HTTP requests and responses contain two parts: the header and the body. The header mostly contains technical information exchanged between the client and the server; the body contains user-oriented information such as the content of a web page or a file to download. The error message above is therefore generated by the server because the request sent by the client contains a header that is simply too large compared to what the server expects.
The factors that make the header section large depend on how the browser is configured (and, in some cases, on the underlying OS as well), but most of the time the culprits are cookies (header: Cookie) and authentication information (header: Authorization).
When this error is experienced on an Internet site, there is not much we can do except clean up our cookies and hope it works afterwards.
When this error shows up in an intranet environment where web servers are running IIS, possibly with SharePoint, SQL Server Reporting Services or Exchange with OWA on top of it, it is caused by a combination of the following factors:
· The web server (or web site) was configured to use Integrated Windows Authentication and Kerberos in particular.
· The client is able to authenticate using Kerberos (both the client system and the user are members of an AD forest).
· The user’s security token is large. By default, the maximum token size is 12000 bytes. However, Joe Doe’s token size was around 14k bytes. This can be caused by:
o The user being a member of many AD groups (hundreds of groups).
o The user’s object in AD containing SID (Security Identifier) history information as a consequence of a domain migration/consolidation.
o The groups the user is a member of also being affected by SID history, just like the user.
o IIS running with its default configuration.
· The client sends Kerberos-based authentication AND authorization information. Unlike NTLM and Basic, which only send authentication information and are therefore smaller, Kerberos includes information such as group membership and SID history in the request’s header.
Hence, depending on the user’s group membership and SID history information, some users may be affected and others not.
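To get a feel for the numbers, here is a rough Python sketch. It assumes the Kerberos token travels Base64-encoded in an "Authorization: Negotiate" header, and it compares the resulting header size with the 16384-byte default header limit quoted later in this post. The function names are mine, and exactly which bytes the server counts against its limit is an assumption, so treat the result as an estimate only.

```python
import base64

# Default per-header limit in bytes, as quoted later in this post.
DEFAULT_MAX_FIELD_LENGTH = 16384


def negotiate_header_size(token_size: int) -> int:
    """Size in bytes of 'Authorization: Negotiate <base64 token>'."""
    dummy_token = bytes(token_size)  # zero-filled stand-in for a real ticket
    header = b"Authorization: Negotiate " + base64.b64encode(dummy_token)
    return len(header)


def fits_default_limit(token_size: int) -> bool:
    """True if the estimated header stays within the default limit."""
    return negotiate_header_size(token_size) <= DEFAULT_MAX_FIELD_LENGTH


if __name__ == "__main__":
    for size in (200, 12000, 14000):
        print(size, negotiate_header_size(size), fits_default_limit(size))
```

Base64 grows the token by roughly one third, so under these assumptions a 12000-byte token still fits under the default limit while a token of around 14k bytes does not.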
How is it related to Kerberos Problem (specifically Joe Doe)
As mentioned above, Joe Doe is a member of more than 100 groups, so his token is large and his regular id does not work for him.
Solution for Joe Doe’s problem
Fortunately, there are several solutions to work around this problem, but all of them have their trade-offs:
a. Use IP address in the URL instead of host names
Since Kerberos works solely with host names, using IP addresses automatically forces negotiation down to NTLM. Of course, this not only degrades security but also involves hard-coding IPs, which is rarely practical, and it only works if you host one and only one web site on the IIS server. Finally, if the application uses “delegation” of credentials, which is a Kerberos feature, it will not work anymore (I am thinking about SQL SRS in particular, or even Exchange OWA when used in FE-BE scenarios).
Since most companies have more than one web application, we cannot go with this approach.
b. Configure IIS to use NTLM only
As part of the Integrated Windows Authentication setup, you can simply configure IIS to use NTLM only; the following MS KB article shows how to do so: https://support.microsoft.com/kb/215383.
Although this configuration has less impact than the first one, it still leaves you with lower security, as well as compatibility issues if Kerberos is required by an application.
Kerberos-dependent applications such as Federated Search, SAP Integration, and the RSS Web Part won’t work. Federated Search may work with NTLM if it is set up with one hard-coded id, but it will then miss security trimming, which is a big loss.
c. Configure IIS to accept larger headers
You can do so by configuring IIS in the registry. It is important to note that this configuration applies to all web sites on the system running IIS, because these settings are used at the kernel-component level (http.sys); it will therefore impact all ECM applications on that server.
MS KB article explains all those settings: https://support.microsoft.com/kb/820129
The 2 registry values to fix this issue are:
MaxFieldLength
Default value: 16384
Min-Max value to set: 64 - 65534 (64 KB) bytes
Sets an upper limit for each header. See MaxRequestBytes. This limit translates to approximately 32k characters for a URL.
MaxRequestBytes
Default value: 16384
Min-Max value to set: 256 - 16777216 (16 MB) bytes
Determines the upper limit for the total size of the request line and the headers.
Its default setting is 16 KB.
If this value is lower than MaxFieldLength, the MaxFieldLength value is adjusted.
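Before touching the registry, it can help to sanity-check the numbers you plan to use. The Python sketch below only validates a proposed MaxFieldLength / MaxRequestBytes pair against the ranges listed above; it does not read or write the registry. The function name is hypothetical, and I am assuming that "adjusted" means MaxFieldLength is capped at MaxRequestBytes.

```python
# Allowed ranges in bytes, as listed in this post.
LIMITS = {
    "MaxFieldLength": (64, 65534),
    "MaxRequestBytes": (256, 16777216),
}


def check_http_sys_values(max_field_length: int, max_request_bytes: int):
    """Validate both values and return the effective pair.

    Raises ValueError if either value is outside its documented range.
    """
    proposed = {
        "MaxFieldLength": max_field_length,
        "MaxRequestBytes": max_request_bytes,
    }
    for name, value in proposed.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    # Assumption: a MaxRequestBytes lower than MaxFieldLength caps the latter.
    effective_field_length = min(max_field_length, max_request_bytes)
    return effective_field_length, max_request_bytes


if __name__ == "__main__":
    print(check_http_sys_values(65534, 65534))
    print(check_http_sys_values(65534, 16384))
```

The second call illustrates why raising only the per-header limit may not be enough: with the total request limit left at its default, the effective per-header limit stays at 16384 under this assumption.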
As the KB article suggests, this has to be done after very careful design and thought, because it increases the memory used by the system (kernel memory) to handle requests. On a “busy” (read: getting a lot of requests, not fewer large requests) 32-bit system, this can exhaust kernel memory. If the boot.ini switch /3GB is used (possibly combined with /USERVA), the situation can get worse since less memory is available to the kernel. On a 64-bit system, this configuration is harmless, even if the application is running in 32-bit mode, since this is handled in kernel mode. Of course, also take care of what the ECM application does with those headers.
d. Clean up AD users
Reduce (read: optimize) users’ group membership and remove SID history information from both the users’ and the groups’ AD objects. Though this solution is beneficial in all scenarios and not only for web authentication (faster logon, less memory usage on application servers, Exchange mailbox servers…), you need to implement it with a careful impact assessment.
MS KB article explains how to automate this task: https://support.microsoft.com/kb/295758
Token Size Problem
This is another problem for users who are members of a large number of groups (more than 70) in AD.
The Kerberos token has a fixed size. If a user is a member of a group, directly or through group nesting (which is the most likely case here), the SID for that group is added to the user’s token. Once a SID is added to the user’s token, it is passed via the Kerberos token during each authentication. If the required SID information exceeds the size of the token, authentication does not succeed.
How the Access Token Limitation Problem Can Occur
Any entity that can be authenticated by the security system in an Active Directory environment is referred to as a security principal. A user is an example of a security principal. A security context is information that describes the identity and capabilities of a security principal on a computer. In Windows Server 2003 all activities take place in a security context.
The security context of a security principal is represented by an access token. The access token includes a list of security identifiers (SIDs) and there is a limit (1,024) to the number of SIDs the token can contain. If this limit is exceeded, a denial of service, such as a user not being able to log on, can occur.
This section describes the following:
· How access tokens are created
· How the access token limit is reached.
· Symptoms that indicate that the access token limitation has been reached.
How Access Tokens Are Created
An access token is created whenever a user or any security principal logs on to a computer, or attempts to access a resource, as part of the authentication process. An access token contains information about the identity and privileges associated with the security principal (user, group, computer, or domain controller). Every process has a token that describes the security context of the principal's account associated with the process.
A security identifier (SID) is a unique value that identifies a security principal. A SID is issued to every security principal when it is created. Security groups are also security principals, and therefore are uniquely identified by SIDs. A user security principal can be a member of multiple security groups. Consequently, a user’s access token includes the SIDs of all groups of which the user is a member.
In the following example, during Windows-based authentication, an access token is created when a user logs on in the following manner:
1. When a user logs on interactively or tries to make a network connection to a computer running Windows, the user’s logon credentials are authenticated.
2. If authentication is successful, the logon process returns a SID for the user and a list of SIDs for the user’s security group membership.
3. The Local Security Authority (LSA) on the computer uses this information to create an access token that includes the SIDs returned by the logon process. The token also includes a list of privileges assigned by local security policy to the user and to the user’s security groups. The LSA uses a process called “token evaluation” to determine which security groups to include in the token.
Note: Specific protocols like NTLM and Kerberos use different processes to create an access token.
This process of acquiring the SIDs for the user and user's group memberships is called the "token evaluation process."
Factors Affecting Token Evaluation
Several factors can affect the outcome of the token evaluation process, including the following:
· Whether the token is issued for logon purposes or for resource access.
· The groups that the principal is a member of, including direct and transitive memberships.
· The types of groups involved.
· There are two types of groups in Active Directory: distribution groups and security groups. Distribution groups are not included in the principal’s token, but all security groups are included. All group scopes (universal, global, domain local, machine local, and built-in) are included in the token evaluation.
· The functional level (for Windows server 2003)
The token evaluation process evaluates groups recursively. For example, if User A is a member of Group 1 and Group 1 is a member of Group 2, then a token generated for User A contains SIDs representing both Group 1 and Group 2. In native mode and higher domains, universal, global, and domain local groups are all evaluated recursively. Universal security groups do not exist in mixed mode domains.
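The recursive evaluation can be illustrated with a small Python sketch. This is only a toy model, not how Windows implements token evaluation: the membership data is a plain dictionary, and the function name is made up for this post.

```python
def transitive_groups(principal, member_of):
    """Return every group the principal belongs to, directly or via nesting.

    member_of maps a principal or group name to the groups it is a
    direct member of.
    """
    seen = set()
    pending = list(member_of.get(principal, ()))
    while pending:
        group = pending.pop()
        if group in seen:
            continue  # already counted; also guards against circular nesting
        seen.add(group)
        pending.extend(member_of.get(group, ()))
    return seen


if __name__ == "__main__":
    memberships = {"User A": ["Group 1"], "Group 1": ["Group 2"]}
    # User A ends up with both Group 1 and Group 2.
    print(sorted(transitive_groups("User A", memberships)))
```

Each group is counted once no matter how many paths lead to it, which is why a handful of direct memberships can still turn into a long SID list.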
How SIDs Are Added to a Token
The examples in this section show how SIDs are added to a user's token in two instances:
· When the user logs on
· When the user accesses a resource.
For each of these instances, the process is described for both NTLM and Kerberos authentication in the following sections.
How SIDs Are Added When the User Logs on to a Network
The following figure shows how SIDs are added to a user's token when the user attempts to log on with NTLM authentication.
When the user attempts to log on to a network with NTLM authentication, the following process occurs:
1. The workstation collects the user’s credentials and passes them to a domain controller in the account domain.
2. The domain controller in the account domain adds global groups to the user’s token and passes the updated token list to the account domain global catalog server.
3. The workstation receives the list of SIDs and retrieves all of the local groups. The resulting union is the SIDs in the user token.
When the user attempts to log on in an environment with Kerberos authentication, the following process occurs:
1. The Kerberos client on the workstation uses the credentials from the user to request a Ticket Granting Ticket (TGT) from the Kerberos Key Distribution Center (KDC) in the user's domain.
2. The KDC obtains the list of the user's SIDs from a domain controller in the user's account domain. The KDC also queries the global catalog server and obtains any universal groups that include the user or the user's domain security groups. The KDC adds the user's SIDs and the SIDs from any applicable universal groups to the list in the TGT's authorization data field, and returns the TGT to the computer.
3. Once the TGT is received, the Kerberos client requests a service ticket for access to the local workstation.
4. The KDC copies the contents of the TGT's authorization data field to the service ticket's authorization data field. The service ticket is the token, and there can be no more than 1,024 SIDs in the token.
How the Access Token Limit Is Reached
When a user logs on and authentication is successful, the logon process returns a SID for the user and a list of SIDs for the user’s security groups, and these comprise the access token. SID history can add additional SIDs to the token. The SIDs in an access token include:
· The security principal's SID, including SIDs from the SID history of the principal.
· The SID from each domain local group that the principal is directly or transitively a member of, for the domain of the workstation or resource.
· The SID for each global group that the principal is directly or transitively a member of, including SIDs from the SID history of the group.
· The SID for each universal group that the principal is directly or transitively a member of, including SIDs from the SID history of the group.
· The SID for each built-in group the principal is directly or transitively a member of.
· The SID for each local group that the principal is directly or transitively a member of.
Due to a system limitation, the field that contains the SIDs of the principal's group memberships in the access token can contain a maximum of 1,024 SIDs. If there are more than 1,024 SIDs in the principal's access token, the Local Security Authority (LSA) cannot create an access token for the principal during the logon attempt. If this happens, the principal cannot log on or access resources.
In environments that use SID history, each security principal can have two or more SIDs. An additional SID is optionally added to the sIDHistory attribute when a security principal is migrated. Since groups, as well as users, can have SID history, the token of a migrated user with migrated groups can potentially have double the number of SIDs compared to a user that is not migrated.
Note: To reduce the token size of migrated users, ensure that your migration plans include security translation and retirement of the sIDHistory attribute, when possible.
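The doubling effect of SID history is easy to see with a quick count. The Python sketch below is a simplification under my own assumptions: it lumps the user's SIDs and the group SIDs into one total and compares that with the 1,024 limit. The function names and the sample numbers are hypothetical.

```python
MAX_SIDS = 1024  # SID limit described in this post


def count_token_sids(user_sid_history, group_sid_history):
    """Count the SIDs a token would carry.

    user_sid_history: number of entries in the user's sIDHistory.
    group_sid_history: dict mapping each group the user is (transitively)
    a member of to the number of entries in that group's sIDHistory.
    """
    user_sids = 1 + user_sid_history
    group_sids = sum(1 + extra for extra in group_sid_history.values())
    return user_sids + group_sids


def exceeds_limit(total_sids):
    return total_sids > MAX_SIDS


if __name__ == "__main__":
    groups = {f"Group {i}": 0 for i in range(600)}
    migrated = {name: 1 for name in groups}
    print(count_token_sids(0, groups))    # 601
    print(count_token_sids(1, migrated))  # 1202
```

In this made-up case, a user in 600 groups is fine before migration but goes over the limit once the user and every group carry one extra SID in sIDHistory.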
There are two common ways in which the access token limit is exceeded:
· Large fan-out group structure, where a principal is directly a member of many groups, or is a member of a group that is directly a member of many groups.
· Deep nesting group structure, where a principal is a member of a group that results in a large number of transitive memberships.
Either of these structures is possible when an administrator creates groups to carry out legitimate authorization requirements of an organization.
Large Fan-out Group Structure
The large fan-out group structure scenario applies to very few people (example: Joe Doe). Since we do not know at this time how many users are affected by this structure, it is described here.
The large fan-out group structure involves principals being members of many different account and resource groups. This can happen due to legitimate business needs. For example, consider the following characteristics:
· Operations in multiple regions.
· Activities that span multiple specialties.
· A large number of principals that access a large number of resources.
In order to address business requirements such as these, administrators (Joe Doe) might create hundreds of account and resource groups and use group nesting to facilitate required access for all principals in the organization. In this instance, taking into account group nesting, it is possible that a principal may end up being a member of more than 1,024 groups.
The following figure illustrates a large fan-out group structure.
Deep Nesting Group Structure
The deep nesting group structure involves creating groups that are nested within other groups. The following figure illustrates a deep nesting structure.
Since group membership is evaluated recursively, if a user is transitively a member of a group that is nested at 50 levels, that user is also a member of every other group in that hierarchy. The user is also a member of any groups that those groups are members of.
Who Can Cause the Problem
For Active Directory in Windows Server 2003, there are two types of administrative responsibilities:
· Service administrators are responsible for maintaining and delivering the directory service, including domain controller management and directory service configuration.
· Data administrators are responsible for maintaining the data that is stored in the directory service and on domain member servers and workstations.
Service administration accounts and groups have the most widespread power in a network environment and require the most protection. They are responsible for directory-wide settings, installation and maintenance of software, and application of operating system service packs and updates on domain controllers.
In a typical Active Directory environment, the following service administrator groups are capable of creating groups and potentially causing access token limitation problems:
· Default groups in the Builtin container:
a. Administrators
b. Server Operators
c. Backup Operators
d. Account Operators
e. Print Operators
· Default groups in the Users container:
a. Enterprise Admins
b. Schema Admins
c. Domain Admins
Administrators, Enterprise Admins, and Domain Admins, have the broadest range of permissions. Schema Admins can change the default security descriptor of the group class and thereby give write permissions to anyone in the forest. Account Operators have write permissions to any group in the domain and therefore can modify membership of any group.
In addition, delegated data administrators with the following permissions can create groups or modify memberships that can potentially result in users reaching the access token limitation:
· Any individual who has any of the following permissions in Active Directory on a container or OU or on the domain:
a. Full control
b. Modify owner
c. Modify permissions
d. Create containers
e. Create OUs
f. Create groups
· Any individual specifically delegated with any of the following permissions:
a. Create objects of type Group.
b. Write permissions to the member attribute of a security group.
c. Write permissions to the group-type attribute of a distribution group and write permissions to the member attribute of that group.
For recommendations regarding delegating Active Directory administration, see the topic Best Practices.
How to calculate token size
Use the following formula to determine whether it is necessary to modify the MaxTokenSize value:
TokenSize = [12 X number of user rights] + [token overhead] + 40d + 8s
This formula uses the following values:
· d: The number of domain local groups a user is a member of plus the number of universal groups outside the user’s account domain plus the number of groups represented in SID history.
· s: The number of security global groups that a user is a member of plus the number of universal groups in a user’s account domain.
· User rights include rights such as “Log on locally” or “Access this Computer from the network”. The only user rights that are added to an access token are those user rights that are configured on the server that hosts a secured resource. Most of the users are likely to have only two or three user rights on the Exchange server. Administrators may have dozens of user rights. Each user right requires 12 bytes to store it in the token.
· Token overhead includes multiple fields such as the token source, expiration time, and impersonation information. For example, a typical domain user has no special access or restrictions; token overhead is likely to be between 400 and 500 bytes.
· Estimated value for ticket overhead can vary depending on factors such as DNS domain name length, client name and other factors.
· Each group membership adds the group SID to the token together with an additional 16 bytes for associated attributes and information. The maximum possible size for a SID is 68 bytes, but a typical SID is 28 bytes long. Therefore, each security group to which a user belongs typically adds 44 bytes (28 + 16) to the user’s token size.
In scenarios in which delegation is used (for example, when users authenticate to a domain controller), Microsoft recommends doubling the token size.
The default token size is 12000 bytes.
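The calculation is simple enough to script. The Python sketch below reads the 40-byte term of the formula as applying to d and the 8-byte term to s, as defined above; the function names and the sample inputs are my own illustrations, not measured values.

```python
DEFAULT_MAX_TOKEN_SIZE = 12000  # default token size in bytes


def estimate_token_size(user_rights, token_overhead, d, s):
    """Estimate the token size in bytes.

    user_rights: number of user rights (12 bytes each).
    token_overhead: fixed overhead in bytes (roughly 400-500 for a
    typical domain user, per the text above).
    d, s: group counts as defined for the formula.
    """
    return 12 * user_rights + token_overhead + 40 * d + 8 * s


def needs_larger_max_token_size(estimate, delegation=False):
    """Compare the estimate, doubled when delegation is used, with the default."""
    if delegation:
        estimate *= 2
    return estimate > DEFAULT_MAX_TOKEN_SIZE


if __name__ == "__main__":
    small = estimate_token_size(user_rights=3, token_overhead=500, d=10, s=20)
    large = estimate_token_size(user_rights=3, token_overhead=500, d=200, s=100)
    print(small, needs_larger_max_token_size(small))                   # 1096 False
    print(large, needs_larger_max_token_size(large))                   # 9336 False
    print(large, needs_larger_max_token_size(large, delegation=True))  # 9336 True
```

The last line shows why the delegation rule matters: an estimate that sits under the default on its own can go over it once it is doubled.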
Token Memory allocation
If a token is less than 4 KB, the amount of kernel memory that is allocated for it is exactly what is required to hold the token.
By using the formula mentioned in the “How to calculate token size” section, my token comes to about 2040 bytes.
But if a token is even slightly larger than 4 KB (4096 bytes), the amount of memory allocated per copy jumps to exactly 8 KB (8192 bytes). If a token is even slightly larger than 8 KB, the memory allocation jumps to exactly 12 KB. Therefore, every time the token size crosses one of these critical 4-KB boundaries, there is a sudden jump in the use of paged pool memory, and the user will see intermittent results.
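The stepwise behaviour can be modelled in a few lines of Python. This is only a sketch of the rule as described here, not of the actual kernel allocator; the function name is hypothetical, and treating a token of exactly 4096 bytes as "exact size" is my assumption.

```python
BOUNDARY = 4096  # 4 KB


def allocated_bytes(token_size):
    """Kernel memory allocated for one copy of a token, per the rule above."""
    if token_size <= BOUNDARY:
        return token_size  # small tokens get exactly what they need
    blocks = -(-token_size // BOUNDARY)  # ceiling division
    return blocks * BOUNDARY


if __name__ == "__main__":
    for size in (2040, 4097, 8193):
        print(size, allocated_bytes(size))
```

With this rule, a token of 4097 bytes costs 8192 bytes per copy, so one extra group membership can nearly double the memory used for that user's token.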
How to fix the token size problem (Solution)
A registry parameter is available to increase the Kerberos token size. For example, increasing the token size to 65 KB allows a user to be present in more than 900 groups. Because of the associated SID information, this number may vary.
To use this parameter:
1. Start Registry Editor (Regedt32.exe).
2. Locate and click the following key in the registry:
3. If this key is not present, create the key. To do so:
a. Click the following key in the registry:
b. On the Edit menu, click Add Key.
c. Create a Parameters key.
d. Click the new Parameters key.
4. On the Edit menu, click Add Value, and then add the following registry value:
Value name: MaxTokenSize
Data type: REG_DWORD
Value data: 65535
5. Quit Registry Editor.
The default value for MaxTokenSize is 12000 decimal. Microsoft’s recommendation is to set this value to 65535 decimal (FFFF hexadecimal). If the value is incorrectly set to 65535 hexadecimal (an extremely large value), Kerberos authentication operations may fail, and programs may return errors.
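The decimal/hexadecimal mix-up is worth a quick check before you type the value into the registry editor. This small Python snippet only does the base conversion; the constant names are mine.

```python
RECOMMENDED_DECIMAL = 65535

# What to type if the editor is in hexadecimal mode.
RECOMMENDED_AS_HEX = format(RECOMMENDED_DECIMAL, "X")

# What you actually get if you type 65535 while the editor is in hex mode.
MISTAKEN_VALUE = int("65535", 16)

if __name__ == "__main__":
    print(RECOMMENDED_AS_HEX)  # FFFF
    print(MISTAKEN_VALUE)      # 415029
```

Typing 65535 in hexadecimal mode therefore stores 415029, more than six times the intended value.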
MS KB article explains all those settings: https://support.microsoft.com/kb/263693.
Note: To test this scenario I changed registry setting on KFTUSOKTULSPS35 and it worked.
General information on large token (Reference only)
The way IIS handles headers, and therefore authorization information, is one thing; the way the Windows system in general handles large tokens is another.
Make sure you keep the IIS and Windows configurations consistent so that authentication is successful end-to-end.
Scenario 2: Kerberos working inconsistently for users
Users in this scenario have intermittent problems with Kerberos authentication. After research with Microsoft Premier Support, including various network traces and analysis, we were able to identify that when this problem occurs the client does not request Kerberos at all; it requests only NTLM.
After a period of inactivity on the client workstation/laptop (such as sleep, standby or a successful unlock of the workstation) and purging of the Kerberos tickets, OR after a client’s Kerberos token expires, the client always starts using an NTLM authorization token when trying to access the web application. As a result of the NTLM fallback, Kerberos-dependent applications such as RSS Feed, Federated Search and SAP Integration fail.
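When reading a network trace, you can tell which protocol the client picked by looking at the start of the token in the Authorization header. The Python sketch below is a rough heuristic, not a full parser: NTLM messages begin with the bytes "NTLMSSP" followed by a zero byte, while a GSS-API/SPNEGO token (which normally carries Kerberos) begins with byte 0x60. The function name is hypothetical, and an SPNEGO token could in principle wrap NTLM as well, which this check does not detect.

```python
import base64


def classify_authorization(header_value: str) -> str:
    """Guess the protocol from an Authorization header value."""
    scheme, _, token_b64 = header_value.partition(" ")
    scheme = scheme.lower()
    if scheme == "ntlm":
        return "NTLM"
    if scheme != "negotiate":
        return "other"
    token = base64.b64decode(token_b64)
    if token.startswith(b"NTLMSSP\x00"):
        return "NTLM (inside Negotiate)"
    if token[:1] == b"\x60":
        return "SPNEGO/Kerberos"
    return "unknown"


if __name__ == "__main__":
    ntlm_token = base64.b64encode(b"NTLMSSP\x00\x01\x00\x00\x00").decode()
    print(classify_authorization("Negotiate " + ntlm_token))
```

In a trace, the same check can be done by eye: a Base64 token that starts with "TlRMTVNTUA" is NTLM, which is what we saw from the affected clients.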
Steps to reproduce the problem
1. Purge all Kerberos tickets using Kerbtray or Klist (available at c:\windows\System32).
2. Do an IISReset on the SharePoint server, as the RSS Web Part and Federated Search cache for 2 hours.
3. Lock the workstation.
4. Unlock the workstation.
5. Purge all Kerberos tickets using Kerbtray or Klist (available at c:\windows\System32).
6. Open IE and browse to the Kerberos web application (say, “RSS Web Part”).
7. You will see the error “Authentication Feed error” on the RSS Web Part.
This has been identified as a bug in Windows XP Service Pack 2 (fixed in Windows XP Service Pack 3). A hotfix has been identified; I tested it on my laptop and it worked.
KB Article: https://support.microsoft.com/kb/939850
Ntdsutil.exe is a command-line tool that provides management facilities for Active Directory™, the Microsoft® Windows® 2000 and Microsoft® Windows® 2003 directory service.
You can use the Group Membership Evaluation task of the Ntdsutil.exe tool (By default, Ntdsutil is installed in the Winnt\System32 folder) to help recover from an access token limitation problem, such as a user not being able to log on. The purpose of this task is to generate data that will help you identify the source of the problem.
Note: The Group Membership Evaluation task does not directly identify the group that led to the problem for you. It produces a report that will help you with your analysis.
For simplicity’s sake, I use the word “Kerberos” in this document, when talking about authentication protocol between client and web server. The actual protocol is SPNEGO or “Negotiate”, which is a wrapper for multiple authentication protocols.
Refer to Wikipedia for the details: https://en.wikipedia.org/wiki/SPNEGO
· For more information about Logon and Authentication Technologies, see the Windows Security Collection of the Windows Server 2003 Technical Reference on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=48827).
· For more information about Authorization and Access Control Technologies, see the Windows Security Collection in the Windows Server 2003 Technical Reference on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=48979).
· For more information about Active Directory users and groups, see the Active Directory Collection in the Windows Server 2003 Technical Reference on the Microsoft Web site