Security infrastructure related performance problems

Talking about SharePoint performance is like talking about rocket-science. SharePoint is not a closed product, its more like a LEGO set that you can use in your own constructions. There are huge amounts of variables that play some role, many of them have no easy instruments to be measured and are a bit abstractly documented. This also means that we can have multiple root-causes for bad performance: code that is not optimized, infrastructure problems, SQL performance, data structure, hardware boundaries...

In the last couple of months I was confronted with one type of performance problem that is particularly tricky to troubleshoot: security infrastructure complexity causing bad performance. Although SharePoint 2010 has a number of improvements that will solve many of those problems ( enforced thresholds, better queries...) we still see problems happening now and then. SharePoint 2007 is particularly exposed, especially as we see a lot of 2007 installation reaching maturity, managing large amounts of data and playing a central role in the business.

There are a number of blogs that talk about this like:

http://technet.microsoft.com/en-us/library/cc262787(v=office.12).aspx

http://technet.microsoft.com/en-us/library/cc262787.aspx

http://blog.bugrapostaci.com/2011/08/01/how-to-fix-64bit-acl-limit-exceeded-problem-on-sharepoint/

http://blogs.msdn.com/b/arvind7in/archive/2010/05/26/what-is-64k-acl-limit.aspx

but, I found that although the limits are clearly stated we still miss some efficient way of measuring those and there is some confusion around the terms we use in our documentation. So... one more time with clarity:

Security Scopes - Every time we break inheritance on a list item, folder or sub-web, we create a new security scope that will cover that item and its children until we reach a new item with broken inheritance.

ACLs - (Access Control Lists) - Each Security Scope will have an ACL, this is basically an data structure that will contain a list of ACEs.

ACE - (Access Control Entities) - Each ACE contains an identifier for an user or group and a list of permissions that this user has.

Associating a security scope with an ACL we basically describe all the permissions that are given on that security scope.

So how do we get from here to the performance problems we are talking about. Well... imagine we have a very large list (tens of thousands of items) and we are trying to get some information about all the items in the list. Part of the stored procedures that are run will be to get only the items that the users is allowed to see. This means that we need to process the ACLs for all the security scopes on that list: the more security scopes we have, the more processing we need to do, also, the larger the ACLs are the more processing we have to do. Because we query very large lists, we are mostly also confronted with lock-escalation on SQL. This means we have one resource intensive ( and also time consuming) query that will keep AllUserData table locked causing other queries to also perform badly. All in all... this is a problem that could bring the portal down.

How do you know that you have this kind of problems:

- Certain operations are taking very long: viewing a list in the UI, displaying certain items from the list.

- SQL locks are blocking the portal.

- Office clients are affected by bad performance

 

In order to avoid these problems we suggest the following limitations:

ACL size

select  ScopeUrl,datalength(Acl) as DataLenght ,cast(dbo.fn_ChangeEndian4(SUBSTRING(Acl,1,16))as int) as AceCount from  Perms with (nolock) order by 2 desc

AceCount should be under 900

Number of security scopes / list

SELECT alld.listid, alll.tp_title, allw.FullUrl, COUNT(DISTINCT alld.ScopeId) as countscopes FROM  ALLDOCS AS ALLD WITH(NOLOCK)

INNER JOIN   ALLLISTS AS ALLL ON  ALLD.LISTID =  ALLL.TP_ID

inner join Webs as allw on alld.WebId = allw.Id

WHERE LISTID IS NOT NULL

Group by alld.ListId, alll.tp_Title, allw.FullUrl

order by  countscopes desc

Count should be under 1000

Number of ad users / ad groups in an SharePoint group

 

select COUNT(m.MemberId) as [Count], g.Title as [Group-Title]

from GroupMembership m with(nolock)

join Groups g with(nolock) on m.GroupId = g.ID and m.SiteId = g.SiteId

group by g.Title

order by [Count] desc

~2500

 

 

 

 

Running these queries against an live database should be done with somebody from Microsoft Support.

The limits might vary based on the overall performance of the hardware or the complexity of the solution, the number mostly mean that being over these limits you will almost certainly have problems sooner or later.

You will notice that we always have the largest number of security scopes at site collection level. That is expected (by design) and although we should make sure we keep the ACL size for the site collection under 32K it should not affect performance that much, as we mostly run our queries at list level.

How can we fix these problems:

- first off, please plan your security infrastructure before starting the project, this can be a major issue later on

- add AD groups to SharePoint groups

- group all people with the same permissions in an AD group and add that group to a SharePoint group

- as far as possible do not give permissions directly to the users - this is not a problem per se, but if your whole solution is centered around such an approach it will result in some kind of problems sooner or later

 

In the end one more thing. From the experiences of countless cases dealing with performance, the root cause for a performance problem is never to be found in just one place. Fixing the permission infrastructure will improve performance, but mostly when we have this type of problems we also have: large lists, large databases, many sub-webs. All together will cause bad performance. The only way to prevent this is: PLANNING.