SharePoint 2010: Resolving a System.TimeoutException When Connecting to the User Profile Service Application WCF Endpoint

Scenarios:

A) Every morning, the UPA worker process becomes unresponsive, or web requests are queued for minutes and subsequently time out. Users of the consuming farm see errors on site pages that require information from the UPA.

B) Users experience slow site loads. The SharePoint server running the User Profile Service application (UPS SA) shows a large number of connection requests queued in the UPS SA worker process. Each of these requests times out after 20 seconds.

C) The ULS logs show a timeout for any connection request made to the UPS SA (for example, a request from a client accessing sites, from the consuming farm, or from NewsGator social sites).

The following messages appear in the ULS logs when the site is accessed from the server that runs the User Profile Service.

Exception occurred while connecting to WCF endpoint: System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:20. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The HTTP request to 'https://servername:32843/UPSSCEndPointGUID/ProfilePropertyService.svc' has exceeded the allotted timeout of 00:00:20. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetResponse()
   at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
   --- End of inner exception stack trace ---
   at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException webException, HttpWebRequest request, HttpAbortReason abortReason)
   at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
   at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)
   --- End of inner exception stack trace ---
Server stack trace:
   at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.SecurityChannelFactory`1.SecurityRequestChannel.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at Microsoft.Office.Server.UserProfiles.IProfilePropertyService.GetProfileProperties()
   at Microsoft.Office.Server.UserProfiles.ProfilePropertyServiceClient.<>c__DisplayClass1.<GetProfileProperties>b__0(IProfilePropertyService channel)
   at Microsoft.Office.Server.UserProfiles.MossClientBase`1.ExecuteOnChannel(String operationName, CodeBlock codeBlock)

UserProfileApplicationProxy.InitializePropertyCache: Microsoft.Office.Server.UserProfiles.UserProfileException: System.TimeoutException
   at Microsoft.Office.Server.UserProfiles.MossClientBase`1.ExecuteOnChannel(String operationName, CodeBlock codeBlock)
   at Microsoft.Office.Server.UserProfiles.ProfilePropertyServiceClient.ExecuteOnChannel(String operationName, CodeBlock codeBlock)
   at Microsoft.Office.Server.UserProfiles.ProfilePropertyServiceClient.GetProfileProperties()
   at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.RefreshProperties(Guid applicationID)
   at Microsoft.Office.Server.Utilities.SPAsyncCache`2.GetValueNow(K key)
   at Microsoft.Office.Server.Utilities.SPAsyncCache`2.GetValue(K key, Boolean asynchronous)
   at Microsoft.Office.Server.Administration.UserProfileApplicationProxy.InitializePropertyCache()

Cause

Too many accounts (users or groups) have been added to the User Profile Service Application Administrators group. Each entry in this ACL increases the time needed for ACL resolution, and the WCF endpoint times out if ACL resolution takes more than 10-15 seconds. The default WCF endpoint timeout is 20 seconds.

In this case, the customer has multiple domains (parent and child) and had added many users to the User Profile Service Application Administrators group to grant the permissions required by the consuming farm.
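
To see how many entries the Administrators ACL currently holds, a read-only check along the following lines may help. This is a minimal sketch rather than part of the original procedure: it assumes the farm has a single User Profile Service Application and that the built-in Get-SPServiceApplicationSecurity cmdlet is available in the SharePoint 2010 Management Shell.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: the farm has one User Profile Service Application; the first match is used.
$upa = Get-SPServiceApplication |
    Where-Object { $_.TypeName -eq "User Profile Service Application" } |
    Select-Object -First 1

# Read the Administrators ACL. Nothing is modified here.
$security = Get-SPServiceApplicationSecurity -Identity $upa -Admin
$rules = @($security.AccessRules)

Write-Host "Entries in the UPS SA Administrators ACL: $($rules.Count)"
$rules | Format-List
```

A long list of individual users or groups from several domains is the pattern described above.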

Resolution

Workaround 1

Use a universal group in the UPA Administrators ACL on the publishing farm. Nest all other groups and accounts in this universal group, then add the universal group to the User Profile Service Application ACL.

Details:

In SharePoint Central Administration, go to Manage Service Applications > highlight the User Profile Service Application > select Administrators on the ribbon > remove the users and groups found in this ACL.

Create a universal group (UG) in the parent domain of the SharePoint server. Add all of the removed users and groups to this UG, then add the UG to the UPS SA Administrators ACL in SharePoint.
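
The last step, adding the universal group to the ACL, can also be scripted. The following is a hedged sketch, not the procedure the article was written around: the group name PARENTDOMAIN\UPA-Admins-UG is a hypothetical placeholder, the group is assumed to already exist in Active Directory, and "Full Control" is assumed to be the right you want to grant. The script only adds the group; removing the existing entries is left to the Central Administration steps above.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical group name; replace with the universal group you created.
$groupName = "PARENTDOMAIN\UPA-Admins-UG"

# Assumption: the farm has one User Profile Service Application; the first match is used.
$upa = Get-SPServiceApplication |
    Where-Object { $_.TypeName -eq "User Profile Service Application" } |
    Select-Object -First 1

# Load the current Administrators ACL.
$security = Get-SPServiceApplicationSecurity -Identity $upa -Admin

# Build a claims principal for the Windows security group and grant it rights.
$principal = New-SPClaimsPrincipal -Identity $groupName -IdentityType WindowsSecurityGroupName
Grant-SPObjectSecurity -Identity $security -Principal $principal -Rights "Full Control"

# Write the updated ACL back to the service application.
Set-SPServiceApplicationSecurity -Identity $upa -ObjectSecurity $security -Admin
```

Run this on the publishing farm with a farm administrator account, and verify the result under Administrators in Central Administration.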

Workaround 2

Increase the WCF endpoint timeout from 20 seconds to 60 seconds. Also start the User Profile Service on multiple servers to load-balance WCF requests. This should help reduce connection timeouts. For a permanent fix, remove the individual user and group entries and add only a few universal groups to the UPS SA Administrators ACL.
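
Starting the User Profile Service on an additional server can be done from Central Administration (Services on Server) or from PowerShell. The sketch below covers only that part of the workaround and does not change the timeout. The server name APPSERVER02 is a hypothetical placeholder, and the script assumes the service instance is listed with the type name "User Profile Service".

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical server name; replace with the additional application server.
$serverName = "APPSERVER02"

# Find the User Profile Service instance on that server.
$instance = Get-SPServiceInstance -Server $serverName |
    Where-Object { $_.TypeName -eq "User Profile Service" }

if ($instance -and $instance.Status -ne "Online") {
    # Provision the service instance so the server can answer UPS SA requests.
    Start-SPServiceInstance -Identity $instance
} else {
    Write-Host "User Profile Service not found or already online on $serverName"
}
```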

More Information

Use the PowerShell script below to measure how long it takes to resolve the UPS SA ACL on the affected application servers. The script reports the time taken to validate the entries in the UPS SA Administrators ACL. If the value is more than 4-6 seconds, it can cause WCF timeouts when clients make a large number of connection requests.

The script is given below.

# $upa must reference the User Profile Service Application you want to test.
# If the farm has more than one, narrow the filter (for example, by name).

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$upa = Get-SPServiceApplication | ? {$_.TypeName -eq "User Profile Service Application"}
$startTime = [System.DateTime]::Now
# Resolve the Administrators ACL; this is the operation being timed.
$acl = $upa.GetAdministrationAccessControl()
$accessRules = $acl.AccessRules
$endTime = [System.DateTime]::Now
# TotalSeconds gives the full duration; Seconds is only the 0-59 component.
$elapsedTime = $endTime.Subtract($startTime).TotalSeconds

Write-Host "Start Time: $startTime End Time: $endTime Duration: $elapsedTime Seconds"
Write-Host ""