Mystery of the SharePoint "White Screens"

Users randomly started receiving "white screen" responses when browsing a SharePoint site. They would browse to a URL and the response would complete (IE showed Done in the status bar), but the response was an empty HTML page. We found that this happened on only one server at a time, and that the symptom started after a process recycle. Since the problem persisted until a second process recycle, we grabbed a memory dump of the affected application pool. Analysis of the dump showed HttpContext objects that were not running on any thread, yet had not completed. To get this data using Sos.dll, find the HttpContext objects on the heap with !dumpheap -stat:

0:038> !dumpheap -stat
000007fef1c00bb0 105 35280 System.Web.HttpContext

The first value is the MethodTable for the HttpContext type. With it, you can run !dumpheap again and pass the MethodTable to return all the HttpContext objects on the managed heap:

0:038> !dumpheap -mt 000007fef1c00bb0
Heap 0
Address MT Size
00000000ff3ae0f8 000007fef1c00bb0 336
00000000ff3ba2b0 000007fef1c00bb0 336
00000000ff3cfb70 000007fef1c00bb0 336
00000000ff3e1f70 000007fef1c00bb0 336
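When you are doing this for several types, the step from the -stat row to the per-type listing can be scripted. The sketch below is a hypothetical helper, not part of SOS: it assumes the -stat row has the columns MethodTable, Count, TotalSize and class name (as in the row above), pulls out the MethodTable to build the follow-up command, and divides TotalSize by Count. For this dump, 35280 / 105 gives 336 bytes, which matches the size of each HttpContext in the listing.

```python
# Hypothetical helper (not part of SOS): parse one "!dumpheap -stat" row.
# Assumed column order: MethodTable, Count, TotalSize, Class Name.


def parse_stat_row(row: str) -> dict:
    """Split a !dumpheap -stat row into its four columns."""
    # maxsplit=3 keeps class names that contain spaces (e.g. generics) intact.
    mt, count, total_size, class_name = row.split(maxsplit=3)
    return {
        "mt": mt,
        "count": int(count),
        "total_size": int(total_size),
        "class_name": class_name,
    }


def follow_up_command(stat: dict) -> str:
    """Build the !dumpheap command that lists every object with this MethodTable."""
    return f"!dumpheap -mt {stat['mt']}"


stat = parse_stat_row("000007fef1c00bb0 105 35280 System.Web.HttpContext")
print(follow_up_command(stat))              # !dumpheap -mt 000007fef1c00bb0
print(stat["total_size"] // stat["count"])  # 336 bytes per HttpContext
```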

Now that you have the address of a specific HttpContext object, you can dump it and investigate its properties. Let's dump the first one in the list. The _errors field is set, indicating that an exception is associated with the request. The timeout was also set (_timeoutSet is 1), but it has not been reached. However, _thread is null, indicating that the request is not associated with a thread. Other odd values: _configurationPath, _currentHandler, _handler, and _appInstance are all null. [A lot of the properties were removed to keep this as short as possible.]

0:038> !do 00000000ff3ae0f8
Name: System.Web.HttpContext
MethodTable: 000007fef1c00bb0
EEClass: 000007fef1862378
Size: 336(0x150) bytes
MT Offset Type Value Name
000007fef1c036d8 8 ...IHttpAsyncHandler 0000000000000000 _asyncAppHandler
000007fef1c03208 10 ...b.HttpApplication 0000000000000000 _appInstance
000007fef1c03748 18 ....Web.IHttpHandler 0000000000000000 _handler
000007fef1c03b88 20 ...m.Web.HttpRequest 00000000ff3ae248 _request
000007fef1c03f08 28 ....Web.HttpResponse 00000000ff3ae398 _response
000007fef1c04420 30 ...HttpServerUtility 0000000000000000 _server
000007fef78fdef8 48 ...ections.Hashtable 000000019f753ae8 _items
000007fef78fd488 50 ...ections.ArrayList 000000019f753a80 _errors
000007fef7936ad8 138 System.DateTime 00000000ff3ae230 _timeoutStartTime
000007fef78f5770 11d System.Boolean 1 _timeoutSet
000007fef79369d8 140 System.TimeSpan 00000000ff3ae238 _timeout
000007fef78f7040 a0 ....Threading.Thread 0000000000000000 _thread
000007fef78f5770 11e System.Boolean 0 _isAppInitialized
000007fef1c03748 f0 ....Web.IHttpHandler 0000000000000000 _currentHandler
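With about a hundred HttpContext objects in the dump, checking each !do output by eye is tedious. The sketch below is a hypothetical filter, assuming each field row ends in "<Value> <Name>" as in the output above. It turns the rows into a name-to-value map and flags the pattern described here: errors recorded and a timeout set, but no thread, handler or application instance. The names DO_OUTPUT, parse_fields and is_stranded are illustrative only, and the sample rows are a subset of the output above.

```python
# Hypothetical filter for "!do <HttpContext>" output; it is not an SOS feature.
# Assumes every field row ends with "<Value> <Name>".

DO_OUTPUT = """\
000007fef1c03208 10 ...b.HttpApplication 0000000000000000 _appInstance
000007fef1c03748 18 ....Web.IHttpHandler 0000000000000000 _handler
000007fef78fd488 50 ...ections.ArrayList 000000019f753a80 _errors
000007fef78f5770 11d System.Boolean 1 _timeoutSet
000007fef78f7040 a0 ....Threading.Thread 0000000000000000 _thread
000007fef1c03748 f0 ....Web.IHttpHandler 0000000000000000 _currentHandler
"""

NULL = "0000000000000000"


def parse_fields(text: str) -> dict:
    """Map each field name to its Value column."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 5:  # MT, Offset, Type, Value, Name
            fields[parts[-1]] = parts[-2]
    return fields


def is_stranded(fields: dict) -> bool:
    """True if errors and a timeout are recorded but no thread or handler is attached."""
    return (
        fields.get("_errors", NULL) != NULL
        and fields.get("_timeoutSet") == "1"
        and fields.get("_thread") == NULL
        and fields.get("_handler") == NULL
        and fields.get("_appInstance") == NULL
    )


print(is_stranded(parse_fields(DO_OUTPUT)))  # True
```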


The state of this HttpContext was pretty odd, so we looked at the exceptions linked to it. To do this, run !do on the address next to _errors.

0:038> !do 000000019f753a80
Name: System.Collections.ArrayList
MethodTable: 000007fef78fd488
EEClass: 000007fef7501ea0
Size: 40(0x28) bytes
MT Field Offset Type VT Attr Value Name
000007fef78e4390 400094c 8 System.Object[] 0 instance 000000019f753aa8 _items
000007fef78fd810 400094d 18 System.Int32 1 instance 3 _size
000007fef78fd810 400094e 1c System.Int32 1 instance 3 _version
000007fef78f5e90 400094f 10 System.Object 0 instance 0000000000000000 _syncRoot
000007fef78e4390 4000950 388 System.Object[] 0 shared static emptyArray
>> Domain:Value 000000000315f0d0:000000019f3a1230 00000000031d5540:000000019f3ac958 <<

Since _errors is an ArrayList, you have to dump its _items array to get to the actual exceptions. _size is 3, so only the first three slots hold exceptions. The fourth slot is unused capacity:

0:038> !dumparray 000000019f753aa8
Name: System.Object[]
MethodTable: 000007fef78e4390
EEClass: 000007fef74feb18
Size: 64(0x40) bytes
Array: Rank 1, Number of elements 4, Type CLASS
Element Methodtable: 000007fef78f5e90
[0] 000000019f72e768
[1] 000000019f756400
[2] 000000019f75b928
[3] null

From here, you can run !PrintException on the first one, then keep running !PrintException on each InnerException until you reach the last one. In this case, every chain ended at the same root exception:

0:038> !PrintException 000000019f72dd50
Exception object: 000000019f72dd50
Exception type: System.IO.FileLoadException
Message: Could not load file or assembly 'Microsoft.SharePoint.intl, Version=, Culture=neutral, PublicKeyToken=71e9bce111e9429c' or one of its dependencies. Either a required impersonation level was not provided, or the provided impersonation level is invalid. (Exception from HRESULT: 0x80070542)
InnerException: <none>
StackTrace (generated):
SP IP Function
0000000005D1CE90 0000000000000001 mscorlib_ni!System.Reflection.Assembly._nLoad
0000000005D1CE90 000007FEF77CAB91 mscorlib_ni!System.Reflection.Assembly.InternalLoad
0000000005D1CF20 000007FEF77CAD57 mscorlib_ni!System.Reflection.Assembly.InternalLoad
0000000005D1CF80 000007FEF7832704 mscorlib_ni!System.Reflection.Assembly.Load
0000000005D1CFC0 000007FF00327FCF Microsoft_SharePoint!Microsoft.SharePoint.CoreResource..cctor

StackTraceString: <none>
HResult: 80070542


The exception message makes it look like authentication isn't quite working, or that impersonation is not configured. If you run err.exe on the HResult, you get ERROR_BAD_IMPERSONATION_LEVEL. Since a process recycle clears the error, both explanations are unlikely. To make sure, we captured network traffic and confirmed that Kerberos and authentication were working.
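The err.exe lookup works because 0x80070542 is a Win32 error wrapped in an HRESULT. The high bit marks a failure, the facility field is 7 (FACILITY_WIN32), and the low 16 bits are the Win32 error code 0x542, which is 1346 (ERROR_BAD_IMPERSONATION_LEVEL). The sketch below shows that decoding. It uses the same 0x1FFF facility mask as the HRESULT_FACILITY macro in winerror.h. The one-entry lookup table is only for illustration and is not a full error list.

```python
# Sketch: split an HRESULT into its fields, the way winerror.h's
# HRESULT_SEVERITY / HRESULT_FACILITY / HRESULT_CODE macros do.

FACILITY_WIN32 = 7

# Illustrative one-entry table; err.exe knows the full set of codes.
WIN32_ERRORS = {1346: "ERROR_BAD_IMPERSONATION_LEVEL"}


def decode_hresult(hr: int) -> dict:
    return {
        "failure": bool(hr & 0x80000000),  # severity bit
        "facility": (hr >> 16) & 0x1FFF,   # same mask as HRESULT_FACILITY
        "code": hr & 0xFFFF,               # low 16 bits
    }


parts = decode_hresult(0x80070542)
print(parts)  # {'failure': True, 'facility': 7, 'code': 1346}
if parts["facility"] == FACILITY_WIN32:
    print(WIN32_ERRORS.get(parts["code"], "unknown"))  # ERROR_BAD_IMPERSONATION_LEVEL
```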

We then set up a live debug session, and !token indicated that the users browsing the site were running under impersonation tokens. We captured another memory dump while the process was in the bad state, ran IISReset, and captured a dump of the healthy process for comparison. This didn't shed much light on the problem. It only confirmed that the HttpContext objects were in a bad state, which we already knew. We were also unable to reproduce the issue by replaying the requests in the IIS logs as the users listed in the log. I then remembered that IIS writes a log entry only when a request completes. Since the requests in the memory dump had not completed, they would not be in the log file.
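Because requests that are still in flight never reach the IIS log, the dump is the only record of which request arrived first. One way to find it is to sort the in-flight requests by how long they have been running and put the longest first. The sketch below does this in Python rather than Excel. The rows (placeholder IDs, elapsed seconds, URLs) are invented for illustration. Real values would come from a per-request script run against the dump.

```python
# Hypothetical (context id, seconds running, URL) rows; real values would come
# from a script run against the dump, not from the IIS log.
contexts = [
    ("ctx-A", 412, "/sites/team/default.aspx"),
    ("ctx-B", 431, "/_vti_bin/Lists.asmx"),
    ("ctx-C", 398, "/sites/team/Pages/home.aspx"),
]

# The request that has been running the longest arrived first.
oldest_first = sorted(contexts, key=lambda row: row[1], reverse=True)
for ctx_id, seconds, url in oldest_first:
    print(f"{seconds:>5}s  {ctx_id}  {url}")
```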

I then sorted the HttpContext objects by execution time in Excel to see which requests came in first. You can use a script from Tess Ferrandez to get this type of data. This cracked the problem wide open: the first requests were always web service calls to Lists.asmx, and they were coming from BizTalk.

We then worked with the BizTalk team that owns the application making the requests, and we were able to reproduce the problem. We would recycle the application pool, the BizTalk team would send the request, and every request after that returned a blank HTML page. Since the BizTalk request triggered this behavior and no other request did, we looked at the config file for the BizTalk web service call. The call was made via WCF using a client configuration file, and the behaviors section of that file looked like this:

<behavior name="EndpointBehavior" />

Because this behavior does not specify clientCredentials, allowedImpersonationLevel defaults to Identification. This means the server can obtain the identity of the user but cannot impersonate the user, which lines up with the exception message from the memory dumps. We then changed the configuration to use Impersonation as the allowedImpersonationLevel, and confirmed that the white screens no longer occurred:

<behavior name="ImpersonationBehavior">
  <clientCredentials><windows allowedImpersonationLevel="Impersonation" /></clientCredentials>
</behavior>
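To catch this misconfiguration before it reaches a server, a client config can be checked for the impersonation level each endpoint behavior actually ends up with. The sketch below is a hypothetical check that uses Python's standard xml.etree.ElementTree. It assumes that an unspecified level means WCF's client default of Identification, and that behaviors appear as <behavior> elements with no XML namespace, as in a typical app.config. Service behaviors also use <behavior>, so a real check might want to restrict itself to endpointBehaviors.

```python
import xml.etree.ElementTree as ET

# WCF client default when <clientCredentials><windows> sets no level.
DEFAULT_LEVEL = "Identification"


def effective_levels(config_xml: str) -> dict:
    """Map each behavior name to its effective allowedImpersonationLevel."""
    root = ET.fromstring(config_xml.strip())
    levels = {}
    for behavior in root.iter("behavior"):
        windows = behavior.find("clientCredentials/windows")
        level = DEFAULT_LEVEL
        if windows is not None:
            level = windows.get("allowedImpersonationLevel", DEFAULT_LEVEL)
        levels[behavior.get("name")] = level
    return levels


CONFIG = """
<endpointBehaviors>
  <behavior name="EndpointBehavior" />
  <behavior name="ImpersonationBehavior">
    <clientCredentials>
      <windows allowedImpersonationLevel="Impersonation" />
    </clientCredentials>
  </behavior>
</endpointBehaviors>
"""

print(effective_levels(CONFIG))
# {'EndpointBehavior': 'Identification', 'ImpersonationBehavior': 'Impersonation'}
```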

MSDN Information on the allowedImpersonationLevel setting: