ASP.NET Case Study: Lost session variables and appdomain recycles

Last night I got a question from one of the readers of the blog that went like this:

“We are facing a problem that i cannot understand, every now and than i see that my app domain is recycled (i have a log in the application_end), I check the IIS logs and i don't see a restart of IIS and i know that no one is changing configuration (web.config).

I wanted to know if you know of any way that i can pinpoint the reason for that app domain to die?

The application pool that i am using only have the recycle every 20 minutes of idle time enabled”

…and since I haven’t written for a while (due to a really nice and long vacation :)) and this is a pretty common scenario, I thought I would write a post on it…

Before we go into the details of how to figure out why it is recycling, I want to bring up two things:

  1. What happens when an application domain is recycled
  2. What are the reasons an application domain recycles

What happens when an application domain is recycled?

In ASP.NET each individual asp.net application resides in its own application domain, so for example if you have the following website structure:

WebSite root
          /HrWeb
          /EmployeeServices
          /FinanceWeb
          /SalesWeb

…where each of the subwebs HrWeb, EmployeeServices etc. is set up as an application in the Internet Services Manager, you will have the following application domains (appdomains) in your asp.net process:

          System Domain
          Shared Domain
          Default Domain
          Root
          HrWeb
          EmployeeServices
          FinanceWeb
          SalesWeb

Apart from the first three domains (System, Shared and Default), which are a bit special, each of the other ones contains the data pertinent to that application (Note: this is a bit simplified for readability). Specifically, they contain these things worth noting:

  1. All the assemblies specific to that particular application
  2. An HttpRuntime object
  3. A Cache object

         

When the application domain is unloaded all of this goes away, which means that on the next request all assemblies need to be reloaded, the code has to be re-jitted, and the cache, including any in-proc session variables, is empty. This can be a pretty big perf hit for the application, so as you can imagine it is important not to have the application domain recycle too often.

Why does an application domain recycle?

An application domain will unload when any one of the following occurs:

  • Machine.Config, Web.Config or Global.asax is modified
  • The bin directory or its contents are modified
  • The number of re-compilations (aspx, ascx or asax) exceeds the limit specified by the <compilation numRecompilesBeforeAppRestart=/> setting in machine.config or web.config (by default this is set to 15)
  • The physical path of the virtual directory is modified
  • The CAS policy is modified
  • The web service is restarted
  • (2.0 only) Application Sub-Directories are deleted (see Todd’s blog http://blogs.msdn.com/toddca/archive/2006/07/17/668412.aspx for more info)

There may be some reasons in 2.0 that I have missed but hopefully this should cover most scenarios.

Specific issues

I want to pay a bit more attention to a few of these, which seem to be especially popular :)

Unexpected config or bin directory changes

You swear on all that is holy that no-one is touching these, but still when we start logging (as I’ll show later) the reason for the app domain recycle is a config change… how the heck can that be?

Elementary, Dr. Watson… something else is touching them… and that something else is usually virus scanning software, backup software or an indexing service. They don’t actually modify the contents of the files, but many virus scanners etc. will modify attributes of files, which is enough for the file change monitor to jump in and say “aha! something changed, better recycle the appdomain to pick up the changes”.

 

If you have a virus scanner that does this, you should probably consider excluding the content directories from the real-time scan, of course after carefully making sure that no-one can access these directories and add a virus to them.

Web site updates while the web server is under moderate to heavy load

Picture this scenario: You have an application with 10 assemblies in the bin directory a.dll, b.dll, c.dll etc. (all with the version number 1.00.00). Now you need to update some of the assemblies to your new and improved version 1.00.12, and you do so while the application is still under heavy load because we have this great feature allowing you to update assemblies on the go… well, think again...

Say you update 7 of the 10 assemblies and for simplicity let’s say this takes about 7 seconds, and in those 7 seconds you have 3 requests come in… then you may have a situation that looks something like this…

Sec 1. a.dll and b.dll are updated to v 1.00.12 - appdomain unload started (any pending requests will finish before it is completely unloaded)
Sec 2. Request1 comes in and loads a new appdomain with 2 out of 7 of the dlls updated
Sec 3. c.dll is updated - appdomain unload started (any pending requests will finish before it is completely unloaded)
Sec 4. d.dll is updated
Sec 5. Request2 comes in and loads a new appdomain, now with 4 out of 7 dlls updated
Sec 6. e.dll and f.dll are updated - appdomain unload started (any pending requests will finish before it is completely unloaded)
Sec 7. g.dll is updated
Sec 8. Request3 comes in and loads a new appdomain with all 7 dlls updated

So, many bad things happened here…

First off, you had 3 application domain restarts while you probably thought you would only have one, because asp.net has no way of knowing when you are done. Secondly, Request1 and Request2 were executing with partially updated dlls, which may generate a whole new set of exceptions if the updated dlls depend on changes in dlls that have not been updated yet… I think you get the picture. And thirdly, you may get exceptions like “Cannot access file AssemblyName because it is being used by another process” because the dlls are locked during shadow copying (see http://support.microsoft.com/kb/810281).

In other words, don’t batch update during load…

So, is this feature completely worthless? No… if you want to update one dll, none of the problems above occur… and if you update under low or no load you are not likely to run into any of the above issues, so in that case you save yourself an IIS restart… but if you want to update in bulk you should first take the application offline.

There is a way to get around it, if you absolutely, positively need to update under load, and it is outlined in the kb article mentioned above…

In 1.1 we introduced two new config settings called <httpRuntime waitChangeNotification= /> and <httpRuntime maxWaitChangeNotification= />.

The waitChangeNotification indicates how many seconds we should wait for a new change notification before the next request triggers an appdomain restart. I.e. if a dll is updated at second 1 and another one at second 3, and our waitChangeNotification is set to 5, we would wait until second 8 (first 1+5, then moved to 3+5) before a new request would get a new domain, so a request at second 2 would simply continue using the old domain. (The time is sliding, so it is always 5 seconds from the last change.)

The maxWaitChangeNotification indicates the maximum number of seconds to wait from the first change notification. If we set this to 10 in the case where we update at second 1 and 3, a request would still get a new domain at second 8, since the waitChangeNotification has expired by then. If we set it to 6, however, a request would get a new domain already at second 7 (1+6), since the maxWaitChangeNotification has then expired. So this is an absolute expiration rather than a sliding one, and we will recycle at the earlier of the maxWaitChangeNotification and waitChangeNotification expirations.

In the scenario at the beginning of this section we could have set the waitChangeNotification to 3 seconds and the maxWaitChangeNotification to 10 seconds for example to avoid the problems.

(I know this explanation might have been a bit confusing but I hope you catch the drift)

A few things are important if you fiddle with these settings:

  1. They default to 0 if not set
  2. maxWaitChangeNotification should always be >= waitChangeNotification
  3. If these settings are higher than 0 you will not see any changes until the change notification waits expire, i.e. web.config changes, dll changes etc. will appear cached.
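
As a rough sketch, the example values mentioned above for the batch-update scenario (3 and 10 seconds) could be set in the application's web.config like this. The numbers are only the illustrative values from this post, not a general recommendation, so adjust them to how long your own deployments actually take.

```xml
<configuration>
  <system.web>
    <!-- Wait 3 seconds after the LAST file change (sliding), but never more
         than 10 seconds after the FIRST change (absolute), before a new
         request gets a new appdomain. Example values only. -->
    <httpRuntime waitChangeNotification="3" maxWaitChangeNotification="10" />
  </system.web>
</configuration>
```

Keep in mind that with these values in place, a single web.config or dll change will also take a few seconds to show up.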

Re-compilations

A common scenario here is that you have a set of aspx pages (containing some news items and what not) and you have a content editor that goes in periodically and updates the news with some new articles or other new content. Every time you update an aspx page it has to be recompiled, because again, asp.net has no way of knowing if it was a code update or just an update of some static text… all it knows is that someone updated the files.

If you have followed some of my previous posts you know that assemblies cannot be unloaded unless the application domain is unloaded, and since each recompile generates a new assembly there is a limit to how many recompiles you can do, to avoid generating too many assemblies (and thus to limit the memory used for these). By default this limit is 15.

If the contents of the page are constantly updated, I would recommend getting the content dynamically from a database or file rather than actually modifying the aspx pages, or alternatively using frames with HTML pages for this content.
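
For reference, the recompile limit discussed above lives on the <compilation> element. The fragment below is a minimal sketch that simply spells out the default of 15. Whether to raise it is a judgment call, since a higher value trades fewer restarts for more assemblies kept in memory.

```xml
<configuration>
  <system.web>
    <!-- Number of dynamic recompiles (aspx, ascx, asax) allowed before the
         appdomain is restarted. 15 is the default described in this post. -->
    <compilation numRecompilesBeforeAppRestart="15" />
  </system.web>
</configuration>
```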

How do you determine that you have application recycles?

If you experience cache or session loss, it is probably a good bet, but to make sure you can look at the perfmon counter ASP.NET v…/Application Restarts.

How do you determine what caused an appdomain restart?

In ASP.NET 2.0 you can use the built-in Health Monitoring Events to log application restarts along with the reason for the restart. To do this you change the master web.config file in the C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\CONFIG directory and add the following to the <healthMonitoring><rules> section:

                <add name="Application Lifetime Events Default" eventName="Application Lifetime Events"

                    provider="EventLogProvider" profile="Default" minInstances="1"