The Case of the Low Hanging Filter Driver Fruit
Written By Jeff Dailey:
Not all our cases are crashes, leaks, or high CPU. Sometimes the problems we are faced with are purely a question of why a given application runs slow on a particular version of Windows versus another version of windows. In other cases an application may just start running slow for no reason. OK, not likely. There is ALWAYS SOME RESASON. Something changed! In this case, the customer reported that an application started running slow when booted into “Normal Mode”, but when the OS was booted in safe mode, the application would run fast. In this particular case the customer reported that a given operation went from taking just a few seconds (safe mode) to several minutes (normal mode). Further research found that the problem was related to accessing the registry and registry performance in general. At this point I’m already thinking, “Registry Access?” and “Safe Mode”. What could affect registry access that does not run in safe mode? Well lots of services DO NOT start in safe mode. What kind of services could affect registry calls? Antivirus? Maybe… Let’s look deeper.
One of the first things I typically do in such cases is to ask for a kernrate log of the slow and fast scenario. http://download.microsoft.com/download/8/e/c/8ec3a7d8-05b4-440a-a71e-ca3ee25fe057/rktools.exe Kernrate is a sampling profiler. It basically checks the location of the instruction pointer at regular intervals and stores the results in a hash table. We can then get a breakdown of the %time spent in each module that is executing. Even better you can zoon in to each module. Zooming in shows utilization at a function level within the module and requires symbols to be present in a flat symbol directory on the machine being profiled. I recommend downloading the symbol pack for this (http://www.microsoft.com/whdc/devtools/debugging/symbolpkg.mspx) or use symchk.exe (included in the debugging tools) to download the symbols. We’ll talk more about symbols and symchk.exe in an upcoming post.
In a lot of cases kernrate data is only a starting point. We will find some code that is running a lot longer in one case verses another and that in turn requires a follow up code review and multiple debugs to further isolate the problem. This case however was different. The following is output from beyond compare that shows a comparison between the module execution time in kernel. The slow test run is on the right, and the fast test run is on the left. Keeping in mind that I was looking for something different between safe mode and normal mode, I simply started by looking at the modules listed on the slow side (Right) there were not on the fast side (Left). What was loaded during the normal run that was not a factor during the safe mode run. Right away FILTERDRVXYZ just above the HAL jumped off the page. (Some names were changed to protect the innocent. J) I did a http://www.live.com search to find out what this driver was. It was a file system filter driver for an antivirus program.
To understand why a filter driver can slow things down you need to understand that a filter driver basically installs itself between the application and the underlying file system and intercepts nearly every call being made to the underlying I/O subsystem. Most of the time filter drivers are just fine and run with no problems at all. However, what do they do? An antivirus filter driver may have several tasks that it needs to do for each I/O. This may involve looking up information about a process or file when it’s being accessed. If the data source associated with this information changes it can change the amount of time that these operations take. The interesting thing is that the timing change may only be a hundred or so extra milliseconds per call to the registry or file system. Under most cases a user would not even notice this. If however you have some aspect of your application needs to make many thousands of calls and they each take 250+ milliseconds vs. 10 milliseconds, all of a sudden the problem compounds. The classic example of this is a VIRUS PROFILE gets uploaded or pushed to servers and all of a sudden the customer starts seeing performance problems because the data set or comparison for the filter driver changes. In this case I simply recommended that the customer remove the suspect filter driver to see if it was the root of the problem. IT WAS… They contacted the product vendor to get an update and the case was resolved.
Good luck and happy debugging….