Steps to help mitigate Excessive Paging and Working Set trimming issues on Exchange 2007 servers
Previously, I wrote a blog post about excessive paging on Exchange 2007 servers due to a system-wide working set trimming problem at http://blogs.technet.com/mikelag/archive/2007/12/19/working-set-trimming.aspx. Over time, I have been asked to consult on many cases that involved either high memory usage or a server that was paging excessively, causing performance problems.
I've moved the recommendations here to make them easier to reference. Find below a listing of some of the top reasons why you may run into these issues.
Top Reasons for these issues
- The amount of Available RAM on the server reaches 64MB, causing the Operating System to issue a low memory notification (the state that applications can check with QueryMemoryResourceNotification). This in turn could cause a system-wide working set trim across ALL applications on the server
- Other Applications/Services consuming System Cache over time on the server could eventually lead to the low memory notification being triggered.
- Memory or handle leaks caused by other applications. This is not to say that Exchange is exempt from having its own memory leaks, but this is merely stating what I have found throughout troubleshooting these issues
- Servers that are not configured with sufficient RAM to support the number of roles and the current user load being generated
- Other applications being run directly on the Exchange Server leading to problems listed in issue #1.
Exchange Server Update Recommendations - Last updated (05-12-10)
Find below a list of recommendations to help alleviate the working set trimming issues that you may be running into on Exchange 2007 Mailbox servers.
Apply Windows 2003 SP2 along with the recommended hotfixes in http://support.microsoft.com/kb/935640 to get on the latest binaries
Apply latest Forefront roll-ups if applicable. See http://support.microsoft.com/kb/954941 for one such example
Make sure that the paging file is set correctly to the recommended settings. See http://technet.microsoft.com/en-us/library/aa996719.aspx
- If the server has 8GB of RAM or less, set the paging file to RAM times 1.5
- If the server has more than 8GB of RAM, set the paging file to RAM+10MB.
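The two sizing rules above amount to simple arithmetic. As a minimal sketch (the function name and the choice to work in whole megabytes are my own assumptions for illustration, not part of the TechNet guidance), the calculation could look like this:

```python
def recommended_paging_file_mb(ram_mb: int) -> int:
    """Return a paging file size in MB following the two rules above.

    Hypothetical helper: the name and the MB units are assumptions.
    """
    if ram_mb <= 0:
        raise ValueError("ram_mb must be a positive number of megabytes")
    eight_gb_in_mb = 8 * 1024
    if ram_mb <= eight_gb_in_mb:
        # 8GB of RAM or less: RAM times 1.5 (rounded down to a whole MB)
        return ram_mb * 3 // 2
    # More than 8GB of RAM: RAM + 10MB
    return ram_mb + 10


if __name__ == "__main__":
    for ram_gb in (4, 8, 16, 32):
        size_mb = recommended_paging_file_mb(ram_gb * 1024)
        print(f"{ram_gb}GB RAM -> {size_mb}MB paging file")
```

For example, an 8GB server works out to 8192 x 1.5 = 12288MB, while a 16GB server works out to 16384 + 10 = 16394MB.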
IMPORTANT!! If .NET 2.0 SP2 is installed and immediately followed by 959209, as listed on the SP download page, OOF and Free/Busy lookups will start to fail. 958934 discusses this issue, which is mitigated by applying 952883.
If an HP ILO Driver (CPQCIDRV.sys) is being used, ensure that the latest driver is installed. See http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00688313 for more information.
Update all network card drivers to the latest versions, as some versions (Broadcom) are known to call MmAllocateContiguousMemory excessively.
Install http://support.microsoft.com/kb/956572 as a replacement for 956341, 954337, 938486 and 953600 to get the latest kernel OS fixes, which help reduce the number of pages that are trimmed when trims happen. Important: This update introduces changes that allow tracing to be enabled to find the offending process or application attempting to request large blocks of memory. Steps to enable this tracing can be obtained by calling Microsoft Product Support Services. Note: You need to have the QFE versions installed and be on version 5.2.3790.4449 or later of the kernel files to use this tracing. This tracing is not available in the GDR releases.
Apply 905865 http://support.microsoft.com/kb/905865, which addresses the following issue: the sizes of the working sets of all the processes in a console session may be trimmed when you use Terminal Services (RDP) to log on to or log off from a computer that is running Windows Server 2003.
Apply 948496 http://support.microsoft.com/kb/948496 or disable the advanced TCP features of the Scalable Networking Pack, and then apply the Scalable Networking Pack rollup in http://support.microsoft.com/kb/950224.
Ensure Antivirus Exclusions are set appropriately per http://technet.microsoft.com/en-us/library/bb332342.aspx
Apply 940349 http://support.microsoft.com/kb/940349 if VSS backups are being taken, to help address known memory leak issues. Also apply 969219 and 967551 to get on the latest versions of the VSS/volsnap drivers.
If Symantec Altiris Agents are installed, make sure you are on the latest bits, as older versions have known handle leaks. See https://kb.altiris.com/ Article ID: 35454
The Backup Exec Remote Agent (Beremote.exe) may not release memory after backups are completed when GRT is enabled. See http://seer.entsupport.symantec.com/docs/289695.htm
Apply http://support.microsoft.com/kb/954903 to address known memory leak issues with SCOM and its agents. Also see http://support.microsoft.com/kb/891605 for a known issue with MOM and McAfee VirusScan Enterprise 8.0i. To address known issues with possible memory leaks or CPU spikes with the agents, apply the following additional two patches on Windows 2003 servers:
Quest ChangeAuditor for Exchange can take up excessive memory and processor time. For immediate relief, stop the Quest Compliance Agent service and then contact Quest to request the update for Solution SOL53092, which will be rolled into ChangeAuditor v4.9.
Add the /basevideo switch to the boot.ini file and switch the display driver to Standard VGA to help lower the memory pressure for contiguous memory below the 4GB memory mark. Note: Adding the /basevideo switch to the boot.ini file is not enough on its own; you must also switch the video driver to Standard VGA. This change can also help memory situations with other drivers, such as storport.sys, that make use of memory under the 4GB memory mark.
Apply http://support.microsoft.com/kb/970838 to help resolve a memory/handle leak when applications query performance counters via PDH on Windows 2008 servers.
If the above updates do not resolve the issue, find below some recommendations on where to go at this point.
Outlook Client Recommendations
Disable unneeded Outlook add-ins. Some add-ins have been known to increase overall session usage and to open unnecessary messages/folders on the Exchange Server. This in turn causes increased memory and connection usage, which could lead to 9646 events in the application log.
- One such example is the add-ins installed with Apple iTunes, which allow calendar syncing with iPod and iPhone devices. If you have iTunes installed and do not have any of these devices, I highly recommend disabling both the "Outlook Change Notifier" and the "iTunes Outlook Addin" under the COM Add-ins settings in Outlook to prevent this problem. <Real World Scenario> Disabling these add-ins on Outlook clients in one organization prevented the Exchange server from having to be rebooted every 2-3 weeks. </Real World Scenario>
Monitor applications on the server that perform buffered I/O and make use of the file cache, such as Content Indexing, to ensure that they are not causing excessive memory growth within the system cache, which is outside of the DBCache. Log shipping uses unbuffered I/O, so it does not make use of the system cache.
Check for any scheduled tasks that run on a periodic basis to see if they might be causing this problem. There have been reports that scheduled ScanMail jobs cause working set trims to occur. We have confirmed that ScanMail uses memory-mapped I/O with the files it creates to do its real-time scanning, so this will affect overall system cache sizes.
Manage Exchange 2007 servers remotely and not locally on the server itself via an RDP session. Meaning, install the Management tools on your workstation and remotely administer the server. The reason for this is that the Management tools take a considerable amount of memory to load, sometimes 100-300MB, and in low memory conditions, this could trigger a system wide working set trim when the EMC is launched.
If VSS snapshots are being taken from the Active Node in a CCR configuration, try switching to backing up the passive node instead. Moving the backups to the passive node is a recommended best practice and will also conserve Memory/CPU on the server. As always, if you must run backups on the Active Node, ensure that the backup process does not overlap your Online Maintenance processes. The one caveat with moving the backups to the passive node is that the active node DB's integrity is no longer checked. You can overcome this by setting a registry key (Online Maintenance Checksum) that is mentioned in http://technet.microsoft.com/en-us/library/bb676454(EXCHG.80).aspx.
Stagger online maintenance schedules for each mailbox store to limit the amount of overlap. Online maintenance is aggressive in nature and touches a lot of pages in memory to perform its work. This increases overall memory usage, and you could see a 4GB swing upward in memory during this time. Once online maintenance finishes, the overall memory usage should return to normal. It is important to watch the Available Memory on an Exchange server because if it gets close to the 64MB available memory mark, a system-wide memory trim could occur, causing excessive paging and huge performance hits. The operating system has a built-in low memory indicator to signal that memory needs to be released from the working sets of processes running on the server.
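To make the arithmetic concrete, here is a rough sketch of the headroom check described above. The function names, the parameters, and the idea of subtracting an expected maintenance swing from the current available memory are assumptions for illustration only. The 64MB mark and the 4GB swing are the figures quoted in this post, and your own servers may behave differently.

```python
LOW_MEMORY_MARK_MB = 64  # available-memory mark quoted in this post


def headroom_after_swing_mb(available_mb: int, expected_swing_mb: int = 4096) -> int:
    """Estimate the MB left above the low memory mark after a memory swing.

    Hypothetical helper: assumes the swing comes straight out of available memory.
    """
    return available_mb - expected_swing_mb - LOW_MEMORY_MARK_MB


def at_risk_of_trim(available_mb: int, expected_swing_mb: int = 4096) -> bool:
    """Return True when the estimated headroom is zero or negative."""
    return headroom_after_swing_mb(available_mb, expected_swing_mb) <= 0


if __name__ == "__main__":
    for available in (4100, 6000, 12000):
        print(available, "MB available -> at risk:", at_risk_of_trim(available))
```

A server with 6000MB available before online maintenance starts would still have an estimated 1840MB of headroom after a 4GB swing, whereas one with 4100MB available would not.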
One additional piece in this area is to check whether your OLD (online defragmentation) window can be decreased from the default values, which is the case if your Read:Freed ratio is 100:1 or greater. With SP1 installed, OLD does not need to run to completion every night; it can be spread across multiple days or weeks as long as your Read:Freed ratio is within certain values. This ratio can be calculated by viewing MSExchangeDatabase -> Online Defrag Pages Freed/sec and MSExchangeDatabase -> Online Defrag Pages Read/sec in Performance Monitor. See http://msexchangeteam.com/archive/2007/12/06/447695.aspx for more information.
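As a small worked example of the ratio above, the sketch below divides the Pages Read/sec value by the Pages Freed/sec value and compares the result to the 100:1 mark. The function names and the handling of a zero "freed" value are my own assumptions. In practice you would feed in averages collected from Performance Monitor over the maintenance window.

```python
def read_to_freed_ratio(pages_read_per_sec: float, pages_freed_per_sec: float) -> float:
    """Return pages read per page freed (hypothetical helper)."""
    if pages_read_per_sec < 0 or pages_freed_per_sec < 0:
        raise ValueError("counter values cannot be negative")
    if pages_freed_per_sec == 0:
        # Assumption: pages read with nothing freed counts as an unbounded ratio,
        # and no activity at all counts as a ratio of zero.
        return float("inf") if pages_read_per_sec > 0 else 0.0
    return pages_read_per_sec / pages_freed_per_sec


def old_window_can_decrease(pages_read_per_sec: float,
                            pages_freed_per_sec: float,
                            threshold: float = 100.0) -> bool:
    """Return True when the Read:Freed ratio is at or above the threshold."""
    return read_to_freed_ratio(pages_read_per_sec, pages_freed_per_sec) >= threshold


if __name__ == "__main__":
    print(read_to_freed_ratio(5000, 25))      # 200 pages read per page freed
    print(old_window_can_decrease(5000, 25))  # ratio is above 100:1
    print(old_window_can_decrease(1200, 40))  # ratio is 30:1
```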
Monitor overall System I/O and Database I/O on the server in relation to the number of hard page faults (Transition pages repurposed/sec) that are occurring. If the amount of I/O that the databases are incurring is causing hard page faults, the DBCache size will grow. If the databases are doing less I/O and not incurring hard page faults, then the size of the DBCache should decrease over time.
If System Cache is consuming all available memory on a server, causing a system-wide working set trim, there are 2 ways to overcome this:
- Use the Sysinternals utility CacheSet to set a maximum on the File Cache. Steps to set this are listed in http://blogs.technet.com/mikelag/archive/2008/07/18/how-to-set-system-cache-upon-startup-of-a-windows-2003-server.aspx. Why too much cache can be an issue is discussed in greater detail at http://blogs.msdn.com/ntdebugging/archive/2007/11/27/too-much-cache.aspx.
- Use the Windows Dynamic Cache service that is mentioned in the post at http://blogs.msdn.com/ntdebugging/archive/2009/02/06/microsoft-windows-dynamic-cache-service.aspx. This is essentially a service that can automatically trim back the System Cache if it gets above certain thresholds.
Large file copies (for example, copying .bkf files to remote drives) are also known to cause growth in the system cache, as shown in http://support.microsoft.com/kb/920739. Depending on the file copy utility used, it is important to understand whether it uses buffered or unbuffered I/O. Unbuffered I/O is what you want to strive for since it does not affect the system cache. If you fall into this category, use the CacheSet method listed above, or use Eseutil.exe to perform unbuffered file copies to work around this problem.
Temporarily disable Online Maintenance Database Scanning (Checksumming or ESE-Zeroing) to see if the problem is corrected. See http://technet.microsoft.com/en-us/library/bb676537(EXCHG.80).aspx.
Upgrade the server to Windows 2008, as memory management has been redesigned to handle these types of situations much better. Since there is no upgrade method, this would essentially be a migration to another Exchange 2007 server running on Windows 2008.
Setting DBCache Recommendations (Updated 08/29/09)
There are extreme cases where setting DBCache makes sense when other applications on the server are competing for memory resources. This should only be done after exhaustive troubleshooting and under the supervision of Customer Support Services (CSS). Exchange is very good at handling its memory utilization, but other applications put a strain on Exchange’s Dynamic Buffer Allocation that could cause Exchange to not shrink its DBCache quickly enough to prevent a working set trim operation.
If you need to set this for any reason, you can follow the steps in http://technet.microsoft.com/en-us/library/bb691304.aspx. Important: There have been cases where the value listed in the TechNet documentation has been put on Exchange servers with negative performance effects as a direct result, since the value did not fit their Exchange server configuration. It is very important that you calculate the amount of required DBCache following the recommendation in the documentation, based on user load/profile and overall RAM on the server. You can look at Planning Memory Configurations to help calculate this. If you want a base to start from without calculating the value, you can start by setting DBCache to 90% of overall RAM. You can then step down in 10% increments to 70%. If you are still having problems at 70%, you need to call Support Services for assistance to help you understand what other processes are causing memory problems with Exchange.
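The 90% starting point and the 10% steps down to 70% can be laid out ahead of time. The sketch below is only an illustration: the function name and parameters are hypothetical, the results are in whole megabytes rounded down, and it does not convert them into whatever unit the TechNet procedure expects, so follow that article for the actual setting.

```python
def dbcache_steps_mb(total_ram_mb: int,
                     start_pct: int = 90,
                     floor_pct: int = 70,
                     step_pct: int = 10) -> list[tuple[int, int]]:
    """List (percentage, size in MB) pairs from start_pct down to floor_pct.

    Hypothetical helper for planning only; sizes are rounded down to a whole MB.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be a positive number of megabytes")
    return [
        (pct, total_ram_mb * pct // 100)
        for pct in range(start_pct, floor_pct - 1, -step_pct)
    ]


if __name__ == "__main__":
    # Example: a server with 32GB (32768MB) of RAM
    for pct, size_mb in dbcache_steps_mb(32 * 1024):
        print(f"{pct}% of RAM -> {size_mb}MB")
```

On a 32GB server this gives 29491MB at 90%, 26214MB at 80%, and 22937MB at 70%.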