How fragmentation on incorrectly formatted NTFS volumes affects Exchange
Recently we have been seeing some gnarly performance issues in Exchange 2007, along with an added splash of database operation failures. That doesn’t sound enticing at all, but this blog post discusses what these issues are and how to resolve them. The post is targeted mainly at Exchange 2007, but the same methodology also applies to Exchange 2010, which is where the original problem was seen.
Before going into this, here is a highlight of some of the issues that you may see:
- Databases failing with an Out of Memory condition
- Extremely slow log replay times on CCR/SCR replica copies (High replay queue lengths)
- High number of split I/Os occurring on any given LUN/volume
- Slowly rising RPC requests until the Information Store service goes unresponsive
Here are some examples of the out of memory condition that would be written to the application log on the affected Exchange server.
Event Type: Error
Event Source: MSExchangeIS
Event Category: None
Event ID: 1160
Database resource failure error Out of memory occurred in function JTAB_BASE::EcUpdate while accessing the database "CCRName\SGName".
Windows 2003-based error
Event Type: Error
Event Source: ESE
Event Category: General
Event ID: 482
MSExchangeIS (9228) DBName: An attempt to write to the file "F:\Data\DBName.edb" at offset 530157682688 (0x0000007b6fdc4000) for 8192 (0x00002000) bytes failed after 0 seconds with system error 1450 (0x000005aa): "Insufficient system resources exist to complete the requested service. ". The write operation will fail with error -1011 (0xfffffc0d). If this error persists then the file may be damaged and may need to be restored from a previous backup.
Windows 2008-based error
Log Name: Application
Event ID: 482
Task Category: General
Information Store (8580) DBName: An attempt to write to the file "F:\Data\DBName.EDB" at offset 315530739712 (0x0000004977190000) for 32768 (0x00008000) bytes failed after 0 seconds with system error 665 (0x00000299): "The requested operation could not be completed due to a file system limitation ". The write operation will fail with error -1022 (0xfffffc02). If this error persists then the file may be damaged and may need to be restored from a previous backup.
So just what is this “Insufficient system resources exist to complete the requested service” error? The explanation will come later…
Here is an example of very high split I/O operations (purple line) leading up to high RPC requests (green line) until the server went unresponsive. In this case, we were trying to extend the size of the database and couldn’t because of the underlying cause, which I will explain shortly.
Another clear sign that you might be running into this problem is when all I/O requests for that particular database instance go to zero while RPC requests continue to climb and version buckets plateau.
This particular problem is not an obvious one and requires a few levels of explanation of what is going on, plus a little bit of terminology to get you going. At the lowest layer, an Exchange database resides on an NTFS partition which is set up when the server is first configured. This initial setup has some specific guidelines around how to properly partition and format the volumes, referenced in http://technet.microsoft.com/en-us/library/bb738145(EXCHG.80).aspx for Exchange 2007 and http://technet.microsoft.com/en-us/library/ee832792.aspx for Exchange 2010. The two most important factors are proper partition alignment and NTFS allocation unit size.
Below is a table of recommendations for use with Exchange.
| Setting | Recommendation |
| --- | --- |
| Storage track boundary | 64K or greater (1MB recommended) |
| NTFS allocation unit/cluster size | 64KB (DB and log drives) |
| RAID stripe size | 256KB or greater; check with your storage vendor for best practices |
NTFS allocation unit size
Before we go into discussing this area, we need to take a step back and look at how NTFS operates. This is where you need to do a little homework by reading the following two references:
- Read the first two sections (NTFS Architecture and NTFS Physical Structure) of How NTFS Works in http://technet.microsoft.com/en-us/library/cc781134(WS.10).aspx
- Read The Four Stages of NTFS File Growth in http://blogs.technet.com/b/askcore/archive/2009/10/16/the-four-stages-of-ntfs-file-growth.aspx
Now that we have gone over the basic concept of what a File Attribute List (ATTRIBUTE_LIST) is and how files are actually stored on disk, we can continue with why this is so important here. Let’s say we have a disk formatted with a file allocation unit size of 4K (4096 bytes), which is the default in Windows 2003 for any partition greater than 2GB in size. With Exchange 2007’s ESE page size of 8K, we need two writes for a single page. These writes may or may not be contiguous and could spread data across various sections of the disk, and this is where fragmentation can begin for larger files. As the database files grow and fragment, the File Attribute List (FAL) moves outside of the MFT and continually grows to track the additional fragments.
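To make the arithmetic concrete, here is a minimal Python sketch of the relationship just described; the page and cluster sizes are the ones from the paragraph above, and the function is purely illustrative:

```python
# Sketch: how many cluster-sized allocations a single ESE page write touches.
ESE_PAGE_SIZE = 8 * 1024        # Exchange 2007 ESE page size (8K)
DEFAULT_CLUSTER = 4 * 1024      # Windows 2003 default for partitions > 2GB
RECOMMENDED_CLUSTER = 64 * 1024 # recommended for Exchange DB/log volumes

def clusters_per_page(page_size, cluster_size):
    """Number of clusters one page-sized write spans (ceiling division)."""
    return -(-page_size // cluster_size)

print(clusters_per_page(ESE_PAGE_SIZE, DEFAULT_CLUSTER))      # 2 writes per page
print(clusters_per_page(ESE_PAGE_SIZE, RECOMMENDED_CLUSTER))  # 1 write per page
```

With 4K clusters each 8K page spans two clusters that may land in different places on disk; with the recommended 64K allocation unit a page always fits inside a single cluster.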
NTFS does have its limitations on the overall size of this attribute list per file, and problems can start at roughly 1.5 million fragments. This is not an absolute maximum, but it is around the point where problems can occur. The FAL never shrinks; it only keeps growing over time. The maximum supported size of the ATTRIBUTE_LIST is 256KB (262,144 bytes). If you were to reach this upper limit, you could no longer expand the size of your database, and along the way we would be doing many more small I/O operations and a lot more seeking around the drive to find the data we are looking for. This is where the “out of memory” error comes from, along with the “Insufficient system resources exist to complete the requested service” error. Once the absolute maximum has been reached, file management APIs start failing with ERROR_FILE_SYSTEM_LIMITATION on Windows 2008 or later and ERROR_INSUFFICIENT_RESOURCES on earlier Windows versions. The out-of-memory error is a much higher-level error that bubbled up because NTFS could no longer increase the size of the FAL. This is why it is not an obvious error; it was ultimately found by Eric Norberg troubleshooting over many tireless nights and through long debugging sessions by EE extraordinaire Dave Goldman.
This fragmentation issue is actually referenced in the following article:
A heavily fragmented file in an NTFS volume may not grow beyond a certain size
This scenario is seen more on servers with smaller NTFS cluster sizes such as 4K, large databases that are two times the recommended 200GB maximum, and low available disk space. The combination of those three variables can get you into a very bad situation.
NTFS cluster sizes can be obtained by running the fsutil command for any given partition, for example:
C:\>fsutil fsinfo ntfsinfo f:
The “Bytes Per Cluster” value in the output is the NTFS cluster (allocation unit) size.
In Exchange 2007, you can check whether you are running into this issue by downloading and running Contig.exe from Sysinternals at http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx
C:\>Contig.exe -a f:\data\DBName.edb
Contig v1.55 - Makes files contiguous
Copyright (C) 1998-2007 Mark Russinovich
Sysinternals - www.sysinternals.com
f:\data\DBName.edb is in 1.46698e+006 fragments
Number of files processed : 1
Average fragmentation : 1.46698e+006 frags/file
In the above example, we are extremely close to the approximate maximum of 1.5 million fragments that any given file can have. This particular database will eventually become problematic; it is a ticking time bomb.
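If you monitor many servers, the contig.exe summary line can be checked programmatically. This is a minimal sketch, assuming the output format shown above; the ~1.5 million figure is the approximate danger zone from the discussion, not a hard limit:

```python
import re

# Approximate fragment count where FAL exhaustion problems begin (not a hard limit).
FRAGMENT_DANGER_ZONE = 1_500_000

def parse_contig_fragments(output):
    """Pull the fragment count out of 'Contig.exe -a' output.

    contig prints counts in scientific notation (e.g. '1.46698e+006 fragments'),
    so we parse via float. Returns the count as an int, or None if not found.
    """
    match = re.search(r"is in ([\d.e+]+) fragments", output)
    return int(float(match.group(1))) if match else None

sample = r"f:\data\DBName.edb is in 1.46698e+006 fragments"
frags = parse_contig_fragments(sample)
print(frags)                                # 1466980
print(frags >= 0.9 * FRAGMENT_DANGER_ZONE)  # True -- time to act
```

Anything within striking distance of the danger zone is worth remediating before the database can no longer grow.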
For Exchange 2010 SP1, you can dump the same type of information using eseutil.exe, as shown below.
C:\>eseutil /ms f:\data\DBName.edb
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
Error: Access to source database 'f:\data\DBName.edb' failed with Jet error -1032.
File Name: f:\data\DBName.edb
Volume Name: Drive2
File System: NTFS
Cluster Size: 4096 bytes
Attribute List Size: 180 KB
Extents Enumerated: 1157172
Operation terminated with error -1032 (JET_errFileAccessDenied, Cannot access file, the file is locked or in use) after 0.78 seconds.
Even though the command errors out because the database is online, we are still able to obtain similar data. When run locally on the server, eseutil lets you look at the actual FAL size, the NTFS cluster size, and how many extents have been created for that file due to excessive fragmentation. From the output above we can deduce that the NTFS cluster size is 4KB, the FAL size is 180KB, and the extents enumerated are over 1.1 million fragments. A general rule of thumb is to keep the FAL size under 150KB and to have sufficient available disk space.
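The same rule of thumb can be expressed as a simple health check against the eseutil /ms fields. The thresholds come from the discussion above; the function name and warning strings are illustrative assumptions:

```python
# Rule-of-thumb thresholds from the discussion above.
FAL_WARN_BYTES = 150 * 1024       # keep the FAL under ~150KB
FAL_MAX_BYTES = 256 * 1024        # absolute ATTRIBUTE_LIST maximum
FRAGMENT_DANGER_ZONE = 1_500_000  # approximate fragment ceiling

def assess_database(fal_size_bytes, extents):
    """Return a list of warnings based on FAL size and enumerated extents."""
    warnings = []
    if fal_size_bytes >= FAL_MAX_BYTES:
        warnings.append("FAL at maximum -- the file can no longer grow")
    elif fal_size_bytes > FAL_WARN_BYTES:
        warnings.append("FAL over the 150KB rule of thumb -- plan remediation")
    if extents >= FRAGMENT_DANGER_ZONE:
        warnings.append("fragment count in the danger zone")
    return warnings

# Values from the eseutil sample above: 180KB FAL, ~1.16 million extents.
print(assess_database(180 * 1024, 1_157_172))
```

The sample database trips the FAL rule of thumb even though its extent count is still below the danger zone, which is exactly the situation where acting early is cheap.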
This fragmentation is also seen on CCR/SCR replica copies as the log files are shipped and then replayed into the database. The end result is that log replay slows to a crawl, and you can see very high replay queue lengths due to excessive split I/Os. Even with the fastest disks, improperly configured NTFS cluster sizes and disk alignment will still expose you to this problem. You must fix the root cause to successfully resolve this issue.
So how do you mitigate this? Well, there are various ways to do this…
- If you determine that only a single database is affected by this issue, the quickest mitigation method to get you back in business is the following:
- Dismount the database
- Make a copy of the database to another drive with sufficient space. IMPORTANT: This cannot be on the same drive, as we need to write the file out contiguously to another drive. The mere act of copying the file defragments it for you.
- Delete the original copy of the database file
- Copy the database back to the original location
- Using this method does not resolve the issue long term if the NTFS cluster sizes are too small. It is only meant as a stopgap to buy you some time while you work on the long-term fix.
- If on a CCR/SCR cluster, you have some options to fix this longer term.
To fix the NTFS cluster size on the non-active node or SCR target for any particular volume, such as F:, use the following command to format the disk with a 64KB allocation unit size, which is the recommended value for optimal performance.
Format F: /q /y /fs:ntfs /v:VolumeName /a:64K
NOTE: This command wipes out any files that currently reside on the F: drive, so make sure that no other files or applications reside on this drive other than the database and log files. I would hope that you are dedicating these drives exclusively to Exchange and not sharing them with any other applications. Exclusivity is what makes recovering from this much easier.
Verify that the disk was formatted properly by running fsutil again and checking the “Bytes Per Cluster” value:
C:\>fsutil fsinfo ntfsinfo f:
Once the disk has been reformatted, go ahead and reseed the databases that previously existed on the drive.
You may ask yourself: if the file is so fragmented, why can I not simply do an offline defrag of the file? The answer is that defragmenting the file is very likely to bloat the FAL even further, since moving the fragments around causes the FAL to grow. This is the primary reason why we do not recommend running defrag on volumes that host Exchange database files. The only way to remove the attribute list for a file is to completely copy the file off to another drive, delete the original copy, and then copy the file back to the original location. When this is done, the file is written to disk contiguously, leaving virtually no fragments. Life is good once again.
Once you have resolved these underlying issues, overall Exchange performance should be that much better and you can sleep better at night knowing you have increased throughput on your Exchange servers.
Note that it is still not recommended to run disk defragmentation software on Exchange server volumes, but there are times when file-level fragmentation can cause significant performance problems on a server merely because of the way data is written to the disk. If optimal or recommended settings are not used when creating the volumes, this file fragmentation issue can occur much more quickly. The majority of Exchange files are in use, so running regular disk defragmentation programs on the server will not help with this situation. If necessary, the only way to resolve this is to take all Exchange resources offline, ensuring none of the files are in use, and then defragment the disk to make the files contiguous once again.
In Exchange 2010 SP1 or later, logic was added to detect when the FAL is close to being exhausted (80% of the maximum) and to log an event accordingly; there is no NTFS event for this behavior. The following event is an example of what would be logged for a problematic database during online maintenance.
Log Name: Application
Event ID: 739
Task Category: General
Information Store (5652) EXSERVER MBX Store 001: The NTFS file attributes size for database 'C:\DB\DB001\PRIV001.EDB' is 243136 bytes, which exceeds the threshold of 204800 bytes. The database file must be reseeded or restored from a copy or backup to prevent the database file from being unable to grow because of a file system limitation.
Update (3/8/2011): Exchange 2007 SP3 RU3 now has a fix, referenced in http://support.microsoft.com/kb/2498066, that increases the default extent size from 8MB to 64MB, similar to Exchange 2010. Increasing the extent size helps reduce the number of fragments created for any given database. The 739 event has also been added so that monitoring software can alert on potential problems.
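Rough arithmetic shows why the larger extent size helps. Assuming the worst case, where every extent-sized growth lands somewhere else on disk and becomes its own fragment:

```python
# Worst-case growth allocations for a database: each extent-sized growth can
# land in a different spot on disk, so fewer, larger extents mean fewer
# potential fragments. All sizes in bytes.
GB = 1024 ** 3
MB = 1024 ** 2

def worst_case_extents(db_size_bytes, extent_bytes):
    """Upper bound on growth allocations for a database of this size."""
    return -(-db_size_bytes // extent_bytes)  # ceiling division

db_size = 200 * GB  # the recommended Exchange 2007 database size cap
print(worst_case_extents(db_size, 8 * MB))   # 25600 with the old 8MB extents
print(worst_case_extents(db_size, 64 * MB))  # 3200 with the new 64MB extents
```

An eightfold reduction in potential fragments for the same database size keeps the FAL correspondingly smaller.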
Reasonable volume sizes and database sizes go a long way toward protecting yourself from fragmentation (the more competing files being extended or created on a volume, the greater the fragmentation of those files will be).
- Keep your volume sizes at or below 2TB (this is why MBR partitions are recommended for Exchange 2007). Exchange 2010 can use GPT volumes greater than 2TB, but the recommendation is to ensure that database sizes stay under 2TB.
- Limit the number of databases hosted per volume. Ten per volume is the absolute maximum we would recommend; five per volume is much better.
- Do not place write intensive non-Exchange workloads on the same volume as an Exchange database.
I hope this sheds some light on why certain failures on Exchange servers could prevent you from doing various operations.
Thanks go to Matt Gossage, Tim McMichael, Bryan Matthew, Neal Christiansen and Luke Ibsen for reviewing this blog entry before posting.