SQL Server fragmentation
Recently we started shredding “Oslo” daily builds into the “Oslo” Repository. We did this to help with internal development but also to get realistic operational data to validate our design and get realistic traces to use for stress and performance testing. We are learning a lot and have found a major bug that sometimes leads to deadlocks. One of the great pieces of data we now have relates to database fragmentation. So far, the design is holding up and we have only one fragmentation issue to deal with. However, since database fragmentation is a major cause of poor performance I thought a discussion of how to minimize and deal with database fragmentation in SQL Server might be generally useful.
Essentially, three types of fragmentation affect SQL Server:
· File (disk) fragmentation. SQL Server is optimized to access a data file’s logical pages sequentially. File fragmentation means that many logically sequential pages are not physically sequential and time is wasted doing disk seeks.
File fragmentation can be fixed with normal windows defragmentation utilities. File fragmentation is most frequently caused by autogrowing files instead of initially allocating sufficient space. This is why I tend to oppose efforts to initialize the Repository to a small size and then autogrow it. If accounts are setup correctly then fast file initialization means that there is only a minimal setup performance penalty to allocating sufficient space. If autogrow is needed then it should be in large chunks.
Putting log and data files on separate physical disk arrays is a best practice for reliability and performance reasons but it also helps minimize file fragmentation.
· Data file pages that are out-of-order. Because pages are out-of-order, time is wasted doing disk seeks during, say, an index scan and disk caches are not used efficiently. Write performance can also suffer since, typically, the page to update needs to be found.
Out of order pages can be fixed by reorganizing the index with an alter index statement or with a SQL Server maintenance plan. Out of order pages are most frequently caused by poor index key choices relative to insert and delete behavior. For example, an externally generated GUID based key will mean that inserts are at random positions in the index leading to frequent page splits. Since most extents (i.e. 8 contiguous pages) will be filled up, the new page will have to go in another extent and the index pages will be out of order. Natural keys also frequently suffer from this problem since they are inserted out-of-order.
· Data file pages with a lot of free space. Because pages are not full, time is wasted reading more pages than necessary during, say, an index seek or an index scan. Write performance can also suffer since, typically, the page to update needs to be found. However, sometimes partially full pages can improve performance because when data is added to an already full page it will be split which is often expensive. The index fill factor option can be used to fine tune how full an index leaf page is initially. By default, SQL Server fills leaf pages.
Poor page utilization can be fixed by rebuilding the index with an alter index statement or with an SQL Server maintenance plan. Poor page utilization is most frequently caused by poor index key choices relative to insert and delete behavior. For example, an externally generated GUID based key will mean that inserts are at random positions in the index leading to frequent page splits and poor page utilization. Natural keys also frequently suffer from this problem since they are inserted out-of-order.
My advice for minimizing page fragmentation, in SQL Server, is to:
1. Understand how SQL Server indexes are structured (see http://msdn.microsoft.com/en-us/library/ms190969.aspx, http://msdn.microsoft.com/en-us/library/ms180978.aspx and http://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/23/644607.aspx).
2. Understand how data will be inserted and deleted and expected data volumes
3. Make sure you database files are sized right and autogrowth is set to a suitable number
4. Try to structure indexes and keys so most inserts will be at the end of the table/index and deletes will be in contiguous chunks. Typically, this means avoiding natural keys and externally allocated GUIDs.
5. Monitor fragmentation
6. Set up a maintenance plan to reorganize/rebuild indexes that are prone to fragmentation
To find any database fragmentation you can run the various Disk Usage reports in the database right click in SSMS. Alternatively, you can use various SQL Server management views and functions. For example:
select S.name as [Schema], O.name as [Object], I.name as [Index],
round(P.avg_fragmentation_in_percent, 1) as [Percentage Out-of-Order],
round(100.0 - P.avg_page_space_used_in_percent, 1) [Percentage Free Space],
P.page_count * 8 [Size (Kb)]
from sys.dm_db_index_physical_stats(db_id(N'DailyBuilds'), null, null, null, N'detailed') as P
inner join sys.objects as O on O.object_id = P.object_id
inner join sys.indexes as I on I.index_id = P.index_id and I.object_id = P.object_id
inner join sys.schemas as S on S.schema_id = O.schema_id
where P.avg_fragmentation_in_percent >= 10 or
P.avg_page_space_used_in_percent <= 80
order by [Schema], [Object], [Index]