Top 10 Hidden Gems in SQL Server 2005
By Cihan Biyikoglu
Technical Reviewers: Lindsey Allen, Peter Scharlock, Burzin Patel, Eric Hanson, Mark Souza, Sanjay Mishra, Michael Thomassy
SQL Server 2005 has hundreds of new and improved components. Some of these improvements get a lot of the spotlight. However there is another set that are the hidden gems that help us improve performance, availability or greatly simplify some challenging scenarios. This paper lists the top 10 such features in SQL Server 2005 that we have discovered through the implementation with some of our top customers and partners.
The order in the list does not have much significance except the specific instances we used them and the impact we saw. I will use a practical analogy; I started with the utility-knife size features that can help make life very easy at the right moment and build up to chain-saw size features that can help you implement a full scenario.
Table Difference tool allows you to discover and reconcile differences between a source and destination table or a view. Tablediff Utility can report differences on schema and data. The most popular feature of tablediff is the fact that it can generate a script that you can run on the destination that will reconcile differences between the tables. TableDiff.exe takes 2 sets of input;
Connectivity - Provide source and destination objects and connectivity information.
Compare Options - Select one of the compare options
Compare schemas: Regular or Strict
Compare using Rowcounts, Hashes or Column comparisons
Generate difference scripts with I/U/D statements to synchronize destination to the source.
TableDiff was intended for replication but can easily apply to any scenario where you need to compare data and schema.
You can find more information about command line utilities and the Tablediff Utility in Books Online for SQL Server 2005.
Triggers for Logon Events (New in Service Pack 2)
With SP2, triggers can now fire on Logon events as well as DML or DDL events.
Logon triggers can help complement auditing and compliance. For example, logon events can be used for enforcing rules on connections (for example limiting connection through a specific username or limiting connections through a username to a specific time periods) or simply for tracking and recording general connection activity. Just like in any trigger, ROLLBACK cancels the operation that is in execution. In the case of logon event that means canceling the connection establishment. Logon events do not fire when the server is started in the minimal configuration mode or when a connection is established through dedicated admin connection (DAC).
The following code snippet provides an example of a logon trigger that records the information about the client connection.
CREATE TRIGGER connection_limit_trigger ON ALL SERVER FOR LOGON AS BEGIN INSERT INTO logon_info_tbl SELECT EVENTDATA() END;
You can find more information about this feature in updated Books Online for SQL Server Services Pack 2 un the heading “Logon Triggers”.
Boosting performance with persisted-computed-columns (pcc).
Btree Indexes provide great compromise for tuning queries vs redundant storage of data and added cost of modifying data (insert/update/delete). A less known capability for tuning in SQL Server 2005 is persisted computed columns (PCC). Computed columns can help you shift the runtime computation cost to data modification phase. The computed column is stored with the rest of the row and is transparently utilized when the expression on the computed columns and the query matches. You can also build indexes on the PCC’s to speed up filtrations and range scans on the expression.
The following sample can demonstrate the benefits of a persisted computed column applied to a complex expression. The same TSQL query run against the following table schema with and without the DayType column will demonstrate the effect of the transparent expression matching with persisted computed columns. The output from the sys.dm_exec_query_stats DMV also shows the difference in the IO and CPU characteristics of the query.
SELECT [Ticker] ,[Date] , [DayHigh] ,[DayLow] ,[DayOpen] ,[Volume] ,[DayClose] ,[DayAdjustedClose], CASE WHEN volume > 200000000 and dayhigh-daylow /daylow > .05 THEN 'heavy volatility' WHEN volume > 100000000 and dayhigh-daylow /daylow > .03 THEN 'volatile' WHEN volume > 50000000 and dayhigh-daylow /daylow > .01 THEN 'fair' ELSE 'light' END as [DayType] FROM dbo.MarketData WHERE CASE WHEN volume > 200000000 and dayhigh-daylow /daylow > .05 THEN 'heavy volatility' WHEN volume > 100000000 and dayhigh-daylow /daylow > .03 THEN 'volatile' WHEN volume > 50000000 and dayhigh-daylow /daylow > .01 THEN 'fair' ELSE 'light' END = 'heavy volatility'
CREATE TABLE [dbo].[MarketData]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [Ticker] [nvarchar](5) NOT NULL, [Date] [datetime] NOT NULL, [DayHigh] [decimal](38, 6) NOT NULL, [DayLow] [decimal](38, 6) NOT NULL, [DayOpen] [decimal](38, 6) NOT NULL, [Volume] [bigint] NOT NULL, [DayClose] [decimal](38, 6) NOT NULL, [DayAdjustedClose] [decimal](38, 6) NOT NULL, -- PERSISTED COMPUTED COLUMN -- [DayType] AS ( CASE WHEN volume > 200000000 and dayhigh-daylow /daylow > .05 THEN 'heavy volatility' WHEN volume > 100000000 and dayhigh-daylow /daylow > .03 THEN 'volatile' WHEN volume > 50000000 and dayhigh-daylow /daylow > .01 THEN 'fair' ELSE 'light' END) PERSISTED NOT NULL ) ON [PRIMARY]
Output From The Sys.Dm_Exec_Query_Stats Dynamic Management View (DMV)
In the above picture, the output from sys.dm_exec_query_stats dynamic management view shows the difference in CPU and IO statistics between the same query hitting MarketData_Computed and MarketData tables. Line 1 represents the query run against the table with the persisted computed column. Line 2 is the table without the persisted computed column. With the complex expression pre-calculated in the DayType column, total worker time and overall elapsed time is lower compared to the table without the DayType persisted computed column.
•Another way to verify that the persisted computed column is utilized, is to use the execution plan and look at the scan or the seek operator for the table with the computed column and check the output list, which should contain the column. In the example below you can see the DayType, the name for the PCC, in the output list under #9.
DEFAULT_SCHEMA setting in sys.database_principles
SQL Server provides great flexibility with name resolution. However name resolution comes at a cost and can get noticeably expensive in adhoc workloads that do not fully qualify object references. SQL Server 2005 allows a new setting of DEFEAULT_SCHEMA for each database principle (also known as “user”) which can eliminate this overhead without changing your TSQL code. Here is an example;
In SQL Server 2005, the following query when executed by user1 that has a DEFAULT_SCHEMA of ‘dbo’ will directly resolve to dbo.tab1, instead of the extra search for user1.tab1.
SELECT * FROM tab1
Whereas the same query will search for ‘user1.tab1’ in SQL Server 2000 and if that does not exist it will resolve to ‘dbo.tab1’.
This setting can be especially useful for databases upgraded from SQL Server 2000 to SQL Server 2005. To preserve the original behavior, databases upgraded from SQL Server 2000 will get the username as the DEFAULT_SCHEMA for each database principle. That means, in a database upgraded from a previous version to SQL Server 2005, ‘user1’ will get a DEFAULT_SCHEMA values of ‘user1’. To take advantage of the performance benefits, administrators can set the DEFAULT_SCHEMA through ALTER USER command and change it to the schema that most of the of the objects reside. Be aware this may break queries that may be utilizing objects in other schemas than the one set in the DEFAULT_SCHEMA setting and has not qualified the object names.
DEFULT_SCHEMA is documented in Book Online under the “CREATE USER (Transact-SQL)” heading.
Parameterization allows SQL Server to take advantage of query plan reuse and avoid compilation and optimization overheads on subsequent executions of similar queries. However there are many applications out there that, for one reason or another, still suffer from ad-hoc query compilation overhead. For those cases with high number of query compilation and where lowering CPU utilization and response time is critical for your workload, force parameterization can help.
Force parameterization forces most queries to be parameterized and cached for reuse in subsequent submissions. Forced parameterization will remove the literal values and replaces them with parameters. This minimizes the compilation overhead for queries that are the same except the literal values in the query text. Forced parameterization is typically enabled at the database level. However it is also possible to hint FORCED PARAMETERIZATION on individual queries.
In a number of cases, we have witnessed improvements in performance up to 30% due to forced parameterization. However forced parameterization can cause inappropriate plan sharing in cases where a single execution plan does not make sense. For those cases, you can utilize features like plan guides or query hints.
You can find more information on Forced Parameterization in Books Online.
Vardecimal Storage Format
In Service Pack 2, SQL Server 2005 adds a new storage format for numeric and decimal datatypes called vardecimal. Vardecimal is a variable-length representation for decimal types that can save unused bytes in every instance of the row. The biggest amount of savings come from cases where the decimal definition is large (like decimal(38,6)) but the values stored are small (like a value of 0.0) or there is a large number of repeated values or data is sparsely populated.
SQL Server 2005 also includes a stored procedure that can estimate the savings before you enable the new storage format.
To enable vardecimal storage format, you need to first allow vardecimal storage on the database;
exec sys.sp_db_vardecimal_storage_format N'databasename', N'ON'
Once the database option is enabled, you can then turn on vardecimal storage at a table level using the following procedure;
exec sp_tableoption 'tablename', 'vardecimal storage format', 1
Vardecimal storage format presents an overhead due to the complexity inherent in variable length data processing. However in IO bound workloads, savings on IO bandwidth due to efficient storage can far exceed this processing overhead.
If you would like more information on this topic, updated SQL Server 2005 Books Online for Service Pack 2 contains extensive information on the new vardecimal format.
Indexing made easier with SQL Server 2005
The new Dynamic Management Views have improved monitoring and trouble shooting greatly. A few of the dynamic management views (DMVs) deserve special attention.
Through sys.dm_index_usage_stats you can find out how much maintenance and traversal you have for each index. Indexes with high maintenance numbers and low traversal numbers can be considered as good candidates for dropping.
Through sys.dm_db_missing_index_* collection of DMVs, you can get recommendations on what new indexes could benefit the queries running on your server. The recommendations come with a estimate on how much improvement you can expect from the new index.
If you’d like to automate creation and dropping of indexes, SQL Server Query Optimization Team has blogged about how to automate index recommendations into actions: http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx
Figuring out the most popular queries in seconds
Another great DMV that can help save you a lot of work is sys.dm_exec_query_stats. In previous version of SQL Server to find out the highest impact queries on CPU or IO in system, you had to walk through a long set of analyses steps including getting aggregated information out of the data you collected from profiler.
With sys.dm_exec_query_stats, you can figure out many combinations of query analyses by a single query. Here are some of the examples;
Find queries suffering most from blocking –
(total_elapsed_time – total_worker_time)
Find queries with most CPU cycles –
Find queries with most IO cycles –
(total_physical_reads + total_logical_reads + total_logical_writes)
Find most frequently executed queries –
You can find more information on how to use dynamic management views for performance troubleshooting in the “SQL Server 2005 Waits and Queues” whitepaper located at: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/performance_tuning_waits_queues.mspx
Scalable Shared Databases
Scalable Shared Databases provide an alternative scale out mechanism for Read-Only environments. Through Scalable Shared Databases one can mount the same physical drives on commodity machines and allow multiple instances of SQL Server 2005 to work off of the same set of data files. The setup does not require duplicate storage for every instance of SQL Server and allows additional processing power through multiple SQL Server instances that have their own local resources like cpu, memory, tempdb and potentially other local databases. However this type of setup does limit the IO bandwidth since all instances point to the physical set of files.
Book Online for SQL server 2005 contains details on Scalable Shared Databases.
Highly concurrent workloads hit a contention point around global state they maintain at some point. That point in many cases happen to be ‘8’. One way around this contention has been to eliminate the global state and create hierarchies. NUMA architectures allow us to eliminate the contention around global resources by moving main resources closer to each other and forming nodes. SQL Server 2005 recognizes the NUMA architecture and self manages allocation of resources to adhere and take advantage of the hardware NUMA setup at the time of startup. By aligning with the HW setup SQL Server partitions its internal management to improve throughput.
Some workloads benefit greatly from the partitioning concept, especially mixed workloads that have to run varying characteristics of data access concurrently (example: OLTP and Reporting). Soft-NUMA allows partitioning configuration to be extended into the software level and defined either on top of a NUMA enabled environment to further divide the hardware partitions into smaller chunks or on a machine that does not utilize NUMA concepts to enable partitioning for the configuration. By configuring partitions through Soft-NUMA, the administrator can control the allocation of schedulers and memory managers for each node and can configure specific TCP/IP ports for the nodes. Then, clients can be configured to connect using the specific ports to access specific partitions.
Soft-NUMA topic is extensively covered in Book Online. You can also read more about the details of Soft-NUMA at Slava Oks’s Weblog at http://blogs.msdn.com/slavao/