Upgrading to SQL Server 2005 and DEFAULT_SCHEMA setting.
Upgrading to SQL Server 2005 has many benefits . As we look back at the past year and the deployment of SQL Server 2005 with our ISV partners, one important but under-advertised feature in SQL Server 2005 became increasingly visible; the ability to control the dynamic name resolution behavior. This capability applies to workloads where object names are NOT fully-qualified with a two or a three part name (for example: “database.schema.object”). If you already fully qualify your object references, which is a SQL Server best practice, then this blog does not apply to you.
In this post, I will first explain the behavior in SQL Server 2000 and expand on what changed in SQL Server 2005. Then I’ll point out how to identify the issue in your workloads and how to change the DEFAULT_SCHEMA setting to take advantage of the benefits.
Dynamic Name Resolution - SQL Server 2000 vs. 2005
A query such as "select * from table1" in SQL Server 2000 goes through a set of steps to resolve and validate the object references before execution. The search first looks at the identity of the connection executing the query. Let’s assume we have a user connection through "user1". For “select * from table1”, SQL Server 2000 searches for the object "user1.table1" first. If object does not exist, the next step is to search for "dbo.table1". These searches complete very quickly but can still be visible to the naked eye in cases where you have a workload with many ad-hoc queries (with today’s hardware this typically means over 100 queries per second) that contain not-fully-qualified object references. There is also a chance that, with the new flexibilities introduced in SQL Server 2005 around separation of users and schemas (http://msdn2.microsoft.com/en-us/library/ms190387.aspx), you may see a higher overhead in name resolution compared to SQL Server 2000. In extreme cases, we have seen the impact of name resolution to be 10-15% on throughput.
However SQL Server 2005 provides a mechanism to allow finer control over name resolution to the administrators. By manipulatng the value of the default_schema_name columns in the sys.database_principals Dynamic Management View, you can change the way name resolution is performed. You can assign a default schema for each database principal (a.k.a user) to go directly to the correct schema if an object name is not fully qualified, and bypass the unnecessary steps.
Reusing the example from above; with DEFAULT_SCHEMA set to 'dbo' in a SQL Server 2005 database, user1 executing 'select * from table1' will directly resolve immediately to 'dbo.table1' without searching for 'user1.table1'.
In fact, if you use the new CREATE USER syntax from SQL server 2005, you will get ‘dbo’ as the DEFAULT_SCHEMA by default. So if you are developing in a new database under SQL Server 2005, you are already taking advantage of this name resolution shortcut. However, the behavior for upgraded databases is different . During upgrade from previous versions of SQL Server, to preserve the same behavior, SQL Server 2005 sets the DEFAULT_SCHEMA to the name of the user. Example: ‘user1’ gets DEFAULT_SCHEMA of 'user1' when you upgrade a database from an earlier version to SQL Server 2005 by attaching or any other method. The reality is, in most instances, applications don't use anything but the ‘dbo’ schema. Nevertheless, it is close to impossible to detect whether your database and queries contain references to only objects in a specific schema.
Why? Well, it may be that some of your objects and queries get generated dynamically and only parsed right before execution. Obviously preserving the backward compatible behavior also means that you still pay for the added cost of the late bound name resolution.
This was exactly the reason why a few of our partners saw higher CPU cycles after the upgrade and experienced slower throughput in SQL Server 2005. In those cases, DEFAULT_SCHEMA setting got us back the throughput we were expecting from SQL Server 2005. The improvements have been as much as 15% in throughput.
Obviously the best practice still stands: You should fully qualify all object names and not worry about the name resolution cost at all. The reality is, there are still many imperfect applications out there and this setting help great for those cases.
How do you know if this is an issue for your workload?
Lets take a look at ways we can tell if this setting could benefit your workload.
The brute force method is to look at your app code or SQL Server 2005 Profiler for the queries that you are executing. If you do not have fully qualified names and you are executing them through a database principal (user) that has a DEFAULT_SCHEMA value that does not have the intended schema name for the not-fully-qualified objects in your queries, then you have the problem.
Reiterating the example from above; user1, with DEFAULT_SCHEMA='user1', executing 'select * from table1' and resolving to 'dbo.table1' could benefit from setting DEFAULT_SCHEMA to 'dbo'.
The other option is to use sys.dm_exec_cached_plans or sys.dm_exec_query_stats to look at the cached plans. Again you can go after the text and look for name qualification. You can also look for the results from the following query.
You can use the following query to see the cached plans that may indicate an issue with not-fully-qualified object names;
SELECT * FROM
SELECT e.[text], e.[user_id], e.[dbid]
FROM sys.dm_exec_cached_plans AS a
OUTER APPLY sys.dm_exec_sql_text(a.plan_handle) AS b
OUTER APPLY sys.dm_exec_plan_attributes(a.plan_handle) AS c
) AS d
PIVOT (MAX(d.value) FOR d.attribute IN ("user_id", "dbid")) AS e
) AS f
WHERE [dbid] not in (32767) -- leave out the resource database
AND [user_id] not in (-2,1,4) -- leave out the generic plans, plans that belong to dbo and sys
If the resultset is non-empty, your application may be a good candidate for the DEFAULT_SCHEMA setting consideration. This is a good shortcut but is not a 100% guarantee. Be aware that sys.dm_exec_cached_plans contains only the cached plans. The cache fluctuates under memory pressure so some queries may get pushed out of the cache due to aging and pressure or, in some cases, may not get cached at all due to cost or security restrictions.
One additional thing to watch for; If you do not see a range of user_ids in the resultset, the benefits from setting the DEFAULT_SCHEMA may not be significant due to other optimizations and shortcuts in the engine around plan caching.
How do you change the DEFAULT_SCHEMA?
Assuming your database is using objects only in the dbo schema here is what you would run for each user:
ALTER USER <Username/> WITH DEFAULT_SCHEMA=dbo
You can validate the current DEFAULT_SCHEMA settings by looking at the default_schema_name column in the sys.database_principals Dynamic Management View.
Cross Posted from http://blogs.microsoft.com/mssqlisv