Database Administrator’s Guide to SQL Server Database Engine .NET CLR Environment
Writer: Kimberly L. Tripp, Founder, SQLskills.com
Applies To: Microsoft® SQL Server™ 2005
Microsoft® SQL Server™ 2005 provides the database application programmer with a rich, new development platform by hosting the Microsoft .NET Framework Common Language Runtime (CLR) environment. With new capabilities come new roles and responsibilities for the database administrator (DBA). This white paper helps the DBA determine the appropriate use of this new feature and also provides guidance about when other alternatives may provide better performance, flexibility, or capabilities.
This white paper also offers guidance about suitable uses for the Database Engine .NET Framework Programming API. It also suggests code, change, and release management processes that should be tailored to each DBA’s circumstances in order to ensure a professional and safe deployment.
On This Page
About This Paper
Finding the Right Tool for the Job
Introduction to .NET Framework Programming in the Database Engine
Choosing the Right Tool for the Job
Source Code Management
Beyond SQL Server 2005
Additional SQL Server 2005 Resources
About This Paper
The features and plans described in this white paper are the current direction for the next version of the SQL Server. They are not specifications for this product and are subject to change. There are no guarantees, implied or otherwise, that these features will be included in the final product release.
For some features, this document assumes that the reader is familiar with SQL Server 2000 features and services. For background information about SQL Server features and services, see the official product Web site at http://www.microsoft.com/sql/ or the SQL Server 2000 Resource Kit that is available from Microsoft Press.
This white paper provides information that helps database administrators ensure successful, risk free, and stress-free adoption of Microsoft .NET Framework programming in the Database Engine. Thus, the audience for this white paper is the database administrator. For a developer perspective of the .NET Framework programming for the SQL Server 2005 Database Engine, see the white paper on MSDN titled Using CLR Integration in SQL Server 2005
Finding the Right Tool for the Job
Microsoft® SQL Server™ 2005 provides a broad set of programming interfaces that enable developers to build robust database applications with greater ease, performance, and reliability than before. Along with this breadth of programming options comes the need to consider which set of tools is appropriate for each task. Although many tasks can be accomplished in multiple ways, each has pros and cons. Thus, finding the best tool for the job is critical for an application to perform and scale with load and growing business usage. Some of the questions the DBA needs to ask include the following:
Should the system handle this data as XML or should it be shredded and stored relationally?
Should this process, and all its complex pieces, be handled synchronously or asynchronously?
Should this business logic, this calculation, or this added security option be handled in the client application, the middle-tier, or the back-end database?
Should data analysis be handled in the relational database or through the Business Intelligence engine?
Should the data transformation occur with the Integration Services ETL engine or in the database using transforms built with Transact-SQL?
Should complex business logic, traditionally running on middle-tier servers, remain in the middle-tier or migrate to the SQL Server platform?
What mix of clients and servers are running in the infrastructure. Is there a need to support Windows clients, Unix clients, or both?
In most database development projects, the role of technology selection and the structural design of components that interact with the database falls on the database administrator (DBA). This is the person with final responsibility for managing and recovering that business data. Most DBAs adopt conservative attitudes to new technology. This is a natural instinct because, along with the benefits offered by new functionality, new technology can introduce new risks to stability and integrity. The professional DBA usually manages the risk/benefit by ensuring that there is full testing and that they understand the new technology. And, further, by taking the time to identify where it adds the most value and, perhaps more importantly, where it should not be used. As a result of this natural conservatism, the DBA may ask, “How do I turn that feature off until I understand it?” The good news is that unlike previous releases of SQL Server, in this release many new features are off by default.
Instead of leaving all features off permanently, a prudent DBA will tend to learn enough about the technology to determine where its use is appropriate and where its application makes the most sense. There is no need to understand every line of code in every language that the developer might use, but there needs to be enough confidence to be able to provide great operational support, maintenance, and troubleshooting. Across many of these new features, the key to proper usage is understanding, impact isolation, and strong control.
Introduction to .NET Framework Programming in the Database Engine
The SQL Server 2000 database programmer has the following options when coding against SQL Server database tables and views:
Use Transact-SQL to write code that runs within the database. Code can be written as stored procedures, user-defined functions, and/or triggers that can be regarded as stored procedures that are invoked on data change.
Use Microsoft® Visual C++® to write code, an extended stored procedure, which runs within the database. Code that is written as an extended stored procedure appears to users as a stored procedure and is executed in the same way. Parameters can be passed to extended stored procedures and they can participate in transactions and return both results and return status.
Use the sp_OA* (Object Access) system stored procedures to load and interact with COM objects.
Use other languages and middleware, such as ADO and ADO.NET, to write code that executes outside of the database and that passes in queries or invokes stored procedures and functions to access data.
Each of these options has issues when the solution demands that data be integrated with functionality supplied by external libraries. For example, such as those provided with the .NET Framework, or that nontrivial mathematical operations be applied to the data, or if the requirement is for something more complex, such as a custom aggregation of data or a true user-defined data type.
Each of the four options has limitations:
Transact-SQL is excellent for set-based operations such as comparisons between tables but, due to the interpreted nature of the language, it can struggle to deliver good performance for computationally heavy tasks. Another limitation is that unlike modern programming languages, Transact-SQL does not have support for private/public data encapsulation, so it is harder to implement clean interfaces between modules. Finally, SQL Server 2005 introduces improved error handling within Transact-SQL. However, it is still susceptible to “untrappable” errors caused by missing objects or bad syntax that is easily handled by .NET Framework languages.
Extended stored procedures are by their nature written in unmanaged code and execute within the context of the SQL Server process. A greater level of programming competency is required to create code that does not inadvertently leak memory or generate unhandled exceptions that can crash the entire SQL Server process. Extended stored procedures cannot provide in-process access to the Microsoft .NET Framework libraries without placing the server in an unsupported state. For more information, see the knowledge base article titled Using extended stored procedures or SP_OA stored procedures to load the CLR in SQL Server is not supported.
The sp_OA* system stored procedures place limitations on the COM object. This requires that its interface be implemented in a compatible way and have further restrictions on the amount of data that can be passed to the COM object in a single call. They can encourage inappropriate use of components that are not designed to be used in high-throughput scenarios, or that do not support multiple invocations by a single process. In the worst case, the component can attempt to display an error message window or other dialog on the SQL Server.
External code can cause performance problems because data must leave the SQL Server process space and flow to the calling application. This data marshalling can be expensive for large volumes of data.
None of the current options can be used to create first-class, custom aggregate functions or custom data types where first-class means running within the database as if it were a SQL Server primitive function or data type.
With these limitations in mind, SQL Server 2005 integrates the .NET Framework Common Language Runtime (the execution environment for managed code). Thus, it enables database developers to place managed application code inside the SQL Server that is safe, secure, scalable, and feature rich. Code can be written as follows:
User-defined functions (scalar or table valued)
The mapping of user-defined functions, stored procedures, and triggers to objects written in managed code is fairly intuitive. The CLR programs are accessed and execute in the same way as their Transact-SQL equivalents. However, user-defined aggregates and types are less intuitive and extend the options of the database programmer in new ways:
User-defined aggregates allow the programmer to build custom aggregate functions (used in conjunction with the GROUP BY clause). This enables complex statistical and data analysis in the database engine.
User-defined types provide the programmer with the ability to define new types with custom behaviors. Combined with the power of the .NET Framework and third-party libraries, this new capability will allow strongly typed objects to be created instead of forcing a relational representation.
High Performance Implementation
SQL Server 2005 delivers high-performance access to managed code that runs inside the database server process. Unlike other database technologies that have provided a degree of integration with the .NET Framework, SQL Server 2005 hosts the runtime environment (CLR) in the database engine’s process space. This delivers higher performance when transitioning between the SQL Server query execution environment and the CLR. The integration is designed to avoid conflicting memory and CPU demands between database queries and programs. Additionally, the SQL Server and .NET Framework software engineers worked to make the CLR safe and performant within the SQL Server process:
The CLR requests memory from SQL Server, not directly from Windows.
CPU-intensive CLR memory garbage collection is controlled by SQL Server.
An in-process version of the managed SQL Server client passes SQL requests straight into the SQL Server query processor, thus avoiding costly network interaction.
CLR application domains are created and managed by SQL Server.
All this engineering is designed to ensure that a runaway CLR program cannot compromise the stability of the SQL Server.
Secure By Design, Default, and Deployment
Microsoft strives continuously to deliver secure products to its customers. For example, the Trustworthy Computing Initiative lead to SQL Server 2000 SP3, designed to be the most secure release of SQL Server 2000. This initiative continues to impact SQL Server as Microsoft moves further in this release by enhancing the “off by default” security of its products.
Note The Database Engine .NET Framework Programming API is off by default and the DBA must make the deliberate decision to activate the feature.
SQL Server 2005 introduces the Surface Area Configuration tool that empowers the DBA to control which features are enabled. This change in philosophy ensures that potentially unused features are not enabled and left in an unprotected state.
Step 1: Open the Surface Area Configuration Tool
The shortcut to the tool is installed in the Start/All Program menu within the SQL Server 2005 program group in the Configuration Tools sub-group.
Select the SQL Server Surface Area Configuration option and then select Surface Area Configuration for Features.
This will open the dialog below that enables the selection of a SQL Server instance and then the selection of each locked down option.
Step 2: Enable the Database Engine .NET Framework Programming API
Selecting Surface Area Configuration for Features opens the dialog below. Two views of the options are available: the default (by instance) enables control by SQL Server instance, and the second enables control across components such as the Database Engine and the Reporting Services.
Before enabling features that increase the surface area of the SQL Server, it is recommended that the DBA ensure that their systems are:
At the latest service pack and critical hotfix level (obtained from Microsoft Update)
Configured according to their recommendations for secure systems (these may be informed by Microsoft and other third-party vendor advice on server and infrastructure configuration)
The Database Engine .NET Framework Programming API is part of the Database Engine feature set and is referred to as CLR Integration in the user interface.
The user interface provides an easy way to view and set SQL Server 2005 instance level permissions. The SQL Server 2005 instance level features can also be controlled programmatically by:
Transact-SQL sp_configure command
Managed code using the Server Management Objects (SMO) Server object Configuration class
To Enable the API with Transact-SQL
-- Enable & Check the Database Engine .NET Framework Programming api sp_configure N'clr enabled', 1 go reconfigure go SELECT sc.* FROM sys.configurations AS sc WHERE sc.[name] = N'clr enabled'
Choosing the Right Tool for the Job
Decisions about tool and API selection depend on many other factors such as the skills of your internal staff, the recommendations of the third-party software vendors that deliver the systems that run your company, the requirements to deliver a brand-new system, and many others.
The illustration depicts both the dilemma and the choice that most DBAs will face. Some decisions are simple while other decisions are extremely complex, especially where there are many ways to implement the functionality and there is no absolute direction between the technology options.
In these cases, prototyping becomes more important. A quick implementation using two or more competing options can make the choice much clearer.
As previously stated, the professional DBA is conservative. As such, new technologies are not implemented lightly; they must first be understood and then carefully tested before they become integrated into production environments. To do otherwise would risk destabilizing the database and lead to user distrust of the integrity of the systems that the database hosts, and possibly threaten the job safety of the DBA. A further assumption is that developers are more willing to take risks and to adopt new tools that increase their productivity and extend the functional domain of the solutions they can offer their users. There is no doubt that the Database Engine .NET Framework programming model is powerful and that many developers will appreciate the ability to write in-database code in first-class programming languages.
So, where do these two conflicting character types meet in order to ensure that their systems remain stable and that new productivity and functionality gains are realized? The answer is in finding balance, understanding the opportunities, and more importantly defining choices rationally, based on strict criteria for technology selection. This section suggests some guidelines that can be followed to achieve this middle ground.
The first set of points provides guidance about where it might be a mistake to use the new functionality:
Heavy relational data access
Don’t move away from Transact-SQL for simple query execution. Transact-SQL set-based access will be faster than paying the transition cost for moving data in/out of the .NET Framework, especially if the set-based query is replaced by cursor-like behavior in the program. The section, Troubleshooting, contains information about this potential cost. Note that the opposite applies if complex computations are taking place within the query. In this case, moving the logic into a .NET Framework program where the computation will be fully compiled can improve performance. The transition overhead between the Transact-SQL and .NET Framework execution environments is more noticeable for simple computations and basic relational data access. In this case, it is likely that Transact-SQL will outperform the CLR.
Long running, external calls
While it is tempting to use the new functionality to further integrate existing business systems, it is important to take the time to ensure that the end-user experience does not get negatively impacted by calls to external APIs and external systems. These impacts can be especially visible in a user-defined function that might get called for every row of a table within a query. An external call that costs one second per row suddenly becomes unusable in an online system when applied to a modest 10,000-row table.
Unnecessary use of user-defined types
When the object’s data can be easily mapped to one or more relational data types, you should stay or go relational. Be aware that user-defined types have the following:
An 8-K size limitation. They must fit on a single SQL Server data page.
All data within the UDT is read and then rewritten if updated.
The same size restriction applies to user-defined aggregates so care should be taken especially when concatenating large string objects.
User-defined aggregates and online reporting
User-defined aggregates cannot be used in combination with the SQL Server Indexed Views, so it is not possible to automatically pre-aggregate data for online report performance. If stale data is acceptable, then a separately created and maintained table that periodically caches the aggregated results can be used in place of an indexed view.
Compatibility with earlier versions of SQL Server
If your application must support earlier versions of SQL Server, you cannot use this functionality.
Appropriate use of the technology
The Database Engine .NET Framework programming API introduces a large range of new possibilities for the database programmer. However, you should avoid using this new technology hastily unless you can state a clear rationale for its use.
These points might seem like a strong case for not deploying this technology. However, there are many compelling scenarios where it can provide considerable benefits. These include the following:
Leveraging the power of the .NET Framework and the Visual Studio programming environment
This is where the most gains can be made in terms of developer productivity and new possibilities.
With the 2005 generation of SQL Server and Microsoft® Visual Studio®, the Integrated Development Environment and the .NET Frameworks are now in their third release and second generation. This has allowed a significant amount of developer feedback. This means that the user-interface enables rapid application development, and the class libraries provide a rich set of objects and methods that the developer can employ to avoid having to develop their own code to do many generic tasks.
The first-class programming languages of the .NET Framework (C#, Managed C++, and Visual Basic.NET) offer the programmer more control over error handling and provide for better diagnostics in terms of call stacks and other debug information.
Access to most of the local functionality is possible without granting high levels of privilege. This means that rich XML, string, regular expression, localization, and data manipulation functionality is readily available without compromising data security.
In addition, because the code is compiled and then converted to machine code at invocation, it is possible to code business logic that executes an order of magnitude faster than Transact-SQL.
Examples where this might be done include the following:
Connecting to a remote Web service to access data in tabular format (realizing the integration promises of Web services at the database level)
Calling a third-party vendor’s control to add an order to an ERP system, based on a change inside the database
Using a set of code libraries that offers data types and functions specific to your industry segment, be it research, financial, manufacturing, or sales
Replacing extended stored procedures (XPs)
Prior to the arrival of the Database Engine .NET Framework programming API, the only way to provide access to the external world was through extended stored procedures and the sp_OA* stored procedures. However, as documented earlier, these methods present a high risk to database stability even when used by experienced developers.
It is recommended that the DBA who has systems with extended stored procedures providing extended business logic, or who is using object model manipulation through the sp_OA* stored procedures, consider rapid adoption of this new technology, because the SQL CLR is safer.
Using the SQL CLR has these advantages:
There is no possibility of managed, user-code generated access violations making SQL Server crash.
There is no possibility of managed user-code memory leaks making SQL Server slow down and hang.
There is better performance and scalability through SQL Server’s memory manager controlling system resources.
There are no security issues, because security is fully integrated with both the SQL Server and .NET Framework environments.
These advantages may not apply to assemblies registered as unsafe, because they are capable of calls to unmanaged code or “process-threatening” libraries (such as dialog generating code, thread creation code, or code that interferes with the process context), or because they use COM automation. Furthermore, they may still have an impact on the SQL Server instance by causing access violations or by leaking memory.
Most extended stored procedures can be replaced, especially considering that Managed C++ is available as a coding option.
Data validation on updates
Enforcing a common set of business rules when changing data across multiple clients is a perennial problem for some systems that can have complex middle-tier validation logic. The new API allows this logic to move to triggers in the database tier to ensure that all updates are consistent.
An example might require that data be entered in a specific order and some of it in systems and technologies that are not normally directly accessible to Transact-SQL triggers. The trigger can check that a new customer is first entered and accredited into a remote system on an IBM mainframe that manages credit risk across all the company’s customers.
Network traffic reduction
Some algorithms require all or a large percent of the data to produce results. However, moving large volumes of data between servers can be very expensive in terms of CPU. Thus, placing this algorithm in the database can avoid the data marshalling CPU and network cost. This will likely result in better three-tier performance for well-designed applications.
Examples where this effect might be seen include the following:
Statistical calculations that require all the data in order to produce their output.
Where a data set from SQL is being joined onto data not held in a relational database, and is not accessible to a linked server, and in which the volume of data in SQL is much greater than that of the remote system.
Writing general purpose functions
A general-purpose function has the following characteristics:
Data is passed in as function arguments
There is little or no additional data access within the function
Complex computation is applied through cursor-like code to the data within a loop that processes one row at a time
This is an area where the benefits of the compiled nature of the CLR execution environment can greatly outweigh the costs of transitioning data between the environments. In performance tests, the SQL Server team has seen benefits as low as three integer operations per call.
Examples might include closing balance calculations that “number crunch” through the day’s transactions looking for fraud or any unusual data patterns.
Implementing scalar, user-defined types
Although most data can be mapped to the relational model, there are many examples where a user-defined type makes considerable sense:
The type wraps external behavior in order to present it within SQL Server. An example might be a date type that implements UTC functionality.
The type uses encapsulation to protect its contents and where the data is usually read and updated together. An example might be an implementation of a spatial data type or a complex number.
Using the power of custom, user-defined aggregations
Many industries crunch data into custom groupings based on all/subsets of input data where the native aggregation operators, such as SUM, AVG, and MIN, are not sufficient. Examples might be computing a Fourier transform or an actuarial predictive calculation. The implementation of user-defined aggregates permits fan out/fan in (parallel execution) across multiple threads and, thus, should scale well across multiple processors.
High performance table-valued, user-defined functions
In saving pre-instantiation by supporting partial results from streaming table-valued, user-defined functions, many algorithms do not require a complete list of all items from an external data source. Some examples of this include “get the latest stock price”, “get the latest event from the event log”, and “get the first item in the queue”. The user-defined function infrastructure supports streaming of data. Data is requested on demand instead of at a single time, thus avoiding having to read large results sets into memory.
This is a capability that exceeds that of the Transact-SQL table-valued, user-defined function, which must instantiate all its data when called. In situations where partial requests are common, it could considerably outperform the traditional user-defined function.
This list of good and bad uses for this technology should serve as a starting point to the development and systems management communities for understanding where this new functionality fits in their toolkit.
Developers author code within Visual Studio. This is the optimum tool for productive programming using the Database Engine .NET Framework programming API. It is possible to use other tools to create programs, such as the Express toolset or even Windows Notepad, but these lack the Database Engine wizards, integration with MSDN and Visual Source Safe, and other team-development tools. Additionally, Visual Studio offers a multiplicity of deployment, tests, and debugging tools that a proficient developer can exploit to deliver high-end applications.
The paradigm is simple: After developers have completed their solution, they use the .NET Framework compilers to build an assembly (a .dll file stored on the target file server) and then either manually or automatically deploy the binary assembly into the database.
After the assembly has been loaded into the database, it is now independent of the original file (the .dll). This means that the database can be backed up, moved, and even restored without worrying about external object dependencies, because they are all preloaded into the database.
Note that the original source code must still be maintained and, ideally, all changes tracked.
The following recommendations are made in order to maximize the productivity of developers:
Developers should be provided with a full set of professional tools
Visual Studio 2005 Team System and MSDN
SQL Server 2005 Express for occasional Database Engine .NET Framework API uses and some limited forms of testing and development
SQL Server 2005 Developer Edition for regular Database Engine .NET Framework API users
Visual SourceSafe or an equivalent source control management system for team access to source code and release management
Developers need access to production-like data
A scaled-down version of the production database, with a reasonable representation of relative table sizes and with sensitive customer data obfuscated, should be accessible in order to test and prototype accurately. An important note is that other tools, such as SQL Server 2005 Integration Services, can be used to create a rich developer version of your production database with obscured credit card, name, address, and social security numbers.
Access to test Web services for testing integration with other systems
There are some restrictions of the .NET Framework infrastructure that predicate “a database on every (developer’s) desktop”. This includes when using Visual Studio debuggers to trace through the execution of new code that the CLR engine single threads. Debugging should never take place against a production system, except as a last resort.
The Developer Edition of SQL Server 2005 is recommended for professional developers because it provides them with the SQL Server Management Studio that is fully integrated with the source control API. This edition also includes the SQL Server Profiler for tracing the execution of SQL Server commands. There is a wealth of other tools that overlap the DBA and developer roles that will make developers more productive when creating and scripting database objects such as SQLCMD for scripting and automation. For more information about tools, see the resources and links listed at the conclusion of this white paper.
One of the main duties of a diligent DBA is tracking the objects that make up a service such as which tables are in which databases, what stored procedures and functions are required, what scripts must run daily, weekly, monthly, and so on. To aid in tracking all these objects, there are a number of tools from Microsoft and other software vendors. These include Microsoft® Visio® for the data model, Microsoft SQL Server Management Studio for the database, and many others.
The integration of SQL Server 2005 with the outside world introduces many new objects that both supplement and extend the relational system catalog of earlier releases. This section documents how to use the changed catalog to obtain an accurate list of these new database objects, as well as the additional metadata for each new object type within a database.
-- SYS.ASSEMBLIES -- Name, Assembly ID, security and “is_visible” flag SELECT * FROM sys.assemblies -- SYS.ASSEMBLY_FILES -- Assembly ID, name of each file & assembly contents SELECT * FROM sys.assembly_files -- SYS.ASSEMBLY_MODULES -- Sql ObjectID, Assembly ID, name & assembly method SELECT * FROM sys.assembly_modules -- SYS.ASSEMBLY_REFERENCES -- Links between assemblies on Assembly ID SELECT * FROM sys.assembly_references -- SYS.MODULE_ASSEMBLY_USAGES -- Partial duplicate of SYS.ASSEMBLY_MODULES -- Links SQL Object ID to an Assembly ID SELECT * FROM sys.module_assembly_usages
These views are defined as system catalog views and provide information about the containers of these new assemblies. These assemblies (or .dll files) are the result of .NET Framework program compilation and registration in the database.
The sys.assemblies “is visible” flag can be used to hide an assembly from having its public methods registered. This can be used to conceal helper assemblies such as the .XmlSerializers generated for assemblies that use Web services to access data.
In the sys.assembly_modules view, a null assembly method indicates an assembly that will be used as a user-defined aggregate. Note that there are also two system stored procedures that provide access to these tables, as shown in the following.
EXEC sys.sp_assemblies_rowset N'<AssemblyName>' EXEC sys.sp_assembly_dependencies_rowset <AssemblyID>
-- Read CLR Stored Procedure Metadata SELECT schema_name(sp.schema_id) + '.' + sp.[name] AS [Name] , sp.create_date , sp.modify_date , sa.permission_set_desc AS [Access] , sp.is_auto_executed FROM sys.procedures AS sp INNER JOIN sys.module_assembly_usages AS sau ON sp.object_id = sau.object_id INNER JOIN sys.assemblies AS sa ON sau.assembly_id = sa.assembly_id WHERE sp.type_desc = N'CLR_STORED_PROCEDURE'
-- CLR Trigger Metadata -- Note that Trigger parent object is always in the same schema SELECT schema_name(so.schema_id) + '.' + tr.[name] AS [Name] , schema_name(so.schema_id) + '.' + object_name(tr.parent_id) AS [Parent] , te.type_desc AS [Fired On] , te.is_first , te.is_last , tr.create_date , tr.modify_date , sa.permission_set_desc AS [Access] , tr.is_disabled , tr.is_not_for_replication , tr.is_instead_of_trigger FROM sys.triggers AS tr INNER JOIN sys.objects AS so ON tr.[object_id] = so.[object_id] INNER JOIN sys.trigger_events AS te ON tr.[object_id] = te.[object_id] INNER JOIN sys.module_assembly_usages AS mau ON tr.object_id = mau.object_id INNER JOIN sys.assemblies AS sa ON mau.assembly_id = sa.assembly_id WHERE tr.type_desc = N'CLR_TRIGGER'
-- List CLR Scalar Functions SELECT schema_name(so.schema_id) + N'.' + so.[name] AS [Name] , so.create_date, so.modify_date , sa.permission_set_desc AS [Access] FROM sys.objects AS so INNER JOIN sys.module_assembly_usages AS sau ON so.object_id = sau.object_id INNER JOIN sys.assemblies AS sa ON sau.assembly_id = sa.assembly_id WHERE so.type_desc = N'CLR_SCALAR_FUNCTION'
Table Valued Functions
-- List CLR Table Functions SELECT schema_name(so.schema_id) + N'.' + so.[name] AS [Name] , so.create_date, so.modify_date , sa.permission_set_desc AS [Access] FROM sys.objects AS so INNER JOIN sys.module_assembly_usages AS sau ON so.object_id = sau.object_id INNER JOIN sys.assemblies AS sa ON sau.assembly_id = sa.assembly_id WHERE so.type_desc = N'CLR_TABLE_VALUED_FUNCTION'
-- List CLR Aggregate Functions SELECT schema_name(so.schema_id) + N'.' + so.[name] AS [Name] , so.create_date , so.modify_date , sa.permission_set_desc AS [Access] FROM sys.objects AS so INNER JOIN sys.module_assembly_usages AS mau ON so.object_id = mau.object_id INNER JOIN sys.assemblies AS sa ON mau.assembly_id = sa.assembly_id WHERE so.type_desc = N'AGGREGATE_FUNCTION'
-- User Defined Type Metadata -- Includes base assembly information SELECT st.[name] AS [Type Name] , st.max_length , st.[precision] , st.scale , st.collation_name , st.is_nullable , sa.create_date , sa.[name] AS [Assembly Name] , sa.permission_set_desc AS [Access] FROM sys.types AS st INNER JOIN sys.type_assembly_usages AS tau ON st.user_type_id = tau.user_type_id INNER JOIN sys.assemblies AS sa ON tau.assembly_id = sa.assembly_id
One of the potential drawbacks of integrating environments with distinct security models is that without careful design the aims of one can be compromised by the other. For example, an extended stored procedure must connect back into the SQL Server to access data and must enlist the active transaction to protect the integrity of data it changes. Failure to do either can result in data corruption and elevation of privilege security holes.
The architects of the Database Engine .NET Framework Programming API have been careful to ensure that both security models are respected. The SQL Server model using the GRANT command to provide specific object access is respected as is the .NET Framework’s Code Access Security that controls the interaction between modules of higher/lower privilege.
A non-administrative user must be granted execute privileges on routines before they can be used.
To simplify the code access security model (the permissions that define what the assembly can do), permissions are grouped into three sets:
The assembly’s methods can do no more than an equivalent Transact-SQL stored procedure or function. The code runs under the caller’s credentials.
The assembly’s method can perform file and network input/output. External code runs with the SQL Server service account’s privileges, so impersonation should be used to ensure that the caller’s privileges are used when controlling access to external resources specified by user input.
This permission extends the external set of permissions and allows the assembly to call unmanaged and, hence, uncontrolled code.
Note that even with UNSAFE mode, the integration code prevents a number of .NET Framework libraries from being used. For example, compilers, window managers, enterprise service libraries, and others are blocked because their use within the SQL Server does not make sense.
It is recommended that the DBA restrict the use of External/Unsafe code and that they apply stricter requirements in terms of code-review and permissions design if their developers request these higher levels of permission for their code.
Source Code Management
CREATE ASSEMBLY loads the binary assembly into the SQL Server database. There is no way of loading the source at the same time, because it is left to the DBA to manage the source code that was used to create the assemblies. It is likely that third-party software will supply only binary codes, and equally likely that the third party will provide professional technical support in the case of issues with their software. However, it is important that the DBA establish a firm process for managing source code created by their developers before its absence becomes a problem.
The Programming Paradigm section in this paper contained recommendations about the tools provided to application developers. If these recommendations are followed, it becomes relatively easy for the DBA to enforce solid change control policies and procedures, and to truly delineate between development/test systems and those that run the production system. Using SQL Server 2005, new security features and DDL triggers can restrict access and commands (or at least audit changes) to better enforce change control policies of the production environment. If desired, DDL Triggers can be used to prevent all DDL (Data Definition Language) statements, or specific statements or groups of statements, within a database.
Note that code should never pass from development to production without recompilation to ensure that code reviews with all relevant stakeholders have reviewed the correct code and all dependencies have been supplied.
Although it is recommended that developers exploit Visual Studio’s auto-deployment features in testing for maximum productivity during the development coding cycle, after completion developers should pass only source code and compilation command files to their DBAs for deployment.
Note that the compilers are freely redistributable with the .NET Framework, so there is no requirement for a DBA to install Visual Studio unless they intend to use its features for code review/debugging purposes.
DBAs should take a copy of the source code and place it “in escrow” after code review and recompilation has taken place (using the compilation command file). This protects the code even if it is accidentally deleted from development source systems.
It is also recommended that a source copy be associated with each database that has loaded the assembly. This means that a single copy of the source code might exist in the development code tree and several (one per database) in the DBA tree. After the assembly has been loaded into the database with the CREATE ASSEMBLY command, any supporting files, such as source codes and documentation, can also be loaded by using the ALTER ASSEMBLY command as shown in the following.
ALTER ASSEMBLY LeastSquaresSqlFunction ADD FILE FROM 'D:\Production System\Source Code\Least Squares\LS1.cs';
Note that the ALTER ASSEMBLY command supports the loading of multiple files with a single statement.
Both the DBA and the developer should annotate their source control systems when checking in code in order to provide an audit log of changes.
The WinDiff tool (or Visual SourceSafe Compare) is excellent for comparing source code to identify changes. Reviewing only the differences can shorten the code re-review process when passing patches forward from development to production. For more information about using WinDiff, see the WinDiff topic in MSDN.
When Visual Studio is used to build programs that exploit the SQL Server Database Engine .NET Framework Programming API, the build output pane contains the compiler command line that is being used. This can be cut and pasted into a release command file that the DBA can use to recompile the program.
After source code is protected by professional source control and after developers become used to providing source and compilation command files, release management becomes much more resilient to problems caused by code and schema mismatches. For more information about why this is important, see the section, Beyond SQL Server 2005, later in this paper.
The SQL Server 2005 toolset makes it easy for DBAs and developers to work together within a common source code repository. The SQL Server 2005 Management Studio application can load and save scripts from source control applications such as Microsoft Visual SourceSafe and many others.
DBAs can be given read access to developer source code projects in order to copy releases forward into projects that contain the scripts and source codes that relate to a specific system or release. Another option is for Release Managers to extract the appropriate versions of the source codes and place them in a folder that the DBA can access in order to recompile and load into the shared SQL Servers.
Controlling code promotion is a key role for DBAs. Allowing developers to deploy directly into shared systems will almost inevitably result in assemblies without controlled source code and errors when the code is finally deployed into production. This is either immediately as the recompile picks up the wrong version of source code and creates code that was never tested, or eventually as problems cannot be reproduced in test systems because the production assembly cannot be debugged when the source does not match.
The following sample illustrates a Microsoft® C# program compile command using the freely distributable csc.exe (the C# Compiler, distributed with the .NET Framework).
Csc.exe /noconfig /nowarn:"1701;1702" /warn:4 /define:DEBUG;TRACE /reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.Data.dll /reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.dll /reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.Web. Services.dll /reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.XML.dll /debug+ /optimize- /out:obj\Release\MainframeWebService.dll /target:library IBMSystemCode.cs AssemblyInfo.cs "Web References\com.Mainframe.webservices\Reference.cs"
In the case of a program that invokes a Web service, a further step is required to generate the serialization code. This is illustrated in the following sample. The Sgen program, installed with the .NET Framework 2.0 SDK and found in C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin, will create an assembly called “MainframeWebService.XmlSerializers.dll” that can be registered in SQL Server 2005 by using the CREATE ASSEMBLY command or user interface.
Sgen.exe /n /f MainframeWebService.dll
Use the following Transact-SQL to register both of the assemblies.
CREATE ASSEMBLY [MainframeWebService] AUTHORIZATION dbo FROM 'C:\MWS\bin\Release\MainframeWebService.dll' WITH permission_set = external access go CREATE ASSEMBLY [MainframeWebService.XmlSerializers] AUTHORIZATION dbo FROM 'C:\MWS\bin\Release\MainframeWebService.xmlserializers.dll' WITH permission_set = safe go
The previous code compiled the code supplied by the developer, generated serialization code, and then loaded both assemblies into SQL Server 2005 for use in CREATE FUNCTION/PROCEDURE etc statements.
The serialization code is loaded after its parent assembly. Reverse the order and the parent will be auto-loaded in stealth mode, requiring an ALTER ASSEMBLY ... WITH VISIBILITY = ON to uncloak it for use in CREATE statements.
By allowing developers to reach maximum productivity with professional tools and database development environments that are populated with a good sample of production data together with controlled code promotion, the DBA can deploy the CLR technology without risk of compromising system stability.
If DBAs install SQL Server 2005 and then start the Windows System Monitor (PERFMON.EXE), they might be disappointed to see only a single SQL Server CLR counter (SqlServer:CLR “CLR Execution”), giving CPU statistics summed across the server.
However, this is misleading because the majority of counters live in the .NET CLR group of counters at the top of the performance object list and can be filtered by the sqlservr process name.
A server running multiple SQL Server instance processes, each with active CLR code, will not be easily distinguished in this list. Thus, it is suggested that these be combined with the instance-specific CLR Execution to identify the active instance by total CPU activity.
There are multiple sets of counters that aid in understanding the health and activity of programs running in the SQL Server hosted run time:
.NET CLR Exceptions – The Exceptions/Sec counter can provide an early warning of problems in application code. If the value is larger than usual (developers can use exceptions for normal functionality, not just for error handling), application failures may be happening and further investigation might include reviewing logs for errors.
.NET CLR Loading – SQL Server isolates code between databases by using an AppDomain, a CLR concept that provides a self-contained run-time environment for programs. This set of counters enables monitoring of the number of AppDomains and the number of assemblies loaded in the system. Too great a number might indicate “procedure cache” like pressure on the system and require action to merge databases to share assemblies and AppDomains.
.NET CLR Memory – Provides detailed information about the three types of CLR heap memory as well as garbage collection. These counters can be used to monitor CLR memory usage and to flag alerts if the memory used gets too large. This might indicate code problems with copying or sizing/volumes of data, thus requiring code redesign to use less memory.
.NET Data Provider for SQL Server – Provides information on the number of connects and disconnects per second. This is useful for capturing activity at the database level (for code that connects back to the SQL Server instead of returning external or processed parameters), because this counter breaks down to the database level (by AppDomain) as shown in the following.
In addition to the System Monitor counters, it is also possible to get a considerable amount of information from the SQL Server Dynamic Management Views (DMV) that present SQL Server operating system (OS) data. Following are some useful sample queries.
-- Loaded Assemblies (run in each database) SELECT sa.[name] , ad.[appdomain_name] , clr.[load_time] FROM sys.dm_clr_loaded_assemblies AS clr INNER JOIN sys.assemblies AS sa ON clr.assembly_id = sa.assembly_id INNER JOIN sys.dm_clr_appdomains AS ad ON clr.appdomain_address = ad.appdomain_address
-- SQL CLR Memory Usage SELECT mo.[type] , sum(mo.pages_allocated_count * mo.page_size_in_bytes/1024) AS N'Current KB' , sum(mo.max_pages_allocated_count * mo.page_size_in_bytes/1024) AS N'Max KB' FROM sys.dm_os_memory_objects AS mo WHERE mo.[type] LIKE '%clr%' GROUP BY mo.[type] ORDER BY mo.[type]
-- SQL CLR Wait Statistics SELECT ws.* FROM sys.dm_os_wait_stats AS ws WHERE ws.wait_type LIKE '%clr%'
-- Requests that are currently in SQL CLR SELECT session_id, request_id, start_time, status, command, database_id, wait_type, wait_time, last_wait_type, wait_resource, cpu_time, total_elapsed_time, nest_level, executing_managed_code FROM sys.dm_exec_requests WHERE executing_managed_code = 1
-- Query performance and time spent in SQL CLR. SELECT (SELECT text FROM sys.dm_exec_sql_text(qs.sql_handle)) AS query_text, qs.* FROM sys.dm_exec_query_stats AS qs WHERE qs.total_clr_time > 0 ORDER BY qs.total_clr_time desc
-- Obtaining CLR Execution performance counter values. SELECT object_name, counter_name, cntr_value, cntr_type FROM sys.dm_os_performance_counters WHERE counter_name LIKE '%CLR%'
SQL Server 2005 and Visual Studio 2005 are tightly integrated. The SQL Server tools address the needs of the DBA and Business Intelligence communities, and the Visual Studio Integrated Development Environment targets the professional developer, whether they are working with client, Web, middle-tier, or database software systems.
This functional separation points to the roles that the DBA and the developer play in the debugging process:
The DBA uses SQL Server Profiler against test SQL Servers to trace database activity.
The developer uses the Visual Studio debugger to trace code execution as well as SQL Server Profiler on their personal SQL Server (developer edition) to trace database activity. The debugger enables end-to-end debugging where execution can be traced from client to Web server, to business object through to Transact-SQL and Database Engine .NET Framework Programming API code running in the SQL Server. This opens up the code and its state and thus enables extremely productive debugging during development.
This combination of tools enables the DBA to focus on the operation of the SQL Server, understanding and optimizing the query workload, as with all previous versions of SQL Server, and provides the developer with a set of tools that enable rapid application development.
The debugging environment for the Database Engine .NET Framework Programming API does not support multiple concurrent debugging. Thus, it is recommended that developers be equipped with their own development SQL Server instances in order to make full use of the Visual Studio debuggers.
This section suggests a few guidelines to aid the DBA in troubleshooting and reviewing code written to the Database Engine .NET Framework Programming API. However, this is not a complete list. It is recommended that development and DBA teams get together to agree on their own standards, with current Transact-SQL standards as a starting point, to ensure agreement on what makes a good program versus a bad program.
Lack of Set-based Operations (Loops within Loops)
Developers who are unfamiliar with SQL Server syntax have been known to replace JOIN clauses and sub SELECT clauses with complex nested cursors. There are times when this is justified, such as when each row of the outer table controls which other tables are required for further processing. This code usually looks like the following:
getdata() foreach (datarow in datatable) getdata() foreach (datarow in datatable2) etc
and is usually replaced with a simple INNER JOIN.
This behavior is easily captured in SQL Server Profiler because the inner loops can generate large numbers of query bursts against the server.
CPU-Intensive Operations in the CLR
Extremely complex logic can be delivered in the .NET Framework languages with greater ease than the rather clumsy Transact-SQL constructs. This means that the SQL Server could soon become an unscalable bottleneck, unlike middle-tier services that can usually be load balanced and scaled out. The cost column in the sys.dm_clr_appdomains table can be used to monitor CLR CPU costs.
If the usage suggestions in earlier sections of this document are observed, this should not be a major issue. Thus, the migration of function to the middle-tier is a fairly trivial coding task.
Instantiating Large Objects in Memory
Transact-SQL in SQL Server 2000 does not offer many ways to create large objects, probably the worst being the ability to pass large IMAGE or N/TEXT parameters into a stored procedure. Most of these spool into the TempDB as @TABLE variables or #TEMP tables. However, this is not so in the SQL Server 2005 .NET Framework run-time environment where ADO.NET dataset and home-grown user collections can place large and potentially unscalable demands on heap memory. You should use the System Monitor counters to monitor memory usage because code flows through from development to production to ensure that poorly written code does not make it through.
Solutions to this problem are similar to those that would be observed in the middle-tier:
Use paging to get subsets of data into memory.
Ensure that data is passed by reference rather than by value.
Move aggregated data around instead of detail-level data.
The Database Engine .NET Framework Programming API supports state of the art error handling from the various supported languages, so there is simply no reason for poor error handling. Any external operation, such as database, file system, or Web service access, or operations that risk overflow and underflow exceptions, should be wrapped in well-structured error handling code.
Errors can be passed back to SQL Server by executing the RAISERROR command with SQLContext.Pipe.ExecuteAndSend(). Note that Transact-SQL error handling is considerably improved in SQL Server 2005, so developers should be able to handle RAISERRORS from their code.
Raising an exception is safer than returning an error code because many developers can forget to check the @@ERROR object after a call to generic code.
Unsafe Code Issues
The DBA should be extremely reluctant to register code as UNSAFE, especially in shared instances, because it allows the developer to call unmanaged code. This in turn can perform unverified tasks. In the other code security modes, SQL Server looks into the assembly that is being registered and ensures that only authorized classes and calls are being made.
Unsafe code and unmanaged code has full access and runs at a high privilege level. Thus, it should be fully code reviewed before deployment. Particular attention should be paid to all parameters to confirm validation and verification and also to any connections back to the SQL Server. This is to ensure that transaction boundaries are being respected so that the system is not left in a partially updated state. Finally the code should be checked to ensure that it does not display error message dialogs, which obviously make no sense on the server.
User-defined Type Issues
User-defined types should be used with caution. In addition to the 8-KB size limitation, they are other limitations:
They are read and written as an atomic object.
The implementation code is schema-bound once in use, so the cost of change to an existing type is potentially high.
Beyond SQL Server 2005
SQL Server 2005 hosts the version of the .NET Framework run-time environment that simultaneously shipped with Visual Studio 2005. It is anticipated that service packs and hotfixes will not change the version. However, this is not a good assumption for future versions of SQL Server. As the capabilities of the .NET Framework evolve, it is likely that future SQL Server releases will switch to make use of these new facilities. With any major version change, there are both new and deprecated features, so a wise DBA should prepare to cope with this future change.
The following table can be used by application developers to test the version of the .NET Framework in use by SQL Server. Note that this query will return an empty string if the CLR is not activated. Generally, the major version is all that will need to be checked, unless the application requires features introduced by minor releases.
-- Will return the version if the .NET Framework has been used SELECT p.[value] FROM sys.dm_clr_properties AS p WHERE p.[name] = N'version' -- Will return the version even if the .NET Framework is unused -- Test the version of the Microsoft .NET Runtime Execution Engine SELECT lm.product_version FROM sys.dm_os_loaded_modules AS lm WHERE lm.[name] LIKE N'%\MSCOREE.DLL'
At the time of publication of this white paper, Windows processes can only host a single version of the .NET Framework run-time environment. Although this is not a certainty for future releases, it is possible that a future version of SQL Server will have the following:
Require code recompilation – This will ensure that the DBA’s copy of source codes are kept in synch with the code loaded into the SQL Server and will permit easier recompilation.
Require code modification – This will be a certainty to take advantage of new API and added options to existing API, and a possibility if existing API/options are deprecated. Management of source and an awareness of API usage will go a long way in allowing accurate impact analysis and change costing.
Require more than one .NET Framework run time to be loaded if “down level” code is detected in the SQL Server. This can lead to better investment protection at the expense of performance, so ensure that the tradeoff is understood.
The DBAs who ensure that their company source code is managed, that developer API use is monitored, and who take the time to ensure that the impact of future changes are understood, will ensure that their systems are protected from unplanned and costly rewrites and performance impacts.
This white paper has explored the SQL Server 2005 .NET Common Language Runtime integration from the perspective of the conservative database administrator.
The sections have provided the following:
Proposed processes and procedures for ensuring safe and successful deployment of this new capability.
A number of catalog and dynamic management view queries aimed at opening up the database for the DBA, with easy access to lists of objects and information about the state of the system.
Suggestions for the toolset required by the DBA and their developers in order to maximize their productivity and drilled into debugging and tracing.
Care has been taken to provide examples of situations where using this new capability may not be appropriate. It is important that the richness of SQL Server 2005 CLR integration be used for the best instead of for every purpose.
The most important lesson to be learned from this paper is that the CLR integration provides a new and powerful toolkit for application development, along with many scenarios where it can add enormous productivity and unlock new options for serious enterprise systems.
Additional SQL Server 2005 Resources
Free Resources on Microsoft.com, MSDN, and TechNet
MSDN White Paper:An Overview of SQL Server 2005 for the Database Developer
MSDN White Paper:Processing XML Showplans Using SQLCLR in SQL Server 2005
MSDN White Paper: Using CLR Integration in SQL Server 2005
MSDN White Paper:XML Support in Microsoft SQL Server 2005
MSDN White Paper:XML Options in Microsoft SQL Server 2005
MSDN White Paper:What's New in FOR XML in Microsoft SQL Server 2005
MSDN White Paper:XML Best Practices for Microsoft SQL Server 2005
MSDN White Paper:Usage Scenarios for SQL Server 2005 Native Web Services
MSDN White Paper:Managed Data Access Inside SQL Server with ADO.NET and SQLCLR
Hosting the .NET Runtime in Microsoft SQL Server on the Association for Computing Machinery (www.ACM.org). To access this article, you need membership in SIGMOD, the ACM, or you can purchase just the article for download.
Service Oriented Database Architecture by David Campbell, also on the Association for Computing Machinery (www.ACM.org). To access this article, you need membership in SIGMOD, the ACM, or you can purchase just the article for download.
For more information:
Microsoft Word file