How to create a resource pool for machine learning in SQL Server
This article describes how you can create and use a resource pool specifically for managing R and Python machine learning workloads in SQL Server. It assumes that you have already installed and enabled the machine learning features, and want to reconfigure the instance to support more fine-grained management of the resources used by external process such as R or Python.
The process includes multiple steps:
- Review status of any existing resource pools. It is important that you understand what services are using existing resources.
- Modify server resource pools.
- Create a new resource pool for external processes.
- Create a classification function to identify external script requests.
- Verify that the new external resource pool is capturing R or Python jobs from the specified clients or accounts.
Review the status of existing resource pools
Use a statement such as the following to check the resources allocated to the default pool for the server.
SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'default'
pool_id name min_cpu_percent max_cpu_percent min_memory_percent max_memory_percent cap_cpu_percent min_iops_per_volume max_iops_per_volume 2 default 0 100 0 100 100 0 0
Check the resources allocated to the default external resource pool.
SELECT * FROM sys.resource_governor_external_resource_pools WHERE name = 'default'
external_pool_id name max_cpu_percent max_memory_percent max_processes version 2 default 100 20 0 2
Under these server default settings, the external runtime will probably have insufficient resources to complete most tasks. To change this, you must modify the server resource usage as follows:
Reduce the maximum computer memory that can be used by the database engine.
Increase the maximum computer memory that can be used by the external process.
Modify server resource usage
In Management Studio, run the following statement to limit SQL Server memory usage to 60% of the value in the 'max server memory' setting.
ALTER RESOURCE POOL "default" WITH (max_memory_percent = 60);
Similarly, run the following statement to limit the use of memory by external processes to 40% of total computer resources.
ALTER EXTERNAL RESOURCE POOL "default" WITH (max_memory_percent = 40);
To enforce these changes, you must reconfigure and restart Resource Governor as follows:
ALTER RESOURCE GOVERNOR RECONFIGURE;
These are just suggested settings to start with; you should evaluate your machine learning tasks in light of other server processes to determine the correct balance for your environment and workload.
Create a user-defined external resource pool
Any changes to the configuration of Resource Governor are enforced across the server as a whole and affect workloads that use the default pools for the server, as well as workloads that use the external pools.
Therefore, to provide more fine-grained control over which workloads should have precedence, you can create a new user-defined external resource pool. You should also define a classification function and assign it to the external resource pool. The EXTERNAL keyword is new.
Begin by creating a new user-defined external resource pool. In the following example, the pool is named ds_ep.
CREATE EXTERNAL RESOURCE POOL ds_ep WITH (max_memory_percent = 40);
Create a workload group named
ds_wgto use in managing session requests. For SQL queries you'll use the default pool; for all external process queries will use the
CREATE WORKLOAD GROUP ds_wg WITH (importance = medium) USING "default", EXTERNAL "ds_ep";
Requests are assigned to the default group whenever the request cannot be classified, or if there is any other classification failure.
Create a classification function for machine learning
A classification function examines incoming tasks and determines whether the task is one that can be run using the current resource pool. Tasks that do not meet the criteria of the classification function are assigned back to the server's default resource pool.
Begin by specifying that a classifier function should be used by Resource Governor to determine resource pools. You can assign a null as a placeholder for the classifier function.
ALTER RESOURCE GOVERNOR WITH (classifier_function = NULL); ALTER RESOURCE GOVERNOR RECONFIGURE;
For more information, see ALTER RESOURCE GOVERNOR (Transact-SQL).
In the classifier function for each resource pool, define the type of statements or incoming requests that should be assigned to the resource pool.
For example, the following function returns the name of the schema assigned to the user-defined external resource pool if the application that sent the request is either 'Microsoft R Host' or 'RStudio'; otherwise it returns the default resource pool.
USE master GO CREATE FUNCTION is_ds_apps() RETURNS sysname WITH schemabinding AS BEGIN IF program_name() in ('Microsoft R Host', 'RStudio') RETURN 'ds_wg'; RETURN 'default' END; GO
When the function has been created, reconfigure the resource group to assign the new classifier function to the external resource group that you defined earlier.
ALTER RESOURCE GOVERNOR WITH (classifier_function = dbo.is_ds_apps); ALTER RESOURCE GOVERNOR WITH reconfigure; GO
Verify new resource pools and affinity
To verify that the changes have been made, you should check the configuration of server memory and CPU for each of the workload groups associated with these instance resource pools:
- the default pool for the SQL Server server
- the default resource pool for external processes
- the user-defined pool for external processes
Run the following statement to view all workload groups:
SELECT * FROM sys.resource_governor_workload_groups;
group_id name importance request_max_memory_grant_percent request_max_cpu_time_sec request_memory_grant_timeout_sec max_dop group_max_requests pool_id pool_idd external_pool_id 1 internal Medium 25 0 0 0 0 1 2 2 default Medium 25 0 0 0 0 2 2 256 ds_wg Medium 25 0 0 0 0 2 256
Use the new catalog view, sys.resource_governor_external_resource_pools (Transact-SQL), to view all external resource pools.
SELECT * FROM sys.resource_governor_external_resource_pools;
external_pool_id name max_cpu_percent max_memory_percent max_processes version 2 default 100 20 0 2 256 ds_ep 100 40 0 1
For more information, see Resource Governor Catalog Views (Transact-SQL).
Run the following statement to return information about the computer resources that are affinitized to the external resource pool, if applicable:
SELECT * FROM sys.resource_governor_external_resource_pool_affinity;
In this case, because the pools were created with an affinity of AUTO, no information is displayed. For more information, see sys.dm_resource_governor_resource_pool_affinity (Transact-SQL).
For more information about managing server resources, see:
For an overview of resource governance for machine learning, see:
We'd love to hear your thoughts. Choose the type you'd like to provide:
Our feedback system is built on GitHub Issues. Read more on our blog.