This article describes how to use Dynamic Management Views (DMVs) to monitor your workload and investigate query execution in Azure SQL Data Warehouse.
To query the DMVs in this article, you need either VIEW DATABASE STATE or CONTROL permission. Usually granting VIEW DATABASE STATE is the preferred permission as it is much more restrictive.
GRANT VIEW DATABASE STATE TO myuser;
All logins to SQL Data Warehouse are logged to sys.dm_pdw_exec_sessions. This DMV contains the last 10,000 logins. The session_id is the primary key and is assigned sequentially for each new logon.
-- Other Active Connections SELECT * FROM sys.dm_pdw_exec_sessions where status <> 'Closed' and session_id <> session_id();
Monitor query execution
All queries executed on SQL Data Warehouse are logged to sys.dm_pdw_exec_requests. This DMV contains the last 10,000 queries executed. The request_id uniquely identifies each query and is the primary key for this DMV. The request_id is assigned sequentially for each new query and is prefixed with QID, which stands for query ID. Querying this DMV for a given session_id shows all queries for a given logon.
Stored procedures use multiple Request IDs. Request IDs are assigned in sequential order.
Here are steps to follow to investigate query execution plans and times for a particular query.
STEP 1: Identify the query you wish to investigate
-- Monitor active queries SELECT * FROM sys.dm_pdw_exec_requests WHERE status not in ('Completed','Failed','Cancelled') AND session_id <> session_id() ORDER BY submit_time DESC; -- Find top 10 queries longest running queries SELECT TOP 10 * FROM sys.dm_pdw_exec_requests ORDER BY total_elapsed_time DESC; -- Find a query with the Label 'My Query' -- Use brackets when querying the label column, as it it a key word SELECT * FROM sys.dm_pdw_exec_requests WHERE [label] = 'My Query';
From the preceding query results, note the Request ID of the query that you would like to investigate.
Queries in the Suspended state are being queued due to concurrency limits. These queries also appear in the sys.dm_pdw_waits waits query with a type of UserConcurrencyResourceType. See Concurrency and workload management for more details on concurrency limits. Queries can also wait for other reasons such as for object locks. If your query is waiting for a resource, see Investigating queries waiting for resources further down in this article.
To simplify the lookup of a query in the sys.dm_pdw_exec_requests table, use LABEL to assign a comment to your query that can be looked up in the sys.dm_pdw_exec_requests view.
-- Query with Label SELECT * FROM sys.tables OPTION (LABEL = 'My Query') ;
STEP 2: Investigate the query plan
Use the Request ID to retrieve the query's distributed SQL (DSQL) plan from sys.dm_pdw_request_steps.
-- Find the distributed query plan steps for a specific query. -- Replace request_id with value from Step 1. SELECT * FROM sys.dm_pdw_request_steps WHERE request_id = 'QID####' ORDER BY step_index;
When a DSQL plan is taking longer than expected, the cause can be a complex plan with many DSQL steps or just one step taking a long time. If the plan is many steps with several move operations, consider optimizing your table distributions to reduce data movement. The Table distribution article explains why data must be moved to solve a query and explains some distribution strategies to minimize data movement.
To investigate further details about a single step, the operation_type column of the long-running query step and note the Step Index:
- Proceed with Step 3a for SQL operations: OnOperation, RemoteOperation, ReturnOperation.
- Proceed with Step 3b for Data Movement operations: ShuffleMoveOperation, BroadcastMoveOperation, TrimMoveOperation, PartitionMoveOperation, MoveOperation, CopyOperation.
STEP 3a: Investigate SQL on the distributed databases
Use the Request ID and the Step Index to retrieve details from sys.dm_pdw_sql_requests, which contains execution information of the query step on all of the distributed databases.
-- Find the distribution run times for a SQL step. -- Replace request_id and step_index with values from Step 1 and 3. SELECT * FROM sys.dm_pdw_sql_requests WHERE request_id = 'QID####' AND step_index = 2;
When the query step is running, DBCC PDW_SHOWEXECUTIONPLAN can be used to retrieve the SQL Server estimated plan from the SQL Server plan cache for the step running on a particular distribution.
-- Find the SQL Server execution plan for a query running on a specific SQL Data Warehouse Compute or Control node. -- Replace distribution_id and spid with values from previous query. DBCC PDW_SHOWEXECUTIONPLAN(1, 78);
STEP 3b: Investigate data movement on the distributed databases
Use the Request ID and the Step Index to retrieve information about a data movement step running on each distribution from sys.dm_pdw_dms_workers.
-- Find the information about all the workers completing a Data Movement Step. -- Replace request_id and step_index with values from Step 1 and 3. SELECT * FROM sys.dm_pdw_dms_workers WHERE request_id = 'QID####' AND step_index = 2;
- Check the total_elapsed_time column to see if a particular distribution is taking significantly longer than others for data movement.
- For the long-running distribution, check the rows_processed column to see if the number of rows being moved from that distribution is significantly larger than others. If so, this may indicate skew of your underlying data.
If the query is running, DBCC PDW_SHOWEXECUTIONPLAN can be used to retrieve the SQL Server estimated plan from the SQL Server plan cache for the currently running SQL Step within a particular distribution.
-- Find the SQL Server estimated plan for a query running on a specific SQL Data Warehouse Compute or Control node. -- Replace distribution_id and spid with values from previous query. DBCC PDW_SHOWEXECUTIONPLAN(55, 238);
Monitor waiting queries
If you discover that your query is not making progress because it is waiting for a resource, here is a query that shows all the resources a query is waiting for.
-- Find queries -- Replace request_id with value from Step 1. SELECT waits.session_id, waits.request_id, requests.command, requests.status, requests.start_time, waits.type, waits.state, waits.object_type, waits.object_name FROM sys.dm_pdw_waits waits JOIN sys.dm_pdw_exec_requests requests ON waits.request_id=requests.request_id WHERE waits.request_id = 'QID####' ORDER BY waits.object_name, waits.object_type, waits.state;
If the query is actively waiting on resources from another query, then the state will be AcquireResources. If the query has all the required resources, then the state will be Granted.
High tempdb utilization can be the root cause for slow performance and out of memory issues. Please first check if you have data skew or poor quality rowgroups and take the appropriate actions. Consider scaling your data warehouse if you find tempdb reaching its limits during query execution. The following describes how to identify tempdb usage per query on each node.
Create the following view to associate the appropriate node id for sys.dm_pdw_sql_requests. This will enable you to leverage other pass-through DMVs and join those tables with sys.dm_pdw_sql_requests.
-- sys.dm_pdw_sql_requests with the correct node id CREATE VIEW sql_requests AS (SELECT sr.request_id, sr.step_index, (CASE WHEN (sr.distribution_id = -1 ) THEN (SELECT pdw_node_id FROM sys.dm_pdw_nodes WHERE type = 'CONTROL') ELSE d.pdw_node_id END) AS pdw_node_id, sr.distribution_id, sr.status, sr.error_id, sr.start_time, sr.end_time, sr.total_elapsed_time, sr.row_count, sr.spid, sr.command FROM sys.pdw_distributions AS d RIGHT JOIN sys.dm_pdw_sql_requests AS sr ON d.distribution_id = sr.distribution_id)
Run the following query to monitor tempdb:
-- Monitor tempdb SELECT sr.request_id, ssu.session_id, ssu.pdw_node_id, sr.command, sr.total_elapsed_time, es.login_name AS 'LoginName', DB_NAME(ssu.database_id) AS 'DatabaseName', (es.memory_usage * 8) AS 'MemoryUsage (in KB)', (ssu.user_objects_alloc_page_count * 8) AS 'Space Allocated For User Objects (in KB)', (ssu.user_objects_dealloc_page_count * 8) AS 'Space Deallocated For User Objects (in KB)', (ssu.internal_objects_alloc_page_count * 8) AS 'Space Allocated For Internal Objects (in KB)', (ssu.internal_objects_dealloc_page_count * 8) AS 'Space Deallocated For Internal Objects (in KB)', CASE es.is_user_process WHEN 1 THEN 'User Session' WHEN 0 THEN 'System Session' END AS 'SessionType', es.row_count AS 'RowCount' FROM sys.dm_pdw_nodes_db_session_space_usage AS ssu INNER JOIN sys.dm_pdw_nodes_exec_sessions AS es ON ssu.session_id = es.session_id AND ssu.pdw_node_id = es.pdw_node_id INNER JOIN sys.dm_pdw_nodes_exec_connections AS er ON ssu.session_id = er.session_id AND ssu.pdw_node_id = er.pdw_node_id INNER JOIN sql_requests AS sr ON ssu.session_id = sr.spid AND ssu.pdw_node_id = sr.pdw_node_id WHERE DB_NAME(ssu.database_id) = 'tempdb' AND es.session_id <> @@SPID AND es.login_name <> 'sa' ORDER BY sr.request_id;
Memory can be the root cause for slow performance and out of memory issues. Please first check if you have data skew or poor quality rowgroups and take the appropriate actions. Consider scaling your data warehouse if you find SQL Server memory usage reaching its limits during query execution.
The following query returns SQL Server memory usage and memory pressure per node:
-- Memory consumption SELECT pc1.cntr_value as Curr_Mem_KB, pc1.cntr_value/1024.0 as Curr_Mem_MB, (pc1.cntr_value/1048576.0) as Curr_Mem_GB, pc2.cntr_value as Max_Mem_KB, pc2.cntr_value/1024.0 as Max_Mem_MB, (pc2.cntr_value/1048576.0) as Max_Mem_GB, pc1.cntr_value * 100.0/pc2.cntr_value AS Memory_Utilization_Percentage, pc1.pdw_node_id FROM -- pc1: current memory sys.dm_pdw_nodes_os_performance_counters AS pc1 -- pc2: total memory allowed for this SQL instance JOIN sys.dm_pdw_nodes_os_performance_counters AS pc2 ON pc1.object_name = pc2.object_name AND pc1.pdw_node_id = pc2.pdw_node_id WHERE pc1.counter_name = 'Total Server Memory (KB)' AND pc2.counter_name = 'Target Server Memory (KB)'
Monitor transaction log size
The following query returns the transaction log size on each distribution. Please check if you have data skew or poor quality rowgroups and take the appropriate actions. If one of the log files is reaching 160GB, you should consider scaling up your instance or limiting your transaction size.
-- Transaction log size SELECT instance_name as distribution_db, cntr_value*1.0/1048576 as log_file_size_used_GB, pdw_node_id FROM sys.dm_pdw_nodes_os_performance_counters WHERE instance_name like 'Distribution_%' AND counter_name = 'Log File(s) Used Size (KB)' AND counter_name = 'Target Server Memory (KB)'
Monitor transaction log rollback
If your queries are failing or taking a long time to proceed, you can check and monitor if you have any transactions rolling back.
-- Monitor rollback SELECT SUM(CASE WHEN t.database_transaction_next_undo_lsn IS NOT NULL THEN 1 ELSE 0 END), t.pdw_node_id, nod.[type] FROM sys.dm_pdw_nodes_tran_database_transactions t JOIN sys.dm_pdw_nodes nod ON t.pdw_node_id = nod.pdw_node_id GROUP BY t.pdw_node_id, nod.[type]