Monitoring resource utilization and query activity in Azure Synapse Analytics
Azure Synapse Analytics provides a rich monitoring experience within the Azure portal to surface insights about your data warehouse workload. The Azure portal is the recommended tool for monitoring your data warehouse because it provides configurable retention periods, alerts, recommendations, and customizable charts and dashboards for metrics and logs. The portal also lets you integrate with other Azure monitoring services, such as Azure Monitor logs with Log Analytics, to provide a holistic monitoring experience for not only your data warehouse but also your entire Azure analytics platform. This article describes the monitoring capabilities available to optimize and manage your analytics platform with Synapse SQL.
The following metrics are available in the Azure portal for Synapse SQL. These metrics are surfaced through Azure Monitor.
| Metric Name | Description | Aggregation Type |
|---|---|---|
| CPU percentage | CPU utilization across all nodes for the data warehouse | Avg, Min, Max |
| Data IO percentage | IO utilization across all nodes for the data warehouse | Avg, Min, Max |
| Memory percentage | Memory utilization (SQL Server) across all nodes for the data warehouse | Avg, Min, Max |
| Active Queries | Number of active queries executing on the system | Sum |
| Queued Queries | Number of queued queries waiting to start executing | Sum |
| Successful Connections | Number of successful connections (logins) against the database | Sum, Count |
| Failed Connections | Number of failed connections (logins) against the database | Sum, Count |
| Blocked by Firewall | Number of logins to the data warehouse that were blocked | Sum, Count |
| DWU limit | Service level objective of the data warehouse | Avg, Min, Max |
| DWU percentage | Maximum of CPU percentage and Data IO percentage | Avg, Min, Max |
| DWU used | DWU limit * DWU percentage | Avg, Min, Max |
| Cache hit percentage | (cache hits / cache miss) * 100, where cache hits is the sum of all columnstore segment hits in the local SSD cache and cache miss is the sum of all columnstore segment misses in the local SSD cache, both summed across all nodes | Avg, Min, Max |
| Cache used percentage | (cache used / cache capacity) * 100, where cache used is the sum of all bytes in the local SSD cache across all nodes and cache capacity is the sum of the storage capacity of the local SSD cache across all nodes | Avg, Min, Max |
| Local tempdb percentage | Local tempdb utilization across all compute nodes; values are emitted every five minutes | Avg, Min, Max |
Things to consider when viewing metrics and setting alerts:
- DWU used is only a high-level representation of usage across the SQL pool and is not meant to be a comprehensive indicator of utilization. To determine whether to scale up or down, consider all factors that can be affected by DWU, such as concurrency, memory, tempdb, and adaptive cache capacity. We recommend running your workload at different DWU settings to determine what works best to meet your business objectives.
- Failed and successful connections are reported for a particular data warehouse - not for the server itself.
- Memory percentage reflects utilization even if the data warehouse is idle; it does not reflect active workload memory consumption. Track this metric along with others (tempdb, Gen2 cache) to make a holistic decision on whether scaling for additional cache capacity will increase workload performance to meet your requirements.
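
The derived metrics in the table above can be illustrated with a short Python sketch. It is not how the service computes them internally; it only restates the table's formulas. The function names are hypothetical, and the sketch assumes percentages are expressed on a 0-100 scale, so DWU used divides by 100 after multiplying.

```python
def dwu_percentage(cpu_pct: float, data_io_pct: float) -> float:
    """DWU percentage: the maximum of CPU percentage and Data IO percentage."""
    return max(cpu_pct, data_io_pct)


def dwu_used(dwu_limit: float, dwu_pct: float) -> float:
    """DWU used: DWU limit * DWU percentage.

    Assumption: dwu_pct is on a 0-100 scale, so the product is divided by 100.
    """
    return dwu_limit * dwu_pct / 100.0


def cache_used_percentage(cache_used_bytes: float, cache_capacity_bytes: float) -> float:
    """Cache used percentage: (cache used / cache capacity) * 100."""
    return cache_used_bytes / cache_capacity_bytes * 100.0


# Illustrative values only, not real measurements.
pct = dwu_percentage(cpu_pct=40.0, data_io_pct=65.0)
used = dwu_used(dwu_limit=1000, dwu_pct=pct)
print(pct, used)
```

Because DWU percentage takes only the larger of two values, a workload limited by memory, tempdb, or concurrency can show a low DWU used while still being constrained, which is why the other metrics should be checked as well.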
For a programmatic experience when monitoring Synapse SQL via T-SQL, the service provides a set of Dynamic Management Views (DMVs). These views are useful when actively troubleshooting and identifying performance bottlenecks with your workload.
To view the list of DMVs that apply to Synapse SQL, refer to this documentation.
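
As a rough illustration of DMV-based troubleshooting, the following Python sketch runs a query against `sys.dm_pdw_exec_requests` and returns the longest-running requests that have not finished. The helper name `fetch_active_requests`, the `limit` parameter, and the choice of columns and status filter are assumptions for this example. The function accepts any Python DB-API cursor (for instance, one obtained from an ODBC driver connection to the SQL pool); how you create that connection is outside the scope of the sketch.

```python
# T-SQL text sent to Synapse SQL. The status filter is an assumption for this
# sketch; adjust it to the states you want to exclude.
ACTIVE_REQUESTS_SQL = """
SELECT request_id, status, submit_time, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
WHERE status NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY total_elapsed_time DESC;
"""


def fetch_active_requests(cursor, limit=10):
    """Return up to `limit` unfinished requests as a list of dictionaries.

    `cursor` is any DB-API 2.0 cursor connected to the data warehouse.
    """
    cursor.execute(ACTIVE_REQUESTS_SQL)
    # cursor.description holds one tuple per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(limit)]
```

Returning dictionaries keyed by column name keeps the caller independent of column order, so the query can be extended with more DMV columns without changing the code that consumes the results.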
Metrics and diagnostics logging
Both metrics and logs can be exported to Azure Monitor, specifically to the Azure Monitor logs component, and can be accessed programmatically through log queries. The log latency for Synapse SQL is about 10 to 15 minutes. For more details on the factors impacting latency, visit the following documentation.
The following How-to guide describes common scenarios and use cases when monitoring and managing your data warehouse: