How do I get performance metrics for Azure SQL DB?

Question

Is Microsoft deliberately hiding useful metrics for monitoring the performance of SQL? In the metrics section of the Azure portal, I can only find very vague and unhelpful numbers like "% data IO" and "% log IO". These are way too abstract and are essentially meaningless for our purposes.

I'd like to see the throughput rates in bytes/sec. And the number of IOPS. And the latencies on those IOPS.

Any help would be greatly appreciated. I am using single databases (provisioned) with vCore licensing. I realize that there is resource governance going on that is limiting my performance in terms of memory and I/O. However, without being able to review the performance characteristics of the currently provisioned database, it is impossible to determine how much of an upgrade we need to get to our desired goals.

I have read a number of articles, but none of them are helpful in telling me how to get to the throughput, IOPS, or latencies.
Eg.
https://azure.microsoft.com/en-us/blog/resource-governance-in-azure-sql-database/
https://learn.microsoft.com/en-us/azure/azure-sql/database/monitor-tune-overview

Any help would be appreciated.

Accepted Answer

It appears that Microsoft has made it really hard to discover the underlying details related to SQL IOPS in Azure. I think they are assuming that most of their customers just want to know how close they are to reaching the "% of tier" limit. The problem is that if you are under the limit and your application still seems slow then the "% of tier" is essentially meaningless for any troubleshooting purposes. For that matter, even if you are at 100% you probably need to complement that information with the absolute number of IOPS you are achieving, along with the absolute throughput rate (ie. byte rate for reads/writes).

Given that these hosted SQL databases have service tiers, the Azure limitations we run into are rarely due to hardware limitations but are due to the SQL resource governor. Because the governor is hyper-focused on "allowances" and "percentages-of-tier", that is probably the reason why our performance metrics are also being reported to us as percentages . The percentages are unhelpful, to say the least.

Say, for example, you are attempting to compare your cloud performance to how things worked on-premises. There is virtually no way to make a comparison because "100%" means something totally different in each context (or rather, it means virtually nothing in either context).

The good news is that I was able to open a case with Microsoft and they pointed me to some of the metrics I was looking for. The absolute IOPS and throughput rates are part of the resource governor's DMV. Here is the DMV that they pointed me to: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-resource-governor-resource-pools-history-ex-azure-sql-database?view=azuresqldb-current

It is a bit obscure and I probably wouldn't have ever found it on my own. Thank you Microsoft support!

I haven't tested with both of them, but I think this DMV will work properly with SQL database, and SQL elastic pool. We are currently deploying our own solution to SQL elastic pool. Note that SQL managed-instances have DMV's that are far easier to use and provide better information, so a managed-instance customer would probably not use the DMV that I'm referring to here.

Below is a sample query, but I came up with this after playing for only an hour. You may want to tinker with it to suit your own needs. The fields within it are also a bit obscure. The fields are reported at intervals, by resource pool name. The pool names are a bit obsure as well.

select   
	snapshot_time ,   
	name as resource_governor_resource_pool_name,  
	  
	duration_ms,  
	   
	cap_vcores_used_percent,  
	avg_data_io_percent,    
  
  
	avg_log_write_percent,  
  
	delta_write_io_completed,  
	delta_write_io_throttled,  
	delta_write_bytes,  
	delta_write_io_stall_ms,  
	delta_write_io_stall_queued_ms,  
  
  
	delta_read_io_completed,  
	delta_read_io_throttled,  
	delta_read_bytes,   
	delta_read_io_stall_ms,  
	delta_read_io_stall_queued_ms,  
  
  
  
	delta_io_issue_delay_ms  
	   
    from sys.dm_resource_governor_resource_pools_history_ex   
	  
	where name in ('SloSharedPool1', 'internal')  
	order by snapshot_time desc

Hope this helps anyone else who needs actual IOP metrics rather than "% of tier". I'm excited to say that this Q&A question will probably be the best result in a google search for the DMV, ("dm_resource_governor_resource_pools_history_ex "). Let me know if I'm wrong!

Answer

Good day,

The following document provides all you need regarding monitoring the Azure SQL Database:

https://learn.microsoft.com/en-us/azure/azure-sql/database/monitor-tune-overview

Answer

It appears that Azure SQL users who run "manage instance" may have the IO metrics. It looks like they can see the actual IOPS and throughput numbers for their databases (rather than the meaningless "percent of tier").

See this DMV for example:
https://learn.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-server-resource-stats-azure-sql-database?view=azuresqldb-current

io_request
io_byte_read
io_bytes_written

Is there some equivalent for the other Azure SQL databases? A percentage is essentially a ratio that is calculated from two other numbers. Surely Microsoft understands how silly it is to give us their "IO percentage" without its two underlying metrics. It almost seems like there is a deliberate effort to hide the details. Why would IOPS and throughput be provided for "managed instance" and not for Azure SQL Database? Is there a way to back into the underlying metrics? I think there is additional detail that is available for individual queries, but I'm more interested in being able to monitor the database workload as a whole.

How do I get performance metrics for Azure SQL DB?

2 additional answers