It appears that Microsoft has made it really hard to discover the underlying details related to SQL IOPS in Azure. I think they are assuming that most of their customers just want to know how close they are to reaching the "% of tier" limit. The problem is that if you are under the limit and your application still seems slow then the "% of tier" is essentially meaningless for any troubleshooting purposes. For that matter, even if you are at 100% you probably need to complement that information with the absolute number of IOPS you are achieving, along with the absolute throughput rate (ie. byte rate for reads/writes).
Given that these hosted SQL databases have service tiers, the Azure limitations we run into are rarely due to hardware limitations but are due to the SQL resource governor. Because the governor is hyper-focused on "allowances" and "percentages-of-tier", that is probably the reason why our performance metrics are also being reported to us as percentages . The percentages are unhelpful, to say the least.
Say, for example, you are attempting to compare your cloud performance to how things worked on-premises. There is virtually no way to make a comparison because "100%" means something totally different in each context (or rather, it means virtually nothing in either context).
The good news is that I was able to open a case with Microsoft and they pointed me to some of the metrics I was looking for. The absolute IOPS and throughput rates are part of the resource governor's DMV. Here is the DMV that they pointed me to: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-resource-governor-resource-pools-history-ex-azure-sql-database?view=azuresqldb-current
It is a bit obscure and I probably wouldn't have ever found it on my own. Thank you Microsoft support!
I haven't tested with both of them, but I think this DMV will work properly with SQL database, and SQL elastic pool. We are currently deploying our own solution to SQL elastic pool. Note that SQL managed-instances have DMV's that are far easier to use and provide better information, so a managed-instance customer would probably not use the DMV that I'm referring to here.
Below is a sample query, but I came up with this after playing for only an hour. You may want to tinker with it to suit your own needs. The fields within it are also a bit obscure. The fields are reported at intervals, by resource pool name. The pool names are a bit obsure as well.
select
snapshot_time ,
name as resource_governor_resource_pool_name,
duration_ms,
cap_vcores_used_percent,
avg_data_io_percent,
avg_log_write_percent,
delta_write_io_completed,
delta_write_io_throttled,
delta_write_bytes,
delta_write_io_stall_ms,
delta_write_io_stall_queued_ms,
delta_read_io_completed,
delta_read_io_throttled,
delta_read_bytes,
delta_read_io_stall_ms,
delta_read_io_stall_queued_ms,
delta_io_issue_delay_ms
from sys.dm_resource_governor_resource_pools_history_ex
where name in ('SloSharedPool1', 'internal')
order by snapshot_time desc
Hope this helps anyone else who needs actual IOP metrics rather than "% of tier". I'm excited to say that this Q&A question will probably be the best result in a google search for the DMV, ("dm_resource_governor_resource_pools_history_ex "). Let me know if I'm wrong!