Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource type.

Table headings

Metric - The metric display name as it appears in the Azure portal.
Name in REST API - The metric name as referred to in the REST API.
Unit - Unit of measure.
Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
Dimensions - Dimensions available for the metric.
Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
DS Export - Whether the metric is exportable to Azure Monitor Logs via Diagnostic Settings. For information on exporting metrics, see Create diagnostic settings in Azure Monitor.
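
The time grain values are ISO 8601 durations. As a minimal sketch of how they map to sampling intervals, the following Python helper converts the time-only forms mentioned above (PT1M, PT30M, PT1H) into timedelta objects. The function name parse_time_grain is hypothetical and not part of any Azure SDK, and durations with date components (such as P1D) are deliberately rejected.

```python
import re
from datetime import timedelta

# Matches only the time part of an ISO 8601 duration: PT<h>H<m>M<s>S.
# Date components (days, months, years) are out of scope for this sketch.
_GRAIN = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain such as 'PT1M' into a timedelta."""
    match = _GRAIN.match(grain)
    # Reject strings that do not match, and a bare 'PT' with no components.
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported time grain: {grain!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    for grain in ("PT1M", "PT30M", "PT1H"):
        print(grain, "->", parse_time_grain(grain))
```

Every metric in this table uses PT1M, so a helper like this is mostly useful when the same code also handles resource types with coarser grains.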

For information on metric retention, see Azure Monitor Metrics overview.

| Category | Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export |
|---|---|---|---|---|---|---|---|
| Resource | CPU Memory Utilization Percentage: Percentage of memory utilization on an instance. Utilization is reported at one-minute intervals. | CpuMemoryUtilizationPercentage | Percent | Minimum, Maximum, Average | instanceId | PT1M | Yes |
| Resource | CPU Utilization Percentage: Percentage of CPU utilization on an instance. Utilization is reported at one-minute intervals. | CpuUtilizationPercentage | Percent | Minimum, Maximum, Average | instanceId | PT1M | Yes |
| Resource | Data Collection Errors Per Minute: The number of data collection events dropped per minute. | DataCollectionErrorsPerMinute | Count | Minimum, Maximum, Average | instanceId, reason, type | PT1M | No |
| Resource | Data Collection Events Per Minute: The number of data collection events processed per minute. | DataCollectionEventsPerMinute | Count | Minimum, Maximum, Average | instanceId, type | PT1M | No |
| Resource | Deployment Capacity: The number of instances in the deployment. | DeploymentCapacity | Count | Minimum, Maximum, Average | instanceId, State | PT1M | No |
| Resource | Disk Utilization: Percentage of disk utilization on an instance. Utilization is reported at one-minute intervals. | DiskUtilization | Percent | Minimum, Maximum, Average | instanceId, disk | PT1M | Yes |
| Resource | GPU Energy in Joules: Interval energy in Joules on a GPU node. Energy is reported at one-minute intervals. | GpuEnergyJoules | Count | Minimum, Maximum, Average | instanceId | PT1M | No |
| Resource | GPU Memory Utilization Percentage: Percentage of GPU memory utilization on an instance. Utilization is reported at one-minute intervals. | GpuMemoryUtilizationPercentage | Percent | Minimum, Maximum, Average | instanceId | PT1M | Yes |
| Resource | GPU Utilization Percentage: Percentage of GPU utilization on an instance. Utilization is reported at one-minute intervals. | GpuUtilizationPercentage | Percent | Minimum, Maximum, Average | instanceId | PT1M | Yes |
| Traffic | Request Latency P50: The average P50 request latency aggregated by all request latency values collected over the selected time period. | RequestLatency_P50 | Milliseconds | Average | <none> | PT1M | Yes |
| Traffic | Request Latency P90: The average P90 request latency aggregated by all request latency values collected over the selected time period. | RequestLatency_P90 | Milliseconds | Average | <none> | PT1M | Yes |
| Traffic | Request Latency P95: The average P95 request latency aggregated by all request latency values collected over the selected time period. | RequestLatency_P95 | Milliseconds | Average | <none> | PT1M | Yes |
| Traffic | Request Latency P99: The average P99 request latency aggregated by all request latency values collected over the selected time period. | RequestLatency_P99 | Milliseconds | Average | <none> | PT1M | Yes |
| Traffic | Requests Per Minute: The number of requests sent to the online deployment within a minute. | RequestsPerMinute | Count | Average | envoy_response_code | PT1M | No |
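
To show how the "Name in REST API", "Aggregation", "Dimensions", and "Time Grains" columns fit together in a request, here is a minimal Python sketch that builds a URL for the Azure Monitor metrics REST API using only the standard library. Several things in it are assumptions rather than facts from this page: the helper name build_metrics_url, the api-version value 2018-01-01, and the placeholder resource ID. Sending the request also requires an Azure AD bearer token, which is not shown.

```python
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-01-01"  # assumption: confirm the version available to you


def build_metrics_url(resource_id, metric_names, timespan,
                      interval="PT1M", aggregation="Average",
                      dimension_filter=None):
    """Build a metrics query URL for one deployment resource.

    resource_id      -- full ARM resource ID of the deployment
    metric_names     -- values from the "Name in REST API" column
    timespan         -- ISO 8601 interval, "start/end"
    interval         -- a value from the "Time Grains" column
    aggregation      -- a value from the "Aggregation" column
    dimension_filter -- optional filter on a name from the "Dimensions" column
    """
    params = {
        "api-version": API_VERSION,
        "metricnames": ",".join(metric_names),
        "timespan": timespan,
        "interval": interval,
        "aggregation": aggregation,
    }
    if dimension_filter:
        params["$filter"] = dimension_filter
    # Percent-encode values; keep '$' and ',' readable in the query string.
    query = urlencode(params, safe="$,", quote_via=quote)
    path = "/" + resource_id.strip("/")
    return f"{ARM_ENDPOINT}{path}/providers/Microsoft.Insights/metrics?{query}"


if __name__ == "__main__":
    # Hypothetical resource ID; replace every segment with real values.
    deployment_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/my-rg"
        "/providers/Microsoft.MachineLearningServices/workspaces/my-ws"
        "/onlineEndpoints/my-endpoint/deployments/blue"
    )
    print(build_metrics_url(
        deployment_id,
        ["CpuUtilizationPercentage", "GpuUtilizationPercentage"],
        "2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
        dimension_filter="instanceId eq '*'",
    ))
```

The example filters on instanceId because both requested metrics list that dimension. The request latency metrics list no dimensions, so a dimension filter would not apply to them.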