Collect and transport metrics (Preview)

Applies to: IoT Edge 1.1, IoT Edge 1.2

You can remotely monitor your IoT Edge fleet using Azure Monitor and built-in metrics integration. To enable this capability on your device, add the metrics-collector module to your deployment and configure it to collect and transport module metrics to Azure Monitor.

Architecture

Metrics monitoring architecture with IoT Hub

Note Description
1 All modules must emit metrics using the Prometheus data model. While built-in metrics enable broad workload visibility by default, custom modules can also be used to emit scenario-specific metrics to enhance the monitoring solution. Learn how to instrument custom modules using open-source libraries in the Add custom metrics article.
2 The metrics-collector module is a Microsoft-supplied IoT Edge module that collects workload module metrics and transports them off-device. Metrics collection uses a pull model. Collection frequency, endpoints, and filters can be configured to control the data egressed from the module. For more information, see the Metrics collector configuration section later in this article.
3 You have two options for sending metrics from the metrics-collector module to the cloud. Option 1 sends the metrics to Log Analytics.[1] The collected metrics are ingested into the specified Log Analytics workspace using a fixed, native table called InsightsMetrics. This table's schema is compatible with the Prometheus metrics data model.

This option requires access to the workspace on outbound port 443. The Log Analytics workspace ID and key must be specified as part of the module configuration. To enable in restricted networks, see Enable in restricted network access scenarios later in this article.
4 Each metric entry contains the ResourceId that was specified as part of module configuration. This association automatically links the metric with the specified resource (for example, IoT Hub). As a result, the curated IoT Edge workbook templates can retrieve metrics by issuing queries against the resource.

This approach also allows multiple IoT hubs to safely share a single Log Analytics workspace as a metrics database.
5 Option 2 sends the metrics to IoT Hub.[1] The collector module can be configured to send the collected metrics as UTF-8 encoded JSON device-to-cloud messages via the edgeHub module. This option unlocks monitoring of locked-down IoT Edge devices that are allowed external access to only the IoT Hub endpoint. It also enables monitoring of child IoT Edge devices in a nested configuration where child devices can only access their parent device.
6 When metrics are routed via IoT Hub, a one-time cloud workflow needs to be set up. The workflow processes messages arriving from the metrics-collector module and sends them to the Log Analytics workspace. The workflow enables the curated visualizations and alerts functionality even for metrics arriving via this optional path. See the Route metrics through IoT Hub section for details on how to set up this cloud workflow.

[1] Currently, option 1 (transporting metrics directly from the IoT Edge device to Log Analytics) is the simpler path and requires minimal setup. Prefer option 1 unless your scenario requires that the IoT Edge device communicate only with IoT Hub, in which case use option 2.

Metrics collector module

A Microsoft-supplied metrics-collector module can be added to an IoT Edge deployment to collect module metrics and send them to Azure Monitor. The module code is open-source and available in the IoT Edge GitHub repo.

The metrics-collector module is provided as a multi-arch Docker container image that supports Linux X64, ARM32, ARM64, and Windows X64 (version 1809). It's publicly available at mcr.microsoft.com/azureiotedge-metrics-collector.

It's also available in the IoT Edge Module Marketplace.

Metrics collector configuration

All configuration for the metrics-collector module is done through environment variables. At a minimum, specify the variables marked as Required in the following table.

Environment variable name Description
ResourceId Resource ID of the IoT hub that the device communicates with. For more information, see the Resource ID section.

Required

Default value: none
UploadTarget Controls whether metrics are sent directly to Azure Monitor over HTTPS or to IoT Hub as D2C messages. For more information, see upload target.

Can be either AzureMonitor or IoTMessage

Not required

Default value: AzureMonitor
LogAnalyticsWorkspaceId Log Analytics workspace ID.

Required only if UploadTarget is AzureMonitor

Default value: none
LogAnalyticsSharedKey Log Analytics workspace key.

Required only if UploadTarget is AzureMonitor

Default value: none
ScrapeFrequencyInSecs Recurring time interval in seconds at which to collect and transport metrics.

Example: 600

Not required

Default value: 300
MetricsEndpointsCSV Comma-separated list of endpoints to collect Prometheus metrics from. All module endpoints to collect metrics from must appear in this list.

Example: http://edgeAgent:9600/metrics, http://edgeHub:9600/metrics, http://MetricsSpewer:9417/metrics

Not required

Default value: http://edgeHub:9600/metrics, http://edgeAgent:9600/metrics
AllowedMetrics List of metrics to collect; all other metrics are ignored. Set to an empty string to disable. For more information, see allow and disallow lists.

Example: metricToScrape{quantile=0.99}[endpoint=http://MetricsSpewer:9417/metrics]

Not required

Default value: empty
BlockedMetrics List of metrics to ignore. Overrides AllowedMetrics, so a metric won't be reported if it's included in both lists. For more information, see allow and disallow lists.

Example: metricToIgnore{quantile=0.5}[endpoint=http://VeryNoisyModule:9001/metrics], docker_container_disk_write_bytes

Not required

Default value: empty
CompressForUpload Controls if compression should be used when uploading metrics. Applies to all upload targets.

Example: true

Not required

Default value: true
AzureDomain Specifies the top-level Azure domain to use when ingesting metrics directly to Log Analytics.

Example: azure.us

Not required

Default value: azure.com
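
The rules in the table can be sketched as a small helper that validates the settings and produces the environment section for the module. This is a minimal sketch, not part of the module itself: the function name build_collector_env is hypothetical, and the {"value": ...} wrapping assumes the usual IoT Edge deployment manifest layout for module environment variables.

```python
from typing import Dict, List, Optional

VALID_UPLOAD_TARGETS = {"AzureMonitor", "IoTMessage"}


def build_collector_env(
    resource_id: str,
    upload_target: str = "AzureMonitor",
    workspace_id: Optional[str] = None,
    shared_key: Optional[str] = None,
    scrape_frequency_secs: int = 300,
    endpoints: Optional[List[str]] = None,
) -> Dict[str, Dict[str, str]]:
    """Validate collector settings and return an 'env' section (hypothetical helper)."""
    if not resource_id:
        raise ValueError("ResourceId is required")
    if upload_target not in VALID_UPLOAD_TARGETS:
        raise ValueError("UploadTarget must be AzureMonitor or IoTMessage")
    # The workspace ID and key are required only when uploading to Azure Monitor.
    if upload_target == "AzureMonitor" and not (workspace_id and shared_key):
        raise ValueError(
            "LogAnalyticsWorkspaceId and LogAnalyticsSharedKey are required "
            "when UploadTarget is AzureMonitor"
        )

    settings = {
        "ResourceId": resource_id,
        "UploadTarget": upload_target,
        "ScrapeFrequencyInSecs": str(scrape_frequency_secs),
    }
    if upload_target == "AzureMonitor":
        settings["LogAnalyticsWorkspaceId"] = workspace_id
        settings["LogAnalyticsSharedKey"] = shared_key
    if endpoints:
        settings["MetricsEndpointsCSV"] = ",".join(endpoints)

    # Assumption: deployment manifests express each variable as {"value": "..."}.
    return {name: {"value": value} for name, value in settings.items()}
```

Variables left out of the result (for example, AllowedMetrics or CompressForUpload) fall back to the defaults listed in the table.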

Resource ID

The metrics-collector module requires the Azure Resource Manager ID of the IoT hub that the IoT Edge device belongs to. Provide this ID as the value of the ResourceId environment variable.

The resource ID takes the following format:

/subscriptions/<subscription id>/resourceGroups/<resource group name>/providers/Microsoft.Devices/IoTHubs/<iot hub name>

You can find the resource ID in the Properties page of the IoT hub in the Azure portal.

Retrieve resource ID from the IoT Hub properties.

Or, you can retrieve the ID with the az resource show command:

az resource show -g <resource group> -n <hub name> --resource-type "Microsoft.Devices/IoTHubs"
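
If you assemble the resource ID yourself rather than copying it, a quick format check can catch typos before deployment. The following is a minimal sketch; the helper names are hypothetical, and the case-insensitive match is an assumption about how resource IDs are commonly treated.

```python
import re
from typing import Dict

# Mirrors the format shown above. Case-insensitive matching is an assumption.
_RESOURCE_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Devices/IoTHubs/(?P<hub>[^/]+)$",
    re.IGNORECASE,
)


def build_resource_id(subscription_id: str, resource_group: str, hub_name: str) -> str:
    """Compose an IoT hub resource ID from its three variable parts."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Devices/IoTHubs/{hub_name}"
    )


def parse_resource_id(resource_id: str) -> Dict[str, str]:
    """Split a resource ID into its parts, or raise ValueError if malformed."""
    match = _RESOURCE_ID.match(resource_id)
    if match is None:
        raise ValueError(f"Not an IoT hub resource ID: {resource_id!r}")
    return match.groupdict()
```

For example, parse_resource_id(build_resource_id("<subscription id>", "my-rg", "my-hub")) returns the subscription, resource group, and hub name as a dictionary.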

Upload target

The UploadTarget configuration option controls whether metrics are sent directly to Azure Monitor or to IoT Hub.

If you set UploadTarget to IoTMessage, your module metrics are published as IoT messages. These messages are emitted as UTF-8 encoded JSON from the endpoint /messages/modules/<module name>/outputs/metricOutput. The format is as follows:

[{
    "TimeGeneratedUtc": "<time generated>",
    "Name": "<prometheus metric name>",
    "Value": <decimal value>,
    "Label": {
        "<label name>": "<label value>"
    }
}, {
    "TimeGeneratedUtc": "2020-07-28T20:00:43.2770247Z",
    "Name": "docker_container_disk_write_bytes",
    "Value": 0.0,
    "Label": {
        "name": "AzureMonitorForIotEdgeModule"
    }
}]
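
A consumer of these messages might decode the body along the following lines. This is a sketch under stated assumptions: the MetricSample type and function names are hypothetical, and the body is assumed to be already decompressed (CompressForUpload applies to all upload targets, so a real consumer may need to decompress first).

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class MetricSample:
    """One entry of the message array (hypothetical container type)."""
    time_generated: datetime
    name: str
    value: float
    labels: Dict[str, str]


def parse_timestamp(text: str) -> datetime:
    """Parse a UTC timestamp such as 2020-07-28T20:00:43.2770247Z."""
    body = text.rstrip("Z")
    if "." in body:
        seconds, fraction = body.split(".", 1)
        # strptime accepts at most six fractional digits, so drop the rest.
        body = f"{seconds}.{fraction[:6]}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(body, fmt).replace(tzinfo=timezone.utc)


def parse_metrics_message(body: bytes) -> List[MetricSample]:
    """Decode a UTF-8 JSON message body into a list of samples."""
    records = json.loads(body.decode("utf-8"))
    return [
        MetricSample(
            time_generated=parse_timestamp(record["TimeGeneratedUtc"]),
            name=record["Name"],
            value=float(record["Value"]),
            labels=dict(record.get("Label", {})),
        )
        for record in records
    ]
```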

Allow and disallow lists

The AllowedMetrics and BlockedMetrics configuration options take space- or comma-separated lists of metric selectors. A metric matches a list if it matches one or more selectors in that list; a match in AllowedMetrics includes the metric, and a match in BlockedMetrics excludes it.

Metric selectors use a format similar to a subset of the PromQL query language.

metricToSelect{quantile=0.5,otherLabel=~Re[ge]*|x}[http://VeryNoisyModule:9001/metrics]

Metric selectors consist of three parts:

Metric name (metricToSelect).

  • Wildcards * (any characters) and ? (any single character) can be used in metric names. For example, *CPU would match maxCPU and minCPU but not CPUMaximum. ???CPU would match maxCPU and minCPU but not maximumCPU.
  • This component is required in a metrics selector.

Label-based selectors ({quantile=0.5,otherLabel=~Re[ge]*|x}).

  • Multiple label selectors can be included in the curly brackets, separated by commas.
  • A metric is matched only if every label in the selector is present on the metric and its value matches. The metric can carry additional labels.
  • Like PromQL, the following matching operators are allowed.
    • = Match labels exactly equal to the provided string (case sensitive).
    • != Match labels not exactly equal to the provided string.
    • =~ Match labels to a provided regex. ex: label=~CPU|Mem|[0-9]*
    • !~ Match labels that don't fit a provided regex.
    • Regex is fully anchored (a ^ and $ are automatically added to the start and end of each regex).
  • This component is optional in a metrics selector.

Endpoint selector ([http://VeryNoisyModule:9001/metrics]).

  • The URL should exactly match a URL listed in MetricsEndpointsCSV.
  • This component is optional in a metrics selector.

A metric must match all parts of a given selector to be selected: it must match the name, have all the listed labels with matching values, and come from the given endpoint. For example, mem{quantile=0.5,otherLabel=foobar}[http://VeryNoisyModule:9001/metrics] wouldn't match the selector mem{quantile=0.5,otherLabel=~foo|bar}[http://VeryNoisyModule:9001/metrics], because the fully anchored regex foo|bar matches only the exact values foo and bar. To get or-like behavior instead of and-like behavior, use multiple selectors.
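
The matching rules described in this section can be illustrated with a short sketch. It is not the module's implementation: the Selector type and function names are hypothetical, label selectors are split on every comma (so regexes that contain commas aren't handled), Python's regex dialect stands in for the module's, and the endpoint is accepted both as [url] and as the [endpoint=url] form used in the configuration table examples.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

_SELECTOR = re.compile(
    r"^(?P<name>[^{\[]+)"              # metric name, wildcards allowed
    r"(?:\{(?P<labels>.*)\})?"         # optional {label selectors}
    r"(?:\[(?P<endpoint>[^\]]*)\])?$"  # optional [endpoint]
)
_LABEL = re.compile(r"^\s*(?P<label>\w+)\s*(?P<op>=~|!~|!=|=)\s*(?P<value>.*?)\s*$")


@dataclass
class Selector:
    name: str
    labels: List[Tuple[str, str, str]] = field(default_factory=list)
    endpoint: Optional[str] = None


def parse_selector(text: str) -> Selector:
    """Split a selector into name, label matchers, and endpoint."""
    match = _SELECTOR.match(text.strip())
    if match is None:
        raise ValueError(f"Invalid selector: {text!r}")
    labels = []
    for part in filter(None, (match.group("labels") or "").split(",")):
        label_match = _LABEL.match(part)
        if label_match is None:
            raise ValueError(f"Invalid label selector: {part!r}")
        value = label_match.group("value").strip('"')  # tolerate quoted values
        labels.append((label_match.group("label"), label_match.group("op"), value))
    endpoint = match.group("endpoint")
    if endpoint and endpoint.startswith("endpoint="):
        endpoint = endpoint[len("endpoint="):]
    return Selector(match.group("name"), labels, endpoint)


def _name_pattern(wildcard: str) -> "re.Pattern[str]":
    """Translate * and ? wildcards into a regular expression."""
    parts = []
    for ch in wildcard:
        parts.append(".*" if ch == "*" else "." if ch == "?" else re.escape(ch))
    return re.compile("".join(parts))


def matches(selector: Selector, name: str, labels: Dict[str, str], endpoint: str) -> bool:
    """Return True only if name, endpoint, and every label matcher agree."""
    if _name_pattern(selector.name).fullmatch(name) is None:
        return False
    if selector.endpoint is not None and selector.endpoint != endpoint:
        return False
    for label, op, expected in selector.labels:
        if label not in labels:
            return False  # every label named in the selector must be present
        actual = labels[label]
        if op == "=":
            ok = actual == expected
        elif op == "!=":
            ok = actual != expected
        elif op == "=~":
            ok = re.fullmatch(expected, actual) is not None  # fully anchored
        else:  # "!~"
            ok = re.fullmatch(expected, actual) is None
        if not ok:
            return False
    return True
```

Using re.fullmatch keeps the regex fully anchored, which is why otherLabel=foobar fails against otherLabel=~foo|bar in this sketch.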

For example, to allow the metric mem from module1 regardless of its tags, but allow the same metric from module2 only with the tag agg=p99, add the following selectors to AllowedMetrics:

mem{}[http://module1:9001/metrics] mem{agg="p99"}[http://module2:9001/metrics]

Or, to allow the metrics mem and cpu regardless of tags or endpoint, add the following to AllowedMetrics:

mem cpu
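
How the two lists combine can be sketched as follows. This is a simplified illustration, not the module's code: the function names are hypothetical, list entries are split on spaces or commas that fall outside brackets, and the decision step handles name-only selectors (such as mem cpu) with case-sensitive wildcard matching, which is an assumption.

```python
from fnmatch import fnmatchcase
from typing import List


def split_selectors(text: str) -> List[str]:
    """Split a list on spaces or commas that aren't inside {} or []."""
    selectors, current, depth = [], [], 0
    for ch in text:
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth = max(depth - 1, 0)
        if ch in " ," and depth == 0:
            if current:
                selectors.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        selectors.append("".join(current))
    return selectors


def should_report(metric_name: str, allowed: List[str], blocked: List[str]) -> bool:
    """Apply name-only allow and block lists to a metric name."""
    # BlockedMetrics overrides AllowedMetrics.
    if any(fnmatchcase(metric_name, pattern) for pattern in blocked):
        return False
    # An empty AllowedMetrics list disables allow filtering.
    if not allowed:
        return True
    return any(fnmatchcase(metric_name, pattern) for pattern in allowed)
```

For example, should_report("mem", split_selectors("mem cpu"), []) is True, while the same call with "mem" also present in the block list is False.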

Enable in restricted network access scenarios

If you're sending metrics directly to the Log Analytics workspace, allow outbound access to the following URLs:

  • https://<LOG_ANALYTICS_WORKSPACE_ID>.ods.opinsights.azure.com/*
  • https://<LOG_ANALYTICS_WORKSPACE_ID>.oms.opinsights.azure.com/*
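
When preparing firewall rules, it can help to generate the host names from the workspace ID and probe them from inside the restricted network. The following is a rough sketch with hypothetical helper names. Substituting the AzureDomain value for azure.com in the host name is an assumption based on the AzureDomain setting, not something this article states.

```python
import socket
from typing import List


def log_analytics_hosts(workspace_id: str, azure_domain: str = "azure.com") -> List[str]:
    """Return the two host names that need outbound access on port 443."""
    # Assumption: other clouds swap only the top-level domain (for example, azure.us).
    return [
        f"{workspace_id}.{service}.opinsights.{azure_domain}"
        for service in ("ods", "oms")
    ]


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Try a TCP connection to check that outbound access is allowed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running can_connect on each host returned by log_analytics_hosts gives a quick indication of whether the outbound rule is in place. It checks TCP reachability only, not TLS or authentication.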

Proxy considerations

The metrics-collector module is written in .NET Core, so use the same guidance as for system modules to allow communication through a proxy server.

Metrics collection from local modules uses the HTTP protocol. Exclude local communication from going through the proxy server by setting the NO_PROXY environment variable.

Set the NO_PROXY value to a comma-separated list of hostnames that should be excluded. Use module names for hostnames. For example: edgeHub,edgeAgent,myCustomModule.
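
One way to keep NO_PROXY in sync with the scrape targets is to derive it from the MetricsEndpointsCSV value. The helper below is a hypothetical sketch; it assumes every endpoint is an http URL whose host is a module name.

```python
from urllib.parse import urlsplit


def no_proxy_from_endpoints(endpoints_csv: str) -> str:
    """Build a NO_PROXY value from a MetricsEndpointsCSV string."""
    hosts = []
    for endpoint in endpoints_csv.split(","):
        netloc = urlsplit(endpoint.strip()).netloc
        # Read netloc rather than .hostname so the module name keeps its casing.
        host = netloc.rsplit(":", 1)[0] if ":" in netloc else netloc
        if host and host not in hosts:
            hosts.append(host)
    return ",".join(hosts)
```

With the example endpoints from the configuration table, the result is edgeAgent,edgeHub,MetricsSpewer.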

Route metrics through IoT Hub

Sometimes it's necessary to ingest metrics through IoT Hub instead of sending them directly to Log Analytics. One example is monitoring IoT Edge devices in a nested configuration, where child devices have access only to the IoT Edge hub of their parent device. Another is deploying an IoT Edge device with outbound network access only to IoT Hub.

To enable monitoring in this scenario, the metrics-collector module can be configured to send metrics as device-to-cloud (D2C) messages via the edgeHub module. The capability can be turned on by setting the UploadTarget environment variable to IoTMessage in the collector configuration.

Tip

Remember to add an edgeHub route to deliver metrics messages from the collector module to IoT Hub. It looks like FROM /messages/modules/replace-with-collector-module-name/* INTO $upstream.
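
The route string from the tip can be generated from the collector module's name, as in the sketch below. The function names and the default route name are hypothetical, and the returned dictionary assumes the usual name-to-route mapping in the edgeHub desired properties.

```python
from typing import Dict


def metrics_route(collector_module_name: str) -> str:
    """Build the edgeHub route that forwards collector messages upstream."""
    if not collector_module_name or "/" in collector_module_name:
        raise ValueError("Expected a plain module name")
    return f"FROM /messages/modules/{collector_module_name}/* INTO $upstream"


def edgehub_routes(
    collector_module_name: str, route_name: str = "metricsToIoTHub"
) -> Dict[str, str]:
    """Return a routes mapping; the route name is an arbitrary, hypothetical label."""
    return {route_name: metrics_route(collector_module_name)}
```

For a module named IoTEdgeMetricsCollector (a hypothetical name), the route is FROM /messages/modules/IoTEdgeMetricsCollector/* INTO $upstream.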

This option requires extra setup to deliver metrics messages arriving at IoT Hub to the Log Analytics workspace. Without this setup, the other portions of the integration, such as curated visualizations and alerts, won't work.

Note

Be aware of additional costs with this option. Metrics messages will count against your IoT Hub message quota. You will also be charged for Log Analytics ingestion and cloud workflow resources.

Sample cloud workflow

A cloud workflow that delivers metrics messages from IoT Hub to Log Analytics is available as part of the IoT Edge logging and monitoring sample. The sample can be deployed onto existing cloud resources or serve as a reference for a production deployment.

Next steps

Explore the types of curated visualizations that Azure Monitor enables.