Troubleshooting message routing
This article provides monitoring and troubleshooting guidance for common IoT Hub message routing issues and their resolutions.
Monitoring message routing
IoT Hub metrics lists all metrics that are enabled by default for your IoT Hub. We recommend that you monitor the metrics related to message routing and endpoints to get an overview of the messages sent. Also turn on diagnostic logs in the Azure Monitor diagnostic settings to track operations for routes. These diagnostic logs can be sent to Azure Monitor logs, Event Hubs, or Azure Storage for custom processing. Learn how to set up and use metrics and diagnostic logs with an IoT Hub.
We also recommend enabling the fallback route if you want to retain messages that don't match the query on any of the routes. These messages are kept in the built-in endpoint for the configured number of retention days.
The following are the most common issues observed with message routing. To start troubleshooting, click on the issue for detailed steps.
- Messages from my devices are not being routed as expected
- I suddenly stopped getting messages at the built-in Event Hubs endpoint
Messages from my devices are not being routed as expected
To troubleshoot this issue, analyze the following.
The routing metrics for this endpoint
All IoT Hub metrics related to routing are prefixed with Routing. You can combine information from multiple metrics to identify the root cause of an issue. For example, use the Routing Delivery Attempts metric to identify the number of messages that were delivered to an endpoint, or dropped because they didn't match the query on any route and the fallback route was disabled. Check the Routing Latency metric to see whether message delivery latency is steady or increasing. Growing latency can indicate a problem with a specific endpoint, so we recommend checking the health of that endpoint. These routing metrics also have dimensions that provide details such as the endpoint type, the specific endpoint name, and the reason a message was not delivered.
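As a rough illustration of the "steady or increasing" check, the sketch below fits a least-squares line to a series of latency samples, such as values exported from the Routing Latency metric. The function names, the sample values, and the 5 ms-per-sample threshold are assumptions chosen for illustration; they are not IoT Hub defaults.

```python
from statistics import mean


def latency_slope(samples):
    """Least-squares slope of the samples (ms per sample interval)."""
    n = len(samples)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def latency_trend(samples, threshold_ms=5.0):
    """Classify the series; threshold_ms is a hypothetical cut-off."""
    return "increasing" if latency_slope(samples) > threshold_ms else "steady"


# Hypothetical latency samples in milliseconds, oldest first.
print(latency_trend([100, 101, 99, 100, 102]))
print(latency_trend([100, 150, 210, 260, 330]))
```

A rising trend on its own does not identify the cause; it is a prompt to check the health of the endpoint the metric is split by.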
The diagnostic logs for any operational issues
Observe the routes diagnostic logs to get more information on routing and endpoint operations, or to identify errors and the relevant error codes. For example, the operation name RouteEvaluationError in the log indicates the route could not be evaluated because of an issue with the message format. Use the tips provided for the specific operation names to mitigate the issue. When an event is logged as an error, the log also provides more information on why the evaluation failed. For example, if the operation name is EndpointUnhealthy, an error code of 403004 indicates the endpoint ran out of space.
The health of the endpoint
Use the REST API Get Endpoint Health to get the health status of the endpoints. The Get Endpoint Health API also provides the last time a message was successfully sent to the endpoint, the last known error, the last known error time, and the last time a send attempt was made for this endpoint. Use the possible mitigation provided for the specific last known error.
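The following sketch shows one way to build the Get Endpoint Health request URL and pick out endpoints that are not reported as healthy. The `api_version` value must be supplied by the caller, the request also needs an Azure Resource Manager bearer token (not shown), and the response field names (`endpointId`, `healthStatus`, `lastKnownError`) are assumptions based on the description above. The sample body is hand-written, not captured service output.

```python
import json
from urllib.parse import quote

ARM_HOST = "https://management.azure.com"  # Azure Resource Manager endpoint


def endpoint_health_url(subscription_id, resource_group, hub_name, api_version):
    """Build the URL for the routingEndpointsHealth operation."""
    path = (
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        f"/providers/Microsoft.Devices/IotHubs/{quote(hub_name)}"
        "/routingEndpointsHealth"
    )
    return f"{ARM_HOST}{path}?api-version={quote(api_version)}"


def unhealthy_endpoints(response_body):
    """Return (endpointId, healthStatus, lastKnownError) for each endpoint
    whose status is anything other than 'healthy'."""
    payload = json.loads(response_body)
    return [
        (e.get("endpointId"), e.get("healthStatus"), e.get("lastKnownError"))
        for e in payload.get("value", [])
        if e.get("healthStatus") != "healthy"
    ]


# Hand-written sample; endpoint names and field names are assumptions.
SAMPLE_BODY = json.dumps({
    "value": [
        {"endpointId": "storage1", "healthStatus": "healthy"},
        {"endpointId": "queue1", "healthStatus": "unhealthy",
         "lastKnownError": "Unauthorized"},
    ]
})

print(unhealthy_endpoints(SAMPLE_BODY))
```

Each reported last known error can then be looked up in the table of last known errors later in this article.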
I suddenly stopped getting messages at the built-in endpoint
To troubleshoot this issue, analyze the following.
Was a new route created?
Once a route is created, data stops flowing to the built-in endpoint unless a route is created to that endpoint. To ensure messages continue to flow to the built-in endpoint when a new route is added, configure a route to the events endpoint.
Was the Fallback route disabled?
The fallback route sends all messages that don't satisfy the query conditions on any of the existing routes to the built-in endpoint (messages/events), which is compatible with Event Hubs. If message routing is turned on, you can enable the fallback route capability. If there are no routes to the built-in endpoint and the fallback route is enabled, only messages that don't match any query conditions on routes are sent to the built-in endpoint. Also, if all existing routes are deleted, the fallback route must be enabled to receive all data at the built-in endpoint.
You can enable or disable the fallback route on the Message Routing blade in the Azure portal. You can also use Azure Resource Manager with FallbackRouteProperties to use a custom endpoint for the fallback route.
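The interaction between routes and the fallback route can be sketched as a small model. This is a simplified illustration of the behavior described above, not IoT Hub's implementation: routes are represented as hypothetical `(predicate, endpoint)` pairs, the built-in endpoint is named `events`, and the fallback route is assumed to target the built-in endpoint.

```python
BUILT_IN = "events"  # name used here for the built-in endpoint


def route_message(message, routes, fallback_enabled):
    """Return the set of endpoints that would receive the message.

    routes is a list of (predicate, endpoint_name) pairs; a predicate
    stands in for a route query and returns True when the message matches.
    """
    targets = {endpoint for matches, endpoint in routes if matches(message)}
    if targets:
        return targets
    # No route matched: only the fallback route can still deliver.
    return {BUILT_IN} if fallback_enabled else set()


# One custom route: critical messages go to a hypothetical storage endpoint.
routes = [(lambda m: m.get("level") == "critical", "storage")]

print(route_message({"level": "critical"}, routes, fallback_enabled=True))
print(route_message({"level": "info"}, routes, fallback_enabled=True))
print(route_message({"level": "info"}, routes, fallback_enabled=False))

# Adding a match-all route to the built-in endpoint restores the flow there.
routes.append((lambda m: True, BUILT_IN))
print(sorted(route_message({"level": "critical"}, routes, fallback_enabled=False)))
```

In this model, the critical message reaches only the storage endpoint until an explicit route to the built-in endpoint is added, which mirrors the first question in this section.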
Last known errors for IoT Hub routing endpoints
Get Endpoint Health in the REST API gives the health status of the endpoints, as well as the last known error, to identify the reason an endpoint is not healthy. The table below lists the most common errors.
|Last Known Error||Description/when it occurs||Possible Mitigation|
|Transient||A transient error has occurred and IoT Hub will retry the operation.||Observe routes diagnostic logs.|
|InternalError||An error occurred while delivering a message to an endpoint.||This is an internal exception but also observe the routes diagnostic logs.|
|Unauthorized||IoT Hub is not authorized to send messages to the specified endpoint.||Validate that the connection string for the endpoint is up to date. If it has changed, update it on your IoT Hub. If the endpoint uses managed identity, check that the IoT Hub principal has the required permissions on the target.|
|Throttled||IoT Hub is being throttled while writing messages into the endpoint.||Review the throttle limits for the affected endpoint. Modify configurations for the endpoint to scale up if needed.|
|Timeout||Operation timeout.||Retry the operation.|
|Not Found||Target resource does not exist.||Ensure that the target resource exists.|
|Container Not Found||Storage container does not exist.||Ensure the storage container exists.|
|Container disabled||Storage container is disabled.||Ensure the storage container is enabled.|
|MaxMessageSizeExceeded||Message routing has a message size limit of 256 KB. The size of the message being routed exceeded this limit.||Check whether the message size can be reduced by using fewer application properties or fewer message enrichments.|
|PartitioningAndDuplicateDetectionNotSupported||A Service Bus entity used as a routing endpoint must not have duplicate detection enabled.||Disable duplicate detection on the Service Bus entity or consider using an entity without duplicate detection.|
|SessionfulEntityNotSupported||A Service Bus entity used as a routing endpoint must not have sessions enabled.||Disable sessions on the Service Bus entity or consider using an entity without sessions.|
|NoMatchingSubscriptionsForMessage||The Service Bus topic has no subscription to write the message to.||Create a subscription for IoT Hub messages to be routed to.|
|EndpointExternallyDisabled||Endpoint is not in an active state, so IoT Hub cannot send messages to it.||Enable the endpoint to bring it back to an active state.|
|DeviceMaximumQueueDepthExceeded||Service Bus size limit has been reached.||Consider removing messages from the target Service Bus queue or topic so that new messages can be ingested.|
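If you poll endpoint health from a script, the table above can be turned into a simple lookup. The sketch below reproduces only a few rows for brevity; the function name `mitigation_for` and the default message for unlisted errors are illustrative assumptions.

```python
# A subset of the table above, keyed by the error name with spaces
# removed and lowercased so that "Not Found" and "NotFound" both match.
MITIGATIONS = {
    "transient": "Observe the routes diagnostic logs.",
    "unauthorized": "Validate the endpoint connection string or the "
                    "managed identity permissions on the target.",
    "throttled": "Review the throttle limits for the affected endpoint "
                 "and scale it up if needed.",
    "timeout": "Retry the operation.",
    "notfound": "Ensure that the target resource exists.",
    "containernotfound": "Ensure the storage container exists.",
    "maxmessagesizeexceeded": "Reduce the message size, for example by "
                              "using fewer application properties or "
                              "message enrichments.",
}


def mitigation_for(last_known_error):
    """Look up a mitigation hint for a last known error string."""
    key = last_known_error.replace(" ", "").lower()
    # The default text is a placeholder, not a documented service message.
    return MITIGATIONS.get(key, "Not listed here; check the routes diagnostic logs.")


print(mitigation_for("Not Found"))
print(mitigation_for("Throttled"))
```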
Routes diagnostic logs
The following are the operation names and error codes logged in the diagnostic logs.
|Operation Name||Level||Description|
|UndefinedRouteEvaluation||Information||The message cannot be evaluated with a given condition, for example, if a property in the route query condition is absent from the message. Learn more about routing query syntax.|
|RouteEvaluationError||Error||There was an error evaluating the message because of an issue with the message format. For example, this error is logged if the content encoding is not specified or the content type is not valid in the message. These must be set in the system properties.|
|DroppedMessage||Error||Message was dropped and not routed. Possible reasons include the message not matching any routing query, or the endpoint being dead so that the message could not be delivered after several retries. We recommend getting more details on the endpoint by using the REST API Get Endpoint Health.|
|EndpointUnhealthy||Error||Endpoint has not been accepting messages from IoT Hub and IoT Hub is trying to resend the messages. We recommend observing the last known error via the REST API Get Endpoint Health.|
|EndpointDead||Error||Endpoint has not been accepting messages from IoT Hub for over an hour. We recommend observing the last known error via the REST API Get Endpoint Health.|
|EndpointHealthy||Information||Endpoint is healthy and receiving messages from IoT Hub. This message is not logged continuously; it is logged only when the endpoint becomes healthy again. It means IoT Hub was previously unable to send messages to the endpoint, but the endpoint is now healthy.|
|OrphanedMessage||Information||The message does not match any route.|
|InvalidMessage||Error||Message is invalid because of incompatibility with the endpoint. We recommend checking the configuration of the endpoint.|
The operations UndefinedRouteEvaluation, RouteEvaluationError and OrphanedMessage are throttled and logged no more than once a minute per IoT Hub.
Common error codes
|Error Code||Description|
|401002||IoT Hub Unauthorized Access|
|413001||Message too large|
|403004||Device maximum queue depth exceeded|
|503008||Receive link throttled|
|500000||Generic Server error|
|400103||Invalid Content Encoding Or Content Type|
|404001||Device Not found|
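When diagnostic logs are exported for custom processing, a record can be summarized by combining its operation name with the error codes above. The sketch below assumes a record shape with `operationName`, `level`, `resultType` (the error code), and a JSON-encoded `properties` string containing `endpointName`. Verify these field names against your own exported logs; the sample record is hand-written.

```python
import json

# Error codes copied from the table above.
ERROR_CODES = {
    "401002": "IoT Hub unauthorized access",
    "413001": "Message too large",
    "403004": "Device maximum queue depth exceeded",
    "503008": "Receive link throttled",
    "500000": "Generic server error",
    "400103": "Invalid content encoding or content type",
    "404001": "Device not found",
}


def summarize_route_log(record):
    """Reduce one routes log record to the fields useful for triage."""
    # 'properties' is assumed to be a JSON string nested in the record.
    props = json.loads(record.get("properties") or "{}")
    code = str(record.get("resultType", ""))
    return {
        "operation": record.get("operationName"),
        "level": record.get("level"),
        "endpoint": props.get("endpointName"),
        "error": ERROR_CODES.get(code, "unknown error code"),
    }


# Hand-written sample record; the endpoint name is hypothetical.
sample = {
    "operationName": "EndpointUnhealthy",
    "level": "Error",
    "resultType": "403004",
    "properties": json.dumps({"endpointName": "queue1"}),
}

print(summarize_route_log(sample))
```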
If you need more help, you can contact the Azure experts on the MSDN Azure and Stack Overflow forums. Alternatively, you can file an Azure support incident. Go to the Azure support site and select Get Support.