Manage connectivity and reliable messaging by using Azure IoT Hub device SDKs
This article provides high-level guidance to help you design device applications that are more resilient. It shows you how to take advantage of the connectivity and reliable messaging features in Azure IoT device SDKs. The goal of this guide is to help you manage the following scenarios:
Fixing a dropped network connection
Switching between different network connections
Reconnecting because of service transient connection errors
Implementation details may vary by language. For more information, see the API documentation for the specific SDK.
Designing for resiliency
IoT devices often rely on non-continuous or unstable network connections (for example, GSM or satellite). Errors can occur when devices interact with cloud-based services because of intermittent service availability and infrastructure-level or transient faults. An application that runs on a device has to manage the mechanisms for connection and re-connection, as well as the retry logic for sending and receiving messages. The retry strategy requirements also depend heavily on the device's IoT scenario, context, and capabilities.
The Azure IoT Hub device SDKs aim to simplify connecting and communicating from cloud-to-device and device-to-cloud. These SDKs provide a robust way to connect to Azure IoT Hub and a comprehensive set of options for sending and receiving messages. Developers can also modify the existing implementation to tailor the retry strategy to a given scenario.
The relevant SDK features that support connectivity and reliable messaging are covered in the following sections.
Connection and retry
This section gives an overview of the re-connection and retry patterns available when managing connections. It details implementation guidance for using a different retry policy in your device application and lists relevant APIs from the device SDKs.
Connection failures can happen at many levels:
Network errors: disconnected socket and name resolution errors
Protocol-level errors for HTTP, AMQP, and MQTT transport: detached links or expired sessions
Application-level errors that result from either local mistakes (for example, invalid credentials) or service behavior (for example, exceeding the quota or throttling)
The device SDKs detect errors at all three levels. OS-related errors and hardware errors are neither detected nor handled by the device SDKs. The SDK design is based on the Transient Fault Handling Guidance from the Azure Architecture Center.
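To make the three levels concrete, the following Python sketch classifies an exception by level and decides whether a retry is worthwhile. It is illustrative only and is not the SDKs' actual error filter: `ProtocolError`, `ThrottlingError`, `InvalidCredentialsError`, `Classification`, and `classify` are hypothetical names, and the mapping of each error to recoverable or unrecoverable is an assumption.

```python
import socket
from dataclasses import dataclass


# Hypothetical exception types standing in for SDK-specific errors.
class ProtocolError(Exception):
    """Protocol-level failure, such as a detached link or expired session."""


class ThrottlingError(Exception):
    """Application-level failure caused by service behavior (throttling)."""


class InvalidCredentialsError(Exception):
    """Application-level failure caused by a local mistake."""


@dataclass(frozen=True)
class Classification:
    level: str         # "network", "protocol", "application", or "unknown"
    recoverable: bool  # True if a retry might succeed


def classify(error: Exception) -> Classification:
    """Assumed mapping from an exception to its level and retry decision."""
    if isinstance(error, InvalidCredentialsError):
        return Classification("application", recoverable=False)
    if isinstance(error, ThrottlingError):
        return Classification("application", recoverable=True)
    if isinstance(error, ProtocolError):
        return Classification("protocol", recoverable=True)
    # socket.gaierror covers name resolution failures;
    # ConnectionError covers dropped or reset sockets.
    if isinstance(error, (socket.gaierror, ConnectionError)):
        return Classification("network", recoverable=True)
    # Anything unrecognized is treated as unrecoverable in this sketch.
    return Classification("unknown", recoverable=False)
```

A caller would pass the caught exception to `classify` and retry only when `recoverable` is true.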
The following steps describe the retry process when connection errors are detected:
The SDK detects the error and determines whether it originated in the network, protocol, or application.
The SDK uses the error filter to determine the error type and decide if a retry is needed.
If the SDK identifies an unrecoverable error, operations like connection, send, and receive are stopped. The SDK notifies the user. Examples of unrecoverable errors include an authentication error and a bad endpoint error.
If the SDK identifies a recoverable error, it retries according to the specified retry policy until the defined timeout elapses. By default, the SDK uses the exponential back-off with jitter retry policy.
When the defined timeout expires, the SDK stops trying to connect or send. It notifies the user.
The SDK allows the user to attach a callback to receive connection status changes.
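The steps above can be sketched as a small retry loop. This is a simplified Python illustration under stated assumptions, not the SDKs' implementation: `run_with_retry`, `RecoverableError`, `UnrecoverableError`, and the status strings are all hypothetical, and the policy is modeled as a function that returns the next delay in seconds (or `None` to stop).

```python
import time
from typing import Callable, Optional


class UnrecoverableError(Exception):
    """Hypothetical: errors such as bad credentials or a bad endpoint."""


class RecoverableError(Exception):
    """Hypothetical: transient errors that are worth retrying."""


def run_with_retry(
    operation: Callable[[], object],
    next_delay: Callable[[int], Optional[float]],
    timeout_s: float,
    on_status: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> object:
    """Run operation, retrying recoverable errors until timeout_s elapses."""
    deadline = clock() + timeout_s
    attempt = 0
    while True:
        try:
            result = operation()
            on_status("connected")
            return result
        except UnrecoverableError:
            # Stop immediately and notify the user.
            on_status("disconnected: unrecoverable error")
            raise
        except RecoverableError:
            delay = next_delay(attempt)
            # Stop when the policy declines or the time budget is spent.
            if delay is None or clock() + delay > deadline:
                on_status("disconnected: retry expired")
                raise
            on_status("retrying")
            sleep(delay)
            attempt += 1
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; the `on_status` callback plays the role of the connection status callback described in the last step.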
The SDKs provide three retry policies:
Exponential back-off with jitter: This default retry policy tends to be aggressive at the start and slow down over time until it reaches a maximum delay. The design is based on the Retry guidance from the Azure Architecture Center.
Custom retry: For some SDK languages, you can design a custom retry policy that is better suited for your scenario and then inject it into the RetryPolicy. Custom retry isn't available on the C SDK, and it isn't currently supported on the Python SDK. The Python SDK reconnects as needed.
No retry: You can set the retry policy to "no retry," which disables the retry logic. The SDK tries to connect once and send a message once, assuming the connection is established. This policy is typically used in scenarios with bandwidth or cost concerns. If you choose this option, messages that fail to send are lost and can't be recovered.
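The three policies differ only in how they answer the question "how long should I wait before the next attempt?". The following Python sketch models each one as a function from the zero-based retry attempt to a delay in seconds, with `None` meaning "stop". All names and default values here (`exponential_backoff_with_jitter`, `fixed_interval`, `no_retry`, the 25% jitter) are assumptions chosen for illustration and don't reflect any SDK's exact formula.

```python
import random
from typing import Callable, Optional

# A policy maps the zero-based retry attempt to a delay in seconds,
# or returns None to signal "stop retrying".
RetryPolicy = Callable[[int], Optional[float]]


def exponential_backoff_with_jitter(
    base_s: float = 0.1,
    max_s: float = 10.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> RetryPolicy:
    """Delay doubles per attempt, is randomized by +/- jitter, then capped."""
    rng = rng or random.Random()

    def policy(attempt: int) -> Optional[float]:
        raw = base_s * (2 ** min(attempt, 30))  # bound the exponent
        spread = rng.uniform(1.0 - jitter, 1.0 + jitter)
        return min(raw * spread, max_s)

    return policy


def fixed_interval(delay_s: float, max_attempts: int) -> RetryPolicy:
    """Example of a custom policy: constant delay, limited attempts."""

    def policy(attempt: int) -> Optional[float]:
        return delay_s if attempt < max_attempts else None

    return policy


def no_retry() -> RetryPolicy:
    """Never retry: the first failure is final."""
    return lambda attempt: None
```

The jitter spreads out retries from many devices that lose connectivity at the same moment, so they don't all reconnect in lockstep. The cap keeps the delay bounded once the exponential term grows large.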
Retry policy APIs
|SDK||SetRetryPolicy method||Policy implementations||Implementation guidance|
|C/iOS||IOTHUB_CLIENT_RESULT IoTHubClient_SetRetryPolicy||Default: IOTHUB_CLIENT_RETRY_EXPONENTIAL_BACKOFF; Custom: use available retryPolicy; No retry: IOTHUB_CLIENT_RETRY_NONE|| |
|Java||SetRetryPolicy||Default: ExponentialBackoffWithJitter class; Custom: implement RetryPolicy interface; No retry: NoRetry class|| |
|.NET||DeviceClient.SetRetryPolicy||Default: ExponentialBackoff class; Custom: implement IRetryPolicy interface; No retry: NoRetry class|| |
|Node||setRetryPolicy||Default: ExponentialBackoffWithJitter class||Node implementation|
|Python||Not currently supported||Not currently supported||Not currently supported|
The following code samples illustrate this flow:
.NET implementation guidance
The following code sample shows how to define and set the default retry policy:
// define/set default retry policy
IRetryPolicy retryPolicy = new ExponentialBackoff(
    int.MaxValue,
    TimeSpan.FromMilliseconds(100),
    TimeSpan.FromSeconds(10),
    TimeSpan.FromMilliseconds(100));
SetRetryPolicy(retryPolicy);
To avoid high CPU usage, the retries are throttled if the code fails immediately, for example, when there's no network or route to the destination. The minimum wait before the next retry is 1 second.
If the service responds with a throttling error, the retry policy is different and can't be changed via public API:
// throttled retry policy
IRetryPolicy retryPolicy = new ExponentialBackoff(
    RetryCount,
    TimeSpan.FromSeconds(10),
    TimeSpan.FromSeconds(60),
    TimeSpan.FromSeconds(5));
SetRetryPolicy(retryPolicy);
The retry mechanism stops after DefaultOperationTimeoutInMilliseconds, which is currently set at 4 minutes.
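To show how the 1-second minimum and the 4-minute operation timeout might interact with a policy's delays, here is a small language-neutral sketch in Python. It is an assumption about the combined effect, not a description of the .NET SDK's internals: `schedule`, `MIN_RETRY_DELAY_S`, and `OPERATION_TIMEOUT_S` are hypothetical names, and each attempt is assumed to fail instantly.

```python
from typing import Iterable, List

MIN_RETRY_DELAY_S = 1.0         # minimum wait between retries, per the text
OPERATION_TIMEOUT_S = 4 * 60.0  # 4 minutes, per the text


def schedule(
    policy_delays: Iterable[float],
    floor_s: float = MIN_RETRY_DELAY_S,
    timeout_s: float = OPERATION_TIMEOUT_S,
) -> List[float]:
    """Return the times (seconds after the first failure) at which retries start.

    Assumes every attempt fails instantly, so only the waits consume time.
    """
    offsets: List[float] = []
    elapsed = 0.0
    for delay in policy_delays:
        elapsed += max(delay, floor_s)  # never wait less than the floor
        if elapsed > timeout_s:         # time budget spent: stop retrying
            break
        offsets.append(elapsed)
    return offsets
```

For example, `schedule([0.5, 0.5, 2.0, 4.0])` returns `[1.0, 2.0, 4.0, 8.0]`: the two sub-second delays are raised to the 1-second floor. With `schedule([100.0, 100.0, 100.0])`, only two retries fit inside the 240-second budget, so the result is `[100.0, 200.0]`.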
Other languages implementation guidance
For code samples in other languages, review the following implementation documents. The repository contains samples that demonstrate the use of retry policy APIs.