What does it mean for IoT Central to have high availability, disaster recovery (HADR), and elastic scale?

Azure IoT Central is an application platform as a service (aPaaS) that manages scalability and HADR for you. An IoT Central application can scale to support millions of connected devices. For more information about device and message pricing, see Azure IoT Central pricing. For more information about the service level agreement, see SLA for Azure IoT Central.

This article provides background information about how IoT Central scales and delivers HADR. The article also includes guidance on how to take advantage of these capabilities.

Scalability

IoT Central applications internally use multiple Azure services such as IoT Hub and the Device Provisioning Service (DPS). Many of these underlying services are multi-tenanted. However, to ensure the full isolation of customer data, IoT Central uses single-tenant IoT hubs.

IoT Central automatically scales its IoT hubs based on the load profiles in your application. IoT Central can scale up individual IoT hubs and scale out the number of IoT hubs in an application. IoT Central also automatically scales other underlying services.

High availability and disaster recovery

For highly available device connectivity, an IoT Central application always has at least two IoT hubs. For exceptions to this rule, see Limitations. The number of hubs can grow or shrink as IoT Central scales the application in response to changes in the load profile.

IoT Central also uses availability zones to make various services it uses highly available.

An incident that requires disaster recovery could range from a subset of services becoming unavailable to a whole region becoming unavailable. IoT Central follows different recovery processes depending on the nature and scale of the incident. For example, if an entire Azure region becomes unavailable after a catastrophic failure, the disaster recovery procedures fail over applications to another region in the same geography.

Work with multiple IoT hubs

As a consequence of automatic scaling and HADR support, the IoT hub instances in your application can change. For example:

  • The number of hubs could increase or decrease as the application scales.
  • A hub could fail and become unavailable.
  • The disaster recovery procedures could add new hubs in a different region to replace the hubs in a failed region.

Although IoT Central manages the IoT hubs in your application for you, a device must be able to re-establish a connection if the hub it connects to is unavailable.

Device provisioning

As the number of IoT hubs in your application changes, a device might need to connect to a different hub.

Before a device connects to IoT Central, it must be registered and provisioned in the underlying services. When you add a device to an IoT Central application, IoT Central adds an entry to a DPS enrollment group. Information from the enrollment group such as the ID scope, device ID, and keys is surfaced in the IoT Central UI.

When a device first connects to your IoT Central application, DPS provisions the device in one of the enrollment group's linked IoT hubs. The device is then associated with that IoT hub. DPS uses an allocation policy to load balance the provisioning across the IoT hubs in the application. This process makes sure each IoT hub has a similar number of provisioned devices.
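To illustrate how an allocation policy can spread devices across hubs, the following Python sketch assigns each device to a hub by hashing its device ID. This is a simplified model only, not the algorithm DPS uses internally. The function name `allocate_hub` and the hub names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Sequence


def allocate_hub(device_id: str, hubs: Sequence[str]) -> str:
    """Pick a hub for a device (illustrative model, not the DPS algorithm).

    Hashing the device ID gives a stable choice for a fixed hub list and
    spreads a large device population roughly evenly across the hubs.
    """
    if not hubs:
        raise ValueError("at least one hub is required")
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hubs)
    return hubs[index]


# Hypothetical hub names; IoT Central manages the real hubs for you.
hubs = ["hub-a", "hub-b", "hub-c"]
counts = Counter(allocate_hub(f"device-{i}", hubs) for i in range(3000))
for hub in hubs:
    print(hub, counts[hub])
```

Each hub ends up with roughly a third of the devices, which is the outcome the allocation policy aims for.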

To learn more about registration and provisioning in IoT Central, see IoT Central device connectivity guide.

Device connections

After DPS provisions a device to an IoT hub, the device always tries to connect to that hub. If a device can't reach the IoT hub it's provisioned to, it can't connect to your IoT Central application. To handle this scenario, your device firmware should include a retry strategy that reprovisions the device to another hub.
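A retry strategy of this kind might look like the following Python sketch. It's a minimal outline under stated assumptions, not a definitive implementation: `provision` and `connect` are hypothetical callables that stand in for your device SDK's DPS registration and hub connection calls, and the attempt counts and delays are illustrative values.

```python
import random
import time
from typing import Callable


def connect_with_reprovision(
    provision: Callable[[], str],    # hypothetical: registers through DPS, returns the assigned hub
    connect: Callable[[str], bool],  # hypothetical: returns True if the hub connection succeeds
    max_attempts_per_hub: int = 3,   # assumed value, tune for your devices
    max_provision_rounds: int = 5,   # assumed value
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Connect to the provisioned hub and reprovision if it stays unreachable."""
    for _ in range(max_provision_rounds):
        hub = provision()  # ask DPS which hub this device should use
        for attempt in range(max_attempts_per_hub):
            if connect(hub):
                return hub
            # Exponential backoff with jitter, so that many devices don't
            # all retry at the same moment after an outage.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))
        # The hub is still unreachable: loop around and reprovision.
    raise ConnectionError("could not connect to any provisioned hub")
```

The jittered backoff spaces out reconnection attempts, and the outer loop gives DPS the chance to assign the device to a different, healthy hub.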

To learn more about how device firmware should handle connection errors and connect to a different hub, see Best practices.

To learn more about how to verify your device firmware can handle connection failures, see Test failover capabilities.

Data export

IoT Central applications often use other, user-configured services. For example, you can configure your IoT Central application to continuously export data to services such as Azure Event Hubs and Azure Blob Storage.

If a configured data export can't write to its destination, IoT Central tries to retransmit the data for up to 15 minutes, after which IoT Central marks the destination as failed. IoT Central periodically checks failed destinations to verify whether they're writable.
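The retry window described here can be modeled with the following Python sketch. It illustrates the behavior only and isn't IoT Central's implementation. The `Destination` class, the `export_with_retry` function, and the 30-second retry interval are assumptions made for the example; only the 15-minute window comes from this article.

```python
from dataclasses import dataclass
from typing import Callable

RETRY_WINDOW_S = 15 * 60  # from the article: retransmit for up to 15 minutes


@dataclass
class Destination:
    """Hypothetical stand-in for a configured export destination."""
    name: str
    failed: bool = False


def export_with_retry(
    destination: Destination,
    write: Callable[[], bool],        # hypothetical: returns True if the write succeeds
    clock: Callable[[], float],       # returns the current time in seconds
    sleep: Callable[[float], None],
    retry_interval_s: float = 30.0,   # assumed value, not documented
) -> bool:
    """Retry a write until it succeeds or the retry window closes."""
    deadline = clock() + RETRY_WINDOW_S
    while True:
        if write():
            return True
        if clock() >= deadline:
            # The window has closed: mark the destination as failed.
            destination.failed = True
            return False
        sleep(retry_interval_s)
```

Passing `clock` and `sleep` as parameters lets you run the model with a simulated clock instead of waiting 15 minutes.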

You can force IoT Central to restart the failed exports by disabling and re-enabling the data export.

Review the high availability and scalability best practices for the data export destination service you're using.

Limitations

Currently, there are a few legacy IoT Central applications created before April 2021 that haven't yet migrated to the multiple IoT hub architecture. Use the az iot central device manual-failover command to check if your application still uses a single IoT hub.

Currently, IoT Edge devices can't move between IoT hubs.

Next steps

Now that you've learned about the scalability and high availability of Azure IoT Central, the suggested next step is to learn about Quotas and limits in Azure IoT Central.