
How does Azure Event Hubs handle high availability when a partition key ID is provided while publishing an event?

Andrew Citera 45 Reputation points
2024-05-09T18:07:03.1233333+00:00

I'm trying to understand in more detail how Azure Event Hubs behaves during an availability zone outage. I know that topic partitions are replicated across three availability zones, that there is a Service Fabric model under the hood that elects a leader, and that an event producer doesn't receive a successful acknowledgment until replication has occurred. I also know that when a partition key isn't specified, Azure Event Hubs writes to available partitions in a round-robin fashion, which improves availability.

My question is specifically how Azure Event Hubs handles recovery when an application does need to provide a specific partition key. I understand that if the target partition is unavailable and that partition key is supplied, the send results in an error, because the Event Hubs gateway prevents the event from being written to an unavailable partition. What I can't find specific details on is whether that partition would eventually recover and whether it would have the same key.

Take the following example (assume the Event Hub is multi-AZ enabled):

Producer A writes to Topic A Partition 0 with Partition Key ID 0 and Partition 0 is available --> Success

Producer A writes to Topic A Partition 0 with Partition Key ID 0 and Partition 0 is available --> Success

Producer A writes to Topic A Partition 0 with Partition Key ID 0 and Partition 0 is unavailable --> Failure

[What happens here? How does Event Hubs recover Partition 0 and bring it back online?]

Producer A writes to Topic A Partition 0 with Partition Key ID 0 and Partition 0 is available --> Success or Failure? Does the partition maintain the same ID?
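
To make the sequence above concrete, here is a minimal toy simulation of a producer retrying against a single partition that is temporarily unavailable. It is only a sketch and does not use the Event Hubs SDK: `ToyPartition`, `PartitionUnavailable`, and `send_with_retry` are hypothetical names, and the simulation assumes the partition comes back with the same ID after a fixed number of failed attempts, which is exactly the behavior I'm asking about, not something I've confirmed.

```python
import time


class PartitionUnavailable(Exception):
    """Raised by the toy partition while it is offline."""


class ToyPartition:
    """Hypothetical stand-in for one partition; not an Event Hubs API."""

    def __init__(self, partition_id, outage_attempts):
        self.partition_id = partition_id
        # Number of upcoming send attempts that will fail (the simulated outage).
        self.outage_attempts = outage_attempts
        self.events = []

    def send(self, event):
        if self.outage_attempts > 0:
            self.outage_attempts -= 1
            raise PartitionUnavailable(
                f"partition {self.partition_id} unavailable"
            )
        self.events.append(event)


def send_with_retry(partition, event, max_retries=3, base_delay=0.01,
                    sleep=time.sleep):
    """Send to a fixed partition, backing off exponentially between retries.

    Returns the number of attempts used. Re-raises PartitionUnavailable
    once max_retries retries have been exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            partition.send(event)
            return attempt + 1
        except PartitionUnavailable:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    # Assumed outage: the next two attempts fail, then the partition is back.
    partition0 = ToyPartition(partition_id=0, outage_attempts=2)
    attempts = send_with_retry(partition0, "event-3")
    print(f"delivered to partition {partition0.partition_id} "
          f"after {attempts} attempt(s)")
```

If the real service behaves like this toy model, pinning a partition key would only delay delivery during an outage; if the partition never recovers, the retries are exhausted and the error surfaces to the application.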

I assume some of the retry logic is handled by what is baked into the SDK, but the documentation isn't clear on whether the partition would eventually recover or whether specifying a partition key completely prevents high availability. The following documentation snippets don't seem to make sense together: is partition ID vs. high availability truly a complete tradeoff, or is availability merely reduced while Event Hubs still recovers that partition?

[1] https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-availability-and-consistency?tabs=dotnet#:~:text=Therefore%2C%20if%20high,see%20Partitions.

[2] https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/event-hubs/partitioning-in-event-hubs-and-kafka#:~:text=With%20Kafka%2C%20if,to%20unavailable%20partitions.
