Hamish-9586 asked · SerkantKaraca-MSFT answered

Azure IoT Hub trigger for Azure Functions breaks until I touch EventHub partition storage blobs

I have an Azure Function which uses an IoT Hub trigger. You can see the function attributes (decorations) below, but it's pretty close to sample code. The IoT Hub is running on the B1 tier, with four partitions. The function has its own consumer group, to avoid any conflicts with desktop debuggers, VS Code's device monitoring plugin, etc.

It works OK, but after a few hours, or whenever I restart the service, it no longer receives any events from the Event Hub endpoint. These two conditions may be the same, since the service gets restarted when Azure reallocates resources. After multiple rounds of refinement, this is the simplest procedure I have found to make it work again:

  1. Using the Azure portal, open the azure-webjobs-eventhub container corresponding to the Event Hub.

  2. For each partition in the corresponding consumer group:
    a. Open the blob for that partition of that consumer group in the editor.
    b. If the JSON defines an offset other than null, make a change and then undo it (the editor only enables saving after a modification), and save the blob in its unmodified state.

If I do this while two devices are sending telemetry messages every 5 seconds, I observe that all the backlogged messages from one of the devices pour in after I write one of the blobs. When I write the other, the same happens for the messages from the other device. After this procedure, everything works as it should for a few more hours.
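The check in step 2b (which partitions have a non-null offset) can be sketched in C# as a small predicate. This is a minimal illustration, not part of the original workaround: HasNonNullOffset is a hypothetical helper, it assumes each checkpoint blob's JSON has an "offset" property (matched case-insensitively, because the exact casing written by the checkpoint store is an assumption here), and the blob contents in the demo are made up. Downloading and re-saving the blobs is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: returns true when a checkpoint blob's JSON carries a
// non-null "offset" property (any casing; the real field casing is assumed).
// Such partitions are the ones step 2b says to re-save.
static bool HasNonNullOffset(string checkpointJson)
{
    using JsonDocument doc = JsonDocument.Parse(checkpointJson);
    JsonElement root = doc.RootElement;
    if (root.ValueKind != JsonValueKind.Object)
        return false;

    foreach (JsonProperty prop in root.EnumerateObject())
    {
        if (string.Equals(prop.Name, "offset", StringComparison.OrdinalIgnoreCase))
            return prop.Value.ValueKind != JsonValueKind.Null;
    }
    return false; // no offset property at all: treat it like a null offset
}

// Made-up blob contents keyed by partition id, for illustration only;
// real checkpoint blobs contain additional fields.
var checkpoints = new Dictionary<string, string>
{
    ["0"] = "{\"offset\":\"12345\"}",
    ["1"] = "{\"offset\":null}",
};

foreach (var (partition, json) in checkpoints)
{
    Console.WriteLine($"partition {partition}: re-save = {HasNonNullOffset(json)}");
}
```

Matching the property name case-insensitively keeps the sketch from silently skipping a partition if the stored casing differs from the assumption.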

This is obviously not a fix I would be comfortable using in production. It seems to indicate some underlying bug or misconfiguration. Where to now?

Although all the tutorials use this approach to respond to IoT Hub events in Azure Functions, I note that Event Grid is another, newer approach. Is there any reason to think it will be more reliable than the Event Hub endpoint?

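    [FunctionName("IotHubTrigger")]
    public static async Task Run(
        // "messages/events" reads device-to-cloud messages from the IoT Hub's built-in endpoint.
        // %IoTConsumerGroup% is a binding expression resolved from the app settings.
        [EventHubTrigger("messages/events", Connection = "IoTHubEndpoint", ConsumerGroup = "%IoTConsumerGroup%")] EventData message,
        // SignalRHubName is a constant defined elsewhere in the class (not shown).
        [SignalR(HubName = SignalRHubName)] IAsyncCollector<SignalRMessage> signalRMessages,
        ILogger log)
    {
        // my code here
    }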

Tags: azure-functions, azure-iot-hub, azure-event-hubs

Hello @Hamish-9586,

Do you experience the same behavior with an S1 IoT Hub?

Why does the consumer group name contain % characters?

Regards,

Sander

Hamish-9586 replied to SandervandeVelde42:

The consumer group name is defined with a binding expression, which allows it to be configured in the function app's settings. When I'm debugging on my desktop, I don't want to consume messages that the production instance should receive, so the consumer group has to be different.

Led by your question, I switched this function over to a B1 IoT Hub with 4 partitions: same behavior, except now I'm paying $10 per month to debug this thing!
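For readers unfamiliar with binding expressions: a function app's settings are exposed to it as environment variables, and a %Name% token in a binding attribute is replaced with the value of the app setting Name. The C# sketch below approximates that substitution to show why the desktop and production instances end up on different consumer groups. ResolveAppSettingTokens and the consumer group values are hypothetical; this is not the Functions host's actual resolver.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, illustrative approximation: replaces each %Name% token with
// the value of the environment variable Name (app settings surface as
// environment variables). Not the Functions host's own implementation.
static string ResolveAppSettingTokens(string expression)
{
    return Regex.Replace(expression, "%([^%]+)%", match =>
    {
        string name = match.Groups[1].Value;
        var value = Environment.GetEnvironmentVariable(name);
        if (value is null)
            throw new InvalidOperationException($"App setting '{name}' is not defined.");
        return value;
    });
}

// Hypothetical value: production would define IoTConsumerGroup one way and a
// desktop debugging session another, so each reads its own consumer group.
Environment.SetEnvironmentVariable("IoTConsumerGroup", "functions-prod");
Console.WriteLine(ResolveAppSettingTokens("%IoTConsumerGroup%"));
```

Throwing on a missing setting mirrors the intent of the original design: a debugger that silently fell back to the production consumer group would steal its messages.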

Hello @Hamish-9586,

Sorry for the confusion.

I had creating a second hub in mind. The way you tested this, upgrading the existing hub, is a one-way migration from the F (free) tier to the other tiers. There should have been some kind of message.

You would have to delete the B1 hub and create an F1 hub again.

Billing is not done for a complete month; I believe the interval is one day. (I had the same issue with a T3 years ago: it consumed my complete monthly MSDN credit in 'seconds' and I had to wait until a new month started.)

1 Answer

SerkantKaraca-MSFT answered

Are you using a Gen2 Storage account by any chance?

Yes, it's Gen2, because I need its features for other parts of my solution. From the portal:

Performance/Access tier: Standard/Hot
Replication: Locally-redundant storage (LRS)
Account kind: StorageV2 (general purpose v2)

I'll resist speculating on what you might say next!

Gen2 has lease management issues that can leave a lease stuck in the 'leased' state forever. Besides, Gen2 is considerably more costly when used as an EPH (Event Processor Host) checkpoint store. I recommend creating a dedicated Standard account for EPH use.

Hamish-9586 replied to SerkantKaraca-MSFT:

Thank you for this, I will give it a try. If this turns out to be the problem, it's surprising (and disappointing) that it has not been found and mitigated before. At the very least, one might expect a warning when selecting a storage account that has known issues. I can't imagine this is very complex to reproduce.
