JohanKarlsson-8881 asked JohanKarlsson-8881 commented

Offline TTL and loss of power technical question

I am looking for more information regarding how TTL is handled on the Edge Hub in different scenarios. By default, the TTL is 7200 seconds (2 hours). Consider the following scenarios and please correct me if I'm wrong here:

Sunshine case
1. System is online
2. A message appears on the Edge Hub at timestamp 1.
3. The message is forwarded upstream immediately

Offline case
1. System is offline
2. A message appears on the Edge Hub at timestamp 1
3. The system remains offline for 1 hour (half the TTL)
4. When online, all received messages are forwarded to the IoT Hub

Power outage case
1. System is offline
2. A message appears on the Edge Hub at timestamp 1
3. The site loses power for 4 hours (TTL has long passed)
4. System boots up again and is now online
5. What happens to the messages in this scenario?

Say, for instance, that the retry interval of the Edge Hub is 2, the TTL is 3, and my message is sent at time 1 but is not successfully delivered. We then suffer a power outage, and the hub comes back up at time 10. Does the hub handle this scenario, or do I have to handle it myself?

I had a quick look at the edge module source, and it looks like the Java EdgeModule uses System.currentTimeMillis(): if the message's expiry timestamp is older than the current time, a callback is fired with message status MESSAGE_EXPIRED. So I guess I should either increase the TTL or handle this callback?
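To make the three scenarios concrete, here is a minimal sketch of a wall-clock TTL check. It is an illustration only, not the SDK's or edgeHub's actual code: the class `TtlCheck` and the method `isExpired` are hypothetical names, and the sketch assumes the enqueue time is persisted with the message and that the system clock is correct after reboot. Under those assumptions, time spent powered off counts toward the TTL just like time spent offline.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical wall-clock TTL check; an assumption, not the real SDK or edgeHub logic. */
final class TtlCheck {
    private TtlCheck() {}

    /** A message is expired once more wall-clock time than the TTL has passed since it was enqueued. */
    static boolean isExpired(Instant enqueuedAt, Duration ttl, Instant now) {
        return Duration.between(enqueuedAt, now).compareTo(ttl) > 0;
    }

    public static void main(String[] args) {
        Instant enqueued = Instant.parse("2021-01-01T00:00:00Z");
        Duration defaultTtl = Duration.ofSeconds(7200);

        // Offline case: connectivity returns after 1 hour, inside the 2 hour TTL.
        System.out.println(isExpired(enqueued, defaultTtl, enqueued.plus(Duration.ofHours(1))));

        // Power outage case: the device boots again 4 hours later, past the TTL.
        System.out.println(isExpired(enqueued, defaultTtl, enqueued.plus(Duration.ofHours(4))));

        // Abstract example: sent at time 1, TTL 3, hub back at time 10.
        System.out.println(isExpired(Instant.ofEpochSecond(1), Duration.ofSeconds(3), Instant.ofEpochSecond(10)));
    }
}
```

Running `main` prints `false`, `true`, `true`: only the one-hour offline case survives the default TTL.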

Hope you can shed some light on this! Loving the offline features so far, has saved us a ton of work.

Update:

So I realized that looking at the Java source is only half the puzzle, since the messages should be handled by the edgeHub container. Looking at the source for that, it looks like the same logic appears again. How these two interact is beyond me.

azure-iot-edge

Hello @JohanKarlsson-8881, thanks for posting your question on this forum.
Community SMEs on this topic or our team will review your scenario and get back to you at the earliest possible time.

SandervandeVelde42 answered JohanKarlsson-8881 commented

Hello @JohanKarlsson-8881 , @SatishBoddu-MSFT ,

We have used Azure IoT Edge in numerous projects in different situations, even in the jungle of Malaysia.

From the very start, we implemented a heartbeat module which, together with the Azure Stream Analytics LAG query, makes us aware of any irregularities.
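As a rough illustration of the idea (not our actual module or query), the check behind such a LAG query can be sketched in Java: compare each heartbeat with the previous one and flag silences that are longer than expected. The names `HeartbeatGaps`, `Gap` and `findGaps` are hypothetical, and the sketch assumes the heartbeat timestamps are already sorted in ascending order.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cloud-side check for missing heartbeats; assumes sorted timestamps. */
final class HeartbeatGaps {
    private HeartbeatGaps() {}

    /** A silence between two consecutive heartbeats that exceeded the allowed duration. */
    static final class Gap {
        final Instant from;
        final Instant to;

        Gap(Instant from, Instant to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Compares each heartbeat with its predecessor, like LAG does, and collects long silences. */
    static List<Gap> findGaps(List<Instant> heartbeats, Duration maxSilence) {
        List<Gap> gaps = new ArrayList<>();
        for (int i = 1; i < heartbeats.size(); i++) {
            Instant previous = heartbeats.get(i - 1);
            Instant current = heartbeats.get(i);
            if (Duration.between(previous, current).compareTo(maxSilence) > 0) {
                gaps.add(new Gap(previous, current));
            }
        }
        return gaps;
    }
}
```

With heartbeats every 60 seconds and a `maxSilence` of 90 seconds, a single missed heartbeat is already reported as a gap.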

Personally, I prefer the current way of guaranteed delivery. When we did experience missing messages, it was almost always due to external factors (failing hardware, network problems, or misconfiguration).



Hello @JohanKarlsson-8881 Just checking in if you have had a chance to see the previous response from SandervandeVelde42.

Please feel free to post your comments over here! Happy to help further.


Thanks for the insight!

Would be interested to hear more of any use cases you have used this in.

I have a couple of concerns, one being that if we were offline for an extended period, a heartbeat would create a unique message each time and take up disk space that might be valuable. I still think this is the best approach for us, and I have implemented something similar that also verifies the health of the system at the same time.

SatishBoddu-MSFT answered JohanKarlsson-8881 commented

Hello @JohanKarlsson-8881, the scenario below raises a very good question, one that reflects real-world industry problems.

Power outage case
1. System is offline
2. A message appears on the Edge Hub at timestamp 1
3. The site loses power for 4 hours (TTL has long passed)
4. System boots up again and is now online
5. What happens to the messages in this scenario?

Below is the quoted response from the product team, which briefly addresses the initial query and offers a suggestion.

If we don't want messages to be dropped due to TTL, the built-in way is to set it to some large number. Once edgeHub ACKs the incoming message, the downstream sender is no longer responsible; if TTL is a concern, it needs to be increased. Once edgeHub receives a message, it guarantees delivery, so the downstream sender doesn't need to worry about reprocessing.

One suggestion is to implement your own message order validation: put some kind of sequential ID in the message headers and have the backend processor ensure that there are no gaps. If a gap is found, the backend triggers a direct method back down to the message sender to resend the missing messages.
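A minimal sketch of the backend side of that suggestion could look like the following. It is only an illustration under stated assumptions: `SequenceGapDetector`, `record` and `missing` are hypothetical names, the sender is assumed to stamp each message with a strictly increasing numeric ID, and the direct method call that requests the resend is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical backend helper that tracks sequence IDs and reports the ones never seen. */
final class SequenceGapDetector {
    // Sorted set, so messages that arrive out of order are still handled correctly.
    private final TreeSet<Long> seen = new TreeSet<>();

    /** Registers the sequence ID read from an incoming message header. */
    void record(long sequenceId) {
        seen.add(sequenceId);
    }

    /** Returns every ID between the lowest and highest seen ID that has not been received. */
    List<Long> missing() {
        List<Long> result = new ArrayList<>();
        if (seen.isEmpty()) {
            return result;
        }
        long expected = seen.first();
        for (long id : seen) {
            while (expected < id) {
                result.add(expected);
                expected++;
            }
            expected = id + 1;
        }
        return result;
    }
}
```

After recording the IDs 1, 2, 5, 6 and 9, `missing()` returns 3, 4, 7 and 8. Because messages stored during an outage arrive late rather than never, the backend would probably want to wait for a grace period before asking the sender to resend anything.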

I will let you know if I find more content on this that could help you.

We may also take suggestions from real-world industry experts and MVPs such as Sander.

Cc: @SandervandeVelde42 , could you please share your experience on this scenario, how to handle lengthy TTL during power outages?



I massively increased our upstream TTL, and tried to cover the cases where this might take up too much disk space. Thank you!
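For anyone sizing this, a back-of-the-envelope estimate of the worst case is message rate times message size times TTL. The sketch below is only that arithmetic: `StoreEstimate` and `worstCaseBytes` are hypothetical names, the rate and size are assumed inputs, and it ignores any storage overhead edgeHub adds on disk. As far as I know, the TTL itself is the `timeToLiveSecs` value under `storeAndForwardConfiguration` in the edgeHub desired properties.

```java
/** Hypothetical worst-case estimate of disk space used by stored messages during an outage. */
final class StoreEstimate {
    private StoreEstimate() {}

    /** Messages that can pile up within one TTL window, multiplied by their average size. */
    static long worstCaseBytes(double messagesPerSecond, long averageMessageBytes, long ttlSeconds) {
        long messagesInWindow = (long) Math.ceil(messagesPerSecond * ttlSeconds);
        return messagesInWindow * averageMessageBytes;
    }

    public static void main(String[] args) {
        // Default TTL of 7200 seconds, 1 message per second, 1 KiB per message.
        System.out.println(worstCaseBytes(1.0, 1024, 7200));
        // Same load with the TTL raised to 7 days (604800 seconds).
        System.out.println(worstCaseBytes(1.0, 1024, 604_800));
    }
}
```

For one 1 KiB message per second, this gives 7372800 bytes (about 7 MiB) at the default TTL and 619315200 bytes (about 590 MiB) at a TTL of 7 days.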
