Azure Blob Storage on IoT Edge module stops upload processing after power outage

Boris Lange 1 Reputation point
2020-12-02T10:11:09.957+00:00

We are currently experiencing an issue with the Azure Blob Storage on IoT Edge module. The module is configured to automatically upload files to an Azure Storage account in the cloud, and under normal conditions everything works as expected.

While a file is being uploaded to the Edge Blob Storage module, we simulate a power outage (by simply pulling the plug). After rebooting the device, we see that the file was written to the Edge Blob Storage's mount path on the hard drive, but it is never uploaded to the cloud. We can accept this behavior, since the file might not have been fully transmitted to the Edge Blob Storage before the outage. The real problem is that no further files we upload to the Edge Blob Storage are uploaded to the cloud either.

We see the following exception messages in the blob storage module logs:

[2020-11-30 13:59:16.410] [warning ] [tid 871] Microsoft.Azure.Devices.BlobStorage.Tiering.TieringOrchestrator: Too many errors. Scheduling retry is skipped. Next tiering run will pick up the work again depending on the backlog policy.  
[2020-11-30 13:59:16.410] [info    ] [tid 857] [BlobInterface.cc:1023] [GetBlockList] GetBlockList received. Container:ingress Blob:[REDACTED_FILENAME] BlockListType:1  
[2020-11-30 13:59:16.410] [info    ] [tid 857] [BlobInterface.cc:1118] [GetBlockList] GetBlockList completed. Container:ingress Blob:[REDACTED_FILENAME] BlockIdSize:39 UncommitedBlocksCount:0 CommitedBlocksCount:171 ContentLength:44665654 ModificationTime:132512154813857402  
[2020-11-30 13:59:16.411] [info    ] [tid 857] [BlobInterface.cc:1615] [GetBlobTieringMetadata] GetBlobTieringMetadata received. Container:ingress Blob:[REDACTED_FILENAME]  
[2020-11-30 13:59:16.412] [info    ] [tid 857] [BlobInterface.cc:316] [GetBlobData] GetBlobData received. Container:ingress Blob:[REDACTED_FILENAME] Offset:0 Length:65536 DataBufferOffset:0  
[2020-11-30 13:59:16.412] [info    ] [tid 857] [BlobInterface.cc:349] [GetBlobData] Blob retrieved. Container:ingress Blob:[REDACTED_FILENAME] CommittedBlockListSize:171  
[2020-11-30 13:59:16.412] [info    ] [tid 875] [BlobInterface.cc:316] [GetBlobData] GetBlobData received. Container:ingress Blob:[REDACTED_FILENAME] Offset:262144 Length:65536 DataBufferOffset:0  
[2020-11-30 13:59:16.412] [error   ] [tid 857] [DataStore.cc:197] [ReadChunk] open failed with error:  
[2020-11-30 13:59:16.412] [error   ] [tid 857] [BlobInterface.cc:415] [GetBlobData] GetBlobData failed. Container:ingress Blob:[REDACTED_FILENAME] Status:1359  
[2020-11-30 13:59:16.412] [info    ] [tid 875] [BlobInterface.cc:349] [GetBlobData] Blob retrieved. Container:ingress Blob:[REDACTED_FILENAME] CommittedBlockListSize:171  
[2020-11-30 13:59:16.412] [error   ] [tid 875] [DataStore.cc:197] [ReadChunk] open failed with error:  
[2020-11-30 13:59:16.412] [error   ] [tid 875] [BlobInterface.cc:415] [GetBlobData] GetBlobData failed. Container:ingress Blob:[REDACTED_FILENAME] Status:1359  
[2020-11-30 13:59:16.412] [info    ] [tid 874] [BlobInterface.cc:316] [GetBlobData] GetBlobData received. Container:ingress Blob:[REDACTED_FILENAME] Offset:524288 Length:65536 DataBufferOffset:0  
[2020-11-30 13:59:16.412] [info    ] [tid 874] [BlobInterface.cc:349] [GetBlobData] Blob retrieved. Container:ingress Blob:[REDACTED_FILENAME] CommittedBlockListSize:171  
[2020-11-30 13:59:16.412] [error   ] [tid 874] [DataStore.cc:197] [ReadChunk] open failed with error:  
[2020-11-30 13:59:16.413] [error   ] [tid 874] [BlobInterface.cc:415] [GetBlobData] GetBlobData failed. Container:ingress Blob:[REDACTED_FILENAME] Status:1359  
[2020-11-30 13:59:16.415] [error   ] [tid 857] Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload: PutBlock failed for ingress.[REDACTED_FILENAME].2020-11-30T13:11:21.3857402Z, block id NTA2YWVjOGM5MWI5NDlhNzk4MzUwYzUzMDU5Y2M1OTQtMDAwMDAw  
Error: ErrorCode: 1359, Exception of type 'Microsoft.AzureStack.Services.Storage.Blob.BlobClientException' was thrown.  
   at Microsoft.AzureStack.Services.Storage.Blob.WBlobClient.<>c__DisplayClass28_0.<GetBlobDataAsync>b__0() in /app/common/Blob/BlobClient.cs:line 723  
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)  
--- End of stack trace from previous location where exception was thrown ---  
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.BlobStorage.Common.BlobReadStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) in /app/common/BlobReadStream.cs:line 88  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsyncHelper(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken token)  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsync(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken cancellationToken)  
   at Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob.PutBlockAsync(String blockId, Stream blockData, String contentMD5, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext, AggregatingProgressIncrementer progressIncrementer, CancellationToken cancellationToken)  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.PutBlockOnRemote(CloudBlockBlob blob, BlockToPush block) in /app/tiering/BlobUpload.cs:line 540  
[2020-11-30 13:59:16.416] [error   ] [tid 875] Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload: PutBlock failed for ingress.[REDACTED_FILENAME].2020-11-30T13:11:21.3857402Z, block id NTA2YWVjOGM5MWI5NDlhNzk4MzUwYzUzMDU5Y2M1OTQtMDAwMDAx  
Error: ErrorCode: 1359, Exception of type 'Microsoft.AzureStack.Services.Storage.Blob.BlobClientException' was thrown.  
   at Microsoft.AzureStack.Services.Storage.Blob.WBlobClient.<>c__DisplayClass28_0.<GetBlobDataAsync>b__0() in /app/common/Blob/BlobClient.cs:line 723  
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)  
--- End of stack trace from previous location where exception was thrown ---  
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.BlobStorage.Common.BlobReadStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) in /app/common/BlobReadStream.cs:line 88  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsyncHelper(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken token)  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsync(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken cancellationToken)  
   at Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob.PutBlockAsync(String blockId, Stream blockData, String contentMD5, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext, AggregatingProgressIncrementer progressIncrementer, CancellationToken cancellationToken)  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.PutBlockOnRemote(CloudBlockBlob blob, BlockToPush block) in /app/tiering/BlobUpload.cs:line 540  
[2020-11-30 13:59:16.416] [error   ] [tid 874] Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload: PutBlock failed for ingress.[REDACTED_FILENAME].2020-11-30T13:11:21.3857402Z, block id NTA2YWVjOGM5MWI5NDlhNzk4MzUwYzUzMDU5Y2M1OTQtMDAwMDAy  
Error: ErrorCode: 1359, Exception of type 'Microsoft.AzureStack.Services.Storage.Blob.BlobClientException' was thrown.  
   at Microsoft.AzureStack.Services.Storage.Blob.WBlobClient.<>c__DisplayClass28_0.<GetBlobDataAsync>b__0() in /app/common/Blob/BlobClient.cs:line 723  
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)  
--- End of stack trace from previous location where exception was thrown ---  
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.BlobStorage.Common.BlobReadStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) in /app/common/BlobReadStream.cs:line 88  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsyncHelper(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken token)  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsync(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken cancellationToken)  
   at Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob.PutBlockAsync(String blockId, Stream blockData, String contentMD5, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext, AggregatingProgressIncrementer progressIncrementer, CancellationToken cancellationToken)  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.PutBlockOnRemote(CloudBlockBlob blob, BlockToPush block) in /app/tiering/BlobUpload.cs:line 540  
[2020-11-30 13:59:16.416] [info    ] [tid 874] Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload: Couldn't upload all blocks or commit them for ingress.[REDACTED_FILENAME].2020-11-30T13:11:21.3857402Z Saving partial tiering metadata.  
[2020-11-30 13:59:16.416] [info    ] [tid 874] [BlobInterface.cc:1582] [SetBlobTieringMetadata] SetBlobTieringMetadata received. Container:ingress Blob:[REDACTED_FILENAME] MetadataSize:256  
[2020-11-30 13:59:16.420] [error   ] [tid 874] Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload: Upload failed for ingress.[REDACTED_FILENAME].2020-11-30T13:11:21.3857402Z. Error: ErrorCode: 1359, Exception of type 'Microsoft.AzureStack.Services.Storage.Blob.BlobClientException' was thrown.  
   at Microsoft.AzureStack.Services.Storage.Blob.WBlobClient.<>c__DisplayClass28_0.<GetBlobDataAsync>b__0() in /app/common/Blob/BlobClient.cs:line 723  
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)  
--- End of stack trace from previous location where exception was thrown ---  
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.BlobStorage.Common.BlobReadStream.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) in /app/common/BlobReadStream.cs:line 88  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsyncHelper(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken token)  
   at Microsoft.WindowsAzure.Storage.Core.Util.AsyncStreamCopier`1.StartCopyStreamAsync(Nullable`1 copyLength, Nullable`1 maxLength, CancellationToken cancellationToken)  
   at Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob.PutBlockAsync(String blockId, Stream blockData, String contentMD5, AccessCondition accessCondition, BlobRequestOptions options, OperationContext operationContext, AggregatingProgressIncrementer progressIncrementer, CancellationToken cancellationToken)  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.PutBlockOnRemote(CloudBlockBlob blob, BlockToPush block) in /app/tiering/BlobUpload.cs:line 540  
   at Microsoft.Azure.Devices.BlobStorage.Common.AsyncExtensions.<>c__DisplayClass1_1`1.<<ForEachAsync>b__1>d.MoveNext() in /app/common/extensions/AsyncExtensions.cs:line 59  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.PushAndCommitBlocks(CloudBlockBlob blob, IEnumerable`1 blocks) in /app/tiering/BlobUpload.cs:line 439  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.UploadFromBlockList(CloudBlockBlob blob, GetBlockListOperationResult listBlocksResult) in /app/tiering/BlobUpload.cs:line 408  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.UploadBlobMetaAndData(CloudBlob blob, BlobProperties properties, GetBlockListOperationResult listBlocksResult) in /app/tiering/BlobUpload.cs:line 204  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.UploadBlob() in /app/tiering/BlobUpload.cs:line 151  
   at Microsoft.Azure.Devices.BlobStorage.Tiering.BlobUpload.Run() in /app/tiering/BlobUpload.cs:line 93  

Are we doing something wrong? We followed the guidance in the following documentation:

https://learn.microsoft.com/en-us/azure/iot-edge/how-to-store-data-blob?view=iotedge-2018-06
https://learn.microsoft.com/en-us/azure/iot-edge/how-to-deploy-blob?view=iotedge-2018-06


1 answer

  1. QuantumCache 20,106 Reputation points
    2020-12-08T06:49:06.997+00:00

    Hello all. While we investigate this scenario, here is a summary of what is being discussed internally about this issue.

    The comments below come from the product team involved in troubleshooting this issue. We will keep this post open until we can post a final resolution; in the meantime, we hope the steps and suggestions below help if you face the same issue.

    We may need more info.

    1. How many blobs does the customer think got corrupted when the machine was unplugged?
    2. Can we get more logs? Especially logs from before the first occurrence of “Microsoft.Azure.Devices.BlobStorage.Tiering.TieringOrchestrator: Too many errors. Scheduling retry is skipped.” Ideally a couple of minutes from module startup, and also logs from right before the machine was unplugged (an example command for pulling logs follows this list).
    3. Has the customer tried restarting the module?
    4. What is the load pattern, i.e., how many blobs are written to the containers configured for upload, and how often?
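
    If it helps, recent logs can be pulled on the device with the iotedge CLI (or Docker directly). An example, assuming azureblobstorageoniotedge is the module name; replace it with the name used in your deployment:

        # last 500 lines from the blob storage module
        iotedge logs azureblobstorageoniotedge --tail 500

        # or via Docker, with timestamps
        docker logs --timestamps azureblobstorageoniotedge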

    Some background on what may be happening:
    The module has two modes of operation during upload: “normal” and “limited”. In normal mode, if an error occurs, the blob is set aside for retry in an in-memory queue. If too many consecutive errors occur, we switch tactics: we stop setting blobs aside in memory (to limit memory usage growth) and instead try to upload the item at the top of the queue, but at a slower cadence. Limited mode was designed mainly to handle offline periods.
    If there are one or a few bad blobs among many good ones, the module should keep working in “normal” mode: an error occurs from time to time, the bad blob is set aside in the retry queue with some delay, and the remaining good blobs are processed.
    If there is a period with no new blobs to upload, however, the module only processes items on the errored/retry queue. Each failure increments the consecutive-error counter, which eventually switches the module to limited mode, where it may be stuck processing the same corrupted blob if the upload order is OldestFirst (the default). Does that sound like something that might be happening here? By default, the retry interval for failed uploads is 3 seconds and the consecutive-errors limit is 20, so if no new good blob is produced within about a minute (20 errors × 3 seconds, assuming a single corrupted blob; with more it happens faster), the module can get stuck.
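
    To make the failure mode concrete, here is a minimal, hypothetical C# sketch of the stuck scenario (only retry-queue items remain, upload order OldestFirst). The names are illustrative, not the module's actual source; the defaults are the ones quoted above:

        // Hypothetical sketch of the "normal"/"limited" upload behavior described
        // above. Defaults quoted in this thread: retryIntervalSeconds = 3,
        // consecutiveErrorsLimit = 20.
        using System;
        using System.Collections.Generic;
        using System.Threading.Tasks;

        static class UploadLoopSketch
        {
            const int ConsecutiveErrorsLimit = 20;                            // consecutiveErrorsLimit
            static readonly TimeSpan RetryInterval = TimeSpan.FromSeconds(3); // retryIntervalSeconds

            // 'pending' is ordered OldestFirst; 'tryUpload' returns true on success.
            public static async Task RunAsync(Queue<string> pending, Func<string, Task<bool>> tryUpload)
            {
                int consecutiveErrors = 0;
                while (pending.Count > 0)
                {
                    string blob = pending.Peek();      // OldestFirst: always the head of the queue
                    if (await tryUpload(blob))
                    {
                        pending.Dequeue();
                        consecutiveErrors = 0;         // a success resets the counter
                    }
                    else
                    {
                        consecutiveErrors++;
                        if (consecutiveErrors >= ConsecutiveErrorsLimit)
                        {
                            // Switch to "limited" mode: stop scheduling retries and
                            // let the next tiering run pick up the backlog. With one
                            // corrupted blob stuck at the head and no new good blobs
                            // arriving, this is reached after about 20 x 3 s = 60 s.
                            return;
                        }
                        await Task.Delay(RetryInterval); // wait before retrying
                    }
                }
            }
        }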

    We will think about how to improve this logic. In the meantime, here are some things worth trying (one or more from the list; a sample module twin snippet follows):
    • Restart the module. The module starts in “normal” mode by default.
    • Set deviceToCloudUploadProperties:retryIntervalSeconds to a higher number, e.g., 20.
    • Set deviceToCloudUploadProperties:consecutiveErrorsLimit to a higher number, e.g., 200.
    • Set deviceToCloudUploadProperties:uploadOrder to NewestFirst.
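
    For reference, these settings live under deviceToCloudUploadProperties in the module twin's desired properties. A minimal sketch combining the suggested values above with the module's standard upload settings; the connection string and target container are placeholders for your own values, and the ingress source container name is taken from the logs in this thread:

        {
          "deviceToCloudUploadProperties": {
            "uploadOn": true,
            "uploadOrder": "NewestFirst",
            "retryIntervalSeconds": 20,
            "consecutiveErrorsLimit": 200,
            "cloudStorageConnectionString": "<your storage account connection string>",
            "storageContainersForUpload": {
              "ingress": {
                "target": "<target container name>"
              }
            },
            "deleteAfterUpload": false
          }
        }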

    Please let me know if this helps. If changing these settings doesn't help, we'll need answers to questions 1-4 above to investigate further.
    Logs from right before the machine was unplugged would also help us design a solution that avoids the corruption in the first place.

    Please comment below if you need further help in this matter or want to share your thoughts/feedback.