
mayuri18kadam asked · Mirosch answered

com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@****.blob.core.windows.net/

Hi,
I am getting the following error message:

 com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@****.blob.core.windows.net/cook/processYear=2021/processMonth=01/processDay=08/processHour=03/part-00003-tid-1903224826064875913-0ded1380-19a2-4ed2-9d4d-f19724b5bf5d-29101-1.c000.avro.
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:286)
 Caused by: java.io.IOException
     at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:737)
 Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is x2rC4SZaPjA==, retrieved 6kwtbjN2v/w==.
     at com.microsoft.azure.storage.blob.CloudBlob$9.postProcessResponse(CloudBlob.java:1409)


Any idea how to resolve this? Thanks.
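For context, the integrity check that fails here is an MD5 comparison: the storage client hashes the bytes it actually received and compares the Base64-encoded digest against the `Content-MD5` value the service advertises for that range. The following is a minimal illustration of that comparison, not the SDK's actual code; the content and the simulated corruption are stand-ins, not real blob data:

```scala
import java.security.MessageDigest
import java.util.Base64

object Md5Check {
  // Base64-encoded MD5 digest -- the format of the two values in the error message
  def base64Md5(bytes: Array[Byte]): String =
    Base64.getEncoder.encodeToString(MessageDigest.getInstance("MD5").digest(bytes))

  def main(args: Array[String]): Unit = {
    val sent     = "example blob content".getBytes("UTF-8")
    val received = sent ++ Array[Byte](0) // simulate a corrupted/short download

    val expected  = base64Md5(sent)      // what the service would report
    val retrieved = base64Md5(received)  // what the client computed locally

    // The storage SDK throws a StorageException ("Blob hash mismatch") when these differ
    println(s"Expected value is $expected, retrieved $retrieved.")
  }
}
```

A mismatch therefore means the bytes read over the wire are not the bytes the service believes the blob contains, which typically points at a transport problem or at the blob changing while it is being read.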

azure-blob-storage azure-databricks

Hello @mayuri18kadam,

Thanks for the question, and for using the Microsoft Q&A forum.

Could you please share the code you are running, along with the complete stack trace of the error you are seeing?



Hello @mayuri18kadam,

Just checking in to see whether you have had a chance to review the previous response. We need the requested information to investigate this issue further.

mayuri18kadam answered · PRADEEPCHEEKATLA-MSFT commented

Hi Pradeep, please find the stack trace below:

 com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@*******.blob.core.windows.net/cook/processYear=2021/processMonth=01/processDay=09/processHour=00/part-00003-tid-4640843606947508963-a580-40bd-ad0d-e7c92f1e5b1f-29229-1.c000.avro.
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:286)
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:264)
     at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:205)
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:354)
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:205)
     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage58.processNext(Unknown Source)
     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage65.agg_doAggregateWithKeys_0$(Unknown Source)
     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage65.processNext(Unknown Source)
     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
     at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
     at org.apache.spark.scheduler.Task.run(Task.scala:112)
     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)
 Caused by: java.io.IOException
     at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:737)
     at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:264)
     at com.microsoft.azure.storage.blob.BlobInputStream.readInternal(BlobInputStream.java:448)
     at com.microsoft.azure.storage.blob.BlobInputStream.read(BlobInputStream.java:420)
     at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
     at java.io.DataInputStream.read(DataInputStream.java:149)
     at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(NativeAzureFileSystem.java:876)
     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
     at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
     at java.io.DataInputStream.read(DataInputStream.java:149)
     at com.databricks.spark.metrics.FSInputStreamWithMetrics$$anonfun$read$3.apply$mcI$sp(FileSystemWithMetrics.scala:206)
     at com.databricks.spark.metrics.FSInputStreamWithMetrics$$anonfun$read$3.apply(FileSystemWithMetrics.scala:206)
     at com.databricks.spark.metrics.FSInputStreamWithMetrics$$anonfun$read$3.apply(FileSystemWithMetrics.scala:206)
     at com.databricks.spark.metrics.ExtendedTaskIOMetrics$class.withTimeMetric(FileSystemWithMetrics.scala:151)
     at com.databricks.spark.metrics.ExtendedTaskIOMetrics$class.com$databricks$spark$metrics$ExtendedTaskIOMetrics$$withTimeAndBytesMetric(FileSystemWithMetrics.scala:171)
     at com.databricks.spark.metrics.ExtendedTaskIOMetrics$$anonfun$withTimeAndBytesReadMetric$1.apply$mcI$sp(FileSystemWithMetrics.scala:185)
     at com.databricks.spark.metrics.ExtendedTaskIOMetrics$$anonfun$withTimeAndBytesReadMetric$1.apply(FileSystemWithMetrics.scala:185)
     at com.databricks.spark.metrics.ExtendedTaskIOMetrics$$anonfun$withTimeAndBytesReadMetric$1.apply(FileSystemWithMetrics.scala:185)
     at com.databricks.spark.metrics.SamplerWithPeriod.sample(FileSystemWithMetrics.scala:78)
     at com.databricks.spark.metrics.ExtendedTaskIOMetrics$class.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:185)
     at com.databricks.spark.metrics.FSInputStreamWithMetrics.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:192)
     at com.databricks.spark.metrics.FSInputStreamWithMetrics.read(FileSystemWithMetrics.scala:205)
     at java.io.DataInputStream.read(DataInputStream.java:149)
     at org.apache.avro.mapred.FsInput.read(FsInput.java:54)
     at org.apache.spark.sql.avro.AvroFileFormat$.openAvroReader(AvroFileFormat.scala:275)
     at org.apache.spark.sql.avro.AvroFileFormat$$anonfun$buildReader$1.apply(AvroFileFormat.scala:202)
     at org.apache.spark.sql.avro.AvroFileFormat$$anonfun$buildReader$1.apply(AvroFileFormat.scala:183)
     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:147)
     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:134)
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:235)
     ... 23 more
 Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is xmypzfnpTdq8eFLxZ49DhQ==, retrieved CY7+V9/JEfVroD5omBB2Uw==.
     at com.microsoft.azure.storage.blob.CloudBlob$9.postProcessResponse(CloudBlob.java:1409)
     at com.microsoft.azure.storage.blob.CloudBlob$9.postProcessResponse(CloudBlob.java:1310)
     at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:149)
     at com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(CloudBlob.java:1493)
     at com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(BlobInputStream.java:255)
     ... 53 more

Hello @mayuri18kadam,

Thanks for sharing the stack trace of the error message.

Could you please share the code which you are running?


Hello @mayuri18kadam,

Just checking in to see whether you have had a chance to review the previous response. We need the requested information to investigate this issue further.

mayuri18kadam answered · mayuri18kadam edited

Hi @PRADEEPCHEEKATLA-MSFT, the following is the code we use to upload files to the Azure Blob container:

 spark.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
 spark.conf.set("fs.azure.sas." + blobStorageAccContainerName + "." + blobStorageAccName + ".blob.core.windows.net", blobStorageBlobSASToken)
 containerPath = "wasbs://" + blobStorageAccContainerName + "@" + blobStorageAccName + ".blob.core.windows.net/"

 // Fall back to the current working directory when no checkpoint directory is configured
 var storageCheckpointDirectory = checkpointDirectory
 if (storageCheckpointDirectory.isEmpty) {
   storageCheckpointDirectory = Paths.get(new java.io.File(".").getCanonicalPath).toString
 }
 // Give each account/container/directory its own checkpoint location
 storageCheckpointDirectory = storageCheckpointDirectory + "/" + blobStorageAccName + "/" + blobStorageAccContainerName + "/" + dirName

 val queryName = "uploadDataToBlob:" + dirName
 spark.sparkContext.setLocalProperty("spark.scheduler.pool", dirName)
 var df = data.writeStream
   .option("checkpointLocation", storageCheckpointDirectory)
   .queryName(queryName)
   .format(format)
 if (partitionCols.nonEmpty) df = df.partitionBy(partitionCols: _*)
 df.option("path", containerPath + dirName)
   .start()






mayuri18kadam answered · mayuri18kadam edited

Hi @PRADEEPCHEEKATLA-MSFT, the following is the code that reads from the Azure Blob container:

 spark.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
 spark.conf.set("fs.azure.sas." + blobStorageAccContainerName + "." + blobStorageAccName + ".blob.core.windows.net", blobStorageBlobSASToken)
 containerPath = "wasbs://" + blobStorageAccContainerName + "@" + blobStorageAccName + ".blob.core.windows.net/"

 spark.read
   .format(format)
   .load(containerPath + dirName)
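Since the streaming writer and this batch read can touch the same container concurrently, one thing worth trying is loading only partitions that are already closed, rather than the whole container. The helper below is a hypothetical sketch (the function name is my own; the `processYear=.../processHour=...` layout is taken from the paths in the stack trace):

```scala
object PartitionPaths {
  // Builds an hourly partition path matching the layout seen in the error
  // message, e.g. <base>/processYear=2021/processMonth=01/processDay=09/processHour=00
  def hourlyPartitionPath(base: String, year: Int, month: Int, day: Int, hour: Int): String =
    f"$base/processYear=$year%04d/processMonth=$month%02d/processDay=$day%02d/processHour=$hour%02d"
}
```

Used with the read above, something like `spark.read.format(format).load(PartitionPaths.hourlyPartitionPath(containerPath + dirName, 2021, 1, 9, 0))` would restrict the scan to a single completed hour.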




Mirosch answered

Any updates regarding this issue? We are experiencing exactly the same problem.
