ADF Copy activity truncating .bz2 files

Marshall Skare 1 Reputation point
2022-01-24T01:30:09.363+00:00

I have a copy activity that copies .bz2 files from one blob storage location to another. During the copy, the files need to be decompressed, so the source dataset has its Compression type set to bzip2. For small files (~10 KB compressed) this works as expected. When the source file is larger (>~200 KB compressed), Azure appears to write a file of precisely 900000 bytes, regardless of the file's actual decompressed size. Here's an example from the copy activity output:

"dataRead": 36067,
"dataWritten": 900000,
"filesRead": 1,
"filesWritten": 1,
"sourcePeakConnections": 1,
"sinkPeakConnections": 1,
"copyDuration": 4,
"throughput": 8.805,
"errors": [],

I've checked to ensure all copy activities run single-threaded, so this should not be a chunk left over from a parallel copy. I can also copy the compressed .bz2 files from one location to another (without decompressing) successfully regardless of file size. Are there file size limits in play for decompressing on the fly like this?
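
The copy activity itself looks roughly like this (again a simplified sketch with placeholder names), with parallelCopies pinned to 1 to keep the copy single-threaded:

{
    "name": "CopyAndDecompressBz2",
    "type": "Copy",
    "inputs": [ { "referenceName": "SourceBzip2Files", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "SinkDecompressedFiles", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": { "type": "AzureBlobStorageReadSettings", "recursive": true }
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": { "type": "AzureBlobStorageWriteSettings" }
        },
        "parallelCopies": 1
    }
}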

Thanks for your help!

Azure Data Factory