question

JanVavra81 asked JanVavra81 commented

blob storage - change path of blobs

I've accidentally uploaded 1M files into container/2019/2019/restofpath and I'd like to change the path of each uploaded file to container/2019/restofpath. Can it be done, even blob by blob, via some command or API call that changes only the path? Something like changing the metadata of a blob ...

If I did this with a couple of commands (az copy, az remove), I'd pay again for the write operations (roughly €70).
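For what it's worth, the "rename" itself is easy to script: compute the corrected name for each blob and emit a copy plus a delete. A minimal sketch; the account URL, container, and azcopy command strings below are illustrative, not taken from the real account:

```python
def corrected_name(blob_name: str, bad_prefix: str = "2019/2019/",
                   good_prefix: str = "2019/") -> str:
    """Drop the duplicated path segment from a blob name, if present."""
    if blob_name.startswith(bad_prefix):
        return good_prefix + blob_name[len(bad_prefix):]
    return blob_name

def rename_commands(account_url: str, container: str, blob_name: str) -> list:
    """azcopy copy + remove command strings that together emulate a rename."""
    src = f"{account_url}/{container}/{blob_name}"
    dst = f"{account_url}/{container}/{corrected_name(blob_name)}"
    return [f"azcopy copy '{src}' '{dst}'", f"azcopy remove '{src}'"]

print(corrected_name("2019/2019/restofpath/file.xml"))  # → 2019/restofpath/file.xml
for cmd in rename_commands("https://myaccount.blob.core.windows.net",
                           "container", "2019/2019/restofpath/file.xml"):
    print(cmd)
```

As the answers below note, though, each such "rename" is still a copy plus a delete under the hood, so the write operations are paid for either way.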

azure-blob-storage

cooldadtx answered

That's a common misconception about blob storage: there are no paths. The only structure in storage is the storage account and the container. For example, say you are using container A and you want to store the file myfile.txt under 2019/restofpath. That is actually a single blob object with the name 2019/restofpath/myfile.txt. "Folders" don't exist in the blob world. But since this is a common need, most blob explorers render a virtual file system when you use those kinds of names in a blob name.

So, in answer to your question: all your blob objects need to be renamed, and you have to do that using a copy. There is no other way.

As for cost, you'll need to decide on the best option. If you want to reset the container and start over, you'd pay for uploading all the documents again. If you copy, you'll be paying for the writes as well, but note the distinction Azure draws between ingress (data going into Azure) and egress (data leaving Azure); a copy within the same region stays inside Azure. For blob storage (based on the pricing calculator), a copy and a fresh write cost the same, so I don't know that it would matter.
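Either way, the operation cost scales with the number of blobs, not their size, since write operations are billed per 10,000. A rough estimate, using a hypothetical per-10,000-operations price (real prices vary by region, tier, and redundancy):

```python
def write_ops_cost(num_blobs: int, price_per_10k: float) -> float:
    """Cost of one write operation per blob, billed per 10,000 operations."""
    return num_blobs / 10_000 * price_per_10k

# A copy-based "rename" issues roughly one write per blob, so for 1M blobs
# at a hypothetical €0.065 per 10,000 write operations:
print(round(write_ops_cost(1_000_000, 0.065), 2))  # → 6.5
```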


JanVavra81 answered cooldadtx commented

Ok, I understand the concept of blob storage; I was curious whether such a possibility exists. I imagine blob storage as a database application that stores metadata (the blob path and other attributes), with a filesystem somewhere underneath for the binary data. And it has all the other features like soft delete, triggers, etc.

Currently I am missing two features in azcopy:
- change the path inside a blob container
- preserve the last-modified timestamp when copying to blob storage

Does anybody know how the occupied data is counted?
I have a million small XML files and I am thinking about zipping them. But if capacity were counted at the filesystem cluster level (I think 8 KB is the default), it would not be worth doing.
By zipping I also run a risk of data loss (one bad byte in the wrong place can make a zip file totally unreadable) and I'd gain nothing: I'd pay the same money per file because the occupied space would be rounded up to 8 KB, just like the unzipped file.

And I haven't found any info about a cluster-size setting for blobs.
Maybe the binary data isn't stored in a filesystem but on some block device. I really doubt it, though: how would deletion, concurrent writes of blocks, etc. be performed?


Azure Storage isn't an RDBMS as far as I know; that would be very inefficient. The concept of a cluster size also isn't applicable here: you aren't storing raw files in a file system. Here's a summary article on the matter.

If you're using block blobs (the most common), then each "file" is stored in one or more blocks, and blocks are 32 MB in size, I believe. Hence anything up to 32 MB fits into a single block. If you had 2 really small files, they would each take 1 block (2 blocks total) to store. At least that is how I understand it. If you're using page blobs, then pages are 512 bytes in size.
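Under that model, the number of blocks a file occupies is just a ceiling division. A small sketch, assuming the 32 MB block size mentioned above:

```python
import math

def blocks_needed(size_bytes: int, block_bytes: int = 32 * 1024 * 1024) -> int:
    """Number of blocks a block blob of the given size would occupy."""
    return max(1, math.ceil(size_bytes / block_bytes))

print(blocks_needed(40 * 1024))      # small 40 KiB file → 1
print(blocks_needed(100 * 1024**2))  # 100 MiB file → 4
```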

If you're using table storage instead then this uses a NoSQL database.

If you're using file storage then you are really just using a file share and now cluster size and whatnot might come into play.


In your case, if you have 1 million small files (< 32 MB), then you'd need 1 million blocks. If you zipped up the files into a single larger file, then you'd need fewer blocks. However, you're only taking into account half the story here. Blob storage is actually very cheap: we store TBs of data in the cloud, most of our files are less than 32 MB, and the cost is very low.

If you zip up the files, then there are additional costs you should consider. Firstly, the client code would need to extract those files, which is more work. The bigger issue, though, is the egress out of Azure: you pay for reads from Azure, and the cost of reading a single small blob is less than the cost of reading a large zipped file. Hence, if you zip up your files, your storage costs may be slightly lower, but every time you need any file you have to read the entire zip (or cache it somewhere), and you'll be paying for those reads. In my experience you should optimize for reads (and perhaps writes), not for how much space the data takes up in Azure, as that is not the critical factor.
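The read-amplification point can be made concrete: every access to one file inside a single big archive transfers the whole archive, while unzipped blobs are read individually. A rough sketch with made-up sizes (and assuming, for simplicity, no compression and no partial reads of the archive):

```python
def bytes_read(accesses: int, file_bytes: int, zipped: bool,
               archive_bytes: int = 0) -> int:
    """Total bytes transferred for `accesses` single-file reads."""
    per_read = archive_bytes if zipped else file_bytes
    return accesses * per_read

file_size = 39_000              # one small XML file
archive = 1_000_000 * 39_000    # everything zipped into one blob
print(bytes_read(10, file_size, zipped=False))                          # → 390000
print(bytes_read(10, file_size, zipped=True, archive_bytes=archive))    # → 390000000000
```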

JanVavra81 answered JanVavra81 commented

The article you pointed to says something about the option SingleBlobUploadThresholdInBytes: "the maximum size of a blob in bytes that may be uploaded as a single blob."
As I understand block blobs (understanding-block-blobs--append-blobs--and-page-blobs), the option SingleBlobUploadThresholdInBytes only determines how the inserted data is split into chunks (blocks) when it is bigger than SingleBlobUploadThresholdInBytes. The docs also state: "Maximum blob size (via Put Block List): Approximately 190.7 TiB (4000 MiB X 50,000 blocks)".

So there is nothing about a minimum billed data amount; SingleBlobUploadThresholdInBytes relates to how a blob is uploaded, not to how its capacity is billed.
The use case for the stored data is storing audit analysis data as XML, and there is only a small probability, maybe 1:100,000, that a file will ever be read. So read efficiency is not my concern.

In Cost Management and Billing in the Azure Portal I can see I was billed €0.79 for ZRS data stored for this first week of April,
and in the container stats I can see 390 GiB of blob capacity used across 10 M blobs. 390 GiB / 10 M ≈ 40 KiB, which is about the size of these files. So I am perfectly sure there is no billing unit like a filesystem cluster.

Thanks.


SingleBlobUploadThresholdInBytes is a client-side optimization and has nothing to do with how things are stored on the server. When sending data to the server, the property (from my understanding) determines how much buffer is used to send the data at a time. The larger the buffer, the more memory the upload takes, but the faster it is. If you are doing parallel uploads, then this is a moot point, and the property value doesn't matter in that case.

The max size of a single block in Azure is currently (for service version 2019+) 4000 MiB. Each blob is limited to 50K blocks, so the max size of a single blob is 4000 MiB * 50K. Note that a blob is a single "file", so a container can be larger than that.
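The ~190.7 TiB figure quoted from the docs earlier in the thread follows directly from those two limits:

```python
mib = 1024**2
max_block_bytes = 4000 * mib   # max block size (service version 2019-12-12+)
max_blocks = 50_000            # max committed blocks per blob

max_blob_tib = max_block_bytes * max_blocks / 1024**4
print(round(max_blob_tib, 1))  # → 190.7
```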

If reads are not important to you, then you can zip up the files if you want, though I think you will probably regret that decision. You should perhaps look into using table or file storage instead of blob storage if you are really that concerned about space.


Well, you're right: https://stackoverflow.com/questions/39237750/increase-azure-blob-block-upload-limit-from-32-mb
SingleBlobUploadThresholdInBytes is about sending, not storing.

My concern is the price.
Block blobs are the cheapest solution.
