Emilia-1502 asked Emilia-1502 commented

Some Unicode character combinations are invalid in paths when used together, but not separately

Problem: some Unicode character combinations are invalid in paths when used together, but not separately.

We tried to upload blobs with certain character combinations in the blob name. Some combinations were rejected both in the portal and through the REST API.

I tested with combinations of the following Unicode characters:
U+103A3 [OLD PERSIAN SIGN KA] 𐎣
U+FFFC [OBJECT REPLACEMENT CHARACTER] 
U+FE69 [SMALL DOLLAR SIGN] ﹩
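
For anyone trying to reproduce this, the sketch below prints each of the three characters with its code point, Unicode name, UTF-8 bytes, and percent-encoded form, since U+FFFC in particular is invisible in most fonts. The `describe` helper is a hypothetical name used only for illustration; how the portal or REST API actually encodes or validates the name is not something this snippet shows.

```python
import unicodedata
from urllib.parse import quote

# The three characters listed above, written as escapes so the
# invisible U+FFFC cannot get lost in copy and paste.
CHARS = ["\U000103A3", "\uFFFC", "\uFE69"]


def describe(ch):
    """Return code point, Unicode name, UTF-8 bytes and percent-encoding of one character."""
    utf8 = ch.encode("utf-8")
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8_hex": " ".join(f"{b:02X}" for b in utf8),
        # quote() percent-encodes the UTF-8 bytes; safe="" also encodes "/"
        "percent_encoded": quote(ch, safe=""),
    }


for ch in CHARS:
    print(describe(ch))
```

Running the same helper over every character of a rejected blob name makes it easier to see exactly which code points, and in which order, were sent.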

Working blobnames:
unicode/_2weirdordering.txt
u/_2weirdordering.txt
u/_2weirdordering_2weirdordering.txt
u/_2weirdordering.txt
u𐎣_3/﹩_smalldollar.txt

Not working:
u𐎣_3/_2weirdordering.txt
u𐎣_345/_2weirdordering.txt
u𐎣_3_2weirdordering.txt
u𐎣_3/t_2weirdordering.txt
u𐎣_3/u𐎣_4/_2weirdordering.txt

When you try to upload these files through the portal, you get the error "Failed to validate file names."
Uploads through the REST API also fail for the same set of files.

azure-blob-storage

1 Answer

deherman-MSFT answered Emilia-1502 commented

@Emilia-1502
Apologies for the delayed response. For information on which Unicode characters are allowed, please see this section here. If you feel that the characters should be allowed but they are still rejected, please let me know and I can forward your request to the service team.



Please don’t forget to "Accept the answer" and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.


The outstanding problem is that, while we can (and already do) use our application to ban encoded characters that fall outside the ucschar range from RFC 3987, we cannot control what our users have already uploaded with other software. Those blobnames can include characters such as U+FFFC, since Azure itself does not consistently prevent them from being uploaded.
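
To make the check concrete, here is a minimal sketch of the kind of filter described above. It assumes the `ucschar` ranges as transcribed from RFC 3987, section 2.2, and it only inspects non-ASCII characters, on the assumption that ASCII is covered by separate rules. The names `is_ucschar` and `non_ucschar_code_points` are hypothetical and are not our application's actual code or anything Azure provides.

```python
# ucschar ranges transcribed from RFC 3987, section 2.2 (inclusive bounds).
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13: %x10000-1FFFD ... %xD0000-DFFFD
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 0xE)]
# Plane 14 starts at E1000 rather than E0000.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_ucschar(ch):
    """True if the character falls inside one of the ucschar ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def non_ucschar_code_points(name):
    """List the non-ASCII code points in a name that are outside ucschar.

    ASCII characters are skipped here (an assumption of this sketch).
    """
    return [
        f"U+{ord(ch):04X}"
        for ch in name
        if ord(ch) > 0x7F and not is_ucschar(ch)
    ]


# The three characters from the question, as escapes.
print(non_ucschar_code_points("\U000103A3\uFFFC\uFE69"))
```

Of the three characters in the question, only U+FFFC is reported by this check; U+103A3 and U+FE69 both fall inside `ucschar`. A per-character filter like this cannot explain failures that depend on combinations, which is the open question below.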

We are concerned about providing support to customers when we cannot articulate which blobnames are and are not allowed in Azure. It is unlikely to be acceptable for a customer to have to upload the majority of a large data set before finding that their chosen blobnames are problematic. Equally, it is unlikely to be acceptable for a customer to be advised conservatively that blobnames they know to work in Azure will not work with our software.

More importantly, given that this issue involves a combination of characters interacting badly, can it be confirmed that it arises specifically because one of the characters is not permitted by RFC 3987? Or might this also happen with some combinations of otherwise valid characters?
