
okazakevich-5437 asked

Case-insensitive behavior of some Win32 API functions (CompareStringEx, etc.) and the Deseret alphabet

Hi!

I recently discovered that the case-insensitive versions of the CompareStringEx, CompareStringOrdinal, FindNLSStringEx and
FindStringOrdinal functions don't work correctly with some characters of the Deseret alphabet (Unicode range U+10400..U+1044F).
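All of these code points lie outside the Basic Multilingual Plane, so in the UTF-16 `wchar_t` strings these Win32 functions take, each one is stored as a surrogate pair: subtract 0x10000, add the top 10 bits to 0xD800 and the bottom 10 bits to 0xDC00. A minimal sketch of that rule (the helper name `EncodeSupplementary` and the `SurrogatePair` struct are hypothetical, for illustration only):

```cpp
#include <cassert>
#include <cstdint>

// One supplementary-plane code point as two UTF-16 code units.
struct SurrogatePair
{
    char16_t high;  // lead surrogate, 0xD800..0xDBFF
    char16_t low;   // trail surrogate, 0xDC00..0xDFFF
};

// Hypothetical helper: encodes a code point in U+10000..U+10FFFF as UTF-16.
SurrogatePair EncodeSupplementary(std::uint32_t codePoint)
{
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    const std::uint32_t v = codePoint - 0x10000;  // 20-bit offset
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + (v >> 10)),    // top 10 bits
        static_cast<char16_t>(0xDC00 + (v & 0x3FF)),  // bottom 10 bits
    };
}
```

For U+10400 the offset is 0x400, giving 0xD801 0xDC00; for U+10428 it is 0x428, giving 0xD801 0xDC28. These are the `left` and `right` arrays used in the code sample.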

For example, CompareStringEx for U+10400 and U+10428 (Deseret Capital and Small Letter Long I) returns CSTR_LESS_THAN,
although CSTR_EQUAL is expected. Please review the code sample below.

I wrote a similar test in JavaScript (based on "localeCompare") and checked my guesses on the "ICU Unicode String Comparison"
page; both results indicate that these strings are equal.

Is this a bug in Windows API?

Environment: Windows 10 20H2 (build 19042.1165); I see the same behavior on other Windows releases.

Thanks!

 #include <Windows.h>
 #include <cstdio>

 int main()
 {
     // U+10400 (Deseret Capital Letter Long I)
     wchar_t const left[] = { 0xD801, 0xDC00, 0x0000 };

     // U+10428 (Deseret Small Letter Long I)
     wchar_t const right[] = { 0xD801, 0xDC28, 0x0000 };

     // "\n" is enough: stdout in text mode already expands it to CRLF on Windows.

     // Prints 1 (CSTR_LESS_THAN). Expected 2 (CSTR_EQUAL).
     printf("%d\n", CompareStringEx(LOCALE_NAME_INVARIANT, NORM_IGNORECASE, left, -1, right, -1, NULL, NULL, 0));

     // Prints 1 (CSTR_LESS_THAN). Expected 2 (CSTR_EQUAL).
     printf("%d\n", CompareStringOrdinal(left, -1, right, -1, TRUE));

     // Prints -1 (not found). Expected 0.
     printf("%d\n", FindNLSStringEx(LOCALE_NAME_INVARIANT, FIND_FROMSTART | NORM_IGNORECASE, left, -1, right, -1, NULL, NULL, NULL, 0));

     // Prints -1 (not found). Expected 0.
     printf("%d\n", FindStringOrdinal(FIND_FROMSTART, left, -1, right, -1, TRUE));

     return 0;
 }
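Within the Deseret block, capitals (U+10400..U+10427) and small letters (U+10428..U+1044F) are laid out in parallel, so each small letter sits exactly 0x28 above its capital. One possible stop-gap, sketched here under that assumption, is to fold Deseret capitals to small letters yourself and then compare ordinally. This is a hypothetical workaround, not a Windows API and not something proposed in this thread; `FoldDeseretCase` and `DeseretEqualIgnoringCase` are illustrative names, and only Deseret letters are folded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical workaround (not a Windows API): maps Deseret capital letters
// U+10400..U+10427 to the small letters U+10428..U+1044F in a UTF-16 string.
// Every other code unit, including unpaired surrogates, is copied unchanged.
std::u16string FoldDeseretCase(const std::u16string& s)
{
    std::u16string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char16_t hi = s[i];
        const bool isPair = hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.size() &&
                            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (!isPair)
        {
            out.push_back(hi);
            continue;
        }
        // Decode the surrogate pair into a code point.
        char32_t cp = 0x10000 + ((char32_t(hi) - 0xD800) << 10) +
                      (char32_t(s[i + 1]) - 0xDC00);
        if (cp >= 0x10400 && cp <= 0x10427)
            cp += 0x28;  // the small letter sits 0x28 above its capital
        // Re-encode; capitals and small letters are both supplementary,
        // so the pair stays a pair.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        ++i;  // skip the low surrogate just consumed
    }
    assert(out.size() == s.size());  // folding never changes the length
    return out;
}

// Equality that ignores case for Deseret letters only; all other characters
// are compared exactly, so this is not general Unicode case folding.
bool DeseretEqualIgnoringCase(const std::u16string& a, const std::u16string& b)
{
    return FoldDeseretCase(a) == FoldDeseretCase(b);
}
```

With the sample's code units stored in `std::u16string`, `DeseretEqualIgnoringCase` treats U+10400 and U+10428 as equal. Folding both strings first and then calling `std::u16string::find` gives a similarly narrow stand-in for the search functions; anything beyond this one block calls for a real collation library.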


windows-api-general

I reproduced this successfully. Please wait while we reach a conclusion.

XiaopoYang-MSFT answered

Regarding the CompareStringEx() behavior, confirmation has been received from the product group. Here are their comments:

Unfortunately, we don’t have sort weights defined for the Deseret alphabet/script, as we also don’t have any locales in Windows/NLS that use the Deseret script. This means that a CompareStringEx() call will not be able to give you the case-insensitive comparison behavior that you are expecting.
As for alternatives, the ICU (International Components for Unicode) library, available in Windows since RS2 (Windows 10 version 1703), should be able to provide the correct collation behavior for these characters. In particular, the ucol_strcoll API could be used instead of CompareStringEx: API Details - ICU Documentation (unicode-org.github.io).


okazakevich-5437 answered

Thank you for the information provided.
