Case-insensitive behavior of some Win32 API functions (CompareStringEx etc.) and the Deseret alphabet

Question

Hi!

I recently discovered that the case-insensitive modes of the CompareStringEx, CompareStringOrdinal, FindNLSStringEx and
FindStringOrdinal functions don't work correctly with some characters of the Deseret alphabet (Unicode range U+10400..U+1044F).

For example, CompareStringEx for U+10400 and U+10428 (Deseret Capital and Small Letter Long I) returns CSTR_LESS_THAN,
although CSTR_EQUAL is expected. Please review the code sample below.

I wrote a similar test in JavaScript (based on "localeCompare") and checked my assumption on the "ICU Unicode String Comparison"
page. Both results indicate that these strings are equal.

Is this a bug in the Windows API?

Environment: Windows 10 20H2 (build 19042.1165), but I see the same behavior on other Windows releases.

Thanks!

#include <windows.h>
#include <stdio.h>

int main()
{   
    // U+10400 (Deseret Capital Letter Long I)
    wchar_t const left[] = { 0xD801, 0xDC00, 0x0000 };

    // U+10428 (Deseret Small Letter Long I)
    wchar_t const right[] = { 0xD801, 0xDC28, 0x0000 };


    // Prints 1 (CSTR_LESS_THAN). Expected 2 (CSTR_EQUAL).
    printf("%d\n", CompareStringEx(LOCALE_NAME_INVARIANT, NORM_IGNORECASE, left, -1, right, -1, NULL, NULL, 0));

    // Prints 1 (CSTR_LESS_THAN). Expected 2 (CSTR_EQUAL).
    printf("%d\n", CompareStringOrdinal(left, -1, right, -1, TRUE));

    // Prints -1 (not found). Expected 0.
    printf("%d\n", FindNLSStringEx(LOCALE_NAME_INVARIANT, FIND_FROMSTART | NORM_IGNORECASE, left, -1, right, -1, NULL, NULL, NULL, 0));

    // Prints -1 (not found). Expected 0.
    printf("%d\n", FindStringOrdinal(FIND_FROMSTART, left, -1, right, -1, TRUE));

    return 0;
}
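
The two test strings in the sample are written as UTF-16 surrogate pairs, because Deseret lies outside the Basic Multilingual Plane. As a sanity check on those values, here is a minimal sketch of the standard UTF-16 encoding step. The helper name encode_surrogates is hypothetical and is not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a supplementary-plane code point (U+10000..U+10FFFF) as a UTF-16
   surrogate pair. Returns 0 on success, -1 if cp is outside that range.
   Hypothetical helper, for illustration only. */
static int encode_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(high != NULL && low != NULL);

    if (cp < 0x10000u || cp > 0x10FFFFu)
        return -1;

    cp -= 0x10000u;                               /* 20-bit offset         */
    *high = (uint16_t)(0xD800u + (cp >> 10));     /* top 10 bits           */
    *low  = (uint16_t)(0xDC00u + (cp & 0x3FFu));  /* bottom 10 bits        */
    return 0;
}
```

Passing 0x10400 yields the pair 0xD801, 0xDC00 and passing 0x10428 yields 0xD801, 0xDC28, which matches the left and right arrays above.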

Accepted Answer

Regarding the behavior of the CompareStringEx() API, someone has received confirmation from the product group. Here are their comments:

Unfortunately, we don’t have sort weights defined for the Deseret alphabet/script, as we also don’t have any locales in Windows/NLS that use the Deseret script. This means that a CompareStringEx() call will not be able to give you the case-insensitive comparison behavior that you are expecting.
As for alternatives, the ICU (International Components for Unicode) library, available on Windows since RS2 (Windows 10 version 1703), should be able to provide the correct collation behavior for these characters. In particular, the ucol_strcoll API could be used instead of CompareStringEx; see "API Details - ICU Documentation" (unicode-org.github.io).
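
If pulling in ICU is not an option and only an ordinal-style equality check is needed, one possible workaround is to fold case by hand before comparing. This is not ICU collation and not what the product group suggested. It is a minimal sketch that relies on the fact that the Deseret capitals U+10400..U+10427 map to the smalls U+10428..U+1044F at a fixed offset of 0x28. The function names are hypothetical, and only Deseret and ASCII letters are folded here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from a NUL-terminated UTF-16 string and advance *s.
   An unpaired surrogate is returned unchanged. */
static uint32_t next_code_point(const uint16_t **s)
{
    uint32_t c = *(*s)++;
    if (c >= 0xD800u && c <= 0xDBFFu) {
        uint32_t lo = **s;
        if (lo >= 0xDC00u && lo <= 0xDFFFu) {
            (*s)++;
            c = 0x10000u + ((c - 0xD800u) << 10) + (lo - 0xDC00u);
        }
    }
    return c;
}

/* Simple case fold: Deseret capitals and ASCII capitals only (assumption:
   other scripts are left to the real API). */
static uint32_t fold_case(uint32_t c)
{
    if (c >= 0x10400u && c <= 0x10427u)
        return c + 0x28u;
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}

/* Compare two NUL-terminated UTF-16 strings by folded code point.
   Returns a negative value, zero, or a positive value. */
static int compare_ignore_case(const uint16_t *a, const uint16_t *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        uint32_t ca = fold_case(next_code_point(&a));
        uint32_t cb = fold_case(next_code_point(&b));
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;
    }
}
```

With the strings from the question (0xD801, 0xDC00 and 0xD801, 0xDC28), compare_ignore_case returns 0. The sketch uses uint16_t so it also builds where wchar_t is wider than 16 bits. On Windows, where wchar_t is 16 bits, the arrays from the question can be passed with a cast.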

Answer

Thank you for the information provided.
