Case-insensitive behavior of some Win32 API functions (CompareStringEx etc) and Deseret alphabet

o.kazakevich 21 Reputation points
2021-08-27T10:21:47.75+00:00

Hi!

I recently discovered that the case-insensitive modes of the CompareStringEx, CompareStringOrdinal, FindNLSStringEx and
FindStringOrdinal functions do not work correctly with some characters of the Deseret alphabet (Unicode range U+10400..U+1044F).

For example, CompareStringEx for U+10400 and U+10428 (Deseret Capital Letter Long I and Deseret Small Letter Long I) returns CSTR_LESS_THAN,
although CSTR_EQUAL is expected. Please review the code sample below.

I wrote a similar test in JavaScript (based on "localeCompare") and also checked my assumption on the "ICU Unicode String Comparison"
page; both results indicate that these strings are equal.

Is this a bug in the Windows API?

Environment: Windows 10 20H2 (build 19042.1165), but I see the same behavior on other Windows releases.

Thanks!

#include <Windows.h>
#include <cstdio>

int main()
{
    // U+10400 (Deseret Capital Letter Long I)
    wchar_t const left[] = { 0xD801, 0xDC00, 0x0000 };

    // U+10428 (Deseret Small Letter Long I)
    wchar_t const right[] = { 0xD801, 0xDC28, 0x0000 };


    // Prints 1 (CSTR_LESS_THAN). Expected 2 (CSTR_EQUAL).
    printf("%d\r\n", CompareStringEx(LOCALE_NAME_INVARIANT, NORM_IGNORECASE, left, -1, right, -1, NULL, NULL, 0));

    // Prints 1 (CSTR_LESS_THAN). Expected 2 (CSTR_EQUAL).
    printf("%d\r\n", CompareStringOrdinal(left, -1, right, -1, TRUE));

    // Prints -1 (not found). Expected 0.
    printf("%d\r\n", FindNLSStringEx(LOCALE_NAME_INVARIANT, FIND_FROMSTART | NORM_IGNORECASE, left, -1, right, -1, NULL, NULL, NULL, 0));

    // Prints -1 (not found). Expected 0.
    printf("%d\r\n", FindStringOrdinal(FIND_FROMSTART, left, -1, right, -1, TRUE));

    return 0;
}
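
For reference, the equality that the sample above expects follows from Unicode simple case mapping: each Deseret capital letter in U+10400..U+10427 maps to the small letter 0x28 code points higher (U+10428..U+1044F). The block below is a minimal sketch of an ordinal, Deseret-only case-insensitive comparison done by hand. It is not part of the Windows API, and the names NextCodePoint, FoldDeseret and EqualsIgnoreDeseretCase are hypothetical helpers introduced only for illustration. It uses std::u16string so that it does not depend on the width of wchar_t.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes one code point from UTF-16 starting at s[i]
// and advances i past it. Unpaired surrogates are returned unchanged.
char32_t NextCodePoint(const std::u16string& s, std::size_t& i)
{
    char32_t c = s[i++];
    if (c >= 0xD800 && c <= 0xDBFF && i < s.size()) {
        char32_t low = s[i];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            ++i;
            c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
        }
    }
    return c;
}

// Hypothetical helper: maps Deseret capitals U+10400..U+10427 to their
// small counterparts (+0x28). Every other code point is left as is,
// so this is NOT a general case-folding routine.
char32_t FoldDeseret(char32_t c)
{
    return (c >= 0x10400 && c <= 0x10427) ? static_cast<char32_t>(c + 0x28) : c;
}

// Compares two UTF-16 strings code point by code point, ignoring case
// for Deseret letters only.
bool EqualsIgnoreDeseretCase(const std::u16string& a, const std::u16string& b)
{
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (FoldDeseret(NextCodePoint(a, i)) != FoldDeseret(NextCodePoint(b, j)))
            return false;
    }
    return i == a.size() && j == b.size();
}
```

With the two strings from the sample (0xD801 0xDC00 and 0xD801 0xDC28), EqualsIgnoreDeseretCase returns true, because both decode to code points that fold to U+10428. Full case-insensitive comparison across all scripts needs complete case-folding data, which this sketch deliberately omits.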
Windows API - Win32

Accepted answer
  1. Xiaopo Yang - MSFT 11,336 Reputation points Microsoft Vendor
    2021-09-29T05:27:54.517+00:00

    Regarding the behavior of the CompareStringEx() API, confirmation has been obtained from the product group. Here are their comments:

    Unfortunately, we don’t have sort weights defined for the Deseret alphabet/script, as we also don’t have any locales in Windows/NLS that use the Deseret script. This means that a CompareStringEx() call will not be able to give you the case-insensitive comparison behavior that you are expecting.
    As for alternatives, the ICU (International Components for Unicode) library, available on Windows since RS2 (Windows 10, version 1703), should be able to provide the correct collation behavior for these characters. In particular, the ucol_strcoll API could be used instead of CompareStringEx: API Details - ICU Documentation (unicode-org.github.io).


1 additional answer

  1. o.kazakevich 21 Reputation points
    2021-09-29T19:23:06.107+00:00

    Thank you for the information provided.
