Unicode and MBCS

Unicode Tasks   Multibyte Character Set (MBCS) Tasks

The Microsoft Foundation Class Library (MFC), the C run-time library for Visual C++, and the Visual C++ development environment are enabled to assist your international programming. They provide:

  • Support for the Unicode standard on Windows NT.

    Unicode, as supported here, is a 16-bit character encoding that provides enough code values for the characters of all major languages. Every ASCII character is included in Unicode as a “widened” character: the same numeric value stored in 16 bits.

    Note   The Unicode standard is not supported on Windows 95.

  • Support for a form of Multibyte Character Set (MBCS) called Double Byte Character Set (DBCS) on all platforms.

    DBCS characters are composed of one or two bytes. Some ranges of bytes are set aside for use as “lead bytes.” A lead byte specifies that it and the following “trail byte” together form a single two-byte-wide character. You must keep track of which bytes are lead bytes. In a particular multibyte character set, the lead bytes fall within a certain range, as do the trail bytes. When these ranges overlap, the value of a byte alone does not reveal its role, and you may need to evaluate the context to determine whether a given byte is functioning as a lead byte or a trail byte.

  • Support for tools that simplify MBCS programming of applications written for international markets.

    When run on an MBCS-enabled version of the Windows NT operating system, the Visual C++ development system — including the integrated source code editor, debugger, and command line tools — is completely MBCS-enabled. For more information, see MBCS Support in Visual C++.
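The lead-byte and trail-byte rules described in the list above can be sketched in portable C. This is a minimal illustration, not the run-time library's implementation: the lead-byte ranges are assumed to be those of Japanese Shift-JIS (code page 932), and the names is_lead_byte, dbcs_char_count, and dbcs_is_trail_at are hypothetical helpers introduced here. In real Visual C++ code, run-time routines such as _ismbblead would normally take the place of the hand-written range test.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration only: lead-byte ranges of code page 932
   (Shift-JIS). Other DBCS code pages define different ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters (not bytes) in a null-terminated DBCS string. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    assert(s != NULL);
    while (*p != 0) {
        /* A lead byte also consumes the trail byte that follows it,
           unless the string ends right after it (malformed input). */
        if (is_lead_byte(*p) && p[1] != 0)
            p += 2;
        else
            p += 1;
        ++count;
    }
    return count;
}

/* Return 1 if s[index] is the trail byte of a two-byte character.
   Because lead and trail ranges overlap, the byte value alone cannot
   answer this; the string is scanned from its beginning instead. */
int dbcs_is_trail_at(const char *s, size_t index)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    assert(s != NULL);
    while (p[i] != 0 && i <= index) {
        if (is_lead_byte(p[i]) && p[i + 1] != 0) {
            if (i + 1 == index)
                return 1;
            i += 2;
        } else {
            i += 1;
        }
    }
    return 0;
}
```

In the four-byte string "A\x83\x81" "B", the byte 0x81 lies inside the assumed lead-byte range but acts as a trail byte because 0x83 precedes it. Scanning from the start of the string is what resolves the ambiguity.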

Note   In this documentation, “MBCS” is used to describe all non-Unicode support for multibyte characters. In Visual C++, MBCS always means DBCS. Character sets wider than two bytes are not supported.

By definition, the ASCII character set is a subset of all multibyte character sets. In many multibyte character sets, each character in the range 0x00 – 0x7F is identical to the character that has the same value in the ASCII character set. For example, in both ASCII and MBCS character strings, the one-byte null character ('\0') has the value 0x00 and terminates the string.
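Because ASCII characters keep the same values in multibyte strings and, as noted earlier, in their “widened” Unicode form, a string that contains only ASCII can be converted by copying each value into wider storage. The following is a minimal sketch of that idea. The name widen_ascii is a hypothetical helper, not a library routine, and it deliberately rejects any byte above 0x7F, since such bytes need a real code-page conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: widen an ASCII-only, null-terminated string.
   dst_len is the capacity of dst in wide characters, including room
   for the terminator. Returns the number of characters written
   (excluding the terminator), or (size_t)-1 if src contains a byte
   outside 0x00-0x7F or dst is too small.
   Note: wchar_t is 16 bits with Visual C++; other compilers may use
   a wider type, which does not change the values stored. */
size_t widen_ascii(wchar_t *dst, size_t dst_len, const char *src)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dst_len == 0)
        return (size_t)-1;

    for (i = 0; src[i] != '\0'; ++i) {
        unsigned char b = (unsigned char)src[i];
        if (b > 0x7F || i + 1 >= dst_len)
            return (size_t)-1;
        dst[i] = (wchar_t)b;    /* same numeric value, wider storage */
    }
    dst[i] = L'\0';             /* the terminator is zero in both forms */
    return i;
}
```

Calling widen_ascii(buf, 8, "Abc") on an eight-element wchar_t buffer stores 0x41, 0x62, 0x63, and a terminating zero. A string that contains a lead byte, such as "\x83\x81", is rejected rather than converted incorrectly.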

See Also   International Enabling