Unicode and Multibyte Character Set (MBCS) Support

Some international markets use languages, such as Japanese and Chinese, with large character sets. To support programming for these markets, the Microsoft Foundation Class Library (MFC) is enabled for two different approaches to handling large character sets:

  • Unicode

  • Multibyte Character Sets (MBCS)

MFC Support for Unicode Strings

The entire class library is conditionally enabled for Unicode characters and strings. In particular, class CString is Unicode-enabled.

Note

The Unicode versions of the MFC libraries are not copied to your hard disk unless you select them during a Custom installation. They are not copied during other types of installation. If you attempt to build or run an MFC Unicode application without the MFC Unicode files, you may get errors.

To copy the files to your hard disk, rerun Setup and click Add/Remove Features. Click Language Tools, click Visual C++, click Visual C++ Class & Template Libraries, and then select both ATL MFC Shared Libraries Unicode and ATL MFC Static Libraries Unicode. This copies the following files to your hard disk:

UAFXCW.LIB

UAFXCW.PDB

UAFXCWD.LIB

UAFXCWD.PDB

MFCxxU.LIB

MFCxxU.PDB

MFCxxU.DLL

MFCxxUD.LIB

MFCxxUD.PDB

MFCxxUD.DLL

MFCSxxU.LIB

MFCSxxU.PDB

MFCSxxUD.LIB

MFCSxxUD.PDB

MFCMxxU.LIB

MFCMxxU.PDB

MFCMxxU.DLL

MFCMxxUD.LIB

MFCMxxUD.PDB

MFCMxxUD.DLL

where xx represents the version number of the file; for example, '80' represents version 8.0.

CString is based on the TCHAR data type. If the symbol _UNICODE is defined for a build of your program, TCHAR is defined as type wchar_t, a 16-bit character type; otherwise, it is defined as char, the normal 8-bit character type. Under Unicode, then, CString objects are composed of 16-bit characters. Without Unicode, they are composed of characters of type char.
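The conditional mapping described above can be sketched in portable C++. This is only an illustration: SketchTChar, SketchCharUnitSize, SketchIsWideBuild, and SketchSelfCheck are hypothetical names introduced here, and the real TCHAR definition comes from the headers shipped with the compiler, not from code like this.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for TCHAR. The real definition comes from the
// compiler's headers; this alias only mirrors the idea.
#ifdef _UNICODE
typedef wchar_t SketchTChar;   // wide build
#else
typedef char SketchTChar;      // narrow (ANSI or MBCS) build
#endif

// Size in bytes of one character unit in the current build.
inline std::size_t SketchCharUnitSize() {
    return sizeof(SketchTChar);
}

// True when the build selected the wide character type.
inline bool SketchIsWideBuild() {
    return std::is_same<SketchTChar, wchar_t>::value;
}

// Sanity check: a narrow build always uses one-byte character units.
inline void SketchSelfCheck() {
    assert(SketchIsWideBuild() || SketchCharUnitSize() == 1);
}
```

Compiling the same source with and without _UNICODE defined selects a different underlying type without any change to the code that uses the alias.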

To complete Unicode programming of your application, you must also:

  • Use the _T macro to conditionally code literal strings to be portable to Unicode.

  • When you pass strings, pay attention to whether function arguments require a length in characters or a length in bytes. The difference is important if you're using Unicode strings.

  • Use portable versions of the C run-time string-handling functions.

  • Use the following data types for characters and character pointers:

    • TCHAR   Where you would use char.

    • LPTSTR   Where you would use char*.

    • LPCTSTR   Where you would use const char*. CString provides the operator LPCTSTR to convert a CString to an LPCTSTR.

CString also supplies Unicode-aware constructors, assignment operators, and comparison operators.
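The points in the list above about portable literals and about character counts versus byte counts can be illustrated with a small sketch. It assumes nothing from MFC: SKETCH_T, SketchTChar, SketchCharCount, and SketchByteCount are hypothetical stand-ins written for this example, and SKETCH_T is a simplified imitation of the _T macro, not its actual definition.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for TCHAR and _T, selected by the same symbol.
#ifdef _UNICODE
typedef wchar_t SketchTChar;
#define SKETCH_T(x) L##x     // wide literal
#else
typedef char SketchTChar;
#define SKETCH_T(x) x        // narrow literal
#endif

// Length in character units, not counting the terminator.
inline std::size_t SketchCharCount(const SketchTChar* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// Length in bytes of the same text, not counting the terminator.
inline std::size_t SketchByteCount(const SketchTChar* s) {
    return SketchCharCount(s) * sizeof(SketchTChar);
}
```

In a narrow build the two functions return the same value. In a wide build the byte count is larger, so passing one where a function expects the other either truncates the text or overruns the buffer.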

For related information on Unicode programming, see Unicode and MBCS and Unicode Topics. The Run-Time Library Reference defines portable versions of all its string-handling functions. See the category Internationalization.

MFC Support for MBCS Strings

The class library is also enabled for multibyte character sets, specifically for double-byte character sets (DBCS).

Under this scheme, a character can be either one or two bytes wide. If it is two bytes wide, its first byte is a special "lead byte," chosen from a particular range depending on which code page is in use, and its second byte is the "trail byte." Taken together, the lead and trail bytes specify a unique character encoding.

If the symbol _MBCS is defined for a build of your program, type TCHAR, on which CString is based, maps to char. It's up to you to determine which bytes in a CString are lead bytes and which are trail bytes. The C run-time library supplies functions to help you determine this.

Under DBCS, a given string can contain all single-byte ANSI characters, all double-byte characters, or a combination of the two. These possibilities require special care in parsing strings, including CString objects.
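As a rough illustration of why parsing needs care, the following sketch counts characters, as opposed to bytes, in a DBCS string. It hard-codes the lead-byte ranges of code page 932 (Shift-JIS) as an assumption, and SketchIsLeadByte932 and SketchDbcsCharCount are hypothetical helpers. Real code would more likely rely on the run-time library's own lead-byte tests, such as _ismbblead, which follow the active code page.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: lead-byte ranges for code page 932 (Shift-JIS) only.
inline bool SketchIsLeadByte932(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters (not bytes) in a null-terminated DBCS string.
inline std::size_t SketchDbcsCharCount(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    while (*s != '\0') {
        const unsigned char b = static_cast<unsigned char>(*s);
        // A lead byte also consumes the trail byte that follows it,
        // unless the string ends right after the lead byte.
        if (SketchIsLeadByte932(b) && s[1] != '\0') {
            s += 2;
        } else {
            s += 1;
        }
        ++count;
    }
    return count;
}
```

A string that mixes single-byte and double-byte characters therefore has fewer characters than bytes, which is why indexing such a string byte by byte can land in the middle of a character.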

Note

Unicode string serialization in MFC can read both Unicode and MBCS strings regardless of which version of the application you are running. Because of this, your data files are portable between Unicode and MBCS versions of your program.

CString member functions use special "generic text" versions of the C run-time functions they call, or they use Unicode-aware functions. Thus, for example, if a CString function would normally call strcmp, it calls the corresponding generic-text function _tcscmp instead. Depending on how the symbols _MBCS and _UNICODE are defined, _tcscmp maps as follows:

Symbol defined            _tcscmp maps to

_MBCS defined             _mbscmp

_UNICODE defined          wcscmp

Neither symbol defined    strcmp

Note

The symbols _MBCS and _UNICODE are mutually exclusive.
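The three-way mapping of _tcscmp can be approximated with conditional compilation. The sketch below is not how the run-time headers implement it: SketchCompare, SketchTChar, and SKETCH_T are hypothetical names, and the _MBCS branch is guarded so that it is only compiled where the Microsoft run-time function _mbscmp is available.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

#if defined(_MBCS) && defined(_UNICODE)
#error "_MBCS and _UNICODE are mutually exclusive"
#endif

#if defined(_UNICODE)
typedef wchar_t SketchTChar;
#define SKETCH_T(x) L##x
// Wide build: compare wide strings.
inline int SketchCompare(const SketchTChar* a, const SketchTChar* b) {
    return std::wcscmp(a, b);
}
#elif defined(_MBCS) && defined(_MSC_VER)
#include <mbstring.h>
typedef char SketchTChar;
#define SKETCH_T(x) x
// MBCS build: compare character by character, honoring lead bytes.
inline int SketchCompare(const SketchTChar* a, const SketchTChar* b) {
    return _mbscmp(reinterpret_cast<const unsigned char*>(a),
                   reinterpret_cast<const unsigned char*>(b));
}
#else
typedef char SketchTChar;
#define SKETCH_T(x) x
// Neither symbol defined: plain single-byte comparison.
inline int SketchCompare(const SketchTChar* a, const SketchTChar* b) {
    return std::strcmp(a, b);
}
#endif
```

Calling code uses SketchCompare(SKETCH_T("abc"), SKETCH_T("abd")) in every configuration, and the build symbols decide which comparison routine actually runs.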

Generic-text function mappings for all of the run-time string-handling routines are detailed in the Run-Time Library Reference. See the category Internationalization.

Similarly, CString member functions are implemented using "generic" data type mappings. To enable both MBCS and Unicode, MFC uses TCHAR for char, LPTSTR for char*, and LPCTSTR for const char*. These result in the correct mappings for either MBCS or Unicode.

See Also

Concepts

Strings (ATL/MFC)