Single-Byte and Multibyte Character Sets

The ASCII character set defines characters in the range 0x00 - 0x7F. A number of other character sets, primarily European, define the characters in the range 0x00 - 0x7F identically to ASCII and also define an extended character set in the range 0x80 - 0xFF. An 8-bit, single-byte character set (SBCS) is therefore sufficient to represent ASCII as well as the character sets of many European languages. However, some non-European character sets, such as Japanese Kanji, include many more characters than a single-byte coding scheme can represent, and therefore require multibyte character set (MBCS) encoding.
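As a minimal sketch of the range split described above, the following C function reports whether a byte string stays entirely within the ASCII range 0x00 - 0x7F. The name `is_pure_ascii` is a hypothetical helper introduced for illustration; it is not part of the Microsoft run-time library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a CRT routine): returns 1 if every byte before
   the terminating null is in the ASCII range 0x00 - 0x7F, otherwise 0. */
int is_pure_ascii(const char *s)
{
    assert(s != NULL);
    /* Read bytes as unsigned so values 0x80 - 0xFF are not seen as negative. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if (*p > 0x7F) {
            return 0; /* byte belongs to an extended or multibyte range */
        }
    }
    return 1;
}
```

A string for which this returns 0 contains at least one byte in 0x80 - 0xFF; how that byte should be interpreted depends on the character set in use.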

Note

Many SBCS routines in the Microsoft run-time library handle multibyte bytes, characters, and strings as appropriate. Many multibyte character sets define the ASCII character set as a subset. In many multibyte character sets, each character in the range 0x00 - 0x7F is identical to the character that has the same value in the ASCII character set. For example, in both ASCII and MBCS character strings, the one-byte null character ('\0') has the value 0x00 and marks the end of the string.

A multibyte character set may consist of both one-byte and two-byte characters. A multibyte character string may therefore contain a mixture of single-byte and double-byte characters. A two-byte multibyte character has a lead byte and a trail byte. In a particular multibyte character set, the lead bytes fall within a certain range, as do the trail bytes. When these ranges overlap, it may be necessary to evaluate the surrounding context to determine whether a given byte is functioning as a lead byte or a trail byte.
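The need to evaluate context can be sketched in portable C. The example below assumes code page 932 (Shift-JIS), where lead bytes fall in 0x81 - 0x9F and 0xE0 - 0xFC and trail bytes fall in 0x40 - 0x7E and 0x80 - 0xFC, so the two ranges overlap. The ranges are hard-coded for illustration, and the names `is_lead_932`, `role_at`, and `count_chars_932` are hypothetical helpers, not run-time library routines. Because a byte value alone is ambiguous, the sketch classifies a byte by scanning from the start of the string.

```c
#include <assert.h>
#include <stddef.h>

enum byte_role { ROLE_SINGLE, ROLE_LEAD, ROLE_TRAIL };

/* Assumption: lead-byte ranges for code page 932 (Shift-JIS), hard-coded. */
static int is_lead_932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Classify s[index] by walking the string from its first byte, because the
   same value can act as a lead byte or a trail byte depending on position. */
enum byte_role role_at(const unsigned char *s, size_t index)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0') {
        if (is_lead_932(s[i]) && s[i + 1] != '\0') {
            if (i == index)     return ROLE_LEAD;
            if (i + 1 == index) return ROLE_TRAIL;
            i += 2;             /* skip the whole two-byte character */
        } else {
            if (i == index)     return ROLE_SINGLE;
            i += 1;
        }
    }
    return ROLE_SINGLE;         /* index is at or past the terminator */
}

/* Count characters (not bytes) in a null-terminated code page 932 string. */
size_t count_chars_932(const unsigned char *s)
{
    assert(s != NULL);
    size_t i = 0, count = 0;
    while (s[i] != '\0') {
        i += (is_lead_932(s[i]) && s[i + 1] != '\0') ? 2 : 1;
        ++count;
    }
    return count;
}
```

For the byte sequence 'A', 0x83, 0x41, 'A', the second 0x41 byte is the trail byte of a two-byte character even though it has the same value as ASCII 'A'; only its position after the lead byte 0x83 distinguishes it. Scanning from the start is the simplest approach; it costs time proportional to the index but never misclassifies a byte in a well-formed string.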

See Also

Internationalization
Universal C runtime routines by category