Character Sets

A "character set" is a mapping of characters to their identifying code values. The character set most commonly used in computers today is Unicode, a global standard for character encoding. Internally, Windows applications use the UTF-16 encoding of Unicode. In UTF-16, most characters are identified by a single two-byte code. The less commonly used supplementary characters are each represented by a surrogate pair, which is a pair of two-byte codes (four bytes in total). For more information, see Surrogates and Supplementary Characters.
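The difference between a single two-byte code and a surrogate pair can be illustrated with a short sketch. It uses Python's built-in "utf-16-le" codec rather than any Windows API, and the helper name utf16_code_units is hypothetical, chosen only for this example.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    # Encode without a byte-order mark, then read the bytes back two at a time.
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# "A" (U+0041) fits in one two-byte code.
basic = utf16_code_units("A")

# U+1F600 is a supplementary character, so it needs a surrogate pair.
supplementary = utf16_code_units("\U0001F600")

print([hex(u) for u in basic])          # ['0x41']
print([hex(u) for u in supplementary])  # ['0xd83d', '0xde00']
```

The first value of the pair falls in the high-surrogate range (0xD800 to 0xDBFF) and the second in the low-surrogate range (0xDC00 to 0xDFFF), which is how code that walks a UTF-16 string can tell a pair apart from two independent characters.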

Some Windows applications must work with the older character sets that are native to Windows Me/98/95. Windows code pages allow your application to work with these character sets, which fall into two groups:

  • Single-byte character sets (SBCS). In an SBCS, each character is identified by a value one byte wide.
  • Multibyte character sets, in particular the double-byte character sets (DBCS). In a DBCS, each character is identified by a value that is either one or two bytes wide. Multibyte character sets provide a means to represent the large number of characters in many Asian languages.
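As a rough illustration of the two groups, the sketch below encodes text with two Windows code pages through Python's standard codecs: "cp1252" (Western European, an SBCS) and "cp932" (Japanese, a DBCS). The choice of these two code pages and the helper name encoded_widths are assumptions made for the example. A Windows application would typically go through the system's code page conversion functions instead.

```python
def encoded_widths(text, code_page):
    """Return the number of bytes each character of text occupies in code_page."""
    return [len(ch.encode(code_page)) for ch in text]


# In an SBCS, every character is exactly one byte wide.
print(encoded_widths("caf\u00e9", "cp1252"))  # [1, 1, 1, 1]

# In a DBCS, ASCII characters stay one byte wide,
# while the kanji U+65E5 and U+672C take two bytes each.
print(encoded_widths("A\u65e5\u672c", "cp932"))  # [1, 2, 2]
```

Because widths vary within a DBCS string, code that steps through such text byte by byte has to check whether each byte starts a two-byte character before advancing.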

For more information, see the following topics:

About Unicode and Character Sets