Working With Unicode Surrogates

The Unicode Standard defines a "surrogate pair" (often shortened to "surrogate") as a pair of 16-bit Unicode code values that together represent a single character. Surrogates provide additional character support for languages that need more than the 65,536 characters available in the 16-bit Unicode code space. For example, the Chinese-speaking community alone uses over 55,000 characters.

Planes two and three defined in ISO/IEC 10646 are reserved for ideographic characters. Characters in these planes use high surrogates in the range U+D840 to U+D8BF. In general, the first (high) surrogate is a 16-bit code value in the range U+D800 to U+DBFF, and the second (low) surrogate is a 16-bit code value in the range U+DC00 to U+DFFF. Because each range contains 1,024 values, surrogate pairs can represent 1,024 x 1,024 = 1,048,576 additional characters, so Unicode can support over one million characters.
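The mapping between a character value and its surrogate pair can be sketched in portable C as follows. This is a minimal illustration of the standard UTF-16 arithmetic, not a Windows CE API; the function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the code value is a first (high) surrogate, U+D800 to U+DBFF. */
int is_high_surrogate(uint16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/* Nonzero if the code value is a second (low) surrogate, U+DC00 to U+DFFF. */
int is_low_surrogate(uint16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

/*
 * Splits a character value in the range U+10000 to U+10FFFF into a
 * surrogate pair. Returns 1 on success, or 0 if the value is outside
 * that range (in which case *high and *low are left untouched).
 */
int encode_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF) {
        return 0;
    }
    cp -= 0x10000;                              /* 20-bit offset */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* upper 10 bits */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* lower 10 bits */
    return 1;
}

/* Combines a valid high/low surrogate pair back into one character value. */
uint32_t decode_surrogate_pair(uint16_t high, uint16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000
         + ((uint32_t)(high - 0xD800) << 10)
         + (uint32_t)(low - 0xDC00);
}
```

With this arithmetic, the first character of plane two, U+20000, encodes as the pair U+D840 U+DC00, and the last character of plane three, U+3FFFF, encodes as U+D8BF U+DFFF, which matches the high surrogate range quoted above for those planes.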

Windows CE provides Unicode surrogate support at the OS level and in Microsoft® Internet Explorer. The support is limited to surrogate handling and display; editing is not supported.

The following list shows the ways Windows CE supports surrogates:

  • The cmap format 12 subtable of the Microsoft OpenType font format directly supports 4-byte character codes. Platform developers can add third-party fonts that contain characters that map to surrogates.

  • Windows GDI APIs support cmap format 12, so surrogates can be displayed correctly.

  • The Windows CE edit control supports display of the characters generated by surrogate pairs.

  • The HTML engine supports HTML pages that display characters generated by surrogate pairs.

    Note: The implementation used in Windows CE differs from the desktop implementation and does not rely on Uniscribe.
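Because a surrogate pair occupies two 16-bit code values but is displayed as one character, application code that measures or steps through text may need to treat the pair as a unit. The following is a minimal sketch in portable C under that assumption. It is not part of any Windows CE API, and the function name count_characters is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Counts displayed characters in a buffer of 16-bit Unicode code values.
 * A high surrogate (U+D800 to U+DBFF) immediately followed by a low
 * surrogate (U+DC00 to U+DFFF) counts as one character. An unpaired
 * surrogate is counted as one character on its own.
 */
size_t count_characters(const uint16_t *text, size_t length)
{
    size_t count = 0;
    size_t i = 0;

    while (i < length) {
        int is_high = text[i] >= 0xD800 && text[i] <= 0xDBFF;
        int low_follows = (i + 1 < length)
                       && text[i + 1] >= 0xDC00
                       && text[i + 1] <= 0xDFFF;

        /* Skip both code values of a well-formed pair, otherwise one. */
        i += (is_high && low_follows) ? 2 : 1;
        count++;
    }
    return count;
}
```

For a buffer that holds U+0041, U+D840, U+DC00, U+0042, the function reports three characters even though the buffer contains four code values.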

For additional information about font standards needed for surrogates, see The OpenType Specification. The specification is available in HTML format for viewing online at this Microsoft Web site.

See Also

Working with Surrogate Pairs | Customizing Fonts | Understanding the Unicode Standard

 Last updated on Friday, April 09, 2004

© 1992-2003 Microsoft Corporation. All rights reserved.