1.1 Glossary

Article
07/11/2023

This document uses the following terms:

code page: An ordered set of characters of a specific script in which a numerical index (code-point value) is associated with each character. Code pages are a means of providing support for character sets and keyboard layouts used in different countries. Devices such as the display and keyboard can be configured to use a specific code page and to switch from one code page (such as the United States) to another (such as Portugal) at the user's request.

double-byte character set (DBCS): A character set that can use more than one byte to represent a single character. A DBCS includes some characters that consist of 1 byte and some characters that consist of 2 bytes. Languages such as Chinese, Japanese, and Korean use DBCS.

IDNA2003: The IDNA2003 specification is defined by a cluster of IETF RFCs: IDNA [RFC3490], Nameprep [RFC3491], Punycode [RFC3492], and Stringprep [RFC3454].

IDNA2008: The IDNA2008 specification is defined by a cluster of IETF RFCs: Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework [RFC5890], Internationalized Domain Names in Applications (IDNA) Protocol [RFC5891], The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) [RFC5892], and Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA) [RFC5893]. There is also an informative document: Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale [RFC5894].

IDNA2008+UTS46: The IDNA2008+UTS46 citation refers to operations that comply with both the and the Unicode IDNA Compatibility Processing [TR46] specifications.

single-byte character set (SBCS): A character encoding in which each character is represented by one byte. Single-byte character sets are limited to 256 characters.

sort key: Numerical representations of a sort element based on locale-specific sorting rules. A sort key consists of several weighted components that represent a character's script, diacritics, case, and additional treatment based on locale.

Unicode: A character encoding standard developed by the Unicode Consortium that represents almost all of the written languages of the world. The Unicode standard [UNICODE5.0.0/2007] provides three forms (UTF-8, UTF-16, and UTF-32) and seven schemes (UTF-8, UTF-16, UTF-16 BE, UTF-16 LE, UTF-32, UTF-32 LE, and UTF-32 BE).

UTF-16: A standard for encoding Unicode characters, defined in the Unicode standard, in which the most commonly used characters are defined as double-byte characters. Unless specified otherwise, this term refers to the UTF-16 encoding form specified in [UNICODE5.0.0/2007] section 3.9.

MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as defined in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.

1.1 Glossary

Additional resources