NormalizationForm Enum

Definition

Defines the type of normalization to perform.

C++/CLI: public enum class NormalizationForm
C#: [System.Runtime.InteropServices.ComVisible(true)]
public enum NormalizationForm
F#: type NormalizationForm = 
Visual Basic: Public Enum NormalizationForm
Inheritance
Object → ValueType → Enum → NormalizationForm
Attributes
ComVisibleAttribute

Fields

FormC 1

Indicates that a Unicode string is normalized using full canonical decomposition, followed by the replacement of sequences with their primary composites, if possible.

FormD 2

Indicates that a Unicode string is normalized using full canonical decomposition.
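The difference between FormC and FormD is easiest to see by listing code points. The sketch below is a minimal illustration that assumes Python's standard `unicodedata` module as a stand-in for .NET's `String.Normalize(NormalizationForm)`; Python's `"NFC"` and `"NFD"` form names correspond to FormC and FormD. The helper `code_points` is hypothetical, written only for this example.

```python
import unicodedata


def code_points(s: str) -> list:
    """Hypothetical helper: return the code points of s as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in s]


s = "\u0103\u0301"  # a-breve followed by a combining acute accent

nfc = unicodedata.normalize("NFC", s)  # analogous to NormalizationForm.FormC
nfd = unicodedata.normalize("NFD", s)  # analogous to NormalizationForm.FormD

print(code_points(nfc))  # ['U+1EAF']
print(code_points(nfd))  # ['U+0061', 'U+0306', 'U+0301']
```

FormC decomposes first and then recomposes, so it yields the single precomposed character; FormD stops after the full decomposition.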

FormKC 5

Indicates that a Unicode string is normalized using full compatibility decomposition, followed by the replacement of sequences with their primary composites, if possible.

FormKD 6

Indicates that a Unicode string is normalized using full compatibility decomposition.
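FormKC and FormKD differ from FormC and FormD in that they also apply compatibility mappings such as ligatures. A minimal sketch, assuming Python's `unicodedata` as a stand-in for the .NET API (`"NFKC"` and `"NFKD"` correspond to FormKC and FormKD):

```python
import unicodedata

s = "\uFB01\u1EAF"  # LATIN SMALL LIGATURE FI, then precomposed a-breve-acute

nfkc = unicodedata.normalize("NFKC", s)  # analogous to NormalizationForm.FormKC
nfkd = unicodedata.normalize("NFKD", s)  # analogous to NormalizationForm.FormKD
nfc = unicodedata.normalize("NFC", s)    # analogous to NormalizationForm.FormC

print(nfkc == "fi\u1EAF")              # True: ligature split, accents recomposed
print(nfkd == "fi\u0061\u0306\u0301")  # True: ligature split, accents decomposed
print(nfc == s)                        # True: FormC leaves the ligature alone
```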

Remarks

Some Unicode sequences are considered equivalent because they represent the same character. For example, the following are considered equivalent because any of them can be used to represent "ắ":

  • "\u1EAF"

  • "\u0103\u0301"

  • "\u0061\u0306\u0301"

However, ordinal (that is, binary) comparisons consider these sequences different because they contain different Unicode code values. Before performing ordinal comparisons, applications must normalize these strings to decompose them into their basic components.
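The effect on ordinal comparison can be checked directly. This is a minimal sketch, assuming Python's `unicodedata` as a stand-in for .NET; in .NET the same idea would normalize each string with `String.Normalize(NormalizationForm.FormD)` before an ordinal comparison.

```python
import unicodedata

variants = ["\u1EAF", "\u0103\u0301", "\u0061\u0306\u0301"]

# Ordinal comparison: the raw code points differ, so == reports a mismatch.
raw_equal = all(v == variants[0] for v in variants)

# Normalize every string to the same form (here NFD) before comparing.
normalized = [unicodedata.normalize("NFD", v) for v in variants]
normalized_equal = all(n == normalized[0] for n in normalized)

print(raw_equal)         # False
print(normalized_equal)  # True
```

What matters is that both sides use the same form; comparing an NFC string with an NFD string would still fail.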

Each composite Unicode character is mapped to a more basic sequence of one or more characters. The process of decomposition replaces composite characters in a string with their more basic mappings. A full decomposition recursively performs this replacement until none of the characters in the string can be decomposed further.
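The recursive replacement described above can be sketched using the per-character mappings exposed by Python's `unicodedata.decomposition`. The function `full_canonical_decomposition` is hypothetical and deliberately simplified: it ignores Hangul syllables (which Unicode decomposes algorithmically) and the canonical reordering of combining marks, so it is not a substitute for a real normalizer.

```python
import unicodedata


def full_canonical_decomposition(s: str) -> str:
    """Hypothetical sketch: recursively replace each character with its
    canonical mapping until nothing decomposes further. Skips compatibility
    mappings (tagged '<...>'), Hangul syllables, and canonical reordering."""
    out = []
    for ch in s:
        mapping = unicodedata.decomposition(ch)  # e.g. "0103 0301"
        if mapping and not mapping.startswith("<"):
            # Canonical mapping: hex code points separated by spaces.
            parts = "".join(chr(int(cp, 16)) for cp in mapping.split())
            out.append(full_canonical_decomposition(parts))
        else:
            out.append(ch)
    return "".join(out)


result = full_canonical_decomposition("\u1EAF")
print([f"U+{ord(c):04X}" for c in result])  # ['U+0061', 'U+0306', 'U+0301']
print(result == unicodedata.normalize("NFD", "\u1EAF"))  # True
```

Here U+1EAF maps to U+0103 U+0301, and U+0103 in turn maps to U+0061 U+0306, so two levels of recursion are needed.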

Unicode defines two types of decomposition: compatibility decomposition and canonical decomposition. In compatibility decomposition, formatting information might be lost. In canonical decomposition, which is a subset of compatibility decomposition, formatting information is preserved.
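A superscript digit shows the loss of formatting under compatibility decomposition. A minimal sketch, again assuming Python's `unicodedata` as a stand-in for the .NET forms:

```python
import unicodedata

s = "x\u00B2"  # "x²": x followed by SUPERSCRIPT TWO

# Canonical decomposition (FormD) keeps the superscript: formatting preserved.
print(unicodedata.normalize("NFD", s) == "x\u00B2")  # True
# Compatibility decomposition (FormKD) maps it to a plain "2": formatting lost.
print(unicodedata.normalize("NFKD", s) == "x2")      # True

# Canonical mappings are applied by compatibility decomposition as well.
t = "\u1EAF"
print(unicodedata.normalize("NFKD", t) == unicodedata.normalize("NFD", t))  # True
```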

Two sets of characters are considered to have canonical equivalence if their full canonical decompositions are identical. Likewise, two sets of characters are considered to have compatibility equivalence if their full compatibility decompositions are identical.
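Both definitions translate directly into comparisons of decomposed forms. The predicates `canonically_equivalent` and `compatibility_equivalent` below are hypothetical helpers, sketched with Python's `unicodedata` as a stand-in for .NET's FormD and FormKD:

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    # Hypothetical helper: equal full canonical decompositions (FormD / NFD).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    # Hypothetical helper: equal full compatibility decompositions (FormKD / NFKD).
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


print(canonically_equivalent("\u1EAF", "\u0061\u0306\u0301"))  # True
print(canonically_equivalent("\uFB01", "fi"))                  # False
print(compatibility_equivalent("\uFB01", "fi"))                # True
```

The ligature "ﬁ" and the two letters "fi" are compatibility equivalent but not canonically equivalent, because only the compatibility decomposition splits the ligature.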

For more information about normalization, decompositions, and equivalence, see Unicode Standard Annex #15: Unicode Normalization Forms at unicode.org.

Applies to

See also