UnicodeCategory Enum


Defines the Unicode category of a character.

public enum class UnicodeCategory
public enum UnicodeCategory
type UnicodeCategory = 
Public Enum UnicodeCategory


ClosePunctuation 21

Closing character of one of the paired punctuation marks, such as parentheses, square brackets, and braces. Signified by the Unicode designation "Pe" (punctuation, close). The value is 21.

ConnectorPunctuation 18

Connector punctuation character that connects two characters. Signified by the Unicode designation "Pc" (punctuation, connector). The value is 18.

Control 14

Control code character, with a Unicode value of U+007F or in the range U+0000 through U+001F or U+0080 through U+009F. Signified by the Unicode designation "Cc" (other, control). The value is 14.

CurrencySymbol 26

Currency symbol character. Signified by the Unicode designation "Sc" (symbol, currency). The value is 26.

DashPunctuation 19

Dash or hyphen character. Signified by the Unicode designation "Pd" (punctuation, dash). The value is 19.

DecimalDigitNumber 8

Decimal digit character, that is, a character in the range 0 through 9. Signified by the Unicode designation "Nd" (number, decimal digit). The value is 8.

EnclosingMark 7

Enclosing mark character, which is a nonspacing combining character that surrounds all previous characters up to and including a base character. Signified by the Unicode designation "Me" (mark, enclosing). The value is 7.

FinalQuotePunctuation 23

Closing or final quotation mark character. Signified by the Unicode designation "Pf" (punctuation, final quote). The value is 23.

Format 15

Format character that affects the layout of text or the operation of text processes, but is not normally rendered. Signified by the Unicode designation "Cf" (other, format). The value is 15.

InitialQuotePunctuation 22

Opening or initial quotation mark character. Signified by the Unicode designation "Pi" (punctuation, initial quote). The value is 22.

LetterNumber 9

Number represented by a letter instead of a decimal digit, for example, the Roman numeral for five, which is "V". Signified by the Unicode designation "Nl" (number, letter). The value is 9.

LineSeparator 12

Character that is used to separate lines of text. Signified by the Unicode designation "Zl" (separator, line). The value is 12.

LowercaseLetter 1

Lowercase letter. Signified by the Unicode designation "Ll" (letter, lowercase). The value is 1.

MathSymbol 25

Mathematical symbol character, such as "+" or "=". Signified by the Unicode designation "Sm" (symbol, math). The value is 25.

ModifierLetter 3

Modifier letter character, which is a free-standing spacing character that indicates modifications of a preceding letter. Signified by the Unicode designation "Lm" (letter, modifier). The value is 3.

ModifierSymbol 27

Modifier symbol character, which indicates modifications of surrounding characters. For example, the fraction slash indicates that the number to the left is the numerator and the number to the right is the denominator. Signified by the Unicode designation "Sk" (symbol, modifier). The value is 27.

NonSpacingMark 5

Nonspacing character that indicates modifications of a base character. Signified by the Unicode designation "Mn" (mark, nonspacing). The value is 5.

OpenPunctuation 20

Opening character of one of the paired punctuation marks, such as parentheses, square brackets, and braces. Signified by the Unicode designation "Ps" (punctuation, open). The value is 20.

OtherLetter 4

Letter that is not an uppercase letter, a lowercase letter, a titlecase letter, or a modifier letter. Signified by the Unicode designation "Lo" (letter, other). The value is 4.

OtherNotAssigned 29

Character that is not assigned to any Unicode category. Signified by the Unicode designation "Cn" (other, not assigned). The value is 29.

OtherNumber 10

Number that is neither a decimal digit nor a letter number, for example, the fraction 1/2. Signified by the Unicode designation "No" (number, other). The value is 10.

OtherPunctuation 24

Punctuation character that is not a connector, a dash, open punctuation, close punctuation, an initial quote, or a final quote. Signified by the Unicode designation "Po" (punctuation, other). The value is 24.

OtherSymbol 28

Symbol character that is not a mathematical symbol, a currency symbol, or a modifier symbol. Signified by the Unicode designation "So" (symbol, other). The value is 28.

ParagraphSeparator 13

Character used to separate paragraphs. Signified by the Unicode designation "Zp" (separator, paragraph). The value is 13.

PrivateUse 17

Private-use character, with a Unicode value in the range U+E000 through U+F8FF. Signified by the Unicode designation "Co" (other, private use). The value is 17.

SpaceSeparator 11

Space character, which has no glyph but is not a control or format character. Signified by the Unicode designation "Zs" (separator, space). The value is 11.

SpacingCombiningMark 6

Spacing character that indicates modifications of a base character and affects the width of the glyph for that base character. Signified by the Unicode designation "Mc" (mark, spacing combining). The value is 6.

Surrogate 16

High surrogate or low surrogate character. Surrogate code values are in the range U+D800 through U+DFFF. Signified by the Unicode designation "Cs" (other, surrogate). The value is 16.

TitlecaseLetter 2

Titlecase letter. Signified by the Unicode designation "Lt" (letter, titlecase). The value is 2.

UppercaseLetter 0

Uppercase letter. Signified by the Unicode designation "Lu" (letter, uppercase). The value is 0.


The following example displays the characters in the UppercaseLetter category together with their code points. To display the characters in any other category, replace UppercaseLetter with the category of interest in the assignment to the category variable. Note that the output for some categories can be extensive.

using System;
using System.Globalization;

public class Example
{
   public static void Main()
   {
      int ctr = 0;
      UnicodeCategory category = UnicodeCategory.UppercaseLetter;
      // Scan the 16-bit code units U+0000 through U+FFFE.
      for (ushort codePoint = 0; codePoint < ushort.MaxValue; codePoint++) {
         Char ch = Convert.ToChar(codePoint);

         if (CharUnicodeInfo.GetUnicodeCategory(ch) == category) {
            // Start a new output line before every fifth match.
            if (ctr % 5 == 0)
               Console.WriteLine();
            Console.Write("{0} (U+{1:X4})     ", ch, codePoint);
            ctr++;
         }
      }
      Console.WriteLine("\n{0} characters are in the {1:G} category",
                        ctr, category);
   }
}

Imports System
Imports System.Globalization

Module Example
   Public Sub Main()
      Dim ctr As Integer = 0
      Dim category As UnicodeCategory = UnicodeCategory.UppercaseLetter
      ' Scan the 16-bit code units U+0000 through U+FFFE.
      For codePoint As UShort = 0 To UShort.MaxValue - 1
         Dim ch As Char = Convert.ToChar(codePoint)

         If CharUnicodeInfo.GetUnicodeCategory(ch) = category Then
            ' Start a new output line before every fifth match.
            If ctr Mod 5 = 0 Then Console.WriteLine()
            Console.Write("{0} (U+{1:X4})     ", ch, codePoint)
            ctr += 1
         End If
      Next
      ' End the last row before printing the total.
      Console.WriteLine()
      Console.WriteLine("{0} characters are in the {1:G} category",
                        ctr, category)
   End Sub
End Module


A member of the UnicodeCategory enumeration is returned by the Char.GetUnicodeCategory and CharUnicodeInfo.GetUnicodeCategory methods. The UnicodeCategory enumeration is also used to support Char methods, such as IsUpper(Char). Such methods determine whether a specified character is a member of a particular Unicode general category. A Unicode general category defines the broad classification of a character, that is, its designation as a type of letter, decimal digit, separator, mathematical symbol, punctuation, and so on.
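
As a small illustration of the two lookup methods and of a category-based predicate, the sketch below prints the category of a few ASCII characters. The class name CategoryLookup and the helper Describe are hypothetical names chosen for this sketch; only Char.GetUnicodeCategory, CharUnicodeInfo.GetUnicodeCategory, and Char.IsUpper come from the text above.

```csharp
using System;
using System.Globalization;

public static class CategoryLookup
{
    // Hypothetical helper: formats one character with the category reported
    // by each lookup method and the result of Char.IsUpper.
    public static string Describe(char ch)
    {
        UnicodeCategory viaChar = Char.GetUnicodeCategory(ch);
        UnicodeCategory viaInfo = CharUnicodeInfo.GetUnicodeCategory(ch);
        return string.Format("U+{0:X4}: Char={1}, CharUnicodeInfo={2}, IsUpper={3}",
                             (int)ch, viaChar, viaInfo, Char.IsUpper(ch));
    }

    public static void Main()
    {
        foreach (char ch in new[] { 'A', 'a', '5', ' ', '+' })
            Console.WriteLine(Describe(ch));
    }
}
```

For 'A', both methods should report UppercaseLetter and IsUpper should be true; the other characters fall into LowercaseLetter, DecimalDigitNumber, SpaceSeparator, and MathSymbol.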

This enumeration is based on The Unicode Standard, version 5.0. For more information, see the "UCD File Format" and "General Category Values" subtopics in the Unicode Character Database.

The Unicode Standard defines the following:

A surrogate pair is a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate. A high surrogate is a Unicode code point in the range U+D800 through U+DBFF, and a low surrogate is a Unicode code point in the range U+DC00 through U+DFFF.
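
The surrogate ranges above can be observed with a character outside the Basic Multilingual Plane. The following is a minimal sketch, assuming U+1D11E (musical symbol G clef) as the sample character; the class name SurrogateDemo is hypothetical. It relies on Char.IsHighSurrogate, Char.IsLowSurrogate, Char.ConvertToUtf32, and the (String, Int32) overload of CharUnicodeInfo.GetUnicodeCategory.

```csharp
using System;
using System.Globalization;

public static class SurrogateDemo
{
    public static void Main()
    {
        // U+1D11E lies outside the BMP, so UTF-16 stores it as two code units.
        string text = "\U0001D11E";

        Console.WriteLine("Code units: {0}", text.Length);
        Console.WriteLine("First unit:  U+{0:X4}, high surrogate: {1}",
                          (int)text[0], Char.IsHighSurrogate(text[0]));
        Console.WriteLine("Second unit: U+{0:X4}, low surrogate: {1}",
                          (int)text[1], Char.IsLowSurrogate(text[1]));

        // Looked up on its own, each half is classified as Surrogate.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(text[0]));

        // The (string, index) overload looks at the pair as a whole.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(text, 0));

        // Recombine the pair into a single code point.
        Console.WriteLine("Code point: U+{0:X}", Char.ConvertToUtf32(text, 0));
    }
}
```

Passing a single Char can only ever yield Surrogate for such characters, which is why the string-and-index overload is the one to use when the input may contain supplementary characters.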

A combining character sequence is a combination of a base character and one or more combining characters. A surrogate pair represents a base character or a combining character. A combining character is either spacing or nonspacing. A spacing combining character takes up a spacing position by itself when rendered, while a nonspacing combining character does not. Diacritics are an example of nonspacing combining characters.
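
To make the idea of a combining character sequence concrete, the sketch below builds "e" followed by U+0301 (combining acute accent) and inspects it. The sample string and the class name CombiningDemo are assumptions of this sketch; StringInfo.LengthInTextElements is used here to count text elements rather than UTF-16 code units.

```csharp
using System;
using System.Globalization;

public static class CombiningDemo
{
    public static void Main()
    {
        // Base character "e" followed by a nonspacing combining mark.
        string text = "e\u0301";

        foreach (char ch in text)
            Console.WriteLine("U+{0:X4}: {1}", (int)ch,
                              CharUnicodeInfo.GetUnicodeCategory(ch));

        // Two code units, but the base plus its mark form one text element.
        Console.WriteLine("Code units: {0}, text elements: {1}",
                          text.Length, new StringInfo(text).LengthInTextElements);
    }
}
```

The base character is reported as LowercaseLetter and the accent as NonSpacingMark, while the sequence as a whole counts as a single text element.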

A modifier letter is a free-standing spacing character that, like a combining character, indicates modifications of a preceding letter.

An enclosing mark is a nonspacing combining character that surrounds all previous characters up to and including a base character.

A format character is a character that is not normally rendered but that affects the layout of text or the operation of text processes.

The Unicode Standard defines several variations of some punctuation marks. For example, a hyphen can be one of several code values, such as U+002D (hyphen-minus), U+00AD (soft hyphen), U+2010 (hyphen), or U+2011 (nonbreaking hyphen). The same is true for dashes, space characters, and quotation marks.
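
Because look-alike punctuation marks have distinct code values, they can also land in different categories. The sketch below simply prints what the runtime reports for the four hyphen variants listed above; the class name HyphenDemo is hypothetical. No expected output is given, since the category reported for U+00AD in particular may depend on the method used and on the Unicode data shipped with the runtime.

```csharp
using System;
using System.Globalization;

public static class HyphenDemo
{
    public static void Main()
    {
        // Hyphen-minus, soft hyphen, hyphen, nonbreaking hyphen.
        char[] hyphens = { '\u002D', '\u00AD', '\u2010', '\u2011' };

        foreach (char ch in hyphens)
        {
            // Print both lookups side by side so any difference is visible.
            Console.WriteLine("U+{0:X4}: Char={1}, CharUnicodeInfo={2}",
                              (int)ch,
                              Char.GetUnicodeCategory(ch),
                              CharUnicodeInfo.GetUnicodeCategory(ch));
        }
    }
}
```

Code that needs to treat all hyphens alike is therefore safer matching an explicit set of code points than testing for a single category.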

The Unicode Standard also assigns codes to representations of decimal digits that are specific to a given script or language, for example, U+0030 (digit zero) and U+0660 (Arabic-Indic digit zero).
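
As an illustration of script-specific digits, the following sketch compares the two zeros mentioned above. The class name DigitDemo is hypothetical; Char.IsDigit and CharUnicodeInfo.GetDecimalDigitValue are used to show that both characters are decimal digits with the same numeric value even though their code points differ.

```csharp
using System;
using System.Globalization;

public static class DigitDemo
{
    public static void Main()
    {
        // Digit zero and Arabic-Indic digit zero.
        char[] zeros = { '\u0030', '\u0660' };

        foreach (char ch in zeros)
        {
            Console.WriteLine("U+{0:X4}: {1}, IsDigit={2}, value={3}",
                              (int)ch,
                              CharUnicodeInfo.GetUnicodeCategory(ch),
                              Char.IsDigit(ch),
                              CharUnicodeInfo.GetDecimalDigitValue(ch));
        }
    }
}
```

Both characters are reported as DecimalDigitNumber, so a check based on the category (or on Char.IsDigit) accepts digits from any script, which may or may not be what an input validator wants.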