CharUnicodeInfo.GetUnicodeCategory 方法

定义

获取 Unicode 字符的 Unicode 类别。Gets the Unicode category of a Unicode character.

重载

GetUnicodeCategory(Char)

获取指定字符的 Unicode 类别。Gets the Unicode category of the specified character.

GetUnicodeCategory(Int32)

获取指定字符的 Unicode 类别。Gets the Unicode category of the specified character.

GetUnicodeCategory(String, Int32)

获取位于指定字符串的指定索引处的字符的 Unicode 类别。Gets the Unicode category of the character at the specified index of the specified string.

GetUnicodeCategory(Char)

获取指定字符的 Unicode 类别。Gets the Unicode category of the specified character.

public:
 static System::Globalization::UnicodeCategory GetUnicodeCategory(char ch);
public static System.Globalization.UnicodeCategory GetUnicodeCategory (char ch);
static member GetUnicodeCategory : char -> System.Globalization.UnicodeCategory
Public Shared Function GetUnicodeCategory (ch As Char) As UnicodeCategory

参数

ch
Char

要获取其 Unicode 类别的 Unicode 字符。The Unicode character for which to get the Unicode category.

返回

指示指定字符类别的 UnicodeCategory 值。A UnicodeCategory value indicating the category of the specified character.

示例

下面的代码示例显示了每种方法为不同类型的字符返回的值。The following code example shows the values returned by each method for different types of characters.

using namespace System;
using namespace System::Globalization;
void PrintProperties( Char c );
int main()
{
   Console::WriteLine( "                                        c  Num   Dig   Dec   UnicodeCategory" );
   Console::Write( "U+0061 LATIN SMALL LETTER A            " );
   PrintProperties( L'a' );
   Console::Write( "U+0393 GREEK CAPITAL LETTER GAMMA      " );
   PrintProperties( L'\u0393' );
   Console::Write( "U+0039 DIGIT NINE                      " );
   PrintProperties( L'9' );
   Console::Write( "U+00B2 SUPERSCRIPT TWO                 " );
   PrintProperties( L'\u00B2' );
   Console::Write( "U+00BC VULGAR FRACTION ONE QUARTER     " );
   PrintProperties( L'\u00BC' );
   Console::Write( "U+0BEF TAMIL DIGIT NINE                " );
   PrintProperties( L'\u0BEF' );
   Console::Write( "U+0BF0 TAMIL NUMBER TEN                " );
   PrintProperties( L'\u0BF0' );
   Console::Write( "U+0F33 TIBETAN DIGIT HALF ZERO         " );
   PrintProperties( L'\u0F33' );
   Console::Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE   " );
   PrintProperties( L'\u2788' );
}

void PrintProperties( Char c )
{
   Console::Write( " {0,-3}", c );
   Console::Write( " {0,-5}", CharUnicodeInfo::GetNumericValue( c ) );
   Console::Write( " {0,-5}", CharUnicodeInfo::GetDigitValue( c ) );
   Console::Write( " {0,-5}", CharUnicodeInfo::GetDecimalDigitValue( c ) );
   Console::WriteLine( "{0}", CharUnicodeInfo::GetUnicodeCategory( c ) );
}

/*
This code produces the following output.  Some characters might not display at the console.

                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       \u0393   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  \u00B2   2     2     2    OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      \u00BC   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 \u0BEF   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 \u0BF0   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          \u0F33   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    \u2788   9     9     -1   OtherNumber

*/
using System;
using System.Globalization;

public class SamplesCharUnicodeInfo  {

   public static void Main()  {

      Console.WriteLine( "                                        c  Num   Dig   Dec   UnicodeCategory" );

      Console.Write( "U+0061 LATIN SMALL LETTER A            " );
      PrintProperties( 'a' );

      Console.Write( "U+0393 GREEK CAPITAL LETTER GAMMA      " );
      PrintProperties( '\u0393' );

      Console.Write( "U+0039 DIGIT NINE                      " );
      PrintProperties( '9' );

      Console.Write( "U+00B2 SUPERSCRIPT TWO                 " );
      PrintProperties( '\u00B2' );

      Console.Write( "U+00BC VULGAR FRACTION ONE QUARTER     " );
      PrintProperties( '\u00BC' );

      Console.Write( "U+0BEF TAMIL DIGIT NINE                " );
      PrintProperties( '\u0BEF' );

      Console.Write( "U+0BF0 TAMIL NUMBER TEN                " );
      PrintProperties( '\u0BF0' );

      Console.Write( "U+0F33 TIBETAN DIGIT HALF ZERO         " );
      PrintProperties( '\u0F33' );

      Console.Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE   " );
      PrintProperties( '\u2788' );

   }

   public static void PrintProperties( char c )  {
      Console.Write( " {0,-3}", c );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( c ) );
      Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( c ) );
   }

}


/*
This code produces the following output.  Some characters might not display at the console.

                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       \u0393   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  \u00B2   2     2     2    OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      \u00BC   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 \u0BEF   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 \u0BF0   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          \u0F33   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    \u2788   9     9     -1   OtherNumber

*/

Imports System.Globalization

Public Class SamplesCharUnicodeInfo   

   Public Shared Sub Main()

      Console.WriteLine("                                        c  Num   Dig   Dec   UnicodeCategory")

      Console.Write("U+0061 LATIN SMALL LETTER A            ")
      PrintProperties("a"c)

      Console.Write("U+0393 GREEK CAPITAL LETTER GAMMA      ")
      PrintProperties(ChrW(&H0393))

      Console.Write("U+0039 DIGIT NINE                      ")
      PrintProperties("9"c)

      Console.Write("U+00B2 SUPERSCRIPT TWO                 ")
      PrintProperties(ChrW(&H00B2))

      Console.Write("U+00BC VULGAR FRACTION ONE QUARTER     ")
      PrintProperties(ChrW(&H00BC))

      Console.Write("U+0BEF TAMIL DIGIT NINE                ")
      PrintProperties(ChrW(&H0BEF))

      Console.Write("U+0BF0 TAMIL NUMBER TEN                ")
      PrintProperties(ChrW(&H0BF0))

      Console.Write("U+0F33 TIBETAN DIGIT HALF ZERO         ")
      PrintProperties(ChrW(&H0F33))

      Console.Write("U+2788 CIRCLED SANS-SERIF DIGIT NINE   ")
      PrintProperties(ChrW(&H2788))

   End Sub

   Public Shared Sub PrintProperties(c As Char)
      Console.Write(" {0,-3}", c)
      Console.Write(" {0,-5}", CharUnicodeInfo.GetNumericValue(c))
      Console.Write(" {0,-5}", CharUnicodeInfo.GetDigitValue(c))
      Console.Write(" {0,-5}", CharUnicodeInfo.GetDecimalDigitValue(c))
      Console.WriteLine("{0}", CharUnicodeInfo.GetUnicodeCategory(c))
   End Sub

End Class


'This code produces the following output.  Some characters might not display at the console.
'
'                                        c  Num   Dig   Dec   UnicodeCategory
'U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
'U+0393 GREEK CAPITAL LETTER GAMMA       \u0393   -1    -1    -1   UppercaseLetter
'U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
'U+00B2 SUPERSCRIPT TWO                  \u00B2   2     2     2    OtherNumber
'U+00BC VULGAR FRACTION ONE QUARTER      \u00BC   0.25  -1    -1   OtherNumber
'U+0BEF TAMIL DIGIT NINE                 \u0BEF   9     9     9    DecimalDigitNumber
'U+0BF0 TAMIL NUMBER TEN                 \u0BF0   10    -1    -1   OtherNumber
'U+0F33 TIBETAN DIGIT HALF ZERO          \u0F33   -0.5  -1    -1   OtherNumber
'U+2788 CIRCLED SANS-SERIF DIGIT NINE    \u2788   9     9     -1   OtherNumber

注解

Unicode 字符分为多个类别。The Unicode characters are divided into categories. 字符的类别是其属性之一。A character's category is one of its properties. 例如,字符可以是大写字母、小写字母、十进制数字、字母、数字、连接符标点、数学符号或货币符号。For example, a character might be an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a connector punctuation, a math symbol, or a currency symbol. UnicodeCategory 类返回 Unicode 字符的类别。The UnicodeCategory class returns the category of a Unicode character. 有关 Unicode 字符的详细信息,请参阅Unicode 标准For more information on Unicode characters, see the Unicode Standard.

GetUnicodeCategory 方法假定 ch 对应于单个语言字符并返回其类别。The GetUnicodeCategory method assumes that ch corresponds to a single linguistic character and returns its category. 这意味着,对于代理项对,它将返回 UnicodeCategory.Surrogate 而不是代理项所属的类别。This means that, for surrogate pairs, it returns UnicodeCategory.Surrogate instead of the category to which the surrogate belongs. 例如,Ugaritic 字母表将码位 U + 10380 占据 U + 1039F。For example, the Ugaritic alphabet occupies code points U+10380 to U+1039F. 下面的示例使用 ConvertFromUtf32 方法来实例化一个字符串,该字符串表示 UGARITIC 字母 ALPA (U + 10380),这是 Ugaritic 字母表中的第一个字母。The following example uses the ConvertFromUtf32 method to instantiate a string that represents UGARITIC LETTER ALPA (U+10380), which is the first letter of the Ugaritic alphabet. 如示例的输出所示,如果传递给此字符的高代理项或低代理项,则 IsNumber(Char) 方法返回 falseAs the output from the example shows, the IsNumber(Char) method returns false if it is passed either the high surrogate or the low surrogate of this character.

int utf32 = 0x10380;       // UGARITIC LETTER ALPA
string surrogate = Char.ConvertFromUtf32(utf32);
foreach (var ch in surrogate)
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(ch), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(ch));
// The example displays the following output:
//       U+D800: Surrogate
//       U+DF80: Surrogate
Dim utf32 As Integer = &h10380       ' UGARITIC LETTER ALPA
Dim surrogate As String = Char.ConvertFromUtf32(utf32)
For Each ch In surrogate
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(ch), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(ch))
Next
' The example displays the following output:
'       U+D800: Surrogate
'       U+DF80: Surrogate

请注意,在将特定字符作为参数传递时,CharUnicodeInfo.GetUnicodeCategory 不会始终返回与 Char.GetUnicodeCategory 方法相同的 UnicodeCategory 值。Note that CharUnicodeInfo.GetUnicodeCategory does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when passed a particular character as a parameter. CharUnicodeInfo.GetUnicodeCategory 方法旨在反映 Unicode 标准的当前版本。The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. 相反,尽管 Char.GetUnicodeCategory 方法通常反映 Unicode 标准的当前版本,但它可能会根据标准的以前版本返回字符的类别,或者可能返回不同于当前标准的类别,以保留向后兼容性。In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.

另请参阅

GetUnicodeCategory(Int32)

获取指定字符的 Unicode 类别。Gets the Unicode category of the specified character.

public:
 static System::Globalization::UnicodeCategory GetUnicodeCategory(int codePoint);
public static System.Globalization.UnicodeCategory GetUnicodeCategory (int codePoint);
static member GetUnicodeCategory : int -> System.Globalization.UnicodeCategory
Public Shared Function GetUnicodeCategory (codePoint As Integer) As UnicodeCategory

参数

codePoint
Int32

一个数字,表示 Unicode 字符的 32 位码位值。A number representing the 32-bit code point value of the Unicode character.

返回

指示指定字符类别的 UnicodeCategory 值。A UnicodeCategory value indicating the category of the specified character.

GetUnicodeCategory(String, Int32)

获取位于指定字符串的指定索引处的字符的 Unicode 类别。Gets the Unicode category of the character at the specified index of the specified string.

public:
 static System::Globalization::UnicodeCategory GetUnicodeCategory(System::String ^ s, int index);
public static System.Globalization.UnicodeCategory GetUnicodeCategory (string s, int index);
static member GetUnicodeCategory : string * int -> System.Globalization.UnicodeCategory
Public Shared Function GetUnicodeCategory (s As String, index As Integer) As UnicodeCategory

参数

s
String

包含要获取其 Unicode 类别的 Unicode 字符的 StringThe String containing the Unicode character for which to get the Unicode category.

index
Int32

要获取其 Unicode 类别的 Unicode 字符的索引。The index of the Unicode character for which to get the Unicode category.

返回

指示位于指定字符串的指定索引处的字符类别的 UnicodeCategory 值。A UnicodeCategory value indicating the category of the character at the specified index of the specified string.

异常

snulls is null.

index 超出 s 中的有效索引范围。index is outside the range of valid indexes in s.

示例

下面的代码示例显示了每种方法为不同类型的字符返回的值。The following code example shows the values returned by each method for different types of characters.

using namespace System;
using namespace System::Globalization;
int main()
{
   
   // The String to get information for.
   String^ s = "a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788";
   Console::WriteLine( "String: {0}", s );
   
   // Print the values for each of the characters in the string.
   Console::WriteLine( "index c  Num   Dig   Dec   UnicodeCategory" );
   for ( int i = 0; i < s->Length; i++ )
   {
      Console::Write( "{0,-5} {1,-3}", i, s[ i ] );
      Console::Write( " {0,-5}", CharUnicodeInfo::GetNumericValue( s, i ) );
      Console::Write( " {0,-5}", CharUnicodeInfo::GetDigitValue( s, i ) );
      Console::Write( " {0,-5}", CharUnicodeInfo::GetDecimalDigitValue( s, i ) );
      Console::WriteLine( "{0}", CharUnicodeInfo::GetUnicodeCategory( s, i ) );

   }
}

/*
This code produces the following output.  Some characters might not display at the console.

String: a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788
index c  Num   Dig   Dec   UnicodeCategory
0     a   -1    -1    -1   LowercaseLetter
1     9   9     9     9    DecimalDigitNumber
2     \u0393   -1    -1    -1   UppercaseLetter
3     \u00B2   2     2     2    OtherNumber
4     \u00BC   0.25  -1    -1   OtherNumber
5     \u0BEF   9     9     9    DecimalDigitNumber
6     \u0BF0   10    -1    -1   OtherNumber
7     \u2788   9     9     -1   OtherNumber

*/
using System;
using System.Globalization;

public class SamplesCharUnicodeInfo  {

   public static void Main()  {

      // The String to get information for.
      String s = "a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788";
      Console.WriteLine( "String: {0}", s );

      // Print the values for each of the characters in the string.
      Console.WriteLine( "index c  Num   Dig   Dec   UnicodeCategory" );
      for ( int i = 0; i < s.Length; i++ )  {
         Console.Write( "{0,-5} {1,-3}", i, s[i] );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( s, i ) );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( s, i ) );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( s, i ) );
         Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( s, i ) );
      }

   }

}


/*
This code produces the following output.  Some characters might not display at the console.

String: a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788
index c  Num   Dig   Dec   UnicodeCategory
0     a   -1    -1    -1   LowercaseLetter
1     9   9     9     9    DecimalDigitNumber
2     \u0393   -1    -1    -1   UppercaseLetter
3     \u00B2   2     2     2    OtherNumber
4     \u00BC   0.25  -1    -1   OtherNumber
5     \u0BEF   9     9     9    DecimalDigitNumber
6     \u0BF0   10    -1    -1   OtherNumber
7     \u2788   9     9     -1   OtherNumber

*/

Imports System.Globalization

Public Class SamplesCharUnicodeInfo   

   Public Shared Sub Main()

      ' The String to get information for.
      Dim s As [String] = "a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788"
      Console.WriteLine("String: {0}", s)

      ' Print the values for each of the characters in the string.
      Console.WriteLine("index c  Num   Dig   Dec   UnicodeCategory")
      Dim i As Integer
      For i = 0 To s.Length - 1
         Console.Write("{0,-5} {1,-3}", i, s(i))
         Console.Write(" {0,-5}", CharUnicodeInfo.GetNumericValue(s, i))
         Console.Write(" {0,-5}", CharUnicodeInfo.GetDigitValue(s, i))
         Console.Write(" {0,-5}", CharUnicodeInfo.GetDecimalDigitValue(s, i))
         Console.WriteLine("{0}", CharUnicodeInfo.GetUnicodeCategory(s, i))
      Next i

   End Sub

End Class


'This code produces the following output.  Some characters might not display at the console.
'
'String: a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788
'index c  Num   Dig   Dec   UnicodeCategory
'0     a   -1    -1    -1   LowercaseLetter
'1     9   9     9     9    DecimalDigitNumber
'2     \u0393   -1    -1    -1   UppercaseLetter
'3     \u00B2   2     2     2    OtherNumber
'4     \u00BC   0.25  -1    -1   OtherNumber
'5     \u0BEF   9     9     9    DecimalDigitNumber
'6     \u0BF0   10    -1    -1   OtherNumber
'7     \u2788   9     9     -1   OtherNumber

注解

Unicode 字符分为多个类别。The Unicode characters are divided into categories. 字符的类别是其属性之一。A character's category is one of its properties. 例如,字符可以是大写字母、小写字母、十进制数字、字母、数字、连接符标点、数学符号或货币符号。For example, a character might be an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a connector punctuation, a math symbol, or a currency symbol. UnicodeCategory 类返回 Unicode 字符的类别。The UnicodeCategory class returns the category of a Unicode character. 有关 Unicode 字符的详细信息,请参阅Unicode 标准For more information on Unicode characters, see the Unicode Standard.

如果位置 indexChar 对象是有效代理项对的第一个字符,则 GetUnicodeCategory(String, Int32) 方法返回代理项对的 Unicode 类别,而不是返回 UnicodeCategory.SurrogateIf the Char object at position index is the first character of a valid surrogate pair, the GetUnicodeCategory(String, Int32) method returns the Unicode category of the surrogate pair instead of returning UnicodeCategory.Surrogate. 例如,Ugaritic 字母表将码位 U + 10380 占据 U + 1039F。For example, the Ugaritic alphabet occupies code points U+10380 to U+1039F. 下面的示例使用 ConvertFromUtf32 方法来实例化一个字符串,该字符串表示 UGARITIC 字母 ALPA (U + 10380),这是 Ugaritic 字母表中的第一个字母。The following example uses the ConvertFromUtf32 method to instantiate a string that represents UGARITIC LETTER ALPA (U+10380), which is the first letter of the Ugaritic alphabet. 如示例的输出所示,如果向传递的是此字符的高代理项,则 GetUnicodeCategory(String, Int32) 方法返回 UnicodeCategory.OtherLetter,这表示它会考虑代理项对。As the output from the example shows, the GetUnicodeCategory(String, Int32) method returns UnicodeCategory.OtherLetter if it is passed the high surrogate of this character, which indicates that it considers the surrogate pair. 但是,如果将其传递到低代理项,则它仅考虑隔离中的低代理项,并返回 UnicodeCategory.SurrogateHowever, if it is passed the low surrogate, it considers only the low surrogate in isolation and returns UnicodeCategory.Surrogate.

int utf32 = 0x10380;       // UGARITIC LETTER ALPA
string surrogate = Char.ConvertFromUtf32(utf32);
for (int ctr = 0; ctr < surrogate.Length; ctr++)
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(surrogate[ctr]), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(surrogate, ctr));
// The example displays the following output:
//       U+D800: OtherLetter
//       U+DF80: Surrogate      
Dim utf32 As Integer = &h10380       ' UGARITIC LETTER ALPA
Dim surrogate As String = Char.ConvertFromUtf32(utf32)
For ctr As Integer = 0 To surrogate.Length - 1
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(surrogate(ctr)), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(surrogate, ctr))
Next
' The example displays the following output:
'       U+D800: OtherLetter
'       U+DF80: Surrogate      

请注意,在将特定字符作为参数传递时,CharUnicodeInfo.GetUnicodeCategory 方法不会始终返回与 Char.GetUnicodeCategory 方法相同的 UnicodeCategory 值。Note that CharUnicodeInfo.GetUnicodeCategory method does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when passed a particular character as a parameter. CharUnicodeInfo.GetUnicodeCategory 方法旨在反映 Unicode 标准的当前版本。The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. 相反,尽管 Char.GetUnicodeCategory 方法通常反映 Unicode 标准的当前版本,但它可能会根据标准的以前版本返回字符的类别,或者可能返回不同于当前标准的类别,以保留向后兼容性。In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.

另请参阅

适用于