System.Char struct

Article
01/08/2024

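This article provides supplementary remarks to the reference documentation for this API.

The Char structure represents a Unicode code point by using UTF-16 encoding; more precisely, each Char holds one UTF-16 code unit. The value of a Char object is its 16-bit numeric (ordinal) value.

If you aren't familiar with Unicode, scalar values, code points, surrogate pairs, UTF-16, and the Rune type, see Introduction to character encoding in .NET.

This article examines the relationship between a Char object and a character, and discusses some common tasks performed with Char instances. We recommend that you consider the Rune type, introduced in .NET Core 3.0, as an alternative to Char for some of these tasks.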

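Char objects, Unicode characters, and strings

A String object is a sequential collection of Char structures that represents a string of text. Most Unicode characters can be represented by a single Char object, but a character that is encoded as a base character, surrogate pair, and/or combining character sequence is represented by multiple Char objects. For this reason, a Char structure in a String object is not necessarily equivalent to a single Unicode character.

Multiple 16-bit code units are used to represent a single Unicode character in the following cases:

Glyphs, which may consist of a single character or of a base character followed by one or more combining characters. For example, the character ä is represented by a Char object whose code unit is U+0061 followed by a Char object whose code unit is U+0308. (The character ä can also be defined by a single Char object that has a code unit of U+00E4.) The following example illustrates that the character ä consists of two Char objects.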

using System;
using System.IO;

public class Example1
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter("chars1.txt");
        char[] chars = { '\u0061', '\u0308' };
        string strng = new String(chars);
        sw.WriteLine(strng);
        sw.Close();
    }
}
// The example produces the following output:
//       ä

open System
open System.IO

let sw = new StreamWriter("chars1.txt")
let chars = [| '\u0061'; '\u0308' |]
let string = String chars
sw.WriteLine string
sw.Close()

// The example produces the following output:
//       ä

Imports System.IO

Module Example2
    Public Sub Main()
        Dim sw As New StreamWriter("chars1.txt")
        Dim chars() As Char = {ChrW(&H61), ChrW(&H308)}
        Dim strng As New String(chars)
        sw.WriteLine(strng)
        sw.Close()
    End Sub
End Module
' The example produces the following output:
'       ä

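Characters outside the Unicode Basic Multilingual Plane (BMP). Unicode supports sixteen planes in addition to the BMP, which represents plane 0. A Unicode code point is represented in UTF-32 by a 21-bit value that includes the plane. For example, U+1D160 represents the MUSICAL SYMBOL EIGHTH NOTE character. Because a UTF-16 code unit has only 16 bits, characters outside the BMP are represented by surrogate pairs in UTF-16. The following example illustrates that the UTF-16 equivalent of the UTF-32 code point U+1D160 is U+D834 U+DD60. U+D834 is the high surrogate; high surrogates range from U+D800 through U+DBFF. U+DD60 is the low surrogate; low surrogates range from U+DC00 through U+DFFF.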

using System;
using System.IO;

public class Example3
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter(@".\chars2.txt");
        int utf32 = 0x1D160;
        string surrogate = Char.ConvertFromUtf32(utf32);
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                     utf32, surrogate, ShowCodePoints(surrogate));
        sw.Close();
    }

    private static string ShowCodePoints(string value)
    {
        string retval = null;
        foreach (var ch in value)
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch));

        return retval.Trim();
    }
}
// The example produces the following output:
//       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16

open System
open System.IO

let showCodePoints (value: char seq) =
    let str =
        value
        |> Seq.map (fun ch -> $"U+{Convert.ToUInt16 ch:X4}")
        |> String.concat " "  // Separate code units with a space to match the output below.
    str.Trim()

let sw = new StreamWriter(@".\chars2.txt")
let utf32 = 0x1D160
let surrogate = Char.ConvertFromUtf32 utf32
sw.WriteLine $"U+{utf32:X6} UTF-32 = {surrogate} ({showCodePoints surrogate}) UTF-16"
sw.Close()

// The example produces the following output:
//       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16

Imports System.IO

Module Example4
    Public Sub Main()
        Dim sw As New StreamWriter(".\chars2.txt")
        Dim utf32 As Integer = &H1D160
        Dim surrogate As String = Char.ConvertFromUtf32(utf32)
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                   utf32, surrogate, ShowCodePoints(surrogate))
        sw.Close()
    End Sub

    Private Function ShowCodePoints(value As String) As String
        Dim retval As String = Nothing
        For Each ch In value
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch))
        Next
        Return retval.Trim()
    End Function
End Module
' The example produces the following output:
'       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16
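
The surrogate ranges described above follow from simple arithmetic on the code point. The following C# sketch is an illustration only, not the framework's implementation; the class name SurrogateSketch and the method ToSurrogates are hypothetical names introduced here. It subtracts 0x10000 from a supplementary code point, splits the remaining 20 bits into two 10-bit halves, and cross-checks the result against Char.ConvertFromUtf32 and Char.ConvertToUtf32.

```csharp
using System;

public static class SurrogateSketch
{
    // Hypothetical helper: splits a supplementary code point
    // (U+10000 through U+10FFFF) into its UTF-16 high and low surrogates.
    public static (char High, char Low) ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));   // upper 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));  // lower 10 bits
        return (high, low);
    }

    public static void Main()
    {
        var (high, low) = ToSurrogates(0x1D160);
        Console.WriteLine("U+{0:X4} U+{1:X4}", (int)high, (int)low);

        // Cross-check against the framework's own conversions.
        string s = Char.ConvertFromUtf32(0x1D160);
        Console.WriteLine(s[0] == high && s[1] == low);
        Console.WriteLine(Char.ConvertToUtf32(high, low) == 0x1D160);
    }
}
// The example displays the following output:
//       U+D834 U+DD60
//       True
//       True
```

In application code, prefer Char.ConvertFromUtf32 and Char.ConvertToUtf32 (or the Rune type) over hand-written arithmetic; the helper exists only to make the bit layout visible.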

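Characters and character categories

Each Unicode character or valid surrogate pair belongs to a Unicode category. In .NET, Unicode categories are represented by members of the UnicodeCategory enumeration and include values such as UnicodeCategory.CurrencySymbol, UnicodeCategory.LowercaseLetter, and UnicodeCategory.SpaceSeparator.

To determine the Unicode category of a character, call the GetUnicodeCategory method. For example, the following example calls GetUnicodeCategory to display the Unicode category of each character in a string. The example works correctly only if there are no surrogate pairs in the String instance.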

using System;
using System.Globalization;

class Example
{
   public static void Main()
   {
      // Define a string with a variety of character categories.
      String s = "The red car drove down the long, narrow, secluded road.";
      // Determine the category of each character.
      foreach (var ch in s)
         Console.WriteLine("'{0}': {1}", ch, Char.GetUnicodeCategory(ch));
   }
}
// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

open System

// Define a string with a variety of character categories.
let s = "The red car drove down the long, narrow, secluded road."
// Determine the category of each character.
for ch in s do
    printfn $"'{ch}': {Char.GetUnicodeCategory ch}"

// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

Imports System.Globalization

Module Example1
    Public Sub Main()
        ' Define a string with a variety of character categories.
        Dim s As String = "The red car drove down the long, narrow, secluded road."
        ' Determine the category of each character.
        For Each ch In s
            Console.WriteLine("'{0}': {1}", ch, Char.GetUnicodeCategory(ch))
        Next
    End Sub
End Module
' The example displays the following output:
'       'T': UppercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'c': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'v': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       'n': LowercaseLetter
'       ' ': SpaceSeparator
'       't': LowercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'l': LowercaseLetter
'       'o': LowercaseLetter
'       'n': LowercaseLetter
'       'g': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       'n': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       's': LowercaseLetter
'       'e': LowercaseLetter
'       'c': LowercaseLetter
'       'l': LowercaseLetter
'       'u': LowercaseLetter
'       'd': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'a': LowercaseLetter
'       'd': LowercaseLetter
'       '.': OtherPunctuation

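Internally, for characters outside the range U+0000 through U+00FF (ASCII and the Latin-1 Supplement), the GetUnicodeCategory method depends on the Unicode categories reported by the CharUnicodeInfo class. Starting with .NET Framework 4.6.2, Unicode characters are classified based on The Unicode Standard, Version 8.0.0. In versions of .NET Framework from .NET Framework 4 to .NET Framework 4.6.1, they are classified based on The Unicode Standard, Version 6.3.0.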
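
The per-Char loop in the preceding examples reports each half of a surrogate pair separately. One possible way to handle strings that contain surrogate pairs is to use the overloads that take a string and an index, as in the following C# sketch. The class name CategorySketch and the method CategoryAt are hypothetical names used only for illustration; the framework calls (CharUnicodeInfo.GetUnicodeCategory(String, Int32), Char.ConvertToUtf32(String, Int32), and Char.IsSurrogatePair(String, Int32)) are existing APIs.

```csharp
using System;
using System.Globalization;

public static class CategorySketch
{
    // Hypothetical helper: category of the character that starts at index,
    // reading both code units when a surrogate pair begins there.
    public static UnicodeCategory CategoryAt(string s, int index) =>
        CharUnicodeInfo.GetUnicodeCategory(s, index);

    public static void Main()
    {
        // 'a' followed by U+1D160, which requires a surrogate pair.
        string s = "a" + Char.ConvertFromUtf32(0x1D160);

        for (int i = 0; i < s.Length; i++)
        {
            int codePoint = Char.ConvertToUtf32(s, i);
            Console.WriteLine("U+{0:X4}: {1}", codePoint, CategoryAt(s, i));

            // Skip the low surrogate that was consumed with the high surrogate.
            if (Char.IsSurrogatePair(s, i))
                i++;
        }

        // A lone half of the pair is only reported as a surrogate.
        Console.WriteLine(Char.GetUnicodeCategory(s[1]));
    }
}
```

The last line shows why the per-Char approach falls short: the single-Char overload can only classify the code unit as UnicodeCategory.Surrogate, not the character the pair encodes.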

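Characters and text elements

Because a single character can be represented by multiple Char objects, it is not always meaningful to work with individual Char objects. For instance, the following example converts the Unicode code points that represent the Aegean numbers one through ten (U+10107 through U+10110) to UTF-16 encoded code units. Because it erroneously equates Char objects with characters, it inaccurately reports that the resulting string has 20 characters.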

using System;

public class Example5
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        Console.WriteLine("The string contains {0} characters.", result.Length);
    }
}
// The example displays the following output:
//     The string contains 20 characters.

open System

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""

printfn $"The string contains {result.Length} characters."


// The example displays the following output:
//     The string contains 20 characters.

Module Example5
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Console.WriteLine("The string contains {0} characters.", result.Length)
    End Sub
End Module
' The example displays the following output:
'     The string contains 20 characters.

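You can do the following to avoid the assumption that a Char object represents a single character:

You can work with a String object in its entirety instead of working with its individual characters to represent and analyze linguistic content.

You can use String.EnumerateRunes, as shown in the following example: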

using System.Text;  // Rune

int CountLetters(string s)
{
    int letterCount = 0;

    foreach (Rune rune in s.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }

    return letterCount;
}

open System.Text  // Rune

let countLetters (s: string) =
    let mutable letterCount = 0

    for rune in s.EnumerateRunes() do
        if Rune.IsLetter rune then
            letterCount <- letterCount + 1

    letterCount

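You can use the StringInfo class to work with text elements instead of individual Char objects. The following example uses a StringInfo object to count the number of text elements in a string that consists of the Aegean numbers one through ten. Because it considers a surrogate pair a single character, it correctly reports that the string contains ten characters.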

using System;
using System.Globalization;

public class Example4
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        StringInfo si = new StringInfo(result);
        Console.WriteLine("The string contains {0} characters.",
                          si.LengthInTextElements);
    }
}
// The example displays the following output:
//       The string contains 10 characters.

open System
open System.Globalization

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""


let si = StringInfo result
printfn $"The string contains {si.LengthInTextElements} characters."

// The example displays the following output:
//       The string contains 10 characters.

Imports System.Globalization

Module Example6
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Dim si As New StringInfo(result)
        Console.WriteLine("The string contains {0} characters.", si.LengthInTextElements)
    End Sub
End Module
' The example displays the following output:
'       The string contains 10 characters.

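If a string contains a base character that has one or more combining characters, you can call the String.Normalize method to convert the substring to a single UTF-16 encoded code unit. The following example calls the String.Normalize method to convert the base character U+0061 (LATIN SMALL LETTER A) and the combining character U+0308 (COMBINING DIAERESIS) to U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS).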

using System;

public class Example2
{
    public static void Main()
    {
        string combining = "\u0061\u0308";
        ShowString(combining);

        string normalized = combining.Normalize();
        ShowString(normalized);
    }

    private static void ShowString(string s)
    {
        Console.Write("Length of string: {0} (", s.Length);
        for (int ctr = 0; ctr < s.Length; ctr++)
        {
            Console.Write("U+{0:X4}", Convert.ToUInt16(s[ctr]));
            if (ctr != s.Length - 1) Console.Write(" ");
        }
        Console.WriteLine(")\n");
    }
}
// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

open System

let showString (s: string) =
    printf $"Length of string: {s.Length} ("
    for i = 0 to s.Length - 1 do
        printf $"U+{Convert.ToUInt16 s[i]:X4}"
        if i <> s.Length - 1 then printf " "
    printfn ")\n"

let combining = "\u0061\u0308"
showString combining

let normalized = combining.Normalize()
showString normalized

// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

Module Example3
    Public Sub Main()
        Dim combining As String = ChrW(&H61) + ChrW(&H308)
        ShowString(combining)

        Dim normalized As String = combining.Normalize()
        ShowString(normalized)
    End Sub

    Private Sub ShowString(s As String)
        Console.Write("Length of string: {0} (", s.Length)
        For ctr As Integer = 0 To s.Length - 1
            Console.Write("U+{0:X4}", Convert.ToUInt16(s(ctr)))
            If ctr <> s.Length - 1 Then Console.Write(" ")
        Next
        Console.WriteLine(")")
        Console.WriteLine()
    End Sub
End Module
' The example displays the following output:
'       Length of string: 2 (U+0061 U+0308)
'       
'       Length of string: 1 (U+00E4)
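
To compare the approaches listed above side by side, the following C# sketch counts the same string three ways: as Char objects, as Rune values, and as text elements. The class name CountingSketch and its two helper methods are hypothetical names introduced for this illustration, and the Rune-based count assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CountingSketch
{
    // Hypothetical helper: counts Unicode scalar values (one per surrogate pair).
    public static int CountRunes(string s)
    {
        int count = 0;
        foreach (Rune rune in s.EnumerateRunes())
            count++;
        return count;
    }

    // Hypothetical helper: counts text elements (base character plus combining marks).
    public static int CountTextElements(string s) =>
        new StringInfo(s).LengthInTextElements;

    public static void Main()
    {
        // 'a' + COMBINING DIAERESIS, followed by AEGEAN NUMBER ONE (a surrogate pair).
        string s = "\u0061\u0308" + Char.ConvertFromUtf32(0x10107);

        Console.WriteLine("Char objects:  {0}", s.Length);
        Console.WriteLine("Runes:         {0}", CountRunes(s));
        Console.WriteLine("Text elements: {0}", CountTextElements(s));
    }
}
// The example displays the following output:
//       Char objects:  4
//       Runes:         3
//       Text elements: 2
```

Which count is appropriate depends on the task: Length for buffer sizes and indexing, runes for per-code-point classification, and text elements for what a user perceives as characters.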

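Common operations

The Char structure provides methods to compare Char objects, convert the value of the current Char object to an object of another type, and determine the Unicode category of a Char object:

To do this	Use these `System.Char` methods
Compare Char objects	CompareTo and Equals
Convert a code point to a string	ConvertFromUtf32. See also the Rune type.
Convert a Char object or a surrogate pair of Char objects to a code point	For a single character: Convert.ToInt32(Char). For a surrogate pair or a character in a string: Char.ConvertToUtf32. See also the Rune type.
Get the Unicode category of a character	GetUnicodeCategory. See also Rune.GetUnicodeCategory.
Determine whether a character is in a particular Unicode category, such as digit, letter, punctuation, or control character	IsControl, IsDigit, IsHighSurrogate, IsLetter, IsLetterOrDigit, IsLower, IsLowSurrogate, IsNumber, IsPunctuation, IsSeparator, IsSurrogate, IsSurrogatePair, IsSymbol, IsUpper, and IsWhiteSpace. See also the corresponding methods on the Rune type.
Convert a Char object that represents a number to a numeric value type	GetNumericValue. See also Rune.GetNumericValue.
Convert a character in a string into a Char object	Parse and TryParse
Convert a Char object to a String object	ToString
Change the case of a Char object	ToLower, ToLowerInvariant, ToUpper, and ToUpperInvariant. See also the corresponding methods on the Rune type.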
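
As a quick illustration of the table, the following C# sketch calls one or two methods from several rows. It is a sampling under simple assumptions (ASCII input plus one supplementary code point), not an exhaustive tour, and the class name CommonOperationsSketch is a hypothetical name used only here.

```csharp
using System;

public static class CommonOperationsSketch
{
    public static void Main()
    {
        char a = 'a';

        // Compare Char objects (ordinal comparison of code units).
        Console.WriteLine(a.CompareTo('b') < 0);              // True
        Console.WriteLine(a.Equals('a'));                     // True

        // Test category membership and get a numeric value.
        Console.WriteLine(Char.IsDigit('7'));                 // True
        Console.WriteLine(Char.GetNumericValue('7'));         // 7

        // Change case and parse a one-character string.
        Console.WriteLine(Char.ToUpperInvariant(a));          // A
        Console.WriteLine(Char.TryParse("x", out char parsed) && parsed == 'x');  // True

        // Convert between code points, Char objects, and strings.
        string aegeanOne = Char.ConvertFromUtf32(0x10107);
        Console.WriteLine(Char.IsSurrogatePair(aegeanOne, 0));              // True
        Console.WriteLine("U+{0:X}", Char.ConvertToUtf32(aegeanOne, 0));    // U+10107
        Console.WriteLine(Convert.ToInt32('A'));                            // 65
    }
}
```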

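Char values and interop

When a managed Char type, which is represented as a Unicode UTF-16 encoded code unit, is passed to unmanaged code, the interop marshaller converts the character set to ANSI by default. You can apply the DllImportAttribute attribute to platform invoke declarations and the StructLayoutAttribute attribute to a COM interop declaration to control which character set a marshalled Char type uses.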
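
The following C# sketch shows one way the character set might be specified. All type and member names in it (AnsiRecord, UnicodeRecord, InteropSketch, PutWideChar) and the library name "NativeLib" are hypothetical; the native function is declared only to illustrate the attribute and is never called. The sketch uses Marshal.SizeOf to observe the marshalled size of a Char field, which is expected to be one byte with CharSet.Ansi and two bytes with CharSet.Unicode.

```csharp
using System;
using System.Runtime.InteropServices;

// Char field marshalled as a single-byte ANSI character (the default character set).
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
public struct AnsiRecord
{
    public char Value;
}

// Char field marshalled as a two-byte UTF-16 code unit.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct UnicodeRecord
{
    public char Value;
}

public static class InteropSketch
{
    // Hypothetical native export, shown for the attribute only; it is never invoked,
    // so "NativeLib" does not need to exist for this sketch to run.
    [DllImport("NativeLib", CharSet = CharSet.Unicode)]
    private static extern int PutWideChar(char c);

    public static void Main()
    {
        // Marshal.SizeOf reports the unmanaged (marshalled) size, not sizeof(char).
        Console.WriteLine("Ansi:    {0} byte(s)", Marshal.SizeOf<AnsiRecord>());
        Console.WriteLine("Unicode: {0} byte(s)", Marshal.SizeOf<UnicodeRecord>());
    }
}
```

Specifying CharSet.Unicode avoids the lossy conversion to ANSI for characters that the active ANSI code page cannot represent.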