System.Char struct

01/11/2024

This article provides supplementary remarks to the reference documentation for this API.

The Char structure represents Unicode characters by using UTF-16 encoding: each Char instance holds exactly one UTF-16 code unit. The value of a Char object is its 16-bit numeric (ordinal) value.
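As a quick illustration of the ordinal value, the following sketch converts between Char and int. The names CharOrdinalDemo and Ordinal are illustrative only and are not part of the .NET API.

```csharp
using System;

public static class CharOrdinalDemo
{
    // A Char converts implicitly to int; the result is its UTF-16 code unit.
    public static int Ordinal(char c) => c;

    public static void Main()
    {
        Console.WriteLine($"'A' = U+{Ordinal('A'):X4}");     // 'A' = U+0041

        // An explicit cast turns a 16-bit value back into a Char.
        char aUmlaut = (char)0x00E4;                          // LATIN SMALL LETTER A WITH DIAERESIS
        Console.WriteLine(aUmlaut == '\u00E4');               // True

        // Every Char fits in 16 bits: U+0000 through U+FFFF.
        Console.WriteLine($"U+{(int)char.MinValue:X4} to U+{(int)char.MaxValue:X4}"); // U+0000 to U+FFFF
        Console.WriteLine(sizeof(char));                      // 2
    }
}
```

Because the range stops at U+FFFF, a single Char cannot hold a code point from outside the Basic Multilingual Plane.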

If you aren't familiar with Unicode, scalar values, code points, surrogate pairs, UTF-16, and the Rune type, see Introduction to character encoding in .NET.

This article examines the relationship between a Char object and a character, and discusses some common tasks performed with Char instances. We recommend that you consider the Rune type, introduced in .NET Core 3.0, as an alternative to Char for performing some of these tasks.

Char objects, Unicode characters, and strings

A String object is a sequential collection of Char structures that represents a string of text. Most Unicode characters can be represented by a single Char object, but a character that is encoded as a base character followed by combining characters, or as a surrogate pair, is represented by multiple Char objects. For this reason, a Char structure in a String object isn't necessarily equivalent to a single Unicode character.

Multiple 16-bit code units are used to represent a single Unicode character in the following cases:

Glyphs, which can consist of a single character or of a base character followed by one or more combining characters. For example, the character ä is represented by a Char object whose code unit is U+0061 followed by a Char object whose code unit is U+0308. (The character ä can also be defined by a single Char object that has a code unit of U+00E4.) The following example shows that the character ä consists of two Char objects.

using System;
using System.IO;

public class Example1
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter("chars1.txt");
        char[] chars = { '\u0061', '\u0308' };
        string strng = new String(chars);
        sw.WriteLine(strng);
        sw.Close();
    }
}
// The example produces the following output:
//       ä

open System
open System.IO

let sw = new StreamWriter("chars1.txt")
let chars = [| '\u0061'; '\u0308' |]
let string = String chars
sw.WriteLine string
sw.Close()

// The example produces the following output:
//       ä

Imports System.IO

Module Example2
    Public Sub Main()
        Dim sw As New StreamWriter("chars1.txt")
        Dim chars() As Char = {ChrW(&H61), ChrW(&H308)}
        Dim strng As New String(chars)
        sw.WriteLine(strng)
        sw.Close()
    End Sub
End Module
' The example produces the following output:
'       ä

Characters outside the Unicode Basic Multilingual Plane (BMP). Unicode supports sixteen planes in addition to the BMP, which represents plane 0. In UTF-32, a Unicode code point is represented by a single value of up to 21 bits that includes the plane. For example, U+1D160 represents the MUSICAL SYMBOL EIGHTH NOTE character. Because a UTF-16 code unit has only 16 bits, characters outside the BMP are represented by surrogate pairs in UTF-16. The following example shows that the UTF-16 equivalent of U+1D160, the MUSICAL SYMBOL EIGHTH NOTE character, is U+D834 U+DD60. U+D834 is the high surrogate; high surrogates range from U+D800 through U+DBFF. U+DD60 is the low surrogate; low surrogates range from U+DC00 through U+DFFF.

using System;
using System.IO;

public class Example3
{
    public static void Main()
    {
        StreamWriter sw = new StreamWriter(@".\chars2.txt");
        int utf32 = 0x1D160;
        string surrogate = Char.ConvertFromUtf32(utf32);
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                     utf32, surrogate, ShowCodePoints(surrogate));
        sw.Close();
    }

    private static string ShowCodePoints(string value)
    {
        string retval = null;
        foreach (var ch in value)
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch));

        return retval.Trim();
    }
}
// The example produces the following output:
//       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16

open System
open System.IO

let showCodePoints (value: char seq) =
    let str =
        value
        |> Seq.map (fun ch -> $"U+{Convert.ToUInt16 ch:X4}")
        |> String.concat " "
    str.Trim()

let sw = new StreamWriter(@".\chars2.txt")
let utf32 = 0x1D160
let surrogate = Char.ConvertFromUtf32 utf32
sw.WriteLine $"U+{utf32:X6} UTF-32 = {surrogate} ({showCodePoints surrogate}) UTF-16"
sw.Close()

// The example produces the following output:
//       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16

Imports System.IO

Module Example4
    Public Sub Main()
        Dim sw As New StreamWriter(".\chars2.txt")
        Dim utf32 As Integer = &H1D160
        Dim surrogate As String = Char.ConvertFromUtf32(utf32)
        sw.WriteLine("U+{0:X6} UTF-32 = {1} ({2}) UTF-16",
                   utf32, surrogate, ShowCodePoints(surrogate))
        sw.Close()
    End Sub

    Private Function ShowCodePoints(value As String) As String
        Dim retval As String = Nothing
        For Each ch In value
            retval += String.Format("U+{0:X4} ", Convert.ToUInt16(ch))
        Next
        Return retval.Trim()
    End Function
End Module
' The example produces the following output:
'       U+01D160 UTF-32 = 𝅘𝅥𝅮 (U+D834 U+DD60) UTF-16
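The surrogate pair for a supplementary code point can also be computed by hand, which makes the relationship between U+1D160 and U+D834 U+DD60 explicit: subtract 0x10000, put the upper 10 bits of the result into a high surrogate (0xD800 + bits) and the lower 10 bits into a low surrogate (0xDC00 + bits). The following is a minimal sketch; ToSurrogates and FromSurrogates are hypothetical helper names, and in practice Char.ConvertFromUtf32 and Char.ConvertToUtf32 perform this conversion with full validation.

```csharp
using System;

public static class SurrogateMath
{
    // Splits a supplementary code point (U+10000 through U+10FFFF) into a UTF-16 surrogate pair.
    public static (char High, char Low) ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));    // upper 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));   // lower 10 bits
        return (high, low);
    }

    // Recombines a surrogate pair into the code point it encodes.
    public static int FromSurrogates(char high, char low) =>
        0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);

    public static void Main()
    {
        var (high, low) = ToSurrogates(0x1D160);
        Console.WriteLine($"U+{(int)high:X4} U+{(int)low:X4}");   // U+D834 U+DD60

        // Cross-check against the built-in conversions.
        string s = Char.ConvertFromUtf32(0x1D160);
        Console.WriteLine(s[0] == high && s[1] == low);            // True
        Console.WriteLine(FromSurrogates(high, low) == Char.ConvertToUtf32(high, low)); // True
    }
}
```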

Characters and character categories

Each Unicode character or valid surrogate pair belongs to a Unicode category. In .NET, Unicode categories are represented by members of the UnicodeCategory enumeration and include values such as UnicodeCategory.CurrencySymbol, UnicodeCategory.LowercaseLetter, and UnicodeCategory.SpaceSeparator.

To determine the Unicode category of a character, call the GetUnicodeCategory method. For example, the following example calls GetUnicodeCategory to display the Unicode category of each character in a string. The example works correctly only if there are no surrogate pairs in the String instance.

using System;
using System.Globalization;

class Example
{
   public static void Main()
   {
      // Define a string with a variety of character categories.
      String s = "The red car drove down the long, narrow, secluded road.";
      // Determine the category of each character.
      foreach (var ch in s)
         Console.WriteLine("'{0}': {1}", ch, Char.GetUnicodeCategory(ch));
   }
}
// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

open System

// Define a string with a variety of character categories.
let s = "The red car drove down the long, narrow, secluded road."
// Determine the category of each character.
for ch in s do
    printfn $"'{ch}': {Char.GetUnicodeCategory ch}"

// The example displays the following output:
//      'T': UppercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'c': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'v': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'd': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      'n': LowercaseLetter
//      ' ': SpaceSeparator
//      't': LowercaseLetter
//      'h': LowercaseLetter
//      'e': LowercaseLetter
//      ' ': SpaceSeparator
//      'l': LowercaseLetter
//      'o': LowercaseLetter
//      'n': LowercaseLetter
//      'g': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      'n': LowercaseLetter
//      'a': LowercaseLetter
//      'r': LowercaseLetter
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'w': LowercaseLetter
//      ',': OtherPunctuation
//      ' ': SpaceSeparator
//      's': LowercaseLetter
//      'e': LowercaseLetter
//      'c': LowercaseLetter
//      'l': LowercaseLetter
//      'u': LowercaseLetter
//      'd': LowercaseLetter
//      'e': LowercaseLetter
//      'd': LowercaseLetter
//      ' ': SpaceSeparator
//      'r': LowercaseLetter
//      'o': LowercaseLetter
//      'a': LowercaseLetter
//      'd': LowercaseLetter
//      '.': OtherPunctuation

Imports System.Globalization

Module Example1
    Public Sub Main()
        ' Define a string with a variety of character categories.
        Dim s As String = "The red car drove down the long, narrow, secluded road."
        ' Determine the category of each character.
        For Each ch In s
            Console.WriteLine("'{0}': {1}", ch, Char.GetUnicodeCategory(ch))
        Next
    End Sub
End Module
' The example displays the following output:
'       'T': UppercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'c': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'v': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'd': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       'n': LowercaseLetter
'       ' ': SpaceSeparator
'       't': LowercaseLetter
'       'h': LowercaseLetter
'       'e': LowercaseLetter
'       ' ': SpaceSeparator
'       'l': LowercaseLetter
'       'o': LowercaseLetter
'       'n': LowercaseLetter
'       'g': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       'n': LowercaseLetter
'       'a': LowercaseLetter
'       'r': LowercaseLetter
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'w': LowercaseLetter
'       ',': OtherPunctuation
'       ' ': SpaceSeparator
'       's': LowercaseLetter
'       'e': LowercaseLetter
'       'c': LowercaseLetter
'       'l': LowercaseLetter
'       'u': LowercaseLetter
'       'd': LowercaseLetter
'       'e': LowercaseLetter
'       'd': LowercaseLetter
'       ' ': SpaceSeparator
'       'r': LowercaseLetter
'       'o': LowercaseLetter
'       'a': LowercaseLetter
'       'd': LowercaseLetter
'       '.': OtherPunctuation
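For strings that can contain surrogate pairs, one option on .NET Core 3.0 and later is to enumerate Rune values instead of Char values, so that each supplementary character is categorized as a whole. The following sketch assumes a target framework that includes System.Text.Rune; the names RuneCategories and Categories are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class RuneCategories
{
    // Returns the Unicode category of each scalar value (not of each UTF-16 code unit).
    public static List<UnicodeCategory> Categories(string s)
    {
        var result = new List<UnicodeCategory>();
        foreach (Rune rune in s.EnumerateRunes())
            result.Add(Rune.GetUnicodeCategory(rune));
        return result;
    }

    public static void Main()
    {
        // "A" followed by U+1D160 MUSICAL SYMBOL EIGHTH NOTE (a surrogate pair in UTF-16).
        string s = "A\U0001D160";

        // Per Char: each surrogate half is reported as Surrogate.
        foreach (char ch in s)
            Console.WriteLine($"U+{(int)ch:X4}: {Char.GetUnicodeCategory(ch)}");

        // Per Rune: the note is reported once, as OtherSymbol.
        foreach (UnicodeCategory category in Categories(s))
            Console.WriteLine(category);
    }
}
```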

Internally, for characters outside the Latin-1 range (U+0000 through U+00FF), the GetUnicodeCategory method depends on Unicode categories reported by the CharUnicodeInfo class. Starting with .NET Framework 4.6.2, Unicode characters are classified based on The Unicode Standard, Version 8.0.0. In versions of the .NET Framework from .NET Framework 4 through .NET Framework 4.6.1, they are classified based on The Unicode Standard, Version 6.3.0.

Characters and text elements

Because a single character can be represented by multiple Char objects, it isn't always meaningful to work with individual Char objects. For instance, the following example converts the Unicode code points that represent the Aegean numbers one through ten to UTF-16 encoded code units. Because it erroneously equates Char objects with characters, it inaccurately reports that the resulting string has 20 characters.

using System;

public class Example5
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        Console.WriteLine("The string contains {0} characters.", result.Length);
    }
}
// The example displays the following output:
//     The string contains 20 characters.

open System

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""

printfn $"The string contains {result.Length} characters."


// The example displays the following output:
//     The string contains 20 characters.

Module Example5
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Console.WriteLine("The string contains {0} characters.", result.Length)
    End Sub
End Module
' The example displays the following output:
'     The string contains 20 characters.
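If the Rune type isn't available (for example, on .NET Framework), one way to count code points rather than Char objects is to skip the second half of each surrogate pair with Char.IsSurrogatePair(String, Int32). This is a sketch; CountCodePoints is a hypothetical helper name, and an unpaired surrogate is simply counted as one unit.

```csharp
using System;

public static class CodePointCounter
{
    // Counts code points, treating each valid surrogate pair as one.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // Skip the low surrogate that completes a valid pair.
            if (Char.IsSurrogatePair(s, i))
                i++;
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        Console.WriteLine($"Length: {result.Length}, code points: {CountCodePoints(result)}");
        // Length: 20, code points: 10
    }
}
```

Unlike StringInfo, this counts code points, so a base character followed by a combining character still counts as two.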

You can do the following to avoid assuming that a Char object represents a single character:

You can work with a String object in its entirety, instead of working with its individual characters, to represent and analyze linguistic content.

You can use String.EnumerateRunes, as shown in the following example:

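using System.Text;

// Counts letters by Unicode scalar value, so a surrogate pair counts once.
static int CountLetters(string s)
{
    int letterCount = 0;

    foreach (Rune rune in s.EnumerateRunes())
    {
        if (Rune.IsLetter(rune))
        { letterCount++; }
    }

    return letterCount;
}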

open System.Text

// Counts letters by Unicode scalar value, so a surrogate pair counts once.
let countLetters (s: string) =
    let mutable letterCount = 0

    for rune in s.EnumerateRunes() do
        if Rune.IsLetter rune then
            letterCount <- letterCount + 1

    letterCount

You can use the StringInfo class to work with text elements instead of individual Char objects. The following example uses the StringInfo object to count the number of text elements in a string that consists of the Aegean numbers one through ten. Because it treats a surrogate pair as a single character, it correctly reports that the string contains ten characters.

using System;
using System.Globalization;

public class Example4
{
    public static void Main()
    {
        string result = String.Empty;
        for (int ctr = 0x10107; ctr <= 0x10110; ctr++)  // Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr);

        StringInfo si = new StringInfo(result);
        Console.WriteLine("The string contains {0} characters.",
                          si.LengthInTextElements);
    }
}
// The example displays the following output:
//       The string contains 10 characters.

open System
open System.Globalization

let result =
    [ for i in 0x10107..0x10110 do  // Range of Aegean numbers.
        Char.ConvertFromUtf32 i ]
    |> String.concat ""


let si = StringInfo result
printfn $"The string contains {si.LengthInTextElements} characters."

// The example displays the following output:
//       The string contains 10 characters.

Imports System.Globalization

Module Example6
    Public Sub Main()
        Dim result As String = String.Empty
        For ctr As Integer = &H10107 To &H10110     ' Range of Aegean numbers.
            result += Char.ConvertFromUtf32(ctr)
        Next
        Dim si As New StringInfo(result)
        Console.WriteLine("The string contains {0} characters.", si.LengthInTextElements)
    End Sub
End Module
' The example displays the following output:
'       The string contains 10 characters.

If a string contains a base character that has one or more combining characters, you can call the String.Normalize method to convert the substring to a single UTF-16 encoded code unit, provided that Unicode defines a precomposed equivalent. The following example calls the String.Normalize method to convert the base character U+0061 (LATIN SMALL LETTER A) and the combining character U+0308 (COMBINING DIAERESIS) to U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS).

using System;

public class Example2
{
    public static void Main()
    {
        string combining = "\u0061\u0308";
        ShowString(combining);

        string normalized = combining.Normalize();
        ShowString(normalized);
    }

    private static void ShowString(string s)
    {
        Console.Write("Length of string: {0} (", s.Length);
        for (int ctr = 0; ctr < s.Length; ctr++)
        {
            Console.Write("U+{0:X4}", Convert.ToUInt16(s[ctr]));
            if (ctr != s.Length - 1) Console.Write(" ");
        }
        Console.WriteLine(")\n");
    }
}
// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

open System

let showString (s: string) =
    printf $"Length of string: {s.Length} ("
    for i = 0 to s.Length - 1 do
        printf $"U+{Convert.ToUInt16 s[i]:X4}"
        if i <> s.Length - 1 then printf " "
    printfn ")\n"

let combining = "\u0061\u0308"
showString combining

let normalized = combining.Normalize()
showString normalized

// The example displays the following output:
//       Length of string: 2 (U+0061 U+0308)
//
//       Length of string: 1 (U+00E4)

Module Example3
    Public Sub Main()
        Dim combining As String = ChrW(&H61) + ChrW(&H308)
        ShowString(combining)

        Dim normalized As String = combining.Normalize()
        ShowString(normalized)
    End Sub

    Private Sub ShowString(s As String)
        Console.Write("Length of string: {0} (", s.Length)
        For ctr As Integer = 0 To s.Length - 1
            Console.Write("U+{0:X4}", Convert.ToUInt16(s(ctr)))
            If ctr <> s.Length - 1 Then Console.Write(" ")
        Next
        Console.WriteLine(")")
        Console.WriteLine()
    End Sub
End Module
' The example displays the following output:
'       Length of string: 2 (U+0061 U+0308)
'       
'       Length of string: 1 (U+00E4)
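To make the difference between these units concrete, the following sketch measures one string three ways: in UTF-16 code units (String.Length), in Unicode scalar values (Rune, .NET Core 3.0 and later), and in text elements (StringInfo). The string combines the U+0061 U+0308 sequence and the U+1D160 surrogate pair used earlier in this article; TextMeasures and Measure are illustrative names.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TextMeasures
{
    // Returns (UTF-16 code units, scalar values, text elements) for a string.
    public static (int CodeUnits, int Runes, int TextElements) Measure(string s)
    {
        int runes = 0;
        foreach (Rune rune in s.EnumerateRunes())
            runes++;

        return (s.Length, runes, new StringInfo(s).LengthInTextElements);
    }

    public static void Main()
    {
        // a + COMBINING DIAERESIS, then MUSICAL SYMBOL EIGHTH NOTE (a surrogate pair).
        string s = "a\u0308\U0001D160";
        var (codeUnits, runes, textElements) = Measure(s);
        Console.WriteLine($"Char objects: {codeUnits}");      // Char objects: 4
        Console.WriteLine($"Runes: {runes}");                 // Runes: 3
        Console.WriteLine($"Text elements: {textElements}");  // Text elements: 2
    }
}
```

Which measure is "right" depends on the task: buffer sizes need code units, encoders need scalar values, and cursor movement or user-visible length usually needs text elements.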

Common operations

The Char structure provides methods to compare Char objects, convert the value of the current Char object to an object of another type, and determine the Unicode category of a Char object:

To do this	Use these `System.Char` methods
Compare Char objects	CompareTo and Equals
Convert a code point to a string	ConvertFromUtf32. See also the Rune type.
Convert a Char object or a surrogate pair of Char objects to a code point	For a single character: Convert.ToInt32(Char). For a surrogate pair or a character in a string: Char.ConvertToUtf32. See also the Rune type.
Get the Unicode category of a character	GetUnicodeCategory. See also Rune.GetUnicodeCategory.
Determine whether a character is in a particular Unicode category such as digit, letter, punctuation, control character, and so on	IsControl, IsDigit, IsHighSurrogate, IsLetter, IsLetterOrDigit, IsLower, IsLowSurrogate, IsNumber, IsPunctuation, IsSeparator, IsSurrogate, IsSurrogatePair, IsSymbol, IsUpper, and IsWhiteSpace. See also the corresponding methods on the Rune type.
Convert a Char object that represents a number to a numeric value type	GetNumericValue. See also Rune.GetNumericValue.
Convert a single-character string to a Char object	Parse and TryParse
Convert a Char object to a String object	ToString
Change the case of a Char object	ToLower, ToLowerInvariant, ToUpper, and ToUpperInvariant. See also the corresponding methods on the Rune type.
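The following sketch exercises several of the methods in the table on a few sample values. It is illustrative only; the comments show what these calls return for the inputs chosen here. U+00BD is VULGAR FRACTION ONE HALF, whose numeric value is 0.5.

```csharp
using System;

public static class CharOperations
{
    public static void Main()
    {
        // Compare Char objects by their ordinal values.
        Console.WriteLine('a'.CompareTo('b') < 0);                       // True
        Console.WriteLine('a'.Equals('A'));                              // False

        // Code point to string, then surrogate pair back to code point.
        string note = Char.ConvertFromUtf32(0x1D160);
        Console.WriteLine(Char.IsSurrogatePair(note[0], note[1]));       // True
        Console.WriteLine(Char.ConvertToUtf32(note, 0).ToString("X"));   // 1D160

        // Single Char to code point.
        Console.WriteLine(Convert.ToInt32('A'));                         // 65

        // Category tests and numeric values.
        Console.WriteLine(Char.IsDigit('7'));                            // True
        Console.WriteLine(Char.GetNumericValue('7'));                    // 7
        Console.WriteLine(Char.GetNumericValue('\u00BD') == 0.5);        // True

        // Parsing requires exactly one character; case mapping.
        Console.WriteLine(Char.TryParse("xy", out char parsed));         // False
        Console.WriteLine(Char.ToUpperInvariant('q'));                   // Q
    }
}
```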

Char values and interop

When a managed Char type, which is represented as a Unicode UTF-16 encoded code unit, is passed to unmanaged code, the interop marshaller converts the character set to ANSI by default. You can apply the DllImportAttribute attribute to platform invoke declarations and the StructLayoutAttribute attribute to a COM interop declaration to control which character set a marshalled Char type uses.
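One way to observe this default without calling native code is Marshal.SizeOf, which reports the unmanaged size of a type. The following sketch, with illustrative struct names, compares a struct whose single Char field uses the default character set (CharSet.Ansi) with one that specifies CharSet.Unicode. The managed size of char is always 2 bytes; only the marshalled representation changes.

```csharp
using System;
using System.Runtime.InteropServices;

// Uses the default character set (CharSet.Ansi) for the Char field.
[StructLayout(LayoutKind.Sequential)]
public struct AnsiCharHolder
{
    public char Value;
}

// Asks the marshaller to keep the Char field as a UTF-16 code unit.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct UnicodeCharHolder
{
    public char Value;
}

public static class CharMarshalingDemo
{
    public static void Main()
    {
        Console.WriteLine($"Managed size of char: {sizeof(char)}");                         // 2
        Console.WriteLine($"Unmanaged, default:   {Marshal.SizeOf<AnsiCharHolder>()}");     // 1
        Console.WriteLine($"Unmanaged, Unicode:   {Marshal.SizeOf<UnicodeCharHolder>()}");  // 2
    }
}
```

The same CharSet choice is available on DllImportAttribute for parameters of platform invoke declarations.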