String.Normalize 方法

定义

返回一个新字符串,其二进制表示形式符合特定的 Unicode 范式。

重载

Normalize()

返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合 Unicode 范式 C。

Normalize(NormalizationForm)

返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合指定的 Unicode 范式。

示例

以下示例将一个字符串规范化为四种规范化形式中的每一个,确认字符串已规范化为指定的规范化形式,然后列出规范化字符串中的代码点。

using namespace System;
using namespace System::Text;

void Show( String^ title, String^ s )
{
   Console::Write( "Characters in string {0} = ", title );
   for each (short x in s) {
      Console::Write("{0:X4} ", x);
   }
   Console::WriteLine();
}

int main()
{
   
   // Character c; combining characters acute and cedilla; character 3/4
   array<Char>^temp0 = {L'c',L'\u0301',L'\u0327',L'\u00BE'};
   String^ s1 = gcnew String( temp0 );
   String^ s2 = nullptr;
   String^ divider = gcnew String( '-',80 );
   divider = String::Concat( Environment::NewLine, divider, Environment::NewLine );

   Show( "s1", s1 );
   Console::WriteLine();
   Console::WriteLine( "U+0063 = LATIN SMALL LETTER C" );
   Console::WriteLine( "U+0301 = COMBINING ACUTE ACCENT" );
   Console::WriteLine( "U+0327 = COMBINING CEDILLA" );
   Console::WriteLine( "U+00BE = VULGAR FRACTION THREE QUARTERS" );
   Console::WriteLine( divider );
   Console::WriteLine( "A1) Is s1 normalized to the default form (Form C)?: {0}", s1->IsNormalized() );
   Console::WriteLine( "A2) Is s1 normalized to Form C?:  {0}", s1->IsNormalized( NormalizationForm::FormC ) );
   Console::WriteLine( "A3) Is s1 normalized to Form D?:  {0}", s1->IsNormalized( NormalizationForm::FormD ) );
   Console::WriteLine( "A4) Is s1 normalized to Form KC?: {0}", s1->IsNormalized( NormalizationForm::FormKC ) );
   Console::WriteLine( "A5) Is s1 normalized to Form KD?: {0}", s1->IsNormalized( NormalizationForm::FormKD ) );
   Console::WriteLine( divider );
   Console::WriteLine( "Set string s2 to each normalized form of string s1." );
   Console::WriteLine();
   Console::WriteLine( "U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE" );
   Console::WriteLine( "U+0033 = DIGIT THREE" );
   Console::WriteLine( "U+2044 = FRACTION SLASH" );
   Console::WriteLine( "U+0034 = DIGIT FOUR" );
   Console::WriteLine( divider );
   s2 = s1->Normalize();
   Console::Write( "B1) Is s2 normalized to the default form (Form C)?: " );
   Console::WriteLine( s2->IsNormalized() );
   Show( "s2", s2 );
   Console::WriteLine();
   s2 = s1->Normalize( NormalizationForm::FormC );
   Console::Write( "B2) Is s2 normalized to Form C?: " );
   Console::WriteLine( s2->IsNormalized( NormalizationForm::FormC ) );
   Show( "s2", s2 );
   Console::WriteLine();
   s2 = s1->Normalize( NormalizationForm::FormD );
   Console::Write( "B3) Is s2 normalized to Form D?: " );
   Console::WriteLine( s2->IsNormalized( NormalizationForm::FormD ) );
   Show( "s2", s2 );
   Console::WriteLine();
   s2 = s1->Normalize( NormalizationForm::FormKC );
   Console::Write( "B4) Is s2 normalized to Form KC?: " );
   Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKC ) );
   Show( "s2", s2 );
   Console::WriteLine();
   s2 = s1->Normalize( NormalizationForm::FormKD );
   Console::Write( "B5) Is s2 normalized to Form KD?: " );
   Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKD ) );
   Show( "s2", s2 );
   Console::WriteLine();
}

/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/
using System;
using System.Text;

class Example
{
    public static void Main()
    {
       // Character c; combining characters acute and cedilla; character 3/4
       string s1 = new String( new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
       string s2 = null;
       string divider = new String('-', 80);
       divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);

       Show("s1", s1);
       Console.WriteLine();
       Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
       Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
       Console.WriteLine("U+0327 = COMBINING CEDILLA");
       Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
       Console.WriteLine(divider);

       Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}",
                                    s1.IsNormalized());
       Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}",
                                    s1.IsNormalized(NormalizationForm.FormC));
       Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}",
                                    s1.IsNormalized(NormalizationForm.FormD));
       Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}",
                                    s1.IsNormalized(NormalizationForm.FormKC));
       Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}",
                                    s1.IsNormalized(NormalizationForm.FormKD));

       Console.WriteLine(divider);

       Console.WriteLine("Set string s2 to each normalized form of string s1.");
       Console.WriteLine();
       Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
       Console.WriteLine("U+0033 = DIGIT THREE");
       Console.WriteLine("U+2044 = FRACTION SLASH");
       Console.WriteLine("U+0034 = DIGIT FOUR");
       Console.WriteLine(divider);

       s2 = s1.Normalize();
       Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
       Console.WriteLine(s2.IsNormalized());
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormC);
       Console.Write("B2) Is s2 normalized to Form C?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormD);
       Console.Write("B3) Is s2 normalized to Form D?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormKC);
       Console.Write("B4) Is s2 normalized to Form KC?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormKD);
       Console.Write("B5) Is s2 normalized to Form KD?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
       Show("s2", s2);
       Console.WriteLine();
    }

    private static void Show(string title, string s)
    {
       Console.Write("Characters in string {0} = ", title);
       foreach(short x in s) {
           Console.Write("{0:X4} ", x);
       }
       Console.WriteLine();
    }
}
/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/
open System
open System.Text

let show title (s: string) =
    printf $"Characters in string %s{title} = "
    for x in s do
        printf $"{int16 x:X4} "
    printfn ""


[<EntryPoint>]
let main _ =
    // Character c; combining characters acute and cedilla; character 3/4
    let s1 = String [| '\u0063'; '\u0301'; '\u0327'; '\u00BE' |]
    let divider = String('-', 80)
    let divider = String.Concat(Environment.NewLine, divider, Environment.NewLine)

    show "s1" s1
    printfn "\nU+0063 = LATIN SMALL LETTER C"
    printfn "U+0301 = COMBINING ACUTE ACCENT"
    printfn "U+0327 = COMBINING CEDILLA"
    printfn "U+00BE = VULGAR FRACTION THREE QUARTERS"
    printfn $"{divider}"

    printfn $"A1) Is s1 normalized to the default form (Form C)?: {s1.IsNormalized()}"
    printfn $"A2) Is s1 normalized to Form C?:  {s1.IsNormalized NormalizationForm.FormC}"
    printfn $"A3) Is s1 normalized to Form D?:  {s1.IsNormalized NormalizationForm.FormD}"
    printfn $"A4) Is s1 normalized to Form KC?: {s1.IsNormalized NormalizationForm.FormKC}"
    printfn $"A5) Is s1 normalized to Form KD?: {s1.IsNormalized NormalizationForm.FormKD}"

    printfn $"{divider}"

    printfn "Set string s2 to each normalized form of string s1.\n"
    printfn "U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE"
    printfn"U+0033 = DIGIT THREE"
    printfn"U+2044 = FRACTION SLASH"
    printfn"U+0034 = DIGIT FOUR"
    printfn $"{divider}"
 
    let s2 = s1.Normalize()
    printf "B1) Is s2 normalized to the default form (Form C)?: "
    printfn $"{s2.IsNormalized()}"
    show "s2" s2
    printfn ""

    let s2 = s1.Normalize NormalizationForm.FormC
    printf "B2) Is s2 normalized to Form C?: "
    printfn $"{s2.IsNormalized NormalizationForm.FormC}"
    show "s2" s2
    printfn ""

    let s2 = s1.Normalize NormalizationForm.FormD
    printf "B3) Is s2 normalized to Form D?: "
    printfn $"{s2.IsNormalized NormalizationForm.FormD}"
    show "s2" s2
    printfn ""

    let s2 = s1.Normalize(NormalizationForm.FormKC)
    printf "B4) Is s2 normalized to Form KC?: "
    printfn $"{s2.IsNormalized NormalizationForm.FormKC}"
    show "s2" s2
    printfn ""

    let s2 = s1.Normalize(NormalizationForm.FormKD)
    printf "B5) Is s2 normalized to Form KD?: "
    printfn $"{s2.IsNormalized NormalizationForm.FormKD}"
    show "s2" s2
    0

(*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*)
Imports System.Text

Class Example
   Public Shared Sub Main()
      ' Character c; combining characters acute and cedilla; character 3/4
      Dim s1 = New [String](New Char() {ChrW(&H0063), ChrW(&H0301), ChrW(&H0327), ChrW(&H00BE)})
      Dim s2 As String = Nothing
      Dim divider = New [String]("-"c, 80)
      divider = [String].Concat(Environment.NewLine, divider, Environment.NewLine)
      
      Show("s1", s1)
      Console.WriteLine()
      Console.WriteLine("U+0063 = LATIN SMALL LETTER C")
      Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT")
      Console.WriteLine("U+0327 = COMBINING CEDILLA")
      Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS")

      Console.WriteLine(divider)
      
      Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}", s1.IsNormalized())
      Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}", s1.IsNormalized(NormalizationForm.FormC))
      Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}", s1.IsNormalized(NormalizationForm.FormD))
      Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", s1.IsNormalized(NormalizationForm.FormKC))
      Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", s1.IsNormalized(NormalizationForm.FormKD))
      
      Console.WriteLine(divider)
      
      Console.WriteLine("Set string s2 to each normalized form of string s1.")
      Console.WriteLine()
      Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE")
      Console.WriteLine("U+0033 = DIGIT THREE")
      Console.WriteLine("U+2044 = FRACTION SLASH")
      Console.WriteLine("U+0034 = DIGIT FOUR")
      Console.WriteLine(divider)
      
      s2 = s1.Normalize()
      Console.Write("B1) Is s2 normalized to the default form (Form C)?: ")
      Console.WriteLine(s2.IsNormalized())
      Show("s2", s2)
      Console.WriteLine()
      
      s2 = s1.Normalize(NormalizationForm.FormC)
      Console.Write("B2) Is s2 normalized to Form C?: ")
      Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC))
      Show("s2", s2)
      Console.WriteLine()
      
      s2 = s1.Normalize(NormalizationForm.FormD)
      Console.Write("B3) Is s2 normalized to Form D?: ")
      Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD))
      Show("s2", s2)
      Console.WriteLine()
      
      s2 = s1.Normalize(NormalizationForm.FormKC)
      Console.Write("B4) Is s2 normalized to Form KC?: ")
      Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC))
      Show("s2", s2)
      Console.WriteLine()
      
      s2 = s1.Normalize(NormalizationForm.FormKD)
      Console.Write("B5) Is s2 normalized to Form KD?: ")
      Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD))
      Show("s2", s2)
      Console.WriteLine()
   End Sub 
   
   Private Shared Sub Show(title As String, s As String)
      Console.Write("Characters in string {0} = ", title)
      For Each x As Char In s
         Console.Write("{0:X4} ", AscW(x))
      Next 
      Console.WriteLine()
   End Sub 
End Class 
'This example produces the following results:
'
'Characters in string s1 = 0063 0301 0327 00BE
'
'U+0063 = LATIN SMALL LETTER C
'U+0301 = COMBINING ACUTE ACCENT
'U+0327 = COMBINING CEDILLA
'U+00BE = VULGAR FRACTION THREE QUARTERS
'
'--------------------------------------------------------------------------------
'
'A1) Is s1 normalized to the default form (Form C)?: False
'A2) Is s1 normalized to Form C?:  False
'A3) Is s1 normalized to Form D?:  False
'A4) Is s1 normalized to Form KC?: False
'A5) Is s1 normalized to Form KD?: False
'
'--------------------------------------------------------------------------------
'
'Set string s2 to each normalized form of string s1.
'
'U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
'U+0033 = DIGIT THREE
'U+2044 = FRACTION SLASH
'U+0034 = DIGIT FOUR
'
'--------------------------------------------------------------------------------
'
'B1) Is s2 normalized to the default form (Form C)?: True
'Characters in string s2 = 1E09 00BE
'
'B2) Is s2 normalized to Form C?: True
'Characters in string s2 = 1E09 00BE
'
'B3) Is s2 normalized to Form D?: True
'Characters in string s2 = 0063 0327 0301 00BE
'
'B4) Is s2 normalized to Form KC?: True
'Characters in string s2 = 1E09 0033 2044 0034
'
'B5) Is s2 normalized to Form KD?: True
'Characters in string s2 = 0063 0327 0301 0033 2044 0034
'

Normalize()

返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合 Unicode 范式 C。

public:
 System::String ^ Normalize();
public string Normalize ();
member this.Normalize : unit -> string
Public Function Normalize () As String

返回

String

一个新的规范化字符串,其文本值与此字符串相同,但其二进制表示形式符合范式 C。

例外

当前实例包含无效的 Unicode 字符。

注解

某些 Unicode 字符具有多个等效的二进制表示形式,由组合和/或复合 Unicode 字符集组成。 例如,以下任何代码点都可以表示字母“ắ”:

  • U+1EAF

  • U+0103 U+0301

  • U+0061 U+0306 U+0301

单个字符的多个表示形式使搜索、排序、匹配和其他操作复杂化。

Unicode 标准定义一个名为规范化的进程,当给定字符的任何等效二进制表示形式时,该进程将返回一个二进制表示形式。 规范化可以使用多种算法(称为规范化形式)来执行,这些算法遵循不同的规则。 .NET 支持由 Unicode 标准定义的四种规范化形式 (C、D、KC 和 KD) 。 当两个字符串以同一规范化形式表示时,可以使用序号比较来比较它们。

若要规范化和比较两个字符串,请执行以下操作:

  1. 从输入源(如文件或用户输入设备)获取要比较的字符串。

  2. Normalize()调用该方法将字符串规范化为规范化形式 C。

  3. 若要比较两个字符串,请调用支持序号字符串比较的方法,例如 Compare(String, String, StringComparison) 该方法,并提供值 StringComparison.OrdinalStringComparison.OrdinalIgnoreCase 作为 StringComparison 参数。 若要对规范化字符串的数组进行排序,请传递 comparerStringComparer.OrdinalStringComparer.OrdinalIgnoreCase 传递给相应的重载 Array.Sort

  4. 根据上一步指示的顺序发出排序输出中的字符串。

有关支持的 Unicode 规范化表单的说明,请参阅 System.Text.NormalizationForm

调用方说明

该方法 IsNormalized 在字符串中遇到第一个非规范化字符后立即返回 false 。 因此,如果字符串包含非规范化字符,后跟无效 Unicode 字符,此方法Normalize将引发一个ArgumentException尽管IsNormalized返回。false

另请参阅

适用于

Normalize(NormalizationForm)

返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合指定的 Unicode 范式。

public:
 System::String ^ Normalize(System::Text::NormalizationForm normalizationForm);
public string Normalize (System.Text.NormalizationForm normalizationForm);
member this.Normalize : System.Text.NormalizationForm -> string
Public Function Normalize (normalizationForm As NormalizationForm) As String

参数

normalizationForm
NormalizationForm

一个 Unicode 范式。

返回

String

一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合由 normalizationForm 参数指定的范式。

例外

当前实例包含无效的 Unicode 字符。

注解

某些 Unicode 字符具有多个等效的二进制表示形式,由组合字符和/或复合 Unicode 字符集组成。 单个字符的多个表示形式使搜索、排序、匹配和其他操作复杂化。

Unicode 标准定义一个名为规范化的进程,当给定字符的任何等效二进制表示形式时,该进程将返回一个二进制表示形式。 规范化可以使用多种算法(称为规范化形式)来执行,这些算法遵循不同的规则。 .NET 支持由 Unicode 标准定义的四种规范化形式 (C、D、KC 和 KD) 。 当两个字符串以同一规范化形式表示时,可以使用序号比较来比较它们。

若要规范化和比较两个字符串,请执行以下操作:

  1. 从输入源(如文件或用户输入设备)获取要比较的字符串。

  2. Normalize(NormalizationForm)调用该方法将字符串规范化为指定的规范化形式。

  3. 若要比较两个字符串,请调用支持序号字符串比较的方法,例如 Compare(String, String, StringComparison) 该方法,并提供值 StringComparison.OrdinalStringComparison.OrdinalIgnoreCase 作为 StringComparison 参数。 若要对规范化字符串的数组进行排序,请传递 comparerStringComparer.OrdinalStringComparer.OrdinalIgnoreCase 传递给相应的重载 Array.Sort

  4. 根据上一步指示的顺序发出排序输出中的字符串。

有关支持的 Unicode 规范化表单的说明,请参阅 System.Text.NormalizationForm

调用方说明

该方法 IsNormalized 在字符串中遇到第一个非规范化字符后立即返回 false 。 因此,如果字符串包含非规范化字符,后跟无效 Unicode 字符,此方法Normalize可能会引发一个ArgumentException尽管IsNormalized返回。false

另请参阅

适用于