Character classes in regular expressions

A character class defines a set of characters, any one of which can occur in an input string for a match to succeed. The regular expression language in .NET supports the following character classes:

  • Positive character groups. A character in the input string must match one of a specified set of characters. For more information, see Positive Character Group.

  • Negative character groups. A character in the input string must not match one of a specified set of characters. For more information, see Negative Character Group.

  • Any character. The . (dot or period) character in a regular expression is a wildcard character that matches any character except \n. For more information, see Any Character.

  • A general Unicode category or named block. A character in the input string must be a member of a particular Unicode category or must fall within a contiguous range of Unicode characters for a match to succeed. For more information, see Unicode Category or Unicode Block.

  • A negative general Unicode category or named block. A character in the input string must not be a member of a particular Unicode category or must not fall within a contiguous range of Unicode characters for a match to succeed. For more information, see Negative Unicode Category or Unicode Block.

  • A word character. A character in the input string can belong to any of the Unicode categories that are appropriate for characters in words. For more information, see Word Character.

  • A non-word character. A character in the input string can belong to any Unicode category that is not a word character. For more information, see Non-Word Character.

  • A white-space character. A character in the input string can be any Unicode separator character, as well as any one of a number of control characters. For more information, see White-Space Character.

  • A non-white-space character. A character in the input string can be any character that is not a white-space character. For more information, see Non-White-Space Character.

  • A decimal digit. A character in the input string can be any of a number of characters classified as Unicode decimal digits. For more information, see Decimal Digit Character.

  • A non-decimal digit. A character in the input string can be anything other than a Unicode decimal digit. For more information, see Non-Digit Character.

.NET supports character class subtraction expressions, which enable you to define a set of characters as the result of excluding one character class from another character class. For more information, see Character Class Subtraction.
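As a rough illustration of the subtraction syntax, the following sketch uses the pattern [a-z-[aeiou]], which takes the range of lowercase letters from "a" through "z" and excludes the vowels. The class and method names (SubtractionExample, RemoveConsonants) are invented for this illustration and are not part of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtractionExample
{
   // Base group: the range a-z. Subtracted group: the vowels.
   public const string Pattern = @"[a-z-[aeiou]]";

   // Deletes every lowercase consonant; all other characters are kept.
   public static string RemoveConsonants(string input) =>
      Regex.Replace(input, Pattern, "");

   public static void Main()
   {
      Console.WriteLine(RemoveConsonants("regular expression"));
   }
}
// The example displays the following output:
//       eua eeio
```

Only lowercase consonants are removed here, because the base group [a-z] never included uppercase letters, digits, or white space in the first place.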

Note

Character classes that match characters by category, such as \w to match word characters or \p{} to match a Unicode category, rely on the CharUnicodeInfo class to provide information about character categories. Starting with the .NET Framework 4.6.2, character categories are based on The Unicode Standard, Version 8.0.0. In the .NET Framework 4 through the .NET Framework 4.6.1, they are based on The Unicode Standard, Version 6.3.0.

Positive character group: [ ]

A positive character group specifies a list of characters, any one of which may appear in an input string for a match to occur. This list of characters may be specified individually, as a range, or both.

The syntax for specifying a list of individual characters is as follows:

[character_group]

where character_group is a list of the individual characters that can appear in the input string for a match to succeed. character_group can consist of any combination of one or more literal characters, escape characters, or character classes.

The syntax for specifying a range of characters is as follows:

[firstCharacter-lastCharacter]

where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point.

Note

Because a positive character group can include both a set of characters and a character range, a hyphen character (-) is always interpreted as the range separator unless it is the first or last character of the group.
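The hyphen rule can be sketched as follows. Both patterns accept lowercase letters, digits, and a literal hyphen: in the first the hyphen is the last character of the group, and in the second it is the first. The names HyphenExample and IsSlug are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenExample
{
   // "a-z" and "0-9" are ranges; the final "-" is a literal hyphen.
   public const string HyphenLast = @"^[a-z0-9-]+$";

   // The same set of characters with the literal hyphen placed first.
   public const string HyphenFirst = @"^[-a-z0-9]+$";

   // Returns true if the input consists only of lowercase letters, digits, and hyphens.
   public static bool IsSlug(string input, string pattern) =>
      Regex.IsMatch(input, pattern);

   public static void Main()
   {
      Console.WriteLine(IsSlug("character-classes", HyphenLast));
      Console.WriteLine(IsSlug("character-classes", HyphenFirst));
      Console.WriteLine(IsSlug("Character Classes", HyphenLast));
   }
}
// The example displays the following output:
//       True
//       True
//       False
```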

Some common regular expression patterns that contain positive character groups are listed in the following table.

Pattern       Description
[aeiou]       Match all vowels.
[\p{P}\d]     Match all punctuation and decimal digit characters.
[\s\p{P}]     Match all white space and punctuation.

The following example defines a positive character group that contains the characters "a" and "e" so that the input string must contain the words "grey" or "gray" followed by another word for a match to occur.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"gr[ae]y\s\S+?[\s\p{P}]";
      string input = "The gray wolf jumped over the grey wall.";
      MatchCollection matches = Regex.Matches(input, pattern);
      foreach (Match match in matches)
         Console.WriteLine($"'{match.Value}'");
   }
}
// The example displays the following output:
//       'gray wolf '
//       'grey wall.'
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "gr[ae]y\s\S+?[\s\p{P}]"
      Dim input As String = "The gray wolf jumped over the grey wall."
      Dim matches As MatchCollection = Regex.Matches(input, pattern)
      For Each match As Match In matches
         Console.WriteLine($"'{match.Value}'")
      Next
   End Sub
End Module
' The example displays the following output:
'       'gray wolf '
'       'grey wall.'

The regular expression gr[ae]y\s\S+?[\s\p{P}] is defined as follows:

Pattern       Description
gr            Match the literal characters "gr".
[ae]          Match either an "a" or an "e".
y\s           Match the literal character "y" followed by a white-space character.
\S+?          Match one or more non-white-space characters, but as few as possible.
[\s\p{P}]     Match either a white-space character or a punctuation mark.

The following example matches words that begin with any capital letter. It uses the subexpression [A-Z] to represent the range of capital letters from A to Z.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b[A-Z]\w*\b";
      string input = "A city Albany Zulu maritime Marseilles";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       A
//       Albany
//       Zulu
//       Marseilles
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b[A-Z]\w*\b"
      Dim input As String = "A city Albany Zulu maritime Marseilles"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next
   End Sub
End Module

The regular expression \b[A-Z]\w*\b is defined as shown in the following table.

Pattern       Description
\b            Start at a word boundary.
[A-Z]         Match any uppercase character from A to Z.
\w*           Match zero or more word characters.
\b            Match a word boundary.

Negative character group: [^]

A negative character group specifies a list of characters that must not appear in an input string for a match to occur. The list of characters may be specified individually, as a range, or both.

The syntax for specifying a list of individual characters is as follows:

[^character_group]

where character_group is a list of the individual characters that cannot appear in the input string for a match to succeed. character_group can consist of any combination of one or more literal characters, escape characters, or character classes.

The syntax for specifying a range of characters is as follows:

[^firstCharacter-lastCharacter]

where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point.

Note

Because a negative character group can include both a set of characters and a character range, a hyphen character (-) is always interpreted as the range separator unless it is the first or last character of the group.

Two or more character ranges can be concatenated. For example, to specify the range of decimal digits from "0" through "9", the range of lowercase letters from "a" through "f", and the range of uppercase letters from "A" through "F", use [0-9a-fA-F].
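A minimal sketch of concatenated ranges in use: the pattern below looks for six-digit hexadecimal color codes, relying on [0-9a-fA-F] to accept a hexadecimal digit in either case. The HexColorExample class, the FindColors method, and the sample input are assumptions made for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexColorExample
{
   // "#" followed by exactly six hexadecimal digits, ending at a word boundary.
   public const string Pattern = @"#[0-9a-fA-F]{6}\b";

   // Returns every color code found in the input, in order of appearance.
   public static string[] FindColors(string input)
   {
      MatchCollection matches = Regex.Matches(input, Pattern);
      string[] result = new string[matches.Count];
      for (int i = 0; i < matches.Count; i++)
         result[i] = matches[i].Value;
      return result;
   }

   public static void Main()
   {
      string input = "color: #1A2b3C; border: #12345G; fill: #ffffff";
      Console.WriteLine(string.Join(", ", FindColors(input)));
   }
}
// The example displays the following output:
//       #1A2b3C, #ffffff
```

The value #12345G is skipped because "G" falls outside all three ranges, which leaves only five hexadecimal digits after the "#".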

The leading caret character (^) in a negative character group is mandatory and indicates the character group is a negative character group instead of a positive character group.

Important

A negative character group in a larger regular expression pattern is not a zero-width assertion. That is, after evaluating the negative character group, the regular expression engine advances one character in the input string.
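The practical consequence can be sketched by comparing a negative character group with a negative lookahead, which is a zero-width assertion. The pattern q[^u] needs an actual character after the "q" and consumes it, whereas q(?!u) only checks what follows. The NegativeGroupExample class and its FirstMatch helper are hypothetical names used for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegativeGroupExample
{
   // Returns the text of the first match, or "(no match)" if there is none.
   public static string FirstMatch(string input, string pattern)
   {
      Match match = Regex.Match(input, pattern);
      return match.Success ? match.Value : "(no match)";
   }

   public static void Main()
   {
      // [^u] must consume a character, so a "q" at the end of the input fails.
      Console.WriteLine(FirstMatch("Iraq", @"q[^u]"));
      // Here [^u] consumes the "i", which becomes part of the match.
      Console.WriteLine(FirstMatch("Iraqi", @"q[^u]"));
      // The lookahead (?!u) consumes nothing, so only "q" is matched.
      Console.WriteLine(FirstMatch("Iraq", @"q(?!u)"));
   }
}
// The example displays the following output:
//       (no match)
//       qi
//       q
```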

Some common regular expression patterns that contain negative character groups are listed in the following table.

Pattern       Description
[^aeiou]      Match all characters except vowels.
[^\p{P}\d]    Match all characters except punctuation and decimal digit characters.

The following example matches any word that begins with the characters "th" where the next character is not an "o".

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\bth[^o]\w+\b";
      string input = "thought thing though them through thus thorough this";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       thing
//       them
//       through
//       thus
//       this
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\bth[^o]\w+\b"
      Dim input As String = "thought thing though them through thus " + _
                            "thorough this"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       thing
'       them
'       through
'       thus
'       this

The regular expression \bth[^o]\w+\b is defined as shown in the following table.

Pattern       Description
\b            Start at a word boundary.
th            Match the literal characters "th".
[^o]          Match any character that is not an "o".
\w+           Match one or more word characters.
\b            End at a word boundary.

Any character: .

The period character (.) matches any character except \n (the newline character, \u000A), with the following two qualifications:

  • If a regular expression pattern is modified by the RegexOptions.Singleline option, or if the portion of the pattern that contains the . character class is modified by the s option, . matches any character. For more information, see Regular Expression Options.

    The following example illustrates the different behavior of the . character class by default and with the RegexOptions.Singleline option. The regular expression ^.+ starts at the beginning of the string and matches every character. By default, the match ends at the end of the first line; the regular expression pattern matches the carriage return character, \r or \u000D, but it does not match \n. Because the RegexOptions.Singleline option interprets the entire input string as a single line, it matches every character in the input string, including \n.

    using System;
    using System.Text.RegularExpressions;
    
    public class Example
    {
       public static void Main()
       {
          string pattern = "^.+";
          // Use an explicit "\r\n" so the output below does not depend on the platform's Environment.NewLine.
          string input = "This is one line and" + "\r\n" + "this is the second.";
          foreach (Match match in Regex.Matches(input, pattern))
             Console.WriteLine(Regex.Escape(match.Value));
    
          Console.WriteLine();
          foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Singleline))
             Console.WriteLine(Regex.Escape(match.Value));
       }
    }
    // The example displays the following output:
    //       This\ is\ one\ line\ and\r
    //       
    //       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.
    
    Imports System.Text.RegularExpressions
    
    Module Example
       Public Sub Main()
          Dim pattern As String = "^.+"
          Dim input As String = "This is one line and" + vbCrLf + "this is the second."
          For Each match As Match In Regex.Matches(input, pattern)
             Console.WriteLine(Regex.Escape(match.Value))
          Next
          Console.WriteLine()
          For Each match As Match In Regex.Matches(input, pattern, RegexOptions.SingleLine)
             Console.WriteLine(Regex.Escape(match.Value))
          Next
       End Sub
    End Module
    ' The example displays the following output:
    '       This\ is\ one\ line\ and\r
    '       
    '       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.
    

Note

Because it matches any character except \n, the . character class also matches \r (the carriage return character, \u000D).

  • In a positive or negative character group, a period is treated as a literal period character, and not as a character class. For more information, see Positive Character Group and Negative Character Group earlier in this topic. The following example provides an illustration by defining a regular expression that includes the period character (.) both as a character class and as a member of a positive character group. The regular expression \b.*[.?!;:](\s|\z) begins at a word boundary, matches any character until it encounters one of five punctuation marks, including a period, and then matches either a white-space character or the end of the string.

    using System;
    using System.Text.RegularExpressions;
    
    public class Example
    {
       public static void Main()
       {
          string pattern = @"\b.*[.?!;:](\s|\z)";
          string input = "this. what: is? go, thing.";
          foreach (Match match in Regex.Matches(input, pattern))
             Console.WriteLine(match.Value);
       }
    }
    // The example displays the following output:
    //       this. what: is? go, thing.
    
    Imports System.Text.RegularExpressions
    
    Module Example
       Public Sub Main()
          Dim pattern As String = "\b.*[.?!;:](\s|\z)"
          Dim input As String = "this. what: is? go, thing."
          For Each match As Match In Regex.Matches(input, pattern)
             Console.WriteLine(match.Value)
          Next   
       End Sub
    End Module
    ' The example displays the following output:
    '       this. what: is? go, thing.
    

Note

Because it matches any character, the . language element is often used with a lazy quantifier if a regular expression pattern attempts to match any character multiple times. For more information, see Quantifiers.
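To make the difference concrete, the following sketch applies a greedy and a lazy version of the same pattern to a short piece of markup. The LazyDotExample class, the FirstMatch helper, and the sample input are assumptions chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDotExample
{
   // Returns the text of the first match of the pattern in the input.
   public static string FirstMatch(string input, string pattern) =>
      Regex.Match(input, pattern).Value;

   public static void Main()
   {
      string input = "<b>bold</b>";

      // Greedy: .+ runs to the end of the line, then backtracks to the last ">".
      Console.WriteLine(FirstMatch(input, "<.+>"));

      // Lazy: .+? stops at the first ">" that lets the match succeed.
      Console.WriteLine(FirstMatch(input, "<.+?>"));
   }
}
// The example displays the following output:
//       <b>bold</b>
//       <b>
```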

Unicode category or Unicode block: \p{}

The Unicode standard assigns each character a general category. For example, a particular character can be an uppercase letter (represented by the Lu category), a decimal digit (the Nd category), a math symbol (the Sm category), or a paragraph separator (the Zp category). Specific character sets in the Unicode standard also occupy a specific range or block of consecutive code points. For example, the basic Latin character set is found from \u0000 through \u007F, while the Arabic character set is found from \u0600 through \u06FF.

The regular expression construct

\p{name}

matches any character that belongs to a Unicode general category or named block, where name is the category abbreviation or named block name. For a list of category abbreviations, see the Supported Unicode General Categories section later in this topic. For a list of named blocks, see the Supported Named Blocks section later in this topic.

The following example uses the \p{name} construct to match both a Unicode general category (in this case, the Pd, or Punctuation, Dash category) and a named block (the IsGreek and IsBasicLatin named blocks).

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\p{IsGreek}+(\s)?)+\p{Pd}\s(\p{IsBasicLatin}+(\s)?)+";
      string input = "Κατα Μαθθαίον - The Gospel of Matthew";

      Console.WriteLine(Regex.IsMatch(input, pattern));        // Displays True.
   }
}
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\p{IsGreek}+(\s)?)+\p{Pd}\s(\p{IsBasicLatin}+(\s)?)+"
      Dim input As String = "Κατα Μαθθαίον - The Gospel of Matthew"

      Console.WriteLine(Regex.IsMatch(input, pattern))         ' Displays True.
   End Sub
End Module

The regular expression \b(\p{IsGreek}+(\s)?)+\p{Pd}\s(\p{IsBasicLatin}+(\s)?)+ is defined as shown in the following table.

Pattern                       Description
\b                            Start at a word boundary.
\p{IsGreek}+                  Match one or more Greek characters.
(\s)?                         Match zero or one white-space character.
(\p{IsGreek}+(\s)?)+          Match the pattern of one or more Greek characters followed by zero or one white-space characters one or more times.
\p{Pd}                        Match a Punctuation, Dash character.
\s                            Match a white-space character.
\p{IsBasicLatin}+             Match one or more basic Latin characters.
(\s)?                         Match zero or one white-space character.
(\p{IsBasicLatin}+(\s)?)+     Match the pattern of one or more basic Latin characters followed by zero or one white-space characters one or more times.

Negative Unicode category or Unicode block: \P{}

The Unicode standard assigns each character a general category. For example, a particular character can be an uppercase letter (represented by the Lu category), a decimal digit (the Nd category), a math symbol (the Sm category), or a paragraph separator (the Zp category). Specific character sets in the Unicode standard also occupy a specific range or block of consecutive code points. For example, the basic Latin character set is found from \u0000 through \u007F, while the Arabic character set is found from \u0600 through \u06FF.

The regular expression construct

\P{name}

matches any character that does not belong to a Unicode general category or named block, where name is the category abbreviation or named block name. For a list of category abbreviations, see the Supported Unicode General Categories section later in this topic. For a list of named blocks, see the Supported Named Blocks section later in this topic.

The following example uses the \P{name} construct to remove any currency symbols (in this case, the Sc, or Symbol, Currency category) from numeric strings.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\P{Sc})+";
      
      string[] values = { "$164,091.78", "£1,073,142.68", "73¢", "€120" };
      foreach (string value in values)
         Console.WriteLine(Regex.Match(value, pattern).Value);
   }
}
// The example displays the following output:
//       164,091.78
//       1,073,142.68
//       73
//       120
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\P{Sc})+"
      
      Dim values() As String = { "$164,091.78", "£1,073,142.68", "73¢", "€120"}
      For Each value As String In values
         Console.WriteLine(Regex.Match(value, pattern).Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       164,091.78
'       1,073,142.68
'       73
'       120

The regular expression pattern (\P{Sc})+ matches one or more characters that are not currency symbols; it effectively strips any currency symbol from the result string.

Word character: \w

\w matches any word character. A word character is a member of any of the Unicode categories listed in the following table.

Category  Description
Ll        Letter, Lowercase
Lu        Letter, Uppercase
Lt        Letter, Titlecase
Lo        Letter, Other
Lm        Letter, Modifier
Mn        Mark, Nonspacing
Nd        Number, Decimal Digit
Pc        Punctuation, Connector. This category includes ten characters, the most commonly used of which is the LOWLINE character (_), U+005F.

If ECMAScript-compliant behavior is specified, \w is equivalent to [a-zA-Z_0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.
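The effect of the ECMAScript option on \w can be sketched with a word that contains a non-ASCII letter. In the input "caf\u00E9" (the word "café"), the final character belongs to the Ll category, so the default \w matches it, but it lies outside [a-zA-Z_0-9]. The EcmaScriptWordExample class and FirstWord method are hypothetical names used for this illustration; the example prints match lengths rather than the text itself so that the output does not depend on console encoding.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaScriptWordExample
{
   // Returns the first run of word characters under the given options.
   public static string FirstWord(string input, RegexOptions options) =>
      Regex.Match(input, @"\w+", options).Value;

   public static void Main()
   {
      string input = "caf\u00E9";

      // Default behavior: \w covers the Unicode categories listed above.
      Console.WriteLine(FirstWord(input, RegexOptions.None).Length);

      // ECMAScript behavior: \w is limited to [a-zA-Z_0-9].
      Console.WriteLine(FirstWord(input, RegexOptions.ECMAScript).Length);
   }
}
// The example displays the following output:
//       4
//       3
```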

Note

Because it matches any word character, the \w language element is often used with a lazy quantifier if a regular expression pattern attempts to match any word character multiple times, followed by a specific word character. For more information, see Quantifiers.

The following example uses the \w language element to match duplicate characters in a word. The example defines a regular expression pattern, (\w)\1, which can be interpreted as follows.

Element   Description
(\w)      Match a word character. This is the first capturing group.
\1        Match the value of the first capture.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w)\1";
      string[] words = { "trellis", "seer", "latter", "summer", 
                         "hoarse", "lesser", "aardvark", "stunned" };
      foreach (string word in words)
      {
         Match match = Regex.Match(word, pattern);
         if (match.Success)
            Console.WriteLine("'{0}' found in '{1}' at position {2}.", 
                              match.Value, word, match.Index);
         else
            Console.WriteLine("No double characters in '{0}'.", word);
      }                                                  
   }
}
// The example displays the following output:
//       'll' found in 'trellis' at position 3.
//       'ee' found in 'seer' at position 1.
//       'tt' found in 'latter' at position 2.
//       'mm' found in 'summer' at position 2.
//       No double characters in 'hoarse'.
//       'ss' found in 'lesser' at position 2.
//       'aa' found in 'aardvark' at position 0.
//       'nn' found in 'stunned' at position 3.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\w)\1"
      Dim words() As String = { "trellis", "seer", "latter", "summer", _
                                "hoarse", "lesser", "aardvark", "stunned" }
      For Each word As String In words
         Dim match As Match = Regex.Match(word, pattern)
         If match.Success Then
            Console.WriteLine("'{0}' found in '{1}' at position {2}.", _
                              match.Value, word, match.Index)
         Else
            Console.WriteLine("No double characters in '{0}'.", word)
         End If
      Next                                                  
   End Sub
End Module
' The example displays the following output:
'       'll' found in 'trellis' at position 3.
'       'ee' found in 'seer' at position 1.
'       'tt' found in 'latter' at position 2.
'       'mm' found in 'summer' at position 2.
'       No double characters in 'hoarse'.
'       'ss' found in 'lesser' at position 2.
'       'aa' found in 'aardvark' at position 0.
'       'nn' found in 'stunned' at position 3.

Non-word character: \W

\W matches any non-word character. The \W language element is equivalent to the following character class:

[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]

In other words, it matches any character except for those in the Unicode categories listed in the following table.

Category  Description
Ll        Letter, Lowercase
Lu        Letter, Uppercase
Lt        Letter, Titlecase
Lo        Letter, Other
Lm        Letter, Modifier
Mn        Mark, Nonspacing
Nd        Number, Decimal Digit
Pc        Punctuation, Connector. This category includes ten characters, the most commonly used of which is the LOWLINE character (_), U+005F.

If ECMAScript-compliant behavior is specified, \W is equivalent to [^a-zA-Z_0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.

Note

Because it matches any non-word character, the \W language element is often used with a lazy quantifier if a regular expression pattern attempts to match any non-word character multiple times followed by a specific non-word character. For more information, see Quantifiers.

The following example illustrates the \W character class. It defines a regular expression pattern, \b(\w+)(\W){1,2}, that matches a word followed by one or two non-word characters, such as white space or punctuation. The regular expression is interpreted as shown in the following table.

Element       Description
\b            Begin the match at a word boundary.
(\w+)         Match one or more word characters. This is the first capturing group.
(\W){1,2}     Match a non-word character either one or two times. This is the second capturing group.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\w+)(\W){1,2}";
      string input = "The old, grey mare slowly walked across the narrow, green pasture.";
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine(match.Value);
         Console.Write("   Non-word character(s):");
         CaptureCollection captures = match.Groups[2].Captures;
         for (int ctr = 0; ctr < captures.Count; ctr++)
             Console.Write(@"'{0}' (\u{1}){2}", captures[ctr].Value, 
                           Convert.ToUInt16(captures[ctr].Value[0]).ToString("X4"), 
                           ctr < captures.Count - 1 ? ", " : "");
         Console.WriteLine();
      }   
   }
}
// The example displays the following output:
//       The
//          Non-word character(s):' ' (\u0020)
//       old,
//          Non-word character(s):',' (\u002C), ' ' (\u0020)
//       grey
//          Non-word character(s):' ' (\u0020)
//       mare
//          Non-word character(s):' ' (\u0020)
//       slowly
//          Non-word character(s):' ' (\u0020)
//       walked
//          Non-word character(s):' ' (\u0020)
//       across
//          Non-word character(s):' ' (\u0020)
//       the
//          Non-word character(s):' ' (\u0020)
//       narrow,
//          Non-word character(s):',' (\u002C), ' ' (\u0020)
//       green
//          Non-word character(s):' ' (\u0020)
//       pasture.
//          Non-word character(s):'.' (\u002E)
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\w+)(\W){1,2}"
      Dim input As String = "The old, grey mare slowly walked across the narrow, green pasture."
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
         Console.Write("   Non-word character(s):")
         Dim captures As CaptureCollection = match.Groups(2).Captures
         For ctr As Integer = 0 To captures.Count - 1
             Console.Write("'{0}' (\u{1}){2}", captures(ctr).Value, _
                           Convert.ToUInt16(captures(ctr).Value.Chars(0)).ToString("X4"), _
                           If(ctr < captures.Count - 1, ", ", ""))
         Next
         Console.WriteLine()
      Next
   End Sub
End Module
' The example displays the following output:
'       The
'          Non-word character(s):' ' (\u0020)
'       old,
'          Non-word character(s):',' (\u002C), ' ' (\u0020)
'       grey
'          Non-word character(s):' ' (\u0020)
'       mare
'          Non-word character(s):' ' (\u0020)
'       slowly
'          Non-word character(s):' ' (\u0020)
'       walked
'          Non-word character(s):' ' (\u0020)
'       across
'          Non-word character(s):' ' (\u0020)
'       the
'          Non-word character(s):' ' (\u0020)
'       narrow,
'          Non-word character(s):',' (\u002C), ' ' (\u0020)
'       green
'          Non-word character(s):' ' (\u0020)
'       pasture.
'          Non-word character(s):'.' (\u002E)

Because the Group object for the second capturing group holds only the last captured non-word character, the example retrieves all captured non-word characters from the CaptureCollection object that is returned by the Group.Captures property.
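The distinction between the group's value and its capture collection can be sketched as follows. The class CaptureDemo and its helper methods are hypothetical names introduced for illustration; the pattern is the one used in the example above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; helper names are assumptions for illustration.
public static class CaptureDemo
{
    private const string Pattern = @"\b(\w+)(\W){1,2}";

    // Value of the second group for the first match: only the last capture.
    public static string GroupValue(string input)
    {
        return Regex.Match(input, Pattern).Groups[2].Value;
    }

    // Number of individual captures recorded for the second group.
    public static int CaptureCount(string input)
    {
        return Regex.Match(input, Pattern).Groups[2].Captures.Count;
    }

    public static void Main()
    {
        string input = "old, grey";
        // The first match is "old, "; group 2 captured "," and then " ".
        Console.WriteLine("Group value: '{0}'", GroupValue(input));   // Group value: ' '
        Console.WriteLine("Capture count: {0}", CaptureCount(input)); // Capture count: 2
    }
}
```

When a quantifier is applied to a capturing group, reading Group.Value alone loses the earlier captures, which is why the example iterates over Group.Captures.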

Whitespace character: \s

\s matches any whitespace character. It is equivalent to the escape sequences and Unicode categories listed in the following table.

Category Description
\f The form feed character, \u000C.
\n The newline character, \u000A.
\r The carriage return character, \u000D.
\t The tab character, \u0009.
\v The vertical tab character, \u000B.
\x85 The NEXT LINE (NEL) character, \u0085.
\p{Z} Matches any separator character.

If ECMAScript-compliant behavior is specified, \s is equivalent to [ \f\n\r\t\v]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.
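A minimal sketch of how the two behaviors differ is shown here. The class name WhitespaceDemo and its helpers are assumptions made for illustration; the test characters (NO-BREAK SPACE, NEXT LINE, and LINE SEPARATOR) were chosen because they fall outside [ \f\n\r\t\v].

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; helper names are assumptions for illustration.
public static class WhitespaceDemo
{
    // \s with default (Unicode-aware) behavior.
    public static bool IsSpaceDefault(char c)
    {
        return Regex.IsMatch(c.ToString(), @"\s");
    }

    // \s restricted to [ \f\n\r\t\v] by RegexOptions.ECMAScript.
    public static bool IsSpaceEcma(char c)
    {
        return Regex.IsMatch(c.ToString(), @"\s", RegexOptions.ECMAScript);
    }

    public static void Main()
    {
        char[] samples = { ' ', '\t', '\u00A0', '\u0085', '\u2028' };
        foreach (char c in samples)
            Console.WriteLine("U+{0:X4}: default={1}, ECMAScript={2}",
                              (int)c, IsSpaceDefault(c), IsSpaceEcma(c));
    }
}
// Expected output:
//       U+0020: default=True, ECMAScript=True
//       U+0009: default=True, ECMAScript=True
//       U+00A0: default=True, ECMAScript=False
//       U+0085: default=True, ECMAScript=False
//       U+2028: default=True, ECMAScript=False
```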

The following example illustrates the \s character class. It defines a regular expression pattern, \b\w+(e)?s(\s|$), that matches a word ending in either "s" or "es" followed by either a whitespace character or the end of the input string. The regular expression is interpreted as shown in the following table.

Element Description
\b Begin the match at a word boundary.
\w+ Match one or more word characters.
(e)? Match an "e" either zero or one time.
s Match an "s".
(\s|$) Match either a whitespace character or the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b\w+(e)?s(\s|$)";
      string input = "matches stores stops leave leaves";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       matches
//       stores
//       stops
//       leaves
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b\w+(e)?s(\s|$)"
      Dim input As String = "matches stores stops leave leaves"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)      
      Next
   End Sub
End Module
' The example displays the following output:
'       matches
'       stores
'       stops
'       leaves

Non-whitespace character: \S

\S matches any non-whitespace character. It is equivalent to the [^\f\n\r\t\v\x85\p{Z}] regular expression pattern, that is, the opposite of the pattern that is equivalent to \s, which matches whitespace characters. For more information, see Whitespace character: \s.

If ECMAScript-compliant behavior is specified, \S is equivalent to [^ \f\n\r\t\v]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.

The following example illustrates the \S language element. The regular expression pattern \b(\S+)\s? matches strings that are delimited by whitespace characters. The second element in the match's GroupCollection object contains the matched string. The regular expression can be interpreted as shown in the following table.

Element Description
\b Begin the match at a word boundary.
(\S+) Match one or more non-whitespace characters. This is the first capturing group.
\s? Match zero or one whitespace character.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\S+)\s?";
      string input = "This is the first sentence of the first paragraph. " + 
                            "This is the second sentence.\n" + 
                            "This is the only sentence of the second paragraph.";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Groups[1]);
   }
}
// The example displays the following output:
//    This
//    is
//    the
//    first
//    sentence
//    of
//    the
//    first
//    paragraph.
//    This
//    is
//    the
//    second
//    sentence.
//    This
//    is
//    the
//    only
//    sentence
//    of
//    the
//    second
//    paragraph.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\S+)\s?"
      Dim input As String = "This is the first sentence of the first paragraph. " + _
                            "This is the second sentence." + vbCrLf + _
                            "This is the only sentence of the second paragraph."
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Groups(1))
      Next
   End Sub
End Module
' The example displays the following output:
'    This
'    is
'    the
'    first
'    sentence
'    of
'    the
'    first
'    paragraph.
'    This
'    is
'    the
'    second
'    sentence.
'    This
'    is
'    the
'    only
'    sentence
'    of
'    the
'    second
'    paragraph.

Decimal digit character: \d

\d matches any decimal digit. It is equivalent to the \p{Nd} regular expression pattern, which includes the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets.

If ECMAScript-compliant behavior is specified, \d is equivalent to [0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.
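The practical effect of the two definitions of \d can be sketched with a digit from another script. The class DigitDemo and its helpers are hypothetical; the sample character U+0663 (ARABIC-INDIC DIGIT THREE) is an assumption chosen because it belongs to the Nd category but is not in [0-9].

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; helper names are assumptions for illustration.
public static class DigitDemo
{
    // \d with default behavior: any character in the Nd category.
    public static bool IsDigitDefault(string s)
    {
        return Regex.IsMatch(s, @"^\d+$");
    }

    // \d under RegexOptions.ECMAScript: only the ASCII digits 0-9.
    public static bool IsDigitEcma(string s)
    {
        return Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);
    }

    public static void Main()
    {
        string ascii = "123";
        string arabicIndic = "\u0661\u0662\u0663";   // Arabic-Indic digits one, two, three

        Console.WriteLine(IsDigitDefault(ascii));        // True
        Console.WriteLine(IsDigitEcma(ascii));           // True
        Console.WriteLine(IsDigitDefault(arabicIndic));  // True
        Console.WriteLine(IsDigitEcma(arabicIndic));     // False
    }
}
```

If input validation should accept only ASCII digits, either specify RegexOptions.ECMAScript or write [0-9] explicitly instead of \d.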

The following example illustrates the \d language element. It tests whether an input string represents a valid telephone number in the United States and Canada. The regular expression pattern ^(\(?\d{3}\)?[\s-])?\d{3}-\d{4}$ is defined as shown in the following table.

Element Description
^ Begin the match at the beginning of the input string.
\(? Match zero or one literal "(" character.
\d{3} Match three decimal digits.
\)? Match zero or one literal ")" character.
[\s-] Match a hyphen or a whitespace character.
(\(?\d{3}\)?[\s-])? Match an optional opening parenthesis followed by three decimal digits, an optional closing parenthesis, and either a whitespace character or a hyphen, zero or one time. This is the first capturing group.
\d{3}-\d{4} Match three decimal digits followed by a hyphen and four more decimal digits.
$ Match the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"^(\(?\d{3}\)?[\s-])?\d{3}-\d{4}$";
      string[] inputs = { "111 111-1111", "222-2222", "222 333-444", 
                          "(212) 111-1111", "111-AB1-1111", 
                          "212-111-1111", "01 999-9999" };
      
      foreach (string input in inputs)
      {
         if (Regex.IsMatch(input, pattern)) 
            Console.WriteLine(input + ": matched");
         else
            Console.WriteLine(input + ": match failed");
      }
   }
}
// The example displays the following output:
//       111 111-1111: matched
//       222-2222: matched
//       222 333-444: match failed
//       (212) 111-1111: matched
//       111-AB1-1111: match failed
//       212-111-1111: matched
//       01 999-9999: match failed
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "^(\(?\d{3}\)?[\s-])?\d{3}-\d{4}$"
      Dim inputs() As String = { "111 111-1111", "222-2222", "222 333-444", _
                                 "(212) 111-1111", "111-AB1-1111", _
                                 "212-111-1111", "01 999-9999" }
      
      For Each input As String In inputs
         If Regex.IsMatch(input, pattern) Then 
            Console.WriteLine(input + ": matched")
         Else
            Console.WriteLine(input + ": match failed")
         End If   
      Next
   End Sub
End Module
' The example displays the following output:
'       111 111-1111: matched
'       222-2222: matched
'       222 333-444: match failed
'       (212) 111-1111: matched
'       111-AB1-1111: match failed
'       212-111-1111: matched
'       01 999-9999: match failed

Non-digit character: \D

\D matches any non-digit character. It is equivalent to the \P{Nd} regular expression pattern.

If ECMAScript-compliant behavior is specified, \D is equivalent to [^0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.

The following example illustrates the \D language element. It tests whether a string such as a part number consists of the appropriate combination of digit and non-digit characters. The regular expression pattern ^\D\d{1,5}\D*$ is defined as shown in the following table.

Element Description
^ Begin the match at the beginning of the input string.
\D Match a non-digit character.
\d{1,5} Match from one to five decimal digits.
\D* Match zero or more non-digit characters.
$ Match the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"^\D\d{1,5}\D*$"; 
      string[] inputs = { "A1039C", "AA0001", "C18A", "Y938518" }; 
      
      foreach (string input in inputs)
      {
         if (Regex.IsMatch(input, pattern))
            Console.WriteLine(input + ": matched");
         else
            Console.WriteLine(input + ": match failed");
      }
   }
}
// The example displays the following output:
//       A1039C: matched
//       AA0001: match failed
//       C18A: matched
//       Y938518: match failed
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "^\D\d{1,5}\D*$" 
      Dim inputs() As String = { "A1039C", "AA0001", "C18A", "Y938518" } 
      
      For Each input As String In inputs
         If Regex.IsMatch(input, pattern) Then
            Console.WriteLine(input + ": matched")
         Else
            Console.WriteLine(input + ": match failed")
         End If   
      Next
   End Sub
End Module
' The example displays the following output:
'       A1039C: matched
'       AA0001: match failed
'       C18A: matched
'       Y938518: match failed

Supported Unicode general categories

Unicode defines the general categories listed in the following table. For more information, see the "UCD File Format" and "General Category Values" subtopics at the Unicode Character Database.

Category Description
Lu Letter, Uppercase
Ll Letter, Lowercase
Lt Letter, Titlecase
Lm Letter, Modifier
Lo Letter, Other
L All letter characters. This includes the Lu, Ll, Lt, Lm, and Lo characters.
Mn Mark, Nonspacing
Mc Mark, Spacing Combining
Me Mark, Enclosing
M All diacritic marks. This includes the Mn, Mc, and Me categories.
Nd Number, Decimal Digit
Nl Number, Letter
No Number, Other
N All numbers. This includes the Nd, Nl, and No categories.
Pc Punctuation, Connector
Pd Punctuation, Dash
Ps Punctuation, Open
Pe Punctuation, Close
Pi Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po Punctuation, Other
P All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
Sm Symbol, Math
Sc Symbol, Currency
Sk Symbol, Modifier
So Symbol, Other
S All symbols. This includes the Sm, Sc, Sk, and So categories.
Zs Separator, Space
Zl Separator, Line
Zp Separator, Paragraph
Z All separator characters. This includes the Zs, Zl, and Zp categories.
Cc Other, Control
Cf Other, Format
Cs Other, Surrogate
Co Other, Private Use
Cn Other, Not Assigned (no characters have this property)
C All control characters. This includes the Cc, Cf, Cs, Co, and Cn categories.

You can determine the Unicode category of any particular character by passing that character to the Char.GetUnicodeCategory method. The following example uses the GetUnicodeCategory method to determine the category of each element in an array that contains selected Latin characters.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      char[] chars = { 'a', 'X', '8', ',', ' ', '\u0009', '!' };
      
      foreach (char ch in chars)
         Console.WriteLine("'{0}': {1}", Regex.Escape(ch.ToString()), 
                           Char.GetUnicodeCategory(ch));
   }
}
// The example displays the following output:
//       'a': LowercaseLetter
//       'X': UppercaseLetter
//       '8': DecimalDigitNumber
//       ',': OtherPunctuation
//       '\ ': SpaceSeparator
//       '\t': Control
//       '!': OtherPunctuation
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim chars() As Char = { "a"c, "X"c, "8"c, ","c, " "c, ChrW(9), "!"c }
      
      For Each ch As Char In chars
         Console.WriteLine("'{0}': {1}", Regex.Escape(ch.ToString()), _
                           Char.GetUnicodeCategory(ch))
      Next         
   End Sub
End Module
' The example displays the following output:
'       'a': LowercaseLetter
'       'X': UppercaseLetter
'       '8': DecimalDigitNumber
'       ',': OtherPunctuation
'       '\ ': SpaceSeparator
'       '\t': Control
'       '!': OtherPunctuation

Supported named blocks

.NET provides the named blocks listed in the following table. The set of supported named blocks is based on Unicode 4.0 and Perl 5.6. For a regular expression that uses named blocks, see the Unicode category or Unicode block: \p{} section.

Code point range Block name
0000 - 007F IsBasicLatin
0080 - 00FF IsLatin-1Supplement
0100 - 017F IsLatinExtended-A
0180 - 024F IsLatinExtended-B
0250 - 02AF IsIPAExtensions
02B0 - 02FF IsSpacingModifierLetters
0300 - 036F IsCombiningDiacriticalMarks
0370 - 03FF IsGreek or IsGreekandCoptic
0400 - 04FF IsCyrillic
0500 - 052F IsCyrillicSupplement
0530 - 058F IsArmenian
0590 - 05FF IsHebrew
0600 - 06FF IsArabic
0700 - 074F IsSyriac
0780 - 07BF IsThaana
0900 - 097F IsDevanagari
0980 - 09FF IsBengali
0A00 - 0A7F IsGurmukhi
0A80 - 0AFF IsGujarati
0B00 - 0B7F IsOriya
0B80 - 0BFF IsTamil
0C00 - 0C7F IsTelugu
0C80 - 0CFF IsKannada
0D00 - 0D7F IsMalayalam
0D80 - 0DFF IsSinhala
0E00 - 0E7F IsThai
0E80 - 0EFF IsLao
0F00 - 0FFF IsTibetan
1000 - 109F IsMyanmar
10A0 - 10FF IsGeorgian
1100 - 11FF IsHangulJamo
1200 - 137F IsEthiopic
13A0 - 13FF IsCherokee
1400 - 167F IsUnifiedCanadianAboriginalSyllabics
1680 - 169F IsOgham
16A0 - 16FF IsRunic
1700 - 171F IsTagalog
1720 - 173F IsHanunoo
1740 - 175F IsBuhid
1760 - 177F IsTagbanwa
1780 - 17FF IsKhmer
1800 - 18AF IsMongolian
1900 - 194F IsLimbu
1950 - 197F IsTaiLe
19E0 - 19FF IsKhmerSymbols
1D00 - 1D7F IsPhoneticExtensions
1E00 - 1EFF IsLatinExtendedAdditional
1F00 - 1FFF IsGreekExtended
2000 - 206F IsGeneralPunctuation
2070 - 209F IsSuperscriptsandSubscripts
20A0 - 20CF IsCurrencySymbols
20D0 - 20FF IsCombiningDiacriticalMarksforSymbols or IsCombiningMarksforSymbols
2100 - 214F IsLetterlikeSymbols
2150 - 218F IsNumberForms
2190 - 21FF IsArrows
2200 - 22FF IsMathematicalOperators
2300 - 23FF IsMiscellaneousTechnical
2400 - 243F IsControlPictures
2440 - 245F IsOpticalCharacterRecognition
2460 - 24FF IsEnclosedAlphanumerics
2500 - 257F IsBoxDrawing
2580 - 259F IsBlockElements
25A0 - 25FF IsGeometricShapes
2600 - 26FF IsMiscellaneousSymbols
2700 - 27BF IsDingbats
27C0 - 27EF IsMiscellaneousMathematicalSymbols-A
27F0 - 27FF IsSupplementalArrows-A
2800 - 28FF IsBraillePatterns
2900 - 297F IsSupplementalArrows-B
2980 - 29FF IsMiscellaneousMathematicalSymbols-B
2A00 - 2AFF IsSupplementalMathematicalOperators
2B00 - 2BFF IsMiscellaneousSymbolsandArrows
2E80 - 2EFF IsCJKRadicalsSupplement
2F00 - 2FDF IsKangxiRadicals
2FF0 - 2FFF IsIdeographicDescriptionCharacters
3000 - 303F IsCJKSymbolsandPunctuation
3040 - 309F IsHiragana
30A0 - 30FF IsKatakana
3100 - 312F IsBopomofo
3130 - 318F IsHangulCompatibilityJamo
3190 - 319F IsKanbun
31A0 - 31BF IsBopomofoExtended
31F0 - 31FF IsKatakanaPhoneticExtensions
3200 - 32FF IsEnclosedCJKLettersandMonths
3300 - 33FF IsCJKCompatibility
3400 - 4DBF IsCJKUnifiedIdeographsExtensionA
4DC0 - 4DFF IsYijingHexagramSymbols
4E00 - 9FFF IsCJKUnifiedIdeographs
A000 - A48F IsYiSyllables
A490 - A4CF IsYiRadicals
AC00 - D7AF IsHangulSyllables
D800 - DB7F IsHighSurrogates
DB80 - DBFF IsHighPrivateUseSurrogates
DC00 - DFFF IsLowSurrogates
E000 - F8FF IsPrivateUse or IsPrivateUseArea
F900 - FAFF IsCJKCompatibilityIdeographs
FB00 - FB4F IsAlphabeticPresentationForms
FB50 - FDFF IsArabicPresentationForms-A
FE00 - FE0F IsVariationSelectors
FE20 - FE2F IsCombiningHalfMarks
FE30 - FE4F IsCJKCompatibilityForms
FE50 - FE6F IsSmallFormVariants
FE70 - FEFF IsArabicPresentationForms-B
FF00 - FFEF IsHalfwidthandFullwidthForms
FFF0 - FFFF IsSpecials
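As a brief illustration of how a block name from the table is used with \p{}, the following sketch tests whether every character of a string falls in a given named block. The class NamedBlockDemo and the helper InBlock are hypothetical names, and the sample strings are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; helper names are assumptions for illustration.
public static class NamedBlockDemo
{
    // Returns true if every character in text belongs to the named block.
    // An unrecognized block name causes the Regex class to throw ArgumentException.
    public static bool InBlock(string text, string blockName)
    {
        return Regex.IsMatch(text, @"^\p{" + blockName + @"}+$");
    }

    public static void Main()
    {
        string greek = "\u03A9\u03BC\u03B5\u03B3\u03B1";   // Greek letters in 0370 - 03FF
        string cyrillic = "\u0414\u0430";                   // Cyrillic letters in 0400 - 04FF

        Console.WriteLine(InBlock(greek, "IsGreek"));          // True
        Console.WriteLine(InBlock("Omega", "IsGreek"));        // False
        Console.WriteLine(InBlock("Omega", "IsBasicLatin"));   // True
        Console.WriteLine(InBlock(cyrillic, "IsCyrillic"));    // True
    }
}
```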

Character class subtraction: [base_group - [excluded_group]]

A character class defines a set of characters. Character class subtraction yields a set of characters that is the result of excluding the characters in one character class from another character class.

A character class subtraction expression has the following form:

[ base_group -[ excluded_group ]]

The square brackets ([]) and hyphen (-) are mandatory. The base_group is a positive character group or a negative character group. The excluded_group component is another positive or negative character group, or another character class subtraction expression (that is, you can nest character class subtraction expressions).

For example, suppose you have a base group that consists of the character range from "a" through "z". To define the set of characters that consists of the base group except for the character "m", use [a-z-[m]]. To define the set of characters that consists of the base group except for the set of characters "d", "j", and "p", use [a-z-[djp]]. To define the set of characters that consists of the base group except for the character range from "m" through "p", use [a-z-[m-p]].
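The three expressions above can be checked with a short sketch that filters the lowercase alphabet through each one. The class SubtractionDemo and the helper Filter are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; helper names are assumptions for illustration.
public static class SubtractionDemo
{
    // Concatenates every character of input that matches the character class.
    public static string Filter(string input, string characterClass)
    {
        return string.Concat(Regex.Matches(input, characterClass)
                                  .Cast<Match>()
                                  .Select(m => m.Value));
    }

    public static void Main()
    {
        string alphabet = "abcdefghijklmnopqrstuvwxyz";

        Console.WriteLine(Filter(alphabet, "[a-z-[m]]"));    // abcdefghijklnopqrstuvwxyz
        Console.WriteLine(Filter(alphabet, "[a-z-[djp]]"));  // abcefghiklmnoqrstuvwxyz
        Console.WriteLine(Filter(alphabet, "[a-z-[m-p]]"));  // abcdefghijklqrstuvwxyz
    }
}
```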

Consider the nested character class subtraction expression, [a-z-[d-w-[m-o]]]. The expression is evaluated from the innermost character range outward. First, the character range from "m" through "o" is subtracted from the character range "d" through "w", which yields the set of characters from "d" through "l" and "p" through "w". That set is then subtracted from the character range from "a" through "z", which yields the set of characters [abcmnoxyz].
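The evaluation order described above can be confirmed by enumerating which lowercase letters each expression accepts. This is a minimal sketch; the class NestedSubtractionDemo and the helper Members are hypothetical names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical demo class; helper names are assumptions for illustration.
public static class NestedSubtractionDemo
{
    // Lists the lowercase ASCII letters that the character class matches.
    public static string Members(string characterClass)
    {
        var result = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++)
        {
            if (Regex.IsMatch(c.ToString(), characterClass))
                result.Append(c);
        }
        return result.ToString();
    }

    public static void Main()
    {
        // Inner step: "d" through "w" minus "m" through "o".
        Console.WriteLine(Members("[d-w-[m-o]]"));        // defghijklpqrstuvw

        // Outer step: "a" through "z" minus the inner result.
        Console.WriteLine(Members("[a-z-[d-w-[m-o]]]"));  // abcmnoxyz
    }
}
```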

You can use any character class with character class subtraction. To define the set of characters that consists of all Unicode characters from \u0000 through \uFFFF except whitespace characters (\s), the characters in the punctuation general category (\p{P}), the characters in the IsGreek named block (\p{IsGreek}), and the Unicode NEXT LINE control character (\x85), use [\u0000-\uFFFF-[\s\p{P}\p{IsGreek}\x85]].

Choose character classes for a character class subtraction expression that will yield useful results. Avoid an expression that yields an empty set of characters, which cannot match anything, or an expression that is equivalent to the original base group. For example, the empty set is the result of the expression [\p{IsBasicLatin}-[\x00-\x7F]], which subtracts all characters in the IsBasicLatin character range from the IsBasicLatin named block. Similarly, the original base group is the result of the expression [a-z-[0-9]]. This is because the base group, which is the character range of letters from "a" through "z", does not contain any characters in the excluded group, which is the character range of decimal digits from "0" through "9".
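Both degenerate cases can be observed with a short sketch. The class DegenerateSubtractionDemo, its helper Collect, and the sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical demo class; helper names are assumptions for illustration.
public static class DegenerateSubtractionDemo
{
    // Concatenates every character of input that matches the character class.
    public static string Collect(string input, string characterClass)
    {
        var result = new StringBuilder();
        foreach (Match m in Regex.Matches(input, characterClass))
            result.Append(m.Value);
        return result.ToString();
    }

    public static void Main()
    {
        // Empty set: every IsBasicLatin character is also in \x00-\x7F,
        // so nothing is left to match.
        string emptySet = @"[\p{IsBasicLatin}-[\x00-\x7F]]";
        Console.WriteLine(Regex.IsMatch("Hello, world! 123 \u00E9", emptySet));  // False

        // No-op subtraction: the digits were never part of [a-z],
        // so the result is the same as the base group alone.
        Console.WriteLine(Collect("abc123xyz", "[a-z-[0-9]]"));  // abcxyz
        Console.WriteLine(Collect("abc123xyz", "[a-z]"));        // abcxyz
    }
}
```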

The following example defines a regular expression, ^[0-9-[2468]]+$, that matches input strings consisting only of zero and odd digits. The regular expression is interpreted as shown in the following table.

Element Description
^ Begin the match at the start of the input string.
[0-9-[2468]]+ Match one or more occurrences of any character from 0 to 9 except for 2, 4, 6, and 8. In other words, match one or more occurrences of zero or an odd digit.
$ End the match at the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] inputs = { "123", "13579753", "3557798", "335599901" };
      string pattern = @"^[0-9-[2468]]+$";
      
      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success) 
            Console.WriteLine(match.Value);
      }      
   }
}
// The example displays the following output:
//       13579753
//       335599901
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim inputs() As String = { "123", "13579753", "3557798", "335599901" }
      Dim pattern As String = "^[0-9-[2468]]+$"
      
      For Each input As String In inputs
         Dim match As Match = Regex.Match(input, pattern)
         If match.Success Then Console.WriteLine(match.Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       13579753
'       335599901

See also