Character Escapes in Regular Expressions

The backslash (\) in a regular expression indicates one of the following:

  • The character that follows it is a special character, as shown in the table in the following section. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, \t represents a tab, and \x20 represents a space.

  • A character that would otherwise be interpreted as an unescaped language construct should be interpreted literally. For example, a brace ({) begins the definition of a quantifier, but a backslash followed by a brace (\{) indicates that the regular expression engine should match the brace. Similarly, a single backslash marks the beginning of an escaped language construct, but two backslashes (\\) indicate that the regular expression engine should match the backslash.
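The two roles of the backslash described above can be sketched as follows. This is a minimal illustration, not part of the original sample: the class name EscapeBasics and its Matches helper are hypothetical, and only Regex.IsMatch is an actual .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex.IsMatch is a real .NET API here.
public static class EscapeBasics
{
    // Returns true when the pattern matches somewhere in the input.
    public static bool Matches(string input, string pattern) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        // Role 1: the backslash introduces a special character or anchor.
        Console.WriteLine(Matches("a\tb", @"a\tb"));            // True: \t matches a tab
        Console.WriteLine(Matches("a b", @"a\x20b"));           // True: \x20 matches a space
        Console.WriteLine(Matches("a cat sat", @"\bcat\b"));    // True: cat is a whole word
        Console.WriteLine(Matches("concatenate", @"\bcat\b"));  // False: no word boundaries

        // Role 2: the backslash makes the next character literal.
        Console.WriteLine(Matches("a{b", @"a\{b"));             // True: \{ matches a brace
        Console.WriteLine(Matches(@"C:\temp", @"C:\\temp"));    // True: \\ matches one backslash
    }
}
```

The patterns are written as verbatim strings (@"...") so that the C# compiler passes each backslash through to the regular expression engine unchanged.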

Note

Character escapes are recognized in regular expression patterns but not in replacement patterns.
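The difference between patterns and replacement patterns can be seen in a short sketch. The class ReplacementEscapeDemo and its ReplaceSpaces helper are hypothetical names introduced for illustration; Regex.Replace is the real API. The example assumes the replacement string is passed as a verbatim string, so the backslash reaches the regex engine as written.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex.Replace is a real .NET API here.
public static class ReplacementEscapeDemo
{
    // Replaces every space in the input; \x20 in the pattern is a recognized escape.
    public static string ReplaceSpaces(string input, string replacement) =>
        Regex.Replace(input, @"\x20", replacement);

    public static void Main()
    {
        // Verbatim string: the replacement is a backslash followed by the letter t.
        // The regex engine does not convert it to a tab.
        string notEscaped = ReplaceSpaces("a b", @"\t");

        // Regular string: the C# compiler converts \t to a tab before Regex sees it.
        string realTab = ReplaceSpaces("a b", "\t");

        Console.WriteLine(notEscaped.Length); // 4: a, backslash, t, b
        Console.WriteLine(realTab.Length);    // 3: a, tab, b
    }
}
```

To insert a control character through a replacement pattern, one option is to let the host language produce it, as the second call does.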

Character Escapes in .NET

The following table lists the character escapes supported by regular expressions in .NET.

Character or sequence    Description
All characters except for the following:

. $ ^ { [ ( | ) * + ? \
Characters other than those listed in the Character or sequence column have no special meaning in regular expressions; they match themselves.

The characters included in the Character or sequence column are special regular expression language elements. To match them in a regular expression, they must be escaped or included in a positive character group. For example, the regular expression \$\d+ or [$]\d+ matches "$1200".
\a Matches a bell (alarm) character, \u0007.
\b In a [character_group] character class, matches a backspace, \u0008. (See Character Classes.) Outside a character class, \b is an anchor that matches a word boundary. (See Anchors.)
\t Matches a tab, \u0009.
\r Matches a carriage return, \u000D. Note that \r is not equivalent to the newline character, \n.
\v Matches a vertical tab, \u000B.
\f Matches a form feed, \u000C.
\n Matches a new line, \u000A.
\e Matches an escape, \u001B.
\nnn Matches an ASCII character, where nnn consists of two or three digits that represent the octal character code. For example, \040 represents a space character. This construct is interpreted as a backreference if it has only one digit (for example, \2) or if it corresponds to the number of a capturing group. (See Backreference Constructs.)
\xnn Matches an ASCII character, where nn is a two-digit hexadecimal character code.
\cX Matches an ASCII control character, where X is the letter of the control character. For example, \cC is CTRL-C.
\unnnn Matches a UTF-16 code unit whose value is nnnn hexadecimal. Note: The Perl 5 character escape that is used to specify Unicode is not supported by .NET. The Perl 5 character escape has the form \x{####…}, where #### is a series of hexadecimal digits. Instead, use \unnnn.
\ When followed by a character that is not recognized as an escaped character, matches that character. For example, \* matches an asterisk (*) and is the same as \x2A.
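Several rows of the table describe different spellings of the same character. The following sketch checks a few of those equivalences. The class EscapeTableDemo and its FullMatch helper are hypothetical names introduced here; the helper wraps each pattern in \A(?:...)\z, an assumption made so that the pattern must match the entire input without adding a capturing group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex.IsMatch is a real .NET API here.
public static class EscapeTableDemo
{
    // Returns true when the pattern matches the entire input string.
    // The non-capturing group keeps the pattern free of numbered groups.
    public static bool FullMatch(string input, string pattern) =>
        Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");

    public static void Main()
    {
        // A special character can be escaped or placed in a positive character group.
        Console.WriteLine(FullMatch("$1200", @"\$\d+"));   // True
        Console.WriteLine(FullMatch("$1200", @"[$]\d+"));  // True

        // Octal, hexadecimal, and UTF-16 escapes for the space character.
        Console.WriteLine(FullMatch(" ", @"\040"));        // True
        Console.WriteLine(FullMatch(" ", @"\x20"));        // True
        Console.WriteLine(FullMatch(" ", @"\u0020"));      // True

        // \cC is CTRL-C (U+0003); \* and \x2A both match an asterisk.
        Console.WriteLine(FullMatch("\u0003", @"\cC"));    // True
        Console.WriteLine(FullMatch("*", @"\*"));          // True
        Console.WriteLine(FullMatch("*", @"\x2A"));        // True

        // Inside a character class, \b is a backspace rather than a word boundary.
        Console.WriteLine(FullMatch("\u0008", @"[\b]"));   // True

        // \r and \n are different characters.
        Console.WriteLine(FullMatch("\r", @"\n"));         // False
    }
}
```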

An Example

The following example illustrates the use of character escapes in a regular expression. It parses a string that contains the names of the world's largest cities and their populations in 2009. Each city name is separated from its population by a tab (\t) or a vertical bar (| or \u007c). Individual cities and their populations are separated from each other by a line feed, optionally preceded by a carriage return.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      // \G continues from the previous match; [\t\u007c] accepts a tab or a vertical bar.
      string delimited = @"\G(.+)[\t\u007c](.+)\r?\n";
      string input = "Mumbai, India|13,922,125\n" + 
                            "Shanghai, China\t13,831,900\n" + 
                            "Karachi, Pakistan|12,991,000\n" + 
                            "Delhi, India\t12,259,230\n" + 
                            "Istanbul, Turkey|11,372,613\n";
      Console.WriteLine("Population of the World's Largest Cities, 2009");
      Console.WriteLine();
      Console.WriteLine("{0,-20} {1,10}", "City", "Population");
      Console.WriteLine();
      foreach (Match match in Regex.Matches(input, delimited))
         Console.WriteLine("{0,-20} {1,10}", match.Groups[1].Value, 
                                            match.Groups[2].Value);
   }
}
// The example displays the following output:
//       Population of the World's Largest Cities, 2009
//       
//       City                 Population
//       
//       Mumbai, India        13,922,125
//       Shanghai, China      13,831,900
//       Karachi, Pakistan    12,991,000
//       Delhi, India         12,259,230
//       Istanbul, Turkey     11,372,613

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim delimited As String = "\G(.+)[\t\u007c](.+)\r?\n"
      Dim input As String = "Mumbai, India|13,922,125" + vbCrLf + _
                            "Shanghai, China" + vbTab + "13,831,900" + vbCrLf + _
                            "Karachi, Pakistan|12,991,000" + vbCrLf + _
                            "Delhi, India" + vbTab + "12,259,230" + vbCrLf + _
                            "Istanbul, Turkey|11,372,613" + vbCrLf
      Console.WriteLine("Population of the World's Largest Cities, 2009")
      Console.WriteLine()
      Console.WriteLine("{0,-20} {1,10}", "City", "Population")
      Console.WriteLine()
      For Each match As Match In Regex.Matches(input, delimited)
         Console.WriteLine("{0,-20} {1,10}", match.Groups(1).Value, _
                                            match.Groups(2).Value)
      Next                         
   End Sub
End Module
' The example displays the following output:
'       Population of the World's Largest Cities, 2009
'       
'       City                 Population
'       
'       Mumbai, India        13,922,125
'       Shanghai, China      13,831,900
'       Karachi, Pakistan    12,991,000
'       Delhi, India         12,259,230
'       Istanbul, Turkey     11,372,613

The regular expression \G(.+)[\t\u007c](.+)\r?\n is interpreted as shown in the following table.

Pattern    Description
\G Begin the match where the last match ended.
(.+) Match any character one or more times. This is the first capturing group.
[\t\u007c] Match a tab (\t) or a vertical bar (|).
(.+) Match any character one or more times. This is the second capturing group.
\r?\n Match zero or one occurrence of a carriage return followed by a new line.

See also