Backreference Constructs in Regular Expressions

Backreferences provide a convenient way to identify a repeated character or substring within a string. For example, if the input string contains multiple occurrences of an arbitrary substring, you can match the first occurrence with a capturing group, and then use a backreference to match subsequent occurrences of the substring.

Note

A separate syntax is used to refer to named and numbered capturing groups in replacement strings. For more information, see Substitutions.

.NET defines separate language elements to refer to numbered and named capturing groups. For more information about capturing groups, see Grouping Constructs.
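To make the note about replacement strings concrete, the following minimal sketch contrasts the two syntaxes: \1 and \k<word> refer to a group inside the pattern, while $1 and ${word} refer to the same group inside the replacement string. The class name, method names, and sample inputs are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SubstitutionSketch
{
    // Collapses an immediately repeated word ("the the") into one occurrence.
    // \1 is the backreference in the pattern; $1 is the substitution in the
    // replacement string that refers to the same numbered group.
    public static string CollapseNumbered(string input) =>
        Regex.Replace(input, @"\b(\w+)\s\1\b", "$1");

    // The same idea with a named group: \k<word> in the pattern,
    // ${word} in the replacement string.
    public static string CollapseNamed(string input) =>
        Regex.Replace(input, @"\b(?<word>\w+)\s\k<word>\b", "${word}");

    public static void Main()
    {
        Console.WriteLine(CollapseNumbered("the the cat"));   // the cat
        Console.WriteLine(CollapseNamed("it is is fine"));    // it is fine
    }
}
```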

Numbered Backreferences

A numbered backreference uses the following syntax:

\number

where number is the ordinal position of the capturing group in the regular expression. For example, \4 matches the contents of the fourth capturing group. If number is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException. For example, the regular expression \b(\w+)\s\1 is valid, because (\w+) is the first and only capturing group in the expression. On the other hand, \b(\w+)\s\2 is invalid and throws an ArgumentException, because there is no capturing group numbered 2. In addition, if number identifies a capturing group in a particular ordinal position, but that capturing group has been assigned a numeric name different from its ordinal position, the regular expression parser also throws an ArgumentException.
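The three cases described above can be checked with a short sketch. The helper IsValidPattern is a hypothetical name introduced here; it only reports whether the Regex constructor accepts a pattern or throws an ArgumentException.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class NumberedBackreferenceSketch
{
    // Returns true if the pattern parses, false if the parser rejects it
    // with an ArgumentException.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // \1 refers to the only capturing group, so the pattern is valid.
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\1"));   // True

        // There is no second capturing group, so \2 is rejected.
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\2"));   // False

        // The group in ordinal position 1 is explicitly named "2",
        // so \1 no longer identifies any group.
        Console.WriteLine(IsValidPattern(@"(?<2>\w)\1"));    // False

        // The valid pattern finds an immediately repeated word.
        Console.WriteLine(Regex.Match("Paris in the the spring", @"\b(\w+)\s\1").Value);   // the the
    }
}
```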

Note the ambiguity between octal escape codes (such as \16) and \number backreferences that use the same notation. This ambiguity is resolved as follows:

  • The expressions \1 through \9 are always interpreted as backreferences, and not as octal codes.

  • If the first digit of a multidigit expression is 8 or 9 (such as \80 or \91), the expression is interpreted as a literal.

  • Expressions from \10 and greater are considered backreferences if there is a backreference corresponding to that number; otherwise, they are interpreted as octal codes.

  • If a regular expression contains a backreference to an undefined group number, a parsing error occurs, and the regular expression engine throws an ArgumentException.

If the ambiguity is a problem, you can use the \k<name> notation, which is unambiguous and cannot be confused with octal character codes. Similarly, hexadecimal codes such as \xdd are unambiguous and cannot be confused with backreferences.
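As a rough illustration of the \10 rule and of the unambiguous \k<name> notation, the sketch below uses two hypothetical patterns: one with a single capturing group, where \10 is expected to fall back to octal code 10 (the character U+0008), and one with ten capturing groups, where \10 refers to the tenth group. The class and member names are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class OctalAmbiguitySketch
{
    // Only one capturing group: \10 cannot be a backreference here,
    // so it is read as octal code 10, which is the character U+0008.
    public const string OneGroup = @"^(a)\10$";

    // Ten capturing groups: \10 is a backreference to the tenth group, (j).
    public const string TenGroups = @"^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("a\u0008", OneGroup));             // True
        Console.WriteLine(Regex.IsMatch("abcdefghijj", TenGroups));        // True
        Console.WriteLine(Regex.IsMatch("abcdefghij\u0008", TenGroups));   // False

        // \k<1> is always a backreference, never an octal code.
        Console.WriteLine(Regex.IsMatch("aa", @"^(a)\k<1>$"));             // True
    }
}
```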

The following example finds doubled word characters in a string. It defines a regular expression, (\w)\1, which consists of the following elements.

Element    Description
(\w)       Match a word character and assign it to the first capturing group.
\1         Match the next character that is the same as the value of the first capturing group.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w)\1";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\w)\1"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

Named Backreferences

A named backreference is defined by using the following syntax:

\k<name>

or:

\k'name'

where name is the name of a capturing group defined in the regular expression pattern. If name is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException.
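The examples in this section use the angle-bracket form. As a small sketch of the alternative single-quote form, the following code names a group with (?'word'...) and refers back to it with \k'word'. The class name, method name, and sample input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class NamedBackreferenceSketch
{
    // Returns the first word that is immediately repeated, or an empty
    // string if the input contains no repeated word.
    public static string FirstRepeatedWord(string input)
    {
        // (?'word'\w+) defines the group; \k'word' must match the same text again.
        Match m = Regex.Match(input, @"\b(?'word'\w+)\s+\k'word'\b");
        return m.Success ? m.Groups["word"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstRepeatedWord("this is is a test"));   // is
    }
}
```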

The following example finds doubled word characters in a string. It defines a regular expression, (?<char>\w)\k<char>, which consists of the following elements.

Element         Description
(?<char>\w)     Match a word character and assign it to a capturing group named char.
\k<char>        Match the next character that is the same as the value of the char capturing group.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<char>\w)\k<char>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<char>\w)\k<char>"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

Named numeric backreferences

In a named backreference with \k, name can also be the string representation of a number. For example, the following example uses the regular expression (?<2>\w)\k<2> to find doubled word characters in a string. In this case, the example defines a capturing group that is explicitly named "2", and the backreference is correspondingly named "2".

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<2>\w)\k<2>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<2>\w)\k<2>"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

If name is the string representation of a number, and no capturing group has that name, \k<name> is the same as the backreference \number, where number is the ordinal position of the capture. In the following example, there is a single capturing group named char. The backreference construct refers to it as \k<1>. As the output from the example shows, the call to Regex.IsMatch succeeds because char is the first capturing group.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<char>\w)\k<1>"));    
      // Displays "True".
   }
}


Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Console.WriteLine(Regex.IsMatch("aa", "(?<char>\w)\k<1>"))    
      ' Displays "True".
   End Sub
End Module

However, if name is the string representation of a number and the capturing group in that position has been explicitly assigned a numeric name, the regular expression parser cannot identify the capturing group by its ordinal position. Instead, it throws an ArgumentException. The only capturing group in the following example is named "2". Because the \k construct is used to define a backreference named "1", the regular expression parser is unable to identify the first capturing group and throws an exception.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<2>\w)\k<1>"));    
      // Throws an ArgumentException.
   }
}


Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Console.WriteLine(Regex.IsMatch("aa", "(?<2>\w)\k<1>"))    
      ' Throws an ArgumentException.
   End Sub
End Module

What Backreferences Match

A backreference refers to the most recent definition of a group (the definition most immediately to the left, when matching left to right). When a group makes multiple captures, a backreference refers to the most recent capture.

The following example includes a regular expression pattern, (?<1>a)(?<1>\1b)*, which redefines the \1 named group. The following table describes each pattern in the regular expression.

Pattern        Description
(?<1>a)        Match the character "a" and assign the result to the capturing group named 1.
(?<1>\1b)*     Match zero or more occurrences of the group named 1 along with a "b", and assign the result to the capturing group named 1.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<1>a)(?<1>\1b)*";
      string input = "aababb";
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("Match: " + match.Value);
         foreach (Group group in match.Groups)
            Console.WriteLine("   Group: " + group.Value);
      }
   }
}
// The example displays the following output:
//       Match: aababb
//          Group: aababb
//          Group: abb

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<1>a)(?<1>\1b)*"
      Dim input As String = "aababb"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Match: " + match.Value)
         For Each group As Group In match.Groups
            Console.WriteLine("   Group: " + group.Value)
         Next
      Next
   End Sub
End Module
' The example displays the following output:
'       Match: aababb
'          Group: aababb
'          Group: abb

In comparing the regular expression with the input string ("aababb"), the regular expression engine performs the following operations:

  1. It starts at the beginning of the string, and successfully matches "a" with the expression (?<1>a). The value of the 1 group is now "a".

  2. It advances to the second character, and successfully matches the string "ab" with the expression \1b, or "ab". It then assigns the result, "ab", to \1.

  3. It advances to the fourth character. The expression (?<1>\1b)* is to be matched zero or more times, so it successfully matches the string "abb" with the expression \1b. It assigns the result, "abb", back to \1.

In this example, * is a looping quantifier: it is evaluated repeatedly until the regular expression engine cannot match the pattern it defines. Looping quantifiers do not clear group definitions.
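The successive values of the 1 group described in the steps above can be observed through the group's Captures collection, which records every capture in order. This is a minimal sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CaptureHistorySketch
{
    // Returns every value captured by group 1, in the order the
    // regular expression engine captured it.
    public static string[] GroupOneCaptures(string input)
    {
        Match match = Regex.Match(input, @"(?<1>a)(?<1>\1b)*");
        CaptureCollection captures = match.Groups[1].Captures;
        string[] values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
            values[i] = captures[i].Value;
        return values;
    }

    public static void Main()
    {
        // Each iteration of the quantifier reuses the previous capture through \1.
        Console.WriteLine(string.Join(", ", GroupOneCaptures("aababb")));   // a, ab, abb
    }
}
```

Group.Value reports only the last of these captures ("abb"), which is also the value a later backreference to the group would use.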

If a group has not captured any substrings, a backreference to that group is undefined and never matches. This is illustrated by the regular expression pattern \b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b, which is defined as follows:

Pattern         Description
\b              Begin the match on a word boundary.
(\p{Lu}{2})     Match two uppercase letters. This is the first capturing group.
(\d{2})?        Match zero or one occurrence of two decimal digits. This is the second capturing group.
(\p{Lu}{2})     Match two uppercase letters. This is the third capturing group.
\b              End the match on a word boundary.

An input string can match this regular expression even if the two decimal digits that are defined by the second capturing group are not present. The following example shows that even though the match is successful, an empty capturing group is found between two successful capturing groups.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b";
      string[] inputs = { "AA22ZZ", "AABB" };
      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success)
         {
            Console.WriteLine("Match in {0}: {1}", input, match.Value);
            if (match.Groups.Count > 1)
            {
               for (int ctr = 1; ctr <= match.Groups.Count - 1; ctr++)
               {
                  if (match.Groups[ctr].Success)
                     Console.WriteLine("Group {0}: {1}", 
                                       ctr, match.Groups[ctr].Value);
                  else
                     Console.WriteLine("Group {0}: <no match>", ctr);
               }
            }
         }
         Console.WriteLine();
      }      
   }
}
// The example displays the following output:
//       Match in AA22ZZ: AA22ZZ
//       Group 1: AA
//       Group 2: 22
//       Group 3: ZZ
//       
//       Match in AABB: AABB
//       Group 1: AA
//       Group 2: <no match>
//       Group 3: BB

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b"
      Dim inputs() As String = { "AA22ZZ", "AABB" }
      For Each input As String In inputs
         Dim match As Match = Regex.Match(input, pattern)
         If match.Success Then
            Console.WriteLine("Match in {0}: {1}", input, match.Value)
            If match.Groups.Count > 1 Then
               For ctr As Integer = 1 To match.Groups.Count - 1
                  If match.Groups(ctr).Success Then
                     Console.WriteLine("Group {0}: {1}", _
                                       ctr, match.Groups(ctr).Value)
                  Else
                     Console.WriteLine("Group {0}: <no match>", ctr)
                  End If      
               Next
            End If
         End If
         Console.WriteLine()
      Next      
   End Sub
End Module
' The example displays the following output:
'       Match in AA22ZZ: AA22ZZ
'       Group 1: AA
'       Group 2: 22
'       Group 3: ZZ
'       
'       Match in AABB: AABB
'       Group 1: AA
'       Group 2: <no match>
'       Group 3: BB
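The preceding example shows the empty group but does not itself use a backreference. As a sketch of the rule that a backreference to a group with no capture never matches, the following hypothetical variant of the pattern adds \2 after the optional second group. It assumes the default regular expression options; the class and member names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class UnsetGroupSketch
{
    // Variant of the pattern above: \2 must repeat whatever the optional
    // second group captured. Uses the default RegexOptions.
    public const string Pattern = @"^(\p{Lu}{2})(\d{2})?\2(\p{Lu}{2})$";

    public static bool Matches(string input) => Regex.IsMatch(input, Pattern);

    public static void Main()
    {
        // Group 2 captures "22", and \2 matches the second "22".
        Console.WriteLine(Matches("AA2222ZZ"));   // True

        // Group 2 captures nothing, so \2 is undefined and cannot match,
        // even though the digits themselves are optional.
        Console.WriteLine(Matches("AAZZ"));       // False
    }
}
```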

See also