Backreference Constructs in Regular Expressions

Backreferences provide a convenient way to identify a repeated character or substring within a string. For example, if the input string contains multiple occurrences of an arbitrary substring, you can match the first occurrence with a capturing group, and then use a backreference to match subsequent occurrences of the substring.

Note

A separate syntax is used to refer to named and numbered capturing groups in replacement strings. For more information, see Substitutions.

.NET defines separate language elements to refer to numbered and named capturing groups. For more information about capturing groups, see Grouping Constructs.

Numbered Backreferences

A numbered backreference uses the following syntax:

\number

where number is the ordinal position of the capturing group in the regular expression. For example, \4 matches the contents of the fourth capturing group. If number is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException. For example, the regular expression \b(\w+)\s\1 is valid, because (\w+) is the first and only capturing group in the expression. On the other hand, \b(\w+)\s\2 is invalid and throws an ArgumentException, because there is no capturing group numbered 2. In addition, if number identifies a capturing group in a particular ordinal position, but that capturing group has been assigned a numeric name different from its ordinal position, the regular expression parser also throws an ArgumentException.

Note the ambiguity between octal escape codes (such as \16) and \number backreferences that use the same notation. This ambiguity is resolved as follows:

  • The expressions \1 through \9 are always interpreted as backreferences, and not as octal codes.

  • If the first digit of a multidigit expression is 8 or 9 (such as \80 or \91), the expression is interpreted as a literal.

  • Expressions from \10 and greater are considered backreferences if there is a capturing group corresponding to that number; otherwise, they are interpreted as octal codes.

  • If a regular expression contains a backreference to an undefined group number, a parsing error occurs, and the regular expression engine throws an ArgumentException.

If the ambiguity is a problem, you can use the \k<name> notation, which is unambiguous and cannot be confused with octal character codes. Similarly, hexadecimal codes such as \xdd are unambiguous and cannot be confused with backreferences.
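The parsing rules above can be checked with a short sketch. The class name BackreferenceAmbiguityDemo and the helper FailsToParse are hypothetical names introduced only for this illustration, and the sketch assumes default regular expression options.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceAmbiguityDemo
{
    // Hypothetical helper: returns true when the Regex constructor
    // rejects the pattern with an ArgumentException.
    public static bool FailsToParse(string pattern)
    {
        try
        {
            new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // \1 refers to the only capturing group, so the pattern parses.
        Console.WriteLine(FailsToParse(@"\b(\w+)\s\1"));   // False

        // \2 refers to a group number that is not defined.
        Console.WriteLine(FailsToParse(@"\b(\w+)\s\2"));   // True

        // There is no group 16, so \16 is read as the octal code 16,
        // which is the character U+000E.
        Console.WriteLine(Regex.IsMatch("\u000E", @"\16"));   // True
    }
}
```

Catching ArgumentException at construction time is one way to validate a pattern that comes from user input before using it.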

The following example finds doubled word characters in a string. It defines a regular expression, (\w)\1, which consists of the following elements.

Element    Description
(\w)       Match a word character and assign it to the first capturing group.
\1         Match the next character that is the same as the value of the first capturing group.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w)\1";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\w)\1"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

Named Backreferences

A named backreference is defined by using the following syntax:

\k<name>

or:

\k'name'

where name is the name of a capturing group defined in the regular expression pattern. If name is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException.

The following example finds doubled word characters in a string. It defines a regular expression, (?<char>\w)\k<char>, which consists of the following elements.

Element        Description
(?<char>\w)    Match a word character and assign it to a capturing group named char.
\k<char>       Match the next character that is the same as the value of the char capturing group.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<char>\w)\k<char>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<char>\w)\k<char>"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

Named numeric backreferences

In a named backreference with \k, name can also be the string representation of a number. For example, the following example uses the regular expression (?<2>\w)\k<2> to find doubled word characters in a string. In this case, the example defines a capturing group that is explicitly named "2", and the backreference is correspondingly named "2".

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<2>\w)\k<2>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found '{0}' at position {1}.", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<2>\w)\k<2>"
      Dim input As String = "trellis llama webbing dresser swagger"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Found '{0}' at position {1}.", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

If name is the string representation of a number, and no capturing group has that name, \k<name> is the same as the backreference \number, where number is the ordinal position of the capture. In the following example, there is a single capturing group named char. The backreference construct refers to it as \k<1>. As the output from the example shows, the call to Regex.IsMatch succeeds because char is the first capturing group.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<char>\w)\k<1>"));    
      // Displays "True".
   }
}


Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Console.WriteLine(Regex.IsMatch("aa", "(?<char>\w)\k<1>"))    
      ' Displays "True".
   End Sub
End Module

However, if name is the string representation of a number and the capturing group in that position has been explicitly assigned a numeric name, the regular expression parser cannot identify the capturing group by its ordinal position. Instead, it throws an ArgumentException. The only capturing group in the following example is named "2". Because the \k construct is used to define a backreference named "1", the regular expression parser is unable to identify the first capturing group and throws an exception.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<2>\w)\k<1>"));    
      // Throws an ArgumentException.
   }
}


Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Console.WriteLine(Regex.IsMatch("aa", "(?<2>\w)\k<1>"))    
      ' Throws an ArgumentException.
   End Sub
End Module

What Backreferences Match

A backreference refers to the most recent definition of a group (the definition most immediately to the left, when matching left to right). When a group makes multiple captures, a backreference refers to the most recent capture.
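As a minimal sketch of the "most recent capture" rule, consider a group inside a loop followed by a backreference. The pattern ^(\w)+\1$ and the class name MostRecentCaptureDemo are assumptions chosen for illustration and do not come from the examples in this article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MostRecentCaptureDemo
{
    // Hypothetical pattern: group 1 captures one character per loop
    // iteration, and \1 must repeat the capture from the final iteration.
    public const string Pattern = @"^(\w)+\1$";

    public static void Main()
    {
        // The loop consumes "abc"; the last capture is "c",
        // so \1 matches the trailing "c".
        Console.WriteLine(Regex.IsMatch("abcc", Pattern));   // True

        // The first capture is "a", but \1 is compared only against
        // the last capture, "c", so the trailing "a" does not match.
        Console.WriteLine(Regex.IsMatch("abca", Pattern));   // False
    }
}
```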

The following example includes a regular expression pattern, (?<1>a)(?<1>\1b)*, which redefines the group named 1. The following table describes each pattern in the regular expression.

Pattern       Description
(?<1>a)       Match the character "a" and assign the result to the capturing group named 1.
(?<1>\1b)*    Match zero or more occurrences of the current value of the group named 1 followed by a "b", and assign each result to the capturing group named 1.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<1>a)(?<1>\1b)*";
      string input = "aababb";
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("Match: " + match.Value);
         foreach (Group group in match.Groups)
            Console.WriteLine("   Group: " + group.Value);
      }
   }
}
// The example displays the following output:
//       Match: aababb
//          Group: aababb
//          Group: abb

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<1>a)(?<1>\1b)*"
      Dim input As String = "aababb"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Match: " + match.Value)
         For Each group As Group In match.Groups
            Console.WriteLine("   Group: " + group.Value)
         Next
      Next
   End Sub
End Module
' The example displays the following output:
'       Match: aababb
'          Group: aababb
'          Group: abb

In comparing the regular expression with the input string ("aababb"), the regular expression engine performs the following operations:

  1. It starts at the beginning of the string, and successfully matches "a" with the expression (?<1>a). The value of the 1 group is now "a".

  2. It advances to the second character, and successfully matches the string "ab" with the expression \1b, which at this point is equivalent to "ab". It then assigns the result, "ab", to \1.

  3. It advances to the fourth character. The expression (?<1>\1b)* is to be matched zero or more times, so it successfully matches the string "abb" with the expression \1b. It assigns the result, "abb", back to \1.

In this example, * is a looping quantifier: it is evaluated repeatedly until the regular expression engine cannot match the pattern it defines. Looping quantifiers do not clear group definitions.

If a group has not captured any substrings, a backreference to that group is undefined and never matches. This is illustrated by the regular expression pattern \b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b, which is defined as follows:

Pattern        Description
\b             Begin the match on a word boundary.
(\p{Lu}{2})    Match two uppercase letters. This is the first capturing group.
(\d{2})?       Match zero or one occurrence of two decimal digits. This is the second capturing group.
(\p{Lu}{2})    Match two uppercase letters. This is the third capturing group.
\b             End the match on a word boundary.

An input string can match this regular expression even if the two decimal digits that are defined by the second capturing group are not present. The following example shows that even though the match is successful, an empty capturing group is found between two successful capturing groups.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b";
      string[] inputs = { "AA22ZZ", "AABB" };
      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success)
         {
            Console.WriteLine("Match in {0}: {1}", input, match.Value);
            if (match.Groups.Count > 1)
            {
               for (int ctr = 1; ctr <= match.Groups.Count - 1; ctr++)
               {
                  if (match.Groups[ctr].Success)
                     Console.WriteLine("Group {0}: {1}", 
                                       ctr, match.Groups[ctr].Value);
                  else
                     Console.WriteLine("Group {0}: <no match>", ctr);
               }
            }
         }
         Console.WriteLine();
      }      
   }
}
// The example displays the following output:
//       Match in AA22ZZ: AA22ZZ
//       Group 1: AA
//       Group 2: 22
//       Group 3: ZZ
//       
//       Match in AABB: AABB
//       Group 1: AA
//       Group 2: <no match>
//       Group 3: BB

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b"
      Dim inputs() As String = { "AA22ZZ", "AABB" }
      For Each input As String In inputs
         Dim match As Match = Regex.Match(input, pattern)
         If match.Success Then
            Console.WriteLine("Match in {0}: {1}", input, match.Value)
            If match.Groups.Count > 1 Then
               For ctr As Integer = 1 To match.Groups.Count - 1
                  If match.Groups(ctr).Success Then
                     Console.WriteLine("Group {0}: {1}", _
                                       ctr, match.Groups(ctr).Value)
                  Else
                     Console.WriteLine("Group {0}: <no match>", ctr)
                  End If      
               Next
            End If
         End If
         Console.WriteLine()
      Next      
   End Sub
End Module
' The example displays the following output:
'       Match in AA22ZZ: AA22ZZ
'       Group 1: AA
'       Group 2: 22
'       Group 3: ZZ
'       
'       Match in AABB: AABB
'       Group 1: AA
'       Group 2: <no match>
'       Group 3: BB
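
The earlier statement that a backreference to a group with no capture never matches can also be checked directly. The following is a minimal sketch, assuming default options (RegexOptions.None); the pattern ^(a)?\1b$ and the class name UncapturedGroupDemo are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UncapturedGroupDemo
{
    // Hypothetical pattern: group 1 is optional, and \1 refers back to it.
    public const string Pattern = @"^(a)?\1b$";

    public static void Main()
    {
        // Group 1 captures "a", \1 matches the second "a", then "b" follows.
        Console.WriteLine(Regex.IsMatch("aab", Pattern));   // True

        // Group 1 captures nothing, so \1 cannot match and the
        // overall match fails even though "b" is present.
        Console.WriteLine(Regex.IsMatch("b", Pattern));     // False
    }
}
```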

See also