Backreference constructs in regular expressions

Article
2024/03/12

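Backreferences provide a convenient way to identify a repeated character or substring within a string. For example, if the input string contains multiple occurrences of an arbitrary substring, you can match the first occurrence with a capturing group, and then use a backreference to match subsequent occurrences of the substring.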

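Note

A separate syntax is used to refer to named and numbered capturing groups in replacement strings. For more information, see Substitutions.

.NET defines separate language elements to refer to numbered and named capturing groups. For more information about capturing groups, see Grouping Constructs.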

Numbered backreferences

A numbered backreference uses the following syntax:

\number

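where number is the ordinal position of the capturing group in the regular expression. For example, \4 matches the contents of the fourth capturing group. If number is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException. For example, the regular expression \b(\w+)\s\1 is valid, because (\w+) is the first and only capturing group in the expression. In contrast, \b(\w+)\s\2 is invalid and throws an argument exception, because there is no capturing group numbered \2. In addition, if number identifies a capturing group in a particular ordinal position, but that capturing group has been assigned a numeric name different from its ordinal position, the regular expression parser also throws an ArgumentException.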
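The validity rules above can be checked without matching any input, because the error is raised when the pattern is parsed. The following is a minimal sketch; `IsValidPattern` is a hypothetical helper introduced only for this illustration, and it simply reports whether the Regex constructor throws an ArgumentException.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberedBackreferenceCheck
{
    // Hypothetical helper (not part of .NET): returns true if the pattern
    // parses, and false if the Regex constructor throws ArgumentException.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // \1 refers to the first and only capturing group.
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\1"));   // True
        // There is no capturing group numbered 2.
        Console.WriteLine(IsValidPattern(@"\b(\w+)\s\2"));   // False
        // The only group is explicitly named "2", so \1 identifies nothing.
        Console.WriteLine(IsValidPattern(@"(?<2>\w)\1"));    // False
    }
}
```

Catching ArgumentException rather than a more specific type keeps the sketch independent of the .NET version in use.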

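Note the ambiguity between octal escape codes (such as \16) and \number backreferences that use the same notation. This ambiguity is resolved as follows:

The expressions \1 through \9 are always interpreted as backreferences, and not as octal codes.
If the first digit of a multidigit expression is 8 or 9 (such as \80 or \91), the expression is interpreted as a literal.
Expressions from \10 and greater are considered backreferences if there is a backreference corresponding to that number; otherwise, they are interpreted as octal codes.
If a regular expression contains a backreference to an undefined group number, a parsing error occurs, and the regular expression engine throws an ArgumentException.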

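If the ambiguity is a problem, you can use the \k<name> notation, which is unambiguous and cannot be confused with octal character codes. Similarly, hexadecimal codes such as \xdd are unambiguous and cannot be confused with backreferences.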
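The rule for \10 and greater can be observed with two patterns that differ only in whether a tenth capturing group exists. This is a small sketch under the default options; the pattern constants and the input strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OctalOrBackreference
{
    // Ten capturing groups are defined, so \10 is a backreference to group 10 ("j").
    public const string TenGroups = @"^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10$";

    // No tenth group exists, so \10 is the octal code 10, that is, U+0008.
    public const string NoGroups = @"^\10$";

    public static void Main()
    {
        // The trailing "j" repeats the text captured by group 10.
        Console.WriteLine(Regex.IsMatch("abcdefghijj", TenGroups));        // True
        // U+0008 is not what group 10 captured, so the match fails.
        Console.WriteLine(Regex.IsMatch("abcdefghij\u0008", TenGroups));   // False
        // Without a tenth group, the same notation matches the character U+0008.
        Console.WriteLine(Regex.IsMatch("\u0008", NoGroups));              // True
    }
}
```

Because the meaning of \10 changes when groups are added to or removed from a pattern, the \k<name> form is the safer choice in patterns that are likely to be edited.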

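The following example finds doubled word characters in a string. It defines a regular expression, (\w)\1, which consists of the following elements.

Element	Description
`(\w)`	Match a word character and assign it to the first capturing group.
`\1`	Match the next character that is the same as the value of the first capturing group.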

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w)\1";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine($"Found '{match.Value}' at position {match.Index}.");
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(\w)\1"
        Dim input As String = "trellis llama webbing dresser swagger"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found '{0}' at position {1}.", _
                              match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

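Named backreferences

A named backreference is defined by using the following syntax:

\k<name>

or:

\k'name'

where name is the name of a capturing group defined in the regular expression pattern. If name is not defined in the regular expression pattern, a parsing error occurs, and the regular expression engine throws an ArgumentException.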

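The following example finds doubled word characters in a string. It defines a regular expression, (?<char>\w)\k<char>, which consists of the following elements.

Element	Description
`(?<char>\w)`	Match a word character and assign the result to the capturing group named `char`.
`\k<char>`	Match the next character that is the same as the value of the `char` capturing group.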

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<char>\w)\k<char>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine($"Found '{match.Value}' at position {match.Index}.");
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(?<char>\w)\k<char>"
        Dim input As String = "trellis llama webbing dresser swagger"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found '{0}' at position {1}.", _
                              match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.
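The preceding example uses the angle-bracket form of the named backreference. The single-quote form, \k'name', should behave the same way. The following sketch repeats the search with that form; `FindDoubledChars` is a hypothetical helper name chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedNamedBackreference
{
    // Hypothetical helper: returns every doubled word character found in the input.
    public static string[] FindDoubledChars(string input)
    {
        // Same pattern as above, but the backreference uses the \k'name' form.
        string pattern = @"(?<char>\w)\k'char'";
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
            results.Add(match.Value);
        return results.ToArray();
    }

    public static void Main()
    {
        string input = "trellis llama webbing dresser swagger";
        Console.WriteLine(string.Join(", ", FindDoubledChars(input)));
        // Displays "ll, ll, bb, ss, gg".
    }
}
```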

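Named numeric backreferences

In a named backreference with \k, name can also be the string representation of a number. For example, the following example uses the regular expression (?<2>\w)\k<2> to find doubled word characters in a string. In this case, the example defines a capturing group that is explicitly named "2", and the backreference is correspondingly named "2".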

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<2>\w)\k<2>";
      string input = "trellis llama webbing dresser swagger";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine($"Found '{match.Value}' at position {match.Index}.");
   }
}
// The example displays the following output:
//       Found 'll' at position 3.
//       Found 'll' at position 8.
//       Found 'bb' at position 16.
//       Found 'ss' at position 25.
//       Found 'gg' at position 33.

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(?<2>\w)\k<2>"
        Dim input As String = "trellis llama webbing dresser swagger"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found '{0}' at position {1}.", _
                              match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found 'll' at position 3.
'       Found 'll' at position 8.
'       Found 'bb' at position 16.
'       Found 'ss' at position 25.
'       Found 'gg' at position 33.

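If name is the string representation of a number, and no capturing group has that name, \k<name> is the same as the backreference \number, where number is the ordinal position of the capture. In the following example, there is a single capturing group named char. The backreference construct refers to it as \k<1>. As the output from the example shows, the call to Regex.IsMatch succeeds because char is the first capturing group.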

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<char>\w)\k<1>"));
      // Displays "True".
   }
}


Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Console.WriteLine(Regex.IsMatch("aa", "(?<char>\w)\k<1>"))
        ' Displays "True".
    End Sub
End Module

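However, if name is the string representation of a number and the capturing group in that position has been explicitly assigned a numeric name, the regular expression parser cannot identify the capturing group by its ordinal position. Instead, it throws an ArgumentException. The only capturing group in the following example is named "2". Because the \k construct is used to define a backreference named "1", the regular expression parser is unable to identify the first capturing group and throws an exception.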

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Console.WriteLine(Regex.IsMatch("aa", @"(?<2>\w)\k<1>"));
      // Throws an ArgumentException.
   }
}


Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Console.WriteLine(Regex.IsMatch("aa", "(?<2>\w)\k<1>"))
        ' Throws an ArgumentException.
    End Sub
End Module

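What backreferences match

A backreference refers to the most recent definition of a group (the definition most immediately to the left, when matching left to right). When a group makes multiple captures, a backreference refers to the most recent capture.

The following example includes a regular expression pattern, (?<1>a)(?<1>\1b)*, which redefines the \1 named group. The following table describes each pattern in the regular expression.

Pattern	Description
`(?<1>a)`	Match the character "a" and assign the result to the capturing group named `1`.
`(?<1>\1b)*`	Match zero or more occurrences of the group named `1` followed by a "b", and assign the result to the capturing group named `1`.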

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<1>a)(?<1>\1b)*";
      string input = "aababb";
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("Match: " + match.Value);
         foreach (Group group in match.Groups)
            Console.WriteLine("   Group: " + group.Value);
      }
   }
}
// The example displays the following output:
//       Match: aababb
//          Group: aababb
//          Group: abb

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(?<1>a)(?<1>\1b)*"
        Dim input As String = "aababb"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Match: " + match.Value)
            For Each group As Group In match.Groups
                Console.WriteLine("   Group: " + group.Value)
            Next
        Next
    End Sub
End Module
' The example displays the following output:
'       Match: aababb
'          Group: aababb
'          Group: abb

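In comparing the regular expression with the input string ("aababb"), the regular expression engine performs the following operations:

It starts at the beginning of the string, and successfully matches "a" with the expression (?<1>a). The value of the 1 group is now "a".
It advances to the second character, and successfully matches the string "ab" with the expression \1b, or "ab". It then assigns the result, "ab", to \1.
It advances to the fourth character. The expression (?<1>\1b)* is to be matched zero or more times, so it successfully matches the string "abb" with the expression \1b. It then assigns the result, "abb", back to \1.

In this example, * is a looping quantifier: it is evaluated repeatedly until the regular expression engine cannot match the pattern it defines. Looping quantifiers do not clear group definitions.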
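The three steps above can also be observed through the Captures collection of group 1, which keeps each successive capture rather than only the last one. This is a minimal sketch; `GroupOneCaptures` is a hypothetical helper name used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureHistory
{
    // Hypothetical helper: lists every capture that group 1 made, in order.
    public static string[] GroupOneCaptures(string input)
    {
        Match match = Regex.Match(input, @"(?<1>a)(?<1>\1b)*");
        var values = new List<string>();
        foreach (Capture capture in match.Groups[1].Captures)
            values.Add(capture.Value);
        return values.ToArray();
    }

    public static void Main()
    {
        // Each pass through the loop appends "b" to the previous capture.
        Console.WriteLine(string.Join(", ", GroupOneCaptures("aababb")));
        // Displays "a, ab, abb".
    }
}
```

The final entry, "abb", is the value that Group.Value reports, which is consistent with a backreference always using the most recent capture.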

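If a group has not captured any substrings, a backreference to that group is undefined and never matches. This is illustrated by the regular expression pattern \b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b, which is defined as follows:

Pattern	Description
`\b`	Begin the match at a word boundary.
`(\p{Lu}{2})`	Match two uppercase letters. This is the first capturing group.
`(\d{2})?`	Match zero or one occurrence of two decimal digits. This is the second capturing group.
`(\p{Lu}{2})`	Match two uppercase letters. This is the third capturing group.
`\b`	End the match at a word boundary.

An input string can match this regular expression even if the two decimal digits that are defined by the second capturing group are not present. The following example shows that even though the match is successful, an empty capturing group is found between two successful capturing groups.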

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b";
      string[] inputs = { "AA22ZZ", "AABB" };
      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success)
         {
            Console.WriteLine($"Match in {input}: {match.Value}");
            if (match.Groups.Count > 1)
            {
               for (int ctr = 1; ctr <= match.Groups.Count - 1; ctr++)
               {
                  if (match.Groups[ctr].Success)
                     Console.WriteLine($"Group {ctr}: {match.Groups[ctr].Value}");
                  else
                     Console.WriteLine($"Group {ctr}: <no match>");
               }
            }
         }
         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Match in AA22ZZ: AA22ZZ
//       Group 1: AA
//       Group 2: 22
//       Group 3: ZZ
//
//       Match in AABB: AABB
//       Group 1: AA
//       Group 2: <no match>
//       Group 3: BB

Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\b"
        Dim inputs() As String = {"AA22ZZ", "AABB"}
        For Each input As String In inputs
            Dim match As Match = Regex.Match(input, pattern)
            If match.Success Then
                Console.WriteLine("Match in {0}: {1}", input, match.Value)
                If match.Groups.Count > 1 Then
                    For ctr As Integer = 1 To match.Groups.Count - 1
                        If match.Groups(ctr).Success Then
                            Console.WriteLine("Group {0}: {1}", _
                                              ctr, match.Groups(ctr).Value)
                        Else
                            Console.WriteLine("Group {0}: <no match>", ctr)
                        End If
                    Next
                End If
            End If
            Console.WriteLine()
        Next
    End Sub
End Module
' The example displays the following output:
'       Match in AA22ZZ: AA22ZZ
'       Group 1: AA
'       Group 2: 22
'       Group 3: ZZ
'       
'       Match in AABB: AABB
'       Group 1: AA
'       Group 2: <no match>
'       Group 3: BB
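The preceding example shows that the second group can be left without a capture, but it does not use a backreference to that group. The following sketch appends \2 to the same pattern to show the "never matches" behavior under the default options. The modified pattern and the input "AA22ZZ22" are assumptions made for this illustration and are not part of the original example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnsetGroupBackreference
{
    // The pattern from above with a backreference to the optional second group added.
    public const string Pattern = @"\b(\p{Lu}{2})(\d{2})?(\p{Lu}{2})\2\b";

    public static void Main()
    {
        // Group 2 captures "22", so \2 matches the trailing "22".
        Console.WriteLine(Regex.IsMatch("AA22ZZ22", Pattern));   // True
        // Group 2 captures nothing, so \2 is undefined and the match fails.
        Console.WriteLine(Regex.IsMatch("AABB", Pattern));       // False
    }
}
```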

See also

Regular Expression Language - Quick Reference
