Grouping Constructs in Regular Expressions

Grouping constructs delineate the subexpressions of a regular expression and capture the substrings of an input string. You can use grouping constructs to do the following:

  • Match a subexpression that is repeated in the input string.

  • Apply a quantifier to a subexpression that has multiple regular expression language elements. For more information about quantifiers, see Quantifiers.

  • Include a subexpression in the string that is returned by the Regex.Replace and Match.Result methods.

  • Retrieve individual subexpressions from the Match.Groups property and process them separately from the matched text as a whole.

The following table lists the grouping constructs supported by the .NET regular expression engine and indicates whether they are capturing or noncapturing.

Grouping construct | Capturing or noncapturing
Matched subexpressions | Capturing
Named matched subexpressions | Capturing
Balancing group definitions | Capturing
Noncapturing groups | Noncapturing
Group options | Noncapturing
Zero-width positive lookahead assertions | Noncapturing
Zero-width negative lookahead assertions | Noncapturing
Zero-width positive lookbehind assertions | Noncapturing
Zero-width negative lookbehind assertions | Noncapturing
Nonbacktracking subexpressions | Noncapturing

For information on groups and the regular expression object model, see Grouping constructs and regular expression objects.

Matched Subexpressions

The following grouping construct captures a matched subexpression:

( subexpression )

where subexpression is any valid regular expression pattern. Captures that use parentheses are numbered automatically from left to right based on the order of the opening parentheses in the regular expression, starting from one. The capture that is numbered zero is the text matched by the entire regular expression pattern.

Note

By default, the (subexpression) language element captures the matched subexpression. But if the RegexOptions parameter of a regular expression pattern matching method includes the RegexOptions.ExplicitCapture flag, or if the n option is applied to this subexpression (see Group options later in this topic), the matched subexpression is not captured.
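The effect described in the note can be observed by counting the groups that a match exposes. The following is a minimal sketch, not part of the original topic; the class name ExplicitCaptureSketch, the helper CountGroups, and the sample input are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitCaptureSketch
{
    // Hypothetical helper: returns the number of groups the match exposes,
    // including group 0 (the entire match).
    public static int CountGroups(string input, string pattern, RegexOptions options)
    {
        Match m = Regex.Match(input, pattern, options);
        return m.Groups.Count;
    }

    public static void Main()
    {
        string input = "abc 123";
        string pattern = @"(\w+)\s(\d+)";

        // Default behavior: both parenthesized subexpressions capture.
        Console.WriteLine(CountGroups(input, pattern, RegexOptions.None));            // 3

        // ExplicitCapture flag: unnamed parentheses no longer capture.
        Console.WriteLine(CountGroups(input, pattern, RegexOptions.ExplicitCapture)); // 1

        // Inline n option applied to the enclosing group has the same effect.
        Console.WriteLine(CountGroups(input, @"(?n:(\w+)\s(\d+))", RegexOptions.None)); // 1
    }
}
```

In each call only group 0 is guaranteed to exist; the remaining count reflects how many subexpressions were actually treated as capturing.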

You can access captured groups in four ways:

  • By using the backreference construct within the regular expression. The matched subexpression is referenced in the same regular expression by using the syntax \number, where number is the ordinal number of the captured subexpression.

  • By using the named backreference construct within the regular expression. The matched subexpression is referenced in the same regular expression by using the syntax \k<name>, where name is the name of a capturing group, or \k<number>, where number is the ordinal number of a capturing group. A capturing group has a default name that is identical to its ordinal number. For more information, see Named matched subexpressions later in this topic.

  • By using the $number replacement sequence in a Regex.Replace or Match.Result method call, where number is the ordinal number of the captured subexpression.

  • Programmatically, by using the GroupCollection object returned by the Match.Groups property. The member at position zero in the collection represents the entire regular expression match. Each subsequent member represents a matched subexpression. For more information, see the Grouping Constructs and Regular Expression Objects section.
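The access paths listed above can be sketched in a few lines. This is an illustrative sketch only; the class CaptureAccessSketch, its helper methods, and the sample strings are hypothetical and do not appear in the original topic.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureAccessSketch
{
    // $number replacement sequence: $1 and $2 refer to the first and second groups.
    public static string SwapWords(string input)
    {
        return Regex.Replace(input, @"(\w+)\s(\w+)", "$2 $1");
    }

    // \number backreference: \1 must match the same text that group 1 captured.
    public static bool HasDoubledLetter(string input)
    {
        return Regex.IsMatch(input, @"(\w)\1");
    }

    // \k<number> named backreference that uses the group's default name.
    public static bool HasDoubledLetterByName(string input)
    {
        return Regex.IsMatch(input, @"(\w)\k<1>");
    }

    public static void Main()
    {
        Console.WriteLine(SwapWords("hello world"));        // world hello
        Console.WriteLine(HasDoubledLetter("book"));        // True
        Console.WriteLine(HasDoubledLetter("bark"));        // False
        Console.WriteLine(HasDoubledLetterByName("book"));  // True

        // GroupCollection: index 0 is the whole match, then the numbered groups.
        Match m = Regex.Match("hello world", @"(\w+)\s(\w+)");
        Console.WriteLine(m.Groups[0].Value);               // hello world
        Console.WriteLine(m.Groups[2].Value);               // world
    }
}
```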

The following example illustrates a regular expression that identifies duplicated words in text. The regular expression pattern's two capturing groups represent the two instances of the duplicated word. The second instance is captured to report its starting position in the input string.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w+)\s(\1)\W";
      string input = "He said that that was the the correct answer.";
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
         Console.WriteLine("Duplicate '{0}' found at positions {1} and {2}.", 
                           match.Groups[1].Value, match.Groups[1].Index, match.Groups[2].Index);
   }
}
// The example displays the following output:
//       Duplicate 'that' found at positions 8 and 13.
//       Duplicate 'the' found at positions 22 and 26.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\w+)\s(\1)\W"
      Dim input As String = "He said that that was the the correct answer."
      For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
         Console.WriteLine("Duplicate '{0}' found at positions {1} and {2}.", _
                           match.Groups(1).Value, match.Groups(1).Index, match.Groups(2).Index)
      Next
   End Sub
End Module
' The example displays the following output:
'       Duplicate 'that' found at positions 8 and 13.
'       Duplicate 'the' found at positions 22 and 26.

The regular expression pattern is the following:

(\w+)\s(\1)\W

The following table shows how the regular expression pattern is interpreted.

Pattern | Description
(\w+) | Match one or more word characters. This is the first capturing group.
\s | Match a white-space character.
(\1) | Match the string in the first captured group. This is the second capturing group. The example assigns it to a captured group so that the starting position of the duplicate word can be retrieved from the Group.Index property.
\W | Match a non-word character, including white space and punctuation. This prevents the regular expression pattern from matching a word that starts with the word from the first captured group.
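The role of the trailing \W can be checked with an input in which the second word merely starts with the first. The following sketch is an illustration under assumed inputs; the class TrailingNonWordSketch and the helper HasDuplicate are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNonWordSketch
{
    // Hypothetical helper: reports whether the pattern finds a "duplicate" in the input.
    public static bool HasDuplicate(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Without \W, "the" followed by the first three letters of "theory" is reported.
        Console.WriteLine(HasDuplicate("the theory", @"(\w+)\s(\1)"));                // True

        // With \W, the character after the repeated text must be a non-word character.
        Console.WriteLine(HasDuplicate("the theory", @"(\w+)\s(\1)\W"));              // False

        // A genuine duplicate followed by white space still matches.
        Console.WriteLine(HasDuplicate("It was the the answer.", @"(\w+)\s(\1)\W"));  // True
    }
}
```

Because \W consumes a character, this pattern does not match a duplicate that ends the input string with nothing after it.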

Named Matched Subexpressions

The following grouping construct captures a matched subexpression and lets you access it by name or by number:

(?<name>subexpression)

or:

(?'name'subexpression)

where name is a valid group name, and subexpression is any valid regular expression pattern. name must not contain any punctuation characters and cannot begin with a number.

Note

If the RegexOptions parameter of a regular expression pattern matching method includes the RegexOptions.ExplicitCapture flag, or if the n option is applied to this subexpression (see Group options later in this topic), the only way to capture a subexpression is to explicitly name capturing groups.

You can access named captured groups in the following ways:

  • By using the named backreference construct within the regular expression. The matched subexpression is referenced in the same regular expression by using the syntax \k<name>, where name is the name of the captured subexpression.

  • By using the backreference construct within the regular expression. The matched subexpression is referenced in the same regular expression by using the syntax \number, where number is the ordinal number of the captured subexpression. Named matched subexpressions are numbered consecutively from left to right after matched subexpressions.

  • By using the ${name} replacement sequence in a Regex.Replace or Match.Result method call, where name is the name of the captured subexpression.

  • By using the $number replacement sequence in a Regex.Replace or Match.Result method call, where number is the ordinal number of the captured subexpression.

  • Programmatically, by using the GroupCollection object returned by the Match.Groups property. The member at position zero in the collection represents the entire regular expression match. Each subsequent member represents a matched subexpression. Named captured groups are stored in the collection after numbered captured groups.

  • Programmatically, by providing the subexpression name to the GroupCollection object's indexer (in C#) or to its Item[String] property (in Visual Basic).
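Several of the access paths in the list above can be sketched together. The pattern, the group names year and month, the class NamedAccessSketch, and its helpers are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedAccessSketch
{
    // Hypothetical pattern with two named groups and no unnamed groups.
    private const string Pattern = @"(?<year>\d{4})-(?<month>\d{2})";

    // ${name} replacement sequence.
    public static string Reorder(string input)
    {
        return Regex.Replace(input, Pattern, "${month}/${year}");
    }

    // GroupCollection indexer that takes the group name.
    public static string YearByName(string input)
    {
        return Regex.Match(input, Pattern).Groups["year"].Value;
    }

    // The same group is also reachable by its ordinal number.
    public static string YearByNumber(string input)
    {
        return Regex.Match(input, Pattern).Groups[1].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Reorder("2024-05"));       // 05/2024
        Console.WriteLine(YearByName("2024-05"));    // 2024
        Console.WriteLine(YearByNumber("2024-05"));  // 2024
    }
}
```

Here year is group 1 only because the pattern has no unnamed groups; when unnamed groups are present, they are numbered first.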

A simple regular expression pattern illustrates how numbered (unnamed) and named groups can be referenced either programmatically or by using regular expression language syntax. The regular expression ((?<One>abc)\d+)?(?<Two>xyz)(.*) produces the following capturing groups by number and by name. The first capturing group (number 0) always refers to the entire pattern.

Number | Name | Pattern
0 | 0 (default name) | ((?<One>abc)\d+)?(?<Two>xyz)(.*)
1 | 1 (default name) | ((?<One>abc)\d+)
2 | 2 (default name) | (.*)
3 | One | (?<One>abc)
4 | Two | (?<Two>xyz)
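The numbering in the table can be confirmed by asking a Regex object for its group names and numbers. The following is a small sketch; the class GroupNumberingSketch and the helper Describe are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupNumberingSketch
{
    // Hypothetical helper: lists "number=name" for every group in the pattern.
    public static string Describe(string pattern)
    {
        var rgx = new Regex(pattern);
        var parts = new List<string>();
        foreach (string name in rgx.GetGroupNames())
            parts.Add(rgx.GroupNumberFromName(name) + "=" + name);
        return string.Join(",", parts);
    }

    public static void Main()
    {
        // Unnamed groups are numbered first; named groups follow.
        Console.WriteLine(Describe(@"((?<One>abc)\d+)?(?<Two>xyz)(.*)"));
        // 0=0,1=1,2=2,3=One,4=Two
    }
}
```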

The following example illustrates a regular expression that identifies duplicated words and the word that immediately follows each duplicated word. The regular expression pattern defines two named subexpressions: duplicateWord, which represents the duplicated word; and nextWord, which represents the word that follows the duplicated word.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?<duplicateWord>\w+)\s\k<duplicateWord>\W(?<nextWord>\w+)";
      string input = "He said that that was the the correct answer.";
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
         Console.WriteLine("A duplicate '{0}' at position {1} is followed by '{2}'.", 
                           match.Groups["duplicateWord"].Value, match.Groups["duplicateWord"].Index, 
                           match.Groups["nextWord"].Value);

   }
}
// The example displays the following output:
//       A duplicate 'that' at position 8 is followed by 'was'.
//       A duplicate 'the' at position 22 is followed by 'correct'.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?<duplicateWord>\w+)\s\k<duplicateWord>\W(?<nextWord>\w+)"
      Dim input As String = "He said that that was the the correct answer."
      For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
         Console.WriteLine("A duplicate '{0}' at position {1} is followed by '{2}'.", _
                           match.Groups("duplicateWord").Value, match.Groups("duplicateWord").Index, _
                           match.Groups("nextWord").Value)
      Next
   End Sub
End Module
' The example displays the following output:
'    A duplicate 'that' at position 8 is followed by 'was'.
'    A duplicate 'the' at position 22 is followed by 'correct'.

The regular expression pattern is as follows:

(?<duplicateWord>\w+)\s\k<duplicateWord>\W(?<nextWord>\w+)

The following table shows how the regular expression is interpreted.

Pattern | Description
(?<duplicateWord>\w+) | Match one or more word characters. Name this capturing group duplicateWord.
\s | Match a white-space character.
\k<duplicateWord> | Match the string from the captured group that is named duplicateWord.
\W | Match a non-word character, including white space and punctuation. This prevents the regular expression pattern from matching a word that starts with the word captured by the duplicateWord group.
(?<nextWord>\w+) | Match one or more word characters. Name this capturing group nextWord.

Note that a group name can be repeated in a regular expression. For example, it is possible for more than one group to be named digit, as the following example illustrates. In the case of duplicate names, the value of the Group object is determined by the last successful capture in the input string. In addition, the CaptureCollection is populated with information about each capture, just as it would be if the group name were not duplicated.

In the following example, the regular expression \D+(?<digit>\d+)\D+(?<digit>\d+)? includes two occurrences of a group named digit. The first digit named group captures one or more digit characters. The second digit named group captures either zero or one occurrence of one or more digit characters. As the output from the example shows, if the second capturing group successfully matches text, the value of that text defines the value of the Group object. If the second capturing group does not match the input string, the value of the last successful match defines the value of the Group object.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      String pattern = @"\D+(?<digit>\d+)\D+(?<digit>\d+)?";
      String[] inputs = { "abc123def456", "abc123def" };
      foreach (var input in inputs) {
         Match m = Regex.Match(input, pattern);
         if (m.Success) {
            Console.WriteLine("Match: {0}", m.Value);
            for (int grpCtr = 1; grpCtr < m.Groups.Count; grpCtr++) {
               Group grp = m.Groups[grpCtr];
               Console.WriteLine("Group {0}: {1}", grpCtr, grp.Value);
               for (int capCtr = 0; capCtr < grp.Captures.Count; capCtr++)
                  Console.WriteLine("   Capture {0}: {1}", capCtr,
                                    grp.Captures[capCtr].Value);
            }
         }
         else {
            Console.WriteLine("The match failed.");
         }
         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Match: abc123def456
//       Group 1: 456
//          Capture 0: 123
//          Capture 1: 456
//
//       Match: abc123def
//       Group 1: 123
//          Capture 0: 123

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\D+(?<digit>\d+)\D+(?<digit>\d+)?"
      Dim inputs() As String = { "abc123def456", "abc123def" }
      For Each input As String In inputs
         Dim m As Match = Regex.Match(input, pattern)
         If m.Success Then
            Console.WriteLine("Match: {0}", m.Value)
            For grpCtr As Integer = 1 to m.Groups.Count - 1
               Dim grp As Group = m.Groups(grpCtr)
               Console.WriteLine("Group {0}: {1}", grpCtr, grp.Value)
               For capCtr As Integer = 0 To grp.Captures.Count - 1
                  Console.WriteLine("   Capture {0}: {1}", capCtr,
                                    grp.Captures(capCtr).Value)
               Next
            Next
         Else
            Console.WriteLine("The match failed.")
         End If
         Console.WriteLine()
      Next
   End Sub
End Module
' The example displays the following output:
'       Match: abc123def456
'       Group 1: 456
'          Capture 0: 123
'          Capture 1: 456
'
'       Match: abc123def
'       Group 1: 123
'          Capture 0: 123

The following table shows how the regular expression is interpreted.

Pattern | Description
\D+ | Match one or more non-decimal digit characters.
(?<digit>\d+) | Match one or more decimal digit characters. Assign the match to the digit named group.
\D+ | Match one or more non-decimal digit characters.
(?<digit>\d+)? | Match zero or one occurrence of one or more decimal digit characters. Assign the match to the digit named group.

Balancing Group Definitions

A balancing group definition deletes the definition of a previously defined group and stores, in the current group, the interval between the previously defined group and the current group. This grouping construct has the following format:

(?<name1-name2>subexpression)

or:

(?'name1-name2' subexpression)

where name1 is the current group (optional), name2 is a previously defined group, and subexpression is any valid regular expression pattern. The balancing group definition deletes the definition of name2 and stores the interval between name2 and name1 in name1. If no name2 group is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct lets you use the stack of captures for group name2 as a counter for keeping track of nested constructs such as parentheses or opening and closing brackets.

The balancing group definition uses name2 as a stack. The beginning character of each nested construct is placed in the group and in its Group.Captures collection. When the closing character is matched, its corresponding opening character is removed from the group, and the Captures collection is decreased by one. After the opening and closing characters of all nested constructs have been matched, name2 is empty.

Note

After you modify the regular expression in the following example to use the appropriate opening and closing character of a nested construct, you can use it to handle most nested constructs, such as mathematical expressions or lines of program code that include multiple nested method calls.
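As a sketch of the modification the note describes, the same stack-based pattern can be adapted to parentheses by swapping the delimiter characters. This adaptation is an assumption made for illustration and is not taken from the original topic; the class BalancedParensSketch and the helper IsBalanced are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedParensSketch
{
    // Same structure as the angle bracket pattern, with "(" and ")" as delimiters.
    // Inside a character class, parentheses are literal and need no escaping.
    private const string Pattern =
        @"^[^()]*" +
        @"(" +
        @"((?'Open'\()[^()]*)+" +        // push each "(" onto the Open stack
        @"((?'Close-Open'\))[^()]*)+" +  // each ")" pops one capture from Open
        @")*" +
        @"(?(Open)(?!))$";               // fail if any "(" is left unclosed

    // Hypothetical helper: true when every "(" has a matching ")".
    public static bool IsBalanced(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("(a(b)c)"));  // True
        Console.WriteLine(IsBalanced("a+(b*c)"));  // True
        Console.WriteLine(IsBalanced("(a(b)c"));   // False: one "(" is never closed
        Console.WriteLine(IsBalanced("())"));      // False: extra ")" has nothing to pop
    }
}
```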

The following example uses a balancing group definition to match left and right angle brackets (<>) in an input string. The example defines two named groups, Open and Close, that are used like a stack to track matching pairs of angle brackets. Each captured left angle bracket is pushed into the capture collection of the Open group. Each right angle bracket removes the most recent capture from the Open group and pushes the text between the two brackets into the capture collection of the Close group. The balancing group definition ensures that there is a matching right angle bracket for each left angle bracket. The final subpattern, (?(Open)(?!)), checks the result: its (?!) branch is evaluated only if the Open group is not empty (that is, if some nested construct has not been closed). If that branch is evaluated, the match fails, because the (?!) subpattern is a zero-width negative lookahead assertion that always fails.

using System;
using System.Text.RegularExpressions;

class Example
{
   public static void Main() 
   {
      string pattern = "^[^<>]*" +
                       "(" + 
                       "((?'Open'<)[^<>]*)+" +
                       "((?'Close-Open'>)[^<>]*)+" +
                       ")*" +
                       "(?(Open)(?!))$";
      string input = "<abc><mno<xyz>>";

      Match m = Regex.Match(input, pattern);
      if (m.Success)
      {
         Console.WriteLine("Input: \"{0}\" \nMatch: \"{1}\"", input, m);
         int grpCtr = 0;
         foreach (Group grp in m.Groups)
         {
            Console.WriteLine("   Group {0}: {1}", grpCtr, grp.Value);
            grpCtr++;
            int capCtr = 0;
            foreach (Capture cap in grp.Captures)
            {            
                Console.WriteLine("      Capture {0}: {1}", capCtr, cap.Value);
                capCtr++;
            }
          }
      }
      else
      {
         Console.WriteLine("Match failed.");
      }   
    }
}
// The example displays the following output:
//    Input: "<abc><mno<xyz>>"
//    Match: "<abc><mno<xyz>>"
//       Group 0: <abc><mno<xyz>>
//          Capture 0: <abc><mno<xyz>>
//       Group 1: <mno<xyz>>
//          Capture 0: <abc>
//          Capture 1: <mno<xyz>>
//       Group 2: <xyz
//          Capture 0: <abc
//          Capture 1: <mno
//          Capture 2: <xyz
//       Group 3: >
//          Capture 0: >
//          Capture 1: >
//          Capture 2: >
//       Group 4:
//       Group 5: mno<xyz>
//          Capture 0: abc
//          Capture 1: xyz
//          Capture 2: mno<xyz>

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main() 
        Dim pattern As String = "^[^<>]*" & _
                                "(" + "((?'Open'<)[^<>]*)+" & _
                                "((?'Close-Open'>)[^<>]*)+" + ")*" & _
                                "(?(Open)(?!))$"
        Dim input As String = "<abc><mno<xyz>>"
        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then
            Console.WriteLine("Input: ""{0}"" " & vbCrLf & "Match: ""{1}""", _
                               input, m)
            Dim grpCtr As Integer = 0
            For Each grp As Group In m.Groups
               Console.WriteLine("   Group {0}: {1}", grpCtr, grp.Value)
               grpCtr += 1
               Dim capCtr As Integer = 0
               For Each cap As Capture In grp.Captures            
                  Console.WriteLine("      Capture {0}: {1}", capCtr, cap.Value)
                  capCtr += 1
               Next
            Next
        Else
            Console.WriteLine("Match failed.")
        End If
    End Sub
End Module  
' The example displays the following output:
'       Input: "<abc><mno<xyz>>"
'       Match: "<abc><mno<xyz>>"
'          Group 0: <abc><mno<xyz>>
'             Capture 0: <abc><mno<xyz>>
'          Group 1: <mno<xyz>>
'             Capture 0: <abc>
'             Capture 1: <mno<xyz>>
'          Group 2: <xyz
'             Capture 0: <abc
'             Capture 1: <mno
'             Capture 2: <xyz
'          Group 3: >
'             Capture 0: >
'             Capture 1: >
'             Capture 2: >
'          Group 4:
'          Group 5: mno<xyz>
'             Capture 0: abc
'             Capture 1: xyz
'             Capture 2: mno<xyz>

The regular expression pattern is:

^[^<>]*(((?'Open'<)[^<>]*)+((?'Close-Open'>)[^<>]*)+)*(?(Open)(?!))$

The regular expression is interpreted as follows:

Pattern | Description
^ | Begin at the start of the string.
[^<>]* | Match zero or more characters that are not left or right angle brackets.
(?'Open'<) | Match a left angle bracket and assign it to a group named Open.
[^<>]* | Match zero or more characters that are not left or right angle brackets.
((?'Open'<)[^<>]*)+ | Match one or more occurrences of a left angle bracket followed by zero or more characters that are not left or right angle brackets. This is the second capturing group.
(?'Close-Open'>) | Match a right angle bracket, assign the substring between the Open group and the current group to the Close group, and delete the definition of the Open group.
[^<>]* | Match zero or more occurrences of any character that is neither a left nor a right angle bracket.
((?'Close-Open'>)[^<>]*)+ | Match one or more occurrences of a right angle bracket, followed by zero or more occurrences of any character that is neither a left nor a right angle bracket. When matching the right angle bracket, assign the substring between the Open group and the current group to the Close group, and delete the definition of the Open group. This is the third capturing group.
(((?'Open'<)[^<>]*)+((?'Close-Open'>)[^<>]*)+)* | Match zero or more occurrences of the following pattern: one or more occurrences of a left angle bracket, followed by zero or more non-angle bracket characters, followed by one or more occurrences of a right angle bracket, followed by zero or more occurrences of non-angle brackets. When matching the right angle bracket, delete the definition of the Open group, and assign the substring between the Open group and the current group to the Close group. This is the first capturing group.
(?(Open)(?!)) | If the Open group exists, abandon the match if an empty string can be matched, but do not advance the position of the regular expression engine in the string. This is a zero-width negative lookahead assertion. Because an empty string is always implicitly present in an input string, this match always fails. Failure of this match indicates that the angle brackets are not balanced.
$ | Match the end of the input string.

The final subexpression, (?(Open)(?!)), indicates whether the nesting constructs in the input string are properly balanced (for example, whether each left angle bracket is matched by a right angle bracket). It uses conditional matching based on a valid captured group; for more information, see Alternation Constructs. If the Open group is defined, the regular expression engine attempts to match the subexpression (?!) in the input string. The Open group should be defined only if nesting constructs are unbalanced. Therefore, the pattern to be matched in the input string should be one that always causes the match to fail. In this case, (?!) is a zero-width negative lookahead assertion that always fails, because an empty string is always implicitly present at the next position in the input string.

In the example, the regular expression engine evaluates the input string "<abc><mno<xyz>>" as shown in the following table.

步骤Step 模式Pattern 结果Result
11 ^ 从输入字符串的开头部分开始匹配。Starts the match at the beginning of the input string
22 [^<>]* 查找左尖括号之前的非尖括号字符;未找到匹配项。Looks for non-angle bracket characters before the left angle bracket;finds no matches.
33 (((?'Open'<) 匹配“<abc>”中的左尖括号,并将它分配给 Open 组。Matches the left angle bracket in "<abc>" and assigns it to the Open group.
44 [^<>]* 与“abc”匹配。Matches "abc".
55 )+ “<abc”是第二个捕获组的值。"<abc" is the value of the second captured group.

输入字符串中的下一个字符不是左尖括号,因此正则表达式引擎不会循环回到 (?'Open'<)[^<>]*) 子模式。The next character in the input string is not a left angle bracket, so the regular expression engine does not loop back to the (?'Open'<)[^<>]*) subpattern.
66 ((?'Close-Open'>) 匹配“<abc>”中的右尖括号,将“abc”(Open 组和右尖括号之间的子字符串)分配给 Close 组,并删除 Open 组的当前值(“<”),同时保持它为空。Matches the right angle bracket in "<abc>", assigns "abc", which is the substring between the Open group and the right angle bracket, to the Close group, and deletes the current value ("<") of the Open group, leaving it empty.
77 [^<>]* 查找右尖括号之后的非尖括号字符;未找到匹配项。Looks for non-angle bracket characters after the right angle bracket; finds no matches.
88 )+ 第三个捕获组的值是“>”。The value of the third captured group is ">".

输入字符串中的下一个字符不是右尖括号,因此正则表达式引擎不会循环回到 ((?'Close-Open'>)[^<>]*) 子模式。The next character in the input string is not a right angle bracket, so the regular expression engine does not loop back to the ((?'Close-Open'>)[^<>]*) subpattern.
99 )* 第一个捕获组的值是“<abc>”。The value of the first captured group is "<abc>".

输入字符串中的下一个字符是左尖括号,因此正则表达式引擎会循环回到 (((?'Open'<) 子模式。The next character in the input string is a left angle bracket, so the regular expression engine loops back to the (((?'Open'<) subpattern.
1010 (((?'Open'<) 匹配“<mno”中的左尖括号,并将它分配给 Open 组。Matches the left angle bracket in "<mno" and assigns it to the Open group. Group.Captures 集合现在具有单个值“<”。Its Group.Captures collection now has a single value, "<".
1111 [^<>]* 与“mno”匹配。Matches "mno".
1212 )+ “<mno”是第二个捕获组的值。"<mno" is the value of the second captured group.

The next character in the input string is a left angle bracket, so the regular expression engine loops back to the ((?'Open'<)[^<>]*) subpattern.
13 (((?'Open'<) Matches the left angle bracket in "<xyz>" and assigns it to the Open group. The Group.Captures collection of the Open group now includes two captures: the left angle bracket from "<mno", and the left angle bracket from "<xyz>".
14 [^<>]* Matches "xyz".
15 )+ "<xyz" is the value of the second captured group.

The next character in the input string is not a left angle bracket, so the regular expression engine does not loop back to the ((?'Open'<)[^<>]*) subpattern.
16 ((?'Close-Open'>) Matches the right angle bracket in "<xyz>". Assigns "xyz", the substring between the Open group and the right angle bracket, to the Close group, and deletes the current value of the Open group. The value of the previous capture (the left angle bracket in "<mno") becomes the current value of the Open group. The Captures collection of the Open group now includes a single capture, the left angle bracket from "<mno".
17 [^<>]* Looks for non-angle bracket characters; finds no matches.
18 )+ The value of the third captured group is ">".

The next character in the input string is a right angle bracket, so the regular expression engine loops back to the ((?'Close-Open'>)[^<>]*) subpattern.
19 ((?'Close-Open'>) Matches the final right angle bracket in "xyz>>", assigns "mno<xyz>" (the substring between the Open group and the right angle bracket) to the Close group, and deletes the current value of the Open group. The Open group is now empty.
20 [^<>]* Looks for non-angle bracket characters; finds no matches.
21 )+ The value of the third captured group is ">".

The next character in the input string is not a right angle bracket, so the regular expression engine does not loop back to the ((?'Close-Open'>)[^<>]*) subpattern.
22 )* The value of the first captured group is "<mno<xyz>>".

The next character in the input string is not a left angle bracket, so the regular expression engine does not loop back to the (((?'Open'<) subpattern.
23 (?(Open)(?!)) The Open group is not defined, so no match is attempted.
24 $ Matches the end of the input string.
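
As a rough illustration of the walkthrough above, the following sketch assumes that the full pattern is ^[^<>]*(((?'Open'<)[^<>]*)+((?'Close-Open'>)[^<>]*)+)*(?(Open)(?!))$, assembled here from the subpatterns listed in the table. It checks a few inputs for balanced angle brackets. The class name BalancedBrackets and the helper IsBalanced are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedBrackets
{
    // Assumed full pattern, assembled from the subpatterns in the table above.
    private const string Pattern =
        @"^[^<>]*(((?'Open'<)[^<>]*)+((?'Close-Open'>)[^<>]*)+)*(?(Open)(?!))$";

    // Returns true when every '<' is closed by a later '>' and no '>' is unmatched.
    public static bool IsBalanced(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        string[] inputs = { "<abc><mno<xyz>>", "<abc><mno<xyz>", "<abc>>" };
        foreach (string input in inputs)
            Console.WriteLine("{0}: {1}", input,
                              IsBalanced(input) ? "balanced" : "not balanced");
    }
}
```

In the second input the Open group still holds a capture when the end of the string is reached, so (?(Open)(?!)) forces a failure. In the third input the extra right angle bracket cannot be matched by (?'Close-Open'>) because the Open group is already empty.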

Noncapturing Groups

The following grouping construct does not capture the substring that is matched by a subexpression:

(?:subexpression)

where subexpression is any valid regular expression pattern. The noncapturing group construct is typically used when a quantifier is applied to a group, but the substrings captured by the group are of no interest.

Note

If a regular expression includes nested grouping constructs, an outer noncapturing group construct does not apply to the inner nested group constructs.
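
The note above can be seen in a small sketch: the outer group is noncapturing, but the nested (\w+) is an ordinary capturing group and still appears in the Match.Groups collection. The class name NestedGroups and the helper MatchSentence are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedGroups
{
    // The outer (?: ... ) does not capture, but the inner (\w+) still does.
    private const string Pattern = @"(?:\b(\w+)\W*)+\.";

    public static Match MatchSentence(string input)
    {
        return Regex.Match(input, Pattern);
    }

    public static void Main()
    {
        Match match = MatchSentence("This is a short sentence.");
        // Groups[0] is the whole match; Groups[1] is the inner capturing group.
        Console.WriteLine("Groups.Count: {0}", match.Groups.Count);
        Console.WriteLine("Group 1: {0}", match.Groups[1].Value);
    }
}
// Expected output:
//       Groups.Count: 2
//       Group 1: sentence
```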

The following example illustrates a regular expression that includes noncapturing groups. Note that the output does not include any captured groups.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(?:\b(?:\w+)\W*)+\.";
      string input = "This is a short sentence.";
      Match match = Regex.Match(input, pattern);
      Console.WriteLine("Match: {0}", match.Value);
      for (int ctr = 1; ctr < match.Groups.Count; ctr++)
         Console.WriteLine("   Group {0}: {1}", ctr, match.Groups[ctr].Value);
   }
}
// The example displays the following output:
//       Match: This is a short sentence.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(?:\b(?:\w+)\W*)+\."
      Dim input As String = "This is a short sentence."
      Dim match As Match = Regex.Match(input, pattern)
      Console.WriteLine("Match: {0}", match.Value)
      For ctr As Integer = 1 To match.Groups.Count - 1
         Console.WriteLine("   Group {0}: {1}", ctr, match.Groups(ctr).Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       Match: This is a short sentence.

The regular expression (?:\b(?:\w+)\W*)+\. matches a sentence that is terminated by a period. Because the regular expression focuses on sentences and not on individual words, the grouping constructs are used only to apply quantifiers. The regular expression pattern is interpreted as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
(?:\w+) Match one or more word characters. Do not assign the matched text to a captured group.
\W* Match zero or more non-word characters.
(?:\b(?:\w+)\W*)+ Match the pattern of one or more word characters starting at a word boundary, followed by zero or more non-word characters, one or more times. Do not assign the matched text to a captured group.
\. Match a period.

Group Options

The following grouping construct applies or disables the specified options within a subexpression:

(?imnsx-imnsx: subexpression )

where subexpression is any valid regular expression pattern. For example, (?i-s:) turns on case insensitivity and disables single-line mode. For more information about the inline options you can specify, see Regular Expression Options.

Note

You can specify options that apply to an entire regular expression rather than a subexpression by using a System.Text.RegularExpressions.Regex class constructor or a static method. You can also specify inline options that apply after a specific point in a regular expression by using the (?imnsx-imnsx) language construct.

The group options construct is not a capturing group. That is, although any portion of a string that is matched by subexpression is included in the match, it is not included in a captured group, nor is it used to populate the GroupCollection object.
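
A minimal sketch of both points follows: the option inside (?i: ... ) affects only the enclosed subexpression, and the construct itself adds no entry to the GroupCollection. The pattern, the input, and the GroupOptionsScope class are assumptions made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupOptionsScope
{
    // (?i:d) matches "d" or "D"; the "og" that follows stays case-sensitive.
    private const string Pattern = @"\b(?i:d)og\b";

    public static MatchCollection FindAll(string input)
    {
        return Regex.Matches(input, Pattern);
    }

    public static void Main()
    {
        foreach (Match match in FindAll("dog Dog DOG dOG"))
            Console.WriteLine("'{0}' (Groups.Count = {1})",
                              match.Value, match.Groups.Count);
    }
}
// Expected output:
//       'dog' (Groups.Count = 1)
//       'Dog' (Groups.Count = 1)
```

Groups.Count is 1 because the collection contains only the entry at index zero, which represents the entire match.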

For example, the regular expression \b(?ix: d \w+)\s in the following example uses inline options in a grouping construct to enable case-insensitive matching and ignore pattern white space in identifying all words that begin with the letter "d". The regular expression is defined as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
(?ix: d \w+) Using case-insensitive matching and ignoring white space in this pattern, match a "d" followed by one or more word characters.
\s Match a white-space character.
string pattern = @"\b(?ix: d \w+)\s";
string input = "Dogs are decidedly good pets.";

foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index);
// The example displays the following output:
//    'Dogs ' found at index 0.
//    'decidedly ' found at index 9.
Dim pattern As String = "\b(?ix: d \w+)\s"
Dim input As String = "Dogs are decidedly good pets."

For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index)
Next
' The example displays the following output:
'    'Dogs ' found at index 0.
'    'decidedly ' found at index 9.      

Zero-Width Positive Lookahead Assertions

The following grouping construct defines a zero-width positive lookahead assertion:

(?= subexpression )

where subexpression is any regular expression pattern. For a match to be successful, the input string must match the regular expression pattern in subexpression, although the matched substring is not included in the match result. A zero-width positive lookahead assertion does not backtrack.

Typically, a zero-width positive lookahead assertion is found at the end of a regular expression pattern. It defines a substring that must be found at the end of a string for a match to occur but that should not be included in the match. It is also useful for preventing excessive backtracking. You can use a zero-width positive lookahead assertion to ensure that a particular captured group begins with text that matches a subset of the pattern defined for that captured group. For example, if a capturing group matches consecutive word characters, you can use a zero-width positive lookahead assertion to require that the first character be an uppercase letter.
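
The uppercase-first-character case mentioned above might look like the following sketch. The pattern \b(?=\p{Lu})(\w+)\b, the sample input, and the CapitalizedWordFinder class are assumptions chosen for illustration, not part of the original example set.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CapitalizedWordFinder
{
    // (?=\p{Lu}) checks, without consuming input, that the next character is
    // an uppercase letter; (\w+) then captures the whole word.
    private const string Pattern = @"\b(?=\p{Lu})(\w+)\b";

    public static List<string> Find(string input)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern))
            words.Add(match.Groups[1].Value);
        return words;
    }

    public static void Main()
    {
        foreach (string word in Find("The quick Brown fox visits Oslo"))
            Console.WriteLine(word);
    }
}
// Expected output:
//       The
//       Brown
//       Oslo
```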

The following example uses a zero-width positive lookahead assertion to match the word that precedes the verb "is" in the input string.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b\w+(?=\sis\b)";
      string[] inputs = { "The dog is a Malamute.", 
                          "The island has beautiful birds.", 
                          "The pitch missed home plate.", 
                          "Sunday is a weekend day." };

      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success)
            Console.WriteLine("'{0}' precedes 'is'.", match.Value);
         else
            Console.WriteLine("'{0}' does not match the pattern.", input); 
      }
   }
}
// The example displays the following output:
//    'dog' precedes 'is'.
//    'The island has beautiful birds.' does not match the pattern.
//    'The pitch missed home plate.' does not match the pattern.
//    'Sunday' precedes 'is'.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b\w+(?=\sis\b)"
      Dim inputs() As String = { "The dog is a Malamute.", _
                                 "The island has beautiful birds.", _
                                 "The pitch missed home plate.", _
                                 "Sunday is a weekend day." }

      For Each input As String In inputs
         Dim match As Match = Regex.Match(input, pattern)
         If match.Success Then
            Console.WriteLine("'{0}' precedes 'is'.", match.Value)
         Else
            Console.WriteLine("'{0}' does not match the pattern.", input) 
         End If     
      Next
   End Sub
End Module
' The example displays the following output:
'       'dog' precedes 'is'.
'       'The island has beautiful birds.' does not match the pattern.
'       'The pitch missed home plate.' does not match the pattern.
'       'Sunday' precedes 'is'.

The regular expression \b\w+(?=\sis\b) is interpreted as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
\w+ Match one or more word characters.
(?=\sis\b) Determine whether the word characters are followed by a white-space character and the string "is", which ends on a word boundary. If so, the match is successful.

Zero-Width Negative Lookahead Assertions

The following grouping construct defines a zero-width negative lookahead assertion:

(?! subexpression )

where subexpression is any regular expression pattern. For the match to be successful, the input string must not match the regular expression pattern in subexpression, although the matched string is not included in the match result.

A zero-width negative lookahead assertion is typically used either at the beginning or at the end of a regular expression. At the beginning of a regular expression, it can define a specific pattern that should not be matched when the beginning of the regular expression defines a similar but more general pattern to be matched. In this case, it is often used to limit backtracking. At the end of a regular expression, it can define a subexpression that cannot occur at the end of a match.

The following example defines a regular expression that uses a zero-width negative lookahead assertion at the beginning of the regular expression to match words that do not begin with "un".

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(?!un)\w+\b";
      string input = "unite one unethical ethics use untie ultimate";
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       one
//       ethics
//       use
//       ultimate
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(?!un)\w+\b"
      Dim input As String = "unite one unethical ethics use untie ultimate"
      For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
         Console.WriteLine(match.Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       one
'       ethics
'       use
'       ultimate

The regular expression \b(?!un)\w+\b is interpreted as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
(?!un) Determine whether the next two characters are "un". If they are not, a match is possible.
\w+ Match one or more word characters.
\b End the match at a word boundary.

The following example defines a regular expression that uses a zero-width negative lookahead assertion at the end of the regular expression to match words that do not end with a punctuation character.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b\w+\b(?!\p{P})";
      string input = "Disconnected, disjointed thoughts in a sentence fragment.";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       disjointed
//       thoughts
//       in
//       a
//       sentence
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b\w+\b(?!\p{P})"
      Dim input As String = "Disconnected, disjointed thoughts in a sentence fragment."
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next   
   End Sub
End Module
' The example displays the following output:
'       disjointed
'       thoughts
'       in
'       a
'       sentence

The regular expression \b\w+\b(?!\p{P}) is interpreted as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
\w+ Match one or more word characters.
\b End the match at a word boundary.
(?!\p{P}) If the next character is not a punctuation symbol (such as a period or a comma), the match succeeds.

Zero-Width Positive Lookbehind Assertions

The following grouping construct defines a zero-width positive lookbehind assertion:

(?<= subexpression )

where subexpression is any regular expression pattern. For a match to be successful, subexpression must occur in the input string immediately to the left of the current position, although subexpression is not included in the match result. A zero-width positive lookbehind assertion does not backtrack.

Zero-width positive lookbehind assertions are typically used at the beginning of regular expressions. The pattern that they define is a precondition for a match, although it is not a part of the match result.

For example, the following example matches the last two digits of the year for the twenty-first century (that is, it requires that the digits "20" precede the matched string).

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "2010 1999 1861 2140 2009";
      string pattern = @"(?<=\b20)\d{2}\b";
      
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       10
//       09
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "2010 1999 1861 2140 2009"
      Dim pattern As String = "(?<=\b20)\d{2}\b"
      
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next      
   End Sub
End Module
' The example displays the following output:
'       10
'       09

The regular expression pattern (?<=\b20)\d{2}\b is interpreted as shown in the following table.

Pattern Description
\d{2} Match two decimal digits.
(?<=\b20) Continue the match if the two decimal digits are preceded by the decimal digits "20" on a word boundary.
\b End the match at a word boundary.

Zero-width positive lookbehind assertions are also used to limit backtracking when the last character or characters in a captured group must be a subset of the characters that match that group's regular expression pattern. For example, if a group captures all consecutive word characters, you can use a zero-width positive lookbehind assertion to require that the last character be alphabetical.
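
One way to sketch the "last character must be alphabetical" case is shown below. The pattern \b(\w+)(?<=\p{L})\b, the sample input, and the LetterTerminatedWords class are assumptions introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LetterTerminatedWords
{
    // (\w+) captures a run of word characters; (?<=\p{L}) then requires that
    // the character just captured last is a letter.
    private const string Pattern = @"\b(\w+)(?<=\p{L})\b";

    public static List<string> Find(string input)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern))
            words.Add(match.Groups[1].Value);
        return words;
    }

    public static void Main()
    {
        foreach (string word in Find("abc1 def x9z item_"))
            Console.WriteLine(word);
    }
}
// Expected output:
//       def
//       x9z
```

"abc1" and "item_" are rejected because they end in a digit and an underscore, and shortening either capture would leave the closing \b without a word boundary.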

Zero-Width Negative Lookbehind Assertions

The following grouping construct defines a zero-width negative lookbehind assertion:

(?<! subexpression )

where subexpression is any regular expression pattern. For a match to be successful, subexpression must not occur in the input string immediately to the left of the current position. However, any substring that does not match subexpression is not included in the match result.

Zero-width negative lookbehind assertions are typically used at the beginning of regular expressions. The pattern that they define precludes a match in the string that follows. They are also used to limit backtracking when the last character or characters in a captured group must not be one or more of the characters that match that group's regular expression pattern. For example, if a group captures all consecutive word characters, you can use a zero-width negative lookbehind assertion to require that the last character not be an underscore (_).
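
The "last character is not an underscore" requirement can be sketched with a negative lookbehind placed after the capturing group. The pattern \b(\w+)(?<!_)\b, the sample input, and the NoTrailingUnderscore class are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NoTrailingUnderscore
{
    // (\w+) captures a run of word characters; (?<!_) rejects the capture
    // when its last character is an underscore.
    private const string Pattern = @"\b(\w+)(?<!_)\b";

    public static List<string> Find(string input)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern))
            words.Add(match.Groups[1].Value);
        return words;
    }

    public static void Main()
    {
        foreach (string word in Find("foo_ bar baz_qux _x_"))
            Console.WriteLine(word);
    }
}
// Expected output:
//       bar
//       baz_qux
```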

The following example matches the date for any day of the week that is not a weekend day (that is, neither Saturday nor Sunday).

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] dates = { "Monday February 1, 2010", 
                         "Wednesday February 3, 2010", 
                         "Saturday February 6, 2010", 
                         "Sunday February 7, 2010", 
                         "Monday, February 8, 2010" };
      string pattern = @"(?<!(Saturday|Sunday) )\b\w+ \d{1,2}, \d{4}\b";
      
      foreach (string dateValue in dates)
      {
         Match match = Regex.Match(dateValue, pattern);
         if (match.Success)
            Console.WriteLine(match.Value);
      }      
   }
}
// The example displays the following output:
//       February 1, 2010
//       February 3, 2010
//       February 8, 2010
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim dates() As String = { "Monday February 1, 2010", _
                                "Wednesday February 3, 2010", _
                                "Saturday February 6, 2010", _
                                "Sunday February 7, 2010", _
                                "Monday, February 8, 2010" }
      Dim pattern As String = "(?<!(Saturday|Sunday) )\b\w+ \d{1,2}, \d{4}\b"
      
      For Each dateValue As String In dates
         Dim match As Match = Regex.Match(dateValue, pattern)
         If match.Success Then
            Console.WriteLine(match.Value)
         End If   
      Next      
   End Sub
End Module
' The example displays the following output:
'       February 1, 2010
'       February 3, 2010
'       February 8, 2010

The regular expression pattern (?<!(Saturday|Sunday) )\b\w+ \d{1,2}, \d{4}\b is interpreted as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
\w+ Match one or more word characters followed by a white-space character.
\d{1,2}, Match either one or two decimal digits followed by a comma and a white-space character.
\d{4}\b Match four decimal digits, and end the match at a word boundary.
(?<!(Saturday|Sunday) ) If the match is preceded by something other than the strings "Saturday" or "Sunday" followed by a space, the match is successful.

Nonbacktracking Subexpressions

The following grouping construct represents a nonbacktracking subexpression (also known as an atomic group or a "greedy" subexpression):

(?> subexpression )

where subexpression is any regular expression pattern.

Ordinarily, if a regular expression includes an optional or alternative matching pattern and a match does not succeed, the regular expression engine can branch in multiple directions to match an input string with a pattern. If a match is not found when it takes the first branch, the regular expression engine can back up or backtrack to the point where it took the first branch and attempt the match using the second branch. This process can continue until all branches have been tried.

The (?>subexpression) language construct disables backtracking. The regular expression engine will match as many characters in the input string as it can. When no further match is possible, it will not backtrack to attempt alternate pattern matches. (That is, the subexpression matches only strings that would be matched by the subexpression alone; it does not attempt to match a string based on the subexpression and any subexpressions that follow it.)

This option is recommended if you know that backtracking will not succeed. Preventing the regular expression engine from performing unnecessary searching improves performance.

The following example illustrates how a nonbacktracking subexpression modifies the results of a pattern match. The backtracking regular expression successfully matches a series of repeated characters followed by one more occurrence of the same character on a word boundary, but the nonbacktracking regular expression does not.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] inputs = { "cccd.", "aaad", "aaaa" };
      string back = @"(\w)\1+.\b";
      string noback = @"(?>(\w)\1+).\b";
      
      foreach (string input in inputs)
      {
         Match match1 = Regex.Match(input, back);
         Match match2 = Regex.Match(input, noback);
         Console.WriteLine("{0}: ", input);

         Console.Write("   Backtracking : ");
         if (match1.Success)
            Console.WriteLine(match1.Value);
         else
            Console.WriteLine("No match");
         
         Console.Write("   Nonbacktracking: ");
         if (match2.Success)
            Console.WriteLine(match2.Value);
         else
            Console.WriteLine("No match");
      }
   }
}
// The example displays the following output:
//    cccd.:
//       Backtracking : cccd
//       Nonbacktracking: cccd
//    aaad:
//       Backtracking : aaad
//       Nonbacktracking: aaad
//    aaaa:
//       Backtracking : aaaa
//       Nonbacktracking: No match
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim inputs() As String = { "cccd.", "aaad", "aaaa" }
      Dim back As String = "(\w)\1+.\b"
      Dim noback As String = "(?>(\w)\1+).\b"
      
      For Each input As String In inputs
         Dim match1 As Match = Regex.Match(input, back)
         Dim match2 As Match = Regex.Match(input, noback)
         Console.WriteLine("{0}: ", input)

         Console.Write("   Backtracking : ")
         If match1.Success Then
            Console.WriteLine(match1.Value)
         Else
            Console.WriteLine("No match")
         End If
         
         Console.Write("   Nonbacktracking: ")
         If match2.Success Then
            Console.WriteLine(match2.Value)
         Else
            Console.WriteLine("No match")
         End If
      Next
   End Sub
End Module
' The example displays the following output:
'    cccd.:
'       Backtracking : cccd
'       Nonbacktracking: cccd
'    aaad:
'       Backtracking : aaad
'       Nonbacktracking: aaad
'    aaaa:
'       Backtracking : aaaa
'       Nonbacktracking: No match

The nonbacktracking regular expression (?>(\w)\1+).\b is defined as shown in the following table.

Pattern Description
(\w) Match a single word character and assign it to the first capturing group.
\1+ Match the value of the first captured substring one or more times.
. Match any character.
\b End the match on a word boundary.
(?>(\w)\1+) Match one or more occurrences of a duplicated word character, but do not backtrack to match the last character on a word boundary.

Grouping Constructs and Regular Expression Objects

Substrings that are matched by a regular expression capturing group are represented by System.Text.RegularExpressions.Group objects, which can be retrieved from the System.Text.RegularExpressions.GroupCollection object that is returned by the Match.Groups property. The GroupCollection object is populated as follows:

  • The first Group object in the collection (the object at index zero) represents the entire match.

  • The next set of Group objects represents unnamed (numbered) capturing groups. They appear in the order in which they are defined in the regular expression, from left to right. The index values of these groups range from 1 to the number of unnamed capturing groups in the collection. (The index of a particular group is equivalent to its numbered backreference. For more information about backreferences, see Backreference Constructs.)

  • The final set of Group objects represents named capturing groups. They appear in the order in which they are defined in the regular expression, from left to right. The index value of the first named capturing group is one greater than the index of the last unnamed capturing group. If there are no unnamed capturing groups in the regular expression, the index value of the first named capturing group is one.
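
The ordering rules above can be checked with a small sketch in which a named group appears before an unnamed group in the pattern. The pattern, the input, and the GroupOrdering class are assumptions made for this illustration; default options are assumed (in particular, RegexOptions.ExplicitCapture is not set).

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupOrdering
{
    // The named group comes first in the pattern, but the unnamed group
    // still receives the lower index in the GroupCollection.
    public const string Pattern = @"(?<word>\w+)\s(\d+)";

    public static Match MatchInput(string input)
    {
        return Regex.Match(input, Pattern);
    }

    public static void Main()
    {
        Match match = MatchInput("abc 123");
        for (int ctr = 0; ctr < match.Groups.Count; ctr++)
            Console.WriteLine("Group {0}: '{1}'", ctr, match.Groups[ctr].Value);

        // GroupNumberFromName reports the index assigned to the named group.
        Console.WriteLine("Index of 'word': {0}",
                          new Regex(Pattern).GroupNumberFromName("word"));
    }
}
// Expected output:
//       Group 0: 'abc 123'
//       Group 1: '123'
//       Group 2: 'abc'
//       Index of 'word': 2
```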

If you apply a quantifier to a capturing group, the corresponding Group object's Capture.Value, Capture.Index, and Capture.Length properties reflect the last substring that is captured by the capturing group. You can retrieve the complete set of substrings that are captured by groups that have quantifiers from the CaptureCollection object that is returned by the Group.Captures property.

The following example clarifies the relationship between the Group and Capture objects.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\b(\w+)\W+)+";
      string input = "This is a short sentence.";
      Match match = Regex.Match(input, pattern);
      Console.WriteLine("Match: '{0}'", match.Value);
      for (int ctr = 1; ctr < match.Groups.Count; ctr++)
      {
         Console.WriteLine("   Group {0}: '{1}'", ctr, match.Groups[ctr].Value);
         int capCtr = 0;
         foreach (Capture capture in match.Groups[ctr].Captures)
         {
            Console.WriteLine("      Capture {0}: '{1}'", capCtr, capture.Value);
            capCtr++;
         }
      }
   }
}
// The example displays the following output:
//       Match: 'This is a short sentence.'
//          Group 1: 'sentence.'
//             Capture 0: 'This '
//             Capture 1: 'is '
//             Capture 2: 'a '
//             Capture 3: 'short '
//             Capture 4: 'sentence.'
//          Group 2: 'sentence'
//             Capture 0: 'This'
//             Capture 1: 'is'
//             Capture 2: 'a'
//             Capture 3: 'short'
//             Capture 4: 'sentence'
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\b(\w+)\W+)+"
      Dim input As String = "This is a short sentence."
      Dim match As Match = Regex.Match(input, pattern)
      Console.WriteLine("Match: '{0}'", match.Value)
      For ctr As Integer = 1 To match.Groups.Count - 1
         Console.WriteLine("   Group {0}: '{1}'", ctr, match.Groups(ctr).Value)
         Dim capCtr As Integer = 0
         For Each capture As Capture In match.Groups(ctr).Captures
            Console.WriteLine("      Capture {0}: '{1}'", capCtr, capture.Value)
            capCtr += 1
         Next
      Next
   End Sub
End Module
' The example displays the following output:
'       Match: 'This is a short sentence.'
'          Group 1: 'sentence.'
'             Capture 0: 'This '
'             Capture 1: 'is '
'             Capture 2: 'a '
'             Capture 3: 'short '
'             Capture 4: 'sentence.'
'          Group 2: 'sentence'
'             Capture 0: 'This'
'             Capture 1: 'is'
'             Capture 2: 'a'
'             Capture 3: 'short'
'             Capture 4: 'sentence'

The regular expression pattern (\b(\w+)\W+)+ extracts individual words from a string. It is defined as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
(\w+) Match one or more word characters. Together, these characters form a word. This is the second capturing group.
\W+ Match one or more non-word characters.
(\b(\w+)\W+) Match the pattern of one or more word characters followed by one or more non-word characters one or more times. This is the first capturing group.

The second capturing group matches each word of the sentence. The first capturing group matches each word along with the punctuation and white space that follow the word. The Group object whose index is 2 provides information about the text matched by the second capturing group. The complete set of words captured by the capturing group is available from the CaptureCollection object returned by the Group.Captures property.

See also