Alternation Constructs in Regular Expressions

Alternation constructs modify a regular expression to enable either/or or conditional matching. .NET supports three alternation constructs: either/or pattern matching with |, conditional matching with an expression, and conditional matching based on a valid captured group.

Either/or pattern matching with |

You can use the vertical bar (|) character to match any one of a series of patterns, where the | character separates each pattern.

Like a positive character class, the | character can be used to match any one of a number of single characters. The following example uses both a positive character class and either/or pattern matching with the | character to locate occurrences of the words "gray" or "grey" in a string. In this case, the | character produces the more verbose regular expression.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      // Regular expression using character class.
      string pattern1 = @"\bgr[ae]y\b";
      // Regular expression using either/or.
      string pattern2 = @"\bgr(a|e)y\b";

      string input = "The gray wolf blended in among the grey rocks.";
      foreach (Match match in Regex.Matches(input, pattern1))
         Console.WriteLine("'{0}' found at position {1}",
                           match.Value, match.Index);
      Console.WriteLine();
      foreach (Match match in Regex.Matches(input, pattern2))
         Console.WriteLine("'{0}' found at position {1}",
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       'gray' found at position 4
//       'grey' found at position 35
//
//       'gray' found at position 4
//       'grey' found at position 35
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        ' Regular expression using character class.
        Dim pattern1 As String = "\bgr[ae]y\b"
        ' Regular expression using either/or.
        Dim pattern2 As String = "\bgr(a|e)y\b"

        Dim input As String = "The gray wolf blended in among the grey rocks."
        For Each match As Match In Regex.Matches(input, pattern1)
            Console.WriteLine("'{0}' found at position {1}", _
                              match.Value, match.Index)
        Next
        Console.WriteLine()
        For Each match As Match In Regex.Matches(input, pattern2)
            Console.WriteLine("'{0}' found at position {1}", _
                              match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       'gray' found at position 4
'       'grey' found at position 35
'
'       'gray' found at position 4
'       'grey' found at position 35

The regular expression that uses the | character, \bgr(a|e)y\b, is interpreted as shown in the following table:

Pattern Description
\b Start at a word boundary.
gr Match the characters "gr".
(a|e) Match either an "a" or an "e".
y\b Match a "y" on a word boundary.
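Because (a|e) is enclosed in parentheses, it is also a capturing group, so each match records which alternative succeeded. The following minimal sketch reads group 1 of each match to show this; the class and method names are hypothetical and are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class AlternativeCaptureDemo
{
    // Returns the text captured by group 1, (a|e), for every match.
    public static List<string> CapturedVowels(string input)
    {
        var vowels = new List<string>();
        foreach (Match match in Regex.Matches(input, @"\bgr(a|e)y\b"))
            vowels.Add(match.Groups[1].Value);
        return vowels;
    }

    public static void Main()
    {
        string input = "The gray wolf blended in among the grey rocks.";
        foreach (string vowel in CapturedVowels(input))
            Console.WriteLine("Alternative matched: '{0}'", vowel);
    }
}
// Output:
//    Alternative matched: 'a'
//    Alternative matched: 'e'
```

The character-class version, \bgr[ae]y\b, has no group, so it offers no equivalent of Groups[1].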

The | character can also be used to perform an either/or match with multiple characters or subexpressions, which can include any combination of character literals and regular expression language elements. (A character class does not provide this functionality, because it always matches exactly one character.) The following example uses the | character to extract either a U.S. Social Security Number (SSN), which is a 9-digit number with the format ddd-dd-dddd, or a U.S. Employer Identification Number (EIN), which is a 9-digit number with the format dd-ddddddd.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b";
      string input = "01-9999999 020-333333 777-88-9999";
      Console.WriteLine("Matches for {0}:", pattern);
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("   {0} at position {1}", match.Value, match.Index);
   }
}
// The example displays the following output:
//       Matches for \b(\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b:
//          01-9999999 at position 0
//          777-88-9999 at position 22
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\b(\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b"
        Dim input As String = "01-9999999 020-333333 777-88-9999"
        Console.WriteLine("Matches for {0}:", pattern)
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("   {0} at position {1}", match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Matches for \b(\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b:
'          01-9999999 at position 0
'          777-88-9999 at position 22

The regular expression \b(\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b is interpreted as shown in the following table:

Pattern Description
\b Start at a word boundary.
(\d{2}-\d{7}|\d{3}-\d{2}-\d{4}) Match either of the following: two decimal digits followed by a hyphen followed by seven decimal digits; or three decimal digits, a hyphen, two decimal digits, another hyphen, and four decimal digits.
\b End the match at a word boundary.
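The .NET engine tries the alternatives of a | construct from left to right and accepts the first one that lets the overall match succeed; it does not search for the longest alternative. The following sketch uses a hypothetical input, "grey", and hypothetical patterns that are not part of the original example. The last call shows why the trailing \b matters in the SSN/EIN pattern: when the first alternative leaves the engine somewhere a word boundary cannot follow, it backtracks into the next alternative.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class AlternationOrderDemo
{
    // Returns the text of the first match of pattern in input.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        // The shorter alternative is listed first, so it wins.
        Console.WriteLine(FirstMatch("grey", "gr|grey"));       // gr
        // Listing the longer alternative first lets it match.
        Console.WriteLine(FirstMatch("grey", "grey|gr"));       // grey
        // "gr" is followed by "e", so \b fails and the engine
        // backtracks into the second alternative.
        Console.WriteLine(FirstMatch("grey", @"(gr|grey)\b"));  // grey
    }
}
```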

Conditional matching with an expression

This language element attempts to match one of two patterns depending on whether it can match an initial pattern. Its syntax is:

(?(expression)yes|no)

where expression is the initial pattern to match, yes is the pattern to match if expression is matched, and no is the optional pattern to match if expression is not matched. The regular expression engine treats expression as a zero-width assertion; that is, the regular expression engine does not advance in the input stream after it evaluates expression. Therefore, this construct is equivalent to the following:

(?(?=expression)yes|no)

where (?=expression) is a zero-width positive lookahead assertion. (For more information, see Grouping Constructs.) Because the regular expression engine interprets expression as an anchor (a zero-width assertion), expression must either be a zero-width assertion (for more information, see Anchors) or a subexpression that is also contained in yes. Otherwise, the yes pattern cannot be matched.
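Because expression is zero-width, the current position is the same after the test as before it, so yes has to match the text that expression examined. The following sketch compares a yes pattern that omits the tested subexpression \d{2}- with one that repeats it. The ^...$ patterns, the placeholder no pattern x, and the class name are hypothetical illustrations.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class ZeroWidthTestDemo
{
    public static bool IsFullMatch(string input, string pattern) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        string ein = "01-9999999";
        // The test succeeds, but the engine is still at "01-", so \d{7}
        // fails on the hyphen and there is no match.
        Console.WriteLine(IsFullMatch(ein, @"^(?(\d{2}-)\d{7}|x)$"));        // False
        // yes repeats the tested subexpression, so it can match.
        Console.WriteLine(IsFullMatch(ein, @"^(?(\d{2}-)\d{2}-\d{7}|x)$"));  // True
    }
}
```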

Note

If expression is a named or numbered capturing group, the alternation construct is interpreted as a capture test; for more information, see the next section, Conditional matching based on a valid captured group. In other words, the regular expression engine does not attempt to match the captured substring, but instead tests for the presence or absence of the group.

The following example is a variation of the example that appears in the Either/or pattern matching with | section. It uses conditional matching to determine whether the first three characters after a word boundary are two digits followed by a hyphen. If they are, it attempts to match a U.S. Employer Identification Number (EIN). If not, it attempts to match a U.S. Social Security Number (SSN).

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b";
      string input = "01-9999999 020-333333 777-88-9999";
      Console.WriteLine("Matches for {0}:", pattern);
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("   {0} at position {1}", match.Value, match.Index);
   }
}
// The example displays the following output:
//       Matches for \b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b:
//          01-9999999 at position 0
//          777-88-9999 at position 22
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b"
        Dim input As String = "01-9999999 020-333333 777-88-9999"
        Console.WriteLine("Matches for {0}:", pattern)
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("   {0} at position {1}", match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Matches for \b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b:
'          01-9999999 at position 0
'          777-88-9999 at position 22

The regular expression pattern \b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b is interpreted as shown in the following table:

Pattern Description
\b Start at a word boundary.
(?(\d{2}-) Determine whether the next three characters consist of two digits followed by a hyphen.
\d{2}-\d{7} If the previous pattern matches, match two digits followed by a hyphen followed by seven digits.
|\d{3}-\d{2}-\d{4} If the previous pattern does not match, match three decimal digits, a hyphen, two decimal digits, another hyphen, and four decimal digits.
\b Match a word boundary.
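To illustrate the equivalence with (?(?=expression)yes|no) described earlier, the following sketch runs the pattern from this section and a copy whose test is written as an explicit positive lookahead against the same input, then compares the matches. The class, constant, and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class ConditionalEquivalenceDemo
{
    public const string Implicit = @"\b(?(\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b";
    public const string Explicit = @"\b(?(?=\d{2}-)\d{2}-\d{7}|\d{3}-\d{2}-\d{4})\b";

    // Returns "value@index" for each match so that results can be compared.
    public static string[] Describe(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => $"{m.Value}@{m.Index}")
             .ToArray();

    public static void Main()
    {
        string input = "01-9999999 020-333333 777-88-9999";
        string[] a = Describe(input, Implicit);
        string[] b = Describe(input, Explicit);
        Console.WriteLine(string.Join(", ", a));
        Console.WriteLine(a.SequenceEqual(b));
    }
}
// Output:
//    01-9999999@0, 777-88-9999@22
//    True
```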

Conditional matching based on a valid captured group

This language element attempts to match one of two patterns depending on whether it has matched a specified capturing group. Its syntax is:

(?(name)yes|no)

or

(?(number)yes|no)

where name is the name and number is the number of a capturing group, yes is the expression to match if name or number has a match, and no is the optional expression to match if it does not.

If name does not correspond to the name of a capturing group that is used in the regular expression pattern, the alternation construct is interpreted as an expression test, as explained in the previous section. Typically, this means that the test evaluates to false, because the text of name does not appear at that position in the input. If number does not correspond to a numbered capturing group that is used in the regular expression pattern, the regular expression engine throws an ArgumentException when the pattern is parsed.
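The following sketch illustrates both rules. The patterns and the class and method names are hypothetical. A test of an undefined group number is rejected when the Regex is constructed, while an undefined name such as abc is treated as the expression test (?(?=abc)...), which checks whether the literal text "abc" comes next.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class UndefinedGroupDemo
{
    // Returns true if parsing the pattern throws ArgumentException.
    public static bool ThrowsArgumentException(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // The pattern defines only group 1, so testing group 2 is rejected.
        Console.WriteLine(ThrowsArgumentException(@"(\d)(?(2)a|b)"));   // True

        // "abc" is not a group name, so it is used as an expression test.
        string pattern = @"(?(abc)abcd|x)";
        Console.WriteLine(Regex.Match("abcd", pattern).Value);           // abcd
        Console.WriteLine(Regex.Match("xyz", pattern).Value);            // x
    }
}
```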

The following example is a variation of the example that appears in the Either/or pattern matching with | section. It uses a capturing group named n2 that consists of two digits followed by a hyphen. The alternation construct tests whether this capturing group has been matched in the input string. If it has, the alternation construct attempts to match the last seven digits of a nine-digit U.S. Employer Identification Number (EIN). If it has not, it attempts to match a nine-digit U.S. Social Security Number (SSN).

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(?<n2>\d{2}-)?(?(n2)\d{7}|\d{3}-\d{2}-\d{4})\b";
      string input = "01-9999999 020-333333 777-88-9999";
      Console.WriteLine("Matches for {0}:", pattern);
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("   {0} at position {1}", match.Value, match.Index);
   }
}
// The example displays the following output:
//       Matches for \b(?<n2>\d{2}-)?(?(n2)\d{7}|\d{3}-\d{2}-\d{4})\b:
//          01-9999999 at position 0
//          777-88-9999 at position 22
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\b(?<n2>\d{2}-)?(?(n2)\d{7}|\d{3}-\d{2}-\d{4})\b"
        Dim input As String = "01-9999999 020-333333 777-88-9999"
        Console.WriteLine("Matches for {0}:", pattern)
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("   {0} at position {1}", match.Value, match.Index)
        Next
    End Sub
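End Module
' The example displays the following output:
'       Matches for \b(?<n2>\d{2}-)?(?(n2)\d{7}|\d{3}-\d{2}-\d{4})\b:
'          01-9999999 at position 0
'          777-88-9999 at position 22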

The regular expression pattern \b(?<n2>\d{2}-)?(?(n2)\d{7}|\d{3}-\d{2}-\d{4})\b is interpreted as shown in the following table:

Pattern Description
\b Start at a word boundary.
(?<n2>\d{2}-)? Match zero or one occurrence of two digits followed by a hyphen. Name this capturing group n2.
(?(n2) Test whether n2 was matched in the input string.
\d{7} If n2 was matched, match seven decimal digits.
|\d{3}-\d{2}-\d{4} If n2 was not matched, match three decimal digits, a hyphen, two decimal digits, another hyphen, and four decimal digits.
\b Match a word boundary.
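Because the alternation construct only tests whether n2 participated in the match, the same information is available afterward through the group's Success property. The following sketch labels each match by the branch that was taken; the class and method names and the "EIN"/"SSN" labels are hypothetical additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class CaptureTestDemo
{
    const string Pattern = @"\b(?<n2>\d{2}-)?(?(n2)\d{7}|\d{3}-\d{2}-\d{4})\b";

    // Labels each match "EIN" if n2 participated (yes branch), else "SSN".
    public static List<string> Classify(string input)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern))
        {
            string kind = match.Groups["n2"].Success ? "EIN" : "SSN";
            results.Add($"{match.Value} {kind}");
        }
        return results;
    }

    public static void Main()
    {
        foreach (string line in Classify("01-9999999 020-333333 777-88-9999"))
            Console.WriteLine(line);
    }
}
// Output:
//    01-9999999 EIN
//    777-88-9999 SSN
```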

The following example is a variation of the n2 example that uses a numbered group instead of a named group. Its regular expression pattern is \b(\d{2}-)?(?(1)\d{7}|\d{3}-\d{2}-\d{4})\b.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\d{2}-)?(?(1)\d{7}|\d{3}-\d{2}-\d{4})\b";
      string input = "01-9999999 020-333333 777-88-9999";
      Console.WriteLine("Matches for {0}:", pattern);
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("   {0} at position {1}", match.Value, match.Index);
   }
}
// The example displays the following output:
//       Matches for \b(\d{2}-)?(?(1)\d{7}|\d{3}-\d{2}-\d{4})\b:
//          01-9999999 at position 0
//          777-88-9999 at position 22
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\b(\d{2}-)?(?(1)\d{7}|\d{3}-\d{2}-\d{4})\b"
        Dim input As String = "01-9999999 020-333333 777-88-9999"
        Console.WriteLine("Matches for {0}:", pattern)
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("   {0} at position {1}", match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Matches for \b(\d{2}-)?(?(1)\d{7}|\d{3}-\d{2}-\d{4})\b:
'          01-9999999 at position 0
'          777-88-9999 at position 22
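In .NET, a named group also receives a number, so the (?(name)...) and (?(number)...) forms can refer to the same group. The following sketch asks the Regex object for the number of n2 in the named-group pattern, then tests that group by number instead of by name. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class GroupNumberDemo
{
    // Same pattern as the named-group example.
    static readonly Regex Named =
        new Regex(@"\b(?<n2>\d{2}-)?(?(n2)\d{7}|\d{3}-\d{2}-\d{4})\b");
    // Same named group, but the conditional tests it by number.
    static readonly Regex ByNumber =
        new Regex(@"\b(?<n2>\d{2}-)?(?(1)\d{7}|\d{3}-\d{2}-\d{4})\b");

    public static int NumberOf(string groupName) => Named.GroupNumberFromName(groupName);

    // True if both patterns produce the same sequence of matched values.
    public static bool SameMatches(string input) =>
        Named.Matches(input).Cast<Match>().Select(m => m.Value)
             .SequenceEqual(ByNumber.Matches(input).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        string input = "01-9999999 020-333333 777-88-9999";
        // n2 is the only capturing group, so it is group number 1.
        Console.WriteLine(NumberOf("n2"));       // 1
        Console.WriteLine(SameMatches(input));   // True
    }
}
```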
