Regular Expression Options

By default, the comparison of an input string with any literal characters in a regular expression pattern is case-sensitive, white space in a regular expression pattern is interpreted as literal white-space characters, and capturing groups in a regular expression are named implicitly as well as explicitly. You can modify these and several other aspects of default regular expression behavior by specifying regular expression options. These options, which are listed in the following table, can be included inline as part of the regular expression pattern, or they can be supplied to a System.Text.RegularExpressions.Regex class constructor or static pattern-matching method as a System.Text.RegularExpressions.RegexOptions enumeration value.

RegexOptions member | Inline character | Effect
None | Not available | Use default behavior. For more information, see Default Options.
IgnoreCase | i | Use case-insensitive matching. For more information, see Case-Insensitive Matching.
Multiline | m | Use multiline mode, where ^ and $ match the beginning and end of each line (instead of the beginning and end of the input string). For more information, see Multiline Mode.
Singleline | s | Use single-line mode, where the period (.) matches every character (instead of every character except \n). For more information, see Single-line Mode.
ExplicitCapture | n | Do not capture unnamed groups. The only valid captures are explicitly named or numbered groups of the form (?<name>subexpression). For more information, see Explicit Captures Only.
Compiled | Not available | Compile the regular expression to MSIL code instead of interpreting it. For more information, see Compiled Regular Expressions.
IgnorePatternWhitespace | x | Exclude unescaped white space from the pattern, and enable comments after a number sign (#). For more information, see Ignore White Space.
RightToLeft | Not available | Change the search direction. Search moves from right to left instead of from left to right. For more information, see Right-to-Left Mode.
ECMAScript | Not available | Enable ECMAScript-compliant behavior for the expression. For more information, see ECMAScript Matching Behavior.
CultureInvariant | Not available | Ignore cultural differences in language. For more information, see Comparison Using the Invariant Culture.

Specifying the Options

You can specify options for regular expressions in one of three ways:

  • In the options parameter of a System.Text.RegularExpressions.Regex class constructor or static (Shared in Visual Basic) pattern-matching method, such as Regex(String, RegexOptions) or Regex.Match(String, String, RegexOptions). The options parameter is a bitwise OR combination of System.Text.RegularExpressions.RegexOptions enumerated values.

    When options are supplied to a Regex instance by using the options parameter of a class constructor, the options are assigned to the Regex.Options property. However, the Regex.Options property does not reflect inline options in the regular expression pattern itself.

    The following example provides an illustration. It uses the options parameter of the Regex.Matches(String, String, RegexOptions) method to enable case-insensitive matching and to ignore pattern white space when identifying words that begin with the letter "d".

    string pattern = @"d \w+ \s";
    string input = "Dogs are decidedly good pets.";
    RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace;
    
    foreach (Match match in Regex.Matches(input, pattern, options))
       Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index);
    // The example displays the following output:
    //    'Dogs ' found at index 0.
    //    'decidedly ' found at index 9.
    
    Dim pattern As String = "d \w+ \s"
    Dim input As String = "Dogs are decidedly good pets."
    Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace
    
    For Each match As Match In Regex.Matches(input, pattern, options)
        Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index)
    Next
    ' The example displays the following output:
    '    'Dogs ' found at index 0.
    '    'decidedly ' found at index 9.      
    
  • By applying inline options in a regular expression pattern with the syntax (?imnsx-imnsx). The option applies to the pattern from the point at which it is defined to either the end of the pattern or the point at which another inline option turns it off. Note that the Regex.Options property of a Regex instance does not reflect these inline options. For more information, see the Miscellaneous Constructs topic.

    The following example provides an illustration. It uses inline options to enable case-insensitive matching and to ignore pattern white space when identifying words that begin with the letter "d".

    string pattern = @"(?ix) d \w+ \s";
    string input = "Dogs are decidedly good pets.";
    
    foreach (Match match in Regex.Matches(input, pattern))
       Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index);
    // The example displays the following output:
    //    'Dogs ' found at index 0.
    //    'decidedly ' found at index 9.
    
    Dim pattern As String = "\b(?ix) d \w+ \s"
    Dim input As String = "Dogs are decidedly good pets."
    
    For Each match As Match In Regex.Matches(input, pattern)
        Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index)
    Next
    ' The example displays the following output:
    '    'Dogs ' found at index 0.
    '    'decidedly ' found at index 9.      
    
  • By applying inline options in a particular grouping construct in a regular expression pattern with the syntax (?imnsx-imnsx:subexpression). No sign before a set of options turns the set on; a minus sign before a set of options turns the set off. (? is a fixed part of the language construct's syntax that is required whether options are enabled or disabled.) The option applies only to that group. For more information, see Grouping Constructs.

    The following example provides an illustration. It uses inline options in a grouping construct to enable case-insensitive matching and to ignore pattern white space when identifying words that begin with the letter "d".

    string pattern = @"\b(?ix: d \w+)\s";
    string input = "Dogs are decidedly good pets.";
    
    foreach (Match match in Regex.Matches(input, pattern))
       Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index);
    // The example displays the following output:
    //    'Dogs ' found at index 0.
    //    'decidedly ' found at index 9.
    
    Dim pattern As String = "\b(?ix: d \w+)\s"
    Dim input As String = "Dogs are decidedly good pets."
    
    For Each match As Match In Regex.Matches(input, pattern)
        Console.WriteLine("'{0}' found at index {1}.", match.Value, match.Index)
    Next
    ' The example displays the following output:
    '    'Dogs ' found at index 0.
    '    'decidedly ' found at index 9.      
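
A minimal sketch of the point made in the first two bullets, assuming C# 9 or later so that top-level statements can be used; the variable names are illustrative only. Supplying IgnoreCase and IgnorePatternWhitespace through the constructor and through an inline (?ix) construct produces the same matches, but only the constructor options show up in the Regex.Options property.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Dogs are decidedly good pets.";

// Options passed to the constructor are stored in Regex.Options.
var fromParameter = new Regex(@"d \w+ \s",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

// The same options set inline affect matching but not Regex.Options.
var fromInline = new Regex(@"(?ix) d \w+ \s");

Console.WriteLine(fromParameter.Options);               // IgnoreCase, IgnorePatternWhitespace
Console.WriteLine(fromInline.Options);                  // None
Console.WriteLine(fromParameter.Matches(input).Count);  // 2
Console.WriteLine(fromInline.Matches(input).Count);     // 2
```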
    

If options are specified inline, a minus sign (-) before an option or set of options turns off those options. For example, the inline construct (?ix-ms) turns on the RegexOptions.IgnoreCase and RegexOptions.IgnorePatternWhitespace options and turns off the RegexOptions.Multiline and RegexOptions.Singleline options. All regular expression options are turned off by default.

Note

If the regular expression options specified in the options parameter of a constructor or method call conflict with the options specified inline in a regular expression pattern, the inline options are used.
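
A short sketch of both rules, assuming C# 9 or later (top-level statements): (?-i) switches case-insensitive matching off partway through a pattern, and an inline setting wins when it conflicts with the options parameter. The patterns and inputs are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// (?i) turns case-insensitive matching on; (?-i) turns it off again,
// so only "abc" is compared without regard to case.
string toggled = "(?i)abc(?-i)def";
Console.WriteLine(Regex.IsMatch("ABCdef", toggled));   // True
Console.WriteLine(Regex.IsMatch("ABCDEF", toggled));   // False

// The options parameter enables IgnoreCase, but the inline (?-i)
// construct conflicts with it, so the inline setting is used.
Console.WriteLine(Regex.IsMatch("ABC", "(?-i)abc", RegexOptions.IgnoreCase)); // False
Console.WriteLine(Regex.IsMatch("ABC", "abc", RegexOptions.IgnoreCase));      // True
```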

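The following five regular expression options can be set both with the options parameter and inline:

  • RegexOptions.IgnoreCase
  • RegexOptions.Multiline
  • RegexOptions.Singleline
  • RegexOptions.ExplicitCapture
  • RegexOptions.IgnorePatternWhitespace

The following five regular expression options can be set by using the options parameter but cannot be set inline:

  • RegexOptions.None
  • RegexOptions.Compiled
  • RegexOptions.RightToLeft
  • RegexOptions.CultureInvariant
  • RegexOptions.ECMAScript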

Determining the Options

You can determine which options were provided to a Regex object when it was instantiated by retrieving the value of the read-only Regex.Options property. This property is particularly useful for determining the options that are defined for a compiled regular expression created by the Regex.CompileToAssembly method.

To test for the presence of any option except RegexOptions.None, perform an AND operation with the value of the Regex.Options property and the RegexOptions value in which you are interested. Then test whether the result equals that RegexOptions value. The following example tests whether the RegexOptions.IgnoreCase option has been set.

if ((rgx.Options & RegexOptions.IgnoreCase) == RegexOptions.IgnoreCase)
   Console.WriteLine("Case-insensitive pattern comparison.");
else
   Console.WriteLine("Case-sensitive pattern comparison.");
If (rgx.Options And RegexOptions.IgnoreCase) = RegexOptions.IgnoreCase Then
    Console.WriteLine("Case-insensitive pattern comparison.")
Else
    Console.WriteLine("Case-sensitive pattern comparison.")
End If

To test for RegexOptions.None, determine whether the value of the Regex.Options property is equal to RegexOptions.None, as the following example illustrates.

if (rgx.Options == RegexOptions.None)
   Console.WriteLine("No options have been set.");
If rgx.Options = RegexOptions.None Then
    Console.WriteLine("No options have been set.")
End If
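
The two tests can be wrapped in a small helper. In this sketch, which assumes C# 9 or later, HasOption is a hypothetical name and not a member of the Regex class. It also shows why RegexOptions.None needs the equality test: None has the value zero, so ANDing it with any set of options always yields None.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when every flag in 'option' is set on 'regex'.
// Because RegexOptions.None is zero, it returns true for None on any Regex.
static bool HasOption(Regex regex, RegexOptions option) =>
    (regex.Options & option) == option;

var rgx = new Regex(@"^\w+$", RegexOptions.IgnoreCase | RegexOptions.Multiline);

Console.WriteLine(HasOption(rgx, RegexOptions.IgnoreCase));  // True
Console.WriteLine(HasOption(rgx, RegexOptions.Singleline));  // False
Console.WriteLine(HasOption(rgx, RegexOptions.None));        // True, even though options are set

// Testing for None therefore requires an equality comparison.
Console.WriteLine(rgx.Options == RegexOptions.None);                 // False
Console.WriteLine(new Regex(@"^\w+$").Options == RegexOptions.None); // True
```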

The following sections list the options supported by regular expressions in .NET.

Default Options

The RegexOptions.None option indicates that no options have been specified, and the regular expression engine uses its default behavior. This includes the following:

  • The pattern is interpreted as a canonical rather than an ECMAScript regular expression.

  • The regular expression pattern is matched in the input string from left to right.

  • Comparisons are case-sensitive.

  • The ^ and $ language elements match the beginning and end of the input string.

  • The . language element matches every character except \n.

  • Any white space in a regular expression pattern is interpreted as a literal space character.

  • The conventions of the current culture are used when comparing the pattern to the input string.

  • Capturing groups in the regular expression pattern are implicit as well as explicit.

Note

The RegexOptions.None option has no inline equivalent. When regular expression options are applied inline, the default behavior is restored on an option-by-option basis, by turning a particular option off. For example, (?i) turns on case-insensitive comparison, and (?-i) restores the default case-sensitive comparison.

Because the RegexOptions.None option represents the default behavior of the regular expression engine, it is rarely specified explicitly in a method call. Instead, a constructor or static pattern-matching method without an options parameter is called.
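
A quick way to observe several of the default behaviors listed above is to call the static methods without an options argument. This is a sketch assuming C# 9 or later; the input string and patterns are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// No options argument, so the engine uses its default behavior.
string input = "First line\nsecond line";

// Comparisons are case-sensitive.
Console.WriteLine(Regex.IsMatch(input, "FIRST"));                      // False
// ^ matches only at the start of the input string.
Console.WriteLine(Regex.Matches(input, @"^\w+").Count);                // 1
// . does not match \n.
Console.WriteLine(Regex.IsMatch(input, "line.second"));                // False
// White space in the pattern is literal.
Console.WriteLine(Regex.IsMatch(input, "First line"));                 // True
// Unnamed parentheses create implicitly numbered capturing groups.
Console.WriteLine(Regex.Match(input, @"(\w+) (\w+)").Groups[2].Value); // line
```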

Case-Insensitive Matching

The IgnoreCase option, or the i inline option, provides case-insensitive matching. By default, the casing conventions of the current culture are used.

The following example defines a regular expression pattern, \bthe\w*\b, that matches all words starting with "the". Because the first call to the Matches method uses the default case-sensitive comparison, the output indicates that the string "The" that begins the sentence is not matched. It is matched when the Matches method is called with options set to IgnoreCase.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\bthe\w*\b";
      string input = "The man then told them about that event.";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index);

      Console.WriteLine();
      foreach (Match match in Regex.Matches(input, pattern,
                                            RegexOptions.IgnoreCase))
         Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found then at index 8.
//       Found them at index 18.
//
//       Found The at index 0.
//       Found then at index 8.
//       Found them at index 18.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\bthe\w*\b"
        Dim input As String = "The man then told them about that event."
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index)
        Next
        Console.WriteLine()
        For Each match As Match In Regex.Matches(input, pattern, _
                                                 RegexOptions.IgnoreCase)
            Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found then at index 8.
'       Found them at index 18.
'       
'       Found The at index 0.
'       Found then at index 8.
'       Found them at index 18.

The following example modifies the regular expression pattern from the previous example to use inline options instead of the options parameter to provide case-insensitive comparison. The first pattern defines the case-insensitive option in a grouping construct that applies only to the letter "t" in the string "the". Because the option construct occurs at the beginning of the pattern, the second pattern applies the case-insensitive option to the entire regular expression.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(?i:t)he\w*\b";
      string input = "The man then told them about that event.";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index);

      Console.WriteLine();
      pattern = @"(?i)\bthe\w*\b";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index);
   }
}
// The example displays the following output:
//       Found The at index 0.
//       Found then at index 8.
//       Found them at index 18.
//
//       Found The at index 0.
//       Found then at index 8.
//       Found them at index 18.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\b(?i:t)he\w*\b"
        Dim input As String = "The man then told them about that event."
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index)
        Next
        Console.WriteLine()
        pattern = "(?i)\bthe\w*\b"
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("Found {0} at index {1}.", match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       Found The at index 0.
'       Found then at index 8.
'       Found them at index 18.
'       
'       Found The at index 0.
'       Found then at index 8.
'       Found them at index 18.

Multiline Mode

The RegexOptions.Multiline option, or the m inline option, enables the regular expression engine to handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string.

By default, $ matches only at the end of the input string. If you specify the RegexOptions.Multiline option, it matches either at a newline character (\n) or at the end of the input string. It does not, however, match at a carriage return/line feed (\r\n) combination, because the \r precedes the \n. To match lines that end with \r\n, use the subexpression \r?$ instead of just $.
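
The effect of \r?$ can be checked directly against input that uses Windows-style "\r\n" line endings. A minimal sketch, assuming C# 9 or later; the input mirrors the bowling scores used in the example that follows.

```csharp
using System;
using System.Text.RegularExpressions;

// Each line ends with "\r\n".
string input = "Joe 164\r\nSam 208\r\n";

// In multiline mode, $ matches only before \n, so the \r blocks every match.
int plain = Regex.Matches(input, @"^(\w+)\s(\d+)$", RegexOptions.Multiline).Count;

// \r? consumes an optional carriage return before the end of the line.
MatchCollection crlf = Regex.Matches(input, @"^(\w+)\s(\d+)\r?$", RegexOptions.Multiline);

Console.WriteLine(plain);        // 0
Console.WriteLine(crlf.Count);   // 2
foreach (Match m in crlf)
    Console.WriteLine("{0}: {1}", m.Groups[1].Value, m.Groups[2].Value);
```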

The following example extracts bowlers' names and scores and adds them to a SortedList<TKey,TValue> collection that sorts them in descending order. The Matches method is called twice. In the first method call, the regular expression is ^(\w+)\s(\d+)$ and no options are set. As the output shows, no matches are found: without the Multiline option, ^ and $ match only the beginning and end of the entire input string, which contains several lines. In the second method call, the regular expression is changed to ^(\w+)\s(\d+)\r*$ and the options are set to RegexOptions.Multiline. As the output shows, the names and scores are successfully matched, and the scores are displayed in descending order.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      SortedList<int, string> scores = new SortedList<int, string>(new DescendingComparer<int>());

      string input = "Joe 164\n" +
                     "Sam 208\n" +
                     "Allison 211\n" +
                     "Gwen 171\n";
      string pattern = @"^(\w+)\s(\d+)$";
      bool matched = false;

      Console.WriteLine("Without Multiline option:");
      foreach (Match match in Regex.Matches(input, pattern))
      {
         scores.Add(Int32.Parse(match.Groups[2].Value), (string) match.Groups[1].Value);
         matched = true;
      }
      if (! matched)
         Console.WriteLine("   No matches.");
      Console.WriteLine();

      // Redefine pattern to handle multiple lines.
      pattern = @"^(\w+)\s(\d+)\r*$";
      Console.WriteLine("With multiline option:");
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
         scores.Add(Int32.Parse(match.Groups[2].Value), (string) match.Groups[1].Value);

      // List scores in descending order.
      foreach (KeyValuePair<int, string> score in scores)
         Console.WriteLine("{0}: {1}", score.Value, score.Key);
   }
}

public class DescendingComparer<T> : IComparer<T>
{
   public int Compare(T x, T y)
   {
      return Comparer<T>.Default.Compare(x, y) * -1;
   }
}
// The example displays the following output:
//   Without Multiline option:
//      No matches.
//
//   With multiline option:
//   Allison: 211
//   Sam: 208
//   Gwen: 171
//   Joe: 164
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim scores As New SortedList(Of Integer, String)(New DescendingComparer(Of Integer)())

        Dim input As String = "Joe 164" + vbCrLf + _
                              "Sam 208" + vbCrLf + _
                              "Allison 211" + vbCrLf + _
                              "Gwen 171" + vbCrLf
        Dim pattern As String = "^(\w+)\s(\d+)$"
        Dim matched As Boolean = False

        Console.WriteLine("Without Multiline option:")
        For Each match As Match In Regex.Matches(input, pattern)
            scores.Add(CInt(match.Groups(2).Value), match.Groups(1).Value)
            matched = True
        Next
        If Not matched Then Console.WriteLine("   No matches.")
        Console.WriteLine()

        ' Redefine pattern to handle multiple lines.
        pattern = "^(\w+)\s(\d+)\r*$"
        Console.WriteLine("With multiline option:")
        For Each match As Match In Regex.Matches(input, pattern, RegexOptions.Multiline)
            scores.Add(CInt(match.Groups(2).Value), match.Groups(1).Value)
        Next
        ' List scores in descending order. 
        For Each score As KeyValuePair(Of Integer, String) In scores
            Console.WriteLine("{0}: {1}", score.Value, score.Key)
        Next
    End Sub
End Module

Public Class DescendingComparer(Of T) : Implements IComparer(Of T)
    Public Function Compare(x As T, y As T) As Integer _
           Implements IComparer(Of T).Compare
        Return Comparer(Of T).Default.Compare(x, y) * -1
    End Function
End Class
' The example displays the following output:
'    Without Multiline option:
'       No matches.
'    
'    With multiline option:
'    Allison: 211
'    Sam: 208
'    Gwen: 171
'    Joe: 164

The regular expression pattern ^(\w+)\s(\d+)\r*$ is defined as shown in the following table.

Pattern | Description
^ | Begin at the start of the line.
(\w+) | Match one or more word characters. This is the first capturing group.
\s | Match a white-space character.
(\d+) | Match one or more decimal digits. This is the second capturing group.
\r* | Match zero or more carriage return characters.
$ | End at the end of the line.

The following example is equivalent to the previous one, except that it uses the inline option (?m) to set the multiline option.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      SortedList<int, string> scores = new SortedList<int, string>(new DescendingComparer<int>());

      string input = "Joe 164\n" +
                     "Sam 208\n" +
                     "Allison 211\n" +
                     "Gwen 171\n";
      string pattern = @"(?m)^(\w+)\s(\d+)\r*$";

      foreach (Match match in Regex.Matches(input, pattern))
         scores.Add(Convert.ToInt32(match.Groups[2].Value), match.Groups[1].Value);

      // List scores in descending order.
      foreach (KeyValuePair<int, string> score in scores)
         Console.WriteLine("{0}: {1}", score.Value, score.Key);
   }
}

public class DescendingComparer<T> : IComparer<T>
{
   public int Compare(T x, T y)
   {
      return Comparer<T>.Default.Compare(x, y) * -1;
   }
}
// The example displays the following output:
//    Allison: 211
//    Sam: 208
//    Gwen: 171
//    Joe: 164
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim scores As New SortedList(Of Integer, String)(New DescendingComparer(Of Integer)())

        Dim input As String = "Joe 164" + vbCrLf + _
                              "Sam 208" + vbCrLf + _
                              "Allison 211" + vbCrLf + _
                              "Gwen 171" + vbCrLf
        Dim pattern As String = "(?m)^(\w+)\s(\d+)\r*$"

        For Each match As Match In Regex.Matches(input, pattern)
            scores.Add(CInt(match.Groups(2).Value), match.Groups(1).Value)
        Next
        ' List scores in descending order. 
        For Each score As KeyValuePair(Of Integer, String) In scores
            Console.WriteLine("{0}: {1}", score.Value, score.Key)
        Next
    End Sub
End Module

Public Class DescendingComparer(Of T) : Implements IComparer(Of T)
    Public Function Compare(x As T, y As T) As Integer _
           Implements IComparer(Of T).Compare
        Return Comparer(Of T).Default.Compare(x, y) * -1
    End Function
End Class
' The example displays the following output:
'    Allison: 211
'    Sam: 208
'    Gwen: 171
'    Joe: 164

Single-line Mode

The RegexOptions.Singleline option, or the s inline option, causes the regular expression engine to treat the input string as if it consists of a single line. It does this by changing the behavior of the period (.) language element so that it matches every character, instead of matching every character except for the newline character \n or \u000A.

The following example illustrates how the behavior of the . language element changes when you use the RegexOptions.Singleline option. The regular expression ^.+ starts at the beginning of the string and matches every character. By default, the match ends at the end of the first line; the regular expression pattern matches the carriage return character, \r or \u000D, but it does not match \n. Because the RegexOptions.Singleline option interprets the entire input string as a single line, it matches every character in the input string, including \n.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "^.+";
      string input = "This is one line and\r\nthis is the second.";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(Regex.Escape(match.Value));

      Console.WriteLine();
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Singleline))
         Console.WriteLine(Regex.Escape(match.Value));
   }
}
// The example displays the following output:
//       This\ is\ one\ line\ and\r
//
//       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "^.+"
        Dim input As String = "This is one line and" + vbCrLf + "this is the second."
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine(Regex.Escape(match.Value))
        Next
        Console.WriteLine()
        For Each match As Match In Regex.Matches(input, pattern, RegexOptions.SingleLine)
            Console.WriteLine(Regex.Escape(match.Value))
        Next
    End Sub
End Module
' The example displays the following output:
'       This\ is\ one\ line\ and\r
'       
'       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.

The following example is equivalent to the previous one, except that it uses the inline option (?s) to enable single-line mode.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "(?s)^.+";
      string input = "This is one line and\r\nthis is the second.";

      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(Regex.Escape(match.Value));
   }
}
// The example displays the following output:
//       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "(?s)^.+"
        Dim input As String = "This is one line and" + vbCrLf + "this is the second."

        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine(Regex.Escape(match.Value))
        Next
    End Sub
End Module
' The example displays the following output:
'       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.
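
Despite their names, the Singleline and Multiline options are independent and can be combined: Multiline changes what ^ and $ match, and Singleline changes what . matches. A minimal sketch, assuming C# 9 or later; the input is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "one\ntwo";

// Multiline only: ^ and $ match at line boundaries, and . stops at \n.
MatchCollection lines = Regex.Matches(input, "(?m)^.+$");

// Multiline plus single-line: . now also matches \n, so the greedy .+
// runs to the end of the input, producing a single match.
MatchCollection whole = Regex.Matches(input, "(?ms)^.+$");

Console.WriteLine(lines.Count);                    // 2
Console.WriteLine(whole.Count);                    // 1
Console.WriteLine(Regex.Escape(whole[0].Value));   // one\ntwo
```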

Explicit Captures Only

By default, capturing groups are defined by the use of parentheses in the regular expression pattern. Named groups are assigned a name or number by the (?<name>subexpression) language element, whereas unnamed groups are accessible by index. In the GroupCollection object, unnamed groups precede named groups.

Grouping constructs are often used only to apply quantifiers to multiple language elements, and the captured substrings are of no interest. For example, if the following regular expression:

\b\(?((\w+),?\s?)+[\.!?]\)?

is intended only to extract sentences that end with a period, exclamation point, or question mark from a document, only the resulting sentence (which is represented by the Match object) is of interest. The individual words in the collection are not.

Capturing groups that are not subsequently used can be expensive, because the regular expression engine must populate both the GroupCollection and CaptureCollection collection objects. As an alternative, you can use either the RegexOptions.ExplicitCapture option or the n inline option to specify that the only valid captures are explicitly named or numbered groups that are designated by the (?<name>subexpression) construct.
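
A minimal sketch of the group numbering with and without explicit captures, assuming C# 9 or later; the pattern, input, and the group name num are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(\w+)\s(?<num>\d+)";
string input = "Joe 164";

// Default: the unnamed group (\w+) is captured as group 1,
// and the named group is numbered after it.
Match implicitMatch = Regex.Match(input, pattern);

// ExplicitCapture: (\w+) becomes a noncapturing group; only "num" remains.
Match explicitMatch = Regex.Match(input, pattern, RegexOptions.ExplicitCapture);

// The inline n option has the same effect.
Match inlineMatch = Regex.Match(input, "(?n)" + pattern);

Console.WriteLine(implicitMatch.Groups.Count);        // 3
Console.WriteLine(explicitMatch.Groups.Count);        // 2
Console.WriteLine(explicitMatch.Groups["num"].Value); // 164
Console.WriteLine(inlineMatch.Groups.Count);          // 2
```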

The following example displays information about the matches returned by the \b\(?((\w+),?\s?)+[\.!?]\)? regular expression pattern when the Matches method is called with and without the RegexOptions.ExplicitCapture option. As the output from the first method call shows, the regular expression engine fully populates the GroupCollection and CaptureCollection collection objects with information about captured substrings. The second method is called with options set to RegexOptions.ExplicitCapture. As a result, it captures only the entire match (group 0) and no information about the unnamed groups.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is the first sentence. Is it the beginning " +
                     "of a literary masterpiece? I think not. Instead, " +
                     "it is a nonsensical paragraph.";
      string pattern = @"\b\(?((\w+),?\s?)+[\.!?]\)?";
      Console.WriteLine("With implicit captures:");
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("The match: {0}", match.Value);
         int groupCtr = 0;
         foreach (Group group in match.Groups)
         {
            Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value);
            groupCtr++;
            int captureCtr = 0;
            foreach (Capture capture in group.Captures)
            {
               Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value);
               captureCtr++;
            }
         }
      }
      Console.WriteLine();
      Console.WriteLine("With explicit captures only:");
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.ExplicitCapture))
      {
         Console.WriteLine("The match: {0}", match.Value);
         int groupCtr = 0;
         foreach (Group group in match.Groups)
         {
            Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value);
            groupCtr++;
            int captureCtr = 0;
            foreach (Capture capture in group.Captures)
            {
               Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value);
               captureCtr++;
            }
         }
      }
   }
}
// The example displays the following output:
//    With implicit captures:
//    The match: This is the first sentence.
//       Group 0: This is the first sentence.
//          Capture 0: This is the first sentence.
//       Group 1: sentence
//          Capture 0: This
//          Capture 1: is
//          Capture 2: the
//          Capture 3: first
//          Capture 4: sentence
//       Group 2: sentence
//          Capture 0: This
//          Capture 1: is
//          Capture 2: the
//          Capture 3: first
//          Capture 4: sentence
//    The match: Is it the beginning of a literary masterpiece?
//       Group 0: Is it the beginning of a literary masterpiece?
//          Capture 0: Is it the beginning of a literary masterpiece?
//       Group 1: masterpiece
//          Capture 0: Is
//          Capture 1: it
//          Capture 2: the
//          Capture 3: beginning
//          Capture 4: of
//          Capture 5: a
//          Capture 6: literary
//          Capture 7: masterpiece
//       Group 2: masterpiece
//          Capture 0: Is
//          Capture 1: it
//          Capture 2: the
//          Capture 3: beginning
//          Capture 4: of
//          Capture 5: a
//          Capture 6: literary
//          Capture 7: masterpiece
//    The match: I think not.
//       Group 0: I think not.
//          Capture 0: I think not.
//       Group 1: not
//          Capture 0: I
//          Capture 1: think
//          Capture 2: not
//       Group 2: not
//          Capture 0: I
//          Capture 1: think
//          Capture 2: not
//    The match: Instead, it is a nonsensical paragraph.
//       Group 0: Instead, it is a nonsensical paragraph.
//          Capture 0: Instead, it is a nonsensical paragraph.
//       Group 1: paragraph
//          Capture 0: Instead,
//          Capture 1: it
//          Capture 2: is
//          Capture 3: a
//          Capture 4: nonsensical
//          Capture 5: paragraph
//       Group 2: paragraph
//          Capture 0: Instead
//          Capture 1: it
//          Capture 2: is
//          Capture 3: a
//          Capture 4: nonsensical
//          Capture 5: paragraph
//
//    With explicit captures only:
//    The match: This is the first sentence.
//       Group 0: This is the first sentence.
//          Capture 0: This is the first sentence.
//    The match: Is it the beginning of a literary masterpiece?
//       Group 0: Is it the beginning of a literary masterpiece?
//          Capture 0: Is it the beginning of a literary masterpiece?
//    The match: I think not.
//       Group 0: I think not.
//          Capture 0: I think not.
//    The match: Instead, it is a nonsensical paragraph.
//       Group 0: Instead, it is a nonsensical paragraph.
//          Capture 0: Instead, it is a nonsensical paragraph.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim input As String = "This is the first sentence. Is it the beginning " + _
                              "of a literary masterpiece? I think not. Instead, " + _
                              "it is a nonsensical paragraph."
        Dim pattern As String = "\b\(?((\w+),?\s?)+[\.!?]\)?"
        Console.WriteLine("With implicit captures:")
        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("The match: {0}", match.Value)
            Dim groupCtr As Integer = 0
            For Each group As Group In match.Groups
                Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value)
                groupCtr += 1
                Dim captureCtr As Integer = 0
                For Each capture As Capture In group.Captures
                    Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value)
                    captureCtr += 1
                Next
            Next
        Next
        Console.WriteLine()
        Console.WriteLine("With explicit captures only:")
        For Each match As Match In Regex.Matches(input, pattern, RegexOptions.ExplicitCapture)
            Console.WriteLine("The match: {0}", match.Value)
            Dim groupCtr As Integer = 0
            For Each group As Group In match.Groups
                Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value)
                groupCtr += 1
                Dim captureCtr As Integer = 0
                For Each capture As Capture In group.Captures
                    Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value)
                    captureCtr += 1
                Next
            Next
        Next
    End Sub
End Module
' The example displays the following output:
'    With implicit captures:
'    The match: This is the first sentence.
'       Group 0: This is the first sentence.
'          Capture 0: This is the first sentence.
'       Group 1: sentence
'          Capture 0: This
'          Capture 1: is
'          Capture 2: the
'          Capture 3: first
'          Capture 4: sentence
'       Group 2: sentence
'          Capture 0: This
'          Capture 1: is
'          Capture 2: the
'          Capture 3: first
'          Capture 4: sentence
'    The match: Is it the beginning of a literary masterpiece?
'       Group 0: Is it the beginning of a literary masterpiece?
'          Capture 0: Is it the beginning of a literary masterpiece?
'       Group 1: masterpiece
'          Capture 0: Is
'          Capture 1: it
'          Capture 2: the
'          Capture 3: beginning
'          Capture 4: of
'          Capture 5: a
'          Capture 6: literary
'          Capture 7: masterpiece
'       Group 2: masterpiece
'          Capture 0: Is
'          Capture 1: it
'          Capture 2: the
'          Capture 3: beginning
'          Capture 4: of
'          Capture 5: a
'          Capture 6: literary
'          Capture 7: masterpiece
'    The match: I think not.
'       Group 0: I think not.
'          Capture 0: I think not.
'       Group 1: not
'          Capture 0: I
'          Capture 1: think
'          Capture 2: not
'       Group 2: not
'          Capture 0: I
'          Capture 1: think
'          Capture 2: not
'    The match: Instead, it is a nonsensical paragraph.
'       Group 0: Instead, it is a nonsensical paragraph.
'          Capture 0: Instead, it is a nonsensical paragraph.
'       Group 1: paragraph
'          Capture 0: Instead,
'          Capture 1: it
'          Capture 2: is
'          Capture 3: a
'          Capture 4: nonsensical
'          Capture 5: paragraph
'       Group 2: paragraph
'          Capture 0: Instead
'          Capture 1: it
'          Capture 2: is
'          Capture 3: a
'          Capture 4: nonsensical
'          Capture 5: paragraph
'    
'    With explicit captures only:
'    The match: This is the first sentence.
'       Group 0: This is the first sentence.
'          Capture 0: This is the first sentence.
'    The match: Is it the beginning of a literary masterpiece?
'       Group 0: Is it the beginning of a literary masterpiece?
'          Capture 0: Is it the beginning of a literary masterpiece?
'    The match: I think not.
'       Group 0: I think not.
'          Capture 0: I think not.
'    The match: Instead, it is a nonsensical paragraph.
'       Group 0: Instead, it is a nonsensical paragraph.
'          Capture 0: Instead, it is a nonsensical paragraph.

The regular expression pattern \b\(?((\w+),?\s?)+[\.!?]\)? is defined as shown in the following table.

Pattern            Description
\b                 Begin at a word boundary.
\(?                Match zero or one occurrences of the opening parenthesis ("(").
(\w+),?            Match one or more word characters, followed by zero or one commas.
\s?                Match zero or one white-space characters.
((\w+),?\s?)+      Match the combination of one or more word characters, zero or one commas, and zero or one white-space characters one or more times.
[\.!?]\)?          Match any of the three punctuation symbols, followed by zero or one closing parentheses (")").

You can also use the (?n) inline element to suppress automatic captures. The following example modifies the previous regular expression pattern to use the (?n) inline element instead of the RegexOptions.ExplicitCapture option.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is the first sentence. Is it the beginning " +
                     "of a literary masterpiece? I think not. Instead, " +
                     "it is a nonsensical paragraph.";
      string pattern = @"(?n)\b\(?((\w+),?\s?)+[\.!?]\)?";

      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("The match: {0}", match.Value);
         int groupCtr = 0;
         foreach (Group group in match.Groups)
         {
            Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value);
            groupCtr++;
            int captureCtr = 0;
            foreach (Capture capture in group.Captures)
            {
               Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value);
               captureCtr++;
            }
         }
      }
   }
}
// The example displays the following output:
//       The match: This is the first sentence.
//          Group 0: This is the first sentence.
//             Capture 0: This is the first sentence.
//       The match: Is it the beginning of a literary masterpiece?
//          Group 0: Is it the beginning of a literary masterpiece?
//             Capture 0: Is it the beginning of a literary masterpiece?
//       The match: I think not.
//          Group 0: I think not.
//             Capture 0: I think not.
//       The match: Instead, it is a nonsensical paragraph.
//          Group 0: Instead, it is a nonsensical paragraph.
//             Capture 0: Instead, it is a nonsensical paragraph.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim input As String = "This is the first sentence. Is it the beginning " + _
                              "of a literary masterpiece? I think not. Instead, " + _
                              "it is a nonsensical paragraph."
        Dim pattern As String = "(?n)\b\(?((\w+),?\s?)+[\.!?]\)?"

        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("The match: {0}", match.Value)
            Dim groupCtr As Integer = 0
            For Each group As Group In match.Groups
                Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value)
                groupCtr += 1
                Dim captureCtr As Integer = 0
                For Each capture As Capture In group.Captures
                    Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value)
                    captureCtr += 1
                Next
            Next
        Next
    End Sub
End Module
' The example displays the following output:
'       The match: This is the first sentence.
'          Group 0: This is the first sentence.
'             Capture 0: This is the first sentence.
'       The match: Is it the beginning of a literary masterpiece?
'          Group 0: Is it the beginning of a literary masterpiece?
'             Capture 0: Is it the beginning of a literary masterpiece?
'       The match: I think not.
'          Group 0: I think not.
'             Capture 0: I think not.
'       The match: Instead, it is a nonsensical paragraph.
'          Group 0: Instead, it is a nonsensical paragraph.
'             Capture 0: Instead, it is a nonsensical paragraph.

Finally, you can use the inline group element (?n:) to suppress automatic captures on a group-by-group basis. The following example modifies the previous pattern to suppress unnamed captures in the outer group, ((\w+),?\s?). Note that this also suppresses unnamed captures in the inner group, (\w+).

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is the first sentence. Is it the beginning " +
                     "of a literary masterpiece? I think not. Instead, " +
                     "it is a nonsensical paragraph.";
      string pattern = @"\b\(?(?n:(\w+),?\s?)+[\.!?]\)?";

      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine("The match: {0}", match.Value);
         int groupCtr = 0;
         foreach (Group group in match.Groups)
         {
            Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value);
            groupCtr++;
            int captureCtr = 0;
            foreach (Capture capture in group.Captures)
            {
               Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value);
               captureCtr++;
            }
         }
      }
   }
}
// The example displays the following output:
//       The match: This is the first sentence.
//          Group 0: This is the first sentence.
//             Capture 0: This is the first sentence.
//       The match: Is it the beginning of a literary masterpiece?
//          Group 0: Is it the beginning of a literary masterpiece?
//             Capture 0: Is it the beginning of a literary masterpiece?
//       The match: I think not.
//          Group 0: I think not.
//             Capture 0: I think not.
//       The match: Instead, it is a nonsensical paragraph.
//          Group 0: Instead, it is a nonsensical paragraph.
//             Capture 0: Instead, it is a nonsensical paragraph.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim input As String = "This is the first sentence. Is it the beginning " + _
                              "of a literary masterpiece? I think not. Instead, " + _
                              "it is a nonsensical paragraph."
        Dim pattern As String = "\b\(?(?n:(\w+),?\s?)+[\.!?]\)?"

        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine("The match: {0}", match.Value)
            Dim groupCtr As Integer = 0
            For Each group As Group In match.Groups
                Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value)
                groupCtr += 1
                Dim captureCtr As Integer = 0
                For Each capture As Capture In group.Captures
                    Console.WriteLine("      Capture {0}: {1}", captureCtr, capture.Value)
                    captureCtr += 1
                Next
            Next
        Next
    End Sub
End Module
' The example displays the following output:
'       The match: This is the first sentence.
'          Group 0: This is the first sentence.
'             Capture 0: This is the first sentence.
'       The match: Is it the beginning of a literary masterpiece?
'          Group 0: Is it the beginning of a literary masterpiece?
'             Capture 0: Is it the beginning of a literary masterpiece?
'       The match: I think not.
'          Group 0: I think not.
'             Capture 0: I think not.
'       The match: Instead, it is a nonsensical paragraph.
'          Group 0: Instead, it is a nonsensical paragraph.
'             Capture 0: Instead, it is a nonsensical paragraph.

Compiled Regular Expressions

By default, regular expressions in .NET are interpreted. When a Regex object is instantiated or a static Regex method is called, the regular expression pattern is parsed into a set of custom opcodes, and an interpreter uses these opcodes to run the regular expression. This involves a tradeoff: the cost of initializing the regular expression engine is minimized at the expense of run-time performance.

You can use compiled instead of interpreted regular expressions by using the RegexOptions.Compiled option. In this case, when a pattern is passed to the regular expression engine, it is parsed into a set of opcodes and then converted to Microsoft intermediate language (MSIL), which can be passed directly to the common language runtime. Compiled regular expressions maximize run-time performance at the expense of initialization time.

Note

A regular expression can be compiled only by supplying the RegexOptions.Compiled value to the options parameter of a Regex class constructor or a static pattern-matching method. It is not available as an inline option.

You can use compiled regular expressions in calls to both static and instance regular expressions. In static regular expressions, the RegexOptions.Compiled option is passed to the options parameter of the regular expression pattern-matching method. In instance regular expressions, it is passed to the options parameter of the Regex class constructor. In both cases, it can improve performance.

However, this improvement in performance occurs only under the following conditions:

  • A Regex object that represents a particular regular expression is used in multiple calls to regular expression pattern-matching methods.

  • The Regex object is not allowed to go out of scope, so it can be reused.

  • A static regular expression is used in multiple calls to regular expression pattern-matching methods. (The performance improvement is possible because regular expressions used in static method calls are cached by the regular expression engine.)
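The reuse conditions above can be sketched as follows. The word-counting pattern and input lines are hypothetical. The sketch constructs one compiled instance and reuses it, and it calls a static method with RegexOptions.Compiled repeatedly. It also confirms that the option has no inline form: an unrecognized inline construct such as (?c) is rejected with an ArgumentException. The sketch does not measure performance.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input lines.
string[] lines = { "alpha beta", "gamma", "delta epsilon zeta" };

// Instance regex: construct it once with RegexOptions.Compiled, then reuse it.
var wordRegex = new Regex(@"\b\w+\b", RegexOptions.Compiled);
int instanceTotal = 0;
foreach (string line in lines)
    instanceTotal += wordRegex.Matches(line).Count;

// Static method: the option is passed on every call; repeated calls with the same
// pattern and options can reuse the engine's cached regular expression.
int staticTotal = 0;
foreach (string line in lines)
    staticTotal += Regex.Matches(line, @"\b\w+\b", RegexOptions.Compiled).Count;

Console.WriteLine("instance: {0}, static: {1}", instanceTotal, staticTotal);

// There is no inline equivalent of Compiled: "(?c)" is not a recognized option.
bool inlineRejected = false;
try { _ = new Regex("(?c)abc"); }
catch (ArgumentException) { inlineRejected = true; }
Console.WriteLine("(?c) rejected: {0}", inlineRejected);
```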

Note

The RegexOptions.Compiled option is unrelated to the Regex.CompileToAssembly method, which creates a special-purpose assembly that contains predefined compiled regular expressions.

Ignore White Space

By default, white space in a regular expression pattern is significant; it forces the regular expression engine to match a white-space character in the input string. Because of this, the regular expressions "\b\w+\s" and "\b\w+ " are roughly equivalent. They differ only in that \s matches any white-space character, whereas the literal space matches only a space. In addition, when the number sign (#) is encountered in a regular expression pattern, it is interpreted as a literal character to be matched.

The RegexOptions.IgnorePatternWhitespace option, or the x inline option, changes this default behavior as follows:

  • Unescaped white space in the regular expression pattern is ignored. To be part of a regular expression pattern, white-space characters must be escaped (for example, as \s or "\ ").

  • The number sign (#) is interpreted as the beginning of a comment, rather than as a literal character. All text in the regular expression pattern from the # character to the end of the line is interpreted as a comment.
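The following minimal sketch contrasts the default behavior with RegexOptions.IgnorePatternWhitespace. It covers unescaped spaces, escaped spaces, and the number sign. The short patterns and inputs are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

var ignoreWs = RegexOptions.IgnorePatternWhitespace;

// Default: the space in the pattern is a literal space.
bool defaultSpace = Regex.IsMatch("a b", "a b");
// With the option, the unescaped space is ignored, so the pattern means "ab".
bool ignoredSpace = Regex.IsMatch("a b", "a b", ignoreWs);
// An escaped space stays part of the pattern; the trailing "# ..." is a comment.
bool escapedSpace = Regex.IsMatch("a b", @"a\ b   # match a, space, b", ignoreWs);
// By default '#' is literal; with the option it must be escaped to match "#".
bool literalHash = Regex.IsMatch("#1", "#1");
bool escapedHash = Regex.IsMatch("#1", @"\#1", ignoreWs);

Console.WriteLine("{0} {1} {2} {3} {4}",
    defaultSpace, ignoredSpace, escapedSpace, literalHash, escapedHash);
```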

However, in the following cases, white-space characters in a regular expression aren't ignored, even if you use the RegexOptions.IgnorePatternWhitespace option:

  • White space within a character class is always interpreted literally. For example, the regular expression pattern [ .,;:] matches any single space, period, comma, semicolon, or colon.

  • White space isn't allowed within a bracketed quantifier, such as {n}, {n,}, and {n,m}. For example, the regular expression pattern \d{1, 3} fails to match any sequence of one to three digits because it contains a white-space character.

  • White space isn't allowed within a character sequence that introduces a language element. For example:

    • The language element (?:subexpression) represents a noncapturing group, and the (?: portion of the element can't have embedded spaces. The pattern (? :subexpression) throws an ArgumentException at run time because the regular expression engine can't parse the pattern, and the pattern ( ?:subexpression) fails to match subexpression.

    • The language element \p{name}, which represents a Unicode category or named block, can't include embedded spaces in the \p{ portion of the element. If you do include white space, an ArgumentException is thrown at run time.
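Three of the exceptions listed above can be checked with small patterns. This sketch assumes RegexOptions.IgnorePatternWhitespace throughout; the inputs are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

var ignoreWs = RegexOptions.IgnorePatternWhitespace;

// Inside a character class, the space is still literal.
bool classSpace = Regex.IsMatch(" ", "[ .,;:]", ignoreWs);

// A space inside the braces stops them from forming a quantifier,
// so \d{1, 3} does not match a run of one to three digits.
bool spacedQuantifier = Regex.IsMatch("123", @"\d{1, 3}", ignoreWs);
bool tightQuantifier = Regex.IsMatch("123", @"\d{1,3}", ignoreWs);

// A space inside "(?:" makes the pattern unparseable.
bool groupRejected = false;
try { _ = new Regex("(? :abc)", ignoreWs); }
catch (ArgumentException) { groupRejected = true; }

Console.WriteLine("{0} {1} {2} {3}",
    classSpace, spacedQuantifier, tightQuantifier, groupRejected);
```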

Enabling this option helps simplify regular expressions that are often difficult to parse and to understand. It improves readability, and it makes it possible to document a regular expression.

The following example defines this regular expression pattern:

\b \(? ( (?>\w+) ,?\s? )+ [\.!?] \)? # Matches an entire sentence.

This pattern is similar to the pattern defined in the Explicit Captures Only section, except that it uses the RegexOptions.IgnorePatternWhitespace option to ignore pattern white space.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is the first sentence. Is it the beginning " +
                     "of a literary masterpiece? I think not. Instead, " +
                     "it is a nonsensical paragraph.";
      string pattern = @"\b \(? ( (?>\w+) ,?\s? )+ [\.!?] \)? # Matches an entire sentence.";

      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       This is the first sentence.
//       Is it the beginning of a literary masterpiece?
//       I think not.
//       Instead, it is a nonsensical paragraph.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim input As String = "This is the first sentence. Is it the beginning " + _
                              "of a literary masterpiece? I think not. Instead, " + _
                              "it is a nonsensical paragraph."
        Dim pattern As String = "\b \(? ( (?>\w+) ,?\s? )+  [\.!?] \)? # Matches an entire sentence."

        For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace)
            Console.WriteLine(match.Value)
        Next
    End Sub
End Module
' The example displays the following output:
'       This is the first sentence.
'       Is it the beginning of a literary masterpiece?
'       I think not.
'       Instead, it is a nonsensical paragraph.

The following example uses the inline option (?x) to ignore pattern white space.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is the first sentence. Is it the beginning " +
                     "of a literary masterpiece? I think not. Instead, " +
                     "it is a nonsensical paragraph.";
      string pattern = @"(?x)\b \(? ( (?>\w+) ,?\s? )+  [\.!?] \)? # Matches an entire sentence.";

      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       This is the first sentence.
//       Is it the beginning of a literary masterpiece?
//       I think not.
//       Instead, it is a nonsensical paragraph.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim input As String = "This is the first sentence. Is it the beginning " + _
                              "of a literary masterpiece? I think not. Instead, " + _
                              "it is a nonsensical paragraph."
        Dim pattern As String = "(?x)\b \(? ( (?>\w+) ,?\s? )+  [\.!?] \)? # Matches an entire sentence."

        For Each match As Match In Regex.Matches(input, pattern)
            Console.WriteLine(match.Value)
        Next
    End Sub
End Module
' The example displays the following output:
'       This is the first sentence.
'       Is it the beginning of a literary masterpiece?
'       I think not.
'       Instead, it is a nonsensical paragraph.

Right-to-Left Mode

By default, the regular expression engine searches from left to right. You can reverse the search direction by using the RegexOptions.RightToLeft option. The search then automatically begins at the last character position of the string. Some pattern-matching methods, such as Regex.Match(String, Int32), include a starting position parameter. For these methods, the search moves leftward from the starting position, beginning with the character immediately to its left (the character at index startat - 1).

Note

Right-to-left pattern mode is available only by supplying the RegexOptions.RightToLeft value to the options parameter of a Regex class constructor or static pattern-matching method. It is not available as an inline option.
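The following sketch uses a hypothetical input of three similar words. It shows two consequences of RegexOptions.RightToLeft. First, Regex.Matches returns the rightmost match first. Second, an instance Match call with a starting position scans leftward from that index.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "abc abd abe";   // hypothetical input
string pattern = @"ab\w";

// Matches are found, and returned, starting from the end of the string.
var found = new List<string>();
foreach (Match m in Regex.Matches(input, pattern, RegexOptions.RightToLeft))
    found.Add(m.Value);
Console.WriteLine(string.Join(", ", found));

// With a starting position of 7, the scan begins at index 6 and moves left,
// so "abe" (indexes 8 to 10) is never considered.
var rtl = new Regex(pattern, RegexOptions.RightToLeft);
Match fromSeven = rtl.Match(input, 7);
Console.WriteLine("{0} at {1}", fromSeven.Value, fromSeven.Index);
```

The first line lists abe, abd, abc. The second line reports abd at index 4.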

The RegexOptions.RightToLeft option changes the search direction only; it does not interpret the regular expression pattern from right to left. For example, the regular expression \bb\w+\s matches words that begin with the letter "b" and are followed by a white-space character. In the following example, the input string consists of three words that include one or more "b" characters. The first word begins with "b", the second ends with "b", and the third includes two "b" characters in the middle of the word. As the output from the example shows, only the first word matches the regular expression pattern.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\bb\w+\s";
      string input = "builder rob rabble";
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.RightToLeft))
         Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
   }
}
// The example displays the following output:
//       'builder ' found at position 0.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim pattern As String = "\bb\w+\s"
        Dim input As String = "builder rob rabble"
        For Each match As Match In Regex.Matches(input, pattern, RegexOptions.RightToLeft)
            Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
        Next
    End Sub
End Module
' The example displays the following output:
'       'builder ' found at position 0.

Also note that the lookahead assertion (the (?=subexpression) language element) and the lookbehind assertion (the (?<=subexpression) language element) do not change direction. Lookahead assertions look to the right; lookbehind assertions look to the left. For example, the regular expression (?<=\d{1,2}\s)\w+,?\s\d{4} uses a lookbehind assertion to test for a day number that precedes a month name. The regular expression then matches the month and the year. For information on lookahead and lookbehind assertions, see Grouping Constructs.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] inputs = { "1 May 1917", "June 16, 2003" };
      string pattern = @"(?<=\d{1,2}\s)\w+,?\s\d{4}";

      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern, RegexOptions.RightToLeft);
         if (match.Success)
            Console.WriteLine("The date occurs in {0}.", match.Value);
         else
            Console.WriteLine("{0} does not match.", input);
      }
   }
}
// The example displays the following output:
//       The date occurs in May 1917.
//       June 16, 2003 does not match.
Imports System.Text.RegularExpressions

Module Example
    Public Sub Main()
        Dim inputs() As String = {"1 May 1917", "June 16, 2003"}
        Dim pattern As String = "(?<=\d{1,2}\s)\w+,?\s\d{4}"

        For Each input As String In inputs
            Dim match As Match = Regex.Match(input, pattern, RegexOptions.RightToLeft)
            If match.Success Then
                Console.WriteLine("The date occurs in {0}.", match.Value)
            Else
                Console.WriteLine("{0} does not match.", input)
            End If
        Next
    End Sub
End Module
' The example displays the following output:
'       The date occurs in May 1917.
'       June 16, 2003 does not match.

The regular expression pattern is defined as shown in the following table.

Pattern Description
(?<=\d{1,2}\s) The beginning of the match must be preceded by one or two decimal digits followed by a space.
\w+ Match one or more word characters.
,? Match zero or one comma character.
\s Match a white-space character.
\d{4} Match four decimal digits.
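The same rule can be seen with a shorter pattern. The following sketch is not part of the original example; the class name LookaroundDirectionExample and the MatchValues helper are illustrative. It runs a lookahead pattern and a lookbehind pattern against "A1 B2 C3" with and without RegexOptions.RightToLeft. Specifying RightToLeft changes the order in which the matches are found, but the lookahead still examines the character to the right of each letter and the lookbehind still examines the character to the left of each digit, so both directions find the same matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class LookaroundDirectionExample
{
   // Illustrative helper: returns the value of each match in the order
   // in which the regular expression engine finds it.
   public static string[] MatchValues(string input, string pattern, RegexOptions options)
   {
      return Regex.Matches(input, pattern, options)
                  .Cast<Match>()
                  .Select(m => m.Value)
                  .ToArray();
   }

   public static void Main()
   {
      string input = "A1 B2 C3";
      string[] patterns =
      {
         @"[A-Z](?=\d)",   // A letter followed by a digit: the lookahead checks to the right.
         @"(?<=[A-Z])\d"   // A digit preceded by a letter: the lookbehind checks to the left.
      };

      foreach (string pattern in patterns)
      {
         Console.WriteLine(pattern);
         Console.WriteLine("   Left to right: {0}",
                           string.Join(", ", MatchValues(input, pattern, RegexOptions.None)));
         Console.WriteLine("   Right to left: {0}",
                           string.Join(", ", MatchValues(input, pattern, RegexOptions.RightToLeft)));
      }
   }
}
// The example displays the following output:
//    [A-Z](?=\d)
//       Left to right: A, B, C
//       Right to left: C, B, A
//    (?<=[A-Z])\d
//       Left to right: 1, 2, 3
//       Right to left: 3, 2, 1
```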

ECMAScript Matching Behavior

By default, the regular expression engine uses canonical behavior when matching a regular expression pattern to input text. However, you can instruct the regular expression engine to use ECMAScript matching behavior by specifying the RegexOptions.ECMAScript option.

Note

ECMAScript-compliant behavior is available only by supplying the RegexOptions.ECMAScript value to the options parameter of a Regex class constructor or static pattern-matching method. It is not available as an inline option.

The RegexOptions.ECMAScript option can be combined only with the RegexOptions.IgnoreCase and RegexOptions.Multiline options. Using any other option together with it results in an ArgumentOutOfRangeException.
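The following sketch checks a few option combinations against the Regex constructor. The class name EcmaScriptOptionsExample and the AcceptsOptions helper are illustrative, not part of .NET; the helper simply constructs a Regex and reports whether ArgumentOutOfRangeException was thrown. The combination with IgnoreCase and Multiline is expected to be accepted, and the combinations with Singleline, IgnorePatternWhitespace, and RightToLeft are expected to be rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public class EcmaScriptOptionsExample
{
   // Illustrative helper: returns true if the Regex constructor accepts the options,
   // or false if it throws ArgumentOutOfRangeException.
   public static bool AcceptsOptions(string pattern, RegexOptions options)
   {
      try
      {
         _ = new Regex(pattern, options);
         return true;
      }
      catch (ArgumentOutOfRangeException)
      {
         return false;
      }
   }

   public static void Main()
   {
      RegexOptions[] candidates =
      {
         RegexOptions.ECMAScript | RegexOptions.IgnoreCase | RegexOptions.Multiline,
         RegexOptions.ECMAScript | RegexOptions.Singleline,
         RegexOptions.ECMAScript | RegexOptions.IgnorePatternWhitespace,
         RegexOptions.ECMAScript | RegexOptions.RightToLeft
      };

      foreach (RegexOptions options in candidates)
      {
         string result = AcceptsOptions(@"\w+", options) ? "accepted" : "ArgumentOutOfRangeException";
         Console.WriteLine("{0}: {1}", options, result);
      }
   }
}
```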

The behavior of ECMAScript and canonical regular expressions differs in three areas: character class syntax, self-referencing capturing groups, and octal versus backreference interpretation.

  • Character class syntax. Because canonical regular expressions support Unicode whereas ECMAScript does not, character classes in ECMAScript have a more limited syntax, and some character class language elements have a different meaning. For example, ECMAScript does not support language elements such as the Unicode category or block elements \p and \P. Similarly, the \w element, which matches a word character, is equivalent to the [a-zA-Z_0-9] character class when using ECMAScript and to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}] when using canonical behavior. For more information, see Character Classes.

    The following example illustrates the difference between canonical and ECMAScript pattern matching. It defines a regular expression, \b(\w+\s*)+, that matches one or more words, each followed by zero or more white-space characters. The input consists of two strings, one that uses the Latin character set and one that uses the Cyrillic character set. As the output shows, the call to the Regex.IsMatch(String, String, RegexOptions) method that uses ECMAScript matching fails to match the Cyrillic words, whereas the method call that uses canonical matching does match them.

    using System;
    using System.Text.RegularExpressions;
    
    public class Example
    {
       public static void Main()
       {
          string[] values = { "целый мир", "the whole world" };
          string pattern = @"\b(\w+\s*)+";
          foreach (var value in values)
          {
             Console.Write("Canonical matching: ");
             if (Regex.IsMatch(value, pattern))
                Console.WriteLine("'{0}' matches the pattern.", value);
             else
                Console.WriteLine("{0} does not match the pattern.", value);
    
             Console.Write("ECMAScript matching: ");
             if (Regex.IsMatch(value, pattern, RegexOptions.ECMAScript))
                Console.WriteLine("'{0}' matches the pattern.", value);
             else
                Console.WriteLine("{0} does not match the pattern.", value);
             Console.WriteLine();
          }
       }
    }
    // The example displays the following output:
    //       Canonical matching: 'целый мир' matches the pattern.
    //       ECMAScript matching: целый мир does not match the pattern.
    //
    //       Canonical matching: 'the whole world' matches the pattern.
    //       ECMAScript matching: 'the whole world' matches the pattern.
    
    Imports System.Text.RegularExpressions
    
    Module Example
        Public Sub Main()
            Dim values() As String = {"целый мир", "the whole world"}
            Dim pattern As String = "\b(\w+\s*)+"
            For Each value In values
                Console.Write("Canonical matching: ")
                If Regex.IsMatch(value, pattern)
                    Console.WriteLine("'{0}' matches the pattern.", value)
                Else
                    Console.WriteLine("{0} does not match the pattern.", value)
                End If
    
                Console.Write("ECMAScript matching: ")
                If Regex.IsMatch(value, pattern, RegexOptions.ECMAScript)
                    Console.WriteLine("'{0}' matches the pattern.", value)
                Else
                    Console.WriteLine("{0} does not match the pattern.", value)
                End If
                Console.WriteLine()
            Next
        End Sub
    End Module
    ' The example displays the following output:
    '       Canonical matching: 'целый мир' matches the pattern.
    '       ECMAScript matching: целый мир does not match the pattern.
    '       
    '       Canonical matching: 'the whole world' matches the pattern.
    '       ECMAScript matching: 'the whole world' matches the pattern.
    
  • Self-referencing capturing groups. A regular expression capturing group that contains a backreference to itself must be updated with each capture iteration. As the following example shows, this feature enables the regular expression ((a+)(\1) ?)+ to match the input string "aa aaaa aaaaaa " when using ECMAScript, but not when using canonical matching.

    using System;
    using System.Text.RegularExpressions;
    
    public class Example
    {
       static string pattern;
    
       public static void Main()
       {
          string input = "aa aaaa aaaaaa ";
          pattern = @"((a+)(\1) ?)+";
    
          // Match input using canonical matching.
          AnalyzeMatch(Regex.Match(input, pattern));
    
          // Match input using ECMAScript.
          AnalyzeMatch(Regex.Match(input, pattern, RegexOptions.ECMAScript));
       }
    
       private static void AnalyzeMatch(Match m)
       {
          if (m.Success)
          {
             Console.WriteLine("'{0}' matches {1} at position {2}.",
                               pattern, m.Value, m.Index);
             int grpCtr = 0;
             foreach (Group grp in m.Groups)
             {
                Console.WriteLine("   {0}: '{1}'", grpCtr, grp.Value);
                grpCtr++;
                int capCtr = 0;
                foreach (Capture cap in grp.Captures)
                {
                   Console.WriteLine("      {0}: '{1}'", capCtr, cap.Value);
                   capCtr++;
                }
             }
          }
          else
          {
             Console.WriteLine("No match found.");
          }
          Console.WriteLine();
       }
    }
    // The example displays the following output:
    //    No match found.
    //
    //    '((a+)(\1) ?)+' matches aa aaaa aaaaaa  at position 0.
    //       0: 'aa aaaa aaaaaa '
    //          0: 'aa aaaa aaaaaa '
    //       1: 'aaaaaa '
    //          0: 'aa '
    //          1: 'aaaa '
    //          2: 'aaaaaa '
    //       2: 'aa'
    //          0: 'aa'
    //          1: 'aa'
    //          2: 'aa'
    //       3: 'aaaa '
    //          0: ''
    //          1: 'aa '
    //          2: 'aaaa '
    
    Imports System.Text.RegularExpressions
    
    Module Example
        Dim pattern As String
    
        Public Sub Main()
            Dim input As String = "aa aaaa aaaaaa "
            pattern = "((a+)(\1) ?)+"
    
            ' Match input using canonical matching.
            AnalyzeMatch(Regex.Match(input, pattern))
    
            ' Match input using ECMAScript.
            AnalyzeMatch(Regex.Match(input, pattern, RegexOptions.ECMAScript))
        End Sub
    
        Private Sub AnalyzeMatch(m As Match)
            If m.Success
                Console.WriteLine("'{0}' matches {1} at position {2}.", _
                                  pattern, m.Value, m.Index)
                Dim grpCtr As Integer = 0
                For Each grp As Group In m.Groups
                    Console.WriteLine("   {0}: '{1}'", grpCtr, grp.Value)
                    grpCtr += 1
                    Dim capCtr As Integer = 0
                    For Each cap As Capture In grp.Captures
                        Console.WriteLine("      {0}: '{1}'", capCtr, cap.Value)
                        capCtr += 1
                    Next
                Next
            Else
                Console.WriteLine("No match found.")
            End If
            Console.WriteLine()
        End Sub
    End Module
    ' The example displays the following output:
    '    No match found.
    '    
    '    '((a+)(\1) ?)+' matches aa aaaa aaaaaa  at position 0.
    '       0: 'aa aaaa aaaaaa '
    '          0: 'aa aaaa aaaaaa '
    '       1: 'aaaaaa '
    '          0: 'aa '
    '          1: 'aaaa '
    '          2: 'aaaaaa '
    '       2: 'aa'
    '          0: 'aa'
    '          1: 'aa'
    '          2: 'aa'
    '       3: 'aaaa '
    '          0: ''
    '          1: 'aa '
    '          2: 'aaaa '
    

    The regular expression is defined as shown in the following table.

    Pattern Description
    (a+) Match the letter "a" one or more times. This is the second capturing group.
    (\1) Match the substring captured by the first capturing group. This is the third capturing group.
    " ?" Match zero or one space character (the pattern is a space followed by a question mark).
    ((a+)(\1) ?)+ Match, one or more times, one or more "a" characters followed by the substring captured by the first capturing group and then zero or one space character. This is the first capturing group.
  • Resolution of ambiguities between octal escapes and backreferences. The following table summarizes how canonical and ECMAScript regular expressions differ in interpreting octal escapes and backreferences.

    \0 followed by 0 to 2 octal digits
        Canonical behavior: Interpret as an octal. For example, \044 is always interpreted as an octal value and means "$".
        ECMAScript behavior: Same behavior.

    \ followed by a digit from 1 to 9, followed by no additional decimal digits
        Canonical behavior: Interpret as a backreference. For example, \9 always means backreference 9, even if a ninth capturing group does not exist; in that case, the regular expression parser throws an ArgumentException.
        ECMAScript behavior: If a capturing group with that single-digit number exists, interpret as a backreference to that group. Otherwise, interpret the value as a literal.

    \ followed by a digit from 1 to 9, followed by additional decimal digits
        Canonical behavior: Interpret the digits as a decimal value. If that capturing group exists, interpret the expression as a backreference. Otherwise, interpret the leading octal digits, up to octal 377, as an octal value (that is, consider only the low 8 bits of the value), and interpret the remaining digits as literals. For example, in the expression \3000, if capturing group 300 exists, interpret as backreference 300; if capturing group 300 does not exist, interpret as octal 300 followed by 0.
        ECMAScript behavior: Interpret as a backreference by converting as many digits as possible to a decimal value that can refer to a capture. If no digits can be converted, interpret as an octal by using the leading octal digits up to octal 377, and interpret the remaining digits as literals.
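The character class differences can also be checked one element at a time. In the following sketch, the class name EcmaScriptClassesExample and the IsMatch helper are illustrative. It tests \w against a word that contains é (U+00E9) and \d against ARABIC-INDIC DIGIT THREE (U+0663). Under canonical behavior both inputs are expected to match, because é is a lowercase letter and U+0663 is a decimal digit (Unicode category Nd). Under ECMAScript, \w is limited to [a-zA-Z_0-9] and, as an assumption taken from the .NET character class documentation rather than from this article, \d is limited to [0-9], so neither input is expected to match.

```csharp
using System;
using System.Text.RegularExpressions;

public class EcmaScriptClassesExample
{
   // Illustrative helper: tests the input with canonical or ECMAScript behavior.
   public static bool IsMatch(string input, string pattern, bool useEcmaScript)
   {
      RegexOptions options = useEcmaScript ? RegexOptions.ECMAScript : RegexOptions.None;
      return Regex.IsMatch(input, pattern, options);
   }

   public static void Main()
   {
      string[,] cases =
      {
         { "resume", @"^\w+$" },             // ASCII letters only
         { "r\u00E9sum\u00E9", @"^\w+$" },   // contains U+00E9 (é)
         { "3", @"^\d$" },                   // ASCII digit
         { "\u0663", @"^\d$" }               // ARABIC-INDIC DIGIT THREE
      };

      for (int i = 0; i < cases.GetLength(0); i++)
      {
         string input = cases[i, 0];
         string pattern = cases[i, 1];
         Console.WriteLine("{0} with {1}: canonical {2}, ECMAScript {3}",
                           input, pattern,
                           IsMatch(input, pattern, false),
                           IsMatch(input, pattern, true));
      }
   }
}
```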

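A few rows of the octal and backreference table can be exercised directly. In the following sketch, the class name OctalBackreferenceExample and the Parses helper are illustrative. It shows that \044 is read as the octal escape for "$" in both modes, that \1 works as a backreference when the group exists, and that canonical parsing rejects \9 when there is no ninth capturing group. The final ECMAScript line is printed without an expected value; according to the table, \9 should be read as a literal there instead of being rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public class OctalBackreferenceExample
{
   // Illustrative helper: returns true if the pattern can be parsed with the given
   // options, or false if the parser throws an ArgumentException.
   public static bool Parses(string pattern, RegexOptions options)
   {
      try
      {
         _ = new Regex(pattern, options);
         return true;
      }
      catch (ArgumentException)
      {
         return false;
      }
   }

   public static void Main()
   {
      // \044 is octal 44 (decimal 36), the "$" character, in both modes.
      Console.WriteLine(Regex.IsMatch("$", @"^\044$"));                          // True
      Console.WriteLine(Regex.IsMatch("$", @"^\044$", RegexOptions.ECMAScript));  // True

      // \1 refers to an existing capturing group, so it is a backreference.
      Console.WriteLine(Regex.IsMatch("aa", @"^(a)\1$"));                        // True

      // Canonical behavior: \9 is a backreference even though group 9 does not exist,
      // so the parser throws.
      Console.WriteLine(Parses(@"(a)\9", RegexOptions.None));                    // False

      // ECMAScript behavior: per the table, \9 is read as a literal instead.
      Console.WriteLine(Parses(@"(a)\9", RegexOptions.ECMAScript));
   }
}
```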
Comparison Using the Invariant Culture

By default, when the regular expression engine performs case-insensitive comparisons, it uses the casing conventions of the current culture to determine equivalent uppercase and lowercase characters.

However, this behavior is undesirable for some types of comparisons, particularly when user input is compared to the names of system resources, such as passwords, files, or URLs. The following example illustrates such a scenario. The code is intended to block access to any resource whose URL begins with FILE://. It attempts a case-insensitive match by using the regular expression ^FILE://. However, when the current system culture is tr-TR (Turkish-Turkey), "I" is not the uppercase equivalent of "i". As a result, the call to the Regex.IsMatch method returns false, and access to the file is allowed.

using System;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Threading;

public class Example
{
   public static void Main()
   {
      CultureInfo defaultCulture = Thread.CurrentThread.CurrentCulture;
      Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

      string input = "file://c:/Documents.MyReport.doc";
      // The ^ anchor restricts the match to URLs that begin with FILE://.
      string pattern = "^FILE://";

      Console.WriteLine("Culture-sensitive matching ({0} culture)...",
                        Thread.CurrentThread.CurrentCulture.Name);
      if (Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase))
         Console.WriteLine("URLs that access files are not allowed.");
      else
         Console.WriteLine("Access to {0} is allowed.", input);

      // Restore the original culture.
      Thread.CurrentThread.CurrentCulture = defaultCulture;
   }
}
// The example displays the following output:
//       Culture-sensitive matching (tr-TR culture)...
//       Access to file://c:/Documents.MyReport.doc is allowed.
Imports System.Globalization
Imports System.Text.RegularExpressions
Imports System.Threading

Module Example
    Public Sub Main()
        Dim defaultCulture As CultureInfo = Thread.CurrentThread.CurrentCulture
        Thread.CurrentThread.CurrentCulture = New CultureInfo("tr-TR")

        Dim input As String = "file://c:/Documents.MyReport.doc"
        ' The ^ anchor restricts the match to URLs that begin with FILE://.
        ' (A leading $ would anchor to the end of the input, so it could never match.)
        Dim pattern As String = "^FILE://"

        Console.WriteLine("Culture-sensitive matching ({0} culture)...", _
                          Thread.CurrentThread.CurrentCulture.Name)
        If Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase) Then
            Console.WriteLine("URLs that access files are not allowed.")
        Else
            Console.WriteLine("Access to {0} is allowed.", input)
        End If

        ' Restore the original culture.
        Thread.CurrentThread.CurrentCulture = defaultCulture
    End Sub
End Module
' The example displays the following output:
'       Culture-sensitive matching (tr-TR culture)...
'       Access to file://c:/Documents.MyReport.doc is allowed.

Note

For more information about string comparisons that are case-sensitive and that use the invariant culture, see Best Practices for Using Strings.

Instead of using the case-insensitive comparisons of the current culture, you can specify the RegexOptions.CultureInvariant option to ignore cultural differences in language and to use the conventions of the invariant culture.

Note

Comparison using the invariant culture is available only by supplying the RegexOptions.CultureInvariant value to the options parameter of a Regex class constructor or static pattern-matching method. It is not available as an inline option.

The following example is identical to the previous example, except that the static Regex.IsMatch(String, String, RegexOptions) method is called with options that include RegexOptions.CultureInvariant. Even when the current culture is set to Turkish (Turkey), the regular expression engine successfully matches "FILE" and "file" and blocks access to the file resource.

using System;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Threading;

public class Example
{
   public static void Main()
   {
      CultureInfo defaultCulture = Thread.CurrentThread.CurrentCulture;
      Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

      string input = "file://c:/Documents.MyReport.doc";
      string pattern = "^FILE://";

      Console.WriteLine("Culture-insensitive matching...");
      // CultureInvariant makes the case-insensitive comparison use the invariant culture.
      if (Regex.IsMatch(input, pattern,
                        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant))
         Console.WriteLine("URLs that access files are not allowed.");
      else
         Console.WriteLine("Access to {0} is allowed.", input);

      // Restore the original culture.
      Thread.CurrentThread.CurrentCulture = defaultCulture;
   }
}
// The example displays the following output:
//       Culture-insensitive matching...
//       URLs that access files are not allowed.
Imports System.Globalization
Imports System.Text.RegularExpressions
Imports System.Threading

Module Example
    Public Sub Main()
        Dim defaultCulture As CultureInfo = Thread.CurrentThread.CurrentCulture
        Thread.CurrentThread.CurrentCulture = New CultureInfo("tr-TR")

        Dim input As String = "file://c:/Documents.MyReport.doc"
        Dim pattern As String = "^FILE://"

        Console.WriteLine("Culture-insensitive matching...")
        ' CultureInvariant makes the case-insensitive comparison use the invariant culture.
        If Regex.IsMatch(input, pattern, _
                         RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant) Then
            Console.WriteLine("URLs that access files are not allowed.")
        Else
            Console.WriteLine("Access to {0} is allowed.", input)
        End If

        ' Restore the original culture.
        Thread.CurrentThread.CurrentCulture = defaultCulture
    End Sub
End Module
' The example displays the following output:
'       Culture-insensitive matching...
'       URLs that access files are not allowed.
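When a check like this is needed in more than one place, it can help to build the pattern once with its options fixed. The following sketch uses a hypothetical helper, IsFileUrl, in an illustrative class named FileUrlCheckExample. It anchors the pattern at the start of the input with ^ and always pairs RegexOptions.IgnoreCase with RegexOptions.CultureInvariant, so the result does not depend on the current culture. Keeping the Regex in a static readonly field means the pattern is parsed only once.

```csharp
using System;
using System.Text.RegularExpressions;

public class FileUrlCheckExample
{
   // Hypothetical check: ^ anchors the match at the start of the input, and
   // CultureInvariant keeps the case-insensitive comparison independent of the current culture.
   private static readonly Regex FileUrl =
      new Regex("^FILE://", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

   public static bool IsFileUrl(string url)
   {
      return FileUrl.IsMatch(url);
   }

   public static void Main()
   {
      string[] urls =
      {
         "file://c:/Documents.MyReport.doc",
         "FILE://server/share/report.doc",
         "https://www.example.com/?next=file://"
      };

      foreach (string url in urls)
         Console.WriteLine("{0}: {1}", url, IsFileUrl(url) ? "blocked" : "allowed");
   }
}
// The example displays the following output:
//       file://c:/Documents.MyReport.doc: blocked
//       FILE://server/share/report.doc: blocked
//       https://www.example.com/?next=file://: allowed
```

Because the pattern is anchored, a URL that merely contains file:// later in the string, such as the third one, is not blocked.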

See also