Best practices for regular expressions in .NET

The regular expression engine in .NET is a powerful, full-featured tool that processes text based on pattern matches rather than on comparing and matching literal text. In most cases, it performs pattern matching rapidly and efficiently. In some cases, however, the regular expression engine can appear to be very slow. In extreme cases, it can even appear to stop responding as it processes a relatively small input over the course of hours or even days.

This topic outlines some of the best practices that developers can adopt to ensure that their regular expressions achieve optimal performance.

Consider the input source

In general, regular expressions can accept two types of input: constrained or unconstrained. Constrained input is text that originates from a known or reliable source and follows a predefined format. Unconstrained input is text that originates from an unreliable source, such as a web user, and may not follow a predefined or expected format.

Regular expression patterns are typically written to match valid input. That is, developers examine the text that they want to match and then write a regular expression pattern that matches it. They then test the pattern with multiple valid input items to determine whether it requires correction or further elaboration. When the pattern matches all presumed valid inputs, it is declared production-ready and can be included in a released application. This process makes a regular expression pattern suitable for matching constrained input. It does not make the pattern suitable for matching unconstrained input.

To match unconstrained input, a regular expression must be able to efficiently handle three kinds of text:

  • Text that matches the regular expression pattern.

  • Text that does not match the regular expression pattern.

  • Text that nearly matches the regular expression pattern.

The last kind of text is especially problematic for a regular expression that has been written to handle constrained input. If that regular expression also relies on extensive backtracking, the regular expression engine can spend an inordinate amount of time (in some cases, many hours or days) processing seemingly innocuous text.

Warning

The following example uses a regular expression that is prone to excessive backtracking and that is likely to reject valid email addresses. You should not use it in an email validation routine. If you would like a regular expression that validates email addresses, see How to: Verify that Strings Are in Valid Email Format.

For example, consider a very commonly used but extremely problematic regular expression for validating the alias of an email address. The regular expression ^[0-9A-Z]([-.\w]*[0-9A-Z])*$ is written to match what is considered to be a valid email alias: an alphanumeric character, followed by zero or more characters that can be alphanumeric, periods, or hyphens. The alias must end with an alphanumeric character. However, as the following example shows, although this regular expression handles valid input easily, it performs very poorly when it processes nearly valid input.

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Stopwatch sw;    
      string[] addresses = { "AAAAAAAAAAA@contoso.com", 
                             "AAAAAAAAAAaaaaaaaaaa!@contoso.com" };
      // The following regular expression should not actually be used to 
      // validate an email address.
      string pattern = @"^[0-9A-Z]([-.\w]*[0-9A-Z])*$";
      string input; 
      
      foreach (var address in addresses) {
         string mailBox = address.Substring(0, address.IndexOf("@"));       
         int index = 0;
         for (int ctr = mailBox.Length - 1; ctr >= 0; ctr--) {
            index++;

            input = mailBox.Substring(ctr, index); 
            sw = Stopwatch.StartNew();
            Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
            sw.Stop();
            if (m.Success)
               Console.WriteLine("{0,2}. Matched '{1,25}' in {2}", 
                                 index, m.Value, sw.Elapsed);
            else                     
               Console.WriteLine("{0,2}. Failed  '{1,25}' in {2}", 
                                 index, input, sw.Elapsed);
         }
         Console.WriteLine();
      }
   }
}

// The example displays output similar to the following:
//     1. Matched '                        A' in 00:00:00.0007122
//     2. Matched '                       AA' in 00:00:00.0000282
//     3. Matched '                      AAA' in 00:00:00.0000042
//     4. Matched '                     AAAA' in 00:00:00.0000038
//     5. Matched '                    AAAAA' in 00:00:00.0000042
//     6. Matched '                   AAAAAA' in 00:00:00.0000042
//     7. Matched '                  AAAAAAA' in 00:00:00.0000042
//     8. Matched '                 AAAAAAAA' in 00:00:00.0000087
//     9. Matched '                AAAAAAAAA' in 00:00:00.0000045
//    10. Matched '               AAAAAAAAAA' in 00:00:00.0000045
//    11. Matched '              AAAAAAAAAAA' in 00:00:00.0000045
//    
//     1. Failed  '                        !' in 00:00:00.0000447
//     2. Failed  '                       a!' in 00:00:00.0000071
//     3. Failed  '                      aa!' in 00:00:00.0000071
//     4. Failed  '                     aaa!' in 00:00:00.0000061
//     5. Failed  '                    aaaa!' in 00:00:00.0000081
//     6. Failed  '                   aaaaa!' in 00:00:00.0000126
//     7. Failed  '                  aaaaaa!' in 00:00:00.0000359
//     8. Failed  '                 aaaaaaa!' in 00:00:00.0000414
//     9. Failed  '                aaaaaaaa!' in 00:00:00.0000758
//    10. Failed  '               aaaaaaaaa!' in 00:00:00.0001462
//    11. Failed  '              aaaaaaaaaa!' in 00:00:00.0002885
//    12. Failed  '             Aaaaaaaaaaa!' in 00:00:00.0005780
//    13. Failed  '            AAaaaaaaaaaa!' in 00:00:00.0011628
//    14. Failed  '           AAAaaaaaaaaaa!' in 00:00:00.0022851
//    15. Failed  '          AAAAaaaaaaaaaa!' in 00:00:00.0045864
//    16. Failed  '         AAAAAaaaaaaaaaa!' in 00:00:00.0093168
//    17. Failed  '        AAAAAAaaaaaaaaaa!' in 00:00:00.0185993
//    18. Failed  '       AAAAAAAaaaaaaaaaa!' in 00:00:00.0366723
//    19. Failed  '      AAAAAAAAaaaaaaaaaa!' in 00:00:00.1370108
//    20. Failed  '     AAAAAAAAAaaaaaaaaaa!' in 00:00:00.1553966
//    21. Failed  '    AAAAAAAAAAaaaaaaaaaa!' in 00:00:00.3223372

Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim sw As Stopwatch    
      Dim addresses() As String = { "AAAAAAAAAAA@contoso.com", 
                                 "AAAAAAAAAAaaaaaaaaaa!@contoso.com" }
      ' The following regular expression should not actually be used to 
      ' validate an email address.
      Dim pattern As String = "^[0-9A-Z]([-.\w]*[0-9A-Z])*$"
      Dim input As String 
      
      For Each address In addresses
         Dim mailBox As String = address.Substring(0, address.IndexOf("@"))       
         Dim index As Integer = 0
         For ctr As Integer = mailBox.Length - 1 To 0 Step -1
            index += 1
            input = mailBox.Substring(ctr, index) 
            sw = Stopwatch.StartNew()
            Dim m As Match = Regex.Match(input, pattern, RegexOptions.IgnoreCase)
            sw.Stop()
            if m.Success Then
               Console.WriteLine("{0,2}. Matched '{1,25}' in {2}", 
                                 index, m.Value, sw.Elapsed)
            Else                     
               Console.WriteLine("{0,2}. Failed  '{1,25}' in {2}", 
                                 index, input, sw.Elapsed)
            End If                  
         Next
         Console.WriteLine()
      Next
   End Sub
End Module
' The example displays output similar to the following:
'     1. Matched '                        A' in 00:00:00.0007122
'     2. Matched '                       AA' in 00:00:00.0000282
'     3. Matched '                      AAA' in 00:00:00.0000042
'     4. Matched '                     AAAA' in 00:00:00.0000038
'     5. Matched '                    AAAAA' in 00:00:00.0000042
'     6. Matched '                   AAAAAA' in 00:00:00.0000042
'     7. Matched '                  AAAAAAA' in 00:00:00.0000042
'     8. Matched '                 AAAAAAAA' in 00:00:00.0000087
'     9. Matched '                AAAAAAAAA' in 00:00:00.0000045
'    10. Matched '               AAAAAAAAAA' in 00:00:00.0000045
'    11. Matched '              AAAAAAAAAAA' in 00:00:00.0000045
'    
'     1. Failed  '                        !' in 00:00:00.0000447
'     2. Failed  '                       a!' in 00:00:00.0000071
'     3. Failed  '                      aa!' in 00:00:00.0000071
'     4. Failed  '                     aaa!' in 00:00:00.0000061
'     5. Failed  '                    aaaa!' in 00:00:00.0000081
'     6. Failed  '                   aaaaa!' in 00:00:00.0000126
'     7. Failed  '                  aaaaaa!' in 00:00:00.0000359
'     8. Failed  '                 aaaaaaa!' in 00:00:00.0000414
'     9. Failed  '                aaaaaaaa!' in 00:00:00.0000758
'    10. Failed  '               aaaaaaaaa!' in 00:00:00.0001462
'    11. Failed  '              aaaaaaaaaa!' in 00:00:00.0002885
'    12. Failed  '             Aaaaaaaaaaa!' in 00:00:00.0005780
'    13. Failed  '            AAaaaaaaaaaa!' in 00:00:00.0011628
'    14. Failed  '           AAAaaaaaaaaaa!' in 00:00:00.0022851
'    15. Failed  '          AAAAaaaaaaaaaa!' in 00:00:00.0045864
'    16. Failed  '         AAAAAaaaaaaaaaa!' in 00:00:00.0093168
'    17. Failed  '        AAAAAAaaaaaaaaaa!' in 00:00:00.0185993
'    18. Failed  '       AAAAAAAaaaaaaaaaa!' in 00:00:00.0366723
'    19. Failed  '      AAAAAAAAaaaaaaaaaa!' in 00:00:00.1370108
'    20. Failed  '     AAAAAAAAAaaaaaaaaaa!' in 00:00:00.1553966
'    21. Failed  '    AAAAAAAAAAaaaaaaaaaa!' in 00:00:00.3223372

As the output from the example shows, the regular expression engine processes the valid email alias in about the same time interval regardless of its length. On the other hand, when the nearly valid email address has more than five characters, processing time approximately doubles for each additional character in the string. This means that a nearly valid 28-character string would take over an hour to process, and a nearly valid 33-character string would take nearly a day to process.

Because this regular expression was developed solely by considering the format of the input to be matched, it fails to take account of input that does not match the pattern. This, in turn, allows unconstrained input that nearly matches the regular expression pattern to significantly degrade performance.

To solve this problem, you can do the following:

  • When developing a pattern, consider how backtracking might affect the performance of the regular expression engine, particularly if your regular expression is designed to process unconstrained input. For more information, see the Take Charge of Backtracking section.

  • Thoroughly test your regular expression using invalid and near-valid input as well as valid input. To generate input for a particular regular expression randomly, you can use Rex, which is a regular expression exploration tool from Microsoft Research.
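The two recommendations above can be sketched together. The following is a minimal test harness, not a definitive implementation: it probes a pattern with valid, invalid, and nearly valid input, and uses a match time-out so that a pattern prone to runaway backtracking cannot stall the test. The names BacktrackingProbe, Outcome, and Probe, the 250-millisecond time-out, and the sample inputs are assumptions made for illustration. LookbehindPattern is one possible rewrite that replaces the nested quantifier with a lookbehind assertion and is intended to accept the same aliases.

```csharp
using System;
using System.Text.RegularExpressions;

public enum Outcome { Matched, Failed, TimedOut }

public static class BacktrackingProbe
{
    // The problematic pattern from the example above (nested quantifiers).
    public const string NestedPattern = @"^[0-9A-Z]([-.\w]*[0-9A-Z])*$";

    // Assumed alternative: a single quantifier plus a lookbehind that checks
    // that the last character is alphanumeric.
    public const string LookbehindPattern = @"^[0-9A-Z][-.\w]*(?<=[0-9A-Z])$";

    // Runs one match attempt and reports whether it matched, failed,
    // or was abandoned because it exceeded the time-out.
    public static Outcome Probe(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            bool matched = Regex.IsMatch(input, pattern,
                                         RegexOptions.IgnoreCase, timeout);
            return matched ? Outcome.Matched : Outcome.Failed;
        }
        catch (RegexMatchTimeoutException)
        {
            return Outcome.TimedOut;
        }
    }

    public static void Main()
    {
        TimeSpan timeout = TimeSpan.FromMilliseconds(250); // arbitrary test budget
        string[] inputs =
        {
            "AAAAAAAAAAA",              // valid
            "!!!",                      // clearly invalid
            new string('a', 40) + "!"   // nearly valid
        };

        foreach (string pattern in new[] { NestedPattern, LookbehindPattern })
        {
            Console.WriteLine(pattern);
            foreach (string input in inputs)
                Console.WriteLine("   {0,-45} {1}", input,
                                  Probe(pattern, input, timeout));
        }
    }
}
```

A harness like this makes the nearly valid case part of routine testing instead of something discovered in production. The time-out value is a judgment call and should reflect how long the application can afford to wait for a single match.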

Handle object instantiation appropriately

At the heart of .NET's regular expression object model is the System.Text.RegularExpressions.Regex class, which represents the regular expression engine. Often, the single greatest factor that affects regular expression performance is the way in which the Regex engine is used. Defining a regular expression involves tightly coupling the regular expression engine with a regular expression pattern. That coupling process is by necessity an expensive one, whether it involves instantiating a Regex object by passing its constructor a regular expression pattern, or calling a static method by passing it the regular expression pattern along with the string to be analyzed.

Note

For a more detailed discussion of the performance implications of using interpreted and compiled regular expressions, see Optimizing Regular Expression Performance, Part II: Taking Charge of Backtracking in the BCL Team blog.

You can couple the regular expression engine with a particular regular expression pattern and then use the engine to match text in several ways:

  • You can call a static pattern-matching method, such as Regex.Match(String, String). This does not require instantiation of a regular expression object.

  • You can instantiate a Regex object and call an instance pattern-matching method of an interpreted regular expression. This is the default method for binding the regular expression engine to a regular expression pattern. It results when a Regex object is instantiated without an options argument that includes the Compiled flag.

  • You can instantiate a Regex object and call an instance pattern-matching method of a compiled regular expression. Regular expression objects represent compiled patterns when a Regex object is instantiated with an options argument that includes the Compiled flag.

  • You can create a special-purpose Regex object that is tightly coupled with a particular regular expression pattern, compile it, and save it to a standalone assembly. You do this by calling the Regex.CompileToAssembly method.

The particular way in which you call regular expression matching methods can have a significant impact on your application. The following sections discuss when to use static method calls, interpreted regular expressions, and compiled regular expressions to improve your application's performance.
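The first three options in the list above can be sketched side by side. This is a minimal illustration under stated assumptions: the class name CouplingDemo, its method names, and the sample inputs are hypothetical, and the currency pattern is used only as a convenient example. Holding the Regex instances in static fields is one way to avoid re-creating them on every call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CouplingDemo
{
    private const string Pattern = @"\p{Sc}+\s*\d+";

    // Interpreted instance: no RegexOptions.Compiled flag is supplied.
    private static readonly Regex Interpreted = new Regex(Pattern);

    // Compiled instance: the options argument includes the Compiled flag.
    private static readonly Regex Compiled =
        new Regex(Pattern, RegexOptions.Compiled);

    // Static method call: the caller never instantiates a Regex object.
    public static bool ViaStaticMethod(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static bool ViaInterpretedInstance(string input)
    {
        return Interpreted.IsMatch(input);
    }

    public static bool ViaCompiledInstance(string input)
    {
        return Compiled.IsMatch(input);
    }

    public static void Main()
    {
        foreach (string input in new[] { "$ 100", "abc" })
        {
            Console.WriteLine("'{0}': static={1}, interpreted={2}, compiled={3}",
                              input,
                              ViaStaticMethod(input),
                              ViaInterpretedInstance(input),
                              ViaCompiledInstance(input));
        }
    }
}
```

All three calls give the same matching result for a given input. What differs is when the cost of preparing the pattern is paid and how fast each subsequent match runs.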

Important

The form of the method call (static, interpreted, compiled) affects performance if the same regular expression is used repeatedly in method calls, or if an application makes extensive use of regular expression objects.

Static regular expressions

Static regular expression methods are recommended as an alternative to repeatedly instantiating a regular expression object with the same regular expression. Unlike regular expression patterns used by regular expression objects, either the operation codes or the compiled Microsoft intermediate language (MSIL) from patterns used in static method calls is cached internally by the regular expression engine.

For example, an event handler frequently calls another method to validate user input. This is reflected in the following code, in which a Button control's Click event is used to call a method named IsValidCurrency, which checks whether the user has entered a currency symbol followed by at least one decimal digit.

public void OKButton_Click(object sender, EventArgs e) 
{
   if (! String.IsNullOrEmpty(sourceCurrency.Text))
      if (RegexLib.IsValidCurrency(sourceCurrency.Text))
         PerformConversion();
      else
         status.Text = "The source currency value is invalid.";
}

Public Sub OKButton_Click(sender As Object, e As EventArgs) _
           Handles OKButton.Click

   If Not String.IsNullOrEmpty(sourceCurrency.Text) Then
      If RegexLib.IsValidCurrency(sourceCurrency.Text) Then
         PerformConversion()
      Else
         status.Text = "The source currency value is invalid."
      End If          
   End If
End Sub

A very inefficient implementation of the IsValidCurrency method is shown in the following example. Note that each method call reinstantiates a Regex object with the same pattern. This, in turn, means that the regular expression pattern must be recompiled each time the method is called.

using System;
using System.Text.RegularExpressions;

public class RegexLib
{
   public static bool IsValidCurrency(string currencyValue)
   {
      string pattern = @"\p{Sc}+\s*\d+";
      Regex currencyRegex = new Regex(pattern);
      return currencyRegex.IsMatch(currencyValue);
   }
}

Imports System.Text.RegularExpressions

Public Module RegexLib
   Public Function IsValidCurrency(currencyValue As String) As Boolean
      Dim pattern As String = "\p{Sc}+\s*\d+"
      Dim currencyRegex As New Regex(pattern)
      Return currencyRegex.IsMatch(currencyValue) 
   End Function
End Module

You should replace this inefficient code with a call to the static Regex.IsMatch(String, String) method. This eliminates the need to instantiate a Regex object each time you want to call a pattern-matching method, and enables the regular expression engine to retrieve a compiled version of the regular expression from its cache.

using System;
using System.Text.RegularExpressions;

public class RegexLib
{
   public static bool IsValidCurrency(string currencyValue)
   {
      string pattern = @"\p{Sc}+\s*\d+";
      return Regex.IsMatch(currencyValue, pattern); 
   }
}

Imports System.Text.RegularExpressions

Public Module RegexLib
   Public Function IsValidCurrency(currencyValue As String) As Boolean
      Dim pattern As String = "\p{Sc}+\s*\d+"
      Return Regex.IsMatch(currencyValue, pattern)
   End Function
End Module

By default, the 15 most recently used static regular expression patterns are cached. For applications that require a larger number of cached static regular expressions, the size of the cache can be adjusted by setting the Regex.CacheSize property.
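As a minimal sketch of adjusting the cache, the following helper raises Regex.CacheSize when an application expects to use more distinct static patterns than the cache currently holds. The class name CacheConfig, the method name EnlargeStaticCache, and the value 50 are assumptions chosen for illustration; an appropriate size depends on how many distinct patterns the application passes to static methods.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CacheConfig
{
    // Grows the static-method pattern cache if it is smaller than required.
    // Returns the cache size in effect after the call.
    public static int EnlargeStaticCache(int requiredPatterns)
    {
        if (requiredPatterns > Regex.CacheSize)
            Regex.CacheSize = requiredPatterns;

        return Regex.CacheSize;
    }

    public static void Main()
    {
        Console.WriteLine("Cache size before: {0}", Regex.CacheSize);
        Console.WriteLine("Cache size after:  {0}", EnlargeStaticCache(50));
    }
}
```

Because the property is static, it is typically set once, early in application startup, before the static matching methods are called heavily.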

The regular expression \p{Sc}+\s*\d+ that is used in this example verifies that the input string contains a currency symbol followed by at least one decimal digit. The pattern is defined as shown in the following table.

Pattern    Description
\p{Sc}+    Match one or more characters in the Unicode Symbol, Currency category.
\s*        Match zero or more white-space characters.
\d+        Match one or more decimal digits.
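To make the table concrete, the following sketch runs the pattern against a few sample strings. The class name CurrencyPatternDemo and the sample inputs are hypothetical. Note that the pattern is not anchored, so it reports a match when a currency value appears anywhere in the string; whether to add ^ and $ anchors is a design decision that depends on the validation being performed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyPatternDemo
{
    private const string Pattern = @"\p{Sc}+\s*\d+";

    public static bool IsValidCurrency(string value)
    {
        return Regex.IsMatch(value, Pattern);
    }

    public static void Main()
    {
        string[] samples =
        {
            "$100",           // currency symbol immediately followed by digits
            "\u20AC 25",      // euro sign, white space, digits
            "100",            // no currency symbol
            "$",              // no digits
            "total: $5 due"   // currency value embedded in other text
        };

        foreach (string sample in samples)
            Console.WriteLine("{0,-15} {1}", sample, IsValidCurrency(sample));
    }
}
```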

Interpreted vs. compiled regular expressions

Regular expression patterns that are not bound to the regular expression engine through the specification of the Compiled option are interpreted. When a regular expression object is instantiated, the regular expression engine converts the regular expression to a set of operation codes. When an instance method is called, the operation codes are converted to MSIL and executed by the JIT compiler. Similarly, when a static regular expression method is called and the regular expression cannot be found in the cache, the regular expression engine converts the regular expression to a set of operation codes and stores them in the cache. It then converts these operation codes to MSIL so that the JIT compiler can execute them. Interpreted regular expressions reduce startup time at the cost of slower execution time. Because of this, they are best used when the regular expression is used in a small number of method calls, or if the exact number of calls to regular expression methods is unknown but is expected to be small. As the number of method calls increases, the performance gain from reduced startup time is outstripped by the slower execution speed.

Regular expression patterns that are bound to the regular expression engine through the specification of the Compiled option are compiled. This means that, when a regular expression object is instantiated, or when a static regular expression method is called and the regular expression cannot be found in the cache, the regular expression engine converts the regular expression to an intermediary set of operation codes, which it then converts to MSIL. When a method is called, the JIT compiler executes the MSIL. In contrast to interpreted regular expressions, compiled regular expressions increase startup time but execute individual pattern-matching methods faster. As a result, the performance benefit that results from compiling the regular expression increases in proportion to the number of regular expression methods called.

To summarize, we recommend that you use interpreted regular expressions when you call regular expression methods with a specific regular expression relatively infrequently. You should use compiled regular expressions when you call regular expression methods with a specific regular expression relatively frequently. The exact threshold at which the slower execution speeds of interpreted regular expressions outweigh gains from their reduced startup time, or the threshold at which the slower startup times of compiled regular expressions outweigh gains from their faster execution speeds, is difficult to determine. It depends on a variety of factors, including the complexity of the regular expression and the specific data that it processes. To determine whether interpreted or compiled regular expressions offer the best performance for your particular application scenario, you can use the Stopwatch class to compare their execution times.

The following example compares the performance of compiled and interpreted regular expressions when reading the first ten sentences and when reading all the sentences in the text of Theodore Dreiser's The Financier. As the output from the example shows, when only ten calls are made to regular expression matching methods, an interpreted regular expression offers better performance than a compiled regular expression. However, a compiled regular expression offers better performance when a large number of calls (in this case, over 13,000) are made.

using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\w+((\r?\n)|,?\s))*\w+[.?:;!]";
      Stopwatch sw;
      Match match;
      int ctr;

      StreamReader inFile = new StreamReader(@".\Dreiser_TheFinancier.txt");
      string input = inFile.ReadToEnd();
      inFile.Close();
      
      // Read first ten sentences with interpreted regex.
      Console.WriteLine("10 Sentences with Interpreted Regex:");
      sw = Stopwatch.StartNew();
      Regex int10 = new Regex(pattern, RegexOptions.Singleline);
      match = int10.Match(input);
      for (ctr = 0; ctr <= 9; ctr++) {
         if (match.Success)
            // Do nothing with the match except get the next match.
            match = match.NextMatch();
         else
            break;
      }
      sw.Stop();
      Console.WriteLine("   {0} matches in {1}", ctr, sw.Elapsed);
      
      // Read first ten sentences with compiled regex.
      Console.WriteLine("10 Sentences with Compiled Regex:");
      sw = Stopwatch.StartNew();
      Regex comp10 = new Regex(pattern, 
                   RegexOptions.Singleline | RegexOptions.Compiled);
      match = comp10.Match(input);
      for (ctr = 0; ctr <= 9; ctr++) {
         if (match.Success)
            // Do nothing with the match except get the next match.
            match = match.NextMatch();
         else
            break;
      }
      sw.Stop();
      Console.WriteLine("   {0} matches in {1}", ctr, sw.Elapsed);
      
      // Read all sentences with interpreted regex.
      Console.WriteLine("All Sentences with Interpreted Regex:");
      sw = Stopwatch.StartNew();
      Regex intAll = new Regex(pattern, RegexOptions.Singleline);
      match = intAll.Match(input);
      int matches = 0;
      while (match.Success) {
         matches++;
         // Do nothing with the match except get the next match.
         match = match.NextMatch();
      }
      sw.Stop();
      Console.WriteLine("   {0:N0} matches in {1}", matches, sw.Elapsed);
      
      // Read all sentences with compiled regex.
      Console.WriteLine("All Sentences with Compiled Regex:");
      sw = Stopwatch.StartNew();
      Regex compAll = new Regex(pattern, 
                      RegexOptions.Singleline | RegexOptions.Compiled);
      match = compAll.Match(input);
      matches = 0;
      while (match.Success) {
         matches++;
         // Do nothing with the match except get the next match.
         match = match.NextMatch();
      }
      sw.Stop();
      Console.WriteLine("   {0:N0} matches in {1}", matches, sw.Elapsed);      
   }
}
// The example displays the following output:
//       10 Sentences with Interpreted Regex:
//          10 matches in 00:00:00.0047491
//       10 Sentences with Compiled Regex:
//          10 matches in 00:00:00.0141872
//       All Sentences with Interpreted Regex:
//          13,443 matches in 00:00:01.1929928
//       All Sentences with Compiled Regex:
//          13,443 matches in 00:00:00.7635869
//       
//       >compare1
//       10 Sentences with Interpreted Regex:
//          10 matches in 00:00:00.0046914
//       10 Sentences with Compiled Regex:
//          10 matches in 00:00:00.0143727
//       All Sentences with Interpreted Regex:
//          13,443 matches in 00:00:01.1514100
//       All Sentences with Compiled Regex:
//          13,443 matches in 00:00:00.7432921

Imports System.Diagnostics
Imports System.IO
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\w+((\r?\n)|,?\s))*\w+[.?:;!]"
      Dim sw As Stopwatch
      Dim match As Match
      Dim ctr As Integer

      Dim inFile As New StreamReader(".\Dreiser_TheFinancier.txt")
      Dim input As String = inFile.ReadToEnd()
      inFile.Close()
      
      ' Read first ten sentences with interpreted regex.
      Console.WriteLine("10 Sentences with Interpreted Regex:")
      sw = Stopwatch.StartNew()
      Dim int10 As New Regex(pattern, RegexOptions.SingleLine)
      match = int10.Match(input)
      For ctr = 0 To 9
         If match.Success Then
            ' Do nothing with the match except get the next match.
            match = match.NextMatch()
         Else
            Exit For
         End If
      Next
      sw.Stop()
      Console.WriteLine("   {0} matches in {1}", ctr, sw.Elapsed)
      
      ' Read first ten sentences with compiled regex.
      Console.WriteLine("10 Sentences with Compiled Regex:")
      sw = Stopwatch.StartNew()
      Dim comp10 As New Regex(pattern, 
                   RegexOptions.SingleLine Or RegexOptions.Compiled)
      match = comp10.Match(input)
      For ctr = 0 To 9
         If match.Success Then
            ' Do nothing with the match except get the next match.
            match = match.NextMatch()
         Else
            Exit For
         End If
      Next
      sw.Stop()
      Console.WriteLine("   {0} matches in {1}", ctr, sw.Elapsed)
      
      ' Read all sentences with interpreted regex.
      Console.WriteLine("All Sentences with Interpreted Regex:")
      sw = Stopwatch.StartNew()
      Dim intAll As New Regex(pattern, RegexOptions.SingleLine)
      match = intAll.Match(input)
      Dim matches As Integer = 0
      Do While match.Success
         matches += 1
         ' Do nothing with the match except get the next match.
         match = match.NextMatch()
      Loop
      sw.Stop()
      Console.WriteLine("   {0:N0} matches in {1}", matches, sw.Elapsed)
      
      ' Read all sentences with compiled regex.
      Console.WriteLine("All Sentences with Compiled Regex:")
      sw = Stopwatch.StartNew()
      Dim compAll As New Regex(pattern, 
                     RegexOptions.SingleLine Or RegexOptions.Compiled)
      match = compAll.Match(input)
      matches = 0
      Do While match.Success
         matches += 1
         ' Do nothing with the match except get the next match.
         match = match.NextMatch()
      Loop
      sw.Stop()
      Console.WriteLine("   {0:N0} matches in {1}", matches, sw.Elapsed)      
   End Sub
End Module
' The example displays output like the following:
'       10 Sentences with Interpreted Regex:
'          10 matches in 00:00:00.0047491
'       10 Sentences with Compiled Regex:
'          10 matches in 00:00:00.0141872
'       All Sentences with Interpreted Regex:
'          13,443 matches in 00:00:01.1929928
'       All Sentences with Compiled Regex:
'          13,443 matches in 00:00:00.7635869
'       
'       >compare1
'       10 Sentences with Interpreted Regex:
'          10 matches in 00:00:00.0046914
'       10 Sentences with Compiled Regex:
'          10 matches in 00:00:00.0143727
'       All Sentences with Interpreted Regex:
'          13,443 matches in 00:00:01.1514100
'       All Sentences with Compiled Regex:
'          13,443 matches in 00:00:00.7432921

The regular expression pattern used in the example, \b(\w+((\r?\n)|,?\s))*\w+[.?:;!], is defined as shown in the following table.

Pattern Description
\b Begin the match at a word boundary.
\w+ Match one or more word characters.
((\r?\n)|,?\s) Match either zero or one carriage return followed by a newline character, or zero or one comma followed by a white-space character.
(\w+((\r?\n)|,?\s))* Match zero or more occurrences of one or more word characters that are followed either by zero or one carriage return and a newline character, or by zero or one comma followed by a white-space character.
\w+ Match one or more word characters.
[.?:;!] Match a period, question mark, colon, semicolon, or exclamation point.

Regular expressions: Compiled to an assembly

.NET also enables you to create an assembly that contains compiled regular expressions. This moves the performance hit of regular expression compilation from run time to design time. However, it also involves some additional work: you must define the regular expressions in advance and compile them to an assembly. The compiler can then reference this assembly when compiling source code that uses the assembly's regular expressions. Each compiled regular expression in the assembly is represented by a class that derives from Regex.

To compile regular expressions to an assembly, you call the Regex.CompileToAssembly(RegexCompilationInfo[], AssemblyName) method and pass it an array of RegexCompilationInfo objects that represent the regular expressions to be compiled, and an AssemblyName object that contains information about the assembly to be created.

We recommend that you compile regular expressions to an assembly in the following situations:

  • If you are a component developer who wants to create a library of reusable regular expressions.

  • If you expect your regular expression's pattern-matching methods to be called an indeterminate number of times, anywhere from once or twice to thousands or tens of thousands of times. Unlike compiled or interpreted regular expressions, regular expressions that are compiled to separate assemblies offer performance that is consistent regardless of the number of method calls.

If you are using compiled regular expressions to optimize performance, you should not use reflection to create the assembly, load the regular expression engine, and execute its pattern-matching methods. This requires that you avoid building regular expression patterns dynamically, and that you specify any pattern-matching options (such as case-insensitive pattern matching) at the time the assembly is created. It also requires that you separate the code that creates the assembly from the code that uses the regular expression.

The following example shows how to create an assembly that contains a compiled regular expression. It creates an assembly named RegexLib.dll with a single regular expression class, SentencePattern, that contains the sentence-matching regular expression pattern used in the Interpreted vs. Compiled Regular Expressions section.

using System;
using System.Reflection;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      RegexCompilationInfo SentencePattern =
                           new RegexCompilationInfo(@"\b(\w+((\r?\n)|,?\s))*\w+[.?:;!]",
                                                    RegexOptions.Multiline,
                                                    "SentencePattern",
                                                    "Utilities.RegularExpressions",
                                                    true);
      RegexCompilationInfo[] regexes = { SentencePattern };
      AssemblyName assemName = new AssemblyName("RegexLib, Version=1.0.0.1001, Culture=neutral, PublicKeyToken=null");
      Regex.CompileToAssembly(regexes, assemName);
   }
}
Imports System.Reflection
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim SentencePattern As New RegexCompilationInfo("\b(\w+((\r?\n)|,?\s))*\w+[.?:;!]",
                                                      RegexOptions.Multiline,
                                                      "SentencePattern",
                                                      "Utilities.RegularExpressions",
                                                      True)
      Dim regexes() As RegexCompilationInfo = {SentencePattern}
      Dim assemName As New AssemblyName("RegexLib, Version=1.0.0.1001, Culture=neutral, PublicKeyToken=null")
      Regex.CompileToAssembly(regexes, assemName)
   End Sub
End Module

When the example is compiled to an executable and run, it creates an assembly named RegexLib.dll. The regular expression is represented by a class named Utilities.RegularExpressions.SentencePattern that is derived from Regex. The following example then uses the compiled regular expression to extract the sentences from the text of Theodore Dreiser's The Financier.

using System;
using System.IO;
using System.Text.RegularExpressions;
using Utilities.RegularExpressions;

public class Example
{
   public static void Main()
   {
      SentencePattern pattern = new SentencePattern();
      StreamReader inFile = new StreamReader(@".\Dreiser_TheFinancier.txt");
      string input = inFile.ReadToEnd();
      inFile.Close();
      
      MatchCollection matches = pattern.Matches(input);
      Console.WriteLine("Found {0:N0} sentences.", matches.Count);      
   }
}
// The example displays the following output:
//      Found 13,443 sentences.
Imports System.IO
Imports System.Text.RegularExpressions
Imports Utilities.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As New SentencePattern()
      Dim inFile As New StreamReader(".\Dreiser_TheFinancier.txt")
      Dim input As String = inFile.ReadToEnd()
      inFile.Close()
      
      Dim matches As MatchCollection = pattern.Matches(input)
      Console.WriteLine("Found {0:N0} sentences.", matches.Count)      
   End Sub
End Module
' The example displays the following output:
'      Found 13,443 sentences.

Take charge of backtracking

Ordinarily, the regular expression engine uses linear progression to move through an input string and compare it to a regular expression pattern. However, when indeterminate quantifiers such as *, +, and ? are used in a regular expression pattern, the regular expression engine may give up a portion of successful partial matches and return to a previously saved state in order to search for a successful match for the entire pattern. This process is known as backtracking.

Note

For more information on backtracking, see Details of Regular Expression Behavior and Backtracking. For a detailed discussion of backtracking, see Optimizing Regular Expression Performance, Part II: Taking Charge of Backtracking in the BCL Team blog.

Support for backtracking gives regular expressions power and flexibility. It also places the responsibility for controlling the operation of the regular expression engine in the hands of regular expression developers. Because developers are often not aware of this responsibility, their misuse of backtracking or reliance on excessive backtracking often plays the most significant role in degrading regular expression performance. In a worst-case scenario, execution time can double for each additional character in the input string. In fact, by using backtracking excessively, it is easy to create the programmatic equivalent of an endless loop if input nearly matches the regular expression pattern; the regular expression engine may take hours or even days to process a relatively short input string.

Often, applications pay a performance penalty for using backtracking even though backtracking is not essential for a match. For example, the regular expression \b\p{Lu}\w*\b matches all words that begin with an uppercase character, as the following table shows.

Pattern Description
\b Begin the match at a word boundary.
\p{Lu} Match an uppercase character.
\w* Match zero or more word characters.
\b End the match at a word boundary.

Because a word boundary is not the same as, or a subset of, a word character, there is no possibility that the regular expression engine will cross a word boundary when matching word characters. This means that for this regular expression, backtracking can never contribute to the overall success of any match. It can only degrade performance, because the regular expression engine is forced to save its state for each successful preliminary match of a word character.

If you determine that backtracking is not necessary, you can disable it by using the (?>subexpression) language element, known as an atomic group. The following example parses an input string by using two regular expressions. The first, \b\p{Lu}\w*\b, relies on backtracking. The second, \b\p{Lu}(?>\w*)\b, disables backtracking. As the output from the example shows, they both produce the same result.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This this word Sentence name Capital";
      string pattern = @"\b\p{Lu}\w*\b";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);

      Console.WriteLine();
      
      pattern = @"\b\p{Lu}(?>\w*)\b";   
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       This
//       Sentence
//       Capital
//       
//       This
//       Sentence
//       Capital
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "This this word Sentence name Capital"
      Dim pattern As String = "\b\p{Lu}\w*\b"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next
      Console.WriteLine()
      
      pattern = "\b\p{Lu}(?>\w*)\b"   
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       This
'       Sentence
'       Capital
'       
'       This
'       Sentence
'       Capital

In many cases, backtracking is essential for matching a regular expression pattern to input text. However, excessive backtracking can severely degrade performance and create the impression that an application has stopped responding. In particular, this happens when quantifiers are nested and the text that matches the outer subexpression is a subset of the text that matches the inner subexpression.

Warning

In addition to avoiding excessive backtracking, you should use the timeout feature to ensure that excessive backtracking does not severely degrade regular expression performance. For more information, see the Use Time-out Values section.

For example, the regular expression pattern ^[0-9A-Z]([-.\w]*[0-9A-Z])*\$$ is intended to match a part number that consists of at least one alphanumeric character. Any additional characters can consist of an alphanumeric character, a hyphen, an underscore, or a period, though the last character must be alphanumeric. A dollar sign terminates the part number. In some cases, this regular expression pattern can exhibit extremely poor performance because quantifiers are nested, and because the subexpression [0-9A-Z] is a subset of the subexpression [-.\w]*.

In these cases, you can optimize regular expression performance by removing the nested quantifiers and replacing the outer subexpression with a zero-width lookahead or lookbehind assertion. Lookahead and lookbehind assertions are anchors; they do not move the pointer in the input string, but instead look ahead or behind to check whether a specified condition is met. For example, the part number regular expression can be rewritten as ^[0-9A-Z][-.\w]*(?<=[0-9A-Z])\$$. This regular expression pattern is defined as shown in the following table.

Pattern Description
^ Begin the match at the beginning of the input string.
[0-9A-Z] Match an alphanumeric character. The part number must consist of at least this character.
[-.\w]* Match zero or more occurrences of any word character, hyphen, or period.
(?<=[0-9A-Z]) Look behind the current position, just before the ending dollar sign, to ensure that the previous character is alphanumeric.
\$ Match a dollar sign.
$ End the match at the end of the input string.

The following example illustrates the use of this regular expression to match an array containing possible part numbers.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"^[0-9A-Z][-.\w]*(?<=[0-9A-Z])\$$";
      string[] partNos = { "A1C$", "A4", "A4$", "A1603D$", "A1603D#" };
      
      foreach (var input in partNos) {
         Match match = Regex.Match(input, pattern);
         if (match.Success)
            Console.WriteLine(match.Value);
         else
            Console.WriteLine("Match not found.");
      }      
   }
}
// The example displays the following output:
//       A1C$
//       Match not found.
//       A4$
//       A1603D$
//       Match not found.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "^[0-9A-Z][-.\w]*(?<=[0-9A-Z])\$$"
      Dim partNos() As String = { "A1C$", "A4", "A4$", "A1603D$", 
                                  "A1603D#" }
      
      For Each input As String In partNos
         Dim match As Match = Regex.Match(input, pattern)
         If match.Success Then
            Console.WriteLine(match.Value)
         Else
            Console.WriteLine("Match not found.")
         End If
      Next      
   End Sub
End Module
' The example displays the following output:
'       A1C$
'       Match not found.
'       A4$
'       A1603D$
'       Match not found.
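
The cost of the nested quantifiers can be observed directly by timing both part-number patterns against input that nearly matches. The following is a minimal sketch rather than a benchmark: the input string, its length, and the two-second time-out are assumptions chosen for illustration, and the measured times depend on the .NET version and the machine, so no output is shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class BacktrackingComparison
{
   public static void Main()
   {
      // The two part-number patterns discussed above.
      string nested = @"^[0-9A-Z]([-.\w]*[0-9A-Z])*\$$";
      string rewritten = @"^[0-9A-Z][-.\w]*(?<=[0-9A-Z])\$$";

      // Hypothetical near-match: valid characters, but '#' instead of the closing '$'.
      string input = "A" + new string('1', 28) + "#";

      foreach (string pattern in new[] { nested, rewritten }) {
         Stopwatch sw = Stopwatch.StartNew();
         try {
            // The two-second time-out is an arbitrary safety net for this sketch.
            Regex rgx = new Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(2));
            bool success = rgx.IsMatch(input);
            sw.Stop();
            Console.WriteLine("{0}: match = {1}, elapsed = {2}", pattern, success, sw.Elapsed);
         }
         catch (RegexMatchTimeoutException) {
            sw.Stop();
            Console.WriteLine("{0}: timed out after {1}", pattern, sw.Elapsed);
         }
      }
   }
}
```

Neither pattern can match this input. The point of the sketch is how long each one takes to report the failure, and how that time changes as the run of valid characters grows.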

The regular expression language in .NET includes the following language elements that you can use to eliminate nested quantifiers. For more information, see Grouping Constructs.

Language element Description
(?= subexpression ) Zero-width positive lookahead. Look ahead of the current position to determine whether subexpression matches the input string.
(?! subexpression ) Zero-width negative lookahead. Look ahead of the current position to determine whether subexpression does not match the input string.
(?<= subexpression ) Zero-width positive lookbehind. Look behind the current position to determine whether subexpression matches the input string.
(?<! subexpression ) Zero-width negative lookbehind. Look behind the current position to determine whether subexpression does not match the input string.

Use time-out values

If your regular expression processes input that nearly matches the regular expression pattern, it can often rely on excessive backtracking, which impacts its performance significantly. In addition to carefully considering your use of backtracking and testing the regular expression against near-matching input, you should always set a time-out value to ensure that the impact of excessive backtracking, if it occurs, is minimized.

The regular expression time-out interval defines the period of time that the regular expression engine will look for a single match before it times out. The default time-out interval is Regex.InfiniteMatchTimeout, which means that the regular expression will not time out. You can override this value by supplying a TimeSpan when you instantiate the Regex object, as the Regex(String, RegexOptions, TimeSpan) constructor call in the example later in this section does.

If you have defined a time-out interval and a match is not found at the end of that interval, the regular expression method throws a RegexMatchTimeoutException exception. In your exception handler, you can choose to retry the match with a longer time-out interval, abandon the match attempt and assume that there is no match, or abandon the match attempt and log the exception information for future analysis.
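
A time-out can also be supplied for a single call through the static pattern-matching overloads that accept a TimeSpan, such as Regex.IsMatch(String, String, RegexOptions, TimeSpan). The following minimal sketch shows that form together with a handler that abandons the attempt and reports the time-out. The input string and the 500-millisecond interval are assumptions chosen for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public class TimeoutSketch
{
   public static void Main()
   {
      string pattern = @"\b\w+\b";
      string input = "A short sample sentence.";            // Hypothetical input.
      TimeSpan timeout = TimeSpan.FromMilliseconds(500);    // Arbitrary interval for this sketch.

      try {
         // The time-out applies to this call only; no Regex instance is created by the caller.
         bool found = Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
         Console.WriteLine("Match found: {0}", found);
      }
      catch (RegexMatchTimeoutException e) {
         // Abandon the attempt and report which pattern timed out and after how long.
         Console.WriteLine("Timed out after {0} while matching '{1}'.",
                           e.MatchTimeout, e.Pattern);
      }
   }
}
```

Which of the three handler strategies described above is appropriate (retry, assume no match, or log and abandon) depends on how much the caller trusts the input.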

The following example defines a GetWordData method that instantiates a regular expression with a time-out interval of 350 milliseconds to calculate the number of words and average number of characters in a word in a text document. If the matching operation times out, the time-out interval is increased by 350 milliseconds and the Regex object is re-instantiated. If the new time-out interval exceeds 1 second, the method re-throws the exception to the caller.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      RegexUtilities util = new RegexUtilities();
      string title = "Doyle - The Hound of the Baskervilles.txt";
      try {
         var info = util.GetWordData(title);
         Console.WriteLine("Words:               {0:N0}", info.Item1);
         Console.WriteLine("Average Word Length: {0:N2} characters", info.Item2); 
      }
      catch (IOException e) {
         Console.WriteLine("IOException reading file '{0}'", title);
         Console.WriteLine(e.Message);
      }
      catch (RegexMatchTimeoutException e) {
         Console.WriteLine("The operation timed out after {0:N0} milliseconds", 
                           e.MatchTimeout.TotalMilliseconds);
      }
   }
}

public class RegexUtilities
{
   public Tuple<int, double> GetWordData(string filename)
   { 
      const int MAX_TIMEOUT = 1000;   // Maximum timeout interval in milliseconds.
      const int INCREMENT = 350;      // Milliseconds increment of timeout.
      
      List<string> exclusions = new List<string>( new string[] { "a", "an", "the" });
      int[] wordLengths = new int[29];        // Allocate an array of more than ample size.
      string input = null;
      StreamReader sr = null;
      try { 
         sr = new StreamReader(filename);
         input = sr.ReadToEnd();
      }
      catch (FileNotFoundException e) {
         string msg = String.Format("Unable to find the file '{0}'", filename);
         throw new IOException(msg, e);
      }
      catch (IOException e) {
         throw new IOException(e.Message, e);
      }
      finally {
         if (sr != null) sr.Close(); 
      }

      int timeoutInterval = INCREMENT;
      bool init = false;
      Regex rgx = null;
      Match m = null;
      int indexPos = 0;  
      do {
         try {
            if (! init) {
               rgx = new Regex(@"\b\w+\b", RegexOptions.None, 
                               TimeSpan.FromMilliseconds(timeoutInterval));
               m = rgx.Match(input, indexPos);
               init = true;
            }
            else { 
               m = m.NextMatch();
            }
            if (m.Success) {    
               if ( !exclusions.Contains(m.Value.ToLower()))
                  wordLengths[m.Value.Length]++;

               indexPos = m.Index + m.Length;   // Resume after this match if a retry is needed.
            }
         }
         catch (RegexMatchTimeoutException e) {
            if (e.MatchTimeout.TotalMilliseconds < MAX_TIMEOUT) {
               timeoutInterval += INCREMENT;
               init = false;
            }
            else {
               // Rethrow the exception.
               throw; 
            }   
         }          
      } while (m.Success);
            
      // If regex completed successfully, calculate number of words and average length.
      int nWords = 0; 
      long totalLength = 0;
      
      for (int ctr = wordLengths.GetLowerBound(0); ctr <= wordLengths.GetUpperBound(0); ctr++) {
         nWords += wordLengths[ctr];
         totalLength += ctr * wordLengths[ctr];
      }
      // Cast to double so the average is not truncated by integer division.
      return new Tuple<int, double>(nWords, (double) totalLength / nWords);
   }
}
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim util As New RegexUtilities()
      Dim title As String = "Doyle - The Hound of the Baskervilles.txt"
      Try
         Dim info = util.GetWordData(title)
         Console.WriteLine("Words:               {0:N0}", info.Item1)
         Console.WriteLine("Average Word Length: {0:N2} characters", info.Item2) 
      Catch e As IOException
         Console.WriteLine("IOException reading file '{0}'", title)
         Console.WriteLine(e.Message)
      Catch e As RegexMatchTimeoutException
         Console.WriteLine("The operation timed out after {0:N0} milliseconds", 
                           e.MatchTimeout.TotalMilliseconds)
      End Try
   End Sub
End Module

Public Class RegexUtilities
   Public Function GetWordData(filename As String) As Tuple(Of Integer, Double) 
      Const MAX_TIMEOUT As Integer = 1000  ' Maximum timeout interval in milliseconds.
      Const INCREMENT As Integer = 350     ' Milliseconds increment of timeout.
      
      Dim exclusions As New List(Of String)({"a", "an", "the" })
      Dim wordLengths(30) As Integer        ' Allocate an array of more than ample size.
      Dim input As String = Nothing
      Dim sr As StreamReader = Nothing
      Try 
         sr = New StreamReader(filename)
         input = sr.ReadToEnd()
      Catch e As FileNotFoundException
         Dim msg As String = String.Format("Unable to find the file '{0}'", filename)
         Throw New IOException(msg, e)
      Catch e As IOException
         Throw New IOException(e.Message, e)
      Finally
         If sr IsNot Nothing Then sr.Close() 
      End Try

      Dim timeoutInterval As Integer = INCREMENT
      Dim init As Boolean = False
      Dim rgx As Regex = Nothing
      Dim m As Match = Nothing
      Dim indexPos As Integer = 0  
      Do
         Try
            If Not init Then
               rgx = New Regex("\b\w+\b", RegexOptions.None, 
                               TimeSpan.FromMilliseconds(timeoutInterval))
               m = rgx.Match(input, indexPos)
               init = True
            Else 
               m = m.NextMatch()
            End If
            If m.Success Then    
               If Not exclusions.Contains(m.Value.ToLower()) Then
                  wordLengths(m.Value.Length) += 1
               End If
               indexPos = m.Index + m.Length   ' Resume after this match if a retry is needed.
            End If
         Catch e As RegexMatchTimeoutException
            If e.MatchTimeout.TotalMilliseconds < MAX_TIMEOUT Then
               timeoutInterval += INCREMENT
               init = False
            Else
               ' Rethrow the exception.
               Throw 
            End If   
         End Try          
      Loop While m.Success
            
      ' If regex completed successfully, calculate number of words and average length.
      Dim nWords As Integer
      Dim totalLength As Long
      
      For ctr As Integer = wordLengths.GetLowerBound(0) To wordLengths.GetUpperBound(0)
         nWords += wordLengths(ctr)
         totalLength += ctr * wordLengths(ctr)
      Next
      Return New Tuple(Of Integer, Double)(nWords, totalLength/nWords)
   End Function
End Class

Capture only when necessary

Regular expressions in .NET support a number of grouping constructs, which let you group a regular expression pattern into one or more subexpressions. The most commonly used grouping constructs in the .NET regular expression language are (subexpression), which defines a numbered capturing group, and (?<name>subexpression), which defines a named capturing group. Grouping constructs are essential for creating backreferences and for defining a subexpression to which a quantifier is applied.

However, the use of these language elements has a cost. They cause the GroupCollection object returned by the Match.Groups property to be populated with the most recent unnamed or named captures. If a single grouping construct has captured multiple substrings in the input string, they also populate the CaptureCollection object returned by the Group.Captures property of that capturing group with multiple Capture objects.

Often, grouping constructs are used in a regular expression only so that quantifiers can be applied to them, and the groups captured by these subexpressions are not subsequently used. For example, the regular expression \b(\w+[;,]?\s?)+[.?!] is designed to capture an entire sentence. The following table describes the language elements in this regular expression pattern and their effect on the Match object's Match.Groups and Group.Captures collections.

Pattern Description
\b Begin the match at a word boundary.
\w+ Match one or more word characters.
[;,]? Match zero or one comma or semicolon.
\s? Match zero or one white-space character.
(\w+[;,]?\s?)+ Match one or more occurrences of one or more word characters followed by an optional comma or semicolon followed by an optional white-space character. This defines the first capturing group, which is necessary so that the combination of multiple word characters (that is, a word) followed by an optional punctuation symbol will be repeated until the regular expression engine reaches the end of a sentence.
[.?!] Match a period, question mark, or exclamation point.

As the following example shows, when a match is found, both the GroupCollection and CaptureCollection objects are populated with captures from the match. In this case, the capturing group (\w+[;,]?\s?) exists so that the + quantifier can be applied to it, which enables the regular expression pattern to match each word in a sentence. Otherwise, it would match the last word in a sentence.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is one sentence. This is another.";
      string pattern = @"\b(\w+[;,]?\s?)+[.?!]";
      
      foreach (Match match in Regex.Matches(input, pattern)) {
         Console.WriteLine("Match: '{0}' at index {1}.", 
                           match.Value, match.Index);
         int grpCtr = 0;
         foreach (Group grp in match.Groups) {
            Console.WriteLine("   Group {0}: '{1}' at index {2}.",
                              grpCtr, grp.Value, grp.Index);
            int capCtr = 0;
            foreach (Capture cap in grp.Captures) {
               Console.WriteLine("      Capture {0}: '{1}' at {2}.",
                                 capCtr, cap.Value, cap.Index);
               capCtr++;
            }
            grpCtr++;
         }          
         Console.WriteLine();        
      }
   }
}
// The example displays the following output:
//       Match: 'This is one sentence.' at index 0.
//          Group 0: 'This is one sentence.' at index 0.
//             Capture 0: 'This is one sentence.' at 0.
//          Group 1: 'sentence' at index 12.
//             Capture 0: 'This ' at 0.
//             Capture 1: 'is ' at 5.
//             Capture 2: 'one ' at 8.
//             Capture 3: 'sentence' at 12.
//       
//       Match: 'This is another.' at index 22.
//          Group 0: 'This is another.' at index 22.
//             Capture 0: 'This is another.' at 22.
//          Group 1: 'another' at index 30.
//             Capture 0: 'This ' at 22.
//             Capture 1: 'is ' at 27.
//             Capture 2: 'another' at 30.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "This is one sentence. This is another."
      Dim pattern As String = "\b(\w+[;,]?\s?)+[.?!]"
      
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Match: '{0}' at index {1}.", 
                           match.Value, match.Index)
         Dim grpCtr As Integer = 0
         For Each grp As Group In match.Groups
            Console.WriteLine("   Group {0}: '{1}' at index {2}.",
                              grpCtr, grp.Value, grp.Index)
            Dim capCtr As Integer = 0
            For Each cap As Capture In grp.Captures
               Console.WriteLine("      Capture {0}: '{1}' at {2}.",
                                 capCtr, cap.Value, cap.Index)
               capCtr += 1
            Next
            grpCtr += 1
         Next          
         Console.WriteLine()        
      Next    
   End Sub
End Module
' The example displays the following output:
'       Match: 'This is one sentence.' at index 0.
'          Group 0: 'This is one sentence.' at index 0.
'             Capture 0: 'This is one sentence.' at 0.
'          Group 1: 'sentence' at index 12.
'             Capture 0: 'This ' at 0.
'             Capture 1: 'is ' at 5.
'             Capture 2: 'one ' at 8.
'             Capture 3: 'sentence' at 12.
'       
'       Match: 'This is another.' at index 22.
'          Group 0: 'This is another.' at index 22.
'             Capture 0: 'This is another.' at 22.
'          Group 1: 'another' at index 30.
'             Capture 0: 'This ' at 22.
'             Capture 1: 'is ' at 27.
'             Capture 2: 'another' at 30.

When you use subexpressions only to apply quantifiers to them, and you are not interested in the captured text, you should disable group captures. For example, the (?:subexpression) language element prevents the group to which it applies from capturing matched substrings. In the following example, the regular expression pattern from the previous example is changed to \b(?:\w+[;,]?\s?)+[.?!]. As the output shows, the regular expression engine no longer populates a capturing group for the subexpression: the GroupCollection contains only group 0 (the entire match), and its CaptureCollection holds a single capture.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "This is one sentence. This is another.";
      string pattern = @"\b(?:\w+[;,]?\s?)+[.?!]";
      
      foreach (Match match in Regex.Matches(input, pattern)) {
         Console.WriteLine("Match: '{0}' at index {1}.", 
                           match.Value, match.Index);
         int grpCtr = 0;
         foreach (Group grp in match.Groups) {
            Console.WriteLine("   Group {0}: '{1}' at index {2}.",
                              grpCtr, grp.Value, grp.Index);
            int capCtr = 0;
            foreach (Capture cap in grp.Captures) {
               Console.WriteLine("      Capture {0}: '{1}' at {2}.",
                                 capCtr, cap.Value, cap.Index);
               capCtr++;
            }
            grpCtr++;
         }          
         Console.WriteLine();        
      }
   }
}
// The example displays the following output:
//       Match: 'This is one sentence.' at index 0.
//          Group 0: 'This is one sentence.' at index 0.
//             Capture 0: 'This is one sentence.' at 0.
//       
//       Match: 'This is another.' at index 22.
//          Group 0: 'This is another.' at index 22.
//             Capture 0: 'This is another.' at 22.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "This is one sentence. This is another."
      Dim pattern As String = "\b(?:\w+[;,]?\s?)+[.?!]"
      
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("Match: '{0}' at index {1}.", 
                           match.Value, match.Index)
         Dim grpCtr As Integer = 0
         For Each grp As Group In match.Groups
            Console.WriteLine("   Group {0}: '{1}' at index {2}.",
                              grpCtr, grp.Value, grp.Index)
            Dim capCtr As Integer = 0
            For Each cap As Capture In grp.Captures
               Console.WriteLine("      Capture {0}: '{1}' at {2}.",
                                 capCtr, cap.Value, cap.Index)
               capCtr += 1
            Next
            grpCtr += 1
         Next          
         Console.WriteLine()        
      Next    
   End Sub
End Module
' The example displays the following output:
'       Match: 'This is one sentence.' at index 0.
'          Group 0: 'This is one sentence.' at index 0.
'             Capture 0: 'This is one sentence.' at 0.
'       
'       Match: 'This is another.' at index 22.
'          Group 0: 'This is another.' at index 22.
'             Capture 0: 'This is another.' at 22.

You can disable captures in one of the following ways:

  • Use the (?:subexpression) language element. This element prevents the capture of matched substrings in the group to which it applies. It does not disable substring captures in any nested groups.

  • Use the ExplicitCapture option. It disables all unnamed or implicit captures in the regular expression pattern. When you use this option, only substrings that match named groups defined with the (?<name>subexpression) language element can be captured. The ExplicitCapture flag can be passed to the options parameter of a Regex class constructor or to the options parameter of a Regex static matching method.

  • Use the n option in the (?imnsx) language element. This option disables all unnamed or implicit captures from the point in the regular expression pattern at which the element appears. Captures stay disabled until the end of the pattern or until a (?-n) option re-enables unnamed or implicit captures. For more information, see Miscellaneous Constructs.

  • Use the n option in the (?imnsx:subexpression) language element. This option disables all unnamed or implicit captures in subexpression. Captures by any unnamed or implicit nested capturing groups are disabled as well.
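
The options listed above can be compared side by side. The following is a minimal sketch, not part of the original samples: the class name CaptureOptionsSketch and the helper ExtraGroupCount are hypothetical names introduced for illustration, and the input is the first sentence from the earlier examples. It counts the groups that a pattern defines beyond group 0 under each option, and then shows that a named group still captures when ExplicitCapture is set.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class CaptureOptionsSketch
{
    // Returns the number of groups the pattern defines besides group 0
    // (group 0 always represents the entire match).
    public static int ExtraGroupCount(string input, string pattern, RegexOptions options)
    {
        Match match = Regex.Match(input, pattern, options);
        return match.Groups.Count - 1;
    }

    public static void Main()
    {
        string input = "This is one sentence.";

        // Baseline: the unnamed parentheses form capturing group 1.
        Console.WriteLine(ExtraGroupCount(input, @"\b(\w+[;,]?\s?)+[.?!]",
                                          RegexOptions.None));            // 1

        // ExplicitCapture turns unnamed parentheses into noncapturing groups.
        Console.WriteLine(ExtraGroupCount(input, @"\b(\w+[;,]?\s?)+[.?!]",
                                          RegexOptions.ExplicitCapture)); // 0

        // Inline (?n) has the same effect from the point where it appears.
        Console.WriteLine(ExtraGroupCount(input, @"(?n)\b(\w+[;,]?\s?)+[.?!]",
                                          RegexOptions.None));            // 0

        // (?n:subexpression) limits the effect to the subexpression,
        // including the unnamed group nested inside it.
        Console.WriteLine(ExtraGroupCount(input, @"\b(?n:(\w+[;,]?\s?)+)[.?!]",
                                          RegexOptions.None));            // 0

        // Named groups are still captured when ExplicitCapture is set.
        Match named = Regex.Match(input, @"\b(?<word>\w+[;,]?\s?)+[.?!]",
                                  RegexOptions.ExplicitCapture);
        Console.WriteLine(named.Groups["word"].Value);                    // sentence
    }
}
```

In this sketch, ExplicitCapture is the least intrusive choice when a pattern is already written, because the pattern text stays unchanged, whereas (?:subexpression) gives per-group control when only some groups should stop capturing.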

Title | Description
Details of Regular Expression Behavior | Examines the implementation of the regular expression engine in .NET. The topic focuses on the flexibility of regular expressions and explains the developer's responsibility for ensuring the efficient and robust operation of the regular expression engine.
Backtracking | Explains what backtracking is and how it affects regular expression performance, and examines language elements that provide alternatives to backtracking.
Regular Expression Language - Quick Reference | Describes the elements of the regular expression language in .NET and provides links to detailed documentation for each language element.