規則運算式中的回溯Backtracking in Regular Expressions

回溯 (Backtracking) 會在規則運算式模式包含選擇性的 數量詞交替建構,且規則運算式引擎返回之前儲存的狀態繼續搜尋相符項目時發生。Backtracking occurs when a regular expression pattern contains optional quantifiers or alternation constructs, and the regular expression engine returns to a previous saved state to continue its search for a match. 回溯是規則運算式的核心能力,可讓運算式功能強大且靈活,並且比對非常複雜的模式。Backtracking is central to the power of regular expressions; it makes it possible for expressions to be powerful and flexible, and to match very complex patterns. 但同時,這項強大功能需付出相當的代價。At the same time, this power comes at a cost. 回溯經常是影響規則運算式引擎之效能最重要的一項因素。Backtracking is often the single most important factor that affects the performance of the regular expression engine. 幸好開發人員能夠掌控規則運算式引擎的行為,以及其使用回溯的方式。Fortunately, the developer has control over the behavior of the regular expression engine and how it uses backtracking. 本主題將說明回溯運作的方式,以及如何進行控制。This topic explains how backtracking works and how it can be controlled.

注意

大致而言,像是 .NET 規則運算式引擎這類非決定性有限自動化 (NFA) 引擎將設計有效率且快速之規則運算式的責任交給了開發人員。In general, a Nondeterministic Finite Automaton (NFA) engine like .NET regular expression engine places the responsibility for crafting efficient, fast regular expressions on the developer.

此主題包括下列章節:This topic contains the following sections:

不進行回溯的線性比較Linear Comparison Without Backtracking

如果規則運算式模式沒有選擇性數量詞或交替建構,則規則運算式引擎會以線性時間執行。If a regular expression pattern has no optional quantifiers or alternation constructs, the regular expression engine executes in linear time. 也就是說,規則運算式引擎比對模式中的第一個語言項目與輸入字串中的文字之後,會嘗試比對模式中的下一個語言項目與輸入字串中的下一個字元或字元群組。That is, after the regular expression engine matches the first language element in the pattern with text in the input string, it tries to match the next language element in the pattern with the next character or group of characters in the input string. 這項作業會繼續進行,直到比對成功或失敗為止。This continues until the match either succeeds or fails. 無論成功或失敗,規則運算式引擎都會在輸入字串中一次前進一個字元。In either case, the regular expression engine advances by one character at a time in the input string.

下列範例提供一個實例。The following example provides an illustration. 規則運算式 e{2}\w\b 會尋找兩個出現字母 "e" 的位置,這後面接著任何文字字元,然後再接著字邊界。The regular expression e{2}\w\b looks for two occurrences of the letter "e" followed by any word character followed by a word boundary.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "needing a reed";
      string pattern = @"e{2}\w\b";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine("{0} found at position {1}", 
                           match.Value, match.Index);
   }
}
// The example displays the following output:
//       eed found at position 11
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "needing a reed"
      Dim pattern As String = "e{2}\w\b"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine("{0} found at position {1}", _
                           match.Value, match.Index)
      Next   
   End Sub
End Module
' The example displays the following output:
'       eed found at position 11

雖然這個規則運算式包括數量詞 {2},但仍然會以線性方式進行評估。Although this regular expression includes the quantifier {2}, it is evaluated in a linear manner. 規則運算式引擎不會回溯,因為 {2} 不是選擇性的數量詞,它指定了確切的數字,而不是前面的子運算式必須比對的可變次數。The regular expression engine does not backtrack because {2} is not an optional quantifier; it specifies an exact number and not a variable number of times that the previous subexpression must match. 因此,規則運算式引擎會嘗試比對規則運算式模式與輸入字串,如下表所示。As a result, the regular expression engine tries to match the regular expression pattern with the input string as shown in the following table.

作業Operation 模式中的位置Position in pattern 字串中的位置Position in string 結果Result
11 ee "needing a reed" (索引 0)"needing a reed" (index 0) 沒有符合的結果。No match.
22 ee "eeding a reed" (索引 1)"eeding a reed" (index 1) 可能符合的結果。Possible match.
33 e{2}e{2} "eding a reed" (索引 2)"eding a reed" (index 2) 可能符合的結果。Possible match.
44 \w\w "ding a reed" (索引 3)"ding a reed" (index 3) 可能符合的結果。Possible match.
55 \b\b "ing a reed" (索引 4)"ing a reed" (index 4) 可能符合的結果失敗。Possible match fails.
66 ee "eding a reed" (索引 2)"eding a reed" (index 2) 可能符合的結果。Possible match.
77 e{2}e{2} "ding a reed" (索引 3)"ding a reed" (index 3) 可能符合的結果失敗。Possible match fails.
88 ee "ding a reed" (索引 3)"ding a reed" (index 3) 比對會失敗。Match fails.
99 ee "ing a reed" (索引 4)"ing a reed" (index 4) 沒有符合的結果。No match.
1010 ee "ng a reed" (索引 5)"ng a reed" (index 5) 沒有符合的結果。No match.
1111 ee "g a reed" (索引 6)"g a reed" (index 6) 沒有符合的結果。No match.
1212 ee " a reed" (索引 7)" a reed" (index 7) 沒有符合的結果。No match.
1313 ee "a reed" (索引 8)"a reed" (index 8) 沒有符合的結果。No match.
1414 ee " reed" (索引 9)" reed" (index 9) 沒有符合的結果。No match.
1515 ee "a reed" (索引 10)"reed" (index 10) 沒有符合的結果No match
1616 ee "eed" (索引 11)"eed" (index 11) 可能符合的結果。Possible match.
1717 e{2}e{2} "ed" (索引 12)"ed" (index 12) 可能符合的結果。Possible match.
1818 \w\w "d" (索引 13)"d" (index 13) 可能符合的結果。Possible match.
1919 \b\b "" (索引 14)"" (index 14) 符合的結果。Match.

如果規則運算式模式未包含選擇性數量詞或交替建構,則比對規則運算式模式與輸入字串所需的比較次數上限,約相當於輸入字串中的字元數。If a regular expression pattern includes no optional quantifiers or alternation constructs, the maximum number of comparisons required to match the regular expression pattern with the input string is roughly equivalent to the number of characters in the input string. 在這個案例中,規則運算式引擎會使用 19 項比較找出這 13 個字元字串中可能的相符項目。In this case, the regular expression engine uses 19 comparisons to identify possible matches in this 13-character string. 換句話說,如果沒有選擇性數量詞或交替建構,規則運算式引擎就會以近似線性時間執行。In other words, the regular expression engine runs in near-linear time if it contains no optional quantifiers or alternation constructs.

回到頁首Back to top

含有選擇性數量詞或交替建構的回溯Backtracking with Optional Quantifiers or Alternation Constructs

當規則運算式包含選擇性數量詞或交替建構時,輸入字串的評估就不再是線性。When a regular expression includes optional quantifiers or alternation constructs, the evaluation of the input string is no longer linear. 與 NFA 引擎比對的模式是由規則運算式中的語言項目所引導,而不是由輸入字串中要比對的字元引導。Pattern matching with an NFA engine is driven by the language elements in the regular expression and not by the characters to be matched in the input string. 因此,規則運算式引擎會嘗試完整比對選擇性或替代子運算式。Therefore, the regular expression engine tries to fully match optional or alternative subexpressions. 當規則運算式引擎前進到子運算式中的下一個語言項目且比對失敗時,它可能會捨棄成功比對的一部分,並且返回之前儲存的狀態,以便完整比對規則運算式與輸入字串。When it advances to the next language element in the subexpression and the match is unsuccessful, the regular expression engine can abandon a portion of its successful match and return to an earlier saved state in the interest of matching the regular expression as a whole with the input string. 這個返回之前儲存狀態尋找符合結果的程序,就稱為回溯 (Backtracking)。This process of returning to a previous saved state to find a match is known as backtracking.

例如,想想規則運算式模式 .*(es),它會比對字元 "es" 與其前面的所有字元。For example, consider the regular expression pattern .*(es), which matches the characters "es" and all the characters that precede it. 如下列範例中所示,如果輸入字串為 "Essential services are provided by regular expressions.",則模式會比對整個字串直到 (且包含) "expressions" 中的 "es" 為止。As the following example shows, if the input string is "Essential services are provided by regular expressions.", the pattern matches the whole string up to and including the "es" in "expressions".

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "Essential services are provided by regular expressions.";
      string pattern = ".*(es)"; 
      Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
      if (m.Success) {
         Console.WriteLine("'{0}' found at position {1}", 
                           m.Value, m.Index);
         Console.WriteLine("'es' found at position {0}", 
                           m.Groups[1].Index);      
      } 
   }
}
//    'Essential services are provided by regular expres' found at position 0
//    'es' found at position 47
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "Essential services are provided by regular expressions."
      Dim pattern As String = ".*(es)" 
      Dim m As Match = Regex.Match(input, pattern, RegexOptions.IgnoreCase)
      If m.Success Then
         Console.WriteLine("'{0}' found at position {1}", _
                           m.Value, m.Index)
         Console.WriteLine("'es' found at position {0}", _
                           m.Groups(1).Index)                  
      End If     
   End Sub
End Module
'    'Essential services are provided by regular expres' found at position 0
'    'es' found at position 47

為了進行這項比對,規則運算式引擎會使用回溯,如下所示:To do this, the regular expression engine uses backtracking as follows:

  • 它會比對 .* (其會比對零個、一個或多個任意字元) 與整個輸入字串。It matches the .* (which matches zero, one, or more occurrences of any character) with the whole input string.

  • 它會嘗試比對規則運算式模式中的 "e"。It attempts to match "e" in the regular expression pattern. 不過,輸入字串沒有其他可供比對的字元。However, the input string has no remaining characters available to match.

  • 接著它會回溯至上一個成功的比對結果 "Essential services are provided by regular expressions",並嘗試比對 "e" 與句尾的句號。It backtracks to its last successful match, "Essential services are provided by regular expressions", and attempts to match "e" with the period at the end of the sentence. 比對會失敗。The match fails.

  • 然後它會繼續回溯至前一個成功的比對結果,一次一個字元,直到暫時相符的子字串為 "Essential services are provided by regular expr"。It continues to backtrack to a previous successful match one character at a time until the tentatively matched substring is "Essential services are provided by regular expr". 然後運算式會比較模式中的 "e" 與 "expressions" 中的第二個 "e",並且尋找符合的結果。It then compares the "e" in the pattern to the second "e" in "expressions" and finds a match.

  • 接著再比較模式中的 "s" 與相符的 "e" 字元後面接著的 "s" ("expressions" 中的第一個 "s")。It compares "s" in the pattern to the "s" that follows the matched "e" character (the first "s" in "expressions"). 比對將會成功。The match is successful.

當您使用回溯時,比對規則運算式模式與長度 55 個字元的輸入字串需要進行 67 次比較作業。When you use backtracking, matching the regular expression pattern with the input string, which is 55 characters long, requires 67 comparison operations. 通常如果規則運算式模式包含單一交替建構或單一選擇性數量詞,則比對模式所需的比較作業次數會超過輸入字串中字元數的兩倍。Generally, if a regular expression pattern has a single alternation construct or a single optional quantifier, the number of comparison operations required to match the pattern is more than twice the number of characters in the input string.

回到頁首Back to top

含有巢狀選擇性數量詞的回溯Backtracking with Nested Optional Quantifiers

如果模式包含大量交替建構、包括巢狀交替建構,或是最常見的巢狀選擇性數量詞,則比對規則運算式模式所需的比較作業次數可能大幅增加。The number of comparison operations required to match a regular expression pattern can increase exponentially if the pattern includes a large number of alternation constructs, if it includes nested alternation constructs, or, most commonly, if it includes nested optional quantifiers. 例如,規則運算式模式 ^(a+)+$ 的設計為比對包含一個或多個 "a" 字元的完整字串。For example, the regular expression pattern ^(a+)+$ is designed to match a complete string that contains one or more "a" characters. 範例中提供了兩個長度相同的輸入字串,但只有第一個字串與模式相符。The example provides two input strings of identical length, but only the first string matches the pattern. System.Diagnostics.Stopwatch 類別可用來判斷比對作業進行的時間。The System.Diagnostics.Stopwatch class is used to determine how long the match operation takes.

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "^(a+)+$";
      string[] inputs = { "aaaaaa", "aaaaa!" };
      Regex rgx = new Regex(pattern);
      Stopwatch sw;
      
      foreach (string input in inputs) {
         sw = Stopwatch.StartNew();   
         Match match = rgx.Match(input);
         sw.Stop();
         if (match.Success)
            Console.WriteLine("Matched {0} in {1}", match.Value, sw.Elapsed);
         else
            Console.WriteLine("No match found in {0}", sw.Elapsed);
      }
   }
}
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "^(a+)+$"
      Dim inputs() As String = { "aaaaaa", "aaaaa!" }
      Dim rgx As New Regex(pattern)
      Dim sw As Stopwatch
      
      For Each input As String In inputs
         sw = Stopwatch.StartNew()   
         Dim match As Match = rgx.Match(input)
         sw.Stop()
         If match.Success Then
            Console.WriteLine("Matched {0} in {1}", match.Value, sw.Elapsed)
         Else
            Console.WriteLine("No match found in {0}", sw.Elapsed)
         End If      
      Next
   End Sub
End Module

如範例的輸出所示,規則運算式引擎耗費約為識別比對字串的兩倍時間,才發現輸入字串與模式不相符。As the output from the example shows, the regular expression engine took about twice as long to find that an input string did not match the pattern as it did to identify a matching string. 這是因為失敗的比對永遠代表最糟的狀況。This is because an unsuccessful match always represents a worst-case scenario. 規則運算式引擎必須先使用規則運算式依循所有可能的途徑來處理資料,才能得到比對失敗的結果,而巢狀括號會製造許多額外的途徑來處理資料。The regular expression engine must use the regular expression to follow all possible paths through the data before it can conclude that the match is unsuccessful, and the nested parentheses create many additional paths through the data. 規則運算式引擎會藉由執行下列操作得出第二個字串與模式不相符的結果:The regular expression engine concludes that the second string did not match the pattern by doing the following:

  • 它會檢查是否位於字串開頭,然後比對字串中的前五個字元與 a+模式。It checks that it was at the beginning of the string, and then matches the first five characters in the string with the pattern a+. 然後判斷字串中沒有額外的 "a" 字元群組。It then determines that there are no additional groups of "a" characters in the string. 最後測試是否到達字串結尾。Finally, it tests for the end of the string. 由於字串中剩下一個額外的字串,因此比對失敗。Because one additional character remains in the string, the match fails. 這個失敗的比對需要經過 9 次比較。This failed match requires 9 comparisons. 規則運算式引擎還會儲存 "a" (以下稱為符合結果 1)、"aa" (符合結果 2)、"aaa" (符合結果 3) 和 "aaaa" (符合結果 4) 這些符合結果的狀態資訊。The regular expression engine also saves state information from its matches of "a" (which we will call match 1), "aa" (match 2), "aaa" (match 3), and "aaaa" (match 4).

  • 它會返回之前儲存的符合結果 4。It returns to the previously saved match 4. 然後判斷出有一個額外的 "a" 字元要指派至額外的擷取群組。It determines that there is one additional "a" character to assign to an additional captured group. 最後測試是否到達字串結尾。Finally, it tests for the end of the string. 由於字串中剩下一個額外的字串,因此比對失敗。Because one additional character remains in the string, the match fails. 這個失敗的比對需要經過 4 次比較。This failed match requires 4 comparisons. 到目前為止,總共執行了 13 次比較。So far, a total of 13 comparisons have been performed.

  • 它會返回之前儲存的符合結果 3。It returns to the previously saved match 3. 然後判斷出有兩個額外的 "a" 字元要指派至額外的擷取群組。It determines that there are two additional "a" characters to assign to an additional captured group. 然而,字串結尾測試失敗。However, the end-of-string test fails. 接著它會返回符合結果 3,並嘗試比對兩個額外的擷取群組中的這兩個額外的 "a" 字元。It then returns to match3 and tries to match the two additional "a" characters in two additional captured groups. 字串結尾測試仍然失敗。The end-of-string test still fails. 這些失敗的比對需要經過 12 次比較。These failed matches require 12 comparisons. 到目前為止,總共執行了 25 次比較。So far, a total of 25 comparisons have been performed.

輸入字串與規則運算式的比較會依照這種方式繼續進行,直到規則運算式引擎嘗試過所有可能的比對組合,然後得到沒有符合的結果這個結論。Comparison of the input string with the regular expression continues in this way until the regular expression engine has tried all possible combinations of matches, and then concludes that there is no match. 由於巢狀數量詞的關係,這個比較是 O(2n) 或指數運算,其中 n 是輸入字串中的字元數。Because of the nested quantifiers, this comparison is an O(2n) or an exponential operation, where n is the number of characters in the input string. 這表示,在最糟的情況下,包含 30 個字元的輸入字串約需要進行 1,073,741,824 次比較,而包含 40 個字元的輸入字串約需要進行 1,099,511,627,776 次比較。This means that in the worst case, an input string of 30 characters requires approximately 1,073,741,824 comparisons, and an input string of 40 characters requires approximately 1,099,511,627,776 comparisons. 如果您使用這類長度甚至更長的字串,則規則運算式方法處理不符合規則運算式模式的輸入時,可能需要相當長的時間才能完成。If you use strings of these or even greater lengths, regular expression methods can take an extremely long time to complete when they process input that does not match the regular expression pattern.

回到頁首Back to top

控制回溯Controlling Backtracking

回溯可讓您建立強大、靈活的規則運算式。Backtracking lets you create powerful, flexible regular expressions. 不過,如上一節所示,這些好處可能伴隨著令人敬謝不敏的低落效能。However, as the previous section showed, these benefits may be coupled with unacceptably poor performance. 為了避免大量回溯,當您具現化 Regex 物件或呼叫靜態規則運算式比對方法時,應該定義逾時間隔。To prevent excessive backtracking, you should define a time-out interval when you instantiate a Regex object or call a static regular expression matching method. 下一節將討論這個部分。This is discussed in the next section. 此外,.NET 支援三個規則運算式語言元素,可限制或隱藏回溯並支援複雜的規則運算式,而且對效能的影響微不足道:非回溯子運算式左合樣判斷提示右合樣判斷提示In addition, .NET supports three regular expression language elements that limit or suppress backtracking and that support complex regular expressions with little or no performance penalty: nonbacktracking subexpressions, lookbehind assertions, and lookahead assertions. 如需各語言項目的詳細資訊,請參閱 Grouping ConstructsFor more information about each language element, see Grouping Constructs.

定義逾時間隔Defining a Time-out Interval

從 .NET Framework 4.5 開始,您可以設定逾時值,表示規則運算式引擎開始搜尋單一符合項目到放棄嘗試並擲回 RegexMatchTimeoutException 例外狀況之前的最長間隔。Starting with the .NET Framework 4.5, you can set a time-out value that represents the longest interval the regular expression engine will search for a single match before it abandons the attempt and throws a RegexMatchTimeoutException exception. 您可以提供執行個體規則運算式之 TimeSpan 建構函式的 Regex.Regex(String, RegexOptions, TimeSpan) 值,藉此指定逾時間隔。You specify the time-out interval by supplying a TimeSpan value to the Regex.Regex(String, RegexOptions, TimeSpan) constructor for instance regular expressions. 此外,每一個靜態模式比對方法都有 TimeSpan 參數的多載,可讓您指定逾時值。In addition, each static pattern matching method has an overload with a TimeSpan parameter that allows you to specify a time-out value. 根據預設,逾時間隔會設為 Regex.InfiniteMatchTimeout ,表示規則運算式引擎不會逾時。By default, the time-out interval is set to Regex.InfiniteMatchTimeout and the regular expression engine does not time out.

重要

如果您的規則運算式倚賴回溯,建議您一定要設定逾時間隔。We recommend that you always set a time-out interval if your regular expression relies on backtracking.

RegexMatchTimeoutException 例外狀況表示規則運算式引擎在指定的逾時間隔內找不到相符項目,但不會指出擲回例外狀況的原因。A RegexMatchTimeoutException exception indicates that the regular expression engine was unable to find a match within the specified time-out interval but does not indicate why the exception was thrown. 這個原因可能是大量回溯,不過也有可能是對於擲回例外狀況當時的系統負載而言,設定的逾時間隔太低。The reason might be excessive backtracking, but it is also possible that the time-out interval was set too low given the system load at the time the exception was thrown. 當您處理例外狀況時,可以選擇中放棄一步比對輸入字串,或增加逾時間隔並重試比對作業。When you handle the exception, you can choose to abandon further matches with the input string or increase the time-out interval and retry the matching operation.

例如,下列程式碼會呼叫 Regex.Regex(String, RegexOptions, TimeSpan) 建構函式來具現化逾時值為一秒的 Regex 物件。For example, the following code calls the Regex.Regex(String, RegexOptions, TimeSpan) constructor to instantiate a Regex object with a time-out value of one second. 規則運算式模式 (a+)+$會在行尾比對一個或多個 "a" 字元的一個或多個序列,並且受限於大量回溯。The regular expression pattern (a+)+$, which matches one or more sequences of one or more "a" characters at the end of a line, is subject to excessive backtracking. 如果擲回 RegexMatchTimeoutException ,則範例會將逾時值增加至最大間隔三秒。If a RegexMatchTimeoutException is thrown, the example increases the time-out value up to a maximum interval of three seconds. 在這個間隔之後,它就會放棄嘗試比對模式。After that, it abandons the attempt to match the pattern.

using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Security;
using System.Text.RegularExpressions;
using System.Threading; 

public class Example
{
   const int MaxTimeoutInSeconds = 3;

   public static void Main()
   {
      string pattern = @"(a+)+$";    // DO NOT REUSE THIS PATTERN.
      Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase, TimeSpan.FromSeconds(1));       
      Stopwatch sw = null;
      
      string[] inputs= { "aa", "aaaa>", 
                         "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                         "aaaaaaaaaaaaaaaaaaaaaa>",
                         "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>" };
                                 
      foreach (var inputValue in inputs) {
         Console.WriteLine("Processing {0}", inputValue);
         bool timedOut = false;
         do { 
            try {
               sw = Stopwatch.StartNew();
               // Display the result.
               if (rgx.IsMatch(inputValue)) {
                  sw.Stop();
                  Console.WriteLine(@"Valid: '{0}' ({1:ss\.fffffff} seconds)", 
                                    inputValue, sw.Elapsed); 
               }
               else {
                  sw.Stop();
                  Console.WriteLine(@"'{0}' is not a valid string. ({1:ss\.fffff} seconds)", 
                                    inputValue, sw.Elapsed);
               }
            }
            catch (RegexMatchTimeoutException e) {   
               sw.Stop();
               // Display the elapsed time until the exception.
               Console.WriteLine(@"Timeout with '{0}' after {1:ss\.fffff}", 
                                 inputValue, sw.Elapsed);
               Thread.Sleep(1500);       // Pause for 1.5 seconds.

               // Increase the timeout interval and retry.
               TimeSpan timeout = e.MatchTimeout.Add(TimeSpan.FromSeconds(1));
               if (timeout.TotalSeconds > MaxTimeoutInSeconds) {
                  Console.WriteLine("Maximum timeout interval of {0} seconds exceeded.",
                                    MaxTimeoutInSeconds);
                  timedOut = false;
               }
               else {               
                  Console.WriteLine("Changing the timeout interval to {0}", 
                                    timeout); 
                  rgx = new Regex(pattern, RegexOptions.IgnoreCase, timeout);
                  timedOut = true;
               }
            }
         } while (timedOut);
         Console.WriteLine();
      }   
   }
}
// The example displays output like the following :
//    Processing aa
//    Valid: 'aa' (00.0000779 seconds)
//    
//    Processing aaaa>
//    'aaaa>' is not a valid string. (00.00005 seconds)
//    
//    Processing aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
//    Valid: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' (00.0000043 seconds)
//    
//    Processing aaaaaaaaaaaaaaaaaaaaaa>
//    Timeout with 'aaaaaaaaaaaaaaaaaaaaaa>' after 01.00469
//    Changing the timeout interval to 00:00:02
//    Timeout with 'aaaaaaaaaaaaaaaaaaaaaa>' after 02.01202
//    Changing the timeout interval to 00:00:03
//    Timeout with 'aaaaaaaaaaaaaaaaaaaaaa>' after 03.01043
//    Maximum timeout interval of 3 seconds exceeded.
//    
//    Processing aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>
//    Timeout with 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>' after 03.01018
//    Maximum timeout interval of 3 seconds exceeded.
Imports System.ComponentModel
Imports System.Diagnostics
Imports System.Security
Imports System.Text.RegularExpressions
Imports System.Threading 

Module Example
   Const MaxTimeoutInSeconds As Integer = 3
   
   Public Sub Main()
      Dim pattern As String = "(a+)+$"    ' DO NOT REUSE THIS PATTERN.
      Dim rgx As New Regex(pattern, RegexOptions.IgnoreCase, TimeSpan.FromSeconds(1))       
      Dim sw As Stopwatch = Nothing
      
      Dim inputs() As String = { "aa", "aaaa>", 
                                 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                                 "aaaaaaaaaaaaaaaaaaaaaa>",
                                 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>" }
                                 
      For Each inputValue In inputs
         Console.WriteLine("Processing {0}", inputValue)
         Dim timedOut As Boolean = False
         Do 
            Try
               sw = Stopwatch.StartNew()
               ' Display the result.
               If rgx.IsMatch(inputValue) Then
                  sw.Stop()
                  Console.WriteLine("Valid: '{0}' ({1:ss\.fffffff} seconds)", 
                                    inputValue, sw.Elapsed) 
               Else
                  sw.Stop()
                  Console.WriteLine("'{0}' is not a valid string. ({1:ss\.fffff} seconds)", 
                                    inputValue, sw.Elapsed)
               End If
            Catch e As RegexMatchTimeoutException   
               sw.Stop()
               ' Display the elapsed time until the exception.
               Console.WriteLine("Timeout with '{0}' after {1:ss\.fffff}", 
                                 inputValue, sw.Elapsed)
               Thread.Sleep(1500)       ' Pause for 1.5 seconds.

               ' Increase the timeout interval and retry.
               Dim timeout As TimeSpan = e.MatchTimeout.Add(TimeSpan.FromSeconds(1))
               If timeout.TotalSeconds > MaxTimeoutInSeconds Then
                  Console.WriteLine("Maximum timeout interval of {0} seconds exceeded.",
                                    MaxTimeoutInSeconds)
                  timedOut = False
               Else                
                  Console.WriteLine("Changing the timeout interval to {0}", 
                                    timeout) 
                  rgx = New Regex(pattern, RegexOptions.IgnoreCase, timeout)
                  timedOut = True
               End If
            End Try
         Loop While timedOut
         Console.WriteLine()
      Next   
   End Sub 
End Module
' The example displays output like the following:
'    Processing aa
'    Valid: 'aa' (00.0000779 seconds)
'    
'    Processing aaaa>
'    'aaaa>' is not a valid string. (00.00005 seconds)
'    
'    Processing aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
'    Valid: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' (00.0000043 seconds)
'    
'    Processing aaaaaaaaaaaaaaaaaaaaaa>
'    Timeout with 'aaaaaaaaaaaaaaaaaaaaaa>' after 01.00469
'    Changing the timeout interval to 00:00:02
'    Timeout with 'aaaaaaaaaaaaaaaaaaaaaa>' after 02.01202
'    Changing the timeout interval to 00:00:03
'    Timeout with 'aaaaaaaaaaaaaaaaaaaaaa>' after 03.01043
'    Maximum timeout interval of 3 seconds exceeded.
'    
'    Processing aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>
'    Timeout with 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>' after 03.01018
'    Maximum timeout interval of 3 seconds exceeded.

非回溯子運算式Nonbacktracking Subexpression

(?> 子運算式) 語言項目會隱藏子運算式中的回溯。The (?> subexpression) language element suppresses backtracking in a subexpression. 對於避免發生與比對失敗相關的效能問題相當有用。It is useful for preventing the performance problems associated with failed matches.

下列範例說明隱藏回溯如何在使用巢狀數量詞時改善效能。The following example illustrates how suppressing backtracking improves performance when using nested quantifiers. 它會測量規則運算式引擎判斷出輸入字串與兩個規則運算式不相符所需的時間。It measures the time required for the regular expression engine to determine that an input string does not match two regular expressions. 第一個規則運算式會使用回溯嘗試比對包含出現一次或多次的一個或多個十六進位數字的字串,後面接著一個冒號,再接著一個或多個十六進位數字,最後接著兩個冒號。The first regular expression uses backtracking to attempt to match a string that contains one or more occurrences of one or more hexadecimal digits, followed by a colon, followed by one or more hexadecimal digits, followed by two colons. 第二個規則運算式與第一個完全相同,但會停用回溯。The second regular expression is identical to the first, except that it disables backtracking. 如範例的輸出所示,停用回溯使效能大幅提升。As the output from the example shows, the performance improvement from disabling backtracking is significant.

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "b51:4:1DB:9EE1:5:27d60:f44:D4:cd:E:5:0A5:4a:D24:41Ad:";
      bool matched;
      Stopwatch sw;
      
      Console.WriteLine("With backtracking:");
      string backPattern = "^(([0-9a-fA-F]{1,4}:)*([0-9a-fA-F]{1,4}))*(::)$";
      sw = Stopwatch.StartNew();
      matched = Regex.IsMatch(input, backPattern);
      sw.Stop();
      Console.WriteLine("Match: {0} in {1}", Regex.IsMatch(input, backPattern), sw.Elapsed);
      Console.WriteLine();
      
      Console.WriteLine("Without backtracking:");
      string noBackPattern = "^((?>[0-9a-fA-F]{1,4}:)*(?>[0-9a-fA-F]{1,4}))*(::)$";
      sw = Stopwatch.StartNew();
      matched = Regex.IsMatch(input, noBackPattern);
      sw.Stop();
      Console.WriteLine("Match: {0} in {1}", Regex.IsMatch(input, noBackPattern), sw.Elapsed);
   }
}
// The example displays output like the following:
//       With backtracking:
//       Match: False in 00:00:27.4282019
//       
//       Without backtracking:
//       Match: False in 00:00:00.0001391
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "b51:4:1DB:9EE1:5:27d60:f44:D4:cd:E:5:0A5:4a:D24:41Ad:"
      Dim matched As Boolean
      Dim sw As Stopwatch
      
      Console.WriteLine("With backtracking:")
      Dim backPattern As String = "^(([0-9a-fA-F]{1,4}:)*([0-9a-fA-F]{1,4}))*(::)$"
      sw = Stopwatch.StartNew()
      matched = Regex.IsMatch(input, backPattern)
      sw.Stop()
      Console.WriteLine("Match: {0} in {1}", Regex.IsMatch(input, backPattern), sw.Elapsed)
      Console.WriteLine()
      
      Console.WriteLine("Without backtracking:")
      Dim noBackPattern As String = "^((?>[0-9a-fA-F]{1,4}:)*(?>[0-9a-fA-F]{1,4}))*(::)$"
      sw = Stopwatch.StartNew()
      matched = Regex.IsMatch(input, noBackPattern)
      sw.Stop()
      Console.WriteLine("Match: {0} in {1}", Regex.IsMatch(input, noBackPattern), sw.Elapsed)
   End Sub
End Module
' The example displays the following output:
'       With backtracking:
'       Match: False in 00:00:27.4282019
'       
'       Without backtracking:
'       Match: False in 00:00:00.0001391

左合樣判斷提示Lookbehind Assertions

.NET 包含兩個語言元素:(?<=subexpression)(?<!subexpression),這兩者會比對輸入字串中的前一個或多個字元。.NET includes two language elements, (?<=subexpression) and (?<!subexpression), that match the previous character or characters in the input string. 這兩個語言項目都是零寬度判斷提示,也就是說,它們會判斷緊接著目前字元前面的字元是否可由 子運算式比對,而不需前進或回溯。Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately precede the current character can be matched by subexpression, without advancing or backtracking.

(?<= 子運算式 ) 是左合樣判斷提示,也就是說,目前位置前面的字元必須符合 子運算式(?<= subexpression ) is a positive lookbehind assertion; that is, the character or characters before the current position must match subexpression. (?<!子運算式) 是左不合樣判斷提示,也就是說,目前位置前面的字元必須不符合 子運算式(?<!subexpression) is a negative lookbehind assertion; that is, the character or characters before the current position must not match subexpression. 子運算式 為前一個子運算式的子集時,左合樣和左不合樣判斷提示最為實用。Both positive and negative lookbehind assertions are most useful when subexpression is a subset of the previous subexpression.

下列範例使用兩個同等的規則運算式模式,來驗證電子郵件地址中的使用者名稱。The following example uses two equivalent regular expression patterns that validate the user name in an email address. 第一個模式因為進行大量回溯,而受限於低落的效能。The first pattern is subject to poor performance because of excessive backtracking. 第二個模式修改了第一個規則運算式,將巢狀數量詞取代為左合樣判斷提示。The second pattern modifies the first regular expression by replacing a nested quantifier with a positive lookbehind assertion. 範例的輸入顯示 Regex.IsMatch 方法的執行時間。The output from the example displays the execution time of the Regex.IsMatch method.

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      Stopwatch sw;
      string input = "test@contoso.com";
      bool result;
      
      string pattern = @"^[0-9A-Z]([-.\w]*[0-9A-Z])?@";
      sw = Stopwatch.StartNew();
      result = Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
      sw.Stop();
      Console.WriteLine("Match: {0} in {1}", result, sw.Elapsed);
      
      string behindPattern = @"^[0-9A-Z][-.\w]*(?<=[0-9A-Z])@";
      sw = Stopwatch.StartNew();
      result = Regex.IsMatch(input, behindPattern, RegexOptions.IgnoreCase);
      sw.Stop();
      Console.WriteLine("Match with Lookbehind: {0} in {1}", result, sw.Elapsed);
   }
}
// The example displays output similar to the following:
//       Match: True in 00:00:00.0017549
//       Match with Lookbehind: True in 00:00:00.0000659
Module Example
   Public Sub Main()
      Dim sw As Stopwatch
      Dim input As String = "test@contoso.com"
      Dim result As Boolean
      
      Dim pattern As String = "^[0-9A-Z]([-.\w]*[0-9A-Z])?@"
      sw = Stopwatch.StartNew()
      result = Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase)
      sw.Stop()
      Console.WriteLine("Match: {0} in {1}", result, sw.Elapsed)
      
      Dim behindPattern As String = "^[0-9A-Z][-.\w]*(?<=[0-9A-Z])@"
      sw = Stopwatch.StartNew()
      result = Regex.IsMatch(input, behindPattern, RegexOptions.IgnoreCase)
      sw.Stop()
      Console.WriteLine("Match with Lookbehind: {0} in {1}", result, sw.Elapsed)
   End Sub
End Module
' The example displays output similar to the following:
'       Match: True in 00:00:00.0017549
'       Match with Lookbehind: True in 00:00:00.0000659

第一個規則運算式模式 ^[0-9A-Z]([-.\w]*[0-9A-Z])*@的定義如下表所示。The first regular expression pattern, ^[0-9A-Z]([-.\w]*[0-9A-Z])*@, is defined as shown in the following table.

模式Pattern 說明Description
^ 從字串的開頭開始比對。Start the match at the beginning of the string.
[0-9A-Z] 比對英數字元。Match an alphanumeric character. 這項比較不區分大小寫,因為 Regex.IsMatch 方法是使用 RegexOptions.IgnoreCase 選項呼叫。This comparison is case-insensitive, because the Regex.IsMatch method is called with the RegexOptions.IgnoreCase option.
[-.\w]* 比對出現零次、一次或多次的連字號、句號或文字字元。Match zero, one, or more occurrences of a hyphen, period, or word character.
[0-9A-Z] 比對英數字元。Match an alphanumeric character.
([-.\w]*[0-9A-Z])* 比對出現零次或多次的零個或多個連字號、句號或文字字元組合,後面接著英數字元。Match zero or more occurrences of the combination of zero or more hyphens, periods, or word characters, followed by an alphanumeric character. 這是第一個擷取群組。This is the first capturing group.
@ 比對 "@" 記號。Match an at sign ("@").

第二個規則運算式模式 ^[0-9A-Z][-.\w]*(?<=[0-9A-Z])@使用左合樣判斷提示。The second regular expression pattern, ^[0-9A-Z][-.\w]*(?<=[0-9A-Z])@, uses a positive lookbehind assertion. 其定義方式如下表所示。It is defined as shown in the following table.

模式Pattern 說明Description
^ 從字串的開頭開始比對。Start the match at the beginning of the string.
[0-9A-Z] 比對英數字元。Match an alphanumeric character. 這項比較不區分大小寫,因為 Regex.IsMatch 方法是使用 RegexOptions.IgnoreCase 選項呼叫。This comparison is case-insensitive, because the Regex.IsMatch method is called with the RegexOptions.IgnoreCase option.
[-.\w]* 比對出現零次或多次的連字號、句號或文字字元。Match zero or more occurrences of a hyphen, period, or word character.
(?<=[0-9A-Z]) 如果是英數字,則向左合樣最後一個符合的字元並繼續比對。Look back at the last matched character and continue the match if it is alphanumeric. 請注意,英數字元是由句號、連字號和所有文字字元組成之集合的子集。Note that alphanumeric characters are a subset of the set that consists of periods, hyphens, and all word characters.
@ 比對 "@" 記號。Match an at sign ("@").

右合樣判斷提示Lookahead Assertions

.NET 包含兩個語言元素:(?=subexpression)(?!subexpression),這兩者會比對輸入字串中的下一個或多個字元。.NET includes two language elements, (?=subexpression) and (?!subexpression), that match the next character or characters in the input string. 這兩個語言項目都是零寬度判斷提示,也就是說,它們會判斷緊接著目前字元後面的字元是否可由 子運算式比對,而不需前進或回溯。Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately follow the current character can be matched by subexpression, without advancing or backtracking.

(?= 子運算式 ) 是右合樣判斷提示,也就是說,目前位置後面的字元必須符合 子運算式(?= subexpression ) is a positive lookahead assertion; that is, the character or characters after the current position must match subexpression. (?!子運算式) 是右不合樣判斷提示,也就是說,目前位置後面的字元必須不符合 子運算式(?!subexpression) is a negative lookahead assertion; that is, the character or characters after the current position must not match subexpression. 子運算式 為下一個子運算式的子集時,右合樣和右不合樣判斷提示最為實用。Both positive and negative lookahead assertions are most useful when subexpression is a subset of the next subexpression.

下列範例使用兩個同等的規則運算式模式,這兩個模式會驗證完整類型名稱。The following example uses two equivalent regular expression patterns that validate a fully qualified type name. 第一個模式因為進行大量回溯,而受限於低落的效能。The first pattern is subject to poor performance because of excessive backtracking. 第二個修改了第一個規則運算式,將巢狀數量詞取代為右合樣判斷提示。The second modifies the first regular expression by replacing a nested quantifier with a positive lookahead assertion. 範例的輸入顯示 Regex.IsMatch 方法的執行時間。The output from the example displays the execution time of the Regex.IsMatch method.

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "aaaaaaaaaaaaaaaaaaaaaa.";
      bool result;
      Stopwatch sw;
      
      string pattern = @"^(([A-Z]\w*)+\.)*[A-Z]\w*$";
      sw = Stopwatch.StartNew();
      result = Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
      sw.Stop();
      Console.WriteLine("{0} in {1}", result, sw.Elapsed);

      string aheadPattern = @"^((?=[A-Z])\w+\.)*[A-Z]\w*$";
      sw = Stopwatch.StartNew();
      result = Regex.IsMatch(input, aheadPattern, RegexOptions.IgnoreCase);
      sw.Stop();
      Console.WriteLine("{0} in {1}", result, sw.Elapsed);
   }
}
// The example displays the following output:
//       False in 00:00:03.8003793
//       False in 00:00:00.0000866
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "aaaaaaaaaaaaaaaaaaaaaa."
      Dim result As Boolean
      Dim sw As Stopwatch
      
      Dim pattern As String = "^(([A-Z]\w*)+\.)*[A-Z]\w*$"
      sw = Stopwatch.StartNew()
      result = Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase)
      sw.Stop()
      Console.WriteLine("{0} in {1}", result, sw.Elapsed)

      Dim aheadPattern As String = "^((?=[A-Z])\w+\.)*[A-Z]\w*$"
      sw = Stopwatch.StartNew()
      result = Regex.IsMatch(input, aheadPattern, RegexOptions.IgnoreCase)
      sw.Stop()
      Console.WriteLine("{0} in {1}", result, sw.Elapsed)
   End Sub
End Module
' The example displays the following output:
'       False in 00:00:03.8003793
'       False in 00:00:00.0000866

第一個規則運算式模式 ^(([A-Z]\w*)+\.)*[A-Z]\w*$的定義如下表所示。The first regular expression pattern, ^(([A-Z]\w*)+\.)*[A-Z]\w*$, is defined as shown in the following table.

模式Pattern 說明Description
^ 從字串的開頭開始比對。Start the match at the beginning of the string.
([A-Z]\w*)+\. 比對後面接著零個或多個文字字元的字母字元 (A-Z) 一次或多次,後面接著句號。Match an alphabetical character (A-Z) followed by zero or more word characters one or more times, followed by a period. 這項比較不區分大小寫,因為 Regex.IsMatch 方法是使用 RegexOptions.IgnoreCase 選項呼叫。This comparison is case-insensitive, because the Regex.IsMatch method is called with the RegexOptions.IgnoreCase option.
(([A-Z]\w*)+\.)* 比對上一個模式零次或多次。Match the previous pattern zero or more times.
[A-Z]\w* 比對後面接著零個或多個文字字元的字母字元。Match an alphabetical character followed by zero or more word characters.
$ 在輸入字串結尾結束比對。End the match at the end of the input string.

第二個規則運算式模式 ^((?=[A-Z])\w+\.)*[A-Z]\w*$使用右合樣判斷提示。The second regular expression pattern, ^((?=[A-Z])\w+\.)*[A-Z]\w*$, uses a positive lookahead assertion. 其定義方式如下表所示。It is defined as shown in the following table.

模式Pattern 說明Description
^ 從字串的開頭開始比對。Start the match at the beginning of the string.
(?=[A-Z]) 如果是字母 (A-Z),則向右合樣至第一個字元並繼續比對。Look ahead to the first character and continue the match if it is alphabetical (A-Z). 這項比較不區分大小寫,因為 Regex.IsMatch 方法是使用 RegexOptions.IgnoreCase 選項呼叫。This comparison is case-insensitive, because the Regex.IsMatch method is called with the RegexOptions.IgnoreCase option.
\w+\. 比對後面接著句號的零個或多個文字字元。Match one or more word characters followed by a period.
((?=[A-Z])\w+\.)* 比對後面接句號的一個或多個文字字元這個模式一次或多次。Match the pattern of one or more word characters followed by a period zero or more times. 初始文字字元必須是字母。The initial word character must be alphabetical.
[A-Z]\w* 比對後面接著零個或多個文字字元的字母字元。Match an alphabetical character followed by zero or more word characters.
$ 在輸入字串結尾結束比對。End the match at the end of the input string.

回到頁首Back to top

另請參閱See also