Quantifiers in Regular Expressions

Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found. The following table lists the quantifiers supported by .NET.

Greedy quantifier   Lazy quantifier   Description
*                   *?                Match zero or more times.
+                   +?                Match one or more times.
?                   ??                Match zero or one time.
{n}                 {n}?              Match exactly n times.
{n,}                {n,}?             Match at least n times.
{n,m}               {n,m}?            Match from n to m times.

The quantities n and m are integer constants. Ordinarily, quantifiers are greedy; they cause the regular expression engine to match as many occurrences of a particular pattern as possible. Appending the ? character to a quantifier makes it lazy; it causes the regular expression engine to match as few occurrences as possible. For a complete description of the difference between greedy and lazy quantifiers, see the section Greedy and Lazy Quantifiers later in this topic.

Important

Nesting quantifiers (for example, as the regular expression pattern (a*)* does) can increase the number of comparisons that the regular expression engine must perform, as an exponential function of the number of characters in the input string. For more information about this behavior and its workarounds, see Backtracking.

Regular Expression Quantifiers

The following sections list the quantifiers supported by .NET regular expressions.

Note

If the *, +, ?, {, and } characters are encountered in a regular expression pattern, the regular expression engine interprets them as quantifiers or part of quantifier constructs unless they are included in a character class. To have them interpreted as literal characters outside a character class, you must escape them by preceding them with a backslash. For example, the string \* in a regular expression pattern is interpreted as a literal asterisk ("*") character.
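The escaping rule in the note above can be sketched as follows. This is a minimal illustration, not part of the original samples: the class name EscapeDemo, the helper FirstMatch, and the input string are made up for the example. Regex.Match and Regex.Escape are existing .NET methods.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: returns the first match of pattern in input,
    // or an empty string if there is no match.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        string input = "2+2=4 and 3*4=12";

        // \* and \+ are literal characters; the unescaped + after \d is a quantifier.
        Console.WriteLine(FirstMatch(input, @"\d+\*\d+"));   // 3*4
        Console.WriteLine(FirstMatch(input, @"\d+\+\d+"));   // 2+2

        // Inside a character class, * and + are literals and need no backslash.
        Console.WriteLine(FirstMatch(input, @"\d+[*+]\d+")); // 2+2

        // Regex.Escape adds the backslashes for arbitrary literal text.
        Console.WriteLine(Regex.Escape("3*4"));              // 3\*4
    }
}
```

Regex.Escape is convenient when the literal text comes from user input and may contain any of the quantifier characters.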

Match Zero or More Times: *

The * quantifier matches the preceding element zero or more times. It is equivalent to the {0,} quantifier. * is a greedy quantifier whose lazy equivalent is *?.

The following example illustrates this quantifier with the pattern \b91*9*\b. Of the nine digit groups in the input string, five match the pattern and four (95, 929, 9219, and 9919) do not.

string pattern = @"\b91*9*\b";   
string input = "99 95 919 929 9119 9219 999 9919 91119";
foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
     
// The example displays the following output:   
//       '99' found at position 0.
//       '919' found at position 6.
//       '9119' found at position 14.
//       '999' found at position 24.
//       '91119' found at position 33.
Dim pattern As String = "\b91*9*\b"   
Dim input As String = "99 95 919 929 9119 9219 999 9919 91119"
For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next     
' The example displays the following output:   
'       '99' found at position 0.
'       '919' found at position 6.
'       '9119' found at position 14.
'       '999' found at position 24.
'       '91119' found at position 33.

The regular expression pattern is defined as shown in the following table.

Pattern   Description
\b        Start at a word boundary.
91*       Match a "9" followed by zero or more "1" characters.
9*        Match zero or more "9" characters.
\b        End at a word boundary.

Match One or More Times: +

The + quantifier matches the preceding element one or more times. It is equivalent to {1,}. + is a greedy quantifier whose lazy equivalent is +?.

For example, the regular expression \ban+\w*?\b tries to match entire words that begin with the letter a followed by one or more instances of the letter n. The following example illustrates this regular expression. The regular expression matches the words an, annual, announcement, and antique, and correctly does not match Autumn and all.

string pattern = @"\ban+\w*?\b";

string input = "Autumn is a great time for an annual announcement to all antique collectors.";
foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
   
// The example displays the following output:   
//       'an' found at position 27.
//       'annual' found at position 30.
//       'announcement' found at position 37.
//       'antique' found at position 57.      
Dim pattern As String = "\ban+\w*?\b"

Dim input As String = "Autumn is a great time for an annual announcement to all antique collectors."
For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next   
' The example displays the following output:   
'       'an' found at position 27.
'       'annual' found at position 30.
'       'announcement' found at position 37.
'       'antique' found at position 57.      

The regular expression pattern is defined as shown in the following table.

Pattern   Description
\b        Start at a word boundary.
an+       Match an "a" followed by one or more "n" characters.
\w*?      Match a word character zero or more times, but as few times as possible.
\b        End at a word boundary.

Match Zero or One Time: ?

The ? quantifier matches the preceding element zero or one time. It is equivalent to {0,1}. ? is a greedy quantifier whose lazy equivalent is ??.

For example, the regular expression \ban?\b tries to match entire words that begin with the letter a followed by zero or one instance of the letter n. In other words, it tries to match the words a and an. The following example illustrates this regular expression.

string pattern = @"\ban?\b";
string input = "An amiable animal with a large snount and an animated nose.";
foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
  
// The example displays the following output:   
//        'An' found at position 0.
//        'a' found at position 23.
//        'an' found at position 42.
Dim pattern As String = "\ban?\b"
Dim input As String = "An amiable animal with a large snount and an animated nose."
For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next  
' The example displays the following output:   
'       'An' found at position 0.
'       'a' found at position 23.
'       'an' found at position 42.

The regular expression pattern is defined as shown in the following table.

Pattern   Description
\b        Start at a word boundary.
an?       Match an "a" followed by zero or one "n" character.
\b        End at a word boundary.
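The three shorthand quantifiers described so far are each stated to be equivalent to a brace form: * to {0,}, + to {1,}, and ? to {0,1}. The following sketch checks those equivalences on one sample string. The class name EquivalenceDemo, the helper MatchAll, and the input are assumptions made for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EquivalenceDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] MatchAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "a ab abb abbb";
        string[][] pairs =
        {
            new[] { "ab*", "ab{0,}" },
            new[] { "ab+", "ab{1,}" },
            new[] { "ab?", "ab{0,1}" }
        };

        foreach (string[] pair in pairs)
        {
            string[] shortForm = MatchAll(input, pair[0]);
            string[] braceForm = MatchAll(input, pair[1]);
            Console.WriteLine("{0} and {1}: {2} (same: {3})",
                              pair[0], pair[1],
                              string.Join(", ", shortForm),
                              shortForm.SequenceEqual(braceForm));
        }
    }
}
// The example displays the following output:
//       ab* and ab{0,}: a, ab, abb, abbb (same: True)
//       ab+ and ab{1,}: ab, abb, abbb (same: True)
//       ab? and ab{0,1}: a, ab, ab, ab (same: True)
```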

Match Exactly n Times: {n}

The {n} quantifier matches the preceding element exactly n times, where n is any integer. {n} is a greedy quantifier whose lazy equivalent is {n}?.

For example, the regular expression \b\d+\,\d{3}\b tries to match a word boundary followed by one or more decimal digits, a comma, three decimal digits, and a word boundary. The following example illustrates this regular expression.

string pattern = @"\b\d+\,\d{3}\b";
string input = "Sales totaled 103,524 million in January, " + 
                      "106,971 million in February, but only " + 
                      "943 million in March.";
foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
     
//  The example displays the following output:   
//        '103,524' found at position 14.
//        '106,971' found at position 45.
Dim pattern As String = "\b\d+\,\d{3}\b"
Dim input As String = "Sales totaled 103,524 million in January, " + _
                      "106,971 million in February, but only " + _
                      "943 million in March."
For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next     
' The example displays the following output:   
'       '103,524' found at position 14.
'       '106,971' found at position 45.

The regular expression pattern is defined as shown in the following table.

Pattern   Description
\b        Start at a word boundary.
\d+       Match one or more decimal digits.
\,        Match a comma character.
\d{3}     Match three decimal digits.
\b        End at a word boundary.

Match at Least n Times: {n,}

The {n,} quantifier matches the preceding element at least n times, where n is any integer. {n,} is a greedy quantifier whose lazy equivalent is {n,}?.

For example, the regular expression \b\d{2,}\b\D+ tries to match a word boundary followed by at least two digits, another word boundary, and one or more non-digit characters. The following example illustrates this regular expression. The regular expression fails to match the phrase "7 days" because it contains just one decimal digit, but it successfully matches the phrases "10 weeks" and "300 years".

string pattern = @"\b\d{2,}\b\D+";   
string input = "7 days, 10 weeks, 300 years";
foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
 
//  The example displays the following output:
//        '10 weeks, ' found at position 8.
//        '300 years' found at position 18.
 Dim pattern As String = "\b\d{2,}\b\D+"  
 Dim input As String = "7 days, 10 weeks, 300 years"
For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next 
' The example displays the following output:
'       '10 weeks, ' found at position 8.
'       '300 years' found at position 18.

The regular expression pattern is defined as shown in the following table.

Pattern   Description
\b        Start at a word boundary.
\d{2,}    Match at least two decimal digits.
\b        Match a word boundary.
\D+       Match at least one non-digit character.

Match Between n and m Times: {n,m}

The {n,m} quantifier matches the preceding element at least n times, but no more than m times, where n and m are integers. {n,m} is a greedy quantifier whose lazy equivalent is {n,m}?.

In the following example, the regular expression (00\s){2,4} tries to match between two and four occurrences of two zero digits followed by a space. Note that the final portion of the input string includes this pattern five times rather than the maximum of four. However, only the initial portion of this substring (up to the space before the fifth pair of zeros) matches the regular expression pattern.

string pattern = @"(00\s){2,4}";
string input = "0x00 FF 00 00 18 17 FF 00 00 00 21 00 00 00 00 00";
foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
 
//  The example displays the following output:
//        '00 00 ' found at position 8.
//        '00 00 00 ' found at position 23.
//        '00 00 00 00 ' found at position 35.
Dim pattern As String = "(00\s){2,4}"
Dim input As String = "0x00 FF 00 00 18 17 FF 00 00 00 21 00 00 00 00 00"
For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next 
' The example displays the following output:
'       '00 00 ' found at position 8.
'       '00 00 00 ' found at position 23.
'       '00 00 00 00 ' found at position 35.

Match Zero or More Times (Lazy Match): *?

The *? quantifier matches the preceding element zero or more times, but as few times as possible. It is the lazy counterpart of the greedy quantifier *.

In the following example, the regular expression \b\w*?oo\w*?\b matches all words that contain the string oo.

 string pattern = @"\b\w*?oo\w*?\b";
 string input = "woof root root rob oof woo woe";
 foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
    Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
  
 //  The example displays the following output:
//        'woof' found at position 0.
//        'root' found at position 5.
//        'root' found at position 10.
//        'oof' found at position 19.
//        'woo' found at position 23.
 Dim pattern As String = "\b\w*?oo\w*?\b"
 Dim input As String = "woof root root rob oof woo woe"
 For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
    Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
 Next 
 ' The example displays the following output:
'       'woof' found at position 0.
'       'root' found at position 5.
'       'root' found at position 10.
'       'oof' found at position 19.
'       'woo' found at position 23.

The regular expression pattern is defined as shown in the following table.

Pattern   Description
\b        Start at a word boundary.
\w*?      Match zero or more word characters, but as few characters as possible.
oo        Match the string "oo".
\w*?      Match zero or more word characters, but as few characters as possible.
\b        End on a word boundary.

Match One or More Times (Lazy Match): +?

The +? quantifier matches the preceding element one or more times, but as few times as possible. It is the lazy counterpart of the greedy quantifier +.

For example, the regular expression \b\w+?\b matches one or more word characters delimited by word boundaries. Because the match must end at a word boundary, the lazy quantifier still has to consume each whole word. The following example illustrates this regular expression.

string pattern = @"\b\w+?\b";
string input = "Aa Bb Cc Dd Ee Ff";
foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
 
//  The example displays the following output:
//        'Aa' found at position 0.
//        'Bb' found at position 3.
//        'Cc' found at position 6.
//        'Dd' found at position 9.
//        'Ee' found at position 12.
//        'Ff' found at position 15.
 Dim pattern As String = "\b\w+?\b"
 Dim input As String = "Aa Bb Cc Dd Ee Ff"
For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next 
' The example displays the following output:
'       'Aa' found at position 0.
'       'Bb' found at position 3.
'       'Cc' found at position 6.
'       'Dd' found at position 9.
'       'Ee' found at position 12.
'       'Ff' found at position 15.

Match Zero or One Time (Lazy Match): ??

The ?? quantifier matches the preceding element zero or one time, but as few times as possible. It is the lazy counterpart of the greedy quantifier ?.

For example, the regular expression ^\s*(System.)??Console.Write(Line)??\(?? attempts to match the strings "Console.Write" or "Console.WriteLine". The string can also include "System." before "Console", and it can be followed by an opening parenthesis. The string must be at the beginning of a line, although it can be preceded by white space. The following example illustrates this regular expression.

string pattern = @"^\s*(System.)??Console.Write(Line)??\(??";
string input = "System.Console.WriteLine(\"Hello!\")\n" + 
                      "Console.Write(\"Hello!\")\n" + 
                      "Console.WriteLine(\"Hello!\")\n" + 
                      "Console.ReadLine()\n" + 
                      "   Console.WriteLine";
foreach (Match match in Regex.Matches(input, pattern, 
                                      RegexOptions.IgnorePatternWhitespace | 
                                      RegexOptions.IgnoreCase | 
                                      RegexOptions.Multiline))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
 
//  The example displays the following output:
//        'System.Console.Write' found at position 0.
//        'Console.Write' found at position 35.
//        'Console.Write' found at position 59.
//        '   Console.Write' found at position 106.
Dim pattern As String = "^\s*(System.)??Console.Write(Line)??\(??"
Dim input As String = "System.Console.WriteLine(""Hello!"")" + vbCrLf + _
                      "Console.Write(""Hello!"")" + vbCrLf + _
                      "Console.WriteLine(""Hello!"")" + vbCrLf + _
                      "Console.ReadLine()" + vbCrLf + _
                      "   Console.WriteLine"
For Each match As Match In Regex.Matches(input, pattern, _
                                         RegexOptions.IgnorePatternWhitespace Or RegexOptions.IgnoreCase Or RegexOptions.MultiLine)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next 
' The example displays the following output:
'       'System.Console.Write' found at position 0.
'       'Console.Write' found at position 36.
'       'Console.Write' found at position 61.
'       '   Console.Write' found at position 110.

The regular expression pattern is defined as shown in the following table.

Pattern         Description
^               Match the start of the input string or, because the RegexOptions.Multiline option is set, the start of a line.
\s*             Match zero or more white-space characters.
(System.)??     Match zero or one occurrence of the string "System.".
Console.Write   Match the string "Console.Write".
(Line)??        Match zero or one occurrence of the string "Line".
\(??            Match zero or one occurrence of the opening parenthesis.

Match Exactly n Times (Lazy Match): {n}?

The {n}? quantifier matches the preceding element exactly n times, where n is any integer. It is the lazy counterpart of the greedy quantifier {n}.

In the following example, the regular expression \b(\w{3,}?\.){2}?\w{3,}?\b is used to identify a Web site address. Note that it matches "www.microsoft.com" and "msdn.microsoft.com", but does not match "mywebsite" or "mycompany.com".

string pattern = @"\b(\w{3,}?\.){2}?\w{3,}?\b";
string input = "www.microsoft.com msdn.microsoft.com mywebsite mycompany.com";
foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
     
//  The example displays the following output:
//        'www.microsoft.com' found at position 0.
//        'msdn.microsoft.com' found at position 18.
 Dim pattern As String = "\b(\w{3,}?\.){2}?\w{3,}?\b"
 Dim input As String = "www.microsoft.com msdn.microsoft.com mywebsite mycompany.com"
For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next     
' The example displays the following output:
'       'www.microsoft.com' found at position 0.
'       'msdn.microsoft.com' found at position 18.

The regular expression pattern is defined as shown in the following table.

Pattern           Description
\b                Start at a word boundary.
(\w{3,}?\.)       Match at least 3 word characters, but as few characters as possible, followed by a dot or period character. This is the first capturing group.
(\w{3,}?\.){2}?   Match the pattern in the first group two times, but as few times as possible.
\w{3,}?           Match at least 3 word characters, but as few characters as possible.
\b                End the match on a word boundary.
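Because {n} fixes the repetition count, there is no smaller count for the lazy form to prefer, so {n}? and {n} should find the same matches. The sketch below checks this on a single input; the class name ExactCountDemo, the helper MatchAll, and the input string are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExactCountDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] MatchAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "12345678";

        // Both forms must consume exactly three digits per match.
        Console.WriteLine(string.Join(", ", MatchAll(input, @"\d{3}")));   // 123, 456
        Console.WriteLine(string.Join(", ", MatchAll(input, @"\d{3}?")));  // 123, 456
    }
}
```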

Match at Least n Times (Lazy Match): {n,}?

The {n,}? quantifier matches the preceding element at least n times, where n is any integer, but as few times as possible. It is the lazy counterpart of the greedy quantifier {n,}.

See the example for the {n}? quantifier in the previous section for an illustration. The regular expression in that example uses the {n,}? quantifier to match a string that has at least three characters followed by a period.
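As a smaller, standalone illustration of {n,}?, the following sketch compares \d{2,} with \d{2,}? on one run of digits. The class name AtLeastDemo, the helper MatchAll, and the input are assumptions for the example. The last pattern shows that the lazy form still consumes more than n elements when the rest of the pattern requires it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AtLeastDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] MatchAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "12345";

        // Greedy: takes every digit it can.
        Console.WriteLine(string.Join(", ", MatchAll(input, @"\d{2,}")));     // 12345

        // Lazy: stops as soon as the minimum of two digits is reached.
        Console.WriteLine(string.Join(", ", MatchAll(input, @"\d{2,}?")));    // 12, 34

        // Lazy, but the trailing \b forces it to extend to the end of the number.
        Console.WriteLine(string.Join(", ", MatchAll(input, @"\d{2,}?\b")));  // 12345
    }
}
```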

Match Between n and m Times (Lazy Match): {n,m}?

The {n,m}? quantifier matches the preceding element between n and m times, where n and m are integers, but as few times as possible. It is the lazy counterpart of the greedy quantifier {n,m}.

In the following example, the regular expression \b[A-Z](\w*?\s*?){1,10}[.!?] matches sentences that contain between one and ten words. It matches all the sentences in the input string except for one sentence that contains 18 words.

string pattern = @"\b[A-Z](\w*?\s*?){1,10}[.!?]";
string input = "Hi. I am writing a short note. Its purpose is " + 
                      "to test a regular expression that attempts to find " + 
                      "sentences with ten or fewer words. Most sentences " + 
                      "in this note are short.";
foreach (Match match in Regex.Matches(input, pattern))
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index);
 
//  The example displays the following output:
//        'Hi.' found at position 0.
//        'I am writing a short note.' found at position 4.
//        'Most sentences in this note are short.' found at position 132.
Dim pattern As String = "\b[A-Z](\w*?\s*?){1,10}[.!?]"
Dim input As String = "Hi. I am writing a short note. Its purpose is " + _
                      "to test a regular expression that attempts to find " + _
                      "sentences with ten or fewer words. Most sentences " + _
                      "in this note are short."
For Each match As Match In Regex.Matches(input, pattern)
   Console.WriteLine("'{0}' found at position {1}.", match.Value, match.Index)
Next 
' The example displays the following output:
'       'Hi.' found at position 0.
'       'I am writing a short note.' found at position 4.
'       'Most sentences in this note are short.' found at position 132.

The regular expression pattern is defined as shown in the following table.

Pattern      Description
\b           Start at a word boundary.
[A-Z]        Match an uppercase character from A to Z.
(\w*?\s*?)   Match zero or more word characters, followed by zero or more white-space characters, each as few times as possible. This is the first capture group.
{1,10}       Match the previous pattern between 1 and 10 times.
[.!?]        Match any one of the punctuation characters ".", "!", or "?".
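To see the effect of the lazy {n,m}? form in isolation, the following sketch applies \d{2,4} and \d{2,4}? to the same run of digits. The class name RangeDemo, the helper MatchAll, and the input are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] MatchAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "1234567";

        // Greedy: each match takes up to the maximum of four digits.
        Console.WriteLine(string.Join(", ", MatchAll(input, @"\d{2,4}")));   // 1234, 567

        // Lazy: each match stops at the minimum of two digits;
        // the final lone "7" is too short to match.
        Console.WriteLine(string.Join(", ", MatchAll(input, @"\d{2,4}?")));  // 12, 34, 56
    }
}
```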

Greedy and Lazy Quantifiers

A number of the quantifiers have two versions:

  • A greedy version.

    A greedy quantifier tries to match an element as many times as possible.

  • A non-greedy (or lazy) version.

    A non-greedy quantifier tries to match an element as few times as possible. You can turn a greedy quantifier into a lazy quantifier by appending a ?.

Consider a simple regular expression that is intended to extract the last four digits from a string of numbers such as a credit card number. The version of the regular expression that uses the * greedy quantifier is \b.*([0-9]{4})\b. However, if a string contains two numbers, this regular expression matches the last four digits of the second number only, as the following example shows.

string greedyPattern = @"\b.*([0-9]{4})\b";
string input1 = "1112223333 3992991999";
foreach (Match match in Regex.Matches(input1, greedyPattern))
   Console.WriteLine("Account ending in ******{0}.", match.Groups[1].Value);

// The example displays the following output:
//       Account ending in ******1999.
Dim greedyPattern As String = "\b.*([0-9]{4})\b"
Dim input1 As String = "1112223333 3992991999"
For Each match As Match In Regex.Matches(input1, greedypattern)
   Console.WriteLine("Account ending in ******{0}.", match.Groups(1).Value)
Next
' The example displays the following output:
'       Account ending in ******1999.

The regular expression fails to match the first number because the * quantifier tries to match the previous element as many times as possible in the entire string. The .* construct consumes as much of the input as it can, and the engine backtracks only far enough for ([0-9]{4})\b to succeed, so the single match spans both numbers and the group captures only the four digits at the end of the string.

This is not the desired behavior. Instead, you can use the *? lazy quantifier to extract digits from both numbers, as the following example shows.

string lazyPattern = @"\b.*?([0-9]{4})\b";
string input2 = "1112223333 3992991999";
foreach (Match match in Regex.Matches(input2, lazyPattern))
   Console.WriteLine("Account ending in ******{0}.", match.Groups[1].Value);

// The example displays the following output:
//       Account ending in ******3333.
//       Account ending in ******1999.
Dim lazyPattern As String = "\b.*?([0-9]{4})\b"
Dim input2 As String = "1112223333 3992991999"
For Each match As Match In Regex.Matches(input2, lazypattern)
   Console.WriteLine("Account ending in ******{0}.", match.Groups(1).Value)
Next     
' The example displays the following output:
'       Account ending in ******3333.
'       Account ending in ******1999.

In most cases, regular expressions with greedy and lazy quantifiers return the same matches. They most commonly return different results when they are used with the wildcard (.) metacharacter, which matches any character.
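The following sketch illustrates both halves of that statement under assumed inputs: a word pattern for which the greedy and lazy forms agree, and a wildcard pattern for which they differ. The class name GreedyLazyDemo, the helper MatchAll, and both input strings are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] MatchAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Same matches: the word boundaries pin down where each match ends.
        string words = "lazy or greedy";
        Console.WriteLine(string.Join(" | ", MatchAll(words, @"\b\w+\b")));   // lazy | or | greedy
        Console.WriteLine(string.Join(" | ", MatchAll(words, @"\b\w+?\b")));  // lazy | or | greedy

        // Different matches: the wildcard lets the greedy form run to the last '>'.
        string markup = "<b>bold</b> and <i>italic</i>";
        Console.WriteLine(string.Join(" | ", MatchAll(markup, "<.+>")));      // <b>bold</b> and <i>italic</i>
        Console.WriteLine(string.Join(" | ", MatchAll(markup, "<.+?>")));     // <b> | </b> | <i> | </i>
    }
}
```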

Quantifiers and Empty Matches

The quantifiers *, +, and {n,m} and their lazy counterparts never repeat after an empty match when the minimum number of captures has been found. This rule prevents quantifiers from entering infinite loops on empty subexpression matches when the maximum number of possible group captures is infinite or near infinite.

For example, the following code shows the result of a call to the Regex.Match method with the regular expression pattern (a?)*, which matches zero or one "a" character zero or more times. Note that the single capturing group captures each "a" as well as String.Empty, but that there is no second empty match, because the first empty match causes the quantifier to stop repeating.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "(a?)*";
      string input = "aaabbb";
      Match match = Regex.Match(input, pattern);
      Console.WriteLine("Match: '{0}' at index {1}", 
                        match.Value, match.Index);
      if (match.Groups.Count > 1) {
         GroupCollection groups = match.Groups;
         for (int grpCtr = 1; grpCtr <= groups.Count - 1; grpCtr++) {
            Console.WriteLine("   Group {0}: '{1}' at index {2}", 
                              grpCtr, 
                              groups[grpCtr].Value,
                              groups[grpCtr].Index);
            int captureCtr = 0;
            foreach (Capture capture in groups[grpCtr].Captures) {
               captureCtr++;
               Console.WriteLine("      Capture {0}: '{1}' at index {2}", 
                                 captureCtr, capture.Value, capture.Index);
            }
         } 
      }   
   }
}
// The example displays the following output:
//       Match: 'aaa' at index 0
//          Group 1: '' at index 3
//             Capture 1: 'a' at index 0
//             Capture 2: 'a' at index 1
//             Capture 3: 'a' at index 2
//             Capture 4: '' at index 3
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(a?)*"
      Dim input As String = "aaabbb"
      Dim match As Match = Regex.Match(input, pattern)
      Console.WriteLine("Match: '{0}' at index {1}", 
                        match.Value, match.Index)
      If match.Groups.Count > 1 Then
         Dim groups As GroupCollection = match.Groups
         For grpCtr As Integer = 1 To groups.Count - 1
            Console.WriteLine("   Group {0}: '{1}' at index {2}", 
                              grpCtr, 
                              groups(grpCtr).Value,
                              groups(grpCtr).Index)
            Dim captureCtr As Integer = 0
            For Each capture As Capture In groups(grpCtr).Captures
               captureCtr += 1
               Console.WriteLine("      Capture {0}: '{1}' at index {2}", 
                                 captureCtr, capture.Value, capture.Index)
            Next
         Next 
      End If   
   End Sub
End Module
' The example displays the following output:
'       Match: 'aaa' at index 0
'          Group 1: '' at index 3
'             Capture 1: 'a' at index 0
'             Capture 2: 'a' at index 1
'             Capture 3: 'a' at index 2
'             Capture 4: '' at index 3

若要查看定义最小和最大捕获数的捕获组与定义固定捕获数的捕获组之间的实际差异,请考虑正则表达式模式 (a\1|(?(1)\1)){0,2}(a\1|(?(1)\1)){2}To see the practical difference between a capturing group that defines a minimum and a maximum number of captures and one that defines a fixed number of captures, consider the regular expression patterns (a\1|(?(1)\1)){0,2} and (a\1|(?(1)\1)){2}. 这两个正则表达式包含单个捕获组,其定义如下表所示。Both regular expressions consist of a single capturing group, which is defined as shown in the following table.

Pattern Description
(a\1 Either match "a" along with the value of the first captured group ...
|(?(1) or test whether the first captured group has been defined. (Note that the (?(1) construct does not define a capturing group.)
\1)) If the first captured group exists, match its value. If the group does not exist, the group matches String.Empty.

The first regular expression tries to match this pattern between zero and two times; the second, exactly two times. Because the first pattern reaches its minimum number of captures with its first capture of String.Empty, it never repeats to try to match a\1; the {0,2} quantifier allows only empty matches in the last iteration. In contrast, the second regular expression does match "a" because it evaluates a\1 a second time; the minimum number of iterations, 2, forces the engine to repeat after an empty match.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern, input;
       
      pattern = @"(a\1|(?(1)\1)){0,2}";
      input = "aaabbb"; 

      Console.WriteLine("Regex pattern: {0}", pattern);
      Match match = Regex.Match(input, pattern);
      Console.WriteLine("Match: '{0}' at position {1}.", 
                        match.Value, match.Index);
      if (match.Groups.Count > 1) {
         for (int groupCtr = 1; groupCtr <= match.Groups.Count - 1; groupCtr++)
         {
            Group group = match.Groups[groupCtr];         
            Console.WriteLine("   Group: {0}: '{1}' at position {2}.", 
                              groupCtr, group.Value, group.Index);
            int captureCtr = 0;
            foreach (Capture capture in group.Captures) {
               captureCtr++;
               Console.WriteLine("      Capture: {0}: '{1}' at position {2}.", 
                                 captureCtr, capture.Value, capture.Index);
            }   
         }
      }
      Console.WriteLine();

      pattern = @"(a\1|(?(1)\1)){2}";
      Console.WriteLine("Regex pattern: {0}", pattern);
      match = Regex.Match(input, pattern);
      Console.WriteLine("Matched '{0}' at position {1}.",
                        match.Value, match.Index);
      if (match.Groups.Count > 1) {
         for (int groupCtr = 1; groupCtr <= match.Groups.Count - 1; groupCtr++)
         {
            Group group = match.Groups[groupCtr];         
            Console.WriteLine("   Group: {0}: '{1}' at position {2}.", 
                              groupCtr, group.Value, group.Index);
            int captureCtr = 0;
            foreach (Capture capture in group.Captures) {
               captureCtr++;
               Console.WriteLine("      Capture: {0}: '{1}' at position {2}.", 
                                 captureCtr, capture.Value, capture.Index);
            }   
         }
      }
   }
}
// The example displays the following output:
//       Regex pattern: (a\1|(?(1)\1)){0,2}
//       Match: '' at position 0.
//          Group: 1: '' at position 0.
//             Capture: 1: '' at position 0.
//       
//       Regex pattern: (a\1|(?(1)\1)){2}
//       Matched 'a' at position 0.
//          Group: 1: 'a' at position 0.
//             Capture: 1: '' at position 0.
//             Capture: 2: 'a' at position 0.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern, input As String
       
      pattern = "(a\1|(?(1)\1)){0,2}"
      input = "aaabbb" 

      Console.WriteLine("Regex pattern: {0}", pattern)
      Dim match As Match = Regex.Match(input, pattern)
      Console.WriteLine("Match: '{0}' at position {1}.", 
                        match.Value, match.Index)
      If match.Groups.Count > 1 Then
         For groupCtr As Integer = 1 To match.Groups.Count - 1
            Dim group As Group = match.Groups(groupCtr)         
            Console.WriteLine("   Group: {0}: '{1}' at position {2}.", 
                              groupCtr, group.Value, group.Index)
            Dim captureCtr As Integer = 0
            For Each capture As Capture In group.Captures
               captureCtr += 1
               Console.WriteLine("      Capture: {0}: '{1}' at position {2}.", 
                                 captureCtr, capture.Value, capture.Index)
            Next   
         Next
      End If
      Console.WriteLine()

      pattern = "(a\1|(?(1)\1)){2}"
      Console.WriteLine("Regex pattern: {0}", pattern)
      match = Regex.Match(input, pattern)
      Console.WriteLine("Matched '{0}' at position {1}.",
                        match.Value, match.Index)
      If match.Groups.Count > 1 Then
         For groupCtr As Integer = 1 To match.Groups.Count - 1
            Dim group As Group = match.Groups(groupCtr)         
            Console.WriteLine("   Group: {0}: '{1}' at position {2}.", 
                              groupCtr, group.Value, group.Index)
            Dim captureCtr As Integer = 0
            For Each capture As Capture In group.Captures
               captureCtr += 1
               Console.WriteLine("      Capture: {0}: '{1}' at position {2}.", 
                                 captureCtr, capture.Value, capture.Index)
            Next   
         Next
      End If
   End Sub
End Module
' The example displays the following output:
'       Regex pattern: (a\1|(?(1)\1)){0,2}
'       Match: '' at position 0.
'          Group: 1: '' at position 0.
'             Capture: 1: '' at position 0.
'       
'       Regex pattern: (a\1|(?(1)\1)){2}
'       Matched 'a' at position 0.
'          Group: 1: 'a' at position 0.
'             Capture: 1: '' at position 0.
'             Capture: 2: 'a' at position 0.
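
The difference between the two quantifiers can also be read directly from the number of captures that group 1 records. The following is a minimal sketch, not part of the original example: the class name QuantifierCaptureSketch and the helper CaptureValues are hypothetical names introduced for illustration, and only Regex.Match, Group.Captures, and Capture.Value are actual .NET members. It assumes the engine behaves as shown in the output listed above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierCaptureSketch
{
    // Hypothetical helper (not part of .NET): returns the value of every
    // capture recorded by group 1 in the first match of the pattern.
    public static string[] CaptureValues(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        return match.Groups[1].Captures
                    .Cast<Capture>()
                    .Select(c => c.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "aaabbb";
        string[] patterns = { @"(a\1|(?(1)\1)){0,2}", @"(a\1|(?(1)\1)){2}" };

        foreach (string pattern in patterns)
        {
            string[] values = CaptureValues(pattern, input);
            // Quote each value so that empty captures stay visible.
            Console.WriteLine("{0}: {1} capture(s): {2}",
                              pattern, values.Length,
                              string.Join(", ", values.Select(v => "'" + v + "'")));
        }
    }
}
```

Consistent with the output listed above, the {0,2} pattern should report a single empty capture, while the {2} pattern should report two captures, an empty string followed by "a".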

See also