.NET Regular Expressions

Regular expressions provide a powerful, flexible, and efficient method for processing text. The extensive pattern-matching notation of regular expressions lets you quickly parse large amounts of text to find specific character patterns; validate text to ensure that it matches a predefined pattern (such as an email address); extract, edit, replace, or delete text substrings; and add the extracted strings to a collection to generate a report. For many applications that deal with strings or that parse large blocks of text, regular expressions are an indispensable tool.

How Regular Expressions Work

The centerpiece of text processing with regular expressions is the regular expression engine, which is represented by the System.Text.RegularExpressions.Regex object in .NET. At a minimum, processing text with regular expressions requires that the engine be given the following two items of information:

  • The regular expression pattern to identify in the text.

    In .NET, regular expression patterns are defined by a special syntax or language that is compatible with Perl 5 regular expressions and adds some further features, such as right-to-left matching. For more information, see Regular Expression Language - Quick Reference.

  • The text to parse for the regular expression pattern.
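
The two inputs (a pattern and the text to parse) and the right-to-left matching feature can be illustrated with a minimal sketch. The class name, method name, and sample text are hypothetical and chosen only for illustration; RegexOptions.RightToLeft is the option that asks the engine to scan from the end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RightToLeftSketch
{
    // Hypothetical helper: returns the first run of digits found when
    // scanning the input in the requested direction.
    public static string FirstNumber(string input, bool rightToLeft)
    {
        RegexOptions options = rightToLeft ? RegexOptions.RightToLeft
                                           : RegexOptions.None;
        Match match = Regex.Match(input, @"\d+", options);
        return match.Success ? match.Value : string.Empty;
    }

    public static void Main()
    {
        string input = "Order 17 shipped in 3 boxes";
        // Scanning left to right finds the first number in the text.
        Console.WriteLine(FirstNumber(input, rightToLeft: false));
        // Scanning right to left finds the last number in the text.
        Console.WriteLine(FirstNumber(input, rightToLeft: true));
    }
}
```

The same pattern yields "17" in one direction and "3" in the other, because only the scanning direction changes.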

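The methods of the Regex class let you perform the following operations:

  • Determine whether the regular expression pattern occurs in the input text by calling the Regex.IsMatch method.

  • Retrieve one or all occurrences of text that matches the regular expression pattern by calling the Regex.Match or Regex.Matches method.

  • Replace text that matches the regular expression pattern by calling the Regex.Replace method.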
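
As a rough illustration of what the Regex methods offer, the sketch below uses the static Regex.IsMatch, Regex.Matches, and Regex.Replace methods with one pattern. The pattern (a four-digit number), the helper names, and the sample sentence are assumptions made for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOperationsSketch
{
    // Illustrative pattern: a standalone four-digit number.
    private const string Pattern = @"\b\d{4}\b";

    // Test: does the pattern occur anywhere in the text?
    public static bool ContainsYear(string text) => Regex.IsMatch(text, Pattern);

    // Retrieve: collect every occurrence of the pattern.
    public static string[] FindYears(string text)
    {
        MatchCollection matches = Regex.Matches(text, Pattern);
        string[] years = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            years[i] = matches[i].Value;
        return years;
    }

    // Replace: substitute every occurrence with a fixed string.
    public static string MaskYears(string text) => Regex.Replace(text, Pattern, "####");

    public static void Main()
    {
        string text = "Released in 1999, revised in 2005.";
        Console.WriteLine(ContainsYear(text));
        Console.WriteLine(string.Join(", ", FindYears(text)));
        Console.WriteLine(MaskYears(text));
    }
}
```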

For an overview of the regular expression object model, see The Regular Expression Object Model.

For more information about the regular expression language, see Regular Expression Language - Quick Reference, or download and print one of these brochures:

Quick Reference in Word (.docx) format
Quick Reference in PDF (.pdf) format

Regular Expression Examples

The String class includes a number of string search and replacement methods that you can use when you want to locate literal strings in a larger string. Regular expressions are most useful either when you want to locate one of several substrings in a larger string, or when you want to identify patterns in a string, as the following examples illustrate.

Example 1: Replacing Substrings

Assume that a mailing list contains names that sometimes include a title (Mr., Mrs., Miss, or Ms.) along with a first and last name. If you do not want to include the titles when you generate envelope labels from the list, you can use a regular expression to remove the titles, as the following example illustrates.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "(Mr\\.? |Mrs\\.? |Miss |Ms\\.? )";
      string[] names = { "Mr. Henry Hunt", "Ms. Sara Samuels", 
                         "Abraham Adams", "Ms. Nicole Norris" };
      foreach (string name in names)
         Console.WriteLine(Regex.Replace(name, pattern, String.Empty));
   }
}
// The example displays the following output:
//    Henry Hunt
//    Sara Samuels
//    Abraham Adams
//    Nicole Norris

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(Mr\.? |Mrs\.? |Miss |Ms\.? )"
      Dim names() As String = { "Mr. Henry Hunt", "Ms. Sara Samuels", _
                                "Abraham Adams", "Ms. Nicole Norris" }
      For Each name As String In names
         Console.WriteLine(Regex.Replace(name, pattern, String.Empty))
      Next                                
   End Sub
End Module
' The example displays the following output:
'    Henry Hunt
'    Sara Samuels
'    Abraham Adams
'    Nicole Norris

The regular expression pattern (Mr\.? |Mrs\.? |Miss |Ms\.? ) matches any occurrence of "Mr ", "Mr. ", "Mrs ", "Mrs. ", "Miss ", "Ms ", or "Ms. ". The call to the Regex.Replace method replaces the matched string with String.Empty; in other words, it removes it from the original string.
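
The pattern in Example 1 is not anchored, so it removes a title wherever one occurs in the string. One possible variation, shown here only as a sketch and not as part of the original example, anchors the pattern with ^ so that only a leading title is removed. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleStripperSketch
{
    // ^ restricts the match to the start of the string; \s+ consumes the
    // white space that separates the title from the name.
    private static readonly Regex LeadingTitle =
        new Regex(@"^(Mr\.?|Mrs\.?|Miss|Ms\.?)\s+");

    // Hypothetical helper: removes a title only when it begins the string.
    public static string StripTitle(string name) =>
        LeadingTitle.Replace(name, string.Empty);

    public static void Main()
    {
        string[] names = { "Mr. Henry Hunt", "Mrs Jane Doe",
                           "Abraham Adams", "Dear Mr. Hunt" };
        foreach (string name in names)
            Console.WriteLine(StripTitle(name));
    }
}
```

With this variation, "Dear Mr. Hunt" is left unchanged because the title does not start the string, whereas the unanchored pattern would remove "Mr. " from it.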

Example 2: Identifying Duplicated Words

Accidentally duplicating words is a common error that writers make. A regular expression can be used to identify duplicated words, as the following example shows.

using System;
using System.Text.RegularExpressions;

public class Class1
{
   public static void Main()
   {
      string pattern = @"\b(\w+?)\s\1\b";
      string input = "This this is a nice day. What about this? This tastes good. I saw a a dog.";
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
         Console.WriteLine("{0} (duplicates '{1}') at position {2}", 
                           match.Value, match.Groups[1].Value, match.Index);
   }
}
// The example displays the following output:
//       This this (duplicates 'This') at position 0
//       a a (duplicates 'a') at position 66

Imports System.Text.RegularExpressions

Module modMain
   Public Sub Main()
      Dim pattern As String = "\b(\w+?)\s\1\b"
      Dim input As String = "This this is a nice day. What about this? This tastes good. I saw a a dog."
      For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
         Console.WriteLine("{0} (duplicates '{1}') at position {2}", _
                           match.Value, match.Groups(1).Value, match.Index)
      Next
   End Sub
End Module
' The example displays the following output:
'       This this (duplicates 'This') at position 0
'       a a (duplicates 'a') at position 66

The regular expression pattern \b(\w+?)\s\1\b can be interpreted as follows:

\b  Start at a word boundary.
(\w+?)  Match one or more word characters, but as few characters as possible. Together, they form a group that can be referred to as \1.
\s  Match a white-space character.
\1  Match a substring that is equal to the text captured by the first group.
\b  Match a word boundary.

The Regex.Matches method is called with regular expression options set to RegexOptions.IgnoreCase. Therefore, the match operation is case-insensitive, and the example identifies the substring "This this" as a duplication.

Note that the input string includes the substring "this? This". However, because of the intervening punctuation mark, it is not identified as a duplication.
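
If duplicates separated by punctuation should also be reported, one option is to replace \s with \W+, which accepts any run of non-word characters between the two words. The following is a sketch of that variation under this assumption, not the pattern used in Example 2; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DuplicateWordSketch
{
    // Hypothetical helper: returns each duplicated-word span, allowing
    // punctuation and white space (\W+) between the two occurrences.
    public static string[] FindDuplicates(string input)
    {
        MatchCollection matches = Regex.Matches(
            input, @"\b(\w+)\W+\1\b", RegexOptions.IgnoreCase);
        string[] results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            results[i] = matches[i].Value;
        return results;
    }

    public static void Main()
    {
        string input = "This this is a nice day. What about this? This tastes good. I saw a a dog.";
        foreach (string duplicate in FindDuplicates(input))
            Console.WriteLine(duplicate);
    }
}
```

For the input string from Example 2, this variation also reports "this? This". Whether that counts as an error depends on the text, because a word that ends one sentence can legitimately begin the next.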

Example 3: Dynamically Building a Culture-Sensitive Regular Expression

The following example illustrates the power of regular expressions combined with the flexibility offered by .NET's globalization features. It uses the NumberFormatInfo object to determine the format of currency values in the system's current culture. It then uses that information to dynamically construct a regular expression that extracts currency values from the text. For each match, it extracts the subgroup that contains only the numeric string, converts it to a Decimal value, and calculates a running total.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      // Define text to be parsed.
      string input = "Office expenses on 2/13/2008:\n" + 
                     "Paper (500 sheets)                      $3.95\n" + 
                     "Pencils (box of 10)                     $1.00\n" + 
                     "Pens (box of 10)                        $4.49\n" + 
                     "Erasers                                 $2.19\n" + 
                     "Ink jet printer                        $69.95\n\n" + 
                     "Total Expenses                        $ 81.58\n"; 
      
      // Get current culture's NumberFormatInfo object.
      NumberFormatInfo nfi = CultureInfo.CurrentCulture.NumberFormat;
      // Assign needed property values to variables.
      string currencySymbol = nfi.CurrencySymbol;
      bool symbolPrecedesIfPositive = nfi.CurrencyPositivePattern % 2 == 0;
      string groupSeparator = nfi.CurrencyGroupSeparator;
      string decimalSeparator = nfi.CurrencyDecimalSeparator;

      // Form regular expression pattern. Every culture-supplied string is
      // escaped so that it is matched literally, not as a metacharacter.
      string pattern = Regex.Escape( symbolPrecedesIfPositive ? currencySymbol : "") + 
                       @"\s*[-+]?" + "([0-9]{0,3}(" + Regex.Escape(groupSeparator) + "[0-9]{3})*(" + 
                       Regex.Escape(decimalSeparator) + "[0-9]+)?)" + 
                       Regex.Escape(! symbolPrecedesIfPositive ? currencySymbol : ""); 
      Console.WriteLine( "The regular expression pattern is:");
      Console.WriteLine("   " + pattern);      

      // Get text that matches regular expression pattern.
      MatchCollection matches = Regex.Matches(input, pattern, 
                                              RegexOptions.IgnorePatternWhitespace);               
      Console.WriteLine("Found {0} matches.", matches.Count); 

      // Get numeric string, convert it to a value, and add it to List object.
      List<decimal> expenses = new List<Decimal>();
                     
      foreach (Match match in matches)
         expenses.Add(Decimal.Parse(match.Groups[1].Value));      

      // Determine whether total is present and if present, whether it is correct.
      decimal total = 0;
      foreach (decimal value in expenses)
         total += value;
      
      if (total / 2 == expenses[expenses.Count - 1]) 
         Console.WriteLine("The expenses total {0:C2}.", expenses[expenses.Count - 1]);
      else
         Console.WriteLine("The expenses total {0:C2}.", total);
   }  
}
// The example displays the following output:
//       The regular expression pattern is:
//          \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)
//       Found 6 matches.
//       The expenses total $81.58.

Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Public Module Example
   Public Sub Main()
      ' Define text to be parsed.
      Dim input As String = "Office expenses on 2/13/2008:" + vbCrLf + _
                            "Paper (500 sheets)                      $3.95" + vbCrLf + _
                            "Pencils (box of 10)                     $1.00" + vbCrLf + _
                            "Pens (box of 10)                        $4.49" + vbCrLf + _
                            "Erasers                                 $2.19" + vbCrLf + _
                            "Ink jet printer                        $69.95" + vbCrLf + vbCrLf + _
                            "Total Expenses                        $ 81.58" + vbCrLf
      ' Get current culture's NumberFormatInfo object.
      Dim nfi As NumberFormatInfo = CultureInfo.CurrentCulture.NumberFormat
      ' Assign needed property values to variables.
      Dim currencySymbol As String = nfi.CurrencySymbol
      Dim symbolPrecedesIfPositive As Boolean = CBool(nfi.CurrencyPositivePattern Mod 2 = 0)
      Dim groupSeparator As String = nfi.CurrencyGroupSeparator
      Dim decimalSeparator As String = nfi.CurrencyDecimalSeparator

      ' Form regular expression pattern. Every culture-supplied string is
      ' escaped so that it is matched literally, not as a metacharacter.
      Dim pattern As String = Regex.Escape(CStr(IIf(symbolPrecedesIfPositive, currencySymbol, ""))) + _
                              "\s*[-+]?" + "([0-9]{0,3}(" + Regex.Escape(groupSeparator) + "[0-9]{3})*(" + _
                              Regex.Escape(decimalSeparator) + "[0-9]+)?)" + _
                              Regex.Escape(CStr(IIf(Not symbolPrecedesIfPositive, currencySymbol, ""))) 
      Console.WriteLine("The regular expression pattern is: ")
      Console.WriteLine("   " + pattern)      

      ' Get text that matches regular expression pattern.
      Dim matches As MatchCollection = Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace)               
      Console.WriteLine("Found {0} matches. ", matches.Count)

      ' Get numeric string, convert it to a value, and add it to List object.
      Dim expenses As New List(Of Decimal)
                     
      For Each match As Match In matches
         expenses.Add(Decimal.Parse(match.Groups.Item(1).Value))      
      Next

      ' Determine whether total is present and if present, whether it is correct.
      Dim total As Decimal
      For Each value As Decimal In expenses
         total += value
      Next
      
      If total / 2 = expenses(expenses.Count - 1) Then
         Console.WriteLine("The expenses total {0:C2}.", expenses(expenses.Count - 1))
      Else
         Console.WriteLine("The expenses total {0:C2}.", total)
      End If   
   End Sub
End Module
' The example displays the following output:
'       The regular expression pattern is:
'          \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)
'       Found 6 matches.
'       The expenses total $81.58.

On a computer whose current culture is English - United States (en-US), the example dynamically builds the regular expression \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?). This regular expression pattern can be interpreted as follows:

\$  Look for a single occurrence of the dollar symbol ($) in the input string. The regular expression pattern string includes a backslash to indicate that the dollar symbol is to be interpreted literally rather than as a regular expression anchor. (The $ symbol alone is an anchor that matches at the end of the string or before a final newline.) To ensure that the current culture's currency symbol is not misinterpreted as a regular expression symbol, the example calls the Regex.Escape method to escape the character.
\s*  Look for zero or more occurrences of a white-space character.
[-+]?  Look for zero or one occurrence of either a positive sign or a negative sign.
([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)  The outer parentheses around this expression define it as a capturing group or a subexpression. If a match is found, information about this part of the matching string can be retrieved from the second Group object in the GroupCollection object returned by the Match.Groups property. (The first element in the collection represents the entire match.)
[0-9]{0,3}  Look for zero to three occurrences of the decimal digits 0 through 9.
(,[0-9]{3})*  Look for zero or more occurrences of a group separator followed by three decimal digits.
\.  Look for a single occurrence of the decimal separator.
[0-9]+  Look for one or more decimal digits.
(\.[0-9]+)?  Look for zero or one occurrence of the decimal separator followed by at least one decimal digit.

If each of these subpatterns is found in the input string, the match succeeds, and a Match object that contains information about the match is added to the MatchCollection object.
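
To see how the pattern changes with the culture's settings, the sketch below builds it from an explicitly constructed NumberFormatInfo instead of the current culture, so the result does not depend on the machine's regional settings. The BuildPattern helper, the optional \s* before a trailing symbol, and the sample format values are assumptions made for illustration; they are not taken from Example 3.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CurrencyPatternSketch
{
    // Hypothetical helper: builds a currency pattern from any NumberFormatInfo.
    // Each culture-supplied string is escaped before it enters the pattern.
    public static string BuildPattern(NumberFormatInfo nfi)
    {
        string symbol = Regex.Escape(nfi.CurrencySymbol);
        string group = Regex.Escape(nfi.CurrencyGroupSeparator);
        string dec = Regex.Escape(nfi.CurrencyDecimalSeparator);
        // Positive patterns 0 and 2 place the symbol before the number.
        bool symbolFirst = nfi.CurrencyPositivePattern % 2 == 0;

        string number = "([0-9]{0,3}(" + group + "[0-9]{3})*(" + dec + "[0-9]+)?)";
        return symbolFirst
            ? symbol + @"\s*[-+]?" + number
            : @"[-+]?" + number + @"\s*" + symbol;
    }

    public static void Main()
    {
        // Assumed sample settings: symbol after the number, "." as the
        // group separator, "," as the decimal separator.
        var euroStyle = new NumberFormatInfo
        {
            CurrencySymbol = "€",
            CurrencyGroupSeparator = ".",
            CurrencyDecimalSeparator = ",",
            CurrencyPositivePattern = 3
        };

        string pattern = BuildPattern(euroStyle);
        Console.WriteLine(pattern);

        Match match = Regex.Match("Total: 1.234,56 €", pattern);
        Console.WriteLine(match.Groups[1].Value);
    }
}
```

Escaping matters most for the group separator here: an unescaped "." would match any character, so the pattern would accept text that is not a valid amount.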

Title                                           Description
Regular Expression Language - Quick Reference   Provides information on the set of characters, operators, and constructs that you can use to define regular expressions.
The Regular Expression Object Model             Provides information and code examples that illustrate how to use the regular expression classes.
Details of Regular Expression Behavior          Provides information about the capabilities and behavior of .NET regular expressions.
Regular Expression Examples                     Provides code examples that illustrate typical uses of regular expressions.

Reference

System.Text.RegularExpressions
System.Text.RegularExpressions.Regex
Regular Expressions - Quick Reference (download in Word format)
Regular Expressions - Quick Reference (download in PDF format)