.NET Regular Expressions

Regular expressions provide a powerful, flexible, and efficient method for processing text. The extensive pattern-matching notation of regular expressions enables you to quickly parse large amounts of text to find specific character patterns; to validate text to ensure that it matches a predefined pattern (such as an email address); to extract, edit, replace, or delete text substrings; and to add the extracted strings to a collection in order to generate a report. For many applications that deal with strings or that parse large blocks of text, regular expressions are an indispensable tool.

How Regular Expressions Work

The centerpiece of text processing with regular expressions is the regular expression engine, which is represented by the System.Text.RegularExpressions.Regex object in .NET. At a minimum, processing text using regular expressions requires that the regular expression engine be provided with the following two items of information:

  • The regular expression pattern to identify in the text.

    In .NET, regular expression patterns are defined by a special syntax or language, which is compatible with Perl 5 regular expressions and adds some additional features such as right-to-left matching. For more information, see Regular Expression Language - Quick Reference.

  • The text to parse for the regular expression pattern.
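As a small illustration of the right-to-left matching mentioned above, the following sketch runs the same pattern in both directions. The class name, method name, and sample string are made up for this illustration; RegexOptions.RightToLeft is the option that reverses the search direction.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RightToLeftSketch
{
   // Returns the first run of digits found when searching with the given options.
   public static string FirstDigits(string input, RegexOptions options)
   {
      return Regex.Match(input, @"\d+", options).Value;
   }

   public static void Main()
   {
      string input = "abc123def456";   // hypothetical sample text

      // Default direction: the search starts at the beginning of the string.
      Console.WriteLine(FirstDigits(input, RegexOptions.None));         // 123

      // Right-to-left: the search starts at the end of the string.
      Console.WriteLine(FirstDigits(input, RegexOptions.RightToLeft));  // 456
   }
}
```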

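The methods of the Regex class let you perform the following operations:

  • Determine whether the regular expression pattern occurs in the input text by calling the Regex.IsMatch method.

  • Retrieve one or all occurrences of text that matches the regular expression pattern by calling the Regex.Match or Regex.Matches method.

  • Replace text that matches the regular expression pattern by calling the Regex.Replace method.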
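The operations that the Regex class supports can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the pattern, the sample text, and the class name are assumptions chosen for the sketch, while Regex.IsMatch, Regex.Match, Regex.Matches, and Regex.Replace are the static methods of the Regex class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOperationsSketch
{
   // Hypothetical pattern: a run of digits, optionally followed by a decimal part.
   public const string Pattern = @"\d+(\.\d+)?";

   public static void Main()
   {
      string input = "Paper 3.95, pencils 1.00, pens 4";   // made-up sample text

      // Regex.IsMatch: does the pattern occur anywhere in the input?
      Console.WriteLine(Regex.IsMatch(input, Pattern));        // True

      // Regex.Match: retrieve the first occurrence.
      Console.WriteLine(Regex.Match(input, Pattern).Value);    // 3.95

      // Regex.Matches: retrieve every occurrence.
      foreach (Match m in Regex.Matches(input, Pattern))
         Console.WriteLine("{0} at position {1}", m.Value, m.Index);

      // Regex.Replace: replace each occurrence with a placeholder.
      Console.WriteLine(Regex.Replace(input, Pattern, "#"));   // Paper #, pencils #, pens #
   }
}
```

Each call here passes both required items of information: the pattern and the text to parse.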

For an overview of the regular expression object model, see The Regular Expression Object Model.

For more information about the regular expression language, see Regular Expression Language - Quick Reference or download and print one of these brochures:

Quick Reference in Word (.docx) format
Quick Reference in PDF (.pdf) format

Regular Expression Examples

The String class includes a number of string search and replacement methods that you can use when you want to locate literal strings in a larger string. Regular expressions are most useful either when you want to locate one of several substrings in a larger string, or when you want to identify patterns in a string, as the following examples illustrate.
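As a quick contrast before the full examples, the sketch below puts the two approaches side by side. The class name, method names, and sample strings are hypothetical; the sketch assumes that String.Contains stands in for the literal search methods and that an alternation pattern and a date-like pattern stand in for the two regular expression use cases.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralVersusPatternSketch
{
   // String methods: one call per literal substring.
   public static bool ContainsTitleLiteral(string input)
   {
      return input.Contains("Mr.") || input.Contains("Mrs.") || input.Contains("Ms.");
   }

   // Regular expression: a single pattern covers all the alternatives.
   public static bool ContainsTitlePattern(string input)
   {
      return Regex.IsMatch(input, @"Mr\.|Mrs\.|Ms\.");
   }

   // Regular expression: describe the shape of the text instead of a literal value.
   public static string FindDate(string input)
   {
      return Regex.Match(input, @"\b\d{1,2}/\d{1,2}/\d{4}\b").Value;
   }

   public static void Main()
   {
      Console.WriteLine(ContainsTitleLiteral("Ms. Sara Samuels"));      // True
      Console.WriteLine(ContainsTitlePattern("Ms. Sara Samuels"));      // True
      Console.WriteLine(ContainsTitlePattern("Abraham Adams"));         // False
      Console.WriteLine(FindDate("Office expenses on 2/13/2008:"));     // 2/13/2008
   }
}
```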

Example 1: Replacing Substrings

Assume that a mailing list contains names that sometimes include a title (Mr., Mrs., Miss, or Ms.) along with a first and last name. If you do not want to include the titles when you generate envelope labels from the list, you can use a regular expression to remove the titles, as the following example illustrates.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = "(Mr\\.? |Mrs\\.? |Miss |Ms\\.? )";
      string[] names = { "Mr. Henry Hunt", "Ms. Sara Samuels", 
                         "Abraham Adams", "Ms. Nicole Norris" };
      foreach (string name in names)
         Console.WriteLine(Regex.Replace(name, pattern, String.Empty));
   }
}
// The example displays the following output:
//    Henry Hunt
//    Sara Samuels
//    Abraham Adams
//    Nicole Norris

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(Mr\.? |Mrs\.? |Miss |Ms\.? )"
      Dim names() As String = { "Mr. Henry Hunt", "Ms. Sara Samuels", _
                                "Abraham Adams", "Ms. Nicole Norris" }
      For Each name As String In names
         Console.WriteLine(Regex.Replace(name, pattern, String.Empty))
      Next                                
   End Sub
End Module
' The example displays the following output:
'    Henry Hunt
'    Sara Samuels
'    Abraham Adams
'    Nicole Norris

The regular expression pattern (Mr\.? |Mrs\.? |Miss |Ms\.? ) matches any occurrence of "Mr ", "Mr. ", "Mrs ", "Mrs. ", "Miss ", "Ms ", or "Ms. ". The call to the Regex.Replace method replaces the matched string with String.Empty; in other words, it removes it from the original string.
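Because the pattern matches a title anywhere in the string, it also removes titles that appear later in a name. If only a leading title should be stripped, one possible variant anchors the pattern to the start of the string. The sketch below is an assumption-laden illustration, not part of the original example: the LeadingPattern constant, the class name, and the sample name are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingTitleSketch
{
   // Pattern from the example: matches a title anywhere in the string.
   public const string AnywherePattern = "(Mr\\.? |Mrs\\.? |Miss |Ms\\.? )";

   // Assumed variant: ^ anchors the match to the start of the string,
   // and (?: ) groups the alternatives without capturing them.
   public const string LeadingPattern = @"^(?:Mr\.?|Mrs\.?|Miss|Ms\.?)\s+";

   public static string Strip(string name, string pattern)
   {
      return Regex.Replace(name, pattern, String.Empty);
   }

   public static void Main()
   {
      string name = "Mr. Henry Hunt and Mrs. Hunt";   // hypothetical label text

      Console.WriteLine(Strip(name, AnywherePattern));   // Henry Hunt and Hunt
      Console.WriteLine(Strip(name, LeadingPattern));    // Henry Hunt and Mrs. Hunt
   }
}
```

Which behavior is preferable depends on the data; for a list that holds exactly one person per entry, the two patterns produce the same labels.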

Example 2: Identifying Duplicated Words

Accidentally duplicating words is a common error that writers make. A regular expression can be used to identify duplicated words, as the following example shows.

using System;
using System.Text.RegularExpressions;

public class Class1
{
   public static void Main()
   {
      string pattern = @"\b(\w+?)\s\1\b";
      string input = "This this is a nice day. What about this? This tastes good. I saw a a dog.";
      foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
         Console.WriteLine("{0} (duplicates '{1}') at position {2}", 
                           match.Value, match.Groups[1].Value, match.Index);
   }
}
// The example displays the following output:
//       This this (duplicates 'This') at position 0
//       a a (duplicates 'a') at position 66

Imports System.Text.RegularExpressions

Module modMain
   Public Sub Main()
      Dim pattern As String = "\b(\w+?)\s\1\b"
      Dim input As String = "This this is a nice day. What about this? This tastes good. I saw a a dog."
      For Each match As Match In Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
         Console.WriteLine("{0} (duplicates '{1}') at position {2}", _
                           match.Value, match.Groups(1).Value, match.Index)
      Next
   End Sub
End Module
' The example displays the following output:
'       This this (duplicates 'This') at position 0
'       a a (duplicates 'a') at position 66

The regular expression pattern \b(\w+?)\s\1\b can be interpreted as follows:

Pattern    Description
\b         Start at a word boundary.
(\w+?)     Match one or more word characters, but as few characters as possible. Together, they form a group that can be referred to as \1.
\s         Match a white-space character.
\1         Match the substring that is equal to the text captured by the group \1.
\b         Match a word boundary.

The Regex.Matches method is called with regular expression options set to RegexOptions.IgnoreCase. Therefore, the match operation is case-insensitive, and the example identifies the substring "This this" as a duplication.

Note that the input string includes the substring "this? This". However, because of the intervening punctuation mark, it is not identified as a duplication.
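The effect of the IgnoreCase option and of the intervening punctuation can be checked by counting matches. The sketch below reuses the input string and pattern from the example; the TolerantPattern constant, which allows any run of non-word characters between the two words, is a hypothetical variant introduced only for comparison, as are the class and method names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DuplicateWordSketch
{
   public const string Input =
      "This this is a nice day. What about this? This tastes good. I saw a a dog.";

   // Pattern from the example: exactly one white-space character between the words.
   public const string ExamplePattern = @"\b(\w+?)\s\1\b";

   // Assumed variant: \W+ allows punctuation and white space between the words.
   public const string TolerantPattern = @"\b(\w+)\W+\1\b";

   public static int Count(string pattern, RegexOptions options)
   {
      return Regex.Matches(Input, pattern, options).Count;
   }

   public static void Main()
   {
      // Case-sensitive: only "a a" is found.
      Console.WriteLine(Count(ExamplePattern, RegexOptions.None));         // 1

      // Case-insensitive: "This this" is found as well.
      Console.WriteLine(Count(ExamplePattern, RegexOptions.IgnoreCase));   // 2

      // The tolerant variant also reports "this? This".
      Console.WriteLine(Count(TolerantPattern, RegexOptions.IgnoreCase));  // 3
   }
}
```

The tolerant variant also flags repetitions that span a sentence boundary, which are often intentional, so the stricter pattern from the example is usually the safer default.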

Example 3: Dynamically Building a Culture-Sensitive Regular Expression

The following example illustrates the power of regular expressions combined with the flexibility offered by .NET's globalization features. It uses the NumberFormatInfo object to determine the format of currency values in the system's current culture. It then uses that information to dynamically construct a regular expression that extracts currency values from the text. For each match, it extracts the subgroup that contains the numeric string only, converts it to a Decimal value, and calculates a running total.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      // Define text to be parsed.
      string input = "Office expenses on 2/13/2008:\n" + 
                     "Paper (500 sheets)                      $3.95\n" + 
                     "Pencils (box of 10)                     $1.00\n" + 
                     "Pens (box of 10)                        $4.49\n" + 
                     "Erasers                                 $2.19\n" + 
                     "Ink jet printer                        $69.95\n\n" + 
                     "Total Expenses                        $ 81.58\n"; 
      
      // Get current culture's NumberFormatInfo object.
      NumberFormatInfo nfi = CultureInfo.CurrentCulture.NumberFormat;
      // Assign needed property values to variables.
      string currencySymbol = nfi.CurrencySymbol;
      bool symbolPrecedesIfPositive = nfi.CurrencyPositivePattern % 2 == 0;
      string groupSeparator = nfi.CurrencyGroupSeparator;
      string decimalSeparator = nfi.CurrencyDecimalSeparator;

      // Form regular expression pattern.
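      // Escape every culture-supplied string so that characters such as "." or "$"
      // are matched literally rather than interpreted as regular expression syntax.
      string pattern = Regex.Escape( symbolPrecedesIfPositive ? currencySymbol : "") + 
                       @"\s*[-+]?" + "([0-9]{0,3}(" + Regex.Escape(groupSeparator) + "[0-9]{3})*(" + 
                       Regex.Escape(decimalSeparator) + "[0-9]+)?)" + 
                       Regex.Escape(! symbolPrecedesIfPositive ? currencySymbol : ""); 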
      Console.WriteLine( "The regular expression pattern is:");
      Console.WriteLine("   " + pattern);      

      // Get text that matches regular expression pattern.
      MatchCollection matches = Regex.Matches(input, pattern, 
                                              RegexOptions.IgnorePatternWhitespace);               
      Console.WriteLine("Found {0} matches.", matches.Count); 

      // Get numeric string, convert it to a value, and add it to List object.
      List<decimal> expenses = new List<Decimal>();
                     
      foreach (Match match in matches)
         expenses.Add(Decimal.Parse(match.Groups[1].Value));      

      // Determine whether total is present and if present, whether it is correct.
      decimal total = 0;
      foreach (decimal value in expenses)
         total += value;
      
      if (total / 2 == expenses[expenses.Count - 1]) 
         Console.WriteLine("The expenses total {0:C2}.", expenses[expenses.Count - 1]);
      else
         Console.WriteLine("The expenses total {0:C2}.", total);
   }  
}
// The example displays the following output:
//       The regular expression pattern is:
//          \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)
//       Found 6 matches.
//       The expenses total $81.58.

Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Public Module Example
   Public Sub Main()
      ' Define text to be parsed.
      Dim input As String = "Office expenses on 2/13/2008:" + vbCrLf + _
                            "Paper (500 sheets)                      $3.95" + vbCrLf + _
                            "Pencils (box of 10)                     $1.00" + vbCrLf + _
                            "Pens (box of 10)                        $4.49" + vbCrLf + _
                            "Erasers                                 $2.19" + vbCrLf + _
                            "Ink jet printer                        $69.95" + vbCrLf + vbCrLf + _
                            "Total Expenses                        $ 81.58" + vbCrLf
      ' Get current culture's NumberFormatInfo object.
      Dim nfi As NumberFormatInfo = CultureInfo.CurrentCulture.NumberFormat
      ' Assign needed property values to variables.
      Dim currencySymbol As String = nfi.CurrencySymbol
      Dim symbolPrecedesIfPositive As Boolean = CBool(nfi.CurrencyPositivePattern Mod 2 = 0)
      Dim groupSeparator As String = nfi.CurrencyGroupSeparator
      Dim decimalSeparator As String = nfi.CurrencyDecimalSeparator

      ' Form regular expression pattern.
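      ' Escape every culture-supplied string so that characters such as "." or "$"
      ' are matched literally rather than interpreted as regular expression syntax.
      Dim pattern As String = Regex.Escape(CStr(IIf(symbolPrecedesIfPositive, currencySymbol, ""))) + _
                              "\s*[-+]?" + "([0-9]{0,3}(" + Regex.Escape(groupSeparator) + "[0-9]{3})*(" + _
                              Regex.Escape(decimalSeparator) + "[0-9]+)?)" + _
                              Regex.Escape(CStr(IIf(Not symbolPrecedesIfPositive, currencySymbol, ""))) 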
      Console.WriteLine("The regular expression pattern is: ")
      Console.WriteLine("   " + pattern)      

      ' Get text that matches regular expression pattern.
      Dim matches As MatchCollection = Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace)               
      Console.WriteLine("Found {0} matches. ", matches.Count)

      ' Get numeric string, convert it to a value, and add it to List object.
      Dim expenses As New List(Of Decimal)
                     
      For Each match As Match In matches
         expenses.Add(Decimal.Parse(match.Groups.Item(1).Value))      
      Next

      ' Determine whether total is present and if present, whether it is correct.
      Dim total As Decimal
      For Each value As Decimal In expenses
         total += value
      Next
      
      If total / 2 = expenses(expenses.Count - 1) Then
         Console.WriteLine("The expenses total {0:C2}.", expenses(expenses.Count - 1))
      Else
         Console.WriteLine("The expenses total {0:C2}.", total)
      End If   
   End Sub
End Module
' The example displays the following output:
'       The regular expression pattern is:
'          \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)
'       Found 6 matches.
'       The expenses total $81.58.

On a computer whose current culture is English - United States (en-US), the example dynamically builds the regular expression \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?). This regular expression pattern can be interpreted as follows:

Pattern                                 Description
\$                                      Look for a single occurrence of the dollar symbol ($) in the input string. The regular expression pattern string includes a backslash to indicate that the dollar symbol is to be interpreted literally rather than as a regular expression anchor. (The $ symbol alone would indicate that the regular expression engine should anchor the match at the end of the string.) To ensure that the current culture's currency symbol is not misinterpreted as a regular expression symbol, the example calls the Regex.Escape method to escape the character.
\s*                                     Look for zero or more occurrences of a white-space character.
[-+]?                                   Look for zero or one occurrence of either a positive sign or a negative sign.
([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)     The outer parentheses around this expression define it as a capturing group or a subexpression. If a match is found, information about this part of the matching string can be retrieved from the second Group object in the GroupCollection object returned by the Match.Groups property. (The first element in the collection represents the entire match.)
[0-9]{0,3}                              Look for zero to three occurrences of the decimal digits 0 through 9.
(,[0-9]{3})*                            Look for zero or more occurrences of a group separator followed by three decimal digits.
\.                                      Look for a single occurrence of the decimal separator.
[0-9]+                                  Look for one or more decimal digits.
(\.[0-9]+)?                             Look for zero or one occurrence of the decimal separator followed by at least one decimal digit.

If each of these subpatterns is found in the input string, the match succeeds, and a Match object that contains information about the match is added to the MatchCollection object.
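To see how the pattern changes with the currency format, and how the entire match differs from the numeric subgroup, the pattern-building step can be tried against explicit NumberFormatInfo objects instead of the current culture. The following is a sketch under stated assumptions, not the example's own code: the class and method names are hypothetical, the two formats are constructed by hand so that the result does not depend on the culture data installed on the machine, and the optional white space before a trailing currency symbol is an addition made for this sketch.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CurrencyPatternSketch
{
   // Builds a currency pattern from an explicit NumberFormatInfo object.
   public static string BuildPattern(NumberFormatInfo nfi)
   {
      // CurrencyPositivePattern values 0 and 2 place the symbol before the number.
      bool symbolPrecedes = nfi.CurrencyPositivePattern % 2 == 0;
      string symbol = Regex.Escape(nfi.CurrencySymbol);
      string number = "([0-9]{0,3}(" + Regex.Escape(nfi.CurrencyGroupSeparator) +
                      "[0-9]{3})*(" + Regex.Escape(nfi.CurrencyDecimalSeparator) + "[0-9]+)?)";

      // Assumption of this sketch: allow white space before a trailing symbol.
      return symbolPrecedes ? symbol + @"\s*[-+]?" + number
                            : "[-+]?" + number + @"\s*" + symbol;
   }

   private static void Show(NumberFormatInfo nfi, string input)
   {
      string pattern = BuildPattern(nfi);
      Match match = Regex.Match(input, pattern);
      Console.WriteLine("Pattern: " + pattern);
      // Groups[0] is the entire match; Groups[1] is the numeric string only.
      Console.WriteLine("   Entire match: '{0}'", match.Groups[0].Value);
      Console.WriteLine("   Numeric part: '{0}'", match.Groups[1].Value);
   }

   public static void Main()
   {
      // Symbol first, "," as the group separator, "." as the decimal separator.
      NumberFormatInfo dollars = new NumberFormatInfo { CurrencySymbol = "$" };

      // Symbol last, "." as the group separator, "," as the decimal separator.
      NumberFormatInfo euros = new NumberFormatInfo
      {
         CurrencySymbol = "€",
         CurrencyPositivePattern = 3,
         CurrencyGroupSeparator = ".",
         CurrencyDecimalSeparator = ","
      };

      Show(dollars, "Printer $1,069.95");
      Show(euros, "Drucker 1.069,95 €");
   }
}
```

In the second format the group separator is a period, so calling Regex.Escape on it matters: an unescaped period would match any character rather than the separator itself.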

Title                                            Description
Regular Expression Language - Quick Reference    Provides information on the set of characters, operators, and constructs that you can use to define regular expressions.
The Regular Expression Object Model              Provides information and code examples that illustrate how to use the regular expression classes.
Details of Regular Expression Behavior           Provides information about the capabilities and behavior of .NET regular expressions.
Regular Expression Examples                      Provides code examples that illustrate typical uses of regular expressions.

Reference

System.Text.RegularExpressions
System.Text.RegularExpressions.Regex
Regular Expressions - Quick Reference (download in Word format)
Regular Expressions - Quick Reference (download in PDF format)