Character classes in regular expressions

A character class defines a set of characters, any one of which can occur in an input string for a match to succeed. The regular expression language in .NET supports the following character classes:

  • Positive character groups. A character in the input string must match one of a specified set of characters. For more information, see Positive Character Group.

  • Negative character groups. A character in the input string must not match one of a specified set of characters. For more information, see Negative Character Group.

  • Any character. The . (dot or period) character in a regular expression is a wildcard character that matches any character except \n. For more information, see Any Character.

  • A general Unicode category or named block. A character in the input string must be a member of a particular Unicode category or must fall within a contiguous range of Unicode characters for a match to succeed. For more information, see Unicode Category or Unicode Block.

  • A negative general Unicode category or named block. A character in the input string must not be a member of a particular Unicode category or must not fall within a contiguous range of Unicode characters for a match to succeed. For more information, see Negative Unicode Category or Unicode Block.

  • A word character. A character in the input string can belong to any of the Unicode categories that are appropriate for characters in words. For more information, see Word Character.

  • A non-word character. A character in the input string can belong to any Unicode category that is not a word character. For more information, see Non-Word Character.

  • A white-space character. A character in the input string can be any Unicode separator character, as well as any one of a number of control characters. For more information, see White-Space Character.

  • A non-white-space character. A character in the input string can be any character that is not a white-space character. For more information, see Non-White-Space Character.

  • A decimal digit. A character in the input string can be any of a number of characters classified as Unicode decimal digits. For more information, see Decimal Digit Character.

  • A non-decimal digit. A character in the input string can be anything other than a Unicode decimal digit. For more information, see Decimal Digit Character.

.NET supports character class subtraction expressions, which enable you to define a set of characters as the result of excluding one character class from another character class. For more information, see Character Class Subtraction.

Note

Character classes that match characters by category, such as \w to match word characters or \p{} to match a Unicode category, rely on the CharUnicodeInfo class to provide information about character categories. Starting with the .NET Framework 4.6.2, character categories are based on The Unicode Standard, Version 8.0.0. In the .NET Framework 4 through the .NET Framework 4.6.1, they are based on The Unicode Standard, Version 6.3.0.

Positive character group: [ ]

A positive character group specifies a list of characters, any one of which may appear in an input string for a match to occur. This list of characters may be specified individually, as a range, or both.

The syntax for specifying a list of individual characters is as follows:

[character_group]

where character_group is a list of the individual characters that can appear in the input string for a match to succeed. character_group can consist of any combination of one or more literal characters, escape characters, or character classes.

The syntax for specifying a range of characters is as follows:

[firstCharacter-lastCharacter]

where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point.

Note

Because a positive character group can include both a set of characters and a character range, a hyphen character (-) is always interpreted as the range separator unless it is the first or last character of the group.
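As a small illustration of the note above, the following sketch compares a hyphen used as a range separator with a hyphen placed first or last in the group. The class name HyphenExample and the helper MatchedChars are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public class HyphenExample
{
   // Hypothetical helper: concatenates every match of pattern found in input.
   public static string MatchedChars(string input, string pattern)
   {
      var result = new StringBuilder();
      foreach (Match match in Regex.Matches(input, pattern))
         result.Append(match.Value);
      return result.ToString();
   }

   public static void Main()
   {
      string input = "a-b-c";
      // Hyphen between two characters: the range from a through c.
      Console.WriteLine(MatchedChars(input, "[a-c]"));
      // Hyphen first or last in the group: a literal hyphen plus the listed characters.
      Console.WriteLine(MatchedChars(input, "[-ac]"));
      Console.WriteLine(MatchedChars(input, "[ac-]"));
   }
}
// The example displays the following output:
//       abc
//       a--c
//       a--c
```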

Some common regular expression patterns that contain positive character classes are listed in the following table.

Pattern Description
[aeiou] Match all vowels.
[\p{P}\d] Match all punctuation and decimal digit characters.
[\s\p{P}] Match all white space and punctuation.

The following example defines a positive character group that contains the characters "a" and "e" so that the input string must contain the words "grey" or "gray" followed by another word for a match to occur.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"gr[ae]y\s\S+?[\s\p{P}]";
      string input = "The gray wolf jumped over the grey wall.";
      MatchCollection matches = Regex.Matches(input, pattern);
      foreach (Match match in matches)
         Console.WriteLine($"'{match.Value}'");
   }
}
// The example displays the following output:
//       'gray wolf '
//       'grey wall.'
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "gr[ae]y\s\S+?[\s\p{P}]"
      Dim input As String = "The gray wolf jumped over the grey wall."
      Dim matches As MatchCollection = Regex.Matches(input, pattern)
      For Each match As Match In matches
         Console.WriteLine($"'{match.Value}'")
      Next
   End Sub
End Module
' The example displays the following output:
'       'gray wolf '
'       'grey wall.'

The regular expression gr[ae]y\s\S+?[\s\p{P}] is defined as follows:

Pattern Description
gr Match the literal characters "gr".
[ae] Match either an "a" or an "e".
y\s Match the literal character "y" followed by a white-space character.
\S+? Match one or more non-white-space characters, but as few as possible.
[\s\p{P}] Match either a white-space character or a punctuation mark.

The following example matches words that begin with any capital letter. It uses the subexpression [A-Z] to represent the range of capital letters from A to Z.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b[A-Z]\w*\b";
      string input = "A city Albany Zulu maritime Marseilles";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       A
//       Albany
//       Zulu
//       Marseilles
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b[A-Z]\w*\b"
      Dim input As String = "A city Albany Zulu maritime Marseilles"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next
   End Sub
End Module

The regular expression \b[A-Z]\w*\b is defined as shown in the following table.

Pattern Description
\b Start at a word boundary.
[A-Z] Match any uppercase character from A to Z.
\w* Match zero or more word characters.
\b Match a word boundary.

Negative character group: [^]

A negative character group specifies a list of characters that must not appear in an input string for a match to occur. The list of characters may be specified individually, as a range, or both.

The syntax for specifying a list of individual characters is as follows:

[^character_group]

where character_group is a list of the individual characters that cannot appear in the input string for a match to succeed. character_group can consist of any combination of one or more literal characters, escape characters, or character classes.

The syntax for specifying a range of characters is as follows:

[^firstCharacter-lastCharacter]

where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points. firstCharacter must be the character with the lower code point, and lastCharacter must be the character with the higher code point.

Note

Because a negative character group can include both a set of characters and a character range, a hyphen character (-) is always interpreted as the range separator unless it is the first or last character of the group.

Two or more character ranges can be concatenated. For example, to specify the range of decimal digits from "0" through "9", the range of lowercase letters from "a" through "f", and the range of uppercase letters from "A" through "F", use [0-9a-fA-F].
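The concatenated ranges described above can be tried out with a short sketch. The class HexExample and its two helper methods are hypothetical names used only here; the \A and \z anchors, which tie the match to the whole string, are an addition that this section does not cover.

```csharp
using System;
using System.Text.RegularExpressions;

public class HexExample
{
   // True when every character of value is a hexadecimal digit.
   public static bool IsHex(string value)
   {
      return Regex.IsMatch(value, @"\A[0-9a-fA-F]+\z");
   }

   // Returns the first character that is not a hexadecimal digit,
   // or an empty string when there is none.
   public static string FirstNonHex(string value)
   {
      return Regex.Match(value, "[^0-9a-fA-F]").Value;
   }

   public static void Main()
   {
      Console.WriteLine(IsHex("1F3a"));
      Console.WriteLine(IsHex("1G3a"));
      Console.WriteLine(FirstNonHex("1G3a"));
   }
}
// The example displays the following output:
//       True
//       False
//       G
```

The same three ranges appear in both patterns; only the leading ^ turns the positive group into a negative one.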

The leading caret character (^) in a negative character group is mandatory and indicates that the character group is a negative character group instead of a positive character group.

Important

A negative character group in a larger regular expression pattern is not a zero-width assertion. That is, after evaluating the negative character group, the regular expression engine advances one character in the input string.
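To make the point above concrete, the following sketch contrasts the negative character group [^o] with a negative lookahead, (?!o), which is a zero-width assertion. The lookahead construct is not discussed in this section and is brought in only for comparison; the class name NegativeGroupWidthExample and its helper are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public class NegativeGroupWidthExample
{
   // Hypothetical helper: returns the text of the first match, or "" if there is none.
   public static string FirstMatch(string input, string pattern)
   {
      return Regex.Match(input, pattern).Value;
   }

   public static void Main()
   {
      // [^o] consumes the character after "th", so it is part of the match.
      Console.WriteLine(FirstMatch("thing", "th[^o]"));
      // (?!o) only checks the next character and consumes nothing.
      Console.WriteLine(FirstMatch("thing", "th(?!o)"));

      // At the end of the input there is no character left for [^o] to consume.
      Console.WriteLine(Regex.IsMatch("path", "th[^o]"));
      Console.WriteLine(Regex.IsMatch("path", "th(?!o)"));
   }
}
// The example displays the following output:
//       thi
//       th
//       False
//       True
```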

Some common regular expression patterns that contain negative character groups are listed in the following table.

Pattern Description
[^aeiou] Match all characters except vowels.
[^\p{P}\d] Match all characters except punctuation and decimal digit characters.

The following example matches any word that begins with the characters "th" and is not followed by an "o".

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\bth[^o]\w+\b";
      string input = "thought thing though them through thus thorough this";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       thing
//       them
//       through
//       thus
//       this
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\bth[^o]\w+\b"
      Dim input As String = "thought thing though them through thus " + _
                            "thorough this"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       thing
'       them
'       through
'       thus
'       this

The regular expression \bth[^o]\w+\b is defined as shown in the following table.

Pattern Description
\b Start at a word boundary.
th Match the literal characters "th".
[^o] Match any character that is not an "o".
\w+ Match one or more word characters.
\b End at a word boundary.

Any character: .

The period character (.) matches any character except \n (the newline character, \u000A), with the following two qualifications:

  • If a regular expression pattern is modified by the RegexOptions.Singleline option, or if the portion of the pattern that contains the . character class is modified by the s option, . matches any character. For more information, see Regular Expression Options.

    The following example illustrates the different behavior of the . character class by default and with the RegexOptions.Singleline option. The regular expression ^.+ starts at the beginning of the string and matches every character. By default, the match ends at the end of the first line; the regular expression pattern matches the carriage return character, \r or \u000D, but it does not match \n. Because the RegexOptions.Singleline option interprets the entire input string as a single line, it matches every character in the input string, including \n. (The output shown assumes that Environment.NewLine is \r\n, as it is on Windows.)

    using System;
    using System.Text.RegularExpressions;
    
    public class Example
    {
       public static void Main()
       {
          string pattern = "^.+";
          string input = "This is one line and" + Environment.NewLine + "this is the second.";
          foreach (Match match in Regex.Matches(input, pattern))
             Console.WriteLine(Regex.Escape(match.Value));
    
          Console.WriteLine();
          foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Singleline))
             Console.WriteLine(Regex.Escape(match.Value));
       }
    }
    // The example displays the following output:
    //       This\ is\ one\ line\ and\r
    //       
    //       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.
    
    Imports System.Text.RegularExpressions
    
    Module Example
       Public Sub Main()
          Dim pattern As String = "^.+"
          Dim input As String = "This is one line and" + vbCrLf + "this is the second."
          For Each match As Match In Regex.Matches(input, pattern)
             Console.WriteLine(Regex.Escape(match.Value))
          Next
          Console.WriteLine()
          For Each match As Match In Regex.Matches(input, pattern, RegexOptions.SingleLine)
             Console.WriteLine(Regex.Escape(match.Value))
          Next
       End Sub
    End Module
    ' The example displays the following output:
    '       This\ is\ one\ line\ and\r
    '       
    '       This\ is\ one\ line\ and\r\nthis\ is\ the\ second\.
    

Note

Because it matches any character except \n, the . character class also matches \r (the carriage return character, \u000D).

  • In a positive or negative character group, a period is treated as a literal period character, and not as a character class. For more information, see Positive Character Group and Negative Character Group earlier in this topic. The following example provides an illustration by defining a regular expression that includes the period character (.) both as a character class and as a member of a positive character group. The regular expression \b.*[.?!;:](\s|\z) begins at a word boundary, matches any character until it encounters one of five punctuation marks, including a period, and then matches either a white-space character or the end of the string.

    using System;
    using System.Text.RegularExpressions;
    
    public class Example
    {
       public static void Main()
       {
          string pattern = @"\b.*[.?!;:](\s|\z)";
          string input = "this. what: is? go, thing.";
          foreach (Match match in Regex.Matches(input, pattern))
             Console.WriteLine(match.Value);
       }
    }
    // The example displays the following output:
    //       this. what: is? go, thing.
    
    Imports System.Text.RegularExpressions
    
    Module Example
       Public Sub Main()
          Dim pattern As String = "\b.*[.?!;:](\s|\z)"
          Dim input As String = "this. what: is? go, thing."
          For Each match As Match In Regex.Matches(input, pattern)
             Console.WriteLine(match.Value)
          Next   
       End Sub
    End Module
    ' The example displays the following output:
    '       this. what: is? go, thing.
    

Note

Because it matches any character, the . language element is often used with a lazy quantifier if a regular expression pattern attempts to match any character multiple times. For more information, see Quantifiers.
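Two points from this section can be sketched briefly: the inline s option, written here as (?s), and the effect of pairing . with a lazy quantifier. The class name DotExample, its helper, and the sample strings are illustrative assumptions, not part of the original samples.

```csharp
using System;
using System.Text.RegularExpressions;

public class DotExample
{
   // Hypothetical helper: returns the text of the first match, or "" if there is none.
   public static string FirstMatch(string input, string pattern)
   {
      return Regex.Match(input, pattern).Value;
   }

   public static void Main()
   {
      // By default . does not match \n; with the inline s option it does.
      Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));
      Console.WriteLine(Regex.IsMatch("a\nb", "(?s)a.b"));

      string input = "'first' and 'second'";
      // Greedy: .+ runs to the last quotation mark in the input.
      Console.WriteLine(FirstMatch(input, "'.+'"));
      // Lazy: .+? stops at the first closing quotation mark.
      Console.WriteLine(FirstMatch(input, "'.+?'"));
   }
}
// The example displays the following output:
//       False
//       True
//       'first' and 'second'
//       'first'
```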

Unicode category or Unicode block: \p{}

The Unicode standard assigns each character a general category. For example, a particular character can be an uppercase letter (represented by the Lu category), a decimal digit (the Nd category), a math symbol (the Sm category), or a line separator (the Zl category). Specific character sets in the Unicode standard also occupy a specific range or block of consecutive code points. For example, the basic Latin character set is found from \u0000 through \u007F, while the Arabic character set is found from \u0600 through \u06FF.

The regular expression construct

\p{ name }

matches any character that belongs to a Unicode general category or named block, where name is the category abbreviation or named block name. For a list of category abbreviations, see the Supported Unicode General Categories section later in this topic. For a list of named blocks, see the Supported Named Blocks section later in this topic.

The following example uses the \p{name} construct to match both a Unicode general category (in this case, the Pd, or Punctuation, Dash category) and a named block (the IsGreek and IsBasicLatin named blocks).

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\p{IsGreek}+(\s)?)+\p{Pd}\s(\p{IsBasicLatin}+(\s)?)+";
      string input = "Κατα Μαθθαίον - The Gospel of Matthew";

      Console.WriteLine(Regex.IsMatch(input, pattern));        // Displays True.
   }
}
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\p{IsGreek}+(\s)?)+\p{Pd}\s(\p{IsBasicLatin}+(\s)?)+"
      Dim input As String = "Κατα Μαθθαίον - The Gospel of Matthew"

      Console.WriteLine(Regex.IsMatch(input, pattern))         ' Displays True.
   End Sub
End Module

The regular expression \b(\p{IsGreek}+(\s)?)+\p{Pd}\s(\p{IsBasicLatin}+(\s)?)+ is defined as shown in the following table.

Pattern Description
\b Start at a word boundary.
\p{IsGreek}+ Match one or more Greek characters.
(\s)? Match zero or one white-space character.
(\p{IsGreek}+(\s)?)+ Match the pattern of one or more Greek characters followed by zero or one white-space characters one or more times.
\p{Pd} Match a Punctuation, Dash character.
\s Match a white-space character.
\p{IsBasicLatin}+ Match one or more basic Latin characters.
(\s)? Match zero or one white-space character.
(\p{IsBasicLatin}+(\s)?)+ Match the pattern of one or more basic Latin characters followed by zero or one white-space characters one or more times.
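As a further, smaller sketch of \p{name}, the following code matches by general category (Lu and Nd) and by named block (IsCyrillic). The class name UnicodeCategoryExample, the helper Count, and the sample strings are assumptions made for illustration; non-ASCII characters are written as \u escapes so the source stays ASCII-only.

```csharp
using System;
using System.Text.RegularExpressions;

public class UnicodeCategoryExample
{
   // Hypothetical helper: counts the matches of pattern in input.
   public static int Count(string input, string pattern)
   {
      return Regex.Matches(input, pattern).Count;
   }

   public static void Main()
   {
      // "\u00C4rger und \u00DCbung" contains two uppercase letters, U+00C4 and U+00DC.
      Console.WriteLine(Count("\u00C4rger und \u00DCbung", @"\p{Lu}"));

      // U+0663 (ARABIC-INDIC DIGIT THREE) belongs to the Nd category.
      Console.WriteLine(Regex.IsMatch("\u0663", @"\p{Nd}"));

      // U+041F is in the Cyrillic block; the Latin letter P is not.
      Console.WriteLine(Regex.IsMatch("\u041F", @"\p{IsCyrillic}"));
      Console.WriteLine(Regex.IsMatch("P", @"\p{IsCyrillic}"));
   }
}
// The example displays the following output:
//       2
//       True
//       True
//       False
```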

Negative Unicode category or Unicode block: \P{}

The Unicode standard assigns each character a general category. For example, a particular character can be an uppercase letter (represented by the Lu category), a decimal digit (the Nd category), a math symbol (the Sm category), or a line separator (the Zl category). Specific character sets in the Unicode standard also occupy a specific range or block of consecutive code points. For example, the basic Latin character set is found from \u0000 through \u007F, while the Arabic character set is found from \u0600 through \u06FF.

The regular expression construct

\P{ name }

matches any character that does not belong to a Unicode general category or named block, where name is the category abbreviation or named block name. For a list of category abbreviations, see the Supported Unicode General Categories section later in this topic. For a list of named blocks, see the Supported Named Blocks section later in this topic.

The following example uses the \P{name} construct to remove any currency symbols (in this case, the Sc, or Symbol, Currency category) from numeric strings.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\P{Sc})+";
      
      string[] values = { "$164,091.78", "£1,073,142.68", "73¢", "€120" };
      foreach (string value in values)
         Console.WriteLine(Regex.Match(value, pattern).Value);
   }
}
// The example displays the following output:
//       164,091.78
//       1,073,142.68
//       73
//       120
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\P{Sc})+"
      
      Dim values() As String = { "$164,091.78", "£1,073,142.68", "73¢", "€120"}
      For Each value As String In values
         Console.WriteLine(Regex.Match(value, pattern).Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       164,091.78
'       1,073,142.68
'       73
'       120

The regular expression pattern (\P{Sc})+ matches one or more characters that are not currency symbols. Because Regex.Match returns the first such run of characters, a leading or trailing currency symbol is effectively stripped from the result string.
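When a string contains more than one currency symbol, the first run of non-currency characters ends at the second symbol. The following sketch shows that case next to an alternative that deletes every \p{Sc} character with Regex.Replace. The alternative is a suggestion rather than part of the original sample, and the class name CurrencyExample and its helpers are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public class CurrencyExample
{
   // Returns the first run of characters that are not currency symbols.
   public static string FirstRun(string value)
   {
      return Regex.Match(value, @"(\P{Sc})+").Value;
   }

   // Removes every currency symbol, wherever it occurs in the string.
   public static string StripAll(string value)
   {
      return Regex.Replace(value, @"\p{Sc}", "");
   }

   public static void Main()
   {
      // \u20AC is the euro sign.
      string value = "$1,000 or \u20AC900";
      Console.WriteLine("'{0}'", FirstRun(value));
      Console.WriteLine("'{0}'", StripAll(value));
   }
}
// The example displays the following output:
//       '1,000 or '
//       '1,000 or 900'
```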

Word character: \w

\w matches any word character. A word character is a member of any of the Unicode categories listed in the following table.

Category Description
Ll Letter, Lowercase
Lu Letter, Uppercase
Lt Letter, Titlecase
Lo Letter, Other
Lm Letter, Modifier
Mn Mark, Nonspacing
Nd Number, Decimal Digit
Pc Punctuation, Connector. This category includes ten characters, the most commonly used of which is the LOWLINE character (_), U+005F.

If ECMAScript-compliant behavior is specified, \w is equivalent to [a-zA-Z_0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.
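The difference between the default and the ECMAScript-compliant definition of \w can be observed with a non-ASCII letter. In this sketch the class name WordCharExample and the helper IsWordChar are hypothetical, and the test characters are arbitrary choices.

```csharp
using System;
using System.Text.RegularExpressions;

public class WordCharExample
{
   // Hypothetical helper: true when the string contains a \w match under the given options.
   public static bool IsWordChar(string character, RegexOptions options)
   {
      return Regex.IsMatch(character, @"\w", options);
   }

   public static void Main()
   {
      // U+00E9 (e with acute) is in the Ll category but outside [a-zA-Z_0-9].
      Console.WriteLine(IsWordChar("\u00E9", RegexOptions.None));
      Console.WriteLine(IsWordChar("\u00E9", RegexOptions.ECMAScript));

      // The underscore is a word character under both definitions.
      Console.WriteLine(IsWordChar("_", RegexOptions.None));
      Console.WriteLine(IsWordChar("_", RegexOptions.ECMAScript));
   }
}
// The example displays the following output:
//       True
//       False
//       True
//       True
```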

Note

Because it matches any word character, the \w language element is often used with a lazy quantifier if a regular expression pattern attempts to match any word character multiple times, followed by a specific word character. For more information, see Quantifiers.

The following example uses the \w language element to match duplicate characters in a word. The example defines a regular expression pattern, (\w)\1, which can be interpreted as follows.

Element Description
(\w) Match a word character. This is the first capturing group.
\1 Match the value of the first capture.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"(\w)\1";
      string[] words = { "trellis", "seer", "latter", "summer", 
                         "hoarse", "lesser", "aardvark", "stunned" };
      foreach (string word in words)
      {
         Match match = Regex.Match(word, pattern);
         if (match.Success)
            Console.WriteLine("'{0}' found in '{1}' at position {2}.", 
                              match.Value, word, match.Index);
         else
            Console.WriteLine("No double characters in '{0}'.", word);
      }                                                  
   }
}
// The example displays the following output:
//       'll' found in 'trellis' at position 3.
//       'ee' found in 'seer' at position 1.
//       'tt' found in 'latter' at position 2.
//       'mm' found in 'summer' at position 2.
//       No double characters in 'hoarse'.
//       'ss' found in 'lesser' at position 2.
//       'aa' found in 'aardvark' at position 0.
//       'nn' found in 'stunned' at position 3.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "(\w)\1"
      Dim words() As String = { "trellis", "seer", "latter", "summer", _
                                "hoarse", "lesser", "aardvark", "stunned" }
      For Each word As String In words
         Dim match As Match = Regex.Match(word, pattern)
         If match.Success Then
            Console.WriteLine("'{0}' found in '{1}' at position {2}.", _
                              match.Value, word, match.Index)
         Else
            Console.WriteLine("No double characters in '{0}'.", word)
         End If
      Next                                                  
   End Sub
End Module
' The example displays the following output:
'       'll' found in 'trellis' at position 3.
'       'ee' found in 'seer' at position 1.
'       'tt' found in 'latter' at position 2.
'       'mm' found in 'summer' at position 2.
'       No double characters in 'hoarse'.
'       'ss' found in 'lesser' at position 2.
'       'aa' found in 'aardvark' at position 0.
'       'nn' found in 'stunned' at position 3.

Non-word character: \W

\W matches any non-word character. The \W language element is equivalent to the following character class:

[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}]

In other words, it matches any character except for those in the Unicode categories listed in the following table.

Category Description
Ll Letter, Lowercase
Lu Letter, Uppercase
Lt Letter, Titlecase
Lo Letter, Other
Lm Letter, Modifier
Mn Mark, Nonspacing
Nd Number, Decimal Digit
Pc Punctuation, Connector. This category includes ten characters, the most commonly used of which is the LOWLINE character (_), U+005F.
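One common use of \W, sketched here as an assumption rather than something this topic prescribes, is to split text into words by treating each run of non-word characters as a separator. The class name NonWordSplitExample and the helper Words are hypothetical; Regex.Split is the standard .NET method.

```csharp
using System;
using System.Text.RegularExpressions;

public class NonWordSplitExample
{
   // Splits input wherever one or more non-word characters occur.
   public static string[] Words(string input)
   {
      return Regex.Split(input, @"\W+");
   }

   public static void Main()
   {
      // The comma and the space after "old" form a single separator.
      string[] words = Words("The old, grey mare");
      Console.WriteLine(string.Join("|", words));
   }
}
// The example displays the following output:
//       The|old|grey|mare
```

If the input begins or ends with a non-word character, Regex.Split also returns an empty string at that end of the array, so real input may need those entries filtered out.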

If ECMAScript-compliant behavior is specified, \W is equivalent to [^a-zA-Z_0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.

Note

Because it matches any non-word character, the \W language element is often used with a lazy quantifier if a regular expression pattern attempts to match any non-word character multiple times followed by a specific non-word character. For more information, see Quantifiers.

The following example illustrates the \W character class. It defines a regular expression pattern, \b(\w+)(\W){1,2}, that matches a word followed by one or two non-word characters, such as white space or punctuation. The regular expression is interpreted as shown in the following table.

Element Description
\b Begin the match at a word boundary.
(\w+) Match one or more word characters. This is the first capturing group.
(\W){1,2} Match a non-word character either one or two times. This is the second capturing group.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\w+)(\W){1,2}";
      string input = "The old, grey mare slowly walked across the narrow, green pasture.";
      foreach (Match match in Regex.Matches(input, pattern))
      {
         Console.WriteLine(match.Value);
         Console.Write("   Non-word character(s):");
         CaptureCollection captures = match.Groups[2].Captures;
         for (int ctr = 0; ctr < captures.Count; ctr++)
             Console.Write(@"'{0}' (\u{1}){2}", captures[ctr].Value, 
                           Convert.ToUInt16(captures[ctr].Value[0]).ToString("X4"), 
                           ctr < captures.Count - 1 ? ", " : "");
         Console.WriteLine();
      }   
   }
}
// The example displays the following output:
//       The
//          Non-word character(s):' ' (\u0020)
//       old,
//          Non-word character(s):',' (\u002C), ' ' (\u0020)
//       grey
//          Non-word character(s):' ' (\u0020)
//       mare
//          Non-word character(s):' ' (\u0020)
//       slowly
//          Non-word character(s):' ' (\u0020)
//       walked
//          Non-word character(s):' ' (\u0020)
//       across
//          Non-word character(s):' ' (\u0020)
//       the
//          Non-word character(s):' ' (\u0020)
//       narrow,
//          Non-word character(s):',' (\u002C), ' ' (\u0020)
//       green
//          Non-word character(s):' ' (\u0020)
//       pasture.
//          Non-word character(s):'.' (\u002E)
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\w+)(\W){1,2}"
      Dim input As String = "The old, grey mare slowly walked across the narrow, green pasture."
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)
         Console.Write("   Non-word character(s):")
         Dim captures As CaptureCollection = match.Groups(2).Captures
         For ctr As Integer = 0 To captures.Count - 1
             Console.Write("'{0}' (\u{1}){2}", captures(ctr).Value, _
                           Convert.ToUInt16(captures(ctr).Value.Chars(0)).ToString("X4"), _
                           If(ctr < captures.Count - 1, ", ", ""))
         Next
         Console.WriteLine()
      Next
   End Sub
End Module
' The example displays the following output:
'       The
'          Non-word character(s):' ' (\u0020)
'       old,
'          Non-word character(s):',' (\u002C), ' ' (\u0020)
'       grey
'          Non-word character(s):' ' (\u0020)
'       mare
'          Non-word character(s):' ' (\u0020)
'       slowly
'          Non-word character(s):' ' (\u0020)
'       walked
'          Non-word character(s):' ' (\u0020)
'       across
'          Non-word character(s):' ' (\u0020)
'       the
'          Non-word character(s):' ' (\u0020)
'       narrow,
'          Non-word character(s):',' (\u002C), ' ' (\u0020)
'       green
'          Non-word character(s):' ' (\u0020)
'       pasture.
'          Non-word character(s):'.' (\u002E)

Because the Group object for the second capturing group contains only a single captured non-word character (the last one captured), the example retrieves all captured non-word characters from the CaptureCollection object that is returned by the Group.Captures property.

Whitespace character: \s

\s matches any whitespace character. It is equivalent to the escape sequences and Unicode categories listed in the following table.

Category Description
\f The form feed character, \u000C.
\n The newline character, \u000A.
\r The carriage return character, \u000D.
\t The tab character, \u0009.
\v The vertical tab character, \u000B.
\x85 The NEXT LINE (NEL) character, \u0085.
\p{Z} Matches any separator character.

If ECMAScript-compliant behavior is specified, \s is equivalent to [ \f\n\r\t\v]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.
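The difference between the two behaviors can be sketched as follows. This is a minimal illustration rather than one of the original samples: the class name WhitespaceOptionsExample and the helper IsWhitespaceMatch are hypothetical, and the sample characters were chosen only to cover both table entries and the ECMAScript set.

```csharp
using System;
using System.Text.RegularExpressions;

public class WhitespaceOptionsExample
{
   // Returns true if the single character ch matches \s under the given options.
   public static bool IsWhitespaceMatch(char ch, RegexOptions options)
   {
      return Regex.IsMatch(ch.ToString(), @"^\s$", options);
   }

   public static void Main()
   {
      // Space, tab, NO-BREAK SPACE (Zs), NEXT LINE, and LINE SEPARATOR (Zl).
      char[] chars = { ' ', '\t', '\u00A0', '\u0085', '\u2028' };

      foreach (char ch in chars)
         Console.WriteLine("\\u{0:X4}: default = {1}, ECMAScript = {2}",
                           (int) ch,
                           IsWhitespaceMatch(ch, RegexOptions.None),
                           IsWhitespaceMatch(ch, RegexOptions.ECMAScript));
   }
}
// The example displays the following output:
//       \u0020: default = True, ECMAScript = True
//       \u0009: default = True, ECMAScript = True
//       \u00A0: default = True, ECMAScript = False
//       \u0085: default = True, ECMAScript = False
//       \u2028: default = True, ECMAScript = False
```

If input can contain separators outside the ASCII range, the choice of RegexOptions.ECMAScript therefore changes which characters split or delimit text.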

The following example illustrates the \s character class. It defines a regular expression pattern, \b\w+(e)?s(\s|$), that matches a word ending in either "s" or "es" followed by either a white-space character or the end of the input string. The regular expression is interpreted as shown in the following table.

Element Description
\b Begin the match at a word boundary.
\w+ Match one or more word characters.
(e)? Match an "e" either zero or one time.
s Match an "s".
(\s|$) Match either a white-space character or the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b\w+(e)?s(\s|$)";
      string input = "matches stores stops leave leaves";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
// The example displays the following output:
//       matches
//       stores
//       stops
//       leaves
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b\w+(e)?s(\s|$)"
      Dim input As String = "matches stores stops leave leaves"
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Value)      
      Next
   End Sub
End Module
' The example displays the following output:
'       matches
'       stores
'       stops
'       leaves

Non-whitespace character: \S

\S matches any non-white-space character. It is equivalent to the [^\f\n\r\t\v\x85\p{Z}] regular expression pattern, or the opposite of the regular expression pattern that is equivalent to \s, which matches white-space characters. For more information, see Whitespace character: \s.

If ECMAScript-compliant behavior is specified, \S is equivalent to [^ \f\n\r\t\v]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.

The following example illustrates the \S language element. The regular expression pattern \b(\S+)\s? matches strings that are delimited by white-space characters. The second element in the match's GroupCollection object contains the matched string. The regular expression can be interpreted as shown in the following table.

Element Description
\b Begin the match at a word boundary.
(\S+) Match one or more non-white-space characters. This is the first capturing group.
\s? Match zero or one white-space character.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"\b(\S+)\s?";
      string input = "This is the first sentence of the first paragraph. " + 
                            "This is the second sentence.\n" + 
                            "This is the only sentence of the second paragraph.";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Groups[1]);
   }
}
// The example displays the following output:
//    This
//    is
//    the
//    first
//    sentence
//    of
//    the
//    first
//    paragraph.
//    This
//    is
//    the
//    second
//    sentence.
//    This
//    is
//    the
//    only
//    sentence
//    of
//    the
//    second
//    paragraph.
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "\b(\S+)\s?"
      Dim input As String = "This is the first sentence of the first paragraph. " + _
                            "This is the second sentence." + vbCrLf + _
                            "This is the only sentence of the second paragraph."
      For Each match As Match In Regex.Matches(input, pattern)
         Console.WriteLine(match.Groups(1))
      Next
   End Sub
End Module
' The example displays the following output:
'    This
'    is
'    the
'    first
'    sentence
'    of
'    the
'    first
'    paragraph.
'    This
'    is
'    the
'    second
'    sentence.
'    This
'    is
'    the
'    only
'    sentence
'    of
'    the
'    second
'    paragraph.

Decimal digit character: \d

\d matches any decimal digit. It is equivalent to the \p{Nd} regular expression pattern, which includes the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets.

If ECMAScript-compliant behavior is specified, \d is equivalent to [0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.
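The following sketch contrasts the two behaviors for \d. It is an illustration under stated assumptions, not one of the original samples: the class name DigitOptionsExample and the helper IsAllDigits are hypothetical, and the Arabic-Indic digits stand in for "the decimal digits of other character sets".

```csharp
using System;
using System.Text.RegularExpressions;

public class DigitOptionsExample
{
   // Returns true if input consists only of characters that \d matches.
   public static bool IsAllDigits(string input, RegexOptions options)
   {
      return Regex.IsMatch(input, @"^\d+$", options);
   }

   public static void Main()
   {
      string ascii = "2024";
      // ARABIC-INDIC DIGITS TWO, ZERO, TWO, FOUR (all in category Nd).
      string arabicIndic = "\u0662\u0660\u0662\u0664";

      Console.WriteLine("ASCII digits: default = {0}, ECMAScript = {1}",
                        IsAllDigits(ascii, RegexOptions.None),
                        IsAllDigits(ascii, RegexOptions.ECMAScript));
      Console.WriteLine("Arabic-Indic digits: default = {0}, ECMAScript = {1}",
                        IsAllDigits(arabicIndic, RegexOptions.None),
                        IsAllDigits(arabicIndic, RegexOptions.ECMAScript));
   }
}
// The example displays the following output:
//       ASCII digits: default = True, ECMAScript = True
//       Arabic-Indic digits: default = True, ECMAScript = False
```

When a pattern is meant to validate input that must contain only 0-9, either RegexOptions.ECMAScript or an explicit [0-9] class avoids accepting digits from other scripts.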

The following example illustrates the \d language element. It tests whether an input string represents a valid telephone number in the United States and Canada. The regular expression pattern ^(\(?\d{3}\)?[\s-])?\d{3}-\d{4}$ is defined as shown in the following table.

Element Description
^ Begin the match at the beginning of the input string.
\(? Match zero or one literal "(" character.
\d{3} Match three decimal digits.
\)? Match zero or one literal ")" character.
[\s-] Match a hyphen or a white-space character.
(\(?\d{3}\)?[\s-])? Match, zero or one time, an optional opening parenthesis followed by three decimal digits, an optional closing parenthesis, and either a white-space character or a hyphen. This is the first capturing group.
\d{3}-\d{4} Match three decimal digits followed by a hyphen and four more decimal digits.
$ Match the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"^(\(?\d{3}\)?[\s-])?\d{3}-\d{4}$";
      string[] inputs = { "111 111-1111", "222-2222", "222 333-444", 
                          "(212) 111-1111", "111-AB1-1111", 
                          "212-111-1111", "01 999-9999" };
      
      foreach (string input in inputs)
      {
         if (Regex.IsMatch(input, pattern)) 
            Console.WriteLine(input + ": matched");
         else
            Console.WriteLine(input + ": match failed");
      }
   }
}
// The example displays the following output:
//       111 111-1111: matched
//       222-2222: matched
//       222 333-444: match failed
//       (212) 111-1111: matched
//       111-AB1-1111: match failed
//       212-111-1111: matched
//       01 999-9999: match failed
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "^(\(?\d{3}\)?[\s-])?\d{3}-\d{4}$"
      Dim inputs() As String = { "111 111-1111", "222-2222", "222 333-444", _
                                 "(212) 111-1111", "111-AB1-1111", _
                                 "212-111-1111", "01 999-9999" }
      
      For Each input As String In inputs
         If Regex.IsMatch(input, pattern) Then 
            Console.WriteLine(input + ": matched")
         Else
            Console.WriteLine(input + ": match failed")
         End If   
      Next
   End Sub
End Module
' The example displays the following output:
'       111 111-1111: matched
'       222-2222: matched
'       222 333-444: match failed
'       (212) 111-1111: matched
'       111-AB1-1111: match failed
'       212-111-1111: matched
'       01 999-9999: match failed

Non-digit character: \D

\D matches any non-digit character. It is equivalent to the \P{Nd} regular expression pattern.

If ECMAScript-compliant behavior is specified, \D is equivalent to [^0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.

The following example illustrates the \D language element. It tests whether a string such as a part number consists of the appropriate combination of decimal digit and non-digit characters. The regular expression pattern ^\D\d{1,5}\D*$ is defined as shown in the following table.

Element Description
^ Begin the match at the beginning of the input string.
\D Match a non-digit character.
\d{1,5} Match from one to five decimal digits.
\D* Match zero or more non-digit characters.
$ Match the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string pattern = @"^\D\d{1,5}\D*$"; 
      string[] inputs = { "A1039C", "AA0001", "C18A", "Y938518" }; 
      
      foreach (string input in inputs)
      {
         if (Regex.IsMatch(input, pattern))
            Console.WriteLine(input + ": matched");
         else
            Console.WriteLine(input + ": match failed");
      }
   }
}
// The example displays the following output:
//       A1039C: matched
//       AA0001: match failed
//       C18A: matched
//       Y938518: match failed
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim pattern As String = "^\D\d{1,5}\D*$" 
      Dim inputs() As String = { "A1039C", "AA0001", "C18A", "Y938518" } 
      
      For Each input As String In inputs
         If Regex.IsMatch(input, pattern) Then
            Console.WriteLine(input + ": matched")
         Else
            Console.WriteLine(input + ": match failed")
         End If   
      Next
   End Sub
End Module
' The example displays the following output:
'       A1039C: matched
'       AA0001: match failed
'       C18A: matched
'       Y938518: match failed

Supported Unicode general categories

Unicode defines the general categories listed in the following table. For more information, see the "UCD File Format" and "General Category Values" subtopics at the Unicode Character Database.

Category Description
Lu Letter, Uppercase
Ll Letter, Lowercase
Lt Letter, Titlecase
Lm Letter, Modifier
Lo Letter, Other
L All letter characters. This includes the Lu, Ll, Lt, Lm, and Lo characters.
Mn Mark, Nonspacing
Mc Mark, Spacing Combining
Me Mark, Enclosing
M All diacritic marks. This includes the Mn, Mc, and Me categories.
Nd Number, Decimal Digit
Nl Number, Letter
No Number, Other
N All numbers. This includes the Nd, Nl, and No categories.
Pc Punctuation, Connector
Pd Punctuation, Dash
Ps Punctuation, Open
Pe Punctuation, Close
Pi Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po Punctuation, Other
P All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
Sm Symbol, Math
Sc Symbol, Currency
Sk Symbol, Modifier
So Symbol, Other
S All symbols. This includes the Sm, Sc, Sk, and So categories.
Zs Separator, Space
Zl Separator, Line
Zp Separator, Paragraph
Z All separator characters. This includes the Zs, Zl, and Zp categories.
Cc Other, Control
Cf Other, Format
Cs Other, Surrogate
Co Other, Private Use
Cn Other, Not Assigned (no characters have this property)
C All control characters. This includes the Cc, Cf, Cs, Co, and Cn categories.

You can determine the Unicode category of any particular character by passing that character to the Char.GetUnicodeCategory method. The following example uses the GetUnicodeCategory method to determine the category of each element in an array that contains selected Latin characters.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      char[] chars = { 'a', 'X', '8', ',', ' ', '\u0009', '!' };
      
      foreach (char ch in chars)
         Console.WriteLine("'{0}': {1}", Regex.Escape(ch.ToString()), 
                           Char.GetUnicodeCategory(ch));
   }
}
// The example displays the following output:
//       'a': LowercaseLetter
//       'X': UppercaseLetter
//       '8': DecimalDigitNumber
//       ',': OtherPunctuation
//       '\ ': SpaceSeparator
//       '\t': Control
//       '!': OtherPunctuation
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim chars() As Char = { "a"c, "X"c, "8"c, ","c, " "c, ChrW(9), "!"c }
      
      For Each ch As Char In chars
         Console.WriteLine("'{0}': {1}", Regex.Escape(ch.ToString()), _
                           Char.GetUnicodeCategory(ch))
      Next         
   End Sub
End Module
' The example displays the following output:
'       'a': LowercaseLetter
'       'X': UppercaseLetter
'       '8': DecimalDigitNumber
'       ',': OtherPunctuation
'       '\ ': SpaceSeparator
'       '\t': Control
'       '!': OtherPunctuation
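
The category names in the table can also be used directly in a pattern through the \p{} construct. The following sketch is an illustration only: the class name CategoryPatternExample, the helper FindAll, and the sample input are hypothetical and were chosen so that each pattern has at least one match.

```csharp
using System;
using System.Text.RegularExpressions;

public class CategoryPatternExample
{
   // Returns the values of all matches of pattern in input, joined by spaces.
   public static string FindAll(string input, string pattern)
   {
      MatchCollection matches = Regex.Matches(input, pattern);
      string[] values = new string[matches.Count];
      for (int ctr = 0; ctr < matches.Count; ctr++)
         values[ctr] = matches[ctr].Value;
      return String.Join(" ", values);
   }

   public static void Main()
   {
      string input = "Total: $25, tax 8%!";
      // Uppercase letters, currency symbols, runs of decimal digits, punctuation.
      string[] patterns = { @"\p{Lu}", @"\p{Sc}", @"\p{Nd}+", @"\p{P}" };

      foreach (string pattern in patterns)
         Console.WriteLine("{0}: {1}", pattern, FindAll(input, pattern));
   }
}
// The example displays the following output:
//       \p{Lu}: T
//       \p{Sc}: $
//       \p{Nd}+: 25 8
//       \p{P}: : , % !
```

Note that "$" is reported under Sc rather than P, which matches the split between the symbol and punctuation categories in the table.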

Supported named blocks

.NET provides the named blocks listed in the following table. The set of supported named blocks is based on Unicode 4.0 and Perl 5.6. For a regular expression that uses named blocks, see the Unicode category or Unicode block: \p{} section.

Code point range Block name
0000 - 007F IsBasicLatin
0080 - 00FF IsLatin-1Supplement
0100 - 017F IsLatinExtended-A
0180 - 024F IsLatinExtended-B
0250 - 02AF IsIPAExtensions
02B0 - 02FF IsSpacingModifierLetters
0300 - 036F IsCombiningDiacriticalMarks
0370 - 03FF IsGreek or IsGreekandCoptic
0400 - 04FF IsCyrillic
0500 - 052F IsCyrillicSupplement
0530 - 058F IsArmenian
0590 - 05FF IsHebrew
0600 - 06FF IsArabic
0700 - 074F IsSyriac
0780 - 07BF IsThaana
0900 - 097F IsDevanagari
0980 - 09FF IsBengali
0A00 - 0A7F IsGurmukhi
0A80 - 0AFF IsGujarati
0B00 - 0B7F IsOriya
0B80 - 0BFF IsTamil
0C00 - 0C7F IsTelugu
0C80 - 0CFF IsKannada
0D00 - 0D7F IsMalayalam
0D80 - 0DFF IsSinhala
0E00 - 0E7F IsThai
0E80 - 0EFF IsLao
0F00 - 0FFF IsTibetan
1000 - 109F IsMyanmar
10A0 - 10FF IsGeorgian
1100 - 11FF IsHangulJamo
1200 - 137F IsEthiopic
13A0 - 13FF IsCherokee
1400 - 167F IsUnifiedCanadianAboriginalSyllabics
1680 - 169F IsOgham
16A0 - 16FF IsRunic
1700 - 171F IsTagalog
1720 - 173F IsHanunoo
1740 - 175F IsBuhid
1760 - 177F IsTagbanwa
1780 - 17FF IsKhmer
1800 - 18AF IsMongolian
1900 - 194F IsLimbu
1950 - 197F IsTaiLe
19E0 - 19FF IsKhmerSymbols
1D00 - 1D7F IsPhoneticExtensions
1E00 - 1EFF IsLatinExtendedAdditional
1F00 - 1FFF IsGreekExtended
2000 - 206F IsGeneralPunctuation
2070 - 209F IsSuperscriptsandSubscripts
20A0 - 20CF IsCurrencySymbols
20D0 - 20FF IsCombiningDiacriticalMarksforSymbols or IsCombiningMarksforSymbols
2100 - 214F IsLetterlikeSymbols
2150 - 218F IsNumberForms
2190 - 21FF IsArrows
2200 - 22FF IsMathematicalOperators
2300 - 23FF IsMiscellaneousTechnical
2400 - 243F IsControlPictures
2440 - 245F IsOpticalCharacterRecognition
2460 - 24FF IsEnclosedAlphanumerics
2500 - 257F IsBoxDrawing
2580 - 259F IsBlockElements
25A0 - 25FF IsGeometricShapes
2600 - 26FF IsMiscellaneousSymbols
2700 - 27BF IsDingbats
27C0 - 27EF IsMiscellaneousMathematicalSymbols-A
27F0 - 27FF IsSupplementalArrows-A
2800 - 28FF IsBraillePatterns
2900 - 297F IsSupplementalArrows-B
2980 - 29FF IsMiscellaneousMathematicalSymbols-B
2A00 - 2AFF IsSupplementalMathematicalOperators
2B00 - 2BFF IsMiscellaneousSymbolsandArrows
2E80 - 2EFF IsCJKRadicalsSupplement
2F00 - 2FDF IsKangxiRadicals
2FF0 - 2FFF IsIdeographicDescriptionCharacters
3000 - 303F IsCJKSymbolsandPunctuation
3040 - 309F IsHiragana
30A0 - 30FF IsKatakana
3100 - 312F IsBopomofo
3130 - 318F IsHangulCompatibilityJamo
3190 - 319F IsKanbun
31A0 - 31BF IsBopomofoExtended
31F0 - 31FF IsKatakanaPhoneticExtensions
3200 - 32FF IsEnclosedCJKLettersandMonths
3300 - 33FF IsCJKCompatibility
3400 - 4DBF IsCJKUnifiedIdeographsExtensionA
4DC0 - 4DFF IsYijingHexagramSymbols
4E00 - 9FFF IsCJKUnifiedIdeographs
A000 - A48F IsYiSyllables
A490 - A4CF IsYiRadicals
AC00 - D7AF IsHangulSyllables
D800 - DB7F IsHighSurrogates
DB80 - DBFF IsHighPrivateUseSurrogates
DC00 - DFFF IsLowSurrogates
E000 - F8FF IsPrivateUse or IsPrivateUseArea
F900 - FAFF IsCJKCompatibilityIdeographs
FB00 - FB4F IsAlphabeticPresentationForms
FB50 - FDFF IsArabicPresentationForms-A
FE00 - FE0F IsVariationSelectors
FE20 - FE2F IsCombiningHalfMarks
FE30 - FE4F IsCJKCompatibilityForms
FE50 - FE6F IsSmallFormVariants
FE70 - FEFF IsArabicPresentationForms-B
FF00 - FFEF IsHalfwidthandFullwidthForms
FFF0 - FFFF IsSpecials
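
As a brief illustration of how the block names in the table are used, the following sketch tests whether every character of a string falls inside one named block. The class name NamedBlockExample, the helper IsInBlock, and the sample strings are hypothetical; the block names themselves come from the table.

```csharp
using System;
using System.Text.RegularExpressions;

public class NamedBlockExample
{
   // Returns true if every character in text belongs to the named block.
   // blockName is expected to be one of the names in the table, such as "IsGreek".
   public static bool IsInBlock(string text, string blockName)
   {
      return Regex.IsMatch(text, @"^\p{" + blockName + @"}+$");
   }

   public static void Main()
   {
      string latin = "Omega";
      string greek = "\u03A9\u03BC\u03AD\u03B3\u03B1";      // Greek letters, 0370 - 03FF
      string cyrillic = "\u041E\u043C\u0435\u0433\u0430";   // Cyrillic letters, 0400 - 04FF

      Console.WriteLine("latin in IsBasicLatin: {0}", IsInBlock(latin, "IsBasicLatin"));
      Console.WriteLine("greek in IsGreek: {0}", IsInBlock(greek, "IsGreek"));
      Console.WriteLine("greek in IsGreekandCoptic: {0}", IsInBlock(greek, "IsGreekandCoptic"));
      Console.WriteLine("cyrillic in IsCyrillic: {0}", IsInBlock(cyrillic, "IsCyrillic"));
      Console.WriteLine("cyrillic in IsGreek: {0}", IsInBlock(cyrillic, "IsGreek"));
   }
}
// The example displays the following output:
//       latin in IsBasicLatin: True
//       greek in IsGreek: True
//       greek in IsGreekandCoptic: True
//       cyrillic in IsCyrillic: True
//       cyrillic in IsGreek: False
```

A named block is a code point range, not a script: a block can contain characters of several general categories, so combine it with a category class when only letters are wanted.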

Character class subtraction: [base_group - [excluded_group]]

A character class defines a set of characters. Character class subtraction yields a set of characters that is the result of excluding the characters in one character class from another character class.

A character class subtraction expression has the following form:

[ base_group -[ excluded_group ]]

The square brackets ([]) and hyphen (-) are mandatory. The base_group is a positive character group or a negative character group. The excluded_group component is another positive or negative character group, or another character class subtraction expression (that is, you can nest character class subtraction expressions).

For example, suppose you have a base group that consists of the character range from "a" through "z". To define the set of characters that consists of the base group except for the character "m", use [a-z-[m]]. To define the set of characters that consists of the base group except for the set of characters "d", "j", and "p", use [a-z-[djp]]. To define the set of characters that consists of the base group except for the character range from "m" through "p", use [a-z-[m-p]].
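
These three expressions can be checked with a short sketch that lists which lowercase letters each character class matches. The class name SimpleSubtractionExample and the helper MatchingLetters are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public class SimpleSubtractionExample
{
   // Returns the letters from "a" through "z" that the character class matches.
   public static string MatchingLetters(string characterClass)
   {
      Regex regex = new Regex("^" + characterClass + "$");
      StringBuilder letters = new StringBuilder();
      for (char ch = 'a'; ch <= 'z'; ch++)
         if (regex.IsMatch(ch.ToString()))
            letters.Append(ch);
      return letters.ToString();
   }

   public static void Main()
   {
      string[] classes = { "[a-z-[m]]", "[a-z-[djp]]", "[a-z-[m-p]]" };

      foreach (string characterClass in classes)
         Console.WriteLine("{0}: {1}", characterClass, MatchingLetters(characterClass));
   }
}
// The example displays the following output:
//       [a-z-[m]]: abcdefghijklnopqrstuvwxyz
//       [a-z-[djp]]: abcefghiklmnoqrstuvwxyz
//       [a-z-[m-p]]: abcdefghijklqrstuvwxyz
```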

Consider the nested character class subtraction expression, [a-z-[d-w-[m-o]]]. The expression is evaluated from the innermost character range outward. First, the character range from "m" through "o" is subtracted from the character range "d" through "w", which yields the set of characters from "d" through "l" and "p" through "w". That set is then subtracted from the character range from "a" through "z", which yields the set of characters [abcmnoxyz].
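
The two evaluation steps can be reproduced with the following sketch, which applies first the inner expression and then the full nested expression to the lowercase alphabet. The class name NestedSubtractionExample and the helper Filter are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public class NestedSubtractionExample
{
   // Concatenates every character of input that the character class matches.
   public static string Filter(string input, string characterClass)
   {
      StringBuilder kept = new StringBuilder();
      foreach (Match match in Regex.Matches(input, characterClass))
         kept.Append(match.Value);
      return kept.ToString();
   }

   public static void Main()
   {
      string alphabet = "abcdefghijklmnopqrstuvwxyz";

      // Inner step: "d" through "w" without "m" through "o".
      Console.WriteLine(Filter(alphabet, "[d-w-[m-o]]"));
      // Full expression: "a" through "z" without the inner set.
      Console.WriteLine(Filter(alphabet, "[a-z-[d-w-[m-o]]]"));
   }
}
// The example displays the following output:
//       defghijklpqrstuvw
//       abcmnoxyz
```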

You can use any character class with character class subtraction. To define the set of characters that consists of all Unicode characters from \u0000 through \uFFFF except white-space characters (\s), the characters in the punctuation general category (\p{P}), the characters in the IsGreek named block (\p{IsGreek}), and the Unicode NEXT LINE control character (\x85), use [\u0000-\uFFFF-[\s\p{P}\p{IsGreek}\x85]].
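
As a rough illustration of this expression, the following sketch keeps only the characters of a string that survive the subtraction. The class name MixedClassSubtractionExample, the helper KeepAllowed, and the sample input are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public class MixedClassSubtractionExample
{
   // Keeps the characters that are not white space, punctuation,
   // in the IsGreek block, or the NEXT LINE control character.
   public static string KeepAllowed(string input)
   {
      string pattern = @"[\u0000-\uFFFF-[\s\p{P}\p{IsGreek}\x85]]";
      StringBuilder kept = new StringBuilder();
      foreach (Match match in Regex.Matches(input, pattern))
         kept.Append(match.Value);
      return kept.ToString();
   }

   public static void Main()
   {
      // Contains a comma, spaces, GREEK CAPITAL LETTER OMEGA, and "!".
      string input = "Hi, \u03A9 ok!";
      Console.WriteLine(KeepAllowed(input));
   }
}
// The example displays the following output:
//       Hiok
```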

Choose character classes for a character class subtraction expression that will yield useful results. Avoid an expression that yields an empty set of characters, which cannot match anything, or an expression that is equivalent to the original base group. For example, the empty set is the result of the expression [\p{IsBasicLatin}-[\x00-\x7F]], which subtracts all characters in the IsBasicLatin character range from the IsBasicLatin named block. Similarly, the original base group is the result of the expression [a-z-[0-9]]. This is because the base group, which is the character range of letters from "a" through "z", does not contain any characters in the excluded group, which is the character range of decimal digits from "0" through "9".
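
One way to spot such degenerate expressions is to count how many characters a class matches. The sketch below is illustrative only: the class name DegenerateSubtractionExample and the helper CountMatchingChars are hypothetical, and the brute-force loop over all 65,536 UTF-16 code units is meant for experimentation, not production use.

```csharp
using System;
using System.Text.RegularExpressions;

public class DegenerateSubtractionExample
{
   // Counts how many of the UTF-16 code units from \u0000 through \uFFFF
   // the character class matches.
   public static int CountMatchingChars(string characterClass)
   {
      Regex regex = new Regex(characterClass);
      int count = 0;
      for (int code = 0; code <= 0xFFFF; code++)
         if (regex.IsMatch(((char) code).ToString()))
            count++;
      return count;
   }

   public static void Main()
   {
      // Empty set: nothing is left after the subtraction.
      Console.WriteLine(CountMatchingChars(@"[\p{IsBasicLatin}-[\x00-\x7F]]"));
      // Same size as the base group: the subtraction removes nothing.
      Console.WriteLine(CountMatchingChars("[a-z-[0-9]]"));
      Console.WriteLine(CountMatchingChars("[a-z]"));
   }
}
// The example displays the following output:
//       0
//       26
//       26
```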

The following example defines a regular expression, ^[0-9-[2468]]+$, that matches an input string consisting only of zero and odd digits. The regular expression is interpreted as shown in the following table.

Element Description
^ Begin the match at the start of the input string.
[0-9-[2468]]+ Match one or more occurrences of any character from 0 to 9 except for 2, 4, 6, and 8. In other words, match one or more occurrences of zero or an odd digit.
$ End the match at the end of the input string.
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string[] inputs = { "123", "13579753", "3557798", "335599901" };
      string pattern = @"^[0-9-[2468]]+$";
      
      foreach (string input in inputs)
      {
         Match match = Regex.Match(input, pattern);
         if (match.Success) 
            Console.WriteLine(match.Value);
      }      
   }
}
// The example displays the following output:
//       13579753
//       335599901
Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim inputs() As String = { "123", "13579753", "3557798", "335599901" }
      Dim pattern As String = "^[0-9-[2468]]+$"
      
      For Each input As String In inputs
         Dim match As Match = Regex.Match(input, pattern)
         If match.Success Then Console.WriteLine(match.Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       13579753
'       335599901

See also