The Microsoft code name "M" Modeling Language Specification - Text Pattern Expressions

November 2009

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

[This documentation targets the Microsoft SQL Server Modeling CTP (November 2009) and is subject to change in future releases. Blank topics are included as placeholders.]

1: Introduction to "M"
2: Lexical Structure
3: Text Pattern Expressions
4: Productions
5: Rules
6: Languages
7: Types
8: Computed and Stored Values
9: Expressions
10: Module
11: Attributes
12: Catalog
13: Standard Library
14: Glossary

3 Text Pattern Expressions

Text pattern expressions perform operations on the sets of possible text values that one or more terms recognize.

3.1 Primary Expressions

A primary expression can be:

  • A text literal
  • A reference to a syntax or token rule
  • A primary expression repeated a specified number of times
  • An inline rule that groups a sequence of pattern declarations
  • A character class that matches any character in a contiguous range
  • The any wildcard, which matches any single character

The following grammar reflects this structure.

syntax SyntaxPrimaries
    = TextLiteral
    | ReferencePrimary
    | RepetitionPrimary
    | InlineRulePrimary
    | CharacterClassPrimary
    | AnyPrimary;

3.1.1 Character Class

A character class is a compact syntax for a contiguous range of characters. This expression requires that each text literal be exactly one character long and that the Unicode offset of the right operand be greater than that of the left.

syntax CharacterClassPrimary
    = TextLiteral ".." TextLiteral;

The expression "0".."9" is equivalent to:

"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

3.1.2 References

A grammar reference is the name of another rule, possibly followed by arguments if that rule is parameterized. All rules defined within the same language can be accessed without qualification. The protocol for accessing rules defined in a different language within the same module is defined in §6.2; the protocol for accessing rules defined in a different module is defined in §10.3.

syntax GrammarReference
    = Identifier
    | GrammarReference "." Identifier
    | GrammarReference "." Identifier "(" TypeArguments ")"
    | Identifier "(" TypeArguments ")";

syntax TypeArguments
    = PrimaryExpression
    | TypeArguments "," PrimaryExpression;

Whitespace between a rule name and its argument list is significant: it distinguishes a reference to a parameterized rule from a reference to a rule without parameters followed by an inline rule. In a reference to a parameterized rule, no whitespace is permitted between the identifier and the opening parenthesis of the argument list.
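The effect of this rule can be sketched as a check on the characters that immediately follow a rule name. This Python sketch is illustrative only: classify_reference is a hypothetical helper, the identifier pattern is a simplified ASCII approximation rather than the "M" Identifier token, and List and Item are made-up rule names.

```python
import re

# Simplified ASCII identifier (assumption); the real "M" Identifier token is broader.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_reference(text: str) -> str:
    """Report how a reference at the start of text would be read."""
    ident = IDENTIFIER.match(text)
    if ident is None:
        raise ValueError("text does not start with an identifier")
    rest = text[ident.end():]
    if rest.startswith("("):
        # "(" immediately after the name: arguments of a parameterized rule.
        return "parameterized reference"
    if rest.lstrip().startswith("("):
        # Whitespace before "(": a plain reference, then a separate inline rule.
        return "reference followed by inline rule"
    return "reference without parameters"


assert classify_reference("List(Item)") == "parameterized reference"
assert classify_reference("List (Item)") == "reference followed by inline rule"
```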

3.1.3 Repetition operators

The repetition operators recognize a primary expression repeated a specified number of times. The number of repetitions can be stated as a (possibly open-ended) integer range or with one of the Kleene operators ?, +, and *.

syntax RepetitionPrimary
    = Primary Range
    | Primary CollectionRanges;

syntax Range
    = "?"
    | "*"
    | "+";

syntax CollectionRanges
    = "#" IntegerLiteral
    | "#" IntegerLiteral ".." IntegerLiteral?;

The left operand of .. must be greater than or equal to zero and, if a right operand is present, less than the right operand.

"A"#5  recognizes exactly 5 "A"s  "AAAAA"

"A"#2..4      recognizes from 2 to 4 "A"s"AA", "AAA", "AAAA"

"A"#3..       recognizes 3 or more "A"s  "AAA", "AAAA", "AAAAA", . . .

The Kleene operators can be defined in terms of the collection range operator:

"A"? is equivalent to "A"#0..1

"A"+ is equivalent to "A"1..

"A"* is equivalent to "A"#0..

3.1.4 Inline Rules

An inline rule groups pattern declarations together into a single term.

syntax InlineRulePrimary
    = "(" ProductionDeclarations ")";

An inline rule is typically used in conjunction with a repetition operator:

"A" ("," "A")* recognizes 1 or more "A"s separated by commas.

Variable bindings within inline rules are syntactically legal, but they are not accessible within the constructor of the containing production. Inline rules are described further in §5.4.

3.1.5 Any

The any term is a wildcard that matches any text value of length 1.

syntax AnyPrimary
    = "any";

"1", "z", and "*" all match any.

3.2 Term Operators

A primary term expression can be thought of as the set of possible text values that it recognizes. The term operators perform the standard set difference, intersection, and negation operations on these sets. (Pattern declarations perform the union operation with |.)

syntax TextPatternExpression
    = Difference;

syntax Difference
    = Intersect
    | Difference "-" Intersect;

syntax Intersect
    = Inverse
    | Intersect "&" Inverse;

syntax Inverse
    = Primary
    | "!" Primary;

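The inverse operator (!) requires every value in its operand's set of possible text values to be of length 1. It recognizes any text value of length 1 that the operand does not recognize.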

("11" | "12") - ("12" | "13") // recognizes "11"

("11" | "12") & ("12" | "13") // recognizes "12"

!("11" | "12") // is an error

!("1" | "2")   // recognizes any text value of length 1 other than "1" or "2"
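Because these operators are ordinary set operations, they can be sketched with Python frozensets for finite terms. The names alternatives, difference, intersect, and inverse are hypothetical. Inverse is returned as a predicate rather than a set, on the assumption that its result ranges over every single character the operand does not recognize, which is too large to enumerate.

```python
from typing import Callable, FrozenSet

Term = FrozenSet[str]


def alternatives(*values: str) -> Term:
    """Model ("v1" | "v2" | ...) as the finite set of values it recognizes."""
    return frozenset(values)


def difference(left: Term, right: Term) -> Term:
    # left - right: values recognized by left but not by right.
    return left - right


def intersect(left: Term, right: Term) -> Term:
    # left & right: values recognized by both operands.
    return left & right


def inverse(term: Term) -> Callable[[str], bool]:
    """Model !term as a predicate over text values."""
    if any(len(value) != 1 for value in term):
        raise ValueError("inverse requires every value to be of length 1")
    # Any single character that the operand does not recognize.
    return lambda text: len(text) == 1 and text not in term


assert difference(alternatives("11", "12"), alternatives("12", "13")) == {"11"}
assert intersect(alternatives("11", "12"), alternatives("12", "13")) == {"12"}
```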