The Microsoft code name "M" Modeling Language Specification - Rules

Microsoft Corporation

November 2009

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

[This documentation targets the Microsoft SQL Server Modeling CTP (November 2009) and is subject to change in future releases. Blank topics are included as placeholders.]

1: Introduction to "M"
2: Lexical Structure
3: Text Pattern Expressions
4: Productions
5: Rules
6: Languages
7: Types
8: Computed and Stored Values
9: Expressions
10: Module
11: Attributes
12: Catalog
13: Standard Library
14: Glossary

5 Rules

A rule is a named collection of alternative productions. There are three kinds of rules: syntax, token, and interleave. A text value conforms to a rule if it conforms to any one of the productions in the rule. If a text value conforms to more than one production in the rule, then the rule is ambiguous. The three different kinds of rules differ in how they treat ambiguity and how they handle their output.

syntax RuleDeclaration
    = Attributes? MemberModifier? Kind Name RuleParameters? RuleBody ";";

syntax Kind
    = "token"
    | "syntax"
    | "interleave";

syntax MemberModifier
    = "final";

syntax RuleBody
    = "="  ProductionDeclarations;

syntax ProductionDeclarations
    = ProductionDeclaration
    | ProductionDeclarations  "|"  ProductionDeclaration;

The rule Main below recognizes the two text values "Hello" and "Goodbye".

module HelloGoodbye {
    language HelloGoodbye {
        syntax Main
          = "Hello"
          | "Goodbye";
    }
}
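
The conformance and ambiguity definitions above can be sketched in a few lines of Python. This is only an illustrative model, not how an M language processor is implemented: productions are represented as plain predicates over a text value, and the names literal, matching_productions, conforms, and is_ambiguous_for are hypothetical helpers introduced for this sketch.

```python
from typing import Callable, List, Sequence

# Assumption for illustration: a production is modeled as a predicate over a
# text value. Real M productions are patterns, not Python functions.
Production = Callable[[str], bool]


def literal(text: str) -> Production:
    """Production that conforms only to the exact text given."""
    return lambda value: value == text


def matching_productions(rule: Sequence[Production], value: str) -> List[int]:
    """Indices of the productions in `rule` that `value` conforms to."""
    return [i for i, production in enumerate(rule) if production(value)]


def conforms(rule: Sequence[Production], value: str) -> bool:
    """A text value conforms to a rule if it conforms to any one production."""
    return len(matching_productions(rule, value)) >= 1


def is_ambiguous_for(rule: Sequence[Production], value: str) -> bool:
    """The rule is ambiguous if the value conforms to more than one production."""
    return len(matching_productions(rule, value)) > 1


# Model of: syntax Main = "Hello" | "Goodbye";
main = [literal("Hello"), literal("Goodbye")]

print(conforms(main, "Hello"))          # True
print(conforms(main, "Hi"))             # False
print(is_ambiguous_for(main, "Hello"))  # False
```

In this model, a rule that lists the same literal twice would be reported as ambiguous for that literal, which mirrors the definition of ambiguity given at the start of this chapter.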



5.1 Token Rules

Token rules recognize a restricted family of languages. However, token rules can be negated, intersected, and subtracted, which is not the case for syntax rules. Attempting to perform these operations on a syntax rule results in an error. The output from a token rule is the text matched by the token. No constructor may be defined for a token rule.
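
To give a rough feel for negation, intersection, and subtraction, the Python sketch below models a token as the set of single characters it accepts. That is a strong simplifying assumption made only for illustration. It does not use M's operator syntax, and the names ALPHABET, LETTER, VOWEL, and match are hypothetical.

```python
import string
from typing import Optional, Set

# Assumption: each token is modeled as the set of single characters it accepts.
ALPHABET: Set[str] = set(string.printable)
LETTER: Set[str] = set(string.ascii_letters)
VOWEL: Set[str] = set("aeiouAEIOU")

NOT_LETTER = ALPHABET - LETTER   # negation, relative to the modeled alphabet
VOWEL_LETTER = LETTER & VOWEL    # intersection
CONSONANT = LETTER - VOWEL       # subtraction


def match(token_chars: Set[str], text: str) -> Optional[str]:
    """Return the leading run of characters accepted by the token, or None.

    The return value is the matched text itself, mirroring the statement that
    the output of a token rule is the text matched by the token.
    """
    end = 0
    while end < len(text) and text[end] in token_chars:
        end += 1
    return text[:end] if end > 0 else None


print(match(CONSONANT, "strength"))  # str
print(match(NOT_LETTER, "42abc"))    # 42
print(match(CONSONANT, "apple"))     # None
```

Set operations work here because the modeled token languages are simple enough to be closed under them. The sketch makes no claim about how M evaluates these operations internally.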

5.1.1 Final Modifier

Token rules do not permit precedence directives in the rule body. Instead, they have a built-in protocol for resolving ambiguous productions. A language processor attempts to match every token in the language against a text value, starting with the first character, then the first two characters, and so on. If two or more productions, whether within the same token rule or in different token rules, can match the beginning of the text value, the language processor chooses the production with the longest match. If all matches are exactly the same length, the language processor chooses a token rule marked final, if one is present. If no token rule is marked final, all of the matches succeed, and the language processor evaluates whether each alternative is recognized in a larger context. The language processor retains all of the matches and then attempts to match a new token starting with the first character that has not already been matched.
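
The disambiguation protocol described above (longest match first, then the final modifier as a tie-break, otherwise keep every candidate) can be sketched in Python as follows. This is a minimal model under stated assumptions, not the actual language processor: each token rule is approximated by a Python regular expression, and TokenRule and candidates_at are hypothetical names introduced for the sketch.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TokenRule:
    name: str
    pattern: str         # assumption: a Python regex stands in for the productions
    final: bool = False  # models the "final" member modifier


def candidates_at(rules: List[TokenRule], text: str, pos: int) -> List[Tuple[str, int]]:
    """Return the (rule name, match length) pairs that survive at `pos`.

    The patterns used here are simple enough that Python's greedy matching
    yields the longest match for each rule; that is not true of every regex.
    """
    matches = []
    for rule in rules:
        m = re.compile(rule.pattern).match(text, pos)
        if m and m.end() > pos:
            matches.append((rule, m.end() - pos))
    if not matches:
        return []
    # Step 1: keep only the longest matches.
    longest = max(length for _, length in matches)
    best = [(rule, length) for rule, length in matches if length == longest]
    # Step 2: among equally long matches, prefer rules marked final.
    finals = [(rule, length) for rule, length in best if rule.final]
    chosen = finals if finals else best
    # Step 3: if no rule is final, every equally long match is retained.
    return [(rule.name, length) for rule, length in chosen]


rules = [
    TokenRule("Identifier", r"[a-z]+"),
    TokenRule("If", r"if", final=True),
]

print(candidates_at(rules, "iffy", 0))  # [('Identifier', 4)]
print(candidates_at(rules, "if x", 0))  # [('If', 2)]
```

With the final flag removed from If, both candidates of length 2 are returned for "if x", which corresponds to the case where the processor must decide between the alternatives using the larger context.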

5.2 Syntax Rules

Syntax rules recognize all languages that M is capable of defining. The Main start rule must be a syntax rule. Syntax rules allow all precedence directives and may have constructors.

5.3 Interleave Rules

An interleave rule recognizes the same family of languages as a token rule and also cannot have constructors. Further, interleave rules cannot have parameters, and the name of an interleave rule cannot be referenced.

Text that matches an interleave rule is excluded from further processing.

The following example demonstrates whitespace handling with an interleave rule:

module HelloWorld {
    language HelloWorld {
        syntax Main
          = Hello World;

        token Hello
          = "Hello";

        token World
          = "World";

        interleave Whitespace
          = " ";
    }
}



This language recognizes the text value "Hello World". It also recognizes "Hello    World", "    Hello World", "Hello World    ", and "HelloWorld". It does not recognize "He llo World" because "He" does not match any token.
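
The behavior of the HelloWorld language above can be approximated with a small Python tokenizer that discards interleave text before checking the token sequence. This is a sketch under simplifying assumptions (literal tokens only, a single-space interleave), and TOKENS, INTERLEAVE, tokenize, and recognizes are hypothetical names.

```python
from typing import List, Optional

TOKENS = {"Hello": "Hello", "World": "World"}  # token name -> literal text
INTERLEAVE = " "                               # models: interleave Whitespace = " ";


def tokenize(text: str) -> Optional[List[str]]:
    """Return the token names found in `text`, or None if some text matches no token."""
    names: List[str] = []
    pos = 0
    while pos < len(text):
        if text.startswith(INTERLEAVE, pos):
            pos += len(INTERLEAVE)  # interleave text is excluded from further processing
            continue
        for name, literal in TOKENS.items():
            if text.startswith(literal, pos):
                names.append(name)
                pos += len(literal)
                break
        else:
            return None  # no token matches at this position
    return names


def recognizes(text: str) -> bool:
    """Models: syntax Main = Hello World;"""
    return tokenize(text) == ["Hello", "World"]


print(recognizes("Hello    World"))  # True
print(recognizes("HelloWorld"))      # True
print(recognizes("He llo World"))    # False
```

Because whitespace is dropped before the token sequence is compared, it is optional everywhere, which is why "HelloWorld" is accepted in this model just as in the language above.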

5.4 Inline Rules

An inline rule is an anonymous rule embedded within the pattern of a production. An inline rule is processed like any other rule; however, it cannot be reused because it does not have a name. Variables defined within an inline rule are scoped to their productions as usual. A variable may be bound to the output of an inline rule, as with any pattern.

In the following example, Example1 and Example2 recognize the same language and produce the same output. Example1 uses a named rule, AppleOrOrange, while Example2 states the same rule inline.

module Example {
    language Example1 {
        syntax Main
          = aos:AppleOrOrange*
            => aos;

        syntax AppleOrOrange
          = "Apple" => Apple{}
          | "Orange" => Orange{};
    }

    language Example2 {
        syntax Main
          = aos:("Apple" => Apple{} | "Orange" => Orange{})*
            => aos;
    }
}



5.5 Rule Parameters

A rule may define parameters which can be used within the body of the rule.

syntax RuleParameters
    = "(" RuleParameterList ")";

syntax RuleParameterList
    = RuleParameter
    | RuleParameterList "," RuleParameter;

syntax RuleParameter
    = Identifier;

A single rule identifier may have multiple definitions with different numbers of parameters. The following example uses List(Content,Separator) to define List(Content) with a default separator of ",".

module HelloWorld {
    language HelloWorld {
        syntax Main
          = List(Hello);

        token Hello
          = "Hello";

        syntax List(Content, Separator)
          = Content
          | List(Content,Separator) Separator Content;

        syntax List(Content) = List(Content, ",");
    }
}



This language will recognize "Hello", "Hello,Hello", "Hello,Hello,Hello", etc.
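
One way to think about parameterized rules is as functions that build recognizers from other recognizers. The Python sketch below models List(Content, Separator) and its one-parameter form in that style. It is an analogy rather than M's implementation: the left-recursive production is rewritten as a loop, and literal, list_of, and conforms are hypothetical names.

```python
from typing import Callable, Optional

# A recognizer takes (text, pos) and returns the position after a match, or None.
Recognizer = Callable[[str, int], Optional[int]]


def literal(expected: str) -> Recognizer:
    """Recognizer for a fixed piece of text."""
    def recognize(text: str, pos: int) -> Optional[int]:
        return pos + len(expected) if text.startswith(expected, pos) else None
    return recognize


def list_of(content: Recognizer, separator: Optional[Recognizer] = None) -> Recognizer:
    """Models List(Content, Separator); omitting `separator` models List(Content)."""
    if separator is None:
        separator = literal(",")  # default separator, as in List(Content)

    def recognize(text: str, pos: int) -> Optional[int]:
        end = content(text, pos)
        if end is None:
            return None
        # Loop form of: List(Content,Separator) Separator Content
        while True:
            after_separator = separator(text, end)
            if after_separator is None:
                return end
            after_item = content(text, after_separator)
            if after_item is None:
                return end  # trailing separator is not consumed
            end = after_item

    return recognize


def conforms(rule: Recognizer, text: str) -> bool:
    """True if the rule matches the whole text value."""
    return rule(text, 0) == len(text)


hello = literal("Hello")
main = list_of(hello)  # models: syntax Main = List(Hello);

print(conforms(main, "Hello,Hello,Hello"))  # True
print(conforms(main, "Hello,"))             # False
```

Passing a second argument, for example list_of(hello, literal(";")), corresponds to referencing the two-parameter definition directly with a different separator.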