The Microsoft code name "M" Modeling Language Specification - Introduction

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

 

Microsoft Corporation

November 2009

[This documentation targets the Microsoft SQL Server Modeling CTP (November 2009) and is subject to change in future releases. Blank topics are included as placeholders.]

Sections:
1: Introduction to "M"
2: Lexical Structure
3: Text Pattern Expressions
4: Productions
5: Rules
6: Languages
7: Types
8: Computed and Stored Values
9: Expressions
10: Module
11: Attributes
12: Catalog
13: Standard Library
14: Glossary

The Microsoft code name "M" Modeling Language, hereinafter referred to as M, is a language for modeling domains using text. A domain is any collection of related concepts or objects. Modeling a domain consists of selecting certain characteristics to include in the model and implicitly excluding others deemed irrelevant. Modeling using text has some advantages and disadvantages over modeling using other media such as diagrams or clay. A goal of the M language is to exploit these advantages and mitigate the disadvantages.

A key advantage of modeling in text is the ease with which both computers and humans can store and process text. Text is often the most natural way to represent information for presentation and editing by people. However, the ability to extract that information for use by software has been an arcane art practiced only by the most advanced developers. The language feature of M enables information to be represented in a textual form that is tuned for both the problem domain and the target audience. The M language provides simple constructs for describing the shape of a textual language – that shape includes the input syntax as well as the structure and contents of the underlying information. To that end, M acts both as a schema language that can validate that textual input conforms to a given language and as a transformation language that projects textual input into data structures that are amenable to further processing or storage.

M builds on four basic concepts:

1. Language – a collection of rules that recognize free text and produce a structured representation of relevant concepts in the text.

2. Data – a sparse textual representation of information amenable to automated storage, transformation and communication.

3. Constraint – a rule that recognizes specific structure and relationships within data.

4. Transformation – a mapping between source data and result data.

1.1 Expressions

The easiest way to get started with M is to look at some values. M has intrinsic support for constructing values. The following is a legal value in M:

"Hello, world"

The quotation marks tell M that this is the text value Hello, world. M literals can also be numbers. The following literal:

1

is the numeric value one. Finally, there are two literals that represent logical values:

true

false

We’ve just seen examples of using literals to write down textual, numeric, and logical values. We can also use expressions to write down values that are computed.

An M expression applies an operator to zero or more operands to produce a result. An operator is either a built-in operator (e.g., +) or a user-defined function (which we’ll look at in a later section). An operand is a value that is used by the operator to calculate the result of the expression, which is itself a value. Expressions nest, so the operands themselves can be expressions.

We'll write the result of evaluating an expression as expression ⇓ result. The down arrow is not part of the M language. It is used for documentation purposes only. Here are some examples:

1 ⇓ 1              // A literal evaluates to itself

1 + 1 ⇓ 2          // Math works as usual

1 + "a" ⇓ error    // Some expressions result in an error

M defines two equality operators: equals, ==, and not equals, !=, both of which result in either true or false based on the equivalence/nonequivalence of the two operands. Here are some expressions that use the equality operators:

1 == 1 ⇓ true

"Hello" == "hELLO" ⇓ false

true != false ⇓ true

M defines the standard four relational operators less-than <, greater-than >, less-than-or-equal <=, and greater-than-or-equal >=, which work over numeric and textual values. M also defines the standard three logical operators: and &&, or ||, and not ! that combine logical values.

The following expressions show these operators in action:

1 < 4 ⇓ true

1 == 1 ⇓ true

1 > 4 ⇓ false

1 + 1 == 3 ⇓ false

(1 + 1 == 3) || (2 + 2 < 10) ⇓ true

(1 + 1 == 2) && (2 + 2 < 10) ⇓ true

1.1.1 Collections

An M collection is a value that groups together zero or more elements which themselves are values. We can write down a collection using { }:

{ 1, 2 }

{ 1 }

{ }

As with simple values, the equivalence operators == and != are defined over collections. In M, two collections are considered equivalent if and only if each element has a distinct equivalent element in the other collection. That allows us to write the following equivalence expressions:

{ 1, 2 } == { 1, 2 }

{ 1, 2 } != { 1 }

both of which are true.

The elements of a collection can consist of different kinds of values:

{ true, "Hello" }

and these values can be the result of arbitrary calculation:

{ 1 + 2, 99 - 3, 4 < 9 }

which is equivalent to the collection:

{ 3, 96, true }

The order of elements in a collection is not significant.

{ 1, 2 } == { 2, 1 } ⇓ true

Finally, collections can contain duplicate elements, which are significant.

{ 1, 2, 2 } == { 1, 2 } ⇓ false
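The two rules above (order is ignored, duplicates count) are the rules of a multiset. As a rough illustration outside of M, the following Python sketch models that comparison with collections.Counter. The function name collections_equal is hypothetical, and the sketch assumes the elements are hashable; it is not how an M implementation is required to work.

```python
from collections import Counter

def collections_equal(a, b):
    """Compare two M-style collections as multisets.

    Element order is ignored, but the number of times each element
    occurs must agree on both sides.
    """
    return Counter(a) == Counter(b)

print(collections_equal([1, 2], [2, 1]))     # True: order is not significant
print(collections_equal([1, 2, 2], [1, 2]))  # False: duplicates are significant
```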

M defines a set of built-in operators that are specific to collections. The in operator tests whether a given value is an element of the collection. The result of the in operator is a logical value that indicates whether the value is or is not an element of the collection.

1 in { 1, 2, 3 } ⇓ true

1 in { "Hello", 9 } ⇓ false

M defines a Count member on collections that calculates the number of elements in a collection.

{ 1, 2, 2, 3 }.Count ⇓ 4

The postfix # operator also returns the count of a collection:

{ 1, 2, 2, 3 }# ⇓ 4

As noted earlier, M collections may contain duplicates. You can apply the Distinct member to get a version of the collection with any duplicates removed:

{ 1, 2, 3, 1 }.Distinct ⇓ { 1, 2, 3 }

The result of Distinct is not just a collection but is also a set, i.e. a collection of distinct elements.

M also defines set union "|" and set intersection "&" operators, which also yield sets:

({ 1, 2, 3, 1 } | { 1, 2, 4 }) == { 1, 2, 3, 4 }

({ 1, 2, 3, 1 } & { 1, 2, 4 }) == { 1, 2 }

Note that union and intersection always return collections that are sets, even when applied to collections that contain duplicates.

M defines the subset and superset operators using <= and >=. Again, these operations convert collections to sets. The following expressions all evaluate to true:

{ 1, 2 } <= { 1, 2, 3 } ⇓ true

{ "Hello", "World" } >= { "World" } ⇓ true

{ 1, 2, 1 } <= { 1, 2, 3 } ⇓ true
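For readers more familiar with general-purpose languages, the set-valued operations described above behave much like Python's built-in set type. The sketch below is only an analogy: it represents M collections as Python lists (which keep duplicates) and converts them to sets before applying each operator, mirroring the conversion the text describes. The variable names are illustrative.

```python
# M collections may hold duplicates, so plain lists stand in for them here.
a = [1, 2, 3, 1]
b = [1, 2, 4]

distinct_a = set(a)               # like { 1, 2, 3, 1 }.Distinct
union = set(a) | set(b)           # like { 1, 2, 3, 1 } | { 1, 2, 4 }
intersection = set(a) & set(b)    # like { 1, 2, 3, 1 } & { 1, 2, 4 }

# Subset and superset tests also ignore duplicates once converted to sets.
is_subset = set([1, 2, 1]) <= set([1, 2, 3])
is_superset = set(["Hello", "World"]) >= set(["World"])
```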

The where operator applies a logical expression (called the predicate) to each element in a collection (called the domain) and results in a new collection that consists of only the elements for which the predicate holds true. To allow the element to be used in the predicate, the where operator introduces the symbol value to stand in for the specific element being tested.

For example, consider this expression that uses a where operator:

{ 1, 2, 3, 4, 5, 6 } where value > 3 ⇓ { 4, 5, 6 }

In this example, the domain is the collection { 1, 2, 3, 4, 5, 6 } and the predicate is the expression value > 3. Note that the identifier value is available only within the scope of the predicate expression. The result of this expression is the collection { 4, 5, 6 }.
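The where operator corresponds closely to a filter in other languages. The following Python sketch is an analogy only: the function name where is hypothetical, and the lambda parameter is named value to echo the symbol that M introduces inside the predicate.

```python
def where(domain, predicate):
    """Keep only the elements of domain for which predicate holds."""
    return [value for value in domain if predicate(value)]

# Mirrors: { 1, 2, 3, 4, 5, 6 } where value > 3
result = where([1, 2, 3, 4, 5, 6], lambda value: value > 3)
print(result)  # [4, 5, 6]
```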

While the where operator allows elements to be accessed based on a calculation over the values of each element, there are situations where it would be much more convenient to simply assign names to each element and then access the element values by their assigned names. M defines a distinct kind of value called an entity for just this purpose.

1.1.2 Lists

Basic collections are unordered. Ordered collections are represented by lists. The following values are lists:

[1, 2, 3]

["hello", 1, true]

[]

Elements in a list have a distinct position. Two lists with the same elements but in different positions are not equal:

[1,2,3] == [3,2,1] ⇓ false

Ordered collections are collections so members and operators defined on collections work on lists as well. Where possible the order of the collection is maintained:

[1, 2, 3].Count ⇓ 3

[1, 2, 3, 4, 5, 6, 7, 8, 9] where value > 5 ⇓ [6, 7, 8, 9]

[1, 2, 3] select value * value ⇓ [1, 4, 9]

1.1.3 Entities

An entity consists of zero or more name-value pairs called fields. Here’s a simple entity value:

{ X => 100, Y => 200 }

This entity has two fields: one named X with the value of 100, the other named Y with the value of 200.

Entity initializers can use arbitrary expressions as field values:

{ X => 50 + 50, Y => 300 - 100 }

And the names of members can be arbitrary Unicode text:

{ @[Horizontal Coordinate] => 100, @[Vertical Coordinate] => 200 }

If the member name matches the Identifier pattern, it can be written without the surrounding @[]. An identifier must begin with an upper or lowercase letter or "_" and be followed by a sequence of letters, digits, "_", and "$".

Here are a few examples:

HelloWorld => 1      // matches the Identifier pattern

@[Hello World] => 1  // doesn’t match identifier pattern – escape it

_HelloWorld => 1     // matches the Identifier pattern

A => 1               // matches the Identifier pattern

@[1] => 1            // doesn’t match identifier pattern – escape it

It is always legal to use @[] to escape symbolic names; however, most of the examples in this document use names that don’t require escaping and, for readability, do not use the escaping syntax.

M imposes no limitations on the values of entity members. It is legal for the value of an entity member to refer to another entity:

{ TopLeft { X => 100, Y => 200 }, BottomRight { X => 400, Y => 100 } }

or a collection:

{ LotteryPicks { 1, 18, 25, 32, 55, 61 }, Odds => 0.00000001 }

or a collection of entities:

{

  Color => "Red",

  Path {

    { X => 100, Y => 100 },

    { X => 200, Y => 200 },

    { X => 300, Y => 100 },

    { X => 300, Y => 100 },

  }

}

This last example illustrates that entity values are legal for use as elements in collections.

Entity initializers are useful for constructing new entity values. M defines the dot, ".", operator over entities for accessing the value of a given member. For example, this expression:

{ X => 100, Y => 200 }.X

yields the value of the X member, which in this case is 100. The result of the dot operator is just a value that is subject to subsequent operations. For example, this expression:

{ Center { X => 100, Y => 200 }, Radius => 3 }.Center.Y

yields the value 200.

1.2 Language

1.2.1 Basics

An M language definition consists of one or more named rules, each of which describes some part of the language. The following fragment is a simple language definition:

language HelloLanguage {

  syntax Main = "Hello, World";

}

The language being specified is named HelloLanguage and it is described by one rule named Main. A language may contain more than one rule; the name Main is used to designate the initial rule that all input documents must match in order to be considered valid with respect to the language.

Rules use patterns to describe the set of input values that the rule applies to. The Main rule above has only one pattern, "Hello, World" that describes exactly one legal input value:

Hello, World

If that input is fed to the M processor for this language, the processor will report that the input is valid. Any other input will cause the processor to report the input as invalid.

Typically, a rule will use multiple patterns to describe alternative input formats that are logically related. For example, consider this language:

language PrimaryColors {

  syntax Main = "Red" | "Green" | "Blue";

}

The Main rule has three patterns – input must conform to one of these patterns in order for the rule to apply. That means that the following is valid:

Red

as well as this:

Green

and this:

Blue

No other input values are valid in this language.
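A rule made only of literal alternatives behaves much like a regular expression alternation that must match the entire input. The Python sketch below is an approximation for illustration, not a description of how the M processor works; the names PRIMARY_COLORS and is_valid are hypothetical.

```python
import re

# Approximates: syntax Main = "Red" | "Green" | "Blue";
PRIMARY_COLORS = re.compile(r"Red|Green|Blue")

def is_valid(text):
    """Return True when the whole input matches one of the alternatives."""
    return PRIMARY_COLORS.fullmatch(text) is not None

print(is_valid("Green"))   # True
print(is_valid("Purple"))  # False
```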

Most patterns in the wild are more expressive than those we’ve seen so far – most patterns combine multiple terms. Every pattern consists of a sequence of one or more grammar terms, each of which describes a set of legal text values. Pattern matching has the effect of consuming the input as it sequentially matches the terms in the pattern. Each term in the pattern consumes zero or more initial characters of input – the remainder of the input is then matched against the next term in the pattern. If the terms in a pattern cannot all be matched, the consumption is “undone” and the original input will be used as a candidate for matching against other patterns within the rule.

A pattern term can either specify a literal value (like in our first example) or the name of another rule. The following language definition matches the same input as the first example:

language HelloLanguage2 {

  syntax Main = Prefix ", " Suffix;

  syntax Prefix = "Hello";

  syntax Suffix = "World";

}

Like functions in a traditional programming language, rules can be declared to accept parameters. A parameterized rule declares one or more “holes” that must be specified to use the rule. The following is a parameterized rule:

syntax Greeting(salutation, separator) = salutation separator "World";

To use a parameterized rule, one simply provides actual rules as arguments to be substituted for the declared parameters:

syntax Main = Greeting(Prefix, ", ");

A given rule name may be declared multiple times provided each declaration has a different number of parameters. That is, the following is legal:

syntax Greeting(salutation, sep, subject) = salutation sep subject;

syntax Greeting(salutation, sep) = salutation sep "World";

syntax Greeting(sep) = "Hello" sep "World";

syntax Greeting = "Hello" ", " "World";

The selection of which rule is used is determined based on the number of arguments present in the usage of the rule.

A pattern may indicate that a given term may match repeatedly using the standard Kleene operators (e.g., ?, *, and +). For example, consider this language:

language HelloLanguage3 {

  syntax Main = Prefix ", "? Suffix*;

  syntax Prefix = "Hello";

  syntax Suffix = "World";

}

This language considers the following all to be valid:

Hello

Hello,

Hello, World

Hello, WorldWorld

HelloWorldWorldWorld

Terms can be grouped using parentheses to indicate that a group of terms must be repeated:

language HelloLanguage3 {

  syntax Main = Prefix (", " Suffix)+;

  syntax Prefix = "Hello";

  syntax Suffix = "World";

}

which considers the following to all be valid input:

Hello, World

Hello, World, World

Hello, World, World, World

The use of the + operator indicates that the group of terms must match at least once.
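The repetition operators in the two languages above have direct counterparts in regular expressions. The following Python sketch restates both Main rules as regular expressions so the accepted inputs can be checked mechanically. It is an analogy under the assumption that Prefix and Suffix are the literals shown; the names OPTIONAL_AND_STAR, GROUPED_PLUS, and matches are hypothetical.

```python
import re

# Approximates: syntax Main = Prefix ", "? Suffix*;
OPTIONAL_AND_STAR = re.compile(r"Hello(?:, )?(?:World)*")

# Approximates: syntax Main = Prefix (", " Suffix)+;
GROUPED_PLUS = re.compile(r"Hello(?:, World)+")

def matches(pattern, text):
    """Return True when the pattern accepts the whole input."""
    return pattern.fullmatch(text) is not None

print(matches(OPTIONAL_AND_STAR, "HelloWorldWorldWorld"))  # True
print(matches(GROUPED_PLUS, "Hello, World, World"))        # True
print(matches(GROUPED_PLUS, "Hello"))                      # False: the group must match at least once
```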

1.2.2 Character Processing

In the previous examples of the HelloLanguage, the pattern term for the comma separator included a trailing space. That trailing space was significant, as it allowed the input text to include a space after the comma:

Hello, World

More importantly, the pattern indicates that the space is not only allowed, but is required. That is, the following input is not valid:

Hello,World

Moreover, exactly one space is required, making this input invalid as well:

Hello,   World

To allow any number of spaces to appear either before or after the comma, we could have written the rule like this:

syntax Main = "Hello"  " "*   ","  " "*  "World";

While this is correct, in practice most languages have many places where secondary text such as whitespace or comments can be interleaved with constructs that are primary in the language. To simplify specifying such languages, a language may specify one or more named interleave patterns.

An interleave pattern specifies text streams that are not considered part of the primary flow of text. When processing input, the M processor implicitly injects interleave patterns between the terms in all syntax patterns. For example, consider this language:

language HelloLanguage {

  syntax Main = "Hello"  ","  "World";

  interleave Secondary = " "+;

}

This language now accepts any number of whitespace characters before or after the comma. That is,

Hello,World

Hello, World

Hello   ,               World

are all valid with respect to this language.
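One way to picture interleave processing is as an optional pattern injected between every pair of adjacent terms. The Python sketch below builds a regular expression that way. It is a simplified model that assumes the terms are plain literals and that the interleave is one or more spaces; the function name with_interleave is hypothetical.

```python
import re

def with_interleave(terms, interleave=r" +"):
    """Join literal terms, allowing optional interleave text between them."""
    gap = f"(?:{interleave})?"
    return re.compile(gap.join(re.escape(term) for term in terms))

# Approximates: syntax Main = "Hello" "," "World"; interleave Secondary = " "+;
HELLO = with_interleave(["Hello", ",", "World"])

for text in ["Hello,World", "Hello, World", "Hello   ,               World"]:
    print(HELLO.fullmatch(text) is not None)  # True for each input
```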

Interleave patterns simplify defining languages that have secondary text like whitespace and comments. However, many languages have constructs in which such interleaving needs to be suppressed. To specify that a given rule is not subject to interleave processing, the rule is written as a token rule rather than a syntax rule.

Token rules identify the lowest level textual constructs in a language – by analogy token rules identify words and syntax rules identify sentences. Like syntax rules, token rules use patterns to identify sets of input values. Here’s a simple token rule:

token BinaryValueToken  = ("0" | "1")+;

It identifies sequences of 0 and 1 characters much like this similar syntax rule:

syntax BinaryValueSyntax = ("0" | "1")+;

The main distinction between the two rules is that interleave patterns do not apply to token rules. That means that if the following interleave rule was in effect:

interleave IgnorableText = " "+;

then the following input value:

0 1011 1011

would be valid with respect to the BinaryValueSyntax rule but not with respect to the BinaryValueToken rule, as interleave patterns do not apply to token rules.
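The difference can be checked with a small Python sketch that models the token rule as a plain regular expression and the syntax rule as the same pattern with optional spaces allowed between terms. This is an approximation of the behavior described above; the names BINARY_TOKEN, BINARY_SYNTAX, and sample are hypothetical.

```python
import re

# token BinaryValueToken = ("0" | "1")+;
# Interleave text is never injected inside a token rule.
BINARY_TOKEN = re.compile(r"[01]+")

# syntax BinaryValueSyntax = ("0" | "1")+;
# With interleave IgnorableText = " "+, spaces may appear between terms.
BINARY_SYNTAX = re.compile(r"[01](?: *[01])*")

sample = "0 1011 1011"
print(BINARY_SYNTAX.fullmatch(sample) is not None)  # True
print(BINARY_TOKEN.fullmatch(sample) is not None)   # False
```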

M provides a shorthand notation for expressing alternatives that consist of a range of Unicode characters. For example, the following rule:

token AtoF = "A" | "B" | "C" | "D" | "E" | "F";

can be rewritten using the range operator as follows:

token AtoF = "A".."F";

Ranges and alternation can compose to specify multiple non-contiguous ranges:

token AtoGnoD = "A".."C" | "E".."G";

which is equivalent to this longhand form:

token AtoGnoD = "A" | "B" | "C" | "E" | "F" | "G";

Note that the range operator only works with text literals that are exactly one character in length.

The patterns in token rules have a few additional features that are not valid in syntax rules. Specifically, token patterns can be negated to match anything not included in a set by using the difference operator (-). The following example combines difference with any, which matches any single character. The expression below matches any character that is not an uppercase vowel:

any - ("A"|"E"|"I"|"O"|"U")
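In regular expression terms, subtracting a set of characters from any corresponds to a negated character class. The Python sketch below shows the analogy; the name NOT_UPPER_VOWEL is hypothetical. Note that only the uppercase letters listed in the M expression are excluded, so lowercase vowels still match.

```python
import re

# Approximates: any - ("A"|"E"|"I"|"O"|"U")
NOT_UPPER_VOWEL = re.compile(r"[^AEIOU]")

print(NOT_UPPER_VOWEL.fullmatch("B") is not None)  # True
print(NOT_UPPER_VOWEL.fullmatch("A") is not None)  # False
print(NOT_UPPER_VOWEL.fullmatch("a") is not None)  # True: lowercase "a" is not in the excluded set
```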

Token rules are named and may be referred to by other rules:

token AorBorCorEorForG = (AorBorC | EorForG)+;

token AorBorC = "A".."C";

token EorForG = "E".."G";

Because token rules are processed before syntax rules, token rules cannot refer to syntax rules:

syntax X = "Hello";

token HelloGoodbye = X | "Goodbye"; // illegal

However, syntax rules may refer to token rules:

token X = "Hello";

syntax HelloGoodbye = X | "Goodbye"; // legal

The M processor treats all literals in syntax patterns as anonymous token rules. That means that the previous example is equivalent to the following:

token X = "Hello";

token temp = "Goodbye";

syntax HelloGoodbye = X | temp;

Operationally, the difference between token rules and syntax rules is when they are processed. Token rules are processed first against the raw character stream to produce a sequence of named tokens. The M processor then processes the language’s syntax rules against the token stream to determine whether the input is valid and optionally to produce structured data as output. The next section describes how that output is formed.
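The two phases described above can be sketched in Python as a scanner that turns characters into named tokens, followed by a check over the token names. This is a minimal model of the idea, not the M processor's actual algorithm. It reuses the X and "Goodbye" rules from the earlier example; the names TOKEN_RULES, SCANNER, tokenize, and is_hello_goodbye are hypothetical, and skipping single spaces stands in for an interleave rule.

```python
import re

# Phase 1: token rules run against the raw character stream.
TOKEN_RULES = [("X", r"Hello"), ("Goodbye", r"Goodbye")]
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))

def tokenize(text):
    """Turn raw text into a list of (token_name, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] == " ":  # stands in for an interleave rule
            pos += 1
            continue
        match = SCANNER.match(text, pos)
        if match is None:
            raise ValueError(f"no token rule matches at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def is_hello_goodbye(text):
    """Phase 2: check syntax HelloGoodbye = X | "Goodbye" against token names."""
    try:
        names = [name for name, _ in tokenize(text)]
    except ValueError:
        return False
    return names in (["X"], ["Goodbye"])

print(tokenize("Hello"))            # [('X', 'Hello')]
print(is_hello_goodbye("Goodbye"))  # True
```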

1.2.3 Output

M processing transforms text into structured data. The shape and content of that data is determined by the syntax rules of the language being processed. Each syntax rule consists of a set of productions, each of which consists of a pattern and an optional projection. Patterns were discussed in the previous sections and describe a set of legal character sequences that are valid input. Projections describe how the information represented by that input should be produced.

Each production is like a function from text to structured data. The primary way to write projections is to use a simple construction syntax that produces graph-structured data suitable for programs and stores. For example, consider this rule:

syntax Rock =

    "Rock" => Item { Heavy { true }, Solid { true } } ;

This rule has one production that has a pattern that matches "Rock" and a projection that produces the following value (using a notation known as M graphs):

Item {

  Heavy { true },

  Solid { true }

}

Rules can contain more than one production in order to allow different input to produce very different output. Here’s an example of a rule that contains three productions with very different projections:

syntax Contents

    = "Rock" => Item { Heavy { true }, Solid { true } }

    | "Water" => Item { Consumable { true }, Solid { false } }

    | "Hamster" => Pet { Small { true }, Legs { 4 } } ;

When a rule with more than one production is processed, the input text is tested against all of the productions in the rule to determine whether the rule applies. If the input text matches the pattern from exactly one of the rule’s productions, then the corresponding projection is used to produce the result. In this example, when presented with the input text "Hamster", the rule would yield:

Pet {

  Small { true },

  Legs { 4 }

}

as a result.
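Because each production acts like a function from text to structured data, the Contents rule can be modeled as a lookup from pattern to projection. The Python sketch below does this for the three literal productions shown above. It is an illustration only: the (label, fields) tuple is an assumed stand-in for an M graph node, and the names CONTENTS_PRODUCTIONS and parse_contents are hypothetical.

```python
# Each production pairs a pattern (a literal here) with a projection.
# A projection is modeled as (label, fields) to stand in for an M graph node.
CONTENTS_PRODUCTIONS = {
    "Rock": ("Item", {"Heavy": True, "Solid": True}),
    "Water": ("Item", {"Consumable": True, "Solid": False}),
    "Hamster": ("Pet", {"Small": True, "Legs": 4}),
}

def parse_contents(text):
    """Return the projection of the production whose pattern matches text."""
    if text not in CONTENTS_PRODUCTIONS:
        raise ValueError(f"no production matches {text!r}")
    return CONTENTS_PRODUCTIONS[text]

print(parse_contents("Hamster"))  # ('Pet', {'Small': True, 'Legs': 4})
```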

To allow a syntax rule to match no matter what input it is presented with, a syntax rule may specify a production that uses the empty pattern, which will be selected if and only if none of the other productions in the rule match:

syntax Contents

    = "Rock" => Item { Heavy { true }, Solid { true } }

    | "Water" => Item { Consumable { true }, Solid { false } }

    | "Hamster" => Pet { Small { true }, Legs { 4 } }

    | empty => NoContent { } ;

When the production with the empty pattern is chosen, no input is consumed as part of the match.

To allow projections to use the input text that was matched, a pattern can associate a variable name with an individual term by prefixing the term with an identifier followed by a colon. These variable names are then made available to the projection. For example, consider this language:

language GradientLang {

  syntax Main

    = from:Color ", " to:Color => Gradient { Start { from }, End { to } } ;

  token Color

    = "Red" | "Green" | "Blue";

}

Given this input value:

Red, Blue

The M processor would produce this output:

Gradient {

  Start { "Red" },

  End { "Blue" }

}
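Binding variables to pattern terms is similar to using named capture groups in a regular expression. The Python sketch below reproduces the GradientLang example that way. It is an analogy, with the output graph modeled as nested dictionaries; the group names start and end replace from and to because from is a reserved word in Python, and the name parse_gradient is hypothetical.

```python
import re

COLOR = r"Red|Green|Blue"

# Approximates: from:Color ", " to:Color
GRADIENT = re.compile(rf"(?P<start>{COLOR}), (?P<end>{COLOR})")

def parse_gradient(text):
    """Project matching input into a Gradient node, or return None."""
    match = GRADIENT.fullmatch(text)
    if match is None:
        return None
    return {"Gradient": {"Start": match["start"], "End": match["end"]}}

print(parse_gradient("Red, Blue"))  # {'Gradient': {'Start': 'Red', 'End': 'Blue'}}
```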

As in the projection expressions we’ve looked at so far, literal values may appear in the output graph. The literal types supported by M, with a couple of examples of each, are:

  • Text literals – "ABC", "\u[Smile]"
  • Integer literals – 25, -34
  • Real literals – 0.0, -5.0E15
  • Logical literals – true, false
  • Null literal – null

The projections we’ve seen so far all attach a label to each graph node in the output (e.g., Gradient, Start, etc.). The label is optional and can be omitted:

syntax Naked = t1:First t2:Second => { t1, t2 };

The label can be an arbitrary string – to allow labels to be escaped, one uses the id operator:

syntax Fancy = t1:First t2:Second => id("Label with Spaces!"){ t1, t2 };

The id operator works with either literal strings or with variables that are bound to input text:

syntax Fancy = name:Name t1:First t2:Second => id(name){ t1, t2 };

Using id with variables allows the labeling of the output data to be driven dynamically from input text rather than statically defined in the language. This example works when the variable name is bound to a literal value. If the variable was bound to a structured node that was returned by another rule, that node’s label can be accessed using the labelof operator:

syntax Fancier = p:Point => id(labelof(p)) { 1, 2, 3 };

The labelof operator returns a string that can be used both in the id operator and as a node value.

The projection expressions shown so far have no notion of order. That is, this projection expression:

A { X { 100 }, Y { 200 } }

is semantically equivalent to this:

A { Y { 200 }, X { 100 } }

and implementations of M are not required to preserve the order specified by the projection. To indicate that order is significant and must be preserved, brackets are used rather than braces. This means that this projection expression:

A [ X { 100 }, Y { 200 } ]

is not semantically equivalent to this:

A [ Y { 200 }, X { 100 } ]

The use of brackets is common when the sequential nature of information is important and positional access is desired in downstream processing.

Sometimes it is useful to splice the nodes of a value together into a single collection. The valuesof operator returns the values of a node (labeled or unlabeled) as top-level values that can then be combined with other values as the values of a new node.

syntax ListOfA

    = a:A => [a]

    | list:ListOfA "," a:A => [ valuesof(list), a ];

Here, valuesof(list) returns all the values of the list node, which are combined with a to form a new list.
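The effect of valuesof resembles list unpacking in Python: the elements of the inner list are spliced into the new list instead of being nested as a single element. The sketch below contrasts the two outcomes and folds a comma-separated input the way the left-recursive ListOfA rule would. It assumes each A is simply the text between commas; the names list_of_a, nested, and spliced are hypothetical.

```python
def list_of_a(text):
    """Fold a comma-separated input the way the ListOfA rule does."""
    result = None
    for item in text.split(","):
        if result is None:
            result = [item]           # a:A => [a]
        else:
            result = [*result, item]  # [ valuesof(list), a ]
    return result

nested = [["x", "y"], "z"]    # what [ list, a ] would give without valuesof
spliced = [*["x", "y"], "z"]  # what [ valuesof(list), a ] gives

print(list_of_a("a,b,c"))  # ['a', 'b', 'c']
print(spliced)             # ['x', 'y', 'z']
```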

Productions that do not specify a projection get the default projection.

For example, consider this simple language that does not specify projections:

language GradientLanguage {

  syntax Main = Gradient | Color;

  syntax Gradient = from:Color " on " to:Color;

  token Color = "Red" | "Green" | "Blue";

}

When presented with the input "Blue on Green", the language processor returns the following output:

Main[ Gradient[ "Blue", " on ", "Green" ] ]

These default semantics allow grammars to be authored rapidly while still yielding understandable output. However, in practice explicit projection expressions provide language designers complete control over the shape and contents of the output.

1.3 Types

Expressions give us a great way to write down how to calculate values based on other values. Often, we want to write down how to categorize values for the purposes of validation or allocation. In M, we categorize values using types.

An M type describes a collection of acceptable or conformant values. We use types to constrain which values may appear in a particular context (e.g., an operand, a storage location).

With a few notable exceptions, M allows types to be used as collections. For example, we can use the in operator to test whether a value conforms to a given type. The following expressions are true:

1 in Number

"Hello, world" in Text

Note that the names of the built-in types are available directly in the M language. We can introduce new names for types using type declarations. For example, this type declaration introduces the type name My Text as a synonym for the Text simple type:

type @[My Text] : Text;

With this type name now available, we can write the following:

"Hello, world" in @[My Text]

Note that the name of the type @[My Text] contains a space and is subject to the same escaping rules as the member names in entities.

While it is moderately useful to introduce your own names for an existing type, it’s far more useful to apply a predicate to the underlying type:

type SmallText : Text where value.Count < 7;

In this example, we’ve constrained the universe of possible Text values to those in which the value contains fewer than seven characters. That means that the following holds true:

"Terse" in SmallText

!("Verbose" in SmallText)

Type declarations compose:

type TinyText : SmallText where value.Count < 6;

The preceding is equivalent to the following:

type TinyText : Text where value.Count < 6;
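One way to see why the two TinyText declarations are equivalent is to model each type as a membership test and each where clause as an additional condition. In the Python sketch below, tiny_text is built on top of small_text, so a value must satisfy both length limits, and the tighter one (fewer than six characters) subsumes the looser one. The names is_text, refine, small_text, and tiny_text are hypothetical, and len stands in for the Count member.

```python
def is_text(value):
    """Membership test standing in for the built-in Text type."""
    return isinstance(value, str)

def refine(base, predicate):
    """Build a membership test for 'base where predicate'."""
    return lambda value: base(value) and predicate(value)

# type SmallText : Text where value.Count < 7;
small_text = refine(is_text, lambda value: len(value) < 7)

# type TinyText : SmallText where value.Count < 6;
tiny_text = refine(small_text, lambda value: len(value) < 6)

print(small_text("Terse"))    # True
print(small_text("Verbose"))  # False: seven characters
print(tiny_text("Terse"))     # True: five characters
```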

It’s important to note that the name of the type exists so an M declaration or expression can refer to it. We can assign any number of names to the same type (e.g., Text where value.Count < 7) and a given value either conforms to all of them or to none of them. For example, consider these declarations:

type A : Number where value < 100;

type B : Number where value < 100;

Given these two type definitions, both of the following expressions will evaluate to true:

1 in A

1 in B

If we introduce the following third type:

type C : Number where value > 0;

we can also state this:

1 in C

In M, types are sets of values, and it is possible to define a new type by explicitly enumerating those values:

type PrimaryColors { "Red", "Blue", "Yellow" }

This is how an enumeration is defined in M. Any type in M is a collection of values. For example the types Logical and Integer8 defined below could be defined as the collections:

{ true, false }

{-128, -127, ..., -1, 0, 1, ..., 127}

A general principle of M is that a given value may conform to any number of types. This is a departure from the way many object-based systems work, in which a value is bound to a specific type at initialization-time and is a member of the finite set of supertypes that were specified when the type was defined.

One last type-related operation bears discussion – the type ascription operator ":". The type ascription operator asserts that a given value conforms to a specific type.

In general, when we see values in expressions, M has some notion of the expected type of that value based on the declared result type for the operator or function being applied. For example, the result of the logical and operator "&&" is declared to be conformant with type Logical.

It is occasionally useful (or even required) to apply additional constraints to a given value – typically to use that value in another context that has differing requirements.

For example, consider the following simple type definition:

type SuperPositive : Number where value > 5;

And let’s now assume that there’s a function named CalcIt that is declared to accept a value of type SuperPositive as an operand. We’d like M to allow expressions like this:

CalcIt(20)

CalcIt(42 + 99)

and prohibit expressions like this:

CalcIt(-1)

CalcIt(4)

In fact, M does exactly what we want for these four examples. This is because these expressions express their operands in terms of simple built-in operators over constants. All of the information needed to determine the validity of the expressions is readily and cheaply available the moment the M source text for the expression is encountered.

However, if the expression draws upon dynamic sources of data or user-defined functions, we must use the type ascription operator to assert that a value will conform to a given type.

To understand how the type ascription operator works with values, let’s assume that there is a second function, GetVowelCount, that is declared to accept an operand of type Text and return a value of type Number that indicates the number of vowels in the operand.

Since we can’t know based on the declaration of GetVowelCount whether its results will be greater than five or not, the following expression is not a legal M expression:

CalcIt( GetVowelCount(someTextVariable) )

Because GetVowelCount’s declared result type Number includes values that do not conform to the declared operand type of CalcIt which is SuperPositive, M assumes that this expression was written in error and will refuse to even attempt to evaluate the expression.

When we rewrite this expression to the following legal expression using the type ascription operator:

CalcIt( GetVowelCount(someTextVariable) : SuperPositive )

we are telling M that we have enough understanding of the GetVowelCount  function to know that we’ll always get a value that conforms to the type SuperPositive. In short, we’re telling M we know what we’re doing.

But what if we don’t? What if we misjudged how the GetVowelCount function works and a particular evaluation results in a number that is not greater than five? Because the CalcIt function was declared to only accept values that conform to SuperPositive, the system will ensure that all values passed to it are greater than five. To ensure this constraint is never violated, the system may need to inject a dynamic constraint test that has the potential to fail when evaluated. This failure will not occur when the M source text is first processed (as was the case with CalcIt(-1)); rather, it will occur when the expression is actually evaluated.

Here’s the general principle at play.

M implementations will typically attempt to report any constraint violations before the first expression is evaluated. This is called static enforcement and implementations will manifest this much like a syntax error. However, as we’ve seen, some constraints can only be enforced against live data and therefore require dynamic enforcement.

In general, the M philosophy is to make it easy for the user to write down their intention and put the burden on the M implementation to “make it work.” However, to allow a particular M program to be used in diverse environments, a fully featured M implementation should be configurable to reject M programs that rely on dynamic enforcement for correctness, in order to reduce the performance and operational costs of dynamic constraint violations.
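The difference between static and dynamic enforcement can be illustrated outside of M. The following is a minimal Python sketch, not M and not how any M implementation works: it models the SuperPositive constraint as a predicate and the type ascription operator as a check that runs only when the expression is evaluated. The names is_super_positive, ascribe_super_positive, get_vowel_count, and calc_it are assumptions made for illustration, and the body of calc_it is arbitrary.

```python
def is_super_positive(value):
    """Analogy for: type SuperPositive : Number where value > 5."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and value > 5


def ascribe_super_positive(value):
    """Analogy for 'value : SuperPositive', enforced dynamically."""
    if not is_super_positive(value):
        raise ValueError(f"{value!r} does not conform to SuperPositive")
    return value


def get_vowel_count(text):
    """Hypothetical stand-in for GetVowelCount."""
    return sum(1 for ch in text.lower() if ch in "aeiou")


def calc_it(n):
    """Hypothetical stand-in for CalcIt; doubling is an arbitrary choice."""
    return ascribe_super_positive(n) * 2
```

In this sketch, calc_it(get_vowel_count("aeiouaeiou")) succeeds because the count is 10, while calc_it(get_vowel_count("sky")) fails only when it is evaluated, which is the dynamic enforcement case described above.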

1.3.1 Collection types

M defines a type constructor for specifying collection types. The collection type constructor restricts the type and count of elements a collection may contain. All collection types are restrictions over the intrinsic type Collection, which all collection values conform to:

{ } in Collection

{ 1, false } in Collection

! ("Hello" in Collection)

The last example is interesting, in that it illustrates that the collection types do not overlap with the simple types. There is no value that conforms to both a collection type and a simple type.

A collection type constructor specifies both the type of element and the acceptable element count. The element count is typically specified using one of the three operators:

{T*}  - zero or more Ts

{T+} - one or more Ts

{T#m..n} – between m and n Ts.

The collection type constructors can either use operators or be written longhand as a constraint over the intrinsic type Collection:

type SomeNumbers : {Number+};

type TwoToFourNumbers : {Number#2..4};

type ThreeNumbers : {Number#3};

type FourOrMoreNumbers : {Number#4..};

These types describe the same sets of values as these longhand definitions:

type SomeNumbers : { Number *} where value.Count >= 1 ;

type TwoToFourNumbers : { Number *} where value.Count >= 2

                                 && value.Count <= 4;

type ThreeNumbers : { Number *} where value.Count == 3;

type FourOrMoreNumbers : { Number *} where value.Count >= 4;

Independent of which form is used to declare the types, we can now assert the following hold:

!({ } in TwoToFourNumbers)

!({ "One", "Two", "Three" } in TwoToFourNumbers)

{ 1, 2, 3 } in TwoToFourNumbers

{ 1, 2, 3 } in ThreeNumbers

{ 1, 2, 3, 4, 5 } in FourOrMoreNumbers

The collection type constructors compose with the where operator, allowing the following type check to succeed:

{ 1, 2 } in {(Number where value < 3)*} where value.Count % 2 == 0

Note that the where operator inside the parentheses applies to elements of the collection, and the where operator outside the parentheses applies to the collection itself.
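As a rough illustration of how an element constraint and a count constraint combine, here is a small Python sketch. It is an analogy only: Python lists stand in for M collections, and conforms_collection, is_number, and the three named checks are hypothetical helpers, not part of M.

```python
def is_number(v):
    # bool is excluded so that true/false do not count as numbers
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def conforms_collection(value, element_ok, min_count=0, max_count=None):
    """Analogy for {T#m..n}: element type check plus a bound on the count."""
    if not isinstance(value, list):
        return False  # simple values such as "Hello" are not collections
    if len(value) < min_count:
        return False
    if max_count is not None and len(value) > max_count:
        return False
    return all(element_ok(e) for e in value)


def two_to_four_numbers(v):
    return conforms_collection(v, is_number, 2, 4)


def three_numbers(v):
    return conforms_collection(v, is_number, 3, 3)


def four_or_more_numbers(v):
    return conforms_collection(v, is_number, 4)


def small_numbers_even_count(v):
    # inner constraint applies to each element, outer one to the collection
    inner = conforms_collection(v, lambda e: is_number(e) and e < 3)
    return inner and len(v) % 2 == 0
```

The last function mirrors the composed type test: the lambda plays the role of the where inside the parentheses, and the len(v) % 2 check plays the role of the where applied to the whole collection.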

1.3.2 List types

List types are specified in a similar manner to collection types.

type ListOfNumbers : [ Number *];

1.3.3 Nullable types

We have seen many useful values: 42, "Hello", {1,2,3}. The distinguished value null serves as a placeholder for some other value that is not known. A type with null in the value space is called a nullable type. The value null can be added to the value space of a type either with an explicit union of the type and a collection containing null or with the postfix operator ?. The following expressions are true:

! (null in Integer)

null in Integer?

null in (Integer | { null } )

The ?? operator converts a null value into a known value, yielding its right operand when the left operand is null:

null ?? 1 == 1

Arithmetic operations on a null operand return null:

1 + null == null

null * 3 == null

Logical operators, conditionals, and constraints require non-nullable operands.
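The null behavior described above can be approximated in Python, with None standing in for null. This is a sketch under that assumption; coalesce, add, and multiply are hypothetical helper names rather than M operators.

```python
def coalesce(value, fallback):
    """Analogy for 'value ?? fallback': only null is replaced."""
    return fallback if value is None else value


def add(a, b):
    """Analogy for null-propagating addition."""
    return None if a is None or b is None else a + b


def multiply(a, b):
    """Analogy for null-propagating multiplication."""
    return None if a is None or b is None else a * b
```

Note that coalesce(0, 1) yields 0, since only a null left operand is replaced; a falsy but known value is kept.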

1.3.4 Entity types

Just as we can use the collection type constructors to specify what kinds of collections are valid in a given context, we can do the same for entities using entity types.

An entity type declares the expected members for a set of entity values. The members of an entity type are called fields. The value of a field is stored. All entity types are restrictions over the Entity type.

Here is the simplest entity type:

type MyEntity : Language.Entity;

The type MyEntity does not declare any fields. In M, entity types are open in that entity values that conform to the type may contain fields whose names are not declared in the type. That means that the following type test:

{ X => 100, Y => 200 } in MyEntity

will evaluate to true, as the MyEntity type says nothing about fields named X and Y.

Most entity types contain one or more field declarations. At a minimum, a field declaration states the name of the expected field:

type Point { X; Y; }

This type definition describes the set of entities that contain at least fields named X and Y irrespective of the values of those fields. That means that the following type tests will all evaluate to true:

{ X => 100, Y => 200 } in Point

{ X => 100, Y => 200, Z => 300 } in Point // more fields than expected OK

! ({ X => 100 } in Point)               // not enough fields – not OK

{ X => true, Y => "Hello, world" } in Point

The last example demonstrates that the Point type does not constrain the values of the X and Y fields – any value is allowed. We can write a new type that constrains the values of X and Y to numeric values:

type NumericPoint {

  X : Number;

  Y : Number where value > 0;

}

Note that we’re using type ascription syntax to assert that the value of the X and Y fields must conform to the type Number. With this in place, the following expressions all evaluate to true:

{ X => 100, Y => 200 } in NumericPoint

{ X => 100, Y => 200, Z => 300 } in NumericPoint

! ({ X => true, Y => "Hello, world" } in NumericPoint)

! ({ X => 0, Y => 0 } in NumericPoint)

As we saw in the discussion of simple types, the name of the type exists only so that M declarations and expressions can refer to it. That is why both of the following type tests succeed:

{ X => 100, Y => 200 } in NumericPoint

{ X => 100, Y => 200 } in Point

even though the definitions of NumericPoint and Point are independent.
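The open, structural nature of entity types can be sketched in Python using dictionaries for entity values. This is an analogy, not M: conforms, any_value, is_number, POINT, and NUMERIC_POINT are names assumed for illustration.

```python
def any_value(v):
    return True


def is_number(v):
    # bool is excluded so that true does not count as a number
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def conforms(entity, fields):
    """fields maps each declared field name to a predicate on its value.

    Undeclared fields in the entity are ignored, which models open types.
    """
    return isinstance(entity, dict) and all(
        name in entity and check(entity[name]) for name, check in fields.items()
    )


# Analogy for: type Point { X; Y; }
POINT = {"X": any_value, "Y": any_value}

# Analogy for: type NumericPoint { X : Number; Y : Number where value > 0; }
NUMERIC_POINT = {"X": is_number, "Y": lambda v: is_number(v) and v > 0}
```

Because conformance is decided only by the fields present in the value, the same dictionary can conform to both POINT and NUMERIC_POINT without carrying any record of either name.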

1.3.5 Declaring fields

Fields are named units of storage that hold values. M allows you to initialize the value of a field as part of an entity initializer. However, M does not specify any mechanism for changing the value of a field once it is initialized. In M, we assume that any changes to field values happen outside the scope of M.

A field declaration can indicate that there is a default value for the field. Field declarations that have a default value do not require conformant entities to have a corresponding field specified (we sometimes call such field declarations optional fields). For example, consider this type definition:

type Point3d {

  X : Number;

  Y : Number;

  Z => -1 : Number; // default value of negative one

}

Because the Z field has a default value, the following type test will succeed:

{ X => 100, Y => 200 } in Point3d

Moreover, if we apply a type ascription operator to the value:

({ X => 100, Y => 200 } : Point3d)

we can now access the Z field like this:

({ X => 100, Y => 200 } : Point3d).Z

This expression will yield the value -1.

If a field declaration does not have a corresponding default value, conformant entities must specify a value for that field. Default values are typically written down using the explicit syntax shown for the Z field of Point3d. If the type of a field is either nullable or a zero-to-many collection, then the declared field has an implicit default value: null for a nullable type and {} for a collection.

For example, consider this type:

type PointND {

  X : Number;

  Y : Number;

  Z : Number?;        // Z is optional

  BeyondZ : {Number*};  // BeyondZ is optional too

}

Again, the following type test will succeed:

{ X => 100, Y => 200 } in PointND

and ascribing the PointND type to the value will allow us to get these defaults:

({ X => 100, Y => 200 } : PointND).Z == null

({ X => 100, Y => 200 } : PointND).BeyondZ == { }

The choice of using a nullable type vs. an explicit default value to model optional fields typically comes down to style.
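One way to picture how ascription supplies default values is the following Python sketch. It is an analogy under stated assumptions: dictionaries stand in for entities, None for null, an empty list for { }, and ascribe is a hypothetical helper rather than the M ascription operator.

```python
import copy


def ascribe(entity, required, defaults):
    """Return a view of the entity with declared defaults filled in.

    required: names of fields that have no default value.
    defaults: mapping of optional field names to their default values.
    """
    missing = [name for name in required if name not in entity]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Values present in the entity win over the declared defaults.
    return {**copy.deepcopy(defaults), **entity}


# Analogy for Point3d: Z has an explicit default of -1.
POINT3D = (("X", "Y"), {"Z": -1})

# Analogy for PointND: a nullable field defaults to null,
# and a zero-to-many collection defaults to the empty collection.
POINTND = (("X", "Y"), {"Z": None, "BeyondZ": []})
```

For example, ascribe({"X": 100, "Y": 200}, *POINT3D)["Z"] evaluates to -1 in this sketch, while an entity that supplies its own Z keeps that value.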

1.3.6 Constraints on entity types

As with all types, a constraint may be applied to an entity type using the where operator. Consider the following type definition:

type HighPoint {

  X : Number;

  Y : Number;

} where X < Y;

In this example, all values that conform to the type HighPoint are guaranteed to have an X value that is less than the Y value. That means that the following expressions:

{ X => 100, Y => 200 } in HighPoint

! ({ X => 300, Y => 200 } in HighPoint)

both evaluate to true.

Now consider the following type definitions:

type Point {

  X : Number;

  Y : Number;

}

type Visual {

  Opacity : Number;

}

type VisualPoint {

  DotSize : Number;

} where value in Point && value in Visual;

The third type, VisualPoint, names the set of entity values that have at least the numeric fields X, Y, Opacity, and DotSize.

Because it is a common desire to factor member declarations into smaller pieces that can be easily composed, M provides explicit syntax support for this. We can rewrite the VisualPoint type definition using that syntax:

type VisualPoint : Point, Visual {

  DotSize : Number;

}

To be clear, this is just shorthand for the long-hand definition above that used a constraint expression. Both of these definitions are equivalent to this even longer-hand definition:

type VisualPoint {

  X : Number;

  Y : Number;

  Opacity : Number;

  DotSize : Number;

}

Again, the names of the types are just ways to refer to types – the values themselves have no record of the type names used to describe them.
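The equivalence between the composed and the long-hand definitions can be checked informally with a Python sketch. This is an analogy only: dictionaries stand in for entities, and has_numbers and the is_* predicates are hypothetical names introduced for illustration.

```python
def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def has_numbers(entity, names):
    """True if the entity has a numeric value for every listed field."""
    return isinstance(entity, dict) and all(is_number(entity.get(n)) for n in names)


def is_point(e):
    return has_numbers(e, ("X", "Y"))


def is_visual(e):
    return has_numbers(e, ("Opacity",))


def is_visual_point(e):
    # Analogy for: { DotSize : Number; } where value in Point && value in Visual
    return has_numbers(e, ("DotSize",)) and is_point(e) and is_visual(e)


def is_visual_point_longhand(e):
    # Analogy for the definition that lists all four fields directly
    return has_numbers(e, ("X", "Y", "Opacity", "DotSize"))


def is_high_point(e):
    # Analogy for: { X : Number; Y : Number; } where X < Y
    return is_point(e) and e["X"] < e["Y"]
```

Both is_visual_point and is_visual_point_longhand accept and reject the same dictionaries, which is the sense in which the definitions above describe the same set of values.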

1.4 Queries

Queries operate over unordered and ordered collections. M provides LINQ query comprehensions and adds several features to make authoring simple queries more concise. The keywords where and select are available as binary infix operators. Also, indexers are automatically added to strongly typed collections. These features allow common queries to be authored more compactly, as illustrated below.

1.4.1 Filtering

Filtering extracts elements from an existing collection. Consider the following collection:

People {

  { First => "Mary", Last => "Smith", Age => 24 },

  { First => "John", Last => "Doe", Age => 32 },

  { First => "Dave", Last => "Smith", Age => 32 },

}

This query extracts people with Age == 32 from the People collection:

from p in People

where p.Age == 32

select p

An equivalent query can be written with either of the following expressions:

People where value.Age == 32

People.Age(32)

The where operator takes a collection on the left and a Logical expression on the right. The where operator introduces a keyword identifier value into the scope of the Logical expression that is bound to each member of the collection. The resulting collection contains the members for which the expression is true. The expression:

Collection where Expression

is exactly equivalent to:

from value in Collection

where Expression

select value

Collection types gain indexer members that correspond to the fields of their corresponding element type. That is, this:

Collection . Field ( Expression )

is equivalent to:

from value in Collection

where value.Field == Expression

select value
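The three equivalent filtering forms can be compared using a Python sketch. This is an analogy, not M: a list of dictionaries stands in for the People collection, and where and index_by are hypothetical helpers modeling the infix where operator and the generated indexer.

```python
people = [
    {"First": "Mary", "Last": "Smith", "Age": 24},
    {"First": "John", "Last": "Doe", "Age": 32},
    {"First": "Dave", "Last": "Smith", "Age": 32},
]


def where(collection, predicate):
    """Analogy for 'Collection where Expression'; predicate receives value."""
    return [value for value in collection if predicate(value)]


def index_by(collection, field, expected):
    """Analogy for 'Collection.Field(Expression)'."""
    return where(collection, lambda value: value[field] == expected)


# Analogy for: from p in People where p.Age == 32 select p
comprehension_form = [p for p in people if p["Age"] == 32]

# Analogy for: People where value.Age == 32
infix_form = where(people, lambda value: value["Age"] == 32)

# Analogy for: People.Age(32)
indexer_form = index_by(people, "Age", 32)
```

All three variables hold the same two entries, for John and Dave, in this sketch.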

1.4.2 Selection

Select is also available as an infix operator. Consider the following simple query:

    from p in People

    select p.First + p.Last

This computes the select expression over each member of the collection and returns the result. Using the infix select it can be written equivalently as:

People select value.First + value.Last

The select operator takes a collection on the left and an arbitrary expression on the right. As with where, select introduces the keyword identifier value that ranges over each element in the collection. The select operator maps the expression over each element in the collection and returns the result. The expression:

Collection select Expression

is exactly equivalent to:

from value in Collection

select Expression

A trivial use of the select operator is to extract a single field:

People select value.First

Collections are augmented with accessors for the fields of their elements, so a field can be extracted directly. For example, People.First yields a new collection containing all the first names, and People.Last yields a collection with all the last names.
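Selection and field accessors map naturally onto a projection over each element. The Python sketch below is an analogy only: select and field are hypothetical helpers, and the list of dictionaries repeats the People data from the filtering discussion.

```python
people = [
    {"First": "Mary", "Last": "Smith", "Age": 24},
    {"First": "John", "Last": "Doe", "Age": 32},
    {"First": "Dave", "Last": "Smith", "Age": 32},
]


def select(collection, projection):
    """Analogy for 'Collection select Expression'; projection receives value."""
    return [projection(value) for value in collection]


def field(collection, name):
    """Analogy for the accessor form, such as People.First."""
    return select(collection, lambda value: value[name])


# Analogy for: People select value.First + value.Last
full_names = select(people, lambda value: value["First"] + value["Last"])

# Analogy for: People.First and People.Last
first_names = field(people, "First")
last_names = field(people, "Last")
```

Note that the projection keeps one result per element, so last_names contains "Smith" twice in this sketch.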

1.5 Modules

All of the examples shown so far have been “loose M” that is taken out of context. To write a legal M program, all source text must appear in the context of a module definition. A module defines a top-level namespace for any type names that are defined. A module also defines a scope for defining extents that will store actual values, as well as computed values.

Here is a simple module definition:

module Geometry {

  // declare a type

  type Point {

    X : Integer; Y : Integer;

  }

  // declare some extents

  Points : {Point*};

  Origin : Point;

  // declare a computed value

  TotalPointCount { Points.Count + 1; }

}

In this example, the module defines one type named Geometry.Point. This type describes what point values will look like, but doesn’t mention any locations where those values can be stored.

This example also includes two module-scoped extents (Points and Origin). Module-scoped field declarations are identical in syntax to those used in entity types. However, fields declared in an entity type simply name the potential for storage once an extent has been determined; in contrast, fields declared at module scope name actual storage that must be mapped by an implementation in order to load and interpret the module.

Modules may refer to declarations in other modules by using an import directive to name the module containing the referenced declarations. For a declaration to be referenced by other modules, the declaration must be explicitly exported using an export directive.

Consider this module:

module MyModule {

  import HerModule; // declares HerType

  export MyType1;

  export MyExtent1;

  type MyType1 : {Logical*};

  type MyType2 : HerType;

  MyExtent1 : {Number*};

  MyExtent2 : HerType;

}

Note that only MyType1 and MyExtent1 are visible to other modules. This makes the following definition of HerModule legal:

module HerModule {

  import MyModule; // declares MyType1 and MyExtent1

  export HerType;

  type HerType : Text where value.Count < 100;

  type Private : Number where !(value in MyExtent1);

  SomeStorage : MyType1;

}

As this example shows, modules may have circular dependencies.
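The visibility rules in this pair of modules can be modeled with a small Python sketch. It is an illustration under stated assumptions, not a description of how an M implementation resolves names: the MODULES table and the names_in_scope function are hypothetical, and they record only the import, export, and declaration lists from the two modules above.

```python
MODULES = {
    "MyModule": {
        "imports": ["HerModule"],
        "exports": {"MyType1", "MyExtent1"},
        "declares": {"MyType1", "MyType2", "MyExtent1", "MyExtent2"},
    },
    "HerModule": {
        "imports": ["MyModule"],
        "exports": {"HerType"},
        "declares": {"HerType", "Private", "SomeStorage"},
    },
}


def names_in_scope(module_name, modules=MODULES):
    """Own declarations plus the exported names of each imported module."""
    module = modules[module_name]
    scope = set(module["declares"])
    for imported in module["imports"]:
        # Only exported names cross the module boundary.
        scope |= modules[imported]["exports"]
    return scope
```

Because each lookup consults only the export list of the imported module, the two modules can import each other in this sketch without any ordering problem, and MyType2 stays invisible to HerModule because it is not exported.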