Overview of compiler extensibility

06/27/2011

One of the aspects of Boo that caught my attention was that the compiler is extensible. In this post, I hope to give a brief overview of what you can do to extend the compiler, but to understand it in depth, I recommend the book DSLs in Boo.

Compiler Pipeline

Boo’s compiler is built as a pipeline of steps that transform your code, ultimately producing the binary you are used to having as output. The first step is usually parsing, which turns the source file into an abstract syntax tree (AST). The remaining pipeline steps then transform the AST.

In most cases, you probably are interested in transforming the AST after the code is initially parsed. You can use the example program showcompilersteps.boo included with Boo to examine the compiler steps. When I run this on an example source file I get the following output:

STEP01-Parsing: SAVED TO BOO FILE.
STEP02-PreErrorChecking: NO CHANGE TO AST.
STEP03-MergePartialClasses: NO CHANGE TO AST.
STEP04-InitializeNameResolutionService: NO CHANGE TO AST.
STEP05-IntroduceGlobalNamespaces: NO CHANGE TO AST.
STEP06-TransformCallableDefinitions: NO CHANGE TO AST.
STEP07-BindTypeDefinitions: SAVED TO BOO FILE.
STEP08-BindGenericParameters: NO CHANGE TO AST.
STEP09-ResolveImports: SAVED TO BOO FILE.
STEP10-BindBaseTypes: NO CHANGE TO AST.
STEP11-MacroAndAttributeExpansion: SAVED TO BOO FILE.
STEP12-ExpandAstLiterals: NO CHANGE TO AST.
STEP13-IntroduceModuleClasses: SAVED TO BOO FILE.
STEP14-NormalizeStatementModifiers: NO CHANGE TO AST.
STEP15-NormalizeTypeAndMemberDefinitions: NO CHANGE TO AST.
STEP16-NormalizeExpressions: NO CHANGE TO AST.
STEP17-BindTypeDefinitions: NO CHANGE TO AST.
STEP18-BindGenericParameters: NO CHANGE TO AST.
STEP19-BindEnumMembers: NO CHANGE TO AST.
STEP20-BindBaseTypes: SAVED TO BOO FILE.
STEP21-CheckMemberTypes: NO CHANGE TO AST.
STEP22-BindMethods: SAVED TO BOO FILE.
STEP23-ResolveTypeReferences: SAVED TO BOO FILE.
STEP24-BindTypeMembers: SAVED TO BOO FILE.
STEP25-CheckGenericConstraints: NO CHANGE TO AST.
STEP26-ProcessInheritedAbstractMembers: NO CHANGE TO AST.
STEP27-CheckMemberNames: NO CHANGE TO AST.
STEP28-ProcessMethodBodiesWithDuckTyping: SAVED TO BOO FILE.

This gives a good idea of which compiler steps exist and the order in which they execute. If none of the other extensibility methods work for you, you can create your own compiler step, insert it into the pipeline, and transform the AST however you like.
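For a quick look at the step names alone, a minimal sketch along the following lines may help. It assumes the Boo.Lang.Compiler assembly is referenced and that CompilerPipeline exposes a Count property and an integer indexer; check both against your Boo version. Unlike showcompilersteps.boo, it only lists the steps and does not report which ones changed the AST.

```boo
import Boo.Lang.Compiler.Pipelines

# CompileToMemory is one of the predefined pipelines.
pipeline = CompileToMemory()

# Walk the pipeline by index and print each step's class name.
for i in range(pipeline.Count):
    name = pipeline[i].GetType().Name
    print "STEP${i + 1}-${name}"
```

The exact list depends on the pipeline you construct and on the Boo release, so it will not necessarily match the listing above.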

Extending with Meta Methods

One way to extend the compiler is to use Meta methods. A Meta method takes a specific AST node type as an argument and returns a node that replaces the node passed in. Returning null removes the node from the AST.

Meta methods need to be compiled in a separate assembly. Here is an example:

namespace metamethods

import System
import Boo.Lang.Compiler.Ast

[Meta]
static def doexpr(expr as Expression):
    return [|
        if $expr:
            print "True"
    |]

This Meta method declaration causes the method name to become a keyword to the compiler and replaces an expression with an “if” statement. Here is an example usage:

doexpr 2*100

The above code results in “True” being printed to the console. The code enclosed in [| … |] is converted to an AST representation and returned from the Meta method. This is a little like text templating but is called quasi-quotation.

Meta methods work fine if you need to do a simple transformation on specific AST nodes, but you don’t have access to the compiler context like you do with AST macros.

Extending with AST attributes

An AST attribute is very much like an attribute you are used to using in C#, except in this case, it can be processed at compile time. One possible use would be to validate arguments passed to a method. For networking APIs, it is often the case you need to pass in a host and port something like this:

def Connect([ArgValidate("host")] host as string, [ArgValidate("port")] port as int):
    pass

Wouldn’t it be nice to have these attributes cause the arguments to be validated automatically? All it requires is generating some code at the beginning of the method to do that. Here is a possible implementation of the attribute:

import System
import Boo.Lang.Compiler
import Boo.Lang.Compiler.Ast

class ArgValidateAttribute(AbstractAstAttribute):
    """Attribute to validate argument types"""
    _name as string

    def constructor(name as StringLiteralExpression):
        self._name = name.Value

    def Apply(target as Node):
        # Find the method definition that owns the parameter
        method = target.ParentNode as Method

        # The attribute is only meaningful on a parameter declaration
        pd = target as ParameterDeclaration
        assert pd != null

        assert pd.Name != null, "parameter has no name"
        reference = ReferenceExpression(pd.Name)

        if self._name.Equals("host", StringComparison.OrdinalIgnoreCase):
            s = [|
                    block:
                        if String.IsNullOrEmpty($(reference)) or $(reference).Length > 260:
                            raise ArgumentException("Invalid host name", $(pd.Name))
                |].Body.Statements[0]

            method.Body.Statements.Insert(0,s)

        elif self._name.Equals("port", StringComparison.OrdinalIgnoreCase):
            s = [|
                    block:
                        if $(reference) < 1 or $(reference) > 65000:
                            raise ArgumentException("Invalid port", $(pd.Name))
                |].Body.Statements[0]

            method.Body.Statements.Insert(0,s)
        else:
            s = [|
                    block:
                        print "Not validated"
                |].Body.Statements[0]

            method.Body.Statements.Add(s)

You can see that this attribute knows how to validate host and port parameters. It inserts the validation code at the beginning of the method.
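To make the effect concrete, here is roughly what Connect should look like once both attributes have been applied, written out by hand. This is a sketch based on reading the attribute code above, not actual compiler output. Each attribute inserts its check at index 0, so the relative order of the two checks depends on the order in which the attributes are applied; the order shown is an assumption.

```boo
import System

# Hand-written equivalent of Connect after attribute expansion (assumed order).
def Connect(host as string, port as int):
    # Inserted by [ArgValidate("port")]
    if port < 1 or port > 65000:
        raise ArgumentException("Invalid port", "port")
    # Inserted by [ArgValidate("host")]
    if String.IsNullOrEmpty(host) or host.Length > 260:
        raise ArgumentException("Invalid host name", "host")
    # The original method body follows
    pass

# Usage: an out-of-range port is rejected before the body runs.
try:
    Connect("example.com", 70000)
except e as ArgumentException:
    print e.Message
```

The caller writes no validation code; the checks exist only in the compiled method.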

Extending with AST macros

AST macros can do what Meta methods can and more. They can take arguments, and you can nest them within each other to scope them. One possible use of macros, defining the asynchronous result model, might start this way:

operation Send(host as string, port as int, buffer as (byte), offset as int, size as int):

    step Connect:
        m_socket = Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
        m_socket.BeginConnect(m_host, m_port )

    step Send:
        m_socket.BeginSend(m_buffer, m_offset, m_size, SocketFlags.None)

    step Disconnect:
        m_socket.BeginDisconnect(true)

Here, the custom macros operation and step each have one argument and a block of code associated with them. In the case of operation, the argument is a MethodInvocationExpression, and in the case of step, it is a ReferenceExpression.

These macros can be declared nested as shown below to keep step scoped within the operation macro:

macro operation:
    macro step:
        nameExpression = step.Arguments[0] as ReferenceExpression

Notice how macros can access their arguments. Macros are expanded from the most deeply nested to the least nested.

The keyword macro is itself a macro, which creates a class and a special method to which the code within the macro is added. The macro has access to compiler context through the Context property of the generated class.

Extending with custom compiler steps

The most flexible way to extend the compiler is to implement a compiler step and insert it into the compiler pipeline. There are compiler steps that can visit every node in the AST and do transformations when the right structure is found.
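As a rough illustration, a step can derive from AbstractVisitorCompilerStep, override Run to start the traversal, and override the On... methods for the node types it cares about. The sketch below only records method names and changes nothing in the AST. MethodLister, the inline source text, and the choice to insert at index 1 (assuming parsing is the step at index 0, as in the listing earlier) are illustrative assumptions, not part of the sample code for this post.

```boo
import Boo.Lang.Compiler
import Boo.Lang.Compiler.Ast
import Boo.Lang.Compiler.IO
import Boo.Lang.Compiler.Pipelines
import Boo.Lang.Compiler.Steps

# Hypothetical step: records the name of every method it visits.
class MethodLister(AbstractVisitorCompilerStep):
    public Names = []

    override def Run():
        # Start a depth-first walk from the root of the AST.
        Visit(CompileUnit)

    override def OnMethod(node as Method):
        Names.Add(node.Name)

source = "def hello():\n    print 'hi'\n"

lister = MethodLister()
compiler = BooCompiler()
compiler.Parameters.Input.Add(StringInput("sample.boo", source))
compiler.Parameters.Pipeline = CompileToMemory()
# Index 0 is assumed to be the parsing step, so index 1 runs right after it.
compiler.Parameters.Pipeline.Insert(1, lister)

context = compiler.Run()
for name in lister.Names:
    print "method: ${name}"
if context.Errors.Count > 0:
    print context.Errors.ToString(true)
```

A step that transforms the tree rather than just inspecting it would replace or modify nodes inside the visitor methods. Where you insert the step determines how much binding information is already available on the nodes.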

I’ll likely explore this later when I’ve found that macros don’t quite do what I need. For now, let’s look at macros in a little more detail.

Understanding macro expansion

Knowing that macros get expanded from the most deeply nested to the least nested is only part of the picture. What happens if you have macros in different files that work together? In what order are they expanded? What happens if a macro expands to another macro? Does the compiler re-expand the inserted macro or report an error?

To find out we need to design some tests. To test file order evaluation I developed these two dependent macros:

macro m1a:
    Context.Properties["m1a"] = System.DateTime.UtcNow

    if not Context.Properties.ContainsKey("m1b"):
        x = "m1a was expanded first."
    else:
        x = Context.Properties["m1b"].ToString()

    return [| print $x |]

macro m1b:
    Context.Properties["m1b"] = System.DateTime.UtcNow

    if not Context.Properties.ContainsKey("m1a"):
        x = "m1b was expanded first"
    else:
        x = Context.Properties["m1a"].ToString()

    return [| print $x |]

When it is expanded, each macro looks for the timestamp stored by the other macro. If the timestamp is not found, the macro reports that it was expanded first. Otherwise, it prints the timestamp stored by the other macro.

Since there is a circular dependency, you would expect that one of the macros would not find the context from the other macro. Two files were created to use the macros:

File1.boo

import System

public partial class One:
    def constructor():
        pass

    public def Run():
        m1a:
            pass

File2.boo

import System

public partial class Two:
    def constructor():
        pass

    public def Run():
        m1b:
            pass

And some code to run the code generated by the macros:

one = One()
one.Run()

two = Two()
two.Run()

When you run the code, you find that as expected, one macro was expanded first:

m1a was expanded first.

4/30/2011 2:41:04 PM

In looking through the APIs and object model, there doesn’t seem to be a way to partially expand a macro, conditionally delay expansion once it has started, or specify the order of expansion to get around file ordering issues.

The second question is what happens when a macro expands to another macro. To test this I wrote a recursive macro:

macro recurse:
    x = int.Parse(recurse.Arguments[0].ToString())
    if x > 0:
        return [|
            recurse $(x-1)
        |]
    else:
        return [|
            print "Done"
        |]

I used it as follows:

recurse 10

The output “Done” does get printed as expected, so we can be confident that macros are recursively expanded. In fact, I initially expanded to $x instead of $(x-1) by mistake, and the compiler eventually figured out it was in an infinite loop and issued an error.

Summary

Although we didn’t look at compiler steps closely yet, in this post we did get an idea of what compiler extensibility looks like in Boo. We also did some testing to understand how AST macros behave in a couple of edge cases. In the next post, we can start using compiler extensibility with our model for asynchronous results to see how they can work together to make writing asynchronous code easier.


20110627_CompilerExtensibilityIntro.zip
