Custom building and code generators in Visual Studio 2005

I'm a fervent fan of using code generator tools wherever possible to make your life easier. Although they come with issues related to effective building, diagnostics, and debugging, the amount of value they add to your application is immense: they can eliminate entire classes of potential bugs, save you a great deal of effort and time, make your module much easier to extend and maintain, and even yield runtime performance gains. Among the most frequently used code generator tools are the lexer and parser generators GNU Flex and Bison, based on the classic generators lex and yacc. Although I'll get into the details of how to use these tools effectively at a later time, today what I want to show you is a practical example of how to use the new custom build features of Visual Studio 2005 to effectively incorporate a code generator into your automatic build process.

Here is a very simple Bison grammar for evaluating arithmetic expressions involving addition, subtraction, and multiplication:

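/* example.y */
%{
#include <stdio.h>
#define YYSTYPE int
int yylex(void);                 /* supplied by the Flex-generated lexer */
int yyerror(char const *msg);
%}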
%token PLUS MINUS STAR LPAREN RPAREN NUMBER NEWLINE
%left PLUS MINUS
%left STAR

%%
line : /* empty */
     | line expr NEWLINE      { printf("%d\n", $2); }
     ;
expr : LPAREN expr RPAREN     { $$ = $2; }
     | expr PLUS expr         { $$ = $1 + $3; }
     | expr MINUS expr        { $$ = $1 - $3; }
     | expr STAR expr         { $$ = $1 * $3; }
     | NUMBER                 { $$ = $1; }
     ;

%%

int yyerror (char const *msg) {
    printf("Error: %s\n", msg);
    return 0;
}

int main() {
    printf("%d\n", yyparse());
    return 0;
}

The Flex lexer used by this parser looks like this:

/* example.lex */
%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include "example.parser.h"
%}
%option noyywrap

%%

[ \t]+    { /* ignore whitespace */ }
"("       { return LPAREN; }
")"       { return RPAREN; }
"+"       { return PLUS; }
"-"       { return MINUS; }
"*"       { return STAR; }
\n        { return NEWLINE; }
[0-9]+    { yylval = atoi(yytext); return NUMBER; }
.         { printf("Invalid character '%s'\n", yytext); }

%%

If we were writing this parser at a UNIX command line, we might generate the source files and compile the result using this sequence of commands:

bison -v -d example.y -o example.parser.c
flex -oexample.lexer.c example.lex
gcc -o example example.lexer.c example.parser.c

Now say you wanted to build the same application for Windows using Visual Studio 2005. The tools are available on Windows (see flex for Win32, bison for Win32), and you could simply run the same first two commands at the command line and then build the resulting source files in Visual Studio. However, this simple approach sacrifices many of the advantages that Visual Studio provides for its built-in source file types: it doesn't rebuild the generated source files as needed, it doesn't allow you to jump to errors that occur during generation, and it doesn't allow you to configure build options using a nice GUI. Let's see how we can reclaim these advantages for Flex and Bison files.

Creating a simple custom build type

Our first goal is simply to be able to build Flex and Bison files. First, use the Flex and Bison setup binaries from the links above to install the tools. A bin directory will be created in the installation directory. Add this to your system path. You should be able to execute both flex and bison from a command prompt without specifying a path.

Next, we create a new C++ console application. Uncheck the option to use precompiled headers; I'll explain how to use these with Flex and Bison later. Remove the main source file created for you by the wizard. Then right-click the project in Solution Explorer and choose Custom Build Rules. A dialog listing the available build rule files appears.

A build rule establishes how to build a file of a particular type. A group of related build rules is stored in a build rule file, which can be saved, distributed, and reused in many projects. We'll start by creating a new build rule file for our Flex and Bison rules:

  1. Click New Rule File.
  2. Enter "GNU Tools" for Display Name and File Name.
  3. Choose a suitable directory for the build rule file. If it asks you if you want to add the directory to your search path, say yes.

Now we'll create a build rule for Bison files:

  1. Click Add Build Rule.
  2. Enter the following values:
    1. Name: Bison
    2. File Extensions: *.y
    3. Outputs: $(InputName).parser.c;$(InputName).parser.h
    4. Command Line: bison -d [inputs] -o $(InputName).parser.c
    5. Execution Description: Generating parser...
  3. Click OK twice, then check the box labelled "GNU Tools" and click OK.
  4. Add the example.y file above to your project. Right-click on the file and choose Compile. You should receive no errors.
  5. Create a new file folder under the project called "Generated Files". Add the existing file example.parser.c to this folder.

If you build now, you should receive only an error complaining that yylex() is undefined. Now, go back to Custom Build Rules and click Modify Rule File on GNU Tools. Create a rule for Flex:

  1. Click Add Build Rule.
  2. Enter the following values:
    1. Name: Flex
    2. File Extensions: *.lex
    3. Outputs: $(InputName).lexer.c
    4. Command Line: flex -o$(InputName).lexer.c [inputs]
    5. Execution Description: Generating lexer...
  3. Click OK three times.
  4. Add the example.lex file above to your project. Right-click on the file and choose Compile. You should receive no errors.
  5. Add the existing file example.lexer.c to your project.

If you build now, you should receive no errors and be able to run the application successfully. From now on, in any project you can simply check the "GNU Tools" box, add the .lex and .y files to your project, and build. What happens if you modify example.y and build? Visual Studio runs Bison again and recompiles both example.parser.c, because it was regenerated, and example.lexer.c, because it includes a header file that was regenerated. If you modify the .lex file instead, Flex is rerun and example.lexer.c is recompiled, but example.parser.c is not rebuilt. With a larger parser, you'd appreciate how much time this incremental rebuilding saves.
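The rebuild behaviour described above matches the usual timestamp rule of make-style build tools: an output is stale when it is missing or older than any of its inputs. The sketch below illustrates that rule in C. The `FileState` type and the `is_stale` function are hypothetical names invented for this example, and Visual Studio's actual dependency checking may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical snapshot of one file, for illustration only. */
typedef struct {
    int    exists;    /* nonzero if the file is present on disk */
    time_t modified;  /* last-modified time; meaningful only if exists */
} FileState;

/* Returns 1 if the output must be rebuilt: it is missing, or some
   existing input was modified more recently than the output. */
int is_stale(const FileState *output, const FileState *inputs, size_t count)
{
    size_t i;

    assert(output != NULL && (inputs != NULL || count == 0));
    if (!output->exists)
        return 1;
    for (i = 0; i < count; i++) {
        if (inputs[i].exists && inputs[i].modified > output->modified)
            return 1;
    }
    return 0;
}
```

Under this rule, editing example.y makes example.parser.c and example.parser.h stale. Once they are regenerated, the lexer's object file is stale too, because it is older than a header it includes. Editing only example.lex never touches the parser's inputs, so the parser is left alone.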

Improving diagnostic support

Delete one of the "%%" marks in the .y file and build. Unsurprisingly, Bison fails, but the Error List tells you no more than that. It'd be more helpful if you could find out what errors the tool produced. If you look at the Output window, Bison did produce some errors, but if you double-click on them to visit the error location, it just takes you to the top of the file. What gives?

The reason for this is that Visual Studio only recognizes one error format, that used by its own tools. Here's an example:

c:\myprojects\myproject\hello.cpp(10) : error C2065: 'i' : undeclared identifier

Bison doesn't output errors in this format, and so they aren't parsed. Flex uses yet another format. What to do? The simplest fix is to pipe each tool's output, as part of the build rule, through a small script that parses it and converts it to the desired format. You can write this script in any language; I wrote mine in C# using the .NET Framework's regular expressions. Here's what I wrote inside the Main() function for the Bison converter tool (error checking and such omitted):

string line;
while ((line = Console.In.ReadLine()) != null)
{
    Match match = Regex.Match(line, "([^:]+):([0-9]+)\\.[^:]*: (.*)");
    if (match.Success)  // Regex.Match never returns null; test Success instead
    {
        Console.WriteLine("{0}({1}): error BISON: {2}",
                          Path.GetFullPath(match.Groups[1].Value),
                          match.Groups[2].Value, match.Groups[3].Value);
    }
    else
    {
        Console.WriteLine(line);
    }
}

I deploy the binary, say it's called BisonErrorFilter.exe, to the same directory as bison.exe. I then change the Command Line of the Bison build rule to the following (click the arrow at the right of the field to access a multiline text box):

bison.exe -d [inputs] -o $(InputName).parser.c > bison.err 2>&1
BisonErrorFilter < bison.err

If you compile the .y file now, any errors should appear in the Error List, as desired, and you can double-click them to visit their locations. I wrote a similar script for the lexer output. Be careful when doing this, though: the last command in the rule is now the filter rather than the tool, so if the filter misses an error, Visual Studio might look at the return code of that last command and interpret the step as a success. A better approach is to wrap the tool in a script that passes its arguments to the tool, collects the tool's output and return code, converts and prints the output, and then returns the tool's return code as its own.

I haven't figured out how, but I believe it's possible to also create custom help entries for each error message, then have the filter tool produce the right error code for each one. This way, users can get help for each error individually by just clicking on it and pressing F1.

Properties

Properties enable you to control how the command-line tool is executed directly from the properties page for each individual file you wish to build with it. Let's start with a simple example: a handy lexer switch is -d, which prints out an informative message each time a token is recognized. We don't want it on all the time, and certainly not in release mode, but it'd be handy to be able to turn on and off as necessary.

To create a property for this, first return to the lexer build rule. Then follow these steps:

  1. Click Add Property.
  2. Choose Boolean for the User Property Type.
  3. Enter the following values:
    1. Name: debug
    2. Display Name: Print debug traces
    3. Switch: -d
    4. Description: Displays an informative message each time a token is recognized.
  4. Click OK. Then, add [debug] right after "flex" in the Command Line field.
  5. Click OK three times.
  6. Right-click on example.lex in Solution Explorer and choose Properties.
  7. In the left pane, click the plus next to Flex. Click General.
  8. You'll see your property. Click on it and its description will appear at the bottom. Set it to Yes.
  9. Click Command Line in the left pane. You'll see that the -d flag has been added.
  10. Click OK and build.
  11. Run the app and type an arithmetic expression. You'll see trace messages.
  12. View the project properties. You'll see that it now has a Flex node also. Here you can set the default settings for all files of that type in the project which don't have specific overriding settings set.
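The effect of a Boolean property on the command line can be pictured as a simple placeholder substitution: `[debug]` becomes the switch text when the property is set and an empty string otherwise. The C sketch below mimics only that visible result. `substitute` and `boolean_switch` are hypothetical helpers written for this illustration; Visual Studio performs the expansion internally, and the expanded forms of `$(InputName)` and `[inputs]` shown in the usage note are assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A Boolean property contributes its switch when set and nothing otherwise. */
const char *boolean_switch(int enabled, const char *switch_text)
{
    return enabled ? switch_text : "";
}

/* Copies tmpl to out, replacing the first occurrence of placeholder
   (for example "[debug]") with value. Returns 0 on success, or -1 if
   the result does not fit in out. */
int substitute(char *out, size_t out_size, const char *tmpl,
               const char *placeholder, const char *value)
{
    const char *hit;
    int written;

    assert(out != NULL && tmpl != NULL && placeholder != NULL && value != NULL);
    hit = strstr(tmpl, placeholder);
    if (hit == NULL)
        written = snprintf(out, out_size, "%s", tmpl);
    else
        written = snprintf(out, out_size, "%.*s%s%s", (int)(hit - tmpl), tmpl,
                           value, hit + strlen(placeholder));
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Substituting `boolean_switch(1, "-d")` for `[debug]` in `flex [debug] -oexample.lexer.c example.lex` gives `flex -d -oexample.lexer.c example.lex`; with the property off, the placeholder simply disappears.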

Adding more properties is just as simple. You can go through the man page for the tool and add properties for each switch, using the Category field to group them into categories. You can use the other property types for switches accepting arguments. If you want, you can create a detailed help file with additional explanation and examples for each switch. When you're done you have an impressive looking property sheet for your files, reminiscent of those for built-in types.

You can also set different settings for debug and release builds. For example, for Flex, it's good to choose the slowest and smallest table compression (-Cem, the default) for the Debug version, because those scanners are the quickest to generate and compile, and full tables with equivalence classes (-Cfe) for the Release version, which the Flex manual recommends as a good compromise between table size and speed.

Finally, once you're done adding all the properties you like, you can take the resulting .rules file and give it to everyone on your team, or distribute it on a website, so that everyone can easily integrate the tool into Visual Studio. Perhaps eventually tools like Flex and Bison will ship with a .rules file.

Conclusion

In Visual Studio .NET 2003 you would have had to write a plug-in to come close to achieving this level of integration with a third-party tool. Although the new custom build features have limitations, I hope the problems they solve encourage you to incorporate more tools and code generation into your regular development. Now that you know how to use Flex and Bison from the Visual Studio IDE, next time I'll talk about how to use the tools themselves, going through some of the development and debugging processes that a grammar developer goes through, and show you some similar tools for other .NET languages. Thanks for reading, everyone.