February 2015

Volume 30 Number 2

The Working Programmer - Rise of Roslyn, Part 2: Writing Diagnostics

Ted Neward | February 2015

By now, readers will have heard much of the buzz surrounding the strategies Microsoft seems to be pursuing for the next generation of Microsoft developer tools: more open source, more cross-platform, more openness and more transparency. “Roslyn” (the code name for the .NET Compiler Platform project) forms a major part of that story, being the first time that Microsoft has really committed production-quality compiler tool infrastructure to an open development model. With the announcement that Roslyn is now the compiler used by the Microsoft .NET Framework teams themselves to build .NET, Roslyn has achieved a certain degree of “inception”: The platform and its language tools are now being built by the platform and its language tools. And, as you’ll see in this article, you can use the language tools to build more language tools to help you build for the platform.

Confused? Don’t be—it’ll all make sense in just a bit.

‘But We Don’t Do That’

Since the first programmer started working with the second programmer (and found him “doing it wrong,” at least in the first programmer’s opinion), teams have struggled to create some semblance of unity and consistency in the way code is written, the degree of error checking done, the manner in which objects are used and so on. Historically, this has been the province of “coding standards,” essentially a set of rules that every programmer is supposed to follow when writing code for the company. Sometimes, programmers even go so far as to read them. But without any sort of coherent and consistent enforcement (usually through that time-honored practice of “code review,” during which everybody bickers over where the curly braces should go and what the variables should be named), coding standards really end up having little impact overall on code quality.

Over time, as language tools got more mature, developers started looking to tools themselves to provide this level of enforcement. After all, if there’s one thing a computer is good at, it’s repeatedly performing the same kinds of detailed analysis, over and over again, without fail or hesitation or mistake. Remember, that’s part of the job of a compiler in the first place: Discover common human mistakes that can lead to error-prone code, and fail early so programmers are required to fix them before end users see them. Tools that analyze code, looking for error patterns, are called “static analysis tools” and can help identify bugs long before you even run the unit tests.

Historically in the .NET Framework, it’s been difficult to build and maintain such tools. Static analysis tools require a significant development effort and must be updated as languages and libraries evolve; for companies working in both C# and Visual Basic .NET, the effort doubles. Binary analysis tools, such as FxCop, work at the Intermediate Language (IL) level, avoiding language complexities. However, at the very least, there’s a structural loss of information in the translation from source to IL, making it that much more difficult to relate issues back to the level where the programmer is working—the source. Binary analysis tools also run after compilation, preventing IntelliSense-like feedback during the programming process.

Roslyn, however, was built from the beginning to be extended. Roslyn uses the term “analyzer” to describe source-code analysis extensions that can, and do, run in the background while developers are programming. By creating an analyzer, you can ask Roslyn to enforce additional, higher-order kinds of “rules,” helping to eliminate bugs without having to run additional tools.

What Could Go Wrong?

It’s a sad, sad day to admit this, but periodically we see code like this:

try
{
  int x = 5; int y = 0;
  // Lots of code here
  int z = x / y;
}
catch (Exception ex)
{
  // TODO: come back and figure out what to do here
}

Often, that TODO is written with the best of intentions. But, as the old saying goes, the road to perdition is paved with good intentions. Naturally, the coding standard says this is bad, but it’s only a violation if somebody catches you. Sure, a text-file scan would reveal the “TODO,” but the code is littered with TODOs, none of which are hiding errors as ugly as this. And, of course, you’ll only find this line of code after a major demo bombs silently and you slowly, painfully backtrack the devastation until you find that this code, which should’ve failed loudly with an exception, instead simply swallowed it and allowed the program to carry on in blissful ignorance of its impending doom.

The coding standard likely has a case for this: Always throw the exception, or always log the exception to a standard diagnostic stream, or both, or ... But, again, without enforcement, it’s just a paper document that nobody reads.

With Roslyn, you can build a diagnostic that catches this and even (when configured to do so) works with Visual Studio Team Foundation Server to prevent this code from ever being checked in until that empty catch block is fixed.

Roslyn Diagnostics

As of this writing, project Roslyn is a preview release, installed as part of Visual Studio 2015 Preview. Once the Visual Studio 2015 Preview SDK and Roslyn SDK templates are installed, diagnostics can be written using the provided Extensibility template, Diagnostic with Code Fix (NuGet + VSIX). To start, as shown in Figure 1, select the diagnostic template and name the project EmptyCatchDiagnostic.

Figure 1 Diagnostic with Code Fix (NuGet + VSIX) Project Template

The second step is to write a Syntax Node Analyzer that walks the Abstract Syntax Tree (AST), looking for empty catch blocks. A tiny AST fragment is shown in Figure 2. The good news is the Roslyn compiler walks the AST for you. You need only provide code to analyze the nodes of interest. (For those familiar with classic “Gang-of-Four” design patterns, this is the Visitor pattern at work.) Your analyzer must inherit from the abstract base class DiagnosticAnalyzer and implement these two members, a property and a method:

public abstract
  ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics { get; }
public abstract void Initialize(AnalysisContext context);

Figure 2 Roslyn Abstract Syntax Tree for the Code Fragment: if (score > 100) grade = “A++”;
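If the figure isn’t at hand, the shape of that tree can be reproduced in a few lines. What follows is a minimal sketch, assuming the released Microsoft.CodeAnalysis.CSharp NuGet package is referenced (the preview bits described in this article may differ). The SyntaxTreeDump class and its Describe method are hypothetical names invented for illustration; they are not part of Roslyn or of the project template.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper for illustration; not part of Roslyn or the template.
public static class SyntaxTreeDump
{
    // Returns one indented line per syntax node, parents before children.
    public static List<string> Describe(string statementText)
    {
        var lines = new List<string>();
        SyntaxNode root = SyntaxFactory.ParseStatement(statementText);
        Walk(root, 0, lines);
        return lines;
    }

    private static void Walk(SyntaxNode node, int depth, List<string> lines)
    {
        // Kind() reports the SyntaxKind, the same enum used when registering
        // an analyzer for a particular kind of node.
        lines.Add(new string(' ', depth * 2) + node.Kind());
        foreach (var child in node.ChildNodes())
            Walk(child, depth + 1, lines);
    }

    public static void Main()
    {
        foreach (var line in Describe("if (score > 100) grade = \"A++\";"))
            Console.WriteLine(line);
    }
}
```

The sketch lists nodes only, so tokens such as the if keyword and the > operator don’t appear. For the fragment above, it prints IfStatement at the root, with a GreaterThanExpression subtree for the condition and an ExpressionStatement subtree (wrapping a SimpleAssignmentExpression) for the body.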

The SupportedDiagnostics property is a simple one, returning a description of each diagnostic you’re offering up to Roslyn. The Initialize method is where you register your analyzer code with Roslyn. During initialization you provide Roslyn with two things: the kind of nodes in which you’re interested, and the code to execute when one of these nodes is encountered during compilation. Because Visual Studio performs compilation in the background, these calls will occur while the user is editing, providing immediate feedback on possible errors.

Start by modifying the pre-generated template code into what you need for your empty catch diagnostic. This can be found in the source code file DiagnosticAnalyzer.cs within the EmptyCatchDiagnostic project (the solution will contain additional projects you can safely ignore). The code that follows shows the changes relative to the pre-generated code. First, some strings describing our diagnostic:

internal const string Title = "Catch Block is Empty";
internal const string MessageFormat =  
  "'{0}' is empty, app could be unknowingly missing exceptions";
internal const string Category = "Safety";

The generated SupportedDiagnostics property is correct; you only have to change the Initialize method to register your custom-written syntax analysis routine, AnalyzeSyntax:

public override void Initialize(AnalysisContext context)
{
  context.RegisterSyntaxNodeAction(AnalyzeSyntax, SyntaxKind.CatchClause);
}

As part of the registration, note that you inform Roslyn you’re only interested in catch clauses within the AST. This cuts down on the number of nodes fed to you and also helps keep the analyzer clean, simple and single-purposed.

During compilation, when a catch clause node is encountered in the AST, your analysis method AnalyzeSyntax is called. This is where you look at the number of statements in the catch block, and if that number is zero, you display a diagnostic warning because the block is empty. As shown in Figure 3, when your analyzer finds an empty catch block, you create a new diagnostic warning, position it at the location of the catch keyword and report it.

Figure 3 Encountering a Catch Clause

// Called when Roslyn encounters a catch clause.
private static void AnalyzeSyntax(SyntaxNodeAnalysisContext context)
{
  // Type cast to what we know.
  var catchBlock = context.Node as CatchClauseSyntax;
  // If catch is present we must have a block, so check whether it is empty.
  if (catchBlock?.Block.Statements.Count == 0)
  {
    // Block is empty, create and report diagnostic warning.
    var diagnostic = Diagnostic.Create(Rule,
      catchBlock.CatchKeyword.GetLocation(), "Catch block");
    context.ReportDiagnostic(diagnostic);
  }
}
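For reference, the pieces above can be assembled into a single class. The following is a sketch rather than the template’s exact output: it assumes the released Microsoft.CodeAnalysis.CSharp NuGet package, and the class name EmptyCatchAnalyzer, the DiagnosticId value, the warning severity and the AnalyzerRunner helper are all assumptions made for illustration. The helper shows one way to exercise an analyzer in-process against a string of source code.

```csharp
using System;
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class EmptyCatchAnalyzer : DiagnosticAnalyzer
{
    // Assumed ID and severity; the generated template defines its own.
    public const string DiagnosticId = "EmptyCatchDiagnostic";
    internal const string Title = "Catch Block is Empty";
    internal const string MessageFormat =
        "'{0}' is empty, app could be unknowingly missing exceptions";
    internal const string Category = "Safety";

    internal static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        DiagnosticId, Title, MessageFormat, Category,
        DiagnosticSeverity.Warning, isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
        ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        // Ask to be called back only for catch clauses.
        context.RegisterSyntaxNodeAction(AnalyzeSyntax, SyntaxKind.CatchClause);
    }

    private static void AnalyzeSyntax(SyntaxNodeAnalysisContext context)
    {
        var catchClause = (CatchClauseSyntax)context.Node;
        if (catchClause.Block.Statements.Count == 0)
        {
            context.ReportDiagnostic(Diagnostic.Create(
                Rule, catchClause.CatchKeyword.GetLocation(), "Catch block"));
        }
    }
}

// Hypothetical helper: runs the analyzer over a source string, no IDE needed.
public static class AnalyzerRunner
{
    public static ImmutableArray<Diagnostic> Run(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        // No metadata references are supplied; the check is purely syntactic.
        var compilation = CSharpCompilation.Create("Sample", new[] { tree });
        var analyzers = ImmutableArray.Create<DiagnosticAnalyzer>(new EmptyCatchAnalyzer());
        return compilation.WithAnalyzers(analyzers)
            .GetAnalyzerDiagnosticsAsync().GetAwaiter().GetResult();
    }

    public static void Main()
    {
        var source = "class C { void M() { try { } catch (System.Exception) { } } }";
        foreach (var diagnostic in Run(source))
            Console.WriteLine(diagnostic);
    }
}
```

Note that comments are trivia rather than statements, so a catch block containing only a TODO comment still has a statement count of zero and is reported.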

The third step is to build and run the diagnostic. What happens next is really interesting, and makes sense once you think about it. You just built a compiler-driven diagnostic—so how do you test it? By starting up Visual Studio, installing the diagnostic, opening a project with empty catch blocks and seeing what happens! This is depicted in Figure 4. The default project type is a VSIX installer, so when you “run” the project, Visual Studio starts up another instance of Visual Studio and runs the installer for it. Once that second instance is up, you can test it. Alas, automated testing of diagnostics is a bit beyond the scope of the project for now, but if the diagnostics are kept simple and single-focused, then it’s not too hard to test manually.

Figure 4 Visual Studio Running an Empty Catch Block Diagnostic in Another Instance of Visual Studio

Don’t Just Stand There, Fix It!

Unfortunately, a tool that points out an error that it could easily fix—but doesn’t—is really just annoying. Sort of like that cousin of yours who watched you struggle to open the door for hours before deciding to mention that it’s locked, then watched you struggle to find another way in for even longer before mentioning that he has the key.

Roslyn doesn’t want to be that guy.

A code fix provides one or more suggestions to the developer—suggestions to hopefully fix the issue detected by the analyzer. In the case of an empty catch block, an easy code fix is to add a throw statement so that any exception caught is immediately rethrown. Figure 5 illustrates how the code fix appears to the developer in Visual Studio, as a familiar tooltip.

Figure 5 A Code Fix Suggesting a Throw Within the Empty Catch Block

In this case focus your attention on the other pre-generated source file in the project, CodeFixProvider.cs. Your job is to inherit from the abstract base class CodeFixProvider and implement three methods. The key method is ComputeFixesAsync, which offers suggestions to the developer:

public sealed override async Task ComputeFixesAsync(CodeFixContext context)

When the analyzer reports an issue, this method is called by the Visual Studio IDE to see if there are any suggested code fixes. If so, the IDE displays a tooltip containing the suggestions, from which the developer may select. If one is selected, the given document—which denotes the AST for the source file—is updated with the suggested fix.

This implies that a code fix is nothing more than a suggested modification to the AST. By modifying the AST, the change is carried through to the remaining phases of the compiler, as if the developer had written that code. In this case, the suggestion is to add a throw statement. Figure 6 is an abstract depiction of what’s going on.

Figure 6 Updating the Abstract Syntax Tree

So your method builds a new subtree to replace the existing catch block subtree in the AST. You build this new subtree bottom up: a new throw statement, then a list to contain the statement, then a block to scope the list and, finally, a catch to anchor the block:

public sealed override async Task ComputeFixesAsync(
  CodeFixContext context)
{
  // Create a new block with a list that contains a throw statement.
  var throwStmt = SyntaxFactory.ThrowStatement();
  var stmtList = new SyntaxList<StatementSyntax>().Add(throwStmt);
  var newBlock = SyntaxFactory.Block().WithStatements(stmtList);
  // Create a new, replacement catch block with our throw statement.
  var newCatchBlock = SyntaxFactory.CatchClause().WithBlock(newBlock);

The next step is to grab the root of the AST for this source file, find the catch block identified by the analyzer and build a new AST. For example, newRoot denotes a newly rooted AST for this source file:

  var root = await context.Document.GetSyntaxRootAsync(
    context.CancellationToken);
  var diagnostic = context.Diagnostics.First();
  var diagnosticSpan = diagnostic.Location.SourceSpan;
  var token = root.FindToken(diagnosticSpan.Start); // This is catch keyword.
  var catchBlock = token.Parent as CatchClauseSyntax; // This is catch block.
  var newRoot = root.ReplaceNode(catchBlock, newCatchBlock); // Create new AST.

The last step is to register a code action that will invoke your fix and update the AST:

  var codeAction =
    CodeAction.Create("throw", context.Document.WithSyntaxRoot(newRoot));
  context.RegisterFix(codeAction, diagnostic);
}
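The listing above targets the preview API. In the released Roslyn packages the corresponding members are named RegisterCodeFixesAsync and RegisterCodeFix, and the fixable IDs are exposed through a FixableDiagnosticIds property. The following is a hedged sketch of the same fix against that released API, assuming the Microsoft.CodeAnalysis.CSharp.Workspaces NuGet package. The class name EmptyCatchCodeFix, the AddRethrow helper and the "EmptyCatchDiagnostic" ID are assumptions for illustration, and the ID must match whatever your analyzer actually reports.

```csharp
using System.Collections.Immutable;
using System.Composition;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CodeActions;
using Microsoft.CodeAnalysis.CodeFixes;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Formatting;

[ExportCodeFixProvider(LanguageNames.CSharp, Name = nameof(EmptyCatchCodeFix)), Shared]
public class EmptyCatchCodeFix : CodeFixProvider
{
    // Assumed to match the ID reported by the analyzer.
    public sealed override ImmutableArray<string> FixableDiagnosticIds =>
        ImmutableArray.Create("EmptyCatchDiagnostic");

    public sealed override FixAllProvider GetFixAllProvider() =>
        WellKnownFixAllProviders.BatchFixer;

    public sealed override async Task RegisterCodeFixesAsync(CodeFixContext context)
    {
        var document = context.Document;
        var root = await document.GetSyntaxRootAsync(context.CancellationToken)
            .ConfigureAwait(false);
        var diagnostic = context.Diagnostics.First();

        // The diagnostic sits on the catch keyword; its parent is the clause.
        var catchClause = root.FindToken(diagnostic.Location.SourceSpan.Start).Parent
            as CatchClauseSyntax;
        if (catchClause == null) return;

        // The new tree is only built if the developer picks the fix.
        context.RegisterCodeFix(
            CodeAction.Create(
                title: "throw",
                createChangedDocument: _ => Task.FromResult(
                    document.WithSyntaxRoot(AddRethrow(root, catchClause))),
                equivalenceKey: "throw"),
            diagnostic);
    }

    // Hypothetical helper: the pure tree-to-tree step, usable without an IDE.
    public static SyntaxNode AddRethrow(SyntaxNode root, CatchClauseSyntax catchClause)
    {
        var newBlock = SyntaxFactory.Block(SyntaxFactory.ThrowStatement())
            .WithAdditionalAnnotations(Formatter.Annotation);
        // WithBlock swaps only the block, keeping any "(Exception ex)" declaration.
        return root.ReplaceNode(catchClause, catchClause.WithBlock(newBlock));
    }
}
```

One design difference is worth calling out. This sketch derives the replacement from the existing clause with WithBlock, so a declaration such as (Exception ex) survives the fix, whereas a clause built from scratch with SyntaxFactory.CatchClause() starts out without one. Which behavior you want is a judgment call for your own house rules.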

For a variety of good reasons, most data structures in Roslyn are immutable, including the AST. This is a particularly good choice here, because you don’t want to update the AST unless the developer actually selects the code fix. Because the existing AST is immutable, the method returns a new AST, which is substituted for the current AST by the IDE if the code fix is selected.

You might be concerned that immutability comes at the high cost of memory consumption. If the AST is immutable, does that imply a complete copy is needed every time a change is made? Fortunately, only the differences are stored in the AST (on the grounds that it’s easier to store the deltas than to deal with the concurrency and consistency issues that making the AST entirely mutable would create) to minimize the amount of copying that occurs to ensure immutability.
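Immutability is easy to observe directly. The following small sketch, assuming the released Microsoft.CodeAnalysis.CSharp NuGet package, replaces an empty catch block and then shows that the original root still renders the original text. ImmutabilityDemo and Rewrite are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class; not part of Roslyn or the project template.
public static class ImmutabilityDemo
{
    // Returns the text of the tree before and after the replacement.
    public static (string Before, string After) Rewrite(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
        BlockSyntax oldBlock = root.DescendantNodes()
            .OfType<CatchClauseSyntax>().First().Block;
        BlockSyntax newBlock = SyntaxFactory.Block(SyntaxFactory.ThrowStatement());

        // ReplaceNode leaves 'root' untouched and hands back a new root.
        SyntaxNode newRoot = root.ReplaceNode(oldBlock, newBlock);
        return (root.ToFullString(), newRoot.ToFullString());
    }

    public static void Main()
    {
        var result = Rewrite("class C { void M() { try { } catch { } } }");
        Console.WriteLine("Before: " + result.Before);
        Console.WriteLine("After:  " + result.After);
    }
}
```

Both trees remain usable after the call, which is what lets the IDE hold on to the current tree until the developer actually accepts the fix.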

Breaking New Ground

Roslyn breaks some new ground by opening up the compiler (and the IDE, as well!) this way. For years, C# has touted itself as a “strongly typed” language, suggesting that up-front compilation helps reduce errors. In fact, C# even took a few steps to try to avoid common mistakes from other languages (such as implicitly treating integer expressions as Boolean values, leading to the infamous “if (x = 0)” bug that often hurt C++ developers). But compilers have always had to be extremely selective about what rules they could or would apply, because those decisions were industry-wide, and different organizations often had different opinions on what was “too strict” or “too loose.” Now, with Microsoft opening up the compiler’s innards to developers, you can begin to enforce “house rules” on code, without having to become compiler experts on your own.

Check out the Roslyn project page at roslyn.codeplex.com for details on how to get started with Roslyn. If you want to dive more deeply into parsing and lexing, numerous books are available, including the venerable “Dragon Book,” officially published as “Compilers: Principles, Techniques & Tools” (Addison Wesley, 2006) by Aho, Lam, Sethi and Ullman. For those interested in a more .NET-centric approach, consider “Compiling for the .NET Common Language Runtime (CLR)” (Prentice Hall, 2001) by John Gough, or Ronald Mak’s “Writing Compilers and Interpreters: A Software Engineering Approach” (Wiley, 2009).

Happy coding!

Ted Neward is the CTO at iTrellis, a consulting services company. He has written more than 100 articles and authored a dozen books, including “Professional F# 2.0” (Wrox, 2010). He’s an F# MVP and speaks at conferences around the world. He consults and mentors regularly—reach him at ted@tedneward.com or ted@itrellis.com if you’re interested.

Joe Hummel, Ph.D., is a research associate professor at the University of Illinois, Chicago, a content creator for Pluralsight, a Visual C++ MVP, and a private consultant. He earned a Ph.D. at UC Irvine in the field of high-performance computing and is interested in all things parallel. He resides in the Chicago area, and when he isn’t sailing can be reached at joe@joehummel.net.

Thanks to the following Microsoft technical expert for reviewing this article: Kevin Pilch-Bisson