March 2009

Volume 24 Number 03

MSBuild - Best Practices For Creating Reliable Builds, Part 2

By Sayed Ibrahim Hashimi

This article is the second part of my discussion of best practices that developers should follow when using MSBuild, the engine used by Visual Studio to build managed projects. In Part 1, I described some basic practices and techniques that apply to almost every project. Here in Part 2, I'll describe techniques more specific to build configurations that require heavier customization, largely because of their size. The topics I'll focus on are using incremental builds, creating custom tasks, managing build references, and building large source trees.

If you are not familiar with MSBuild, take a look at my previous MSDN articles: "Best Practices For Creating Reliable Builds, Part 1", "Inside MSBuild: Compile Apps Your Way with Custom Tasks for the Microsoft Build Engine", and "WiX Tricks: Automate Releases with MSBuild and Windows Installer XML."

Using Incremental Builds

As builds become more complex, the time required to build applications increases. Building a large product might take several hours. Because builds are time-consuming, you should ensure that only out-of-date targets are executed. This concept, known as incremental building, is supported in MSBuild through the Inputs and Outputs attributes of the Target element. When MSBuild encounters a target with inputs and outputs, it compares the time stamps of the output files with the time stamps of the input files. If the input files are more recent, the target executes; otherwise, it is skipped.

To demonstrate this mechanism, I'll use the example from the section "Batching Tasks" (in part 1 of this article) in which a set of files is copied from one location to another. The listing in Figure 1 shows the Copy02.proj file, a modified version of the example.

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <PropertyGroup>
    <SourceFolder>src\</SourceFolder>
    <DestFolder>dest\</DestFolder>
  </PropertyGroup>
  <ItemGroup>
    <!-- Get all files under src\ except svn files -->
    <SrcFiles Include="$(SourceFolder)**\*" Exclude="**\.svn\**\*"/>
  </ItemGroup>
  <Target Name="CopyToDest" Inputs="@(SrcFiles)" 
          Outputs="@(SrcFiles->'$(DestFolder)%(RecursiveDir)%(Filename)%(Extension)')">
    <Copy SourceFiles="@(SrcFiles)" 
          DestinationFiles="@(SrcFiles->'$(DestFolder)%(RecursiveDir)%(Filename)%(Extension)')" />
  </Target>
  <Target Name="Clean">
    <ItemGroup>
      <_FilesToDelete Include="$(DestFolder)**\*"/>
    </ItemGroup>
    <Delete Files="@(_FilesToDelete)"/>
  </Target>
</Project>

The main difference between this file and the Copy01.proj file is the use of Inputs and Outputs. Here, the CopyToDest target declares the files being copied as inputs and the destination files as outputs. If the copied files are up to date, the target is skipped. Take a look at how this works in Figure 2.

C:\Samples\Batching>msbuild Copy02.proj /fl /nologo 
Build started 10/26/2008 6:15:37 PM. 
Project "C:\Samples\Batching\Copy02.proj" on node 0 (default targets). 
Skipping target "CopyToDest" because all output files are up-to-date with respect to the input files. 
Done Building Project "C:\Samples\Batching\Copy02.proj" (default targets). 
Build succeeded. 
0 Warning(s) 0 Error(s)

In certain circumstances, some of the outputs are up to date but not all of them. When MSBuild detects this situation, a subset of the inputs is passed to the target to be executed.

Here's an example. The Copy03.proj file has the same contents as Copy02.proj but with an additional target, DeleteRandomOutputFiles, which deletes two files from the output directory:

<Target Name="DeleteRandomOutputFiles">
  <!-- Arbitrarily delete two files from dest folder -->
  <ItemGroup>
    <_RandomFilesToDelete Include="$(DestFolder)class3.cs"/>
    <_RandomFilesToDelete Include="$(DestFolder)Admin\admin_class2.cs"/>
  </ItemGroup>
  <Delete Files="@(_RandomFilesToDelete)"/>
</Target>

Because these two files are deleted from the destination, the target's outputs are no longer up to date, so the target should not be skipped. If you execute the command msbuild Copy03.proj /t:DeleteRandomOutputFiles;CopyToDest after CopyToDest has been called once already, you'll see the results shown in Figure 3.

C:\Samples\Batching>msbuild Copy03.proj /t:DeleteRandomOutputFiles;CopyToDest 
Build started 10/26/2008 11:09:56 PM. 
Project "C:\Samples\Batching\Copy03.proj" on node 0 (DeleteRandomOutputFiles;CopyToDest target(s)). 
Deleting file "dest\class3.cs". 
Deleting file "dest\Admin\admin_class2.cs". 
CopyToDest: Building target "CopyToDest" partially, because some output files are out of date with respect to their input files. 
Copying file from "src\Admin\admin_class2.cs" to "dest\Admin\admin_class2.cs". 
Copying file from "src\class3.cs" to "dest\class3.cs". 
Done Building Project "C:\Samples\Batching\Copy03.proj" (DeleteRandomOutputFiles;CopyToDest target(s)). 
Build succeeded. 
0 Warning(s) 0 Error(s)

From the output, you can see that the CopyToDest target was partially built, meaning that a partial list of its inputs was provided to the target. The target only had to copy two files instead of many more. For complex builds, incremental building is crucial, and you should always create targets that support this mechanism.
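Incremental building is not limited to targets that map each input to its own output. The following is a minimal sketch, not part of the article's samples, of a target that produces a single file from many inputs; the WriteManifest target and the ManifestFile property are hypothetical names chosen for illustration. Because there is no one-to-one mapping between inputs and outputs here, a partial build is not possible: the target should either be skipped entirely or run in full whenever any source file is newer than the manifest.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <PropertyGroup>
    <SourceFolder>src\</SourceFolder>
    <DestFolder>dest\</DestFolder>
    <!-- Hypothetical output file, used only for this illustration -->
    <ManifestFile>$(DestFolder)manifest.txt</ManifestFile>
  </PropertyGroup>
  <ItemGroup>
    <SrcFiles Include="$(SourceFolder)**\*" Exclude="**\.svn\**\*"/>
  </ItemGroup>
  <!-- Many inputs, one output: the target is skipped only when
       manifest.txt is newer than every file in SrcFiles -->
  <Target Name="WriteManifest" Inputs="@(SrcFiles)" Outputs="$(ManifestFile)">
    <MakeDir Directories="$(DestFolder)"/>
    <WriteLinesToFile File="$(ManifestFile)" Lines="@(SrcFiles)" Overwrite="true"/>
  </Target>
</Project>
```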

 

Creating Custom Tasks

When you write a custom task, design it to be as reusable as possible. For example, a SendEmail task is more useful than a SendEmailToJoe task. Here are some guidelines to follow when you create custom tasks. I'll expand on each guideline throughout this section.

  • Extend the Task or the ToolTask class.
  • Log messages appropriately.
  • Tasks must communicate transparently.
  • Use the Required attribute for all required input parameters.
  • Use the Output attribute on all properties that project files might be interested in.
  • Use ITaskItem for file references instead of string.
  • Always transfer metadata for new output values that are related to inputs.

When you are writing tasks, you don't need to implement the ITask (Microsoft.Build.Framework) interface directly. The easier approach is to extend either the Task or ToolTask class, both of which are found in the Microsoft.Build.Utilities namespace. If you are creating a task that wraps an executable, you should use the ToolTask class. For other situations, use the Task class.

When extending one of these classes, you should use the Log property to assist in sending messages to attached loggers. You should never rely on techniques such as Console.WriteLine to send messages to the console. When you log messages, it is important to log them at the appropriate level so that loggers can do their jobs more effectively. For example, you should not log events as errors when they are really just warnings. Also, if your task fails, the contract of the ITask interface is to return False from the Execute method. In this case, you should also always log at least one error so that a user knows how to resolve a problem. If the Execute method returns True, no error should be logged. This is easily achieved by using a return statement of return !Log.HasLoggedErrors.

Another critical principle in creating tasks is that they be self-contained. The only information that a task needs to know regarding a build is the properties that are passed into it. The opposite is also true: the project file should only gather information from a task by its outputs. If you design tasks that have other means of communicating with either consuming project files or loggers, you are tightly coupling them, increasing their complexity, and reducing their maintainability. You should always take a "what you see is what you get" approach to constructing tasks in the sense that your code can only see the task's declared inputs and outputs in the project file.

Input properties that a task requires must be decorated with the Required attribute (found in the Microsoft.Build.Framework namespace). MSBuild enforces this requirement at run time, and logs an error stating which required property has not been provided. If your task contains properties that project files might be interested in, you should place the Output attribute (Microsoft.Build.Framework) on those properties. Because output parameters must be explicitly decorated in code with the Output attribute, you should err on the side of exposing a few additional properties, as this may prevent you from having to redeploy the task simply to output another property.

If a task is designed to take an action against a file or a set of files, these files should be passed in and out of the task by using the ITaskItem interface instead of just string paths. For example, consider the two implementations shown in Figure 4.

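using System;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Move1 declares its file-based properties as plain strings
public class Move1 : Task
{
    [Required]
    public string SourceFile { get; set; }

    [Required]
    [Output]
    public string DestinationFile { get; set; }

    public override bool Execute()
    {
        // TODO: Implementation here
        throw new NotImplementedException();
    }
}

// Move2 declares the same properties as ITaskItem, which carries metadata
public class Move2 : Task
{
    [Required]
    public ITaskItem SourceFile { get; set; }

    [Required]
    [Output]
    public ITaskItem DestinationFile { get; set; }

    public override bool Execute()
    {
        // TODO: Implementation here
        throw new NotImplementedException();
    }
}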

The Move1 class uses a string to pass a file into and out of the task. Move2 exposes instances of ITaskItem instead. When you pass a string, all you get is a simple string. By contrast, ITaskItem objects have metadata associated with them.

From a consumer's perspective, the implementations are the same; MSBuild handles any conversions needed. The target shown here shows how both of these tasks could be consumed:

<Target Name="Example">
  <ItemGroup>
    <FileToMove Include="class1.cs"/>
  </ItemGroup>
  <Move1 SourceFile="@(FileToMove)" DestinationFile="dest\class1.cs" />
  <Move2 SourceFile="@(FileToMove)" DestinationFile="dest\class1.cs" />
</Target>

You might ask why this matters if your task doesn't use metadata. If you start writing tasks using ITaskItem for file-based references, your tasks will be much more flexible. Also, in this simple example, these tasks accept only a single file at a time. It would be better to create a task that could move multiple files. In such a case, the properties should be defined as ITaskItem[] instead of ITaskItem. By doing this you reduce the overhead of having to create multiple instances of the task and you make work easier for users of the task.
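The built-in Copy task is a convenient illustration of this pattern from the project file's side: its SourceFiles and DestinationFiles parameters accept whole item lists, and its CopiedFiles output parameter can be captured with an Output element. The sketch below is only an example; the FilesToCopy and CopiedItems item names and the src\ and dest\ folders are assumptions made for illustration.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <ItemGroup>
    <!-- Illustrative item name and folder -->
    <FilesToCopy Include="src\**\*.cs"/>
  </ItemGroup>
  <Target Name="CopyAndReport">
    <!-- One task instance handles the entire item list -->
    <Copy SourceFiles="@(FilesToCopy)"
          DestinationFiles="@(FilesToCopy->'dest\%(RecursiveDir)%(Filename)%(Extension)')">
      <!-- Capture the task's output parameter into a new item -->
      <Output TaskParameter="CopiedFiles" ItemName="CopiedItems"/>
    </Copy>
    <Message Text="Copied: @(CopiedItems)"/>
  </Target>
</Project>
```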

Finally, always transfer metadata for new output values that are related to inputs. In cases where you accept some input items and then create new outputs that are directly related to the inputs, all metadata from the original source should be transferred to the newly created output items, even metadata that you don't recognize. For example, if you create a task that creates object files from C++ files, the input property for the source files should be declared as ITaskItem[] and so should the output for the object files. When you create the output items, each object file will then have the metadata of the corresponding input item assigned to it.

This is important because it allows metadata to be added to the item in each successive step in the pipeline. For example, you could add a piece of metadata named DoNotLink to one of the C++ files. The compiler doesn't recognize this metadata, but it passes it through to the appropriate object file item, and then the Link target will exclude that object file.
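To make the DoNotLink idea concrete, here is a rough sketch rather than a real C++ build. No compiler or linker task is invoked: an item transform stands in for the compile step, since items created from a transform keep the metadata of their source items, and a Message task stands in for the linker. The item names, file names, and target names are all hypothetical.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
         ToolsVersion="3.5" DefaultTargets="Link">
  <ItemGroup>
    <CppFiles Include="main.cpp"/>
    <CppFiles Include="scratch.cpp">
      <!-- Custom metadata that the compile step does not interpret -->
      <DoNotLink>true</DoNotLink>
    </CppFiles>
  </ItemGroup>
  <Target Name="Compile">
    <ItemGroup>
      <!-- Stand-in for a compiler task that transfers metadata to its outputs -->
      <ObjFiles Include="@(CppFiles->'obj\%(Filename).obj')"/>
    </ItemGroup>
  </Target>
  <Target Name="Link" DependsOnTargets="Compile">
    <ItemGroup>
      <!-- Keep only object files whose source was not marked DoNotLink -->
      <ObjFilesToLink Include="@(ObjFiles)"
                      Condition="'%(ObjFiles.DoNotLink)' != 'true'"/>
    </ItemGroup>
    <Message Text="Would link: @(ObjFilesToLink)"/>
  </Target>
</Project>
```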

Managing Build References

The most important aspect of dealing with build references is to never reference assemblies that reside in the global assembly cache (GAC). The GAC was designed strictly for run-time use and never should have been included in the reference search path. A better approach is to use project references or assembly references with strong names.

When you are creating assembly references, you can use a HintPath to help MSBuild determine where to look for the assembly. For example, the following reference is defined properly:

<Reference Include="IronMath, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35, processorArchitecture=MSIL">
  <HintPath>..\contrib\IronPython-1.1\IronMath.dll</HintPath>
</Reference>

If you find yourself setting a HintPath for many references, here is an additional trick that might assist you: extend the reference search path by adding your own directory name. The list of paths used for reference resolution in C# and Visual Basic .NET projects is held in the AssemblySearchPaths property. If you add your path to this property after the Import statement for either Microsoft.CSharp.targets or Microsoft.VisualBasic.targets, that path will be the first location checked. You can implement this with a property declared as follows:

<PropertyGroup>
  <AssemblySearchPaths> ..\..\MyReferences\; $(AssemblySearchPaths) </AssemblySearchPaths>
</PropertyGroup>

You should always use relative paths for both the AssemblySearchPaths and HintPath values. If you specify these values using full paths, it will be difficult to build the projects on other machines.

For large projects you should avoid setting the CopyLocal flag to True for references. When files are marked to be copied locally, every project that references that project will get a copy of its copy local references. Consider this example:

  • ClassLibrary1: Contains 10 CopyLocal references
  • ClassLibrary2: Contains 5 CopyLocal references and references ClassLibrary1
  • WindowsFormApp1: References ClassLibrary2

In this situation, all of ClassLibrary1's references will be copied to the output folders for ClassLibrary1, ClassLibrary2, and WindowsFormApp1. All of ClassLibrary2's references will be copied to the output folders for ClassLibrary2 and WindowsFormApp1. So you have 10*3 + 5*2 = 40 file copies for a full build, and this example consists of only three projects. Imagine how this behavior affects builds consisting of one hundred or more projects.

To avoid the copy local behavior, you can set the reference's Private metadata value to False. For example, you could modify a reference to prevent copy local behavior by doing the following:

<Reference Include="IronMath, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35, processorArchitecture=MSIL">
  <HintPath>..\contrib\IronPython-1.1\IronMath.dll</HintPath>
  <Private>False</Private>
</Reference>

Because the additional metadata is set on the reference, the file will not be copied locally. An even better approach is to use the ItemDefinitionGroup element, which defines default metadata values for items. In the case of references, you can set the Private metadata to be False by default by including the following snippet somewhere in your project file.

<ItemDefinitionGroup>
  <Reference>
    <Private>False</Private>
  </Reference>
</ItemDefinitionGroup>

Because Visual Studio does not set this metadata value by default, the default defined here will be used for references that have not been specifically marked to be copied locally.

If you have a large build and you want to make sure that no files are copied locally, you can override the _CopyFilesMarkedCopyLocal target, which is responsible for the copy local behavior. If you override this target after the Import statement for Microsoft.CSharp.targets or Microsoft.VisualBasic.targets, you are guaranteed that no files will be copied locally. It is not generally recommended to override any element that includes a leading underscore, but this is a special case. Local reference copying can cause a lot of wasted time and drive space during large builds, so overriding the element can be justified.
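As a sketch of what that override might look like at the bottom of a C# project file (the Import line is the standard one that Visual Studio generates; the empty target is the override):

```xml
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<!-- Declared after the Import so that this empty definition replaces
     the standard target and nothing is copied locally -->
<Target Name="_CopyFilesMarkedCopyLocal" />
```

When a target is defined more than once, the last definition that MSBuild reads is the one it uses, which is why the position after the Import matters.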

Building Large Source Trees

When you are dealing with a large number of projects (more than 100), you need to organize your projects and have a build process that is efficient yet flexible enough to meet the needs of each of the projects. In this section, I describe one means for organizing your source code as well as an approach for integrating a build process into that structure. The structure I describe won't suit every team or every product, but the important ideas to take away are how to modularize your build process and how to introduce common build elements into all products that are being built.

You can organize your source into trees of related projects, with the most common projects at the top. This organization assumes that projects need to build any of the projects beneath them in the tree and potentially sibling projects, but they should not directly build projects that exist in the nodes above them. For example, Figure 5 shows the dependency relationships of several fictitious projects.

Figure 5 Project Dependencies

Here we have two products, SCalculator and SNotepad, and four libraries that they depend on. We could organize these projects into a tree similar to Figure 6.

ROOT
+---Common
    +---Common.IO
    |
    +---Common.UI
    |   |
    |   +---Common.UI.Editors
    |
    +---Contrib
    |
    +---Products
    |   |
    |   +---SCalculator
    |   |
    |   +---SNotepad
    |
    +---Properties

Because all the projects depend on the Common project, it is directly under the ROOT node. The SCalculator and SNotepad projects are placed inside the Products folder.

What you need here is a strategy that allows developers working on specific subtrees to build the pieces they need but not necessarily the entire structure. You can achieve this by using a convention in which each folder contains three MSBuild files:

  • NodeName.setting
  • NodeName.traversal.targets
  • dirs.proj

NodeName is the name of the current node—for example, root, common, or common.ui.editors. The NodeName.setting file contains any settings (captured as properties or items) that are used during the build process. For example, some settings included here might be BuildInParallel, Configuration, or Platform. The NodeName.traversal.targets file contains the targets that are used to build the projects. Finally, the dirs.proj file maintains a list of projects (in the ProjectFiles item) that need to be built for that subtree.

The NodeName.setting and NodeName.traversal.targets files will always import the top-level corresponding files—root.setting and root.traversal.targets. These top-level files contain the global settings and targets, and the node-level files are where customizations can be injected. In many cases these node-level files need to import only the root file.
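In the simplest case, a node-level file is nothing more than an Import of its root counterpart. Here is a minimal sketch of what Common\common.setting might contain, assuming that root.setting sits one directory up in the ROOT folder; the relative path is my assumption, not something taken from the sample download.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <!-- Node-specific properties or items would go here -->
  <!-- Assumed location: root.setting lives in the parent (ROOT) folder -->
  <Import Project="..\root.setting"/>
</Project>
```

A common.traversal.targets file would follow the same shape, importing root.traversal.targets instead.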

Figure 7 shows the contents of the root.traversal.targets file. Fundamentally, there are three targets in this file: Build, Rebuild, and Clean. The properties and other targets are there simply to support these three targets. This file uses the ProjectFiles item, which is declared in the dirs.proj file for that specific directory. The requirements for the dirs.proj file are to:

  1. Define all projects to be built using ProjectFiles.
  2. Import the NodeName.setting file toward the top.
  3. Import the NodeName.traversal.targets file toward the bottom.

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <!-- Targets used to build all the projects -->
  <PropertyGroup>
    <BuildDependsOn> $(BuildDependsOn); CoreBuild </BuildDependsOn>
  </PropertyGroup>
  <Target Name="Build" DependsOnTargets="$(BuildDependsOn)" />
  <Target Name="CoreBuild">
    <!-- Properties BuildInParallel and SkipNonexistentProjects should be defined in the .setting file. -->
    <MSBuild Projects="@(ProjectFiles)" 
             BuildInParallel="$(BuildInParallel)" 
             SkipNonexistentProjects="$(SkipNonexistentProjects)" Targets="Build" />
  </Target>
  <PropertyGroup>
    <RebuildDependsOn> $(RebuildDependsOn); CoreRebuild </RebuildDependsOn>
  </PropertyGroup>
  <Target Name="Rebuild" DependsOnTargets="$(RebuildDependsOn)" />
  <Target Name="CoreRebuild">
    <MSBuild Projects="@(ProjectFiles)" 
             BuildInParallel="$(BuildInParallel)" 
             SkipNonexistentProjects="$(SkipNonexistentProjects)" Targets="Rebuild" />
  </Target>
  <PropertyGroup>
    <CleanDependsOn> $(CleanDependsOn); CoreClean </CleanDependsOn>
  </PropertyGroup>
  <Target Name="Clean" DependsOnTargets="$(CleanDependsOn)" />
  <Target Name="CoreClean">
    <MSBuild Projects="@(ProjectFiles)" 
             BuildInParallel="$(BuildInParallel)" 
             SkipNonexistentProjects="$(SkipNonexistentProjects)" Targets="Clean" />
  </Target>
</Project>

The dirs.proj file should include all projects in that directory as well as all projects in subdirectories. It can include normal MSBuild projects, like C# or Visual Basic .NET projects, or other dirs.proj projects (for subdirectories). This file should not include projects that exist in directories above it. The dirs.proj file should assume that required projects that are higher in the directory structure are already built.

If you build a project that has a project reference to a project that is higher in the directory structure and that project is out of date, it will be built automatically. As a result, the dirs.proj file doesn't have to specify to build higher-level projects. Also, for massive builds, it is better to use file references instead of project references. With this approach, if you switch to project references, you do not have to modify your build process, only your references.
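For comparison, here are the two styles of reference as they might appear in a project under Common.IO that depends on the Common project. This is only a sketch: the relative paths and the output location of Common.dll are assumptions, and a real project would use one style or the other, not both.

```xml
<!-- Option 1: project reference. Common.csproj is built automatically
     if it is out of date. The path is an assumption. -->
<ItemGroup>
  <ProjectReference Include="..\Common.csproj" />
</ItemGroup>

<!-- Option 2: file reference. Common.dll is assumed to have been built
     already by a dirs.proj higher in the tree. The HintPath is hypothetical. -->
<ItemGroup>
  <Reference Include="Common">
    <HintPath>..\bin\Debug\Common.dll</HintPath>
    <Private>False</Private>
  </Reference>
</ItemGroup>
```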

Here are the contents of the root.setting file:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <!-- Global properties defined in this file -->
  <PropertyGroup>
    <BuildInParallel Condition="'$(BuildInParallel)'==''">true</BuildInParallel>
    <SkipNonexistentProjects 
      Condition="'$(SkipNonexistentProjects)'==''">false</SkipNonexistentProjects>
  </PropertyGroup>
</Project>

This file contains only two properties: BuildInParallel and SkipNonexistentProjects. It is important to note that these properties use conditions to ensure that any pre-existing values are not overwritten, which allows these properties to be customized easily. Figure 8 contains the contents of the dirs.proj file for the ROOT directory.

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <!-- Insert any customizations for settings here -->
  <Import Project="root.setting"/>
  <!-- Define all ProjectFiles here -->
  <ItemGroup>
    <ProjectFiles Include="Common\dirs.proj"/>
  </ItemGroup>
  <Import Project="root.traversal.targets"/>
  <!-- Insert any customizations for targets here -->
</Project>

This dirs.proj file meets all three requirements listed earlier. If any customizations for values in the root.setting file need to be specified, they would be placed above the Import for that file, and any customizations for targets would be placed after the Import for root.traversal.targets. This dirs.proj file defines the ProjectFiles item to just include the Common\dirs.proj file, which is responsible for building its content. There are no other projects in the ROOT folder that need to be built. Figure 9 shows the Common\dirs.proj file.

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <!-- Insert any customizations for settings here -->
  <PropertyGroup>
    <SkipNonexistentProjects>true</SkipNonexistentProjects>
  </PropertyGroup>
  <Import Project="common.setting"/>
  <!-- Define all ProjectFiles here -->
  <ItemGroup>
    <ProjectFiles Include="Common.csproj"/>
    <ProjectFiles Include="Common.IO\dirs.proj"/>
    <ProjectFiles Include="Common.UI\dirs.proj"/>
    <ProjectFiles Include="Products\dirs.proj"/>
  </ItemGroup>
  <Import Project="common.traversal.targets"/>
  <!-- Insert any customizations for targets here -->
  <PropertyGroup>
    <BuildDependsOn>
      CommonPrepareForBuild;
      $(BuildDependsOn);
      CommonBuildComplete;
    </BuildDependsOn>
  </PropertyGroup>
  <Target Name="CommonPrepareForBuild">
    <Message Text="CommonPrepareForBuild executed" Importance="high"/>
  </Target>
  <Target Name="CommonBuildComplete">
    <Message Text="CommonBuildComplete executed" Importance="high"/>
  </Target>
</Project>

This file overrides the SkipNonexistentProjects property, setting it to True. The ProjectFiles item is populated with four values, three of which are dirs.proj files, and a couple of targets are added to the build dependency list. If you build the Common\dirs.proj file with the command msbuild.exe dirs.proj /t:Build, you will see that all the projects are built and that the custom targets execute. I will not include the results because of space limitations, but the source for these files can be downloaded with the other samples for this article.
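At a leaf of the tree the same convention applies, only with ordinary project files in place of nested dirs.proj files. The following sketch shows what Products\dirs.proj might look like; the products.setting and products.traversal.targets names follow the NodeName convention described earlier, and the .csproj paths are hypothetical.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="3.5">
  <!-- Insert any customizations for settings here -->
  <Import Project="products.setting"/>
  <!-- Define all ProjectFiles here (hypothetical project paths) -->
  <ItemGroup>
    <ProjectFiles Include="SCalculator\SCalculator.csproj"/>
    <ProjectFiles Include="SNotepad\SNotepad.csproj"/>
  </ItemGroup>
  <Import Project="products.traversal.targets"/>
  <!-- Insert any customizations for targets here -->
</Project>
```

A developer working only on the products could then run msbuild.exe dirs.proj /t:Build from the Products folder without building the rest of the tree.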

In this article we have taken a look at a few key recommendations that you can use to create better build processes for your products. As with all best practices, there will be situations in which these rules may not apply one hundred percent of the time and need to be bent a little. The best way to learn which of these practices works for you is simply to try each one out for yourself. I would be honored to hear your stories, good and bad, about how they have worked for you.

I would like to thank Dan Moseley, from the MSBuild team, and Brian Kretzler for their invaluable help on this article.

Sayed Ibrahim Hashimi has a computer engineering degree from the University of Florida. He has been working with MSBuild since the early preview bits of Visual Studio 2005 were released. He is the author of Inside the Microsoft Build Engine: Using MSBuild and Team Foundation Build (Microsoft Press, 2009). He is also coauthor of Deploying .NET Applications: Learning MSBuild and Click Once (Apress, 2006) and has written several publications for magazines including MSDN Magazine. He works in Jacksonville, Florida, as a consultant and trainer, with expertise in the financial, education, and collection industries. You can reach Sayed at his blog, sedodream.com.