Chapter 1 - Introduction

12/05/2007

Many aspects of porting software applications from UNIX to Windows have been identified and documented elsewhere, including operating system dependencies, methodologies, security, and team frameworks. All of these need to be addressed in the course of a software migration project. However, there is one aspect common to all application migrations that is rarely discussed: the software construction environment, sometimes called the build process or build system.

Because the software construction environment can be quite complex, the application migration project is incomplete if you have not planned how your software construction environment is going to be migrated. This chapter provides an introduction to the software construction methods and processes in UNIX and Windows, how they differ, and how the build environment can be migrated to Windows.

The Problem of Software Construction

Software construction consists of the transformation of source code into its deliverable format, which normally comprises one or more executable files. There may be any number of intermediate steps. For example, in C programming, it is typical to convert the C source file (hello.c) and its header file (hello.h) to an object file (hello.o), which is then linked with modules from the library (libc.a) and converted to an executable (hello on UNIX, hello.exe on Windows). There may be additional steps anywhere along this chain: perhaps the source file needs to be preprocessed before it is compiled; perhaps the object files need to be moved between directories in order to be linked with the correct libraries; or perhaps the finished executable must be renamed, compressed, stripped of symbol information, archived, or embedded in an installer package.

It should be obvious that changes to any of these primary source files can cause a change in the final product. The change may be as small as a corrected typo in a dialog box. In a sense, then, the final product is a representation of the source files, and it depends upon them.

In the sequence of transformations, the hello file came from (has a dependency on) the hello.o file and the libc.a file; the hello.o file came from (has a dependency on) the hello.c file and the hello.h file. A change to libc.a may affect the hello executable. A change to hello.h will affect hello.o and, therefore, will affect the hello executable.
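The chain of dependencies described above can be sketched in a few lines of Python. This is only an illustration: the `DEPENDS_ON` table and the `affected_by` function are hypothetical names introduced here, and the table simply restates the hello example rather than reading any real build description.

```python
# Hypothetical dependency table restating the hello example:
# each derived file maps to the files it is built from.
DEPENDS_ON = {
    "hello": ["hello.o", "libc.a"],
    "hello.o": ["hello.c", "hello.h"],
}


def affected_by(changed, depends_on=DEPENDS_ON):
    """Return every derived file that depends, directly or indirectly, on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for target, deps in depends_on.items():
            if current in deps and target not in affected:
                affected.add(target)
                frontier.append(target)  # a rebuilt target may affect others
    return affected


print(sorted(affected_by("hello.h")))  # ['hello', 'hello.o']
print(sorted(affected_by("libc.a")))   # ['hello']
```

A change to hello.h reaches the executable only through hello.o, which is why both files appear in the first result.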

A similar set of dependency relationships is shown in Figure 1.1. The application depends upon three files: libc.a, app.o, and libgd.a. The app.o file depends upon app.c and app.h. If app.c changes, then the file app.o must be regenerated and linked with libc.a and libgd.a to produce a new version of the application. However, if app.h changes, then almost all of the intermediate files must be regenerated to be sure that an application is produced that is synchronized with the source files.

Figure 1.1: A simple software dependency relationship


It should be obvious, too, that there is an implicit assumption that software builds are, at any specific moment in time, reproducible — that a given set of source files and a known set of transformations will always produce the same product. Consequently, if a source file changes, the product also changes, and it must be generated again. The transformations must be applied to the source and intermediate files again to ensure that the current product is the one that should be produced by the current source files.

However, the diagram also reveals that the transformations applied by the compiler, the archiver, and the linker rely on the environment as well as command script files or configuration files. The environment is very important because the presence of an environment variable may override a value set elsewhere, or there may be important information contained in an environment variable. As a result, two different users with different environments could run the same command using the same files and get different results.
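The effect of the environment can be illustrated with a small Python sketch. It assumes a build script that takes the compiler name and flags from the conventional `CC` and `CFLAGS` environment variables and falls back to `cc` when they are absent. The function `compile_command` and the two user environments are hypothetical and are not taken from any particular build system.

```python
import os


def compile_command(source, output, environ=None):
    """Assemble a compile command, letting the environment override defaults.

    `environ` defaults to the real process environment. It is a parameter
    here only so that two different user environments can be compared.
    """
    env = os.environ if environ is None else environ
    compiler = env.get("CC", "cc")         # assumed default compiler name
    flags = env.get("CFLAGS", "").split()  # no extra flags unless set
    return [compiler, *flags, "-o", output, source]


# Two hypothetical users running the "same" build on the same files.
user_a = {}
user_b = {"CC": "gcc", "CFLAGS": "-O2 -DNDEBUG"}

print(compile_command("hello.c", "hello", user_a))
# ['cc', '-o', 'hello', 'hello.c']
print(compile_command("hello.c", "hello", user_b))
# ['gcc', '-O2', '-DNDEBUG', '-o', 'hello', 'hello.c']
```

Neither user changed a source file or a build script, yet the commands that run, and possibly the resulting executable, differ.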

Most tools allow control information to be provided through additional mechanisms, but unless this control information and additional mechanisms are recorded somewhere, the build is not reproducible. Currently, there is no software technology that ensures that all of the information is recorded; there is only the commonly understood and applied injunction not to use environment information unless it is recorded or explicitly set in a file that is part of the build system. Unfortunately, this injunction alone is not an effective way to safeguard build environments.

If the source and the final product are out of synchronization, there are implications for your business. Your staff may spend time attempting to track down problems that have already been fixed, for example, or they may not be tracking down problems that have recently been introduced because the version of the product they are using does not correspond to the current source code.

For any medium or large collection of source files, the number of possible relationships quickly multiplies. Keeping a record of which files changed, whether the change will have an effect on the final product, and which other parts of the product need to be regenerated is a daunting task. The task is automated by software build systems. They record such information as the list of source files, the final product or products, the intermediate forms, and the commands used to perform those transformations.

Almost all software build systems use the file time stamp (time of last change) to infer changes to files — if hello.o depends upon hello.c, and hello.c is newer than hello.o, the assumption is that hello.o is out of date and needs to be regenerated. Most solutions treat an absent file as being out of date; if the hello executable file does not exist, then it needs to be regenerated, even if hello.c is older than hello.o. (In this example, the build system should only go through the steps to transform hello.o into the hello file because the relationship between hello.o and hello.c is acceptable — hello.o is presumed to have been created from this version of hello.c.)
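The time stamp test can be sketched in Python as follows. This is a minimal illustration rather than the logic of any specific build tool. The name `is_out_of_date` is hypothetical, and the sketch assumes that every dependency exists (a missing dependency raises `FileNotFoundError`).

```python
import os


def is_out_of_date(target, dependencies):
    """Return True if `target` is absent or older than any dependency."""
    if not os.path.exists(target):
        return True  # an absent file is always treated as out of date
    target_time = os.path.getmtime(target)
    # A dependency modified after the target means the target is stale.
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)
```

With this test, `is_out_of_date("hello", ["hello.o", "libc.a"])` is true whenever the hello file is missing, while `is_out_of_date("hello.o", ["hello.c", "hello.h"])` remains false as long as hello.o is at least as new as both source files.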

Both UNIX and Windows have developed methods for addressing the traditional problems of software construction. These methods are briefly described in the following sections. These are the native methods for software construction on UNIX and Windows, and any migration must take them into account.

The Traditional UNIX Build Process

The cornerstone of UNIX software construction is the utility called make. There is now an entire family of make-like applications, but they all descend from the make program Stu Feldman wrote in the 1970s (Feldman 1979). The make program is a file management utility that is used to maintain synchronization among a collection of files. The make utility does this by keeping track of dependency relationships between files based on their time stamps. It is typically used to update files that are derived from other files; for example, it can be used to update object files derived from corresponding source code files or to update executable files derived from object files.

The relationships between files (the dependencies) are specified in a description file. This description file normally has the name makefile or Makefile, and is generically referred to as a makefile.

What is a Makefile?

A makefile is a text file containing a list of rules that generally have three parts: a target, a list of dependencies, and a collection of one or more commands. The collection of commands is sometimes called the recipe. The purpose of make is to perform the commands required to ensure that the specified targets are kept up-to-date.

The following is a simple rule to build hello.exe from hello.c.

hello.exe: hello.c hello.h
	cc -o hello.exe hello.c

The target, dependencies, and recipe for this rule are as follows:

Target. The target is the name of a file to be generated. In this example, it is the file hello.exe. A target is considered out-of-date (the opposite of up-to-date) if it does not exist or if its modification time is earlier than that of any of its dependencies. If make determines that the target is out-of-date, it executes the commands in the recipe.

Dependencies. The dependencies, sometimes referred to as prerequisites, are a list of file names. In this example, they are the file names hello.c and hello.h. If the modification time on any of these files is newer than that of the target file, the target is considered to be out-of-date. make also treats the dependencies as targets and recursively ensures that these are up-to-date.

Recipe. The recipe is a list of commands that are executed when the target is out-of-date with respect to any of the dependencies. In this example, the recipe is the compile command cc -o hello.exe hello.c.
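To make the three parts of a rule concrete, the following Python sketch splits the text of the hello.exe rule into its target, dependencies, and recipe. It is a deliberately simplified illustration: the function `parse_rule` is hypothetical, and it handles only a single rule with one target. Macros, inference rules, and line continuations, which real versions of make support, are ignored.

```python
def parse_rule(text):
    """Split one simple makefile rule into target, dependencies, and recipe."""
    lines = text.splitlines()
    header, recipe_lines = lines[0], lines[1:]
    # The header has the form "target: dependency dependency ..."
    target, _, deps = header.partition(":")
    # Recipe lines are indented with a tab in the makefile; strip it here.
    recipe = [line.strip() for line in recipe_lines if line.strip()]
    return {
        "target": target.strip(),
        "dependencies": deps.split(),
        "recipe": recipe,
    }


rule = "hello.exe: hello.c hello.h\n\tcc -o hello.exe hello.c\n"
print(parse_rule(rule))
# {'target': 'hello.exe', 'dependencies': ['hello.c', 'hello.h'],
#  'recipe': ['cc -o hello.exe hello.c']}
```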

Different versions of the make utility provide additional features. A macro facility and mechanisms for specifying default recipes based on file names are common, if not universal. These mechanisms are widely used and quite useful, but they can cause problems when makefiles are migrated from one version of make to another.

How Make Works

The make utility processes a makefile in two stages. In the first stage, issuing the command make causes the make program to read and parse the makefile. Unless an alternate target is specified on the command line, make uses the first target in the file, which is the default target. After parsing the makefile, make assembles a dependency graph for the chosen target.

In the second stage, make compares the time stamp on the target with the time stamps on the files identified as its dependencies. If the target does not exist or any of the dependencies is newer than the target, make invokes the shell to execute the recipe. It builds only one target at a time.
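The two stages can be approximated with the following Python sketch. It is not how any real make is implemented. The `RULES` table stands in for the result of the first stage (a parsed makefile held as a dependency graph), and the `build` function performs the second stage. Both names, and the `run` parameter that allows the shell call to be replaced, are assumptions made for this illustration.

```python
import os
import subprocess

# Stage 1 (assumed already done): target -> (dependencies, recipe commands).
RULES = {
    "hello": (["hello.o"], ["cc -o hello hello.o"]),
    "hello.o": (["hello.c", "hello.h"], ["cc -c hello.c"]),
}


def build(target, rules, run=None):
    """Stage 2: bring `target` up to date and return the commands executed."""
    if run is None:
        def run(command):
            # Hand the recipe line to the shell, as make does.
            subprocess.run(command, shell=True, check=True)

    if target not in rules:
        # A plain source file: it must simply exist.
        if not os.path.exists(target):
            raise FileNotFoundError(f"no rule to make {target!r}")
        return []

    dependencies, recipe = rules[target]
    executed = []
    # Dependencies are treated as targets and brought up to date first.
    for dep in dependencies:
        executed += build(dep, rules, run)

    stale = not os.path.exists(target) or any(
        os.path.getmtime(dep) > os.path.getmtime(target) for dep in dependencies
    )
    if stale:
        for command in recipe:
            run(command)
            executed.append(command)
    return executed
```

Calling `build("hello", RULES)` in a directory that contains hello.c and hello.h would compile hello.o and then link hello, provided a `cc` command is available. Calling it again immediately would run nothing, because every target is then at least as new as its dependencies.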

The makefile processing can be much more complicated than this simple example. Some recipes, for example, use make itself as the command to be run recursively. As another example of complication, there are inference rules that can be run that pair specific file type transformations (.c to .o, or .l to .c) to particular recipes.

For an in-depth understanding of how make works, refer to Managing Projects with make (Oram and Talbott 1991).

The Traditional Windows Build Construction Process

Windows developers typically use an Integrated Development Environment (IDE) to write, construct, and debug their software. IDEs typically provide the following capabilities:

All the development tools needed for programming, including compilers, linkers, and the project and configuration files used to generate a complete application.
File management for source files, header files, documentation, and other material to be included in the project.
"What you see is what you get" (WYSIWYG) editors that provide the means to create user interfaces that have built-in dialog box and resource editors.
Debugging of the application.
The capability to include any other program needed for development by adding it to a Tools menu.

Microsoft® Visual Studio® is an IDE that includes all of the functions and capabilities described in this list, including a complete set of development tools for building reusable applications in Microsoft Visual Basic®, Microsoft Visual C++®, Microsoft Visual J++® and Microsoft Visual FoxPro®.

At its essence, Visual Studio does what make does: it stores a list of files, dependencies, and recipes; it maintains a dependency graph; and when the command is given to build the software, it runs the correct recipes on files that are newer than their targets.

Beyond these essential functions, everything else is different. Where make is a command-line tool, Visual Studio is an integrated development environment. Where make stores information and explicitly describes the dependencies in a text file called a makefile, the Visual Studio IDE uses solution and project containers, and it stores all the information in files in a manner that is less explicit than the text file method employed by make.

Visual Studio .NET organizes work into solution and project containers. A solution contains one or more projects, and each project contains one or more items. Items can be files and other parts of your project, such as references, data connections, or folders. Included in the IDE is the Solution Explorer, which is an interface for viewing and managing these containers and their associated items.

Using these containers, you can take advantage of the integrated development environment in the following ways:

Manage settings for your solution as a whole or for individual projects.
Use Solution Explorer to handle the details of file management while you focus on items that make up your development effort.
Add items that are useful to multiple projects in the solution or to the solution without referencing the item in each project.
Work on miscellaneous files that are independent from solutions or projects.
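The containment relationship between solutions, projects, and items can be pictured with a small conceptual model in Python. This is only a sketch of the idea: the `Solution` and `Project` classes and the sample names are hypothetical, and they do not represent the Visual Studio file formats or any Visual Studio API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    """A project container holding one or more items (files, references)."""
    name: str
    items: List[str] = field(default_factory=list)


@dataclass
class Solution:
    """A solution container holding projects and solution-level items."""
    name: str
    projects: List[Project] = field(default_factory=list)
    # Items attached to the solution itself rather than to any one project.
    solution_items: List[str] = field(default_factory=list)

    def all_items(self):
        """List every item in the solution, solution-level items first."""
        items = list(self.solution_items)
        for project in self.projects:
            items.extend(project.items)
        return items


# Hypothetical example: one solution with a single project.
solution = Solution(
    name="HelloSolution",
    projects=[Project("hello", ["hello.c", "hello.h"])],
    solution_items=["readme.txt"],
)
print(solution.all_items())  # ['readme.txt', 'hello.c', 'hello.h']
```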

This guide is not an exhaustive resource for information about the Visual Studio IDE. To use this IDE for migrating or rewriting your software construction process, you will need to further research the Visual Studio product by reading the information available on the Microsoft Web site and consulting the manuals associated with the product.

Strategies for Migrating Build Systems

When migrating from UNIX to Windows, there are always at least two possible migration strategies. These same strategies can be used when you migrate your application software or when you migrate your build system. For a build system, two possible strategies are:

Re-create it on Windows, using Windows tools such as the facilities built into Visual Studio.
Migrate as many of the existing makefiles as possible using a UNIX portability environment on Windows.

Each of these two choices has its advantages and disadvantages. These advantages and disadvantages are described in the following list:

Re-creation advantages. Re-creation is often possible for the build system because build systems contain fewer files than applications do. This choice is sometimes more convenient than migration because there is no need to attempt mapping between the two environments. A Visual Studio-based build system will also be easier to maintain as Windows tools continue to evolve.
Re-creation disadvantages. While re-creation offers advantages, you must still understand how the existing build process works in order to create it anew, and you must train your development staff in the new process. If the existing build system is complex, then the time spent in re-creating the build system in Visual Studio can have a deleterious effect on schedules or deadlines.
Migration advantages. Migration of the existing makefiles is often faster and enables you to use the existing knowledge of your developers and the information already stored in makefiles and scripts throughout your system.
Migration disadvantages. While migration has some obvious advantages, it requires an equivalent to the make utility on your Windows system. Because make calls other utilities, equivalents for these must also be available on Windows, and providing them entails installing an entire UNIX portability environment.

The following chapters tell you how to decide on your strategy and how to migrate your build system. You will find the process useful regardless of which strategy you adopt. You will find the technical information useful whether you plan to re-create your environment using the Visual Studio IDE or use a UNIX portability environment on Windows.