Test-Driven Infrastructures


by Mario Cardinal

Summary: IT shops must fulfill two roles: to "build" and to "run" software. Each role requires a different set of skills. The gap between "build" and "run" is almost always clearly visible in the organization chart. At the architecture level, on one side, there are the application architects involved in software development (build), and, on the other side, the infrastructure architects involved in software operation (run). Being an application architect, I believe that both teams should learn from each other's best practices. One best practice that the infrastructure team should learn from the software-development team is to express architecture decisions using test scripts.


Architecture Decisions
Nonambiguous Documentation
Explicit Consensus
Protecting Against Change
Operational Artifacts
Designing for Testability
Designing for Automation
Data-Driven Testing

Architecture Decisions

The software architecture discipline is centered on the idea of reducing complexity through abstraction and separation of concerns. The architect is the person responsible for identifying the structure of significant components, which are usually thought of as hard to change, and for simplifying the relationships between those components. The architect reduces complexity by dividing the problem space into a set of components and interfaces that will be more and more difficult to change as the project evolves.

The only way to simplify software is to structure all the main component interactions through interfaces and accept the fact that these interfaces are now almost impossible to change. If you pick any one component of software, then you can make it easy to change. However, the challenge is that it is almost impossible to make everything easy to change without increasing the level of complexity. As Ralph Johnson said in a paper that Martin Fowler wrote for IEEE Software: "Making something easy to change makes the overall system a little more complex, and making everything easy to change makes the entire system very complex. Complexity is what makes software hard to change."

Establishing overall component interactions through interfaces is making irreversible design decisions. This set of design decisions about logical and physical structure of a system, if made incorrectly, may cause your project to be cancelled. The key to success is to protect the system against instability. Good architects know how to identify the areas of change in existing requirements and protect them against the architecture's irreversibility.

Documenting architecture means communicating the set of irreversible design decisions explicitly and without ambiguity.

Nonambiguous Documentation

Documenting architecture decisions facilitates communication between stakeholders, that is, the people who have a legitimate interest in the solution. There are two classes of stakeholders with different needs in regard to architecture specifications. The first class of stakeholders is the deciders, who must know about the architecture in order to understand the constraints and limitations of the solution. Examples of such stakeholders are managers, customers, and users. The second class of stakeholders is the implementers, who must know about those same decisions in order to build and run the solution. Examples of such stakeholders are developers, system engineers, and system administrators.

A narrative specification written as a document is the perfect documentation for the first class of stakeholders. For them, even an informal specification published as a Microsoft PowerPoint presentation is good enough. It provides a high-level view of the architecture with almost no ambiguity in regard to the constraints and limitations of the solution.

However, a narrative specification is not the appropriate documentation for the second class of stakeholders. It does not provide concise information about design decisions to implementers. And, if it does so in a very thick formal specification, this document will be of no interest to the deciders. That first class does not care at all about the inner structure of significant architecture components. However, implementers do.

In my experience as an application architect, I have discovered that the easiest way to communicate all the intricacy of an irreversible design decision is not to write a narrative document, but instead to write a test that explains how I would validate a good implementation. When you use only words, it is very difficult for the architect and the implementers to understand each other with confidence. There is always a mismatch in how each party understands the meaning of a specific word.

Here is a simple example to demonstrate my point. Let's say that I wrote, in the architecture specification, the following design decision:

We will restore data only from the previous day in case of hard disk crash or data corruption. Users will need to retype lost data for the current day.

Obviously, implementers need clarification about my intent in order to implement my specification correctly. Here is the same decision written for them:

We will back up the SQL Server database every night. In case of recovery, we will restore the latest backup.

This is better than the original one, but there are still a lot of grey zones. I could rewrite the description over and over and provide more and more details for the implementers. I will end up with a formal detailed specification. However, narrative documentation does not provide an explicit consensus about what a successful implementation is. Even with the best intent, the implementers usually misunderstand subtleties of my formal specification. For example, let's say that the solution that I was envisioning was to back up on tape and I forgot to make this explicit. What happens if the implementers back up the Microsoft SQL Server database on disks? Unless I discover it during a design review, I won't know about it until both the SQL Server disk and the backup disk crash on the same day.

Instead, if I wrote a test script, I would establish an explicit consensus about the compliance. A test script either succeeds or fails. Tests enable the implementers and the architects to explicitly define the compliance against the architecture specification. Implementers looking to learn about the architecture can look at the test scripts. Test scripts are operational artifacts. They are less susceptible to drifting from the implementation and thus becoming outdated.

Explicit Consensus

A test script is the combination of a test procedure and test data. Test scripts are written sets of steps that should be performed manually or automatically. They embody characteristics that are critical to the success of the architecture. These characteristics can indicate appropriate or inappropriate use of architecture as well as negative behaviors that are to be trapped.

As a concrete example, Figure 1 shows two simple test scripts that validate the following irreversible design decision: "We will restore data only from the previous day in case of hard disk crash or data corruption. Users will need to retype lost data for the current day." A test script either succeeds or fails. Because it validates the intent of the architect, it provides an explicit consensus about compliance.


Figure 1. Sample test scripts for validating design decisions
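
As a minimal sketch of how such a decision could be expressed as an automated check in Windows PowerShell (an illustration only, not the scripts from Figure 1), the function below passes only when the newest backup is at most one day old and was written to the required media. The function name Test-BackupCompliance, the FinishedAt and Media fields, the 24-hour threshold, and the 'Tape' default are all assumptions made for this example.

```powershell
# Hypothetical check: the newest backup must be recent enough and on the
# medium the architect intended (names and thresholds are assumptions).
function Test-BackupCompliance {
    param(
        [hashtable] $Backup,                 # expects keys FinishedAt and Media
        [datetime]  $Now = (Get-Date),       # injectable so the test is repeatable
        [int]       $MaxAgeHours = 24,       # "previous day" read as 24 hours
        [string]    $RequiredMedia = 'Tape'  # makes the tape-versus-disk intent explicit
    )
    $age = $Now - $Backup.FinishedAt
    $fresh = ($age.TotalHours -ge 0) -and ($age.TotalHours -le $MaxAgeHours)
    $onRequiredMedia = ($Backup.Media -eq $RequiredMedia)
    return ($fresh -and $onRequiredMedia)
}

# Fixed timestamps keep the example deterministic.
$now = [datetime]'2008-03-02T08:00:00'
$nightly = @{ FinishedAt = [datetime]'2008-03-02T01:00:00'; Media = 'Tape' }
$onDisk  = @{ FinishedAt = [datetime]'2008-03-02T01:00:00'; Media = 'Disk' }
Test-BackupCompliance -Backup $nightly -Now $now   # passes
Test-BackupCompliance -Backup $onDisk  -Now $now   # fails: wrong media
```

Writing the media requirement into the check is what removes the tape-versus-disk ambiguity described earlier: a disk backup fails the test on the first run instead of on the day both disks crash.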

Protecting Against Change

Test scripts expressing the key architecture decisions are always related to things that change. The main areas of change can be divided into three categories:

  1. Execution—In this category, variations about operations and monitoring are the main concern for infrastructure architects. They crosscut all the other execution concerns such as presentation, processing, communications, and state management.
  • Presentation—Stakeholders interacting with the system. How do we implement the UI? Do we need many front ends? Are quality factors such as usability, composability, and simplicity important?

  • Processing—Instructions that change the state of the system. How do we implement processing? Do we need transactional support? How about concurrency? Are quality factors such as availability, scalability, performance, and reliability important?

  • Communication—State distribution between physical nodes of the system. How do we interact? What format does the data travel in? Over what medium is the communication done? Are quality factors such as security and manageability important?

  • State management—Location, lifetime, and shape of data. Do we need a transient or durable state? How do we save the state? Are quality factors such as availability, recoverability, integrity, and extensibility important?

    Figure 2 shows all the potential sources of run-time variations.

    Figure 2. Potential sources of run-time variations

    Architects protect the system against run-time instability by building abstractions. Tests should validate these abstractions. For example, in high-availability requirements, to protect against defects of a physical server, a proven solution is to put a failover in place. Failover is the capability of switching over automatically to a redundant or standby computer server upon the failure or abnormal termination of the previously active server.

  2. Deployment—These are the load-time concerns. In this category, infrastructure architects should pay attention to configuration and registry variations. For example, to protect against change in authentication mechanisms when using Windows as a platform, a proven solution is to store user credentials in Microsoft Windows Active Directory.
  3. Implementation—The most important design-time variations are related to instability in tooling and organizations (people). For example, infrastructure teams have learned how to protect against unstable development teams. Deployment of bad components in production can be disastrous for service level agreements. A proven solution is to establish a fallback mechanism to rapidly restore the system to a previously stable state. Not only should we document fallback processes, but we should also write tests to express compliance without ambiguity.

The process of discovering irreversible design decisions consists of reviewing all of these areas susceptible to change and building the appropriate tests.
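
To illustrate what a test for one of these abstractions might look like, here is a rough sketch of a failover-readiness check. It assumes that a failover decision can be validated by confirming that enough nodes answer a health probe; the function name Test-FailoverReady, the node names, and the probe are hypothetical, and a real probe would be a ping or a service health check rather than the fake used here.

```powershell
# Hypothetical check: failover is only meaningful if a standby node is healthy.
function Test-FailoverReady {
    param(
        [string[]]    $Nodes,              # active server plus its standby servers
        [scriptblock] $Probe,              # returns $true when a node answers
        [int]         $MinimumHealthy = 2  # assumed: the active node plus one standby
    )
    # Keep only the nodes for which the probe reports success.
    $healthy = @($Nodes | Where-Object { & $Probe $_ })
    return ($healthy.Count -ge $MinimumHealthy)
}

# A fake probe stands in for a real health check: 'node-b' is treated as down.
$fakeProbe = { param($node) $node -ne 'node-b' }
Test-FailoverReady -Nodes 'node-a', 'node-b', 'node-c' -Probe $fakeProbe   # passes
Test-FailoverReady -Nodes 'node-a', 'node-b' -Probe $fakeProbe             # fails
```

Passing the probe in as a script block keeps the decision (how many healthy nodes are required) separate from the mechanism used to reach them, so the same test can be reused when the mechanism changes.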

Operational Artifacts

Today, test scripts can be manual, automated, or a combination of both. The advantage of automated testing over manual testing is that it is easily repeatable. It is therefore favored when doing regression testing. After the software is modified, either to change functionality or to fix defects, a regression test reruns previously passing tests. It ensures that the modifications haven't unintentionally violated design decisions.

An automated test script is a short program written in a programming language. Infrastructure architects do not need to be programmers to write automated tests. Scripting languages can do the job efficiently. When using Windows as a platform, Windows PowerShell, the new scripting language and command shell from Microsoft, is well-suited for such purposes, given its support for control structures, exception handling, and access to .NET system classes. Here are some of the testing scenarios for which Windows PowerShell could be used:

  • Deployment testing—Create a script that verifies your release went as expected; check running processes, services, database metadata, database content, application file versions, configuration file contents, and so on.
  • Infrastructure testing—Create a script that verifies the hardware, operating system, running services, running processes, and so forth.

For example, here is a small Windows PowerShell function to test connectivity to a server:

function test-connection {
      # Send a single echo request; a successful reply line contains 'TTL='.
      $pingtest = ping $args[0] -n 1
      if ($pingtest -match 'TTL=') {
            write-host $true
      } else {
            write-host $false
      }
}
test-connection MyServer01.domain.com

Contrary to a narrative document, a set of automated test scripts is an operational artifact. It is always kept up to date and it continuously validates compliance with the intent of the architect.
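
The deployment-testing scenario listed above can be sketched in the same spirit. The example below assumes that the expected state of a release is written down as a table of names and values and that the actual state has already been collected into a second table; the function name Get-DeploymentMismatch and the setting names are hypothetical. In practice, the actual values might be gathered with cmdlets such as Get-Service or Get-Process.

```powershell
# Hypothetical check: report every setting that is missing or different
# from what the release was expected to produce.
function Get-DeploymentMismatch {
    param(
        [hashtable] $Expected,   # the architect's expected state
        [hashtable] $Actual      # the state observed on the target machine
    )
    foreach ($key in $Expected.Keys) {
        $missing = -not $Actual.ContainsKey($key)
        if ($missing -or ($Actual[$key] -ne $Expected[$key])) {
            # Emit one line per violated expectation.
            "${key}: expected '$($Expected[$key])', found '$($Actual[$key])'"
        }
    }
}

$expected = @{ ServiceStatus = 'Running'; AppVersion = '2.1.0' }
$actual   = @{ ServiceStatus = 'Running'; AppVersion = '2.0.9' }
$mismatches = @(Get-DeploymentMismatch -Expected $expected -Actual $actual)
$mismatches   # one line, reporting the AppVersion difference
```

An empty result means the deployment complies; any output is a concrete, unambiguous list of what does not.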

Designing for Testability

Testability means having reliable and convenient interfaces to drive the execution and verification of test scripts. You cannot achieve testability if you write tests after the design and implementation. Building tests during design is the only viable approach to achieve testability.

By abstracting the implementation, writing tests first greatly reduces the coupling between components. The irreversible design decisions can now be more thoroughly tested than ever before, resulting in a higher quality architecture that is also more maintainable. In this manner, the benefits themselves begin returning dividends back to the architect, creating a seemingly perpetual upward cycle in quality. The act of writing tests during the design phase provides a dual benefit:

  1. It validates compliance with the intent of the architect.
  2. It does so explicitly.

Testing is a specification process, not only a validating process.

Designing for Automation

Experience has shown that during software maintenance, reemergence of faults is quite common. Sometimes, a fix to a problem will be "fragile"; that is, it does not respect characteristics that are critical to the success of the architecture.

Therefore, it is considered good practice that a test be recorded and regularly rerun after subsequent changes to the program. Although this may be done through manual testing, it is often done using automated testing tools. Such tools seek to uncover regression bugs. Those bugs occur whenever software functionality that previously worked as desired stops working or no longer works in the same way. Typically, regression bugs occur as an unintended consequence of program changes.

Regression testing is an integral part of agile software-development methodologies, such as eXtreme Programming. This methodology promotes testing automation to build better software, faster. It requires that an automated unit test, defining requirements of the source code, be written before each aspect of the code itself. Unit tests are used to exercise other source code by directly calling routines, passing appropriate parameters, and then, if you include Assert statements, testing the values that are produced against expected values. Unit-testing frameworks, such as xUnit, help automate testing at every stage in the development cycle.

The xUnit framework was introduced as a core concept of eXtreme Programming in 1998. It introduced an efficient mechanism to help developers add structured, efficient, automated unit testing into their normal development activities. Since then, this framework has evolved into the de facto standard for automated unit-testing frameworks.

The xUnit frameworks execute all the test scripts at specified intervals and report any regressions. Common strategies are to run such a system after every successful build (continuous integration), every night, or once a week. They simplify the process of unit testing and regression testing.

It is generally possible to perform testing without the support of xUnit frameworks by writing client code that tests the units and uses assertions, exceptions, or early exit mechanisms to signal failure. However, infrastructure architects should not need to write their own testing framework. Familiar xUnit-testing frameworks have been developed for a wide variety of languages, including scripting languages commonly used by system administrators.
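
For readers unfamiliar with the style, the following sketch shows the bare mechanism that such frameworks package up: an assertion that throws on failure and a runner that turns the exception into a pass or fail result. Assert-Equal and Invoke-TestCase are names invented for this illustration; they are not the API of any particular framework, and the assertion is only meant for scalar values.

```powershell
# Hypothetical assertion: throw when the actual value is not the expected one.
function Assert-Equal {
    param($Expected, $Actual, [string] $Message = 'values differ')
    if ($Expected -ne $Actual) {
        throw "Assertion failed: $Message (expected '$Expected', got '$Actual')"
    }
}

# Hypothetical runner: execute a test body and report success or failure.
function Invoke-TestCase {
    param([string] $Name, [scriptblock] $Body)
    try {
        & $Body | Out-Null          # discard any output; only exceptions matter
        return "PASS $Name"
    } catch {
        return "FAIL ${Name}: $($_.Exception.Message)"
    }
}

Invoke-TestCase 'sum is correct' { Assert-Equal 4 (2 + 2) }
Invoke-TestCase 'sum is wrong'   { Assert-Equal 5 (2 + 2) 'sum check' }
```

The first call reports a pass and the second a failure with the assertion message, which is the explicit succeed-or-fail outcome that the rest of this article relies on.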

On the Windows platform, "PowerShell Scripts for Testing," a function library that implements the familiar xUnit-style asserts for Microsoft's new scripting language and command shell, can be downloaded for free from CodePlex, a Web site hosting open source projects (see Resources). Figure 3 shows a sample Windows PowerShell automated test.

Figure 3. Windows PowerShell automated test

Automated tests seek to discover, as early as possible, violations of design decisions that are critical to the success of the architecture.

Automated tests are:

  • Structured.
  • Self-documenting.
  • Automatic and repeatable.
  • Based on known data.
  • Designed to test both positive and negative actions.
  • Ideal for testing implementation across different machines.
  • Operational documentation of configuration, implementation, and execution.

Data-Driven Testing

Test automation, especially at the higher test levels such as architecture testing, may seem costly. The economic argument has to be there to support the effort. The economic model can be weakened if you do not simplify the process of writing tests.

Data-driven testing is a way of lowering the cost of automating tests. It permits the tester to focus on providing examples in the form of combinations of test input and expected output. You design your test by creating a table containing all the test data (inputs and expected outputs) and then write a test fixture that translates a row in the table into a call to the system under test (SUT).

You achieve a high performance ratio by virtue of never having to write the script beyond writing the test fixture. The framework implicitly provides the script—and runs it.
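
The idea can be sketched without any framework or spreadsheet. In the example below, the test table is an in-memory array, each row holds one input and its expected output, and a small fixture translates a row into a call to the system under test. The temperature conversion function stands in for a real SUT, and all names (ConvertTo-Fahrenheit, Invoke-RowTest, the column names) are assumptions made for this illustration.

```powershell
# Test table: each row is one example (input plus expected output).
$rows = @(
    @{ Celsius = 0;   ExpectedFahrenheit = 32 },
    @{ Celsius = 100; ExpectedFahrenheit = 212 },
    @{ Celsius = -40; ExpectedFahrenheit = -40 }
)

# Hypothetical system under test.
function ConvertTo-Fahrenheit {
    param([double] $Celsius)
    return $Celsius * 9 / 5 + 32
}

# Test fixture: translates one row of the table into a call to the SUT
# and compares the result with the expected output.
function Invoke-RowTest {
    param([hashtable] $Row)
    $actual = ConvertTo-Fahrenheit -Celsius $Row.Celsius
    return ($actual -eq $Row.ExpectedFahrenheit)
}

# The "script" is just a loop over the table; adding a test means adding a row.
$results = @($rows | ForEach-Object { Invoke-RowTest -Row $_ })
$failed  = @($results | Where-Object { -not $_ }).Count
"$($results.Count) rows run, $failed failed"
```

Once the fixture exists, new cases cost one line of data each, which is where the saving over hand-written scripts comes from.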

In the latest version of "PowerShell Scripts for Testing," this xUnit framework makes it possible to use Microsoft Office Excel as the source of the data to support data-driven testing. There is a new function in DataLib.ps1 called start-test that takes a workbook name, a worksheet name, a range name, and a set of field names as input, and runs the tests that are described in that range. It does this by calling a function that has the same name as the range.

Using start-test is demonstrated in a sample performance test on the companion blog for PSExpect. Figure 4 shows the Office Excel table and the test script provided for the weather Web service.


Figure 4. Office Excel table and test script for sample weather Web service


Automated testing combined with regression testing not only documents the architecture decisions at the appropriate level for implementers, but also continuously validates compliance with the intent of the architect.

The benefits of writing automated tests are the following:

  • Enable freedom to change—As software systems get bigger, it becomes harder and harder to make changes without breaking things. The business risk is that customers ask for new things, or the market shifts, and you cannot change the system safely. A large battery of automated test scripts frees architects from that fear: they will be less afraid to change existing design decisions. With automated testing as a "safety net," they can refactor the architecture, or add or change features, without losing sleep. By investing in automated testing, you will find that your organization actually moves faster than it did before. You can respond to market changes more quickly, roll out features faster, and build a stronger organization. Automated testing helps you:
    • Reduce costs. Automated testing finds problems effectively as early as possible, long before the software reaches a customer. The earlier problems are found, the cheaper it is to fix them because the "surface area" of change is smaller (that is, the number of changes since the last test will be limited).
    • Ensure reliability. Tests perform precisely the same operations each time they are run, without ambiguity. Each test either succeeds or fails. This provides an explicit consensus about compliance and continuously validates the intent of the architect. It also eliminates human error and negates the fatigue factor of manual testing as deadlines approach.
    • Engender confidence. Automation provides concrete feedback about the system because you can test how the software reacts under repeated execution of the same operations.
    • Prevent risks. Automated testing demonstrates system correctness within a sensible deadline. Tests can be run over and over again with less overhead.
    • Improve maintainability. Writing tests first influences structure and creates modular systems. It greatly loosens the coupling between components, thus reducing the risk that a change in one component will force a change in another component.
    • Provide up-to-date documentation consistently. Tests are an operational artifact that cannot be left out of sync. If tests drift from reality, it is impossible to run them with success. Outdated tests will always fail.



About the author

Mario Cardinal is an independent senior consultant specializing in enterprise application architecture. He spends most of his time building well-designed enterprise .NET applications with agile processes. He has over 15 years of experience in designing large-scale information systems. For the second year in a row, he has received from Microsoft the Most Valuable Professional (MVP) award in the competency of Software Architect. MVP status is awarded to credible technology experts who are among the very best community members willing to share their experience to help others realize their potential. He leads the architecture interest group at the Montreal Visual Studio User Group and the Montreal chapter of the International Association of Software Architects (IASA). He is also the architecture track tech chair for the DevTeach Conference. Furthermore, he hosts an audio Internet talk show about software development with Microsoft .NET (Visual Studio Talk Show). Mario holds Bachelor of Computer Engineering and Master of Technology Management degrees from the École Polytechnique in Montreal, Canada. He also holds the titles of Certified ScrumMaster (CSM) and Microsoft Certified Solution Developer (MCSD.Net). Contact Mario through his Web site at www.mariocardinal.com.

This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.

© Microsoft Corporation. All rights reserved.