How we use Release Management for our test automation – Part 1

In November 2015, we made the Release Management (RM) service available for Public Preview in Visual Studio Team Services (VSTS). You can see the blog here to get going quickly on an end-to-end scenario using RM. You can also use the MSDN documentation here for a more detailed understanding of RM scenarios and concepts.

RM can be used for 2 scenarios – for deploying bits to production through a series of environments, as well as for running tests during the development cycle of the product. In this blog post, I am going to talk about the latter – specifically how we (the RM team at Microsoft) use the RM service for our test automation scenarios. We have been using RM in this way for 7 months now – kudos to my colleague Lova for driving this effort.

I have split up this blog into 2 parts. This is the first part of the blog, where I will describe our high-level experience while setting up the end-to-end for our test automation. In the next part, I will describe some of the design choices / challenges we faced while setting up our test automation, and how we addressed them.

Overview and high level description

VSTS consists of a number of scale units (or SUs) that provide services like version control, build, release management, work item tracking, etc. SU0 is the “dogfood” scale unit i.e. all the teams that contribute towards the VSTS services use the services in SU0 for their own day-to-day work. Typically, teams deploy new bits first to SU0, and after dogfooding the bits for some time, they promote the bits to the other scale units.

Along these lines, the RM team also uses the services in SU0 for its day-to-day engineering work. We do our development work (features/bug fixes) in a TF Git based feature branch called features/rmmaster, use the Build service to build our code, and the RM service to test our code.

At a high level, the goal of our engineering system is to validate every checkin against a number of configurations in parallel so that we catch regressions as early in the cycle as possible. To this end, there are 3 phases that a dev’s code goes through:

(1) Pre-checkin phase, or Pull Request (PR) : In this phase, we mainly run unit tests, and a few end-to-end tests to ensure that basic functionality is never broken in the features/rmmaster branch.

    • Note that if your git branch enables push-based checkins, then this phase is not applicable.

(2) Continuous Integration phase (CI) : In this phase, our CI build is kicked off immediately after checkin. This build once again runs the unit tests (to ensure that parallel checkins have not broken them), and then publishes artifacts which are consumed by RM during test automation.

(3) Test automation phase: This is triggered by the build completion event, and entails a bunch of Release Definitions (RD) starting in parallel. Each RD tests a different scenario, and together these RDs provide exhaustive coverage of the product. This is the phase that I’ll be focusing on the most in this blog.

Setting up the “Continuous Automation Pipeline”

Our CI build definition name is VSO.RM.CI, and it publishes a single artifact called “drop” which contains all the binaries produced by the build.


We tie this build definition to a bunch of Release Definitions (RDs) using the Trigger property of the RD. In other words, each of the RDs highlighted below is triggered automatically when the VSO.RM.CI build completes.


(Note that each RD currently has only a single environment. We were forced to use this design of 9 different RDs because RM doesn’t support running environments in parallel. That feature is coming soon; after that we will fold these RDs into a single RD with 9 environments that start in parallel. This will provide us with better traceability for a build.)
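The fan-out from one completed build to several RDs can be sketched as follows. This is only an illustration in Python, not how the RM service implements triggers: `start_release`, `on_build_completed` and the build id `example-build` are hypothetical, and the list shows just three of the RD names described later in this post.

```python
from concurrent.futures import ThreadPoolExecutor

# A subset of the RDs that are triggered by the CI build (names from this post).
RELEASE_DEFINITIONS = ["RM.CDP.RmEqTfs", "RM.CDP.RmGtTfs", "RM.CDP.TfsOnPrem"]


def start_release(rd_name, build_id):
    """Hypothetical placeholder for queuing one release against a build."""
    return f"{rd_name} started for {build_id}"


def on_build_completed(build_id, rd_names=RELEASE_DEFINITIONS):
    """Start every release definition for the completed build in parallel."""
    with ThreadPoolExecutor(max_workers=len(rd_names)) as pool:
        # map() returns results in the same order as rd_names.
        return list(pool.map(lambda rd: start_release(rd, build_id), rd_names))


for line in on_build_completed("example-build"):
    print(line)
```

Each RD receives the same build id, which is what lets every test suite run against the same set of bits.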

At a high level, each RD downloads the binaries required for its test, sets up the required test environment by cleaning up old binaries and deploying new ones (both dependent services and test DLLs), runs the tests using the VsTest task (which also publishes the results for easy reporting and analysis later), and finally cleans up the environment again. I’ll be drilling down into the RD design later in this blog.

Pictorially, the code flow and test publishing looks like this:


Setting up the agent pool

Before we could run anything, however, we needed to set up the build/release agents for our CI build and RDs. Normally, one would run the tasks on a build/release agent pool to deploy the services (RM/SPS/TFS) remotely to another set of target servers, and then run tests against them. In our case, we decided to deploy the services on the agent machine itself so that we could have multiple instances of the test at the same time.

Different tests had different prerequisites, which precluded use of the Hosted agent pool. Accordingly, we created a single agent pool called RMAgentPool. We prepared a different machine for each RM.CDP.* RD, installed the build/release agent on each of these machines, and added these machines to the RMAgentPool.

(To do this, we downloaded the agent on each test machine from the “Download agent” link highlighted in the image below. After unzipping the agent zip file, we configured it by setting the “URL for Team Foundation Server” to our account e.g.

Each machine was given a new user capability called “RmCdpCapability”, and its value denoted its usage e.g. the machines prepared for the CI build had “RmCdpCapability=CI”.


Another example: The agent used to run RM.CDP.TfsOnPrem had “RmCdpCapability=TfsOnPrem”.


The RmCdpCapability was then used as a demand by the RDs so that the tests run on the correct agents.
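The matching of demands to capabilities can be sketched as follows. This is a minimal illustration in Python, not the agent pool's actual implementation: the `Agent` class, the `eligible_agents` function and the machine names are hypothetical, and only `RmCdpCapability` and its values come from the setup described above.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for a machine registered in RMAgentPool."""
    name: str
    capabilities: dict = field(default_factory=dict)


def eligible_agents(agents, demands):
    """Return the agents whose capabilities satisfy every demanded key=value pair."""
    return [
        agent for agent in agents
        if all(agent.capabilities.get(key) == value for key, value in demands.items())
    ]


# Machine names are made up for illustration.
pool = [
    Agent("ci-machine-1", {"RmCdpCapability": "CI"}),
    Agent("ci-machine-2", {"RmCdpCapability": "CI"}),
    Agent("onprem-machine", {"RmCdpCapability": "TfsOnPrem"}),
]

# An RD such as RM.CDP.TfsOnPrem would demand RmCdpCapability=TfsOnPrem.
matches = eligible_agents(pool, {"RmCdpCapability": "TfsOnPrem"})
print([agent.name for agent in matches])
```

Because all machines sit in one pool, the demand is the only thing that steers an RD to the machine prepared for it.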

Overview of the RM.CDP.* RDs

A quick note on terminology: The web client based RM is also called “RM TWA” – TWA stands for Team Web Access

  • RM TWA service in VSTS is called RM Online (or RMO)
  • RM TWA, when folded into TFS on-prem, is called RM on-prem (this has not been released yet, but we are pushing hard to release it as early as possible)

Note that VSTS is a collection of several micro-services all of which are developed using a common Sprint model (all teams use the same three-week cycle), but are deployed somewhat independently by each of the teams. While RM is one such micro-service, it depends on other micro-services such as SPS and TFS.

A brief description of the tests we run for RM TWA:

1. Tests for RMO:

    • RM.CDP.RmEqTfs: Runs end-to-end (e2e) API based tests for RMO for scenarios where RM is at the same sprint release version as dependent services (SPS, TFS) e.g. RM is at Sprint 92 (S92), and SPS/TFS are also at S92
    • RM.CDP.RmGtTfs: Runs e2e tests for RMO for scenarios where RM is ahead of dependent services (SPS, TFS) in a scale unit e.g. RM is at S92, and SPS/TFS are at S91
    • RM.CDP.RmLtTfs: Runs e2e tests for RMO for scenarios where RM is behind dependent services (SPS, TFS) in a scale unit e.g. RM is at S91, and SPS/TFS are at S92.

The above test matrix enables us to deploy RMO to scale units independent of whether dependent services SPS/TFS have deployed or not (as long as we don’t have a dependency on a new feature in these services, in which case the RmGtTfs test suite would [hopefully] fail).
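The three-way version matrix can be expressed compactly. The sketch below is in Python and assumes sprint versions can be compared as plain integers (92 for S92); the function name `rd_for_versions` is hypothetical, while the RD names are the ones listed above.

```python
def rd_for_versions(rm_sprint, deps_sprint):
    """Map an (RM sprint, SPS/TFS sprint) pair to the RD that covers it."""
    if rm_sprint == deps_sprint:
        return "RM.CDP.RmEqTfs"   # RM at the same sprint as SPS/TFS
    if rm_sprint > deps_sprint:
        return "RM.CDP.RmGtTfs"   # RM ahead of SPS/TFS
    return "RM.CDP.RmLtTfs"       # RM behind SPS/TFS


for rm, deps in [(92, 92), (92, 91), (91, 92)]:
    print(f"RM at S{rm}, SPS/TFS at S{deps}: {rd_for_versions(rm, deps)}")
```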

2. Tests for RM on-prem:

    • RM.CDP.TfsOnPrem: Runs both API based tests and UI based tests on RM on-prem.

3. Upgrade tests

    • RM.CDP.DevFabricUpgrade: Tests the upgrade scenario for RMO e.g. from S91 to S92.
    • RM.CDP.OnPremUpgrade: Tests the upgrade scenario for RM on-prem.

4. Test for x-plat RM i.e. RM agent running on Linux / iOS:

    • RM.CDP.XPlat

Design of the RM.CDP.* RDs

I’ll drill down into the RM.CDP.TfsOnPrem RD, since it establishes the canonical pattern that is used by the other RDs.

1. The RD is configured to run on the correct agent as per the screenshot below: RM.CDP.TfsOnPrem –> Edit –> Environment –> … –> Agent Options –> Options tab.

2. Further, it “Skips Artifact Download”:


The reason for skipping artifact download is that our CI build publishes a single, large artifact (called “drop”) which is several GB in size, whereas each test requires a different subset of the files.

(Currently, RM doesn’t support easily downloading a subset of artifacts published by the build. Once that support gets in, we will do 2 things: (1) Start publishing smaller artifacts in our CI build i.e. instead of publishing a single, big artifact called “drop”, we will publish smaller artifacts named “TfsDrop”, “RmDrop”, “SpsDrop”, etc. (2) Download the appropriate artifact(s) as required by each RM.CDP.* RD.)

3. Each of the RDs downloads some standard files (specifically vssbinfetch.exe, which knows how to download specific parts of a build drop) by running the downloadArtifacts.ps1 file (which is available on a \\ file share). Then it uses the “vssbinfetch” task (which is a custom task authored by us, and which invokes vssbinfetch.exe) to download the required binaries from the build drop for that specific RD’s test scenario e.g. RM.CDP.RmEqTfs will download the binaries for the SPS, TFS and RMO services, whereas RM.CDP.TfsOnPrem will download the binaries for TFS on-prem. These 2 tasks are highlighted below. Along the way, the RD cleans up the machine and removes old binaries.


4. Next, it deploys the required services and test dlls onto the machine e.g. “tfat” is an internal tool that installs TFS on-prem on the machine.


5. Finally, it sets up the test environment file and calls the Visual Studio Test task (or VsTest task, which is our favorite task :)). This publishes the test results under the title “TfsOnPrem”.


6. Typically, each RD completes after the “Pause agent on test failure” task, and some optional clean up at the end. The Pause task is usually disabled, and I’ll talk more about this in my next blog post.
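The overall shape of an RD (download, deploy, test, then clean up) can be sketched as an ordered list of steps. This Python sketch is only an illustration under stated assumptions: `run_release`, `step` and the step labels are hypothetical, the actions are placeholders that merely record their names, and the sketch assumes that cleanup runs even when an earlier step fails.

```python
def run_release(steps, cleanup):
    """Run steps in order, stop at the first failure, and always clean up."""
    completed = []
    try:
        for name, action in steps:
            action()                 # an exception here stops later steps
            completed.append(name)
    finally:
        cleanup()                    # assumed to run on success and on failure
    return completed


log = []


def step(name):
    """Build a placeholder action that only records its name in the log."""
    return lambda: log.append(name)


steps = [
    ("download", step("download required binaries")),
    ("deploy", step("deploy services and test DLLs")),
    ("test", step("run tests and publish results")),
]

completed = run_release(steps, cleanup=step("clean up machine"))
print(completed)
print(log)
```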

Analyzing test results

As devs make their checkins, it is fairly straightforward to track who has caused regressions e.g. in the screenshot below, some test began failing after build VSO.RM.CI_rmmaster_20151231.5. Double-clicking on the highlighted release takes us into the Release Summary page:


We then navigate to the Test Results section of the Release Summary page, and notice that 2 UI tests have begun failing after this checkin. Clicking on the test link highlighted below takes us to the Test hub:


Going to the “Test Results” sub-tab gives us a great starting point to drill deeper into the failures:


The logs for the tests are available in the “Run summary” sub-tab.

We can also drill down further into the commits that went into that release (to correlate why they might have caused the regression) by going to the Commits tab in the Release Summary. For example, the screenshot below indicates that some UI changes went into this checkin, which might explain the 2 failing UI tests.


Benefits of using RM for testing

Using RM for test automation gives us the following benefits:

(1) We are able to run all the test suites in parallel to get a quick turnaround time for our build validation.

    • Further, we can target the longest running RDs, and add more agents for those test suites to get more parallelism e.g. we have 2 CI machines i.e. “RmCdpCapability=CI”, and plan to add another machine for RM.CDP.RmEqTfs (since that takes the longest time to run).

(2) This setup lends itself to easy testing of different branches: While we have CI set up for our feature branch (features/rmmaster), we can as easily queue a build off a release branch e.g. releases/M92 in the screenshot below. When the build completes, it will trigger the same RM.CDP.* RDs against the bits from this release branch.

    • This branch-wise flexibility can be further extended to the following scenario: Devs can get exhaustive testing on their code before checking in (typically done by devs when they have a big set of changes going in) e.g. /users/rahudha/rifi branch highlighted below. The key here is that devs can achieve this without the overhead of setting up any test infra on their dev boxes, by just re-using the team’s resources.


(3) We can run the same tests in production that we run in our test automation: Since we deploy to prod using RMO (yes, we deploy RMO using RMO :)), we can pretty much use the same set of test tasks to test our production deployment also.

(4) Since the same set of bits gets run through all the test suites, it becomes easy to answer the question: “What’s the quality of a build?” We usually want this question answered before pushing a build into production. The screenshot below shows the simple query we run for this:


(This experience will become more integrated into the Release Summary page over the next few months.)
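The idea behind such a per-build quality query can be sketched in Python. This is not the query from the screenshot: the function `build_quality`, the row layout and the build names are hypothetical, and the pass/fail counts are made-up sample values used only to show the aggregation.

```python
from collections import defaultdict


def build_quality(test_runs):
    """Aggregate (build, rd, passed, failed) rows into per-build totals."""
    summary = defaultdict(lambda: {"passed": 0, "failed": 0, "failing_rds": []})
    for build, rd, passed, failed in test_runs:
        entry = summary[build]
        entry["passed"] += passed
        entry["failed"] += failed
        if failed:
            entry["failing_rds"].append(rd)
    return dict(summary)


# Made-up sample rows: one row per RD run against a build.
sample_runs = [
    ("build-A", "RM.CDP.RmEqTfs", 10, 0),
    ("build-A", "RM.CDP.TfsOnPrem", 8, 2),
    ("build-B", "RM.CDP.RmEqTfs", 10, 0),
]

for build, totals in build_quality(sample_runs).items():
    print(build, totals)
```

Grouping by build works here because every RD is triggered by, and reports against, the same CI build.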


We started off using RMO to test our development processes purely as a dog-fooding exercise. Over time, however, we found this to be significantly better than the earlier testing infra we used. The devs in the team love the fact that they can test their big changes without going through the headache of setting up the test environment locally.

Now you have some insight into how the RM team uses RM for its test automation. Hopefully this gives you some ideas that will help you set up your own test automation.

In the next part of this blog, I talk about some of the design choices / challenges we encountered while setting up our RDs, and how we addressed them.

Edit (3/23/2016): RMO has now added support for running environments in parallel with deployment conditions. We have taken advantage of this to move our test automation to a single RD with multiple environments. I have blogged about that here.