Inside Microsoft.com: Release Management
Have you ever wondered how applications and content get developed and deployed to Microsoft.com? Are you curious about what development methodologies and processes are used and how Microsoft ensures that policies and standards are followed?
The Microsoft.com Release Management team is responsible for maintaining the policies and processes used to develop software for the site. Working within the development project teams, release managers hold these teams accountable for fulfilling their responsibilities and act as an interface between operations and the project teams. With a multi-project, multi-team view, they are often counted on to provide a strategic viewpoint and identify and manage cross-team dependencies, while also being responsible for the main deliverables in the deployment phase of the software development life cycle—creating the release plan and overseeing the deployment into the operational environment.
Each quarter, the Microsoft.com Release Management team deploys nearly 200 software releases into the various environments that make up Microsoft.com. Put another way, this means that each of our six release managers is responsible for managing the deployment of a project into one of the operational environments just about every other day. These releases are from a family of groups within Microsoft that do the bulk of the development work for sites such as microsoft.com, msdn.microsoft.com, profile.microsoft.com, technet.microsoft.com, and members.microsoft.com, which are each housed within the Microsoft.com operational environment. If you've ever visited Microsoft.com or any of the related Microsoft sites, registered a product, bought a product from the Microsoft Web site directly, looked up information on a Microsoft Official Curriculum training course, or even used the link in the Windows® Event Viewer that lets you get more details about an event message online, you've used something we've worked on.
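The "every other day" figure can be checked with quick arithmetic. The sketch below uses the release and head counts given above and assumes a quarter of 13 weeks with five working days each; that working-day count is an assumption, not a number from this column.

```python
# Back-of-the-envelope check of the release cadence described above.
RELEASES_PER_QUARTER = 200          # "nearly 200" per the column
RELEASE_MANAGERS = 6                # per the column
WORKING_DAYS_PER_QUARTER = 13 * 5   # assumption: 13 weeks x 5 working days


def working_days_between_releases(releases, managers, working_days):
    """Average working days between deployments for one release manager."""
    releases_per_manager = releases / managers
    return working_days / releases_per_manager


cadence = working_days_between_releases(
    RELEASES_PER_QUARTER, RELEASE_MANAGERS, WORKING_DAYS_PER_QUARTER
)
print(f"{cadence:.2f} working days between releases per manager")
```

Under these assumptions each release manager handles roughly 33 releases a quarter, or one about every two working days.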
Release Management Goals
A successful project release is one that releases at the right time, does not negatively impact customers and users, delivers the business value it was designed to address, and does not cause an inordinate impact on user support teams.
The goal of the Release Management team is to act as an extension of management, providing the necessary project oversight and guidance to ensure a successful release. To be able to do this, release managers must be involved in projects from the very beginning, not just at deployment. This lets us provide guidance (in the form of supportability and process requirements) during the envisioning and design phases of the projects, as well as participate in requirements and design reviews, to help make sure the project is set up to be successful from the very beginning and through all phases of its development. We also work with the development and operations teams when creating the release plan, which is used to guide and coordinate the deployment into the operational environment.
Setting Expectations Early
One of the keys to a successful project release is ensuring that expectations are set correctly, both within the project team and externally with stakeholders and management. We use a project release checklist to drive early agreements on who will fulfill the various required tasks and deliverables associated with the project, as well as which, if any, of the optional deliverables and tasks will be included in the project. The choice of checklist depends on which of the approved development methodologies the team agrees will be used.
As this column is being written, there are several methodologies in use at Microsoft.com, including Waterfall, Agile, and SCRUM. Waterfall and Agile use the same checklist, an excerpt of which is shown in Figure 1. Agile is basically multiple Waterfall iterations in one project. (Some people, myself included, would argue that Waterfall is really iterative too, since almost no project except the very smallest makes it from start to finish without at least one design change request occurring sometime in the build phase.) SCRUM is a very different methodology, emphasizing face-to-face communication among team members and stakeholders, so we have developed a separate checklist for it. Both checklists are available from the Microsoft Download Center.
Figure 1 Waterfall Checklist Excerpt
By setting expectations and obtaining agreements early in the project lifecycle, the team avoids last-minute surprises. It is particularly important to ensure that representatives of both the development and operations engineering teams participate in these discussions. Operations teams need to understand the project's goals and be able to perform capacity planning around expected customer traffic volume. This helps them determine whether the operational environment can support the release at the current time or whether, for example, additional capacity is needed, with an associated increase in budget. Waiting until the project's design phase (or, even worse, until the developers have already started coding) could result in the team building a solution that cannot be deployed because it exceeds what the current infrastructure can handle.
Policies and Standards
The Release Management team works with various other teams at Microsoft to obtain agreement on, and put into effect, standards, policies and best practices that provide governance and direction for developing successful projects for the Microsoft.com environment. Many of these guidelines are very specific to our environment—for example, there is a policy and a standard for where the project files for a particular release are stored and what the folder structure for every release must be. Having policies and standards, and holding teams accountable for following them, has very clear benefits to the projects—and to the teams. There are many advantages, but I'd like to focus on two.
First, following policies and standards increases the predictability of the project, which means that team members know what to expect. Using a predictable location for project files, for example, means that the release manager knows in advance, even before the first line of code is written, where to get the bits for deployment to production. This enables him to begin writing the release plan early, rather than having to scramble to find the bits after the Test team signs off. It means that years later, if a new server is being built and someone needs to install a service onto it, he will know exactly where to go to find the files and documentation.
The second benefit is that having policies and standards reduces the number of decisions that must be made on things that are common to many projects to a small set—or sometimes even just one. This enables the project teams to focus their energies on what's really important—the features that will deliver on the original goals of the projects, rather than reinventing the wheel. Of course, if a policy or standard really doesn't meet a team's needs, there are processes for changing them or adding new ones. In general, though, most project teams simply agree to follow the existing policies and standards because of the clear benefit.
SDLC and Ops/RM Controls
The Release Management team is responsible for holding teams accountable for two key sets of policies: the mandatory controls that Microsoft corporate policy places on the software development lifecycle (SDLC), and the additional project criteria that the operations team requires of any project releasing into the operational environment. These controls are designed to ensure project success by requiring that key deliverables and tasks take place and that they are documented in standard locations. In the project checklists, the corporate SDLC mandatory controls are the items color-coded in green, while the Ops/RM project criteria are color-coded in yellow.
For example, one SDLC mandatory control is a Responsible, Accountable, Consult, Inform (RACI) matrix that indicates the following for each deliverable and task:
- Who is Responsible (person or team that actually does the work)
- Who is Accountable (person or team responsible for making sure the work is done)
- Who to Consult (provides input, feedback, and support)
- Who to Inform (receives information about the project)
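A RACI matrix maps naturally onto a small data structure. The following is a minimal sketch, not the template used at Microsoft.com: the `RaciEntry` class, the `unassigned` helper, and the sample rows are all hypothetical and serve only to show the four roles per deliverable.

```python
from dataclasses import dataclass, field


@dataclass
class RaciEntry:
    """RACI assignments for one deliverable or task."""
    deliverable: str
    responsible: str                              # does the work
    accountable: str                              # makes sure the work is done
    consult: list = field(default_factory=list)   # input, feedback, support
    inform: list = field(default_factory=list)    # receives information


def unassigned(entries):
    """Return deliverables that lack a Responsible or Accountable party."""
    return [e.deliverable for e in entries
            if not (e.responsible and e.accountable)]


# Hypothetical rows; team and deliverable names are illustrative only.
matrix = [
    RaciEntry("Release plan", "Release Management", "Release Management",
              consult=["Development", "Operations"], inform=["Stakeholders"]),
    RaciEntry("Capacity plan", "Operations", "",
              consult=["Development"]),
]

print(unassigned(matrix))  # ['Capacity plan']
```

A check like `unassigned` is one way to surface gaps in ownership early, which is the point of agreeing on the matrix at the start of a project.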
At the Microsoft corporate IT level, the RACI matrix is a complex template that is designed to account for any possible variation of project type and team. In Microsoft.com, the Release Management and Operations teams worked together to develop the project checklists, which contain the RACI information in a simplified matrix. The simplification is possible because we also developed policies and standards around project release types, limiting them to four (shown in Figure 2), which are denoted by the release version ID number. The sidebar "Version Numbers" explains the numbering process.
Figure 2 Release Terminology
|Release Type|Description|
|---|---|
|Major|A major release contains several significant features or an architectural change.|
|Minor|A minor release contains one significant feature or several small ones.|
|Service Pack|A service pack contains two or more bug fixes. It may not contain new features.|
|Hotfix|A hotfix is a single bug fix.|

Service packs and hotfixes must meet certain impact criteria in order to be considered.
The SDLC process, which is based on the publicly available Microsoft Solutions Framework (MSF), divides each project into six distinct phases: envision, design, build, stabilize, deploy, and production.
At the end of each of the phases (except production), the release manager conducts a phase review in which the list of deliverables and tasks (the exit criteria) is reviewed and the team makes two decisions: Have all the exit criteria for this phase been met successfully? Should we continue to the next phase?
The first question determines whether the team has met its goals for the phase. If not, a discussion about the reasons takes place, and the team either decides to do more work in this phase or communicates to the stakeholders and sponsors that it is unable to continue (usually because more resources are needed).
Once the team has determined that the goals of the current phase have been met, a go/no-go decision is made for the next phase. Asking this question can be one of the more difficult tasks a release manager faces, because everyone on a project team (including the RM) usually wants to see the project go into production. However, sometimes the right thing to do is to stop.
There are many reasons a project might not continue on to its next phase. For example, a project might have been planned as a three-month effort to add a feature to an existing project with a particular team working on it. After going through the envisioning and design phases of the project, it might become apparent that in order to actually build the proposed feature, six months and twice the team might be needed. The team would then, at the design-phase review meeting, make a no-go decision and provide documentation to the project sponsor that additional time and resources are needed.
By following the project release checklist and conducting phase reviews, issues that might prevent a project from being successful can be found and mitigated as early as possible.
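The two-question phase review can be sketched as a small gate function. This is an illustration only: `phase_review`, `ReviewResult`, the outcome labels, and the sample exit criteria are hypothetical names, and the sketch collapses the "do more work or report back to sponsors" choice into a single "criteria not met" outcome.

```python
from dataclasses import dataclass, field
from typing import Optional

# The six SDLC phases named in the column, in order.
PHASES = ["envision", "design", "build", "stabilize", "deploy", "production"]


@dataclass
class ReviewResult:
    outcome: str                       # "criteria not met", "no-go", or "go"
    unmet_criteria: list = field(default_factory=list)
    next_phase: Optional[str] = None


def phase_review(phase, exit_criteria, continue_to_next):
    """Apply the two phase-review questions in order.

    exit_criteria maps each criterion name to True (met) or False (not met).
    continue_to_next is the team's answer to "Should we continue?".
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if phase == PHASES[-1]:
        raise ValueError("no phase review is held at the end of production")

    # Question 1: have all exit criteria for this phase been met?
    unmet = sorted(name for name, met in exit_criteria.items() if not met)
    if unmet:
        return ReviewResult("criteria not met", unmet)

    # Question 2: go/no-go for the next phase.
    if not continue_to_next:
        return ReviewResult("no-go")
    return ReviewResult("go", next_phase=PHASES[PHASES.index(phase) + 1])


# Hypothetical criteria for a design-phase review that ends in a no-go,
# as in the three-month versus six-month example above.
result = phase_review(
    "design",
    {"design reviewed": True, "estimates updated": True},
    continue_to_next=False,
)
print(result.outcome)  # no-go
```

Keeping the two questions separate in code mirrors the meeting itself: a team can meet every exit criterion and still decide that stopping is the right call.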
Releasing to the Environments
In addition to the project release checklist, through which we hold other team members accountable for deliverables, release managers are responsible and accountable for a successful release and for the key document that helps ensure success: the release plan.
Release plans vary in form according to the project type, but they all contain several significant points of information, as listed in Figure 3.
We have defined several types of operational environments based on their purposes. For the preproduction and staging environments, only operations and release management have read/write access, and only operations has access to the production environment. This ensures that control is maintained over the versions of software that are actually deployed. Figure A describes the various environments.
Releases are normally deployed to the preproduction and staging environments before being deployed to production. This gives an opportunity for smoke tests to verify that the release performs as expected in a production-like environment. It also provides operations with the chance to execute the release plan prior to propping the bits into production.
Many of our services are adopted by other teams. For example, one of the services provided by Microsoft.com is an internal Web service that enables content providers to publish Web page content to www.microsoft.com. Teams in several other development organizations have written clients that run against this Web service. So that they can test their clients without actually publishing to the live Web site, we offer a version of the service that runs in the preproduction environment but otherwise mirrors the production setup. We typically release a new version of a service to its preproduction environment for a week prior to releasing to the staging environment so that adopters can make sure their clients work correctly with the new version.
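The access rules and promotion order described in this section can be expressed as a simple check. The sketch below is hypothetical: `can_deploy` and the two lookup tables are illustrative names, and it treats the normal preproduction, staging, production order as a strict rule, which is a simplification of "normally deployed."

```python
# Normal promotion order described above.
PROMOTION_ORDER = ["preproduction", "staging", "production"]

# Teams with write access to each environment, as described above.
WRITE_ACCESS = {
    "preproduction": {"operations", "release management"},
    "staging": {"operations", "release management"},
    "production": {"operations"},
}


def can_deploy(team, environment, already_deployed):
    """Return True if `team` may deploy a release to `environment`.

    already_deployed is the set of environments the release already runs in.
    """
    if environment not in PROMOTION_ORDER:
        raise ValueError(f"unknown environment: {environment!r}")
    if team not in WRITE_ACCESS[environment]:
        return False
    # Every earlier environment in the order must already have the release.
    earlier = PROMOTION_ORDER[:PROMOTION_ORDER.index(environment)]
    return all(env in already_deployed for env in earlier)


print(can_deploy("operations", "production", {"preproduction", "staging"}))
print(can_deploy("release management", "production",
                 {"preproduction", "staging"}))
```

The first call is allowed because operations has production access and the release has passed through both earlier environments; the second is refused because only operations can touch production.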
Minimizing Negative Impact
One of our main goals when releasing any new version is to minimize the negative impact to customers and users. We strive for zero downtime—in other words, maintaining constant availability of the customer-facing system throughout the release period.
This is one of the most important reasons why release management must be involved throughout the entire project lifecycle. During the envision phase, we provide supportability requirements, including zero downtime, to make sure that the business goals and metrics include minimizing negative impact. In the design phase, we work with the core team to design the project in a way that fulfills those goals. One way we do that is to design projects that can be deployed in a side-by-side mode, in which the new version and current version of a system can exist in an environment at the same time. This enables the team to smoke-test the new version in place and have a final activation step that puts the new version into service. Side-by-side deployments simplify rollback plans as well, since the old version can be maintained in production for a period of time until the team agrees that keeping the rollback contingency available is no longer necessary.
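The side-by-side pattern can be sketched as a small state model: install the new version next to the current one, smoke-test it, activate it in a single step, and keep the old version until rollback is no longer needed. The `SideBySideService` class and its method names are hypothetical and do not describe the actual Microsoft.com tooling.

```python
class SideBySideService:
    """Tracks versions of one service installed in an environment."""

    def __init__(self, active_version):
        self.installed = {active_version}
        self.active = active_version
        self.previous = None   # kept as the rollback target after activation

    def install(self, version):
        """Install a version alongside the active one; traffic is unaffected."""
        self.installed.add(version)

    def activate(self, version, smoke_test):
        """Switch traffic to `version` only if its smoke test passes."""
        if version not in self.installed:
            raise ValueError(f"{version} is not installed")
        if not smoke_test(version):
            return False
        self.previous, self.active = self.active, version
        return True

    def roll_back(self):
        """Return traffic to the previously active version."""
        if self.previous is None:
            raise RuntimeError("no previous version available for rollback")
        self.active, self.previous = self.previous, None

    def retire_previous(self):
        """Remove the old version once the rollback contingency is dropped."""
        if self.previous is not None:
            self.installed.discard(self.previous)
            self.previous = None


# Hypothetical version numbers and a smoke test that always passes.
service = SideBySideService("1.0.0.0")
service.install("1.1.0.0")
service.activate("1.1.0.0", smoke_test=lambda version: True)
print(service.active)    # 1.1.0.0
service.roll_back()
print(service.active)    # 1.0.0.0
```

Because activation is a single switch rather than an in-place upgrade, the customer-facing system stays available throughout, and rollback is the same switch in reverse.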
Learning from Experience
The final task for release management on a project is to schedule and run the postmortem meeting. This is a nonconfrontational meeting at which the project team discusses both the things that went well on the project and the things that did not. The goal is not to place blame (in fact, the RM enforces rules at the meeting that prevent finger-pointing and blame), but rather to gather lessons that can be used to improve future projects. Many times, something that went well (or even an approach to avoid or mitigate something that did not) gets documented as a best practice. Over time, best practices become standards, and when that happens, everyone benefits.
Version Numbers
At Microsoft.com, we use a standard scheme by which anyone can tell the release type simply by looking at the version number:
x.0.0.0: The first number signifies the major version release of the product.
0.x.0.0: The second number signifies the minor version release of the product.
0.0.x.0: The third number signifies the service pack release of the product.
0.0.0.x: The fourth number signifies the hotfix release of the product.
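Reading the release type off a version number can be sketched in a few lines. The sketch assumes that the release type corresponds to the rightmost nonzero component (so lower-order numbers reset to zero when a higher-order number is incremented); the column does not spell out that rule, and `release_type` is a hypothetical helper.

```python
# Release types in the same order as the four version components.
RELEASE_TYPES = ("Major", "Minor", "Service Pack", "Hotfix")


def release_type(version):
    """Return the release type implied by a four-part version string.

    Assumption: the rightmost nonzero component identifies the release type.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) != 4 or any(p < 0 for p in parts):
        raise ValueError(f"expected four non-negative numbers: {version!r}")
    for index in range(3, -1, -1):
        if parts[index] != 0:
            return RELEASE_TYPES[index]
    raise ValueError("0.0.0.0 does not identify a release")


for v in ("2.0.0.0", "2.1.0.0", "2.1.3.0", "2.1.3.1"):
    print(v, "->", release_type(v))
```

Under this assumption, 2.0.0.0 is a major release, 2.1.0.0 a minor release, 2.1.3.0 a service pack, and 2.1.3.1 a hotfix.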
Jim Scardelis has been a Release Manager in various Microsoft.com groups since the turn of the century. Prior to joining Microsoft in 1997, he was the author of several books under the pseudonym "Jim Blakely" and a former contributing editor for MCP Magazine (now Redmond Magazine).
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.