How Microsoft delivers quality services with DevOps

Microsoft has decades of experience delivering highly scalable services to production environments. Just as those environments have grown far beyond anything that hosted our services thirty years ago, so too have the practices we use to deliver to them.

As our DevOps processes have matured, we've identified a set of core principles that apply to virtually any modern software effort. We've used this experience to help thousands of major customers adopt more efficient delivery practices so that they can benefit from the same approaches we've been refining for years.

These principles cover three major initiatives that any company can adopt to improve their DevOps delivery:

  1. Change the organizational mindset and cadence to focus on delivery.
  2. Engineer systems to be owned, tested, and delivered by accountable teams.
  3. Shift right to test in production.

Change the organizational mindset

Organizations always want to ship faster. It's the most obvious benefit that any team will be able to easily measure and appreciate. However, there is often resistance to this effort from those who are legitimately concerned about product stability. The typical DevOps cadence involves short cycles with regular deployments to production. To compromise, teams may try to adopt a sprint cycle that includes a stabilization period. Unfortunately, this incentivizes the wrong behavior.

Change the cadence

So what should you do first? Change the cadence.

Some teams will naturally do the right thing and burn down their debt. Other teams will keep building their debt because their engineers want to ship as many features as possible during the sprint. As a result, teams who managed their debt will be called on to support the greedier teams in paying down their debt during stabilization. These costs play themselves out through the pipelines and into production.

Paradoxically, removing the stabilization period quickly has the effect of improving the way teams manage their debt. Instead of pushing off key maintenance work to the stabilization period, those teams quickly learn that they'll be stuck with spending the next sprint catching up to the debt targets set by the organization. After one cycle they'll understand that features get delivered when they're proven and worth the cost of deployment.

Team autonomy and organizational alignment

Another thing organizations need to focus on is letting go. Management appreciates the security of having plans laid out up front, but committing to specific dates that far ahead offers only false security. Teams need to be able to run with their own backlogs and their own plans, and then find ways to align them with the rest of the organization.

Executives often ask us about how we run our feature chats and sprint demos. They're looking for specific and measurable targets, but it's not always that easy. There really aren't KPIs that measure team productivity or performance, nor can you use them to project whether a feature is on track. You need to have discussions with the teams, where they tell and show you where things are. The tools facilitate that, but conversation is the most transparent way to communicate.

Engineer systems for better ownership and accountability

Much of the improvement teams can gain immediately comes from automating the pipelines their code uses to get from repository to production. We'll assume that you already understand the benefits of creating release pipelines with continuous integration and automated testing. But there's a lot more that can be done to improve the efficiency of teams working toward an optimized DevOps process.

An important goal for teams new to DevOps delivery is to always be delivering features. Building schedules is useful insofar as it gives teams and individuals an exercise in assessing what can reasonably be completed over a given period of time. But when it comes down to actually delivering features, expect that some will come earlier and some will come later. As long as there's a focus on always delivering, the work can be prioritized and the most important features will make it to production.

The organizational benefits of microservices

Microservices offer a variety of technical benefits that generally improve and simplify delivery. They can also provide natural boundaries for team ownership. When a team has true autonomy over the investment in a given service, it can prioritize how features are implemented and how debt is managed. It can also plan concerns like versioning independently of the services that depend on what it provides.
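
As a rough illustration of that independence, here's a minimal sketch of a hypothetical "orders" microservice (using Flask; the service and field names are made up) that exposes both an old and a new version of its contract, so dependent services can migrate on their own schedule:

```python
# Minimal sketch of a hypothetical, independently versioned microservice (assumes Flask).
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/v1/orders/<order_id>")
def get_order_v1(order_id):
    # Original contract: the flat shape existing consumers still rely on.
    return jsonify({"id": order_id, "status": "shipped"})

@app.route("/v2/orders/<order_id>")
def get_order_v2(order_id):
    # New contract: richer shape, rolled out without breaking v1 callers.
    return jsonify({"id": order_id,
                    "fulfillment": {"status": "shipped", "carrier": "contoso"}})

if __name__ == "__main__":
    app.run(port=5000)
```

The owning team decides when to add /v2 and when to retire /v1, rather than coordinating a lockstep release with every consumer.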

Work in main

We used to have engineers working in separate branches. Merge debt built up invisibly in each branch until the developer tried to integrate. Unsurprisingly, the more teams you have, the bigger that integration becomes. How do you get that integration to happen in smaller chunks, faster and more continuously? The key is to work in main. One of the reasons we moved to Git is its lightweight branching. The really big benefit to our internal engineering was getting rid of the deep branch hierarchy and the waste it introduced. All the time that used to be spent integrating now gets poured into delivery.

Walk the walk

Because we use the tools that we build, a single investment yields benefits both in our productivity and in our products. For example, it's really important that the release management system we ship to everybody else is the one we use ourselves, instead of something secondary that siphons velocity away from the team.

Continuous deployment

The less frequently you deploy, the harder it is to deploy. The more time there is between deployments, the more piles up. Before long, the code isn't fresh in anyone's mind, and you accumulate deployment debt. The more you work in small chunks, the easier the actual deployment becomes. Teams used to avoid deploying because it was so hard, but that just made it harder. It seems obvious in hindsight, but at the time it was counterintuitive. Deploying more frequently pushed us to make the tools and pipelines we use to deploy more efficient and reliable.

Shift right to test in production

We recommend a general shift right to test in production. This helps teams ensure both that their ever-changing production environments are ready to handle deployments and that the tests they run prior to production remain valid.

Use resiliency patterns

A major risk for any complex deployment is cascading failure: one component fails, which causes components that depend on it to fail, and so on, until the entire system breaks down. Understand where your single points of failure (SPOFs) are and how they're mitigated. Also be sure to test those mitigation processes, especially in production.
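
One common mitigation is a circuit breaker in front of a fragile dependency. Here's a minimal sketch in Python (the class, thresholds, and timeouts are illustrative, not a specific library): after repeated failures the breaker "opens" and fails fast instead of hammering the dependency, then allows a trial call after a cool-down.

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: fails fast while a dependency is unhealthy."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at = None  # time the breaker opened, or None if closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Open: don't call the struggling dependency at all.
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: allow a trial call (half-open).
            self.opened_at = None
            self.failure_count = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failure_count = 0  # success closes the breaker again
        return result
```

Failing fast at one boundary keeps a single unhealthy component from dragging its dependents down with it.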

Feature flags enable progressive experimentation

There are going to be times when a team can't completely finish a feature in time for a sprint deployment. However, there is often benefit in deploying the current version for testing in production. The key here is to control exposure through the use of feature flags. This allows teams to merge and deploy their code without risking significant problems with the overall user base. Instead, they can turn the feature on for specific segments, such as the development team or a small group of early adopters, in order to determine if and how to complete it.
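
A minimal sketch of that exposure control, assuming a hypothetical flag store and segment names (a real system would typically pull flags from a service rather than an in-memory dictionary):

```python
# Hypothetical flag store: the incomplete feature ships dark, enabled only for
# the development team and a small ring of early adopters.
FLAGS = {
    "new-checkout": {"enabled_for": {"dev-team", "early-adopters"}},
}

def is_enabled(flag_name: str, user_segment: str) -> bool:
    flag = FLAGS.get(flag_name)
    return flag is not None and user_segment in flag["enabled_for"]

def render_checkout(user_segment: str) -> str:
    if is_enabled("new-checkout", user_segment):
        return "new checkout experience"   # merged and deployed, but only these segments see it
    return "existing checkout experience"  # everyone else stays on the stable path

print(render_checkout("dev-team"))        # new checkout experience
print(render_checkout("general-public"))  # existing checkout experience
```

Widening the segment list over time turns the same deployment into a progressive rollout, and removing the segment is the rollback.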

Instrument everything

Regardless of where an app is deployed, it's really important to instrument everything. This instrumentation not only helps identify and fix issues with the current version, but it also provides invaluable insight into what's being used and what we should add next.
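
As a minimal sketch of what "instrument everything" can look like in code, here's a structured-event helper using only the Python standard library; the event names and fields are hypothetical, and a real service would typically send these to a telemetry pipeline rather than a log stream:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
telemetry = logging.getLogger("telemetry")

def track_event(name: str, **properties):
    # Emit one structured event per interesting action so usage and failures
    # can be counted and analyzed later.
    telemetry.info(json.dumps({"event": name, "timestamp": time.time(), **properties}))

def export_report(report_id: str):
    start = time.perf_counter()
    try:
        # ... the actual feature work would go here ...
        track_event("report_exported", report_id=report_id,
                    duration_ms=round((time.perf_counter() - start) * 1000, 1))
    except Exception as exc:
        track_event("report_export_failed", report_id=report_id,
                    error=type(exc).__name__)
        raise
```

The success and failure events answer both questions at once: is the feature healthy, and is anyone actually using it?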

Getting metrics right

Designing metrics can be as hard as designing features. A common mistake is to include too many metrics, because it's easy and it feels like nothing gets missed. However, you'll end up ignoring, and not trusting, metrics you never had a specific need for. Instead, have the team take the time to think through the data points they need to measure success. You can always add or change metrics later, but having a defensible set up front makes that process easier.

Beyond the raw basis of a given metric, such as the total number of users, think about what the metric is really meant to tell you. Often a better metric is the velocity or acceleration of user gains. The right metrics vary from project to project, but favor those with the potential to drive changes to the business over vanity metrics.
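
A small worked example of that point, with made-up numbers: instead of reporting only the running total of users, compute the week-over-week gain (velocity) and the change in that gain (acceleration), which say more about momentum.

```python
# Hypothetical weekly active user counts.
weekly_active_users = [10_000, 10_400, 11_100, 12_100, 13_000]

# Velocity: users gained each week; acceleration: change in that gain.
velocity = [b - a for a, b in zip(weekly_active_users, weekly_active_users[1:])]
acceleration = [b - a for a, b in zip(velocity, velocity[1:])]

print(velocity)      # [400, 700, 1000, 900]
print(acceleration)  # [300, 300, -100]  <- growth is still healthy, but it's slowing
```

A flat or negative acceleration is the kind of signal that can drive a business decision, where the ever-growing total would have looked fine on its own.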

You're not done until telemetry confirms you're done

We bake metrics into our reviews up to the highest levels of leadership. Every six weeks we present how we're doing on health, our business, our scenarios, and our customer telemetry. We discuss it all with the executives and then bring it down to the teams. We look at those same engaged-user metrics and ask, "What does that mean for your feature?" People all through the org can say, "I don't just ship the feature; now I go and look to see whether people are using it, or whether I need to adjust the backlog and work on the feature more to make it achieve its goals."

Summary

It's never a straight line to get from A to B, nor is B the end. There will always be setbacks and mistakes. But those should be viewed as learning opportunities that may change the tactics for how a given part of the process is completed. Over time, every team evolves its DevOps practices as it builds on experience and adjusts to changing needs. The key thing is to focus on delivering value every day, whether it's to end users or to the process itself.