Canary deployment strategy for Kubernetes deployments

Azure Pipelines

Canary deployment strategy involves deploying new versions of an application alongside the stable production version, so that the canary version can be compared against the baseline before the deployment is promoted or rejected. This step-by-step guide covers how to use the Kubernetes manifest task's canary strategy support to set up canary deployments for Kubernetes, along with the associated workflow: instrumenting code, using that instrumentation to compare baseline and canary, and then making a manual judgment on promoting or rejecting the canary.

Prerequisites

  • A repository in a container registry (Azure Container Registry, Google Container Registry, or Docker Hub) with push privileges.
  • Any Kubernetes cluster (Azure Kubernetes Service, Google Kubernetes Engine, or Amazon Elastic Kubernetes Service).

Sample code

Fork the following repository on GitHub -

https://github.com/MicrosoftDocs/azure-pipelines-canary-k8s

Here's a brief overview of the files in the repository that are used during the course of this guide -

  • ./app:
    • app.py - A simple Flask-based web server instrumented with the Prometheus instrumentation library for Python applications. A custom counter tracks the number of 'good' and 'bad' responses given out, based on the value of the success_rate variable.
    • Dockerfile - Used for building the image. With each change made to app.py, the build pipeline (CI) is triggered and the image is built and pushed to the container registry.
  • ./manifests:
    • deployment.yml - Contains the specification of the sampleapp Deployment workload corresponding to the image published earlier. This manifest file is used not just for the stable version of the Deployment object, but also for deriving the -baseline and -canary variants of the workload.
    • service.yml - Creates the sampleapp service for routing requests to the pods spun up by the Deployments (stable, baseline, and canary) mentioned above.
  • ./misc
    • service-monitor.yml - Used to set up a ServiceMonitor object for Prometheus metric scraping.
    • fortio-deploy.yml - Used to set up the fortio deployment, which is subsequently used as a load-testing tool to send a stream of requests to the sampleapp service deployed earlier. Because the sampleapp service's selector matches the pods of all three Deployment objects created during the course of this guide (sampleapp, sampleapp-baseline, and sampleapp-canary), the stream of requests sent to sampleapp is routed to pods under all three deployments.
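The actual app.py uses Flask and the Prometheus client library; the following pure-Python sketch only illustrates the good/bad counting idea described above. The threshold semantics and names here are assumptions made for illustration, not the repository's real code.

```python
import random

# Illustrative sketch (not the repository's actual app.py): each response is
# labelled 'good' or 'bad' based on the success_rate variable, mirroring the
# custom counter described above. Threshold semantics are assumed.
success_rate = 10

counters = {"good": 0, "bad": 0}

def handle_request(rng):
    # A higher success_rate yields proportionally more 'good' responses.
    status = "good" if rng.random() * 20 < success_rate else "bad"
    counters[status] += 1
    return status

rng = random.Random(42)
for _ in range(100):
    handle_request(rng)
print(counters)
```

In the real application, the two tallies would be exposed as a labelled Prometheus counter so that the queries shown later in this guide can distinguish 'good' from 'bad' responses.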

Note

While Prometheus is used for code instrumentation and monitoring in this how-to guide, any equivalent solution, such as Azure Application Insights, can be used instead.

Install prometheus-operator

Use the following command from your development machine (with kubectl and Helm installed, and with the context set to the cluster you want to deploy against) to install Prometheus on your cluster. Grafana, which is used later in this guide to visualize baseline and canary metrics on dashboards, is installed as part of this Helm chart -

helm install --name sampleapp stable/prometheus-operator

Create service connections

  • Navigate to Project settings -> Pipelines -> Service connections.
  • Create a Docker registry service connection associated with your container registry. Name it azure-pipelines-canary-k8s.
  • Create a Kubernetes service connection for the Kubernetes cluster and namespace you want to deploy to. Name it azure-pipelines-canary-k8s.

Set up continuous integration

  1. Navigate to Pipelines -> New pipeline and select your repository.
  2. On the Configure tab, choose Starter pipeline.
  3. On the Review tab, replace the contents of the pipeline YAML with the following snippet -
    trigger:
    - master
    
    pool:
      vmImage: ubuntu-latest
    
    variables:
      imageName: azure-pipelines-canary-k8s
    
    steps:
    - task: Docker@2
      displayName: Build and push image
      inputs:
        containerRegistry: dockerRegistryServiceConnectionName #replace with name of your Docker registry service connection
        repository: $(imageName)
        command: buildAndPush
        Dockerfile: app/Dockerfile
        tags: |
          $(Build.BuildId)
    
    If the Docker registry service connection you created is associated with foobar.azurecr.io, then the image is pushed to foobar.azurecr.io/azure-pipelines-canary-k8s:$(Build.BuildId) based on the above configuration.
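Concretely, the pieces combine as follows (a hypothetical helper, just to show the naming scheme):

```python
# Illustrative helper showing how the pushed image reference is composed from
# the registry host of the service connection, the repository input, and the
# build ID tag. The function name is made up for illustration.
def image_ref(registry: str, repository: str, build_id: str) -> str:
    return f"{registry}/{repository}:{build_id}"

print(image_ref("foobar.azurecr.io", "azure-pipelines-canary-k8s", "12345"))
# foobar.azurecr.io/azure-pipelines-canary-k8s:12345
```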

Edit manifest file

In manifests/deployment.yml, replace <foobar> with your container registry's URL. For example, after replacement, the image field should look something like contosodemo.azurecr.io/azure-pipelines-canary-k8s.
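For reference, a fragment of what the edited manifest might look like after the replacement (the registry URL and surrounding fields are illustrative, not the repository's exact file):

```yaml
# Illustrative fragment of manifests/deployment.yml after replacing <foobar>
spec:
  containers:
  - name: sampleapp
    image: contosodemo.azurecr.io/azure-pipelines-canary-k8s
```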

Set up continuous deployment

Deploy canary stage

  1. Navigate to Pipelines -> Environments -> New environment

  2. Configure the new environment as follows -

    • Name: akscanary
    • Resource: choose Kubernetes
  3. Click on Next and now configure your Kubernetes resource as follows -

    • Provider: Azure Kubernetes Service
    • Azure subscription: Choose the subscription that holds your Kubernetes cluster
    • Cluster: Choose your cluster
    • Namespace: Create a new namespace with the name canarydemo
  4. Click on Validate and Create

  5. Navigate to Pipelines -> Select the pipeline you just created -> Edit

  6. Change the step you created previously to use a stage, and add two additional steps to copy the manifests and misc directories as artifacts for use by subsequent stages. You might also want to move a couple of values to variables for easier use later in your pipeline. Your complete YAML should now look like this:

    trigger:
    - master
    
    pool:
      vmImage: ubuntu-latest
    
    variables:
      imageName: azure-pipelines-canary-k8s
      dockerRegistryServiceConnection: dockerRegistryServiceConnectionName #replace with name of your Docker registry service connection
      imageRepository: 'azure-pipelines-canary-k8s'
      containerRegistry: containerRegistry #replace with the name of your container registry; should be in the format foobar.azurecr.io
      tag: '$(Build.BuildId)'
    
    stages:
    - stage: Build
      displayName: Build stage
      jobs:  
      - job: Build
        displayName: Build
        pool:
          vmImage: ubuntu-latest
        steps:
        - task: Docker@2
          displayName: Build and push image
          inputs:
            containerRegistry: $(dockerRegistryServiceConnection)
            repository: $(imageName)
            command: buildAndPush
            Dockerfile: app/Dockerfile
            tags: |
              $(tag)
    
        - upload: manifests
          artifact: manifests
    
        - upload: misc
          artifact: misc
    
  7. Add an additional stage at the bottom of your YAML file to deploy the canary version.

    - stage: DeployCanary
      displayName: Deploy canary
      dependsOn: Build
      condition: succeeded()
    
      jobs:
      - deployment: Deploycanary
        displayName: Deploy canary
        pool:
          vmImage: ubuntu-latest
        environment: 'akscanary.canarydemo'
        strategy:
          runOnce:
            deploy:
              steps:
              - task: KubernetesManifest@0
                displayName: Create imagePullSecret
                inputs:
                  action: createSecret
                  secretName: azure-pipelines-canary-k8s
                  dockerRegistryEndpoint: azure-pipelines-canary-k8s
    
              - task: KubernetesManifest@0
                displayName: Deploy to Kubernetes cluster
                inputs:
                  action: 'deploy'
                  strategy: 'canary'
                  percentage: '25'
                  manifests: |
                    $(Pipeline.Workspace)/manifests/deployment.yml
                    $(Pipeline.Workspace)/manifests/service.yml
                  containers: '$(containerRegistry)/$(imageRepository):$(tag)'
                  imagePullSecrets: azure-pipelines-canary-k8s
    
              - task: KubernetesManifest@0
                displayName: Deploy Fortio and ServiceMonitor
                inputs:
                  action: 'deploy'
                  manifests: |
                    $(Pipeline.Workspace)/misc/*
    
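As described in the repository overview, the deploy step with strategy: 'canary' creates -baseline and -canary variants of the sampleapp Deployment alongside the stable one. The sketch below is an illustrative summary only, not a manifest the task emits; the replica counts assume a stable Deployment with four replicas and that the percentage input scales the variant replicas.

```yaml
# Illustrative summary of the workloads after the canary deploy step
# (assumes a stable Deployment with 4 replicas and percentage: '25')
workloads:
- name: sampleapp            # stable version; untouched until promotion
  replicas: 4
- name: sampleapp-baseline   # runs the previous (stable) image for comparison
  replicas: 1
- name: sampleapp-canary     # runs the new image under evaluation
  replicas: 1
```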
  8. Save your pipeline by committing directly to the main branch. This commit triggers a run of the pipeline, which should complete successfully.

Manual intervention for promoting or rejecting canary

  1. Navigate to Pipelines -> Environments -> New environment

  2. Configure the new environment as follows -

    • Name: akspromote
    • Resource: choose Kubernetes
  3. Click on Next and now configure your Kubernetes resource as follows -

    • Provider: Azure Kubernetes Service
    • Azure subscription: Choose the subscription that holds your Kubernetes cluster
    • Cluster: Choose your cluster
    • Namespace: Choose the canarydemo namespace you created earlier
  4. Click on Validate and Create

  5. Select your new akspromote environment from the list of environments.

  6. Click on the button with the three dots in the top right -> Approvals and checks -> Approvals

  7. Configure your approval as follows -

    • Approvers: Add your own user account
    • Advanced: Make sure the Allow approvers to approve their own runs checkbox is checked.
  8. Click on Create

  9. Navigate to Pipelines -> Select the pipeline you just created -> Edit

  10. Add an additional stage, PromoteRejectCanary, at the end of your YAML file to promote the changes.

    - stage: PromoteRejectCanary
      displayName: Promote or Reject canary
      dependsOn: DeployCanary
      condition: succeeded()
    
      jobs:
      - deployment: PromoteCanary
        displayName: Promote Canary
        pool: 
          vmImage: ubuntu-latest
        environment: 'akspromote.canarydemo'
        strategy:
          runOnce:
            deploy:
              steps:            
              - task: KubernetesManifest@0
                displayName: promote canary
                inputs:
                  action: 'promote'
                  strategy: 'canary'
                  manifests: '$(Pipeline.Workspace)/manifests/*'
                  containers: '$(containerRegistry)/$(imageRepository):$(tag)'
                  imagePullSecrets: 'azure-pipelines-canary-k8s'
    
  11. Add an additional stage RejectCanary at the end of your YAML file to roll back the changes.

    - stage: RejectCanary
      displayName: Reject canary
      dependsOn: PromoteRejectCanary
      condition: failed()
    
      jobs:
      - deployment: RejectCanary
        displayName: Reject Canary
        pool: 
          vmImage: ubuntu-latest
        environment: 'akscanary.canarydemo'
        strategy:
          runOnce:
            deploy:
              steps:            
              - task: KubernetesManifest@0
                displayName: reject canary
                inputs:
                  action: 'reject'
                  strategy: 'canary'
                  manifests: '$(Pipeline.Workspace)/manifests/*'
    
  12. Save your YAML pipeline by clicking on Save and commit it directly to the main branch.

Deploy a stable version

For the first run of the pipeline, neither the stable version of the workload nor its baseline/canary variants exist in the cluster yet. To deploy the stable version -

  1. In app/app.py, change success_rate = 5 to success_rate = 10. This change triggers the pipeline, leading to a build and push of the image to the container registry. It also triggers the DeployCanary stage.
  2. Because you have configured an approval on the akspromote environment, the release waits before executing that stage.
  3. In the summary of the run, click Review, and then click Approve in the subsequent fly-out. This results in the stable version of the workloads (the sampleapp deployment in manifests/deployment.yml) being deployed to the namespace.

Initiate canary workflow

Once the above release has completed, the stable version of the sampleapp workload exists in the cluster. To see how a baseline and a canary are created for comparison with every subsequent deployment, make the following change to the simulation application -

  1. In app/app.py, change success_rate = 10 to success_rate = 20

The above change triggers the build pipeline, resulting in the build and push of the image to the container registry. This in turn triggers the release pipeline and the commencement of the Deploy canary stage.

Simulate requests

On your development machine, run the following commands and keep them running to send a constant stream of requests to the sampleapp service. The sampleapp service routes these requests to the pods spun up by the stable sampleapp deployment as well as those spun up by the sampleapp-baseline and sampleapp-canary deployments, because the selector specified for sampleapp matches all of these pods.

FORTIO_POD=$(kubectl get pod | grep fortio | awk '{ print $1 }')
kubectl exec -it $FORTIO_POD -c fortio -- /usr/bin/fortio load -allow-initial-errors -t 0 http://sampleapp:8080/
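If fortio is unavailable, a request stream can also be generated with a short script. The sketch below is illustrative; the sampleapp URL is reachable only from inside the cluster, so substitute a port-forwarded address when running it from a development machine.

```python
import urllib.request

# Illustrative alternative to the fortio command above: send a fixed number
# of requests to a URL and tally the outcomes.
def send_requests(url, count):
    results = {"ok": 0, "error": 0}
    for _ in range(count):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                results["ok" if resp.status == 200 else "error"] += 1
        except OSError:
            results["error"] += 1
    return results
```

For a constant stream comparable to the fortio command, the fixed count could be replaced with a while True loop.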

Set up Grafana dashboard

  1. Run the following port forwarding command on your local development machine to be able to access Grafana -

    kubectl port-forward svc/sampleapp-grafana 3000:80
    
  2. In a browser, open the following URL -

    http://localhost:3000/login
    
  3. When prompted for login credentials, unless the adminPassword value was overridden during prometheus-operator Helm chart installation, use the following values -

    • username: admin
    • password: prom-operator
  4. In the left navigation menu, choose + -> Dashboard -> Graph

  5. Click anywhere on the newly added panel and type e to edit the panel.

  6. In the Metrics tab, enter the following query -

    rate(requests_total{pod=~"sampleapp-.*", custom_status="good"}[1m])
    
  7. In the General tab, change the name of this panel to All sampleapp pods

  8. In the overview bar at the top of the page, change the duration range to Last 5 minutes or Last 15 minutes.

  9. Click on the save icon in the overview bar to save this panel.

  10. While the above panel visualizes success-rate metrics from all the variants - stable (from the sampleapp deployment), baseline (from the sampleapp-baseline deployment), and canary (from the sampleapp-canary deployment) - you can visualize just the baseline and canary metrics by adding another panel with the following configuration -

    • General tab -> Title: sampleapp baseline and canary
    • Metrics tab -> query to be used:
    rate(requests_total{pod=~"sampleapp-baseline-.*|sampleapp-canary-.*", custom_status="good"}[1m])
    

    Note

    The panel for baseline and canary metrics has metrics available for comparison only once the Deploy canary stage has successfully completed and the Promote/reject canary stage is waiting on manual intervention.

    Tip

    Set up annotations for Grafana dashboards to visually depict stage completion events for Deploy canary and Promote/reject canary, so that you know when to start comparing baseline with canary and when the promotion or rejection of the canary has completed.

Compare baseline and canary

  1. At this point, with the Deploy canary stage having successfully completed (based on the change of success_rate from 10 to 20) and the Promote/reject canary stage waiting on manual intervention, you can compare the success rate (as determined by custom_status="good") of the baseline and canary variants in the Grafana dashboard. It should look similar to the image below -

    Compare baseline and canary metrics

  2. Based on the observation that the success rate is higher for the canary, promote the canary by clicking on Resume in the manual intervention task.