Creating a continuous integration pipeline on Azure using Docker, Kubernetes, and a Python Flask application
An AI application frequently involves two streams of work: data scientists building machine learning models, and app developers building the application and exposing it to end users. In this article, we demonstrate how to implement a Continuous Integration (CI)/Continuous Delivery (CD) pipeline for an AI application. An AI application combines application code with an embedded pretrained machine learning (ML) model. Here we fetch a pretrained model from a private Azure Blob Storage account; it could just as well be an Amazon S3 bucket. We use a simple Python Flask web application throughout.
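As a rough illustration of fetching the pretrained model during the build, the sketch below downloads a blob over HTTPS using only the Python standard library. It assumes read access is granted through a shared access signature (SAS) token appended to the blob URL. The account, container, and blob names and the SAS token are placeholders, and the accompanying repository may well use the Azure Storage SDK instead.

```python
import urllib.parse
import urllib.request


def blob_url(account: str, container: str, blob_name: str, sas_token: str) -> str:
    """Build the HTTPS URL of a blob in Azure Blob Storage, authorized by a SAS token."""
    # Blob endpoints follow https://<account>.blob.core.windows.net/<container>/<blob>
    path = f"/{container}/{urllib.parse.quote(blob_name)}"
    # A SAS token is a query string; accept it with or without a leading '?'
    return f"https://{account}.blob.core.windows.net{path}?{sas_token.lstrip('?')}"


def download_model(url: str, destination: str) -> str:
    """Download the model file to a local path so it can be packaged with the app."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        out.write(response.read())
    return destination
```

For example, `download_model(blob_url("myaccount", "models", "model.pkl", sas_token), "model.pkl")` would place the model next to the app code before the Docker image is built.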
This is one of several ways to perform CI/CD, and there are alternatives to the tooling and prerequisites listed below. We will publish additional content covering them as we develop it.
GitHub repository with document and code
The CI/CD pipeline described below has the following prerequisites:
- Azure DevOps Organization
- Azure CLI
- Azure Container Service (ACS) cluster running Kubernetes
- Azure Container Registry (ACR) account
- kubectl, installed locally to run commands against the Kubernetes cluster. We need it to fetch the cluster configuration from ACS.
- Fork the repository to your GitHub account.
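Before setting up the pipeline, a quick way to confirm that the command-line prerequisites are installed is to look them up on `PATH`. This minimal sketch checks only for the Azure CLI (`az`) and `kubectl`. The Azure DevOps organization, the cluster, and the registry still have to be verified in the Azure portal or with the CLI.

```python
import shutil


def missing_tools(tools=("az", "kubectl")):
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All command-line prerequisites found.")
```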
Description of the CI/CD pipeline
The pipeline kicks off for each new commit and runs the test suite. If the tests pass, it takes the latest build and packages it in a Docker container. The container is then deployed using Azure Container Service (ACS), and the images are stored securely in Azure Container Registry (ACR). ACS runs Kubernetes to manage the container cluster, but you could choose Docker Swarm or Mesos instead.
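In Azure DevOps these steps are configured as build tasks rather than written by hand, but the control flow can be sketched in Python. The test result gates the build, and the image is named after the ACR login server (`<registry>.azurecr.io`). Tagging images by the short commit SHA is an assumption made for illustration, and the registry name `myregistry` and repository name `flask-ml-app` used below are hypothetical.

```python
def image_reference(registry: str, repository: str, tag: str) -> str:
    """Fully qualified image name in ACR: <registry>.azurecr.io/<repository>:<tag>."""
    return f"{registry}.azurecr.io/{repository}:{tag}"


def plan_build(tests_passed: bool, registry: str, repository: str, commit_sha: str) -> list:
    """Return the Docker commands to run for one commit, or none if the tests failed."""
    if not tests_passed:
        # Gate: a failing test suite stops the pipeline before anything is built
        return []
    # Tag with the short commit SHA so each image is traceable to its commit
    image = image_reference(registry, repository, commit_sha[:7])
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
    ]
```

Each returned command could be run with `subprocess.run(cmd, check=True)`. The push only succeeds after the build agent has authenticated to the registry, for example with `az acr login --name myregistry`.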
The application securely pulls the latest model from an Azure Storage account and packages it as part of the application, so the deployed container holds both the app code and the ML model. This decouples the work of app developers and data scientists while ensuring that the production app always runs the latest code with the latest ML model.
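The article does not specify how the "latest" model is identified. One possible convention, assumed here purely for illustration, is that data scientists upload each model as `model-v<N>.pkl` and the build picks the highest version. Sorting blobs by their last-modified time would be an equally valid choice.

```python
import re

# Hypothetical naming convention: each published model is named "model-v<N>.pkl"
MODEL_PATTERN = re.compile(r"^model-v(\d+)\.pkl$")


def latest_model(blob_names):
    """Pick the blob with the highest version number, or None if no blob matches."""
    versioned = []
    for name in blob_names:
        match = MODEL_PATTERN.match(name)
        if match:
            versioned.append((int(match.group(1)), name))
    # Compare numerically so that v10 ranks above v9 (a plain string sort would not)
    return max(versioned)[1] if versioned else None
```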
The pipeline architecture is given below.
Steps of the CI/CD pipeline
- Developers work on the application code in the IDE of their choice.
- They commit the code to the source control system of their choice (Azure DevOps supports a variety of source control systems).
- Separately, data scientists work on developing their model.
- Once satisfied with the model, they publish it to a model repository; in this case, a blob storage account.
- A build is kicked off in Azure DevOps based on the commit in GitHub.
- The Azure DevOps build pipeline pulls the latest model from the blob container and builds a container image.
- Azure DevOps pushes the image to a private image repository in Azure Container Registry.
- On a set schedule (nightly), the release pipeline is kicked off.
- The latest image is pulled from ACR and deployed across the Kubernetes cluster on ACS.
- User requests for the app go through a DNS server.
- The DNS server resolves the request to the load balancer, which routes it to the app and sends the response back to the user.
- Refer to the tutorial for the detailed steps to implement your own CI/CD pipeline for your application.
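The deployment and routing steps above can be sketched as a Kubernetes manifest generated from Python. It contains a Deployment that runs several replicas of the image from ACR and a Service of type `LoadBalancer` that receives user traffic and forwards it to the pods. The app name, replica count, and container port 5000 (the Flask development server's default) are assumptions. Pulling from a private ACR also requires the cluster to be authorized against the registry, which this sketch leaves out.

```python
import json


def deployment_manifests(name: str, image: str, replicas: int = 3, port: int = 5000) -> dict:
    """Build a Deployment and a LoadBalancer Service for the Flask container."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # LoadBalancer asks the cloud provider for an external IP in front of the pods
            "type": "LoadBalancer",
            "selector": labels,
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    # A v1 List lets kubectl apply both resources from a single file
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    manifest = deployment_manifests("flask-ml-app", "myregistry.azurecr.io/flask-ml-app:0123456")
    print(json.dumps(manifest, indent=2))
```

Redirecting the script's output to `deployment.json` produces a file that `kubectl apply -f deployment.json` can apply to the cluster; a release pipeline could regenerate it with each new image tag.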