Deploy a TensorFlow model served with TF Serving using a custom container in a managed online endpoint (preview)
Learn how to deploy a custom container as a managed online endpoint in Azure Machine Learning.
Custom container deployments can use web servers other than the default Python Flask server used by Azure Machine Learning. Users of these deployments can still take advantage of Azure Machine Learning's built-in monitoring, scaling, alerting, and authentication.
Microsoft may not be able to help troubleshoot problems caused by a custom image. If you encounter problems, you may be asked to use the default image or one of the images Microsoft provides to see if the problem is specific to your image.
Install and configure the Azure CLI and ML extension. For more information, see Install, set up, and use the CLI (v2) (preview).
You must have an Azure resource group, in which you (or the service principal you use) need to have Contributor access. You'll have such a resource group if you configured your ML extension per the above article.
You must have an Azure Machine Learning workspace. You'll have such a workspace if you configured your ML extension per the above article.
If you've not already set the defaults for Azure CLI, you should save your default settings. To avoid having to repeatedly pass in the values, run:
```bash
az account set --subscription <subscription id>
az configure --defaults workspace=<azureml workspace name> group=<resource group>
```
To deploy locally, you must have the Docker engine running locally. This step is highly recommended, as it will help you debug issues.
Download source code
To follow along with this tutorial, download the source code below.
```bash
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli
```
Initialize environment variables
Define environment variables:
```bash
BASE_PATH=endpoints/online/custom-container
AML_MODEL_NAME=tfserving-mounted
MODEL_NAME=half_plus_two
MODEL_BASE_PATH=/var/azureml-app/azureml-models/$AML_MODEL_NAME/1
ENDPOINT_NAME=tfserving-endpoint
DEPLOYMENT_NAME=tfserving
```
Download a TensorFlow model
Download and unzip a model that divides an input by two and adds 2 to the result:
```bash
wget https://aka.ms/half_plus_two-model -O $BASE_PATH/half_plus_two.tar.gz
tar -xvf $BASE_PATH/half_plus_two.tar.gz -C $BASE_PATH
```
Run a TF Serving image locally to test that it works
Use docker to run your image locally for testing:
```bash
docker run --rm -d -v $PWD/$BASE_PATH:$MODEL_BASE_PATH -p 8501:8501 \
  -e MODEL_BASE_PATH=$MODEL_BASE_PATH -e MODEL_NAME=$MODEL_NAME \
  --name="tfserving-test" docker.io/tensorflow/serving:latest
sleep 10
```
Check that you can send liveness and scoring requests to the image
First, check that the container is "alive," meaning that the process inside the container is still running. You should get a 200 (OK) response.
```bash
curl -v http://localhost:8501/v1/models/$MODEL_NAME
```
Then, check that you can get predictions about unlabeled data:
```bash
curl --header "Content-Type: application/json" \
  --request POST \
  --data @$BASE_PATH/sample_request.json \
  http://localhost:8501/v1/models/$MODEL_NAME:predict
```
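The model's arithmetic is simple enough to sketch in Python, which makes it easy to sanity-check the scoring response. The payload below assumes the standard TF Serving example format of `{"instances": [...]}`; the actual contents of `sample_request.json` in the repo may differ.

```python
# half_plus_two computes y = 0.5 * x + 2 for each input instance.
def half_plus_two(instances):
    return [0.5 * x + 2 for x in instances]

# Hypothetical request payload in TF Serving's REST format; check
# sample_request.json for the real values.
request = {"instances": [1.0, 2.0, 5.0]}
predictions = half_plus_two(request["instances"])
print({"predictions": predictions})  # {'predictions': [2.5, 3.0, 4.5]}
```

If the scoring request above succeeds, the `predictions` array in the response should match this arithmetic.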
Stop the image
Now that you've tested locally, stop the image:
```bash
docker stop tfserving-test
```
Create a YAML file for your endpoint
You can configure your cloud deployment using YAML. Take a look at the sample YAML for this endpoint:
```yaml
$schema: https://azuremlsdk2.blob.core.windows.net/latest/managedOnlineEndpoint.schema.json
name: tfserving-endpoint
type: online
auth_mode: aml_token
traffic:
  tfserving: 100
deployments:
  - name: tfserving
    model:
      name: tfserving-mounted
      version: 1
      local_path: ./half_plus_two
    environment_variables:
      MODEL_BASE_PATH: /var/azureml-app/azureml-models/tfserving-mounted/1
      MODEL_NAME: half_plus_two
    environment:
      name: tfserving
      version: 1
      docker:
        image: docker.io/tensorflow/serving:latest
      inference_config:
        liveness_route:
          port: 8501
          path: /v1/models/half_plus_two
        readiness_route:
          port: 8501
          path: /v1/models/half_plus_two
        scoring_route:
          port: 8501
          path: /v1/models/half_plus_two:predict
    instance_type: Standard_F2s_v2
    scale_settings:
      scale_type: manual
      instance_count: 1
      min_instances: 1
      max_instances: 2
```
There are a few important concepts to notice in this YAML:
Readiness route vs. liveness route
An HTTP server can optionally define paths for both liveness and readiness. A liveness route is used to check whether the server is running. A readiness route is used to check whether the server is ready to do some work. In machine learning inference, a server could respond 200 OK to a liveness request before loading a model. The server could respond 200 OK to a readiness request only after the model has been loaded into memory.
Review the Kubernetes documentation for more information about liveness and readiness probes.
Notice that this deployment uses the same path for both liveness and readiness, since TF Serving only defines a liveness route.
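The distinction between the two probes can be made concrete with a minimal sketch using only the Python standard library. The route names `/live` and `/ready` are illustrative, not Azure ML conventions: the server answers the liveness probe as soon as the process is up, but answers the readiness probe with 503 until the (simulated) model load has finished.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_LOADED = threading.Event()  # set once the "model" is in memory

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/live":
            self.send_response(200)  # liveness: the process is running
        elif self.path == "/ready":
            # readiness: OK only once the model has been loaded
            self.send_response(200 if MODEL_LOADED.is_set() else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example's output quiet

def probe(url):
    try:
        return urllib.request.urlopen(url).status
    except urllib.error.HTTPError as err:
        return err.code

server = HTTPServer(("127.0.0.1", 0), ProbeHandler)  # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

live = probe(f"http://127.0.0.1:{port}/live")        # 200: process is up
not_ready = probe(f"http://127.0.0.1:{port}/ready")  # 503: model not loaded
MODEL_LOADED.set()                                   # simulate the model load
ready = probe(f"http://127.0.0.1:{port}/ready")      # 200: ready to score
print(live, not_ready, ready)
server.shutdown()
```

A Kubernetes-style orchestrator would restart a container whose liveness probe fails, but merely withhold traffic from one whose readiness probe fails.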
Locating the mounted model
When you deploy a model as a real-time endpoint, Azure Machine Learning mounts your model to your endpoint. Model mounting enables you to deploy new versions of the model without having to create a new Docker image. By default, a model registered with the name foo and version 1 would be located at the path `/var/azureml-app/azureml-models/foo/1` inside your deployed container.
So, for example, if you have the following directory structure on your local machine:
```
azureml-examples
  cli
    endpoints
      online
        custom-container
          half_plus_two
          tfserving-endpoint.yml
```
and the model section of your endpoint YAML refers to it like this:

```yaml
model:
  name: tfserving-mounted
  version: 1
  local_path: ./half_plus_two
```
then your model will be located at the following location in your endpoint:
```
/var/azureml-app/azureml-models/tfserving-mounted/1/half_plus_two
```
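The mount-path convention above can be sketched as a small helper. The function name is illustrative, not part of any Azure ML SDK; it simply composes the path from the registered name, version, and the directory named by `local_path`.

```python
import os

def mounted_model_path(name, version, local_path):
    # Azure ML mounts a registered model at
    # /var/azureml-app/azureml-models/<name>/<version>, preserving the
    # directory named by local_path inside that folder.
    folder = os.path.basename(os.path.normpath(local_path))
    return f"/var/azureml-app/azureml-models/{name}/{version}/{folder}"

print(mounted_model_path("tfserving-mounted", 1, "./half_plus_two"))
# /var/azureml-app/azureml-models/tfserving-mounted/1/half_plus_two
```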
Create the endpoint
Now that you understand how the YAML is constructed, create your endpoint. This command can take a few minutes to complete.
```bash
az ml endpoint create -f $BASE_PATH/$ENDPOINT_NAME.yml -n $ENDPOINT_NAME
```
Invoke the endpoint
Once your deployment completes, see if you can make a scoring request to the deployed endpoint.
```bash
RESPONSE=$(az ml endpoint invoke -n $ENDPOINT_NAME --request-file $BASE_PATH/sample_request.json)
```
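The captured response is a JSON string containing the model's predictions. A hedged sketch of checking it in Python follows; the response literal here is hypothetical and assumes a `{"instances": [1.0, 2.0, 5.0]}` payload, so substitute the actual contents of `$RESPONSE`.

```python
import json

# Hypothetical scoring response; actual values depend on sample_request.json.
response = '{"predictions": [2.5, 3.0, 4.5]}'
predictions = json.loads(response)["predictions"]
print(predictions)  # [2.5, 3.0, 4.5]
```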
Delete endpoint and model
Now that you've successfully scored with your endpoint, you can delete it:
```bash
az ml endpoint delete -n $ENDPOINT_NAME -y
echo "deleting model..."
az ml model delete -n tfserving-mounted --version 1
```