High-performance serving with Triton Inference Server (Preview)
APPLIES TO:
Azure CLI ml extension v2 (current)
Learn how to use NVIDIA Triton Inference Server in Azure Machine Learning with managed online endpoints.
Triton is multi-framework, open-source software that is optimized for inference. It supports popular machine learning frameworks like TensorFlow, ONNX Runtime, PyTorch, NVIDIA TensorRT, and more. It can be used for your CPU or GPU workloads.
In this article, you will learn how to deploy Triton and a model to a managed online endpoint. Information is provided on using both the CLI (command line) and Azure Machine Learning studio.
Note
- NVIDIA Triton Inference Server is open-source, third-party software that is integrated in Azure Machine Learning.
- While Azure Machine Learning online endpoints are generally available, using Triton with an online endpoint deployment is still in preview.
Prerequisites
Before following the steps in this article, make sure you have the following prerequisites:
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
The Azure CLI and the ml extension to the Azure CLI. For more information, see Install, set up, and use the CLI (v2) (preview).
Important
The CLI examples in this article assume that you are using the Bash (or compatible) shell. For example, from a Linux system or Windows Subsystem for Linux.
An Azure Machine Learning workspace. If you don't have one, use the steps in Install, set up, and use the CLI (v2) (preview) to create one.
A working Python 3.8 (or higher) environment.
Access to NCv3-series VMs for your Azure subscription.
Important
You may need to request a quota increase for your subscription before you can use this series of VMs. For more information, see NCv3-series.
The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo and then change directories to the cli directory in the repo:
git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples
cd cli
If you haven't already set the defaults for the Azure CLI, save your default settings. To avoid passing in the values for your subscription, workspace, and resource group multiple times, use the following commands. Replace the following parameters with values for your specific configuration:
- Replace <subscription> with your Azure subscription ID.
- Replace <workspace> with your Azure Machine Learning workspace name.
- Replace <resource-group> with the Azure resource group that contains your workspace.
- Replace <location> with the Azure region that contains your workspace.
Tip
You can see what your current defaults are by using the az configure -l command.
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
NVIDIA Triton Inference Server requires a specific model repository structure, with a directory for each model and a subdirectory for each model version. The contents of each model version subdirectory are determined by the type of the model and the requirements of the backend that supports the model. For the full model repository structure, see https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#model-files
The information in this document is based on using a model stored in ONNX format, so the directory structure of the model repository is <model-repository>/<model-name>/1/model.onnx. Specifically, this model performs image identification.
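The layout described above can be checked locally before anything is uploaded. The following is a minimal Python sketch, assuming a single ONNX model with integer-named version directories. The function names and the demo_model name are hypothetical and are not part of Triton or the Azure CLI.

```python
import tempfile
from pathlib import Path
from typing import List


def find_onnx_versions(model_repository: Path, model_name: str) -> List[int]:
    """Return the numeric version directories that contain a model.onnx file."""
    model_dir = model_repository / model_name
    versions: List[int] = []
    if not model_dir.is_dir():
        return versions
    for child in model_dir.iterdir():
        # Version subdirectories are named with integers, for example "1".
        if child.is_dir() and child.name.isdigit() and (child / "model.onnx").is_file():
            versions.append(int(child.name))
    return sorted(versions)


def build_demo_repository(root: Path, model_name: str = "demo_model") -> Path:
    """Create <root>/models/<model_name>/1/model.onnx with placeholder content."""
    repo = root / "models"
    version_dir = repo / model_name / "1"
    version_dir.mkdir(parents=True)
    # Placeholder bytes only; a real repository holds an actual ONNX file.
    (version_dir / "model.onnx").write_bytes(b"")
    return repo


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = build_demo_repository(Path(tmp))
        print(find_onnx_versions(repo, "demo_model"))
```

An empty result for a model name suggests that the directory is missing or that no version subdirectory holds a model.onnx file.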
Deploy using CLI (v2)
This section shows how you can deploy Triton to a managed online endpoint using the Azure CLI with the Machine Learning extension (v2).
Important
For Triton no-code-deployment, testing via local endpoints is currently not supported.
To avoid typing in a path for multiple commands, use the following command to set a BASE_PATH environment variable. This variable points to the directory where the model and associated YAML configuration files are located:
BASE_PATH=endpoints/online/triton/single-model
Use the following command to set the name of the endpoint that will be created. In this example, a random name is created for the endpoint:
export ENDPOINT_NAME=triton-single-endpt-`echo $RANDOM`
Install Python requirements using the following commands:
pip install numpy
pip install tritonclient[http]
pip install pillow
pip install gevent
Create a YAML configuration file for your endpoint. The following example configures the name and authentication mode of the endpoint. The one used in the following commands is located at /cli/endpoints/online/triton/single-model/create-managed-endpoint.yaml in the azureml-examples repo you cloned earlier:
create-managed-endpoint.yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: aml_token
To create a new endpoint using the YAML configuration, use the following command:
az ml online-endpoint create -n $ENDPOINT_NAME -f $BASE_PATH/create-managed-endpoint.yaml
Create a YAML configuration file for the deployment. The following example configures a deployment named blue to the endpoint created in the previous step. The one used in the following commands is located at /cli/endpoints/online/triton/single-model/create-managed-deployment.yaml in the azureml-examples repo you cloned earlier:
Important
For Triton no-code-deployment (NCD) to work, the model's type must be set to triton_model (type: triton_model). For more information, see CLI (v2) model YAML schema.
This deployment uses a Standard_NC6s_v3 VM. You may need to request a quota increase for your subscription before you can use this VM. For more information, see NCv3-series.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  name: sample-densenet-onnx-model
  version: 1
  path: ./models
  type: triton_model
instance_count: 1
instance_type: Standard_NC6s_v3
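A small pre-flight check can catch a missing or misspelled model type before the CLI is called. The sketch below is a hypothetical helper, not part of the Azure CLI or its schema validation. It assumes the deployment settings have already been loaded into a plain Python dict, and it only checks the keys that appear in the example above.

```python
from typing import Any, Dict, List


def check_triton_deployment(deployment: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a deployment settings dict."""
    problems: List[str] = []
    model = deployment.get("model")
    if not isinstance(model, dict):
        # This sketch only handles a model defined inline in the deployment.
        problems.append("model must be an inline mapping for this check")
    elif model.get("type") != "triton_model":
        problems.append("model.type must be 'triton_model' for no-code deployment")
    # Keys taken from the example deployment file; this list is an assumption.
    for key in ("name", "endpoint_name", "instance_type", "instance_count"):
        if key not in deployment:
            problems.append(f"missing key: {key}")
    return problems


if __name__ == "__main__":
    example = {
        "name": "blue",
        "endpoint_name": "my-endpoint",
        "model": {
            "name": "sample-densenet-onnx-model",
            "version": 1,
            "path": "./models",
            "type": "triton_model",
        },
        "instance_count": 1,
        "instance_type": "Standard_NC6s_v3",
    }
    print(check_triton_deployment(example))
```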
To create the deployment using the YAML configuration, use the following command:
az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f $BASE_PATH/create-managed-deployment.yaml --all-traffic
Invoke your endpoint
Once your deployment completes, use the following command to make a scoring request to the deployed endpoint.
Tip
The file /cli/endpoints/online/triton/single-model/triton_densenet_scoring.py in the azureml-examples repo is used for scoring. The image passed to the endpoint needs pre-processing to meet the size, type, and format requirements, and post-processing to show the predicted label. The triton_densenet_scoring.py script uses the tritonclient.http library to communicate with the Triton inference server.
To get the endpoint scoring URI, use the following commands. The second command removes the last path segment from the URI, leaving the base URL:
scoring_uri=$(az ml online-endpoint show -n $ENDPOINT_NAME --query scoring_uri -o tsv)
scoring_uri=${scoring_uri%/*}
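The ${scoring_uri%/*} expansion removes the shortest trailing match of /*, that is, the last slash and everything after it. As an illustration only, the same trimming can be sketched in Python. The function name and the example URI below are hypothetical placeholders, and the sketch assumes the scoring URI ends in a single path segment.

```python
def strip_last_segment(uri: str) -> str:
    """Mimic Bash's ${uri%/*}: drop the last '/' and everything after it."""
    head, sep, _tail = uri.rpartition("/")
    # When the value contains no "/", Bash leaves it unchanged.
    return head if sep else uri


if __name__ == "__main__":
    # Placeholder URI, not a real endpoint.
    print(strip_last_segment("https://my-endpoint.example.com/score"))
```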
To get an authentication token, use the following command:
auth_token=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME --query accessToken -o tsv)
To score data with the endpoint, use the following command. It submits the image of a peacock (https://aka.ms/peacock-pic) to the endpoint:
python $BASE_PATH/triton_densenet_scoring.py --base_url=$scoring_uri --token=$auth_token
The response from the script is similar to the following text:
Is server ready - True
Is model ready - True
/azureml-examples/cli/endpoints/online/triton/single-model/densenet_labels.txt
84 : PEACOCK
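The first two lines of that output report server and model readiness. The following is a minimal sketch of how such checks might be made with tritonclient.http. It is not the code from triton_densenet_scoring.py. The helper names are hypothetical, the model name must match the model directory in your repository, and TLS settings are left at the library defaults, which is an assumption.

```python
from typing import Dict


def host_from_base_url(base_url: str) -> str:
    """Drop the scheme, because tritonclient.http expects a URL without it."""
    for scheme in ("https://", "http://"):
        if base_url.startswith(scheme):
            return base_url[len(scheme):]
    return base_url


def bearer_headers(token: str) -> Dict[str, str]:
    """Build the Authorization header for a token-authenticated endpoint."""
    return {"Authorization": f"Bearer {token}"}


def check_ready(base_url: str, token: str, model_name: str,
                model_version: str = "1") -> Dict[str, bool]:
    """Ask the server whether it and one model are ready (needs network access)."""
    # Imported lazily so the helpers above work without tritonclient installed.
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(
        url=host_from_base_url(base_url), ssl=True
    )
    headers = bearer_headers(token)
    return {
        "server_ready": client.is_server_ready(headers=headers),
        "model_ready": client.is_model_ready(
            model_name, model_version, headers=headers
        ),
    }
```

A call such as check_ready(scoring_uri, auth_token, "<model-name>") would use the values obtained in the earlier steps, with the model name left for you to fill in.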
Delete your endpoint and model
Once you're done with the endpoint, use the following command to delete it:
az ml online-endpoint delete -n $ENDPOINT_NAME --yes
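Use the following command to delete your model. The MODEL_NAME and MODEL_VERSION variables aren't set by the earlier steps, so set them first to the name and version of your model (sample-densenet-onnx-model and 1 in the deployment YAML earlier in this article):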
az ml model delete --name $MODEL_NAME --version $MODEL_VERSION
Deploy using Azure Machine Learning studio
This section shows how you can deploy Triton to a managed online endpoint using Azure Machine Learning studio.
Register your model in Triton format using the following YAML and CLI command. The YAML uses a densenet-onnx model from https://github.com/Azure/azureml-examples/tree/main/cli/endpoints/online/triton/single-model
create-triton-model.yaml
name: densenet-onnx-model
version: 1
path: ./models
type: triton_model
description: Registering my Triton format model.
az ml model create -f create-triton-model.yaml
Once registered, the model is listed on the Models page of Azure Machine Learning studio.
From studio, select your workspace and then use either the Endpoints or Models page to create the endpoint deployment:
From the Endpoints page, select Create.
Provide a name and authentication type for the endpoint, and then select Next.
When selecting a model, select the Triton model registered previously. Select Next to continue.
When you select a model registered in Triton format, you don't need to provide a scoring script or environment in the Environment step of the wizard.
Complete the wizard to deploy the model to the endpoint.
Next steps
To learn more, review these articles:
- Deploy models with REST
- Create and use managed online endpoints in the studio
- Safe rollout for online endpoints
- How to autoscale managed online endpoints
- View costs for an Azure Machine Learning managed online endpoint
- Access Azure resources with a managed online endpoint and managed identity
- Troubleshoot managed online endpoints deployment