What are Azure Machine Learning endpoints (preview)?


This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Use Azure Machine Learning endpoints (preview) to streamline model deployments for both real-time and batch inference. Endpoints provide a unified interface to invoke and manage model deployments across compute types.

In this article, you learn about:

  • Endpoints
  • Deployments
  • Managed online endpoints
  • Azure Kubernetes Service (AKS) online endpoints
  • Batch inference endpoints

What are endpoints and deployments (preview)?

After you train a machine learning model, you need to deploy the model so that others can use it to perform inferencing. In Azure Machine Learning, you can use endpoints (preview) and deployments (preview) to do so.

Diagram showing an endpoint splitting traffic to two deployments

An endpoint is an HTTPS endpoint that clients can call to receive the inferencing (scoring) output of a trained model. It provides:

  • Key-based and token-based authentication
  • SSL termination
  • Traffic allocation between deployments
  • A stable scoring URI (endpoint-name.region.inference.ml.azure.com)
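As a rough illustration of what calling such an endpoint might look like from a client, the sketch below builds (but does not send) an HTTPS request using only the Python standard library. The host pattern comes from the list above. The `/score` path, the bearer-style `Authorization` header, the payload shape, and the names `my-endpoint` and `westus2` are assumptions made for the example, so check them against your own endpoint's details.

```python
import json
import urllib.request


def build_scoring_request(endpoint_name, region, credential, payload):
    """Build (but do not send) an HTTPS POST request for an online endpoint."""
    # Host pattern taken from the scoring URI above; "/score" is an assumed path.
    url = f"https://{endpoint_name}.{region}.inference.ml.azure.com/score"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Key or token sent as a bearer credential (assumed header format).
            "Authorization": f"Bearer {credential}",
        },
    )


# Hypothetical endpoint name, region, credential, and payload.
request = build_scoring_request(
    "my-endpoint", "westus2", "<key-or-token>", {"data": [[1, 2, 3]]}
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Because the scoring URI stays stable, a client written this way should not need to change when traffic is shifted between deployments behind the endpoint.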

A deployment is a set of compute resources hosting the model that performs the actual inferencing. It contains:

  • Model details (code, model, environment)
  • Compute resource and scale settings
  • Advanced settings (like request and probe settings)

A single endpoint can contain multiple deployments. Endpoints and deployments are independent ARM resources that will appear in the Azure portal.

Azure Machine Learning uses the concept of endpoints and deployments to implement different types of endpoints: online endpoints and batch endpoints.

Multiple developer interfaces

Create and manage batch and online endpoints with multiple developer tools:

  • Azure CLI
  • Azure Machine Learning studio web portal
  • Azure portal (IT/Admin)
  • Support for CI/CD MLOps pipelines using the Azure CLI and REST/ARM interfaces

What are online endpoints (preview)?

Online endpoints (preview) are endpoints that are used for online (real-time) inferencing. Unlike batch endpoints, online endpoints contain deployments that are ready to receive data from clients and send responses back in real time.

Online endpoints requirements

To create an online endpoint, you need to specify the following:

  • Model files (or specify a registered model in your workspace)
  • Scoring script - code needed to perform scoring/inferencing
  • Environment - a Docker image with Conda dependencies, or a Dockerfile
  • Compute instance & scale settings
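To make the scoring script requirement more concrete, here is a minimal sketch. It follows the `init()`/`run()` entry-script convention commonly used for Azure Machine Learning scoring scripts. The `MeanModel` class and the `{"data": [...]}` payload shape are hypothetical stand-ins so that the example runs without any model files.

```python
import json

model = None  # populated once by init()


class MeanModel:
    """Hypothetical stand-in model: predicts the mean of each input row."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


def init():
    """Runs once when the deployment starts; load the model here."""
    global model
    # A real script would load the model files from disk instead.
    model = MeanModel()


def run(raw_data):
    """Runs for each request; parse the JSON body and return predictions."""
    rows = json.loads(raw_data)["data"]
    return {"predictions": model.predict(rows)}


if __name__ == "__main__":
    # Local smoke test of the two entry points.
    init()
    print(run(json.dumps({"data": [[1, 2, 3], [4, 6]]})))
```

Keeping model loading in `init()` means the cost is paid once per deployment instance rather than on every request.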

Learn how to deploy online endpoints from the CLI and the studio web portal.

Test and deploy locally for faster debugging

Deploy locally to test your endpoints without deploying to the cloud. Azure Machine Learning creates a local Docker image that mimics the Azure ML image. Azure Machine Learning will build and run deployments for you locally, and cache the image for rapid iterations.

Native blue/green deployment

Recall that a single endpoint can have multiple deployments. The online endpoint can perform load balancing to allocate any percentage of traffic to each deployment.

You can use traffic allocation to perform safe blue/green rollouts by balancing requests between different instances.
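The idea of percentage-based traffic allocation can be sketched as a weighted random choice between deployments. This is a local simulation only, not how the service routes requests internally. The deployment names `blue` and `green`, the 90/10 split, and the rule that percentages must sum to 100 are assumptions made for the example.

```python
import random


def validate_traffic(traffic):
    """Check that percentages are non-negative and sum to 100 (assumed rule)."""
    if any(pct < 0 for pct in traffic.values()):
        raise ValueError("traffic percentages must be non-negative")
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")


def pick_deployment(traffic, rng):
    """Choose a deployment name with probability proportional to its share."""
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical rollout: keep 90% on the current deployment, send 10% to the new one.
traffic = {"blue": 90, "green": 10}
validate_traffic(traffic)

rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {name: 0 for name in traffic}
for _ in range(1000):
    counts[pick_deployment(traffic, rng)] += 1
print(counts)
```

In a rollout, you would typically raise the share of the new deployment in small steps while watching its metrics, then retire the old deployment once the new one holds all of the traffic.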

Screenshot showing slider interface to set traffic allocation between deployments

Learn how to safely roll out to online endpoints.

Application Insights integration

All online endpoints integrate with Application Insights to monitor SLAs and diagnose issues.

However, managed online endpoints also include out-of-box integration with Azure Logs and Azure Metrics.


Security

  • Authentication: Key and Azure ML Tokens
  • Managed identity: User assigned and system assigned (managed online endpoint only)
  • SSL by default for endpoint invocation

Managed online endpoints vs AKS online endpoints (preview)

There are two types of online endpoints: managed online endpoints (preview) and AKS online endpoints (preview). The following table highlights some of their key differences.

|  | Managed online endpoints | AKS online endpoints |
|---|---|---|
| Recommended users | Users who want a managed model deployment and enhanced MLOps experience | Users who prefer Azure Kubernetes Service (AKS) and can self-manage infrastructure requirements |
| Infrastructure management | Managed compute provisioning, scaling, host OS image updates, and security hardening | User responsibility |
| Compute type | Managed (AmlCompute) | AKS |
| Out-of-box monitoring | Azure Monitor (includes key metrics like latency and throughput) |  |
| Out-of-box logging | Azure Logs and Log Analytics at endpoint level | Manual setup at the cluster level |
| Application Insights | Supported | Supported |
| Managed identity | Supported | Not supported |
| Virtual Network (VNET) | Not supported (public preview) | Manually configure at cluster level |
| View costs | Endpoint and deployment level | Cluster level |

Managed online endpoints

Managed online endpoints can help streamline your deployment process. Managed online endpoints provide the following benefits over AKS online endpoints:

  • Managed infrastructure
    • Automatically provisions the compute and hosts the model (you just need to specify the VM type and scale settings)
    • Automatically performs updates and patches to the underlying host OS image
    • Automatic node recovery in case of system failure

Screenshot showing Azure Monitor graph of endpoint latency

Screenshot showing cost chart of an endpoint and deployment

For a step-by-step tutorial, see How to deploy managed online endpoints.

What are batch endpoints (preview)?

Batch endpoints (preview) are endpoints that are used to perform batch inferencing on large volumes of data over a period of time. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters. Batch endpoints store outputs to a data store for further analysis.

Learn how to deploy and use batch endpoints with the Azure CLI.

No-code MLflow model deployments

Use the no-code batch endpoint creation experience for MLflow models to automatically create scoring scripts and execution environments.

For batch endpoints using MLflow models, you need to specify the following:

  • Model files (or specify a registered model in your workspace)
  • Compute target

If you aren't deploying an MLflow model, you also need to provide the following:

  • Scoring script - code needed to perform scoring/inferencing
  • Environment - a Docker image with Conda dependencies

Managed cost with autoscaling compute

Invoking a batch endpoint triggers an asynchronous batch inference job. Compute resources are automatically provisioned when the job starts and automatically de-allocated when the job completes, so you pay for compute only while the job runs.

You can override compute resource settings (like instance count) and advanced settings (like mini batch size and error threshold) for each individual batch inference job to speed up execution and reduce cost.
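To give a feel for how a mini batch size and an error threshold might interact, here is a small local simulation. It is a sketch under assumptions, not the service's implementation: the function `run_batch_job`, its parameters, and the rule "abort once failures exceed the threshold" are hypothetical, and the mini batches run sequentially here, whereas a real job processes them in parallel on a compute cluster.

```python
def run_batch_job(items, score, mini_batch_size, error_threshold):
    """Score items in mini batches; abort once failures exceed the threshold."""
    results, failures = [], 0
    for start in range(0, len(items), mini_batch_size):
        mini_batch = items[start:start + mini_batch_size]
        for item in mini_batch:
            try:
                results.append(score(item))
            except Exception:
                failures += 1
                if failures > error_threshold:
                    raise RuntimeError(f"aborted after {failures} failed items")
    return results, failures


def score(item):
    """Stand-in for model inference; fails on non-numeric input."""
    return float(item) * 2


# One bad record is tolerated because error_threshold is 1.
results, failures = run_batch_job(
    ["1", "2", "oops", "4"], score, mini_batch_size=2, error_threshold=1
)
print(results, failures)
```

A larger mini batch size reduces per-batch overhead, while a low error threshold stops a job early when the input data is badly malformed, which avoids paying for compute on a run that will not produce useful output.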

Flexible data sources and storage

You can use the following options for input data when invoking a batch endpoint:

Specify the storage output location to any datastore and path. By default, batch endpoints store their output to the workspace's default blob store, organized by the Job Name (a system-generated GUID).


Security

  • Authentication: Azure Active Directory Tokens
  • SSL by default for endpoint invocation

Next steps