Secure an Azure Machine Learning inferencing environment with virtual networks

In this article, you learn how to secure inferencing environments with a virtual network in Azure Machine Learning.

This article is part four of a five-part series that walks you through securing an Azure Machine Learning workflow. We highly recommend that you read through Part one: VNet overview to understand the overall architecture first.

See the other articles in this series:

1. VNet overview > 2. Secure the workspace > 3. Secure the training environment > 4. Secure the inferencing environment > 5. Enable studio functionality

In this article, you learn how to secure the following inferencing resources in a virtual network:

  • Default Azure Kubernetes Service (AKS) cluster
  • Private AKS cluster
  • AKS cluster with private link
  • Azure Container Instances (ACI)

Prerequisites

  • Read the Network security overview article to understand common virtual network scenarios and overall virtual network architecture.

  • An existing virtual network and subnet to use with your compute resources.

  • To deploy resources into a virtual network or subnet, your user account must have permissions to the following actions in Azure role-based access control (Azure RBAC):

    • "Microsoft.Network/virtualNetworks/join/action" on the virtual network resource.
    • "Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet resource.

    For more information on Azure RBAC with networking, see the Networking built-in roles article.

Azure Kubernetes Service

To use an AKS cluster in a virtual network, the following network requirements must be met:

  • Follow the prerequisites in Configure advanced networking in Azure Kubernetes Service (AKS).
  • The AKS instance and the virtual network must be in the same region. If you secure the Azure Storage Account(s) used by the workspace in a virtual network, they must be in the same virtual network as the AKS instance too.

To add AKS in a virtual network to your workspace, use the following steps:

  1. Sign in to Azure Machine Learning studio, and then select your subscription and workspace.

  2. Select Compute on the left.

  3. Select Inference clusters from the center, and then select +.

  4. In the New Inference Cluster dialog, select Advanced under Network configuration.

  5. To configure this compute resource to use a virtual network, perform the following actions:

    1. In the Resource group drop-down list, select the resource group that contains the virtual network.
    2. In the Virtual network drop-down list, select the virtual network that contains the subnet.
    3. In the Subnet drop-down list, select the subnet.
    4. In the Kubernetes Service address range box, enter the Kubernetes service address range. This address range uses a Classless Inter-Domain Routing (CIDR) notation IP range to define the IP addresses that are available for the cluster. It must not overlap with any subnet IP ranges (for example, 10.0.0.0/16).
    5. In the Kubernetes DNS service IP address box, enter the Kubernetes DNS service IP address. This IP address is assigned to the Kubernetes DNS service. It must be within the Kubernetes service address range (for example, 10.0.0.10).
    6. In the Docker bridge address box, enter the Docker bridge address. This IP address is assigned to Docker Bridge. It must not be in any subnet IP ranges, or the Kubernetes service address range (for example, 172.17.0.1/16).

    Azure Machine Learning: Machine Learning Compute virtual network settings

  6. When you deploy a model as a web service to AKS, a scoring endpoint is created to handle inferencing requests. Make sure that the network security group (NSG) that controls the virtual network has an inbound security rule enabled for the IP address of the scoring endpoint, if you want to call it from outside the virtual network.

    To find the IP address of the scoring endpoint, look at the scoring URI for the deployed service. For information on viewing the scoring URI, see Consume a model deployed as a web service.

    Important

    Keep the default outbound rules for the NSG. For more information, see the default security rules in Security groups.

    An inbound security rule

    Important

    The IP address shown in the image for the scoring endpoint will be different for your deployments. While the same IP is shared by all deployments to one AKS cluster, each AKS cluster will have a different IP address.
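To find the address to allow in the NSG inbound rule, you can pull the host out of the scoring URI with Python's standard library. This is a minimal sketch; the scoring URI below is a hypothetical example:

```python
from urllib.parse import urlparse

# Hypothetical scoring URI for a service deployed to AKS
scoring_uri = "http://104.214.29.152:80/api/v1/service/myservice/score"

# The host portion is the address to allow in the NSG inbound rule
host = urlparse(scoring_uri).hostname
print(host)

# If the URI uses a DNS name instead of an IP address, resolve it first:
# import socket; ip = socket.gethostbyname(host)
```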

You can also use the Azure Machine Learning SDK to add Azure Kubernetes Service in a virtual network. If you already have an AKS cluster in a virtual network, attach it to the workspace as described in How to deploy to AKS. The following code creates a new AKS instance in the default subnet of a virtual network named mynetwork:

from azureml.core.compute import ComputeTarget, AksCompute

# ws = workspace object. Creation not shown in this snippet
# Create the compute configuration and set virtual network information
config = AksCompute.provisioning_configuration(location="eastus2")
config.vnet_resourcegroup_name = "mygroup"
config.vnet_name = "mynetwork"
config.subnet_name = "default"
config.service_cidr = "10.0.0.0/16"
config.dns_service_ip = "10.0.0.10"
config.docker_bridge_cidr = "172.17.0.1/16"

# Create the compute target
aks_target = ComputeTarget.create(workspace=ws,
                                  name="myaks",
                                  provisioning_configuration=config)
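Before creating the cluster, you can sanity-check the addressing rules described earlier (DNS service IP inside the service range, no overlap with the subnet) using Python's ipaddress module. A quick sketch with the values from the snippet above; the subnet range 10.1.0.0/24 is a hypothetical example:

```python
import ipaddress

subnet_cidr = ipaddress.ip_network("10.1.0.0/24")   # hypothetical subnet range
service_cidr = ipaddress.ip_network("10.0.0.0/16")
# strict=False because 172.17.0.1/16 has host bits set
docker_bridge_cidr = ipaddress.ip_network("172.17.0.1/16", strict=False)
dns_service_ip = ipaddress.ip_address("10.0.0.10")

# The DNS service IP must fall inside the Kubernetes service address range
assert dns_service_ip in service_cidr

# The service range must not overlap any subnet range
assert not service_cidr.overlaps(subnet_cidr)

# The Docker bridge range must not overlap the subnet or the service range
assert not docker_bridge_cidr.overlaps(subnet_cidr)
assert not docker_bridge_cidr.overlaps(service_cidr)
```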

When the creation process is completed, you can run inference, or model scoring, on an AKS cluster behind a virtual network. For more information, see How to deploy to AKS.

For more information on using Role-Based Access Control with Kubernetes, see Use Azure RBAC for Kubernetes authorization.

Network contributor role

Important

If you create or attach an AKS cluster by providing a virtual network you previously created, you must grant the service principal (SP) or managed identity for your AKS cluster the Network Contributor role to the resource group that contains the virtual network.

To add the identity as network contributor, use the following steps:

  1. To find the service principal or managed identity ID for AKS, use the following Azure CLI commands. Replace <aks-cluster-name> with the name of the cluster. Replace <resource-group-name> with the name of the resource group that contains the AKS cluster:

    az aks show -n <aks-cluster-name> --resource-group <resource-group-name> --query servicePrincipalProfile.clientId
    

    If this command returns a value of msi, use the following command to identify the principal ID for the managed identity:

    az aks show -n <aks-cluster-name> --resource-group <resource-group-name> --query identity.principalId
    
  2. To find the ID of the resource group that contains your virtual network, use the following command. Replace <resource-group-name> with the name of the resource group that contains the virtual network:

    az group show -n <resource-group-name> --query id
    
  3. To add the service principal or managed identity as a network contributor, use the following command. Replace <SP-or-managed-identity> with the ID returned for the service principal or managed identity. Replace <resource-group-id> with the ID returned for the resource group that contains the virtual network:

    az role assignment create --assignee <SP-or-managed-identity> --role 'Network Contributor' --scope <resource-group-id>
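
The lookup in step 1 has a fallback: if the first query returns msi, the cluster uses a managed identity and you need the second query instead. That decision can be sketched in shell; here the two az aks show queries are stubbed out with placeholder functions (and a hypothetical ID) so the logic stands alone:

```shell
# Stand-ins for the two `az aks show` queries above (hypothetical values)
get_client_id()    { echo "msi"; }
get_principal_id() { echo "11111111-1111-1111-1111-111111111111"; }

ID=$(get_client_id)
if [ "$ID" = "msi" ]; then
    # Cluster uses a managed identity; query the principal ID instead
    ID=$(get_principal_id)
fi
echo "$ID"
```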
    

For more information on using the internal load balancer with AKS, see Use internal load balancer with Azure Kubernetes Service.

Secure VNet traffic

There are two approaches to isolate traffic to and from the AKS cluster to the virtual network:

  • Private AKS cluster: This approach uses Azure Private Link to secure communications with the cluster for deployment/management operations.
  • Internal AKS load balancer: This approach configures the endpoint for your deployments to AKS to use a private IP within the virtual network.

Warning

Internal load balancer does not work with an AKS cluster that uses kubenet. If you want to use an internal load balancer and a private AKS cluster at the same time, configure your private AKS cluster with Azure Container Networking Interface (CNI). For more information, see Configure Azure CNI networking in Azure Kubernetes Service.

Private AKS cluster

By default, AKS clusters have a control plane, or API server, with public IP addresses. You can configure AKS to use a private control plane by creating a private AKS cluster. For more information, see Create a private Azure Kubernetes Service cluster.

After you create the private AKS cluster, attach the cluster to the virtual network to use with Azure Machine Learning.

Important

Before using a private link enabled AKS cluster with Azure Machine Learning, you must open a support incident to enable this functionality. For more information, see Manage and increase quotas.

Internal AKS load balancer

By default, AKS deployments use a public load balancer. In this section, you learn how to configure AKS to use an internal load balancer instead. An internal (or private) load balancer allows only private IP addresses as the frontend, and is used to load balance traffic inside a virtual network.

Enable private load balancer

Important

You cannot enable a private IP when creating an Azure Kubernetes Service cluster in Azure Machine Learning studio. Instead, create a cluster with an internal load balancer by using the Python SDK or the Azure CLI extension for machine learning.

The following examples demonstrate how to create a new AKS cluster with a private IP/internal load balancer using the SDK and CLI:

import azureml.core
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Verify that the cluster does not already exist
# aks_cluster_name = name of the cluster to look for
try:
    aks_target = AksCompute(workspace=ws, name=aks_cluster_name)
    print("Found existing aks cluster")

except ComputeTargetException:
    print("Creating new aks cluster")

    # Subnet to use for AKS
    subnet_name = "default"
    # Create AKS configuration
    prov_config=AksCompute.provisioning_configuration(load_balancer_type="InternalLoadBalancer")
    # Set info for existing virtual network to create the cluster in
    prov_config.vnet_resourcegroup_name = "myvnetresourcegroup"
    prov_config.vnet_name = "myvnetname"
    prov_config.service_cidr = "10.0.0.0/16"
    prov_config.dns_service_ip = "10.0.0.10"
    prov_config.subnet_name = subnet_name
    prov_config.load_balancer_subnet = subnet_name
    prov_config.docker_bridge_cidr = "172.17.0.1/16"

    # Create compute target
    aks_target = ComputeTarget.create(workspace = ws, name = "myaks", provisioning_configuration = prov_config)
    # Wait for the operation to complete
    aks_target.wait_for_completion(show_output = True)

When attaching an existing cluster to your workspace, you must wait until after the attach operation to configure the load balancer. For information on attaching a cluster, see Attach an existing AKS cluster.

After attaching the existing cluster, you can then update the cluster to use an internal load balancer/private IP:

import azureml.core
from azureml.core.compute.aks import AksUpdateConfiguration
from azureml.core.compute import AksCompute

# ws = workspace object. Creation not shown in this snippet
aks_target = AksCompute(ws,"myaks")

# Change to the name of the subnet that contains AKS
subnet_name = "default"
# Update AKS configuration to use an internal load balancer
update_config = AksUpdateConfiguration(None, "InternalLoadBalancer", subnet_name)
aks_target.update(update_config)
# Wait for the operation to complete
aks_target.wait_for_completion(show_output = True)

Enable Azure Container Instances (ACI)

Azure Container Instances are dynamically created when deploying a model. To enable Azure Machine Learning to create ACI inside the virtual network, you must enable subnet delegation for the subnet used by the deployment.

Warning

When using Azure Container Instances in a virtual network:

  • The virtual network must be in the same resource group as your Azure Machine Learning workspace.
  • If your workspace has a private endpoint, the virtual network used for Azure Container Instances must be the same as the one used by the workspace private endpoint.

When using Azure Container Instances inside the virtual network, the Azure Container Registry (ACR) for your workspace cannot be in the virtual network.

To use ACI in a virtual network with your workspace, use the following steps:

  1. To enable subnet delegation on your virtual network, use the information in the Add or remove a subnet delegation article. You can enable delegation when creating a virtual network, or add it to an existing network.

    Important

    When enabling delegation, use Microsoft.ContainerInstance/containerGroups as the Delegate subnet to service value.

  2. When deploying the model, use the vnet_name and subnet_name parameters of AciWebservice.deploy_configuration(). Set these parameters to the name of the virtual network and the subnet where you enabled delegation.
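The deployment configuration in step 2 looks like the following. This is a configuration sketch, assuming the azureml-core SDK is installed; the network names are hypothetical placeholders:

```python
from azureml.core.webservice import AciWebservice

# vnet_name/subnet_name are hypothetical; use the subnet where you
# enabled delegation to Microsoft.ContainerInstance/containerGroups
deploy_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    vnet_name="mynetwork",
    subnet_name="default",
)
```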

Limit outbound connectivity from the virtual network

If you don't want to use the default outbound rules and you do want to limit the outbound access of your virtual network, you must allow access to Azure Container Registry. For example, make sure that your network security groups (NSGs) contain a rule that allows access to the AzureContainerRegistry.{RegionName} service tag, where {RegionName} is the name of an Azure region.

Next steps

This article is part four of a five-part virtual network series. See the other articles in this series to learn how to secure a virtual network.

Also see the article on using custom DNS for name resolution.