Secure an Azure Machine Learning inferencing environment with virtual networks

In this article, you learn how to secure inferencing environments with a virtual network in Azure Machine Learning.

Tip

This article is part of a series on securing an Azure Machine Learning workflow.

For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or Tutorial: Create a secure workspace using a template.

In this article you learn how to secure the following inferencing resources in a virtual network:

  • Default Azure Kubernetes Service (AKS) cluster
  • Private AKS cluster
  • AKS cluster with private link
  • Azure Container Instances (ACI)

Prerequisites

  • Read the Network security overview article to understand common virtual network scenarios and overall virtual network architecture.

  • An existing virtual network and subnet to use with your compute resources.

  • To deploy resources into a virtual network or subnet, your user account must have permissions to the following actions in Azure role-based access control (Azure RBAC):

    • "Microsoft.Network/virtualNetworks/join/action" on the virtual network resource.
    • "Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet resource.

    For more information on Azure RBAC with networking, see the Networking built-in roles article.
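As a rough illustration of the permission prerequisite, the following sketch checks whether a list of granted action patterns covers the two join actions. The helper name missing_actions is hypothetical, and the wildcard matching only approximates how Azure RBAC evaluates role definitions: it ignores NotActions, deny assignments, and assignment scope.

```python
from fnmatch import fnmatchcase

# Join actions named in the prerequisites. The subnet action is spelled with
# the plural "subnets" segment in the Azure resource provider operations list.
REQUIRED_ACTIONS = [
    "Microsoft.Network/virtualNetworks/join/action",
    "Microsoft.Network/virtualNetworks/subnets/join/action",
]


def missing_actions(granted_patterns, required=REQUIRED_ACTIONS):
    """Return the required actions that no granted pattern matches.

    Patterns may contain '*' wildcards. Matching is case-insensitive.
    """
    lowered = [pattern.lower() for pattern in granted_patterns]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in lowered)
    ]


# A role that grants every Microsoft.Network action covers both join actions.
print(missing_actions(["Microsoft.Network/*"]))
# A read-only grant covers neither of them.
print(missing_actions(["Microsoft.Network/virtualNetworks/read"]))
```

An empty result suggests that the role's Actions list covers both join operations. The role must still be assigned at a scope that includes the virtual network and subnet.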

Limitations

Azure Container Instances

  • When using Azure Container Instances in a virtual network, the virtual network must be in the same resource group as your Azure Machine Learning workspace. When you aren't using Azure Container Instances, the virtual network can be in a different resource group.
  • If your workspace has a private endpoint, the virtual network used for Azure Container Instances must be the same as the one used by the workspace private endpoint.
  • When using Azure Container Instances inside the virtual network, the Azure Container Registry (ACR) for your workspace can't be in the virtual network.

Azure Kubernetes Service

Important

To use an AKS cluster in a virtual network, first follow the prerequisites in Configure advanced networking in Azure Kubernetes Service (AKS).

To add AKS in a virtual network to your workspace, use the following steps:

  1. Sign in to Azure Machine Learning studio, and then select your subscription and workspace.

  2. Select Compute on the left, Inference clusters from the center, and then select + New.

    Screenshot of create inference cluster dialog

  3. From the Create inference cluster dialog, select Create new and the VM size to use for the cluster. Finally, select Next.

    Screenshot of VM settings

  4. From the Configure Settings section, enter a Compute name, select the Cluster Purpose, Number of nodes, and then select Advanced to display the network settings. In the Configure virtual network area, set the following values:

    • Set the Virtual network to use.

      Tip

      If your workspace uses a private endpoint to connect to the virtual network, the Virtual network selection field is greyed out.

    • Set the Subnet to create the cluster in.

    • In the Kubernetes Service address range field, enter the Kubernetes service address range. This address range uses a Classless Inter-Domain Routing (CIDR) notation IP range to define the IP addresses that are available for the cluster. It must not overlap with any subnet IP ranges (for example, 10.0.0.0/16).

    • In the Kubernetes DNS service IP address field, enter the Kubernetes DNS service IP address. This IP address is assigned to the Kubernetes DNS service. It must be within the Kubernetes service address range (for example, 10.0.0.10).

    • In the Docker bridge address field, enter the Docker bridge address. This IP address is assigned to Docker Bridge. It must not be in any subnet IP ranges, or the Kubernetes service address range (for example, 172.18.0.1/16).

    Screenshot of configure network settings

  5. When you deploy a model as a web service to AKS, a scoring endpoint is created to handle inferencing requests. Make sure that the network security group (NSG) that controls the virtual network has an inbound security rule enabled for the IP address of the scoring endpoint if you want to call it from outside the virtual network.

    To find the IP address of the scoring endpoint, look at the scoring URI for the deployed service. For information on viewing the scoring URI, see Consume a model deployed as a web service.

    Important

    Keep the default outbound rules for the NSG. For more information, see the default security rules in Security groups.

    Screenshot of an inbound security rule

    Important

    The IP address shown in the image for the scoring endpoint will be different for your deployments. While the same IP is shared by all deployments to one AKS cluster, each AKS cluster will have a different IP address.
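The lookup described in step 5 can be sketched in a few lines of Python. This example assumes the scoring URI contains a literal IP address as its host. The function name scoring_endpoint_ip and the sample URI are hypothetical. In the SDK, the URI is available from the scoring_uri property of the deployed web service object.

```python
import ipaddress
from urllib.parse import urlparse


def scoring_endpoint_ip(scoring_uri):
    """Return the IP address in a scoring URI, or None if the host is a DNS name."""
    host = urlparse(scoring_uri).hostname
    if host is None:
        return None
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        # The host is a DNS name (for example, when a custom domain is used).
        return None


# Hypothetical URI. The 203.0.113.0/24 range is reserved for documentation.
uri = "http://203.0.113.10:80/api/v1/service/myservice/score"
print(scoring_endpoint_ip(uri))
```

If the function returns None, the endpoint is addressed by name, and you need to resolve that name to find the address for the inbound rule.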

You can also use the Azure Machine Learning SDK to add Azure Kubernetes Service in a virtual network. If you already have an AKS cluster in a virtual network, attach it to the workspace as described in How to deploy to AKS. The following code creates a new AKS instance in the default subnet of a virtual network named mynetwork:

from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AksCompute

# Load the workspace from a local config.json file
ws = Workspace.from_config()

# Create the compute configuration and set virtual network information
config = AksCompute.provisioning_configuration(location="eastus2")
config.vnet_resourcegroup_name = "mygroup"
config.vnet_name = "mynetwork"
config.subnet_name = "default"
config.service_cidr = "10.0.0.0/16"
config.dns_service_ip = "10.0.0.10"
config.docker_bridge_cidr = "172.17.0.1/16"

# Create the compute target
aks_target = ComputeTarget.create(workspace=ws,
                                  name="myaks",
                                  provisioning_configuration=config)

# Wait for the creation process to complete
aks_target.wait_for_completion(show_output=True)

When the creation process is completed, you can run inference, or model scoring, on an AKS cluster behind a virtual network. For more information, see How to deploy to AKS.
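Before you create a cluster, it can help to confirm that the service address range, DNS service IP address, and Docker bridge address satisfy the constraints described for the studio fields. The following is a minimal sketch that uses only the Python standard library. The function name check_aks_network_settings and the subnet range 10.1.0.0/24 are assumptions for illustration, and the check is no substitute for the validation Azure performs.

```python
import ipaddress


def check_aks_network_settings(subnet_cidrs, service_cidr, dns_service_ip,
                               docker_bridge_cidr):
    """Return a list of problems found. An empty list means no conflicts."""
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    service = ipaddress.ip_network(service_cidr)
    dns_ip = ipaddress.ip_address(dns_service_ip)
    # The bridge address is written as host/prefix (for example 172.17.0.1/16),
    # so ip_interface is used to obtain the network that contains it.
    bridge = ipaddress.ip_interface(docker_bridge_cidr).network

    problems = []
    for subnet in subnets:
        if service.overlaps(subnet):
            problems.append(f"service range {service} overlaps subnet {subnet}")
        if bridge.overlaps(subnet):
            problems.append(f"Docker bridge {bridge} overlaps subnet {subnet}")
    if dns_ip not in service:
        problems.append(f"DNS service IP {dns_ip} is outside {service}")
    if bridge.overlaps(service):
        problems.append(f"Docker bridge {bridge} overlaps service range {service}")
    return problems


# Values from the SDK example, checked against a hypothetical subnet range.
print(check_aks_network_settings(
    ["10.1.0.0/24"], "10.0.0.0/16", "10.0.0.10", "172.17.0.1/16"))
```

Each string in the returned list names one conflicting pair, so an overlapping subnet and an out-of-range DNS IP address are reported separately.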

For more information on using Role-Based Access Control with Kubernetes, see Use Azure RBAC for Kubernetes authorization.

Network contributor role

Important

If you create or attach an AKS cluster by providing a virtual network you previously created, you must grant the service principal (SP) or managed identity for your AKS cluster the Network Contributor role for the resource group that contains the virtual network.

To add the identity as network contributor, use the following steps:

  1. To find the service principal or managed identity ID for AKS, use the following Azure CLI commands. Replace <aks-cluster-name> with the name of the cluster. Replace <resource-group-name> with the name of the resource group that contains the AKS cluster:

    az aks show -n <aks-cluster-name> --resource-group <resource-group-name> --query servicePrincipalProfile.clientId
    

    If this command returns a value of msi, use the following command to identify the principal ID for the managed identity:

    az aks show -n <aks-cluster-name> --resource-group <resource-group-name> --query identity.principalId
    
  2. To find the ID of the resource group that contains your virtual network, use the following command. Replace <resource-group-name> with the name of the resource group that contains the virtual network:

    az group show -n <resource-group-name> --query id
    
  3. To add the service principal or managed identity as a network contributor, use the following command. Replace <SP-or-managed-identity> with the ID returned for the service principal or managed identity. Replace <resource-group-id> with the ID returned for the resource group that contains the virtual network:

    az role assignment create --assignee <SP-or-managed-identity> --role 'Network Contributor' --scope <resource-group-id>
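The choice between the service principal and the managed identity in step 1 can also be scripted. The sketch below assumes you saved the full JSON output of az aks show (without --query) and reads the same two fields the commands above query. The function name aks_identity_for_role_assignment and the placeholder GUIDs are hypothetical.

```python
import json


def aks_identity_for_role_assignment(aks_show_output):
    """Pick the ID to pass to --assignee from the JSON output of 'az aks show'."""
    cluster = json.loads(aks_show_output)
    profile = cluster.get("servicePrincipalProfile") or {}
    client_id = profile.get("clientId")
    # A clientId of "msi" means the cluster uses a managed identity instead.
    if client_id and client_id.lower() != "msi":
        return client_id
    identity = cluster.get("identity") or {}
    principal_id = identity.get("principalId")
    if not principal_id:
        raise ValueError("No service principal or managed identity ID found")
    return principal_id


# Trimmed, hypothetical output for a cluster that uses a managed identity.
sample = json.dumps({
    "servicePrincipalProfile": {"clientId": "msi"},
    "identity": {"principalId": "00000000-0000-0000-0000-000000000001"},
})
print(aks_identity_for_role_assignment(sample))
```

The returned value is what replaces <SP-or-managed-identity> in the role assignment command.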
    

For more information on using the internal load balancer with AKS, see Use internal load balancer with Azure Kubernetes Service.

Secure VNet traffic

There are two approaches to isolate traffic to and from the AKS cluster to the virtual network:

  • Private AKS cluster: This approach uses Azure Private Link to secure communications with the cluster for deployment/management operations.
  • Internal AKS load balancer: This approach configures the endpoint for your deployments to AKS to use a private IP within the virtual network.

Private AKS cluster

By default, AKS clusters have a control plane, or API server, with public IP addresses. You can configure AKS to use a private control plane by creating a private AKS cluster. For more information, see Create a private Azure Kubernetes Service cluster.

After you create the private AKS cluster, attach the cluster to the virtual network to use with Azure Machine Learning.

Internal AKS load balancer

By default, AKS deployments use a public load balancer. In this section, you learn how to configure AKS to use an internal load balancer. An internal (or private) load balancer allows only private IP addresses as its frontend and balances traffic inside a virtual network.

Enable private load balancer

Important

You cannot enable private IP when creating the Azure Kubernetes Service cluster in Azure Machine Learning studio. You can create one with an internal load balancer when using the Python SDK or Azure CLI extension for machine learning.

The following example demonstrates how to create a new AKS cluster with a private IP/internal load balancer using the SDK:

import azureml.core
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.exceptions import ComputeTargetException

# ws = workspace object. Creation not shown in this snippet
aks_cluster_name = "myaks"

# Verify that cluster does not exist already
try:
    aks_target = AksCompute(workspace=ws, name=aks_cluster_name)
    print("Found existing aks cluster")

except ComputeTargetException:
    print("Creating new aks cluster")

    # Subnet to use for AKS
    subnet_name = "default"
    # Create AKS configuration
    prov_config = AksCompute.provisioning_configuration(load_balancer_type="InternalLoadBalancer")
    # Set info for existing virtual network to create the cluster in
    prov_config.vnet_resourcegroup_name = "myvnetresourcegroup"
    prov_config.vnet_name = "myvnetname"
    prov_config.service_cidr = "10.0.0.0/16"
    prov_config.dns_service_ip = "10.0.0.10"
    prov_config.subnet_name = subnet_name
    prov_config.load_balancer_subnet = subnet_name
    prov_config.docker_bridge_cidr = "172.17.0.1/16"

    # Create compute target
    aks_target = ComputeTarget.create(workspace=ws, name=aks_cluster_name, provisioning_configuration=prov_config)
    # Wait for the operation to complete
    aks_target.wait_for_completion(show_output=True)

When attaching an existing cluster to your workspace, you must wait until after the attach operation to configure the load balancer. For information on attaching a cluster, see Attach an existing AKS cluster.

After attaching the existing cluster, you can then update the cluster to use an internal load balancer/private IP:

import azureml.core
from azureml.core.compute.aks import AksUpdateConfiguration
from azureml.core.compute import AksCompute

# ws = workspace object. Creation not shown in this snippet
aks_target = AksCompute(ws,"myaks")

# Change to the name of the subnet that contains AKS
subnet_name = "default"
# Update AKS configuration to use an internal load balancer
update_config = AksUpdateConfiguration(None, "InternalLoadBalancer", subnet_name)
aks_target.update(update_config)
# Wait for the operation to complete
aks_target.wait_for_completion(show_output = True)

Enable Azure Container Instances (ACI)

Azure Container Instances are dynamically created when deploying a model. To enable Azure Machine Learning to create ACI inside the virtual network, you must enable subnet delegation for the subnet used by the deployment. To use ACI in a virtual network with your workspace, use the following steps:

  1. To enable subnet delegation on your virtual network, use the information in the Add or remove a subnet delegation article. You can enable delegation when creating a virtual network, or add it to an existing network.

    Important

    When enabling delegation, use Microsoft.ContainerInstance/containerGroups as the Delegate subnet to service value.

  2. Deploy the model using AciWebservice.deploy_configuration() with the vnet_name and subnet_name parameters. Set these parameters to the virtual network name and subnet where you enabled delegation.
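Step 2 can be sketched with the SDK as follows. The wrapper function aci_vnet_deployment_config, the network names, and the cpu_cores and memory_gb values are assumptions for illustration. Only the vnet_name and subnet_name parameters come from the step above.

```python
def aci_vnet_deployment_config(vnet_name, subnet_name, cpu_cores=1, memory_gb=1):
    """Build an ACI deployment configuration that targets a delegated subnet."""
    if not vnet_name or not subnet_name:
        raise ValueError("Both vnet_name and subnet_name are required")
    # Imported here so the argument check above runs without the SDK installed.
    from azureml.core.webservice import AciWebservice

    return AciWebservice.deploy_configuration(
        cpu_cores=cpu_cores,
        memory_gb=memory_gb,
        vnet_name=vnet_name,      # virtual network that contains the subnet
        subnet_name=subnet_name,  # subnet delegated to container groups
    )


# Example call (requires the azureml-core package):
# deployment_config = aci_vnet_deployment_config("mynetwork", "default")
```

The returned object is then passed as the deployment_config argument when you deploy the model, for example with Model.deploy.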

Limit outbound connectivity from the virtual network

If you don't want to use the default outbound rules and you do want to limit the outbound access of your virtual network, you must allow access to Azure Container Registry. For example, make sure that your network security group (NSG) contains a rule that allows access to the AzureContainerRegistry.RegionName service tag, where RegionName is the name of an Azure region.
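The shape of such a rule can be sketched as a plain Python dictionary. This is an illustration only: the function name acr_outbound_rule, the rule name, the priority, and the choice of TCP port 443 are assumptions, and the keys loosely mirror NSG rule properties rather than a specific SDK type.

```python
def acr_outbound_rule(region_name, priority=100):
    """Describe an outbound NSG rule that allows traffic to Azure Container Registry."""
    if not region_name or " " in region_name:
        raise ValueError("region_name must be a region name without spaces")
    return {
        "name": f"AllowAcrOutbound-{region_name}",
        "direction": "Outbound",
        "access": "Allow",
        "protocol": "Tcp",
        "source_address_prefix": "VirtualNetwork",
        "source_port_range": "*",
        # Regional service tag described in the paragraph above.
        "destination_address_prefix": f"AzureContainerRegistry.{region_name}",
        "destination_port_range": "443",  # assumption: registry access over HTTPS
        "priority": priority,             # assumption: any free priority value
    }


rule = acr_outbound_rule("EastUS2")
print(rule["destination_address_prefix"])
```

You can enter the same values in the Azure portal or translate them into whichever deployment tool manages your NSG.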
