Secure an Azure Machine Learning workspace with virtual networks

In this article, you learn how to secure an Azure Machine Learning workspace and its associated resources in a virtual network.

Tip

This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series:

For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace.

In this article, you learn how to enable the following workspace resources in a virtual network:

  • Azure Machine Learning workspace
  • Azure Storage accounts
  • Azure Machine Learning datastores and datasets
  • Azure Key Vault
  • Azure Container Registry

Prerequisites

  • Read the Network security overview article to understand common virtual network scenarios and overall virtual network architecture.

  • Read the Azure Machine Learning best practices for enterprise security article to learn about best practices.

  • An existing virtual network and subnet to use with your compute resources.

    Tip

    If you plan on using Azure Container Instances in the virtual network (to deploy models), then the workspace and virtual network must be in the same resource group. Otherwise, they can be in different groups.

  • To deploy resources into a virtual network or subnet, your user account must have permissions to the following actions in Azure role-based access control (Azure RBAC):

    • "Microsoft.Network/virtualNetworks/join/action" on the virtual network resource.
    • "Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet resource.

    For more information on Azure RBAC with networking, see the Networking built-in roles article.
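The permission check described above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure API: `missing_actions` and `REQUIRED_ACTIONS` are hypothetical names, and the list of granted actions is assumed to come from the role definitions assigned to your account. The subnet action is written with the plural `subnets` segment, as it appears in the Azure resource provider operations list. The sketch treats `*` in a granted action as a wildcard and ignores `NotActions` and assignment scope, so treat a clean result as a hint rather than proof.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Actions needed to deploy resources into a virtual network or subnet.
REQUIRED_ACTIONS = [
    "Microsoft.Network/virtualNetworks/join/action",
    "Microsoft.Network/virtualNetworks/subnets/join/action",
]


def missing_actions(
    granted: Iterable[str], required: Iterable[str] = REQUIRED_ACTIONS
) -> List[str]:
    """Return the required actions not covered by any granted action pattern.

    Simplification (assumption): only "Actions" are considered; "NotActions"
    and the scope of the role assignment are ignored.
    """
    # Compare case-insensitively; "*" in a granted action matches any text.
    granted_patterns = [pattern.lower() for pattern in granted]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in granted_patterns)
    ]


# Example: a role that only grants the virtual network join action.
print(missing_actions(["Microsoft.Network/virtualNetworks/join/action"]))
```

An empty list means every required action is matched by at least one granted pattern.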

Azure Container Registry

  • Your Azure Container Registry must be the Premium version. For more information on upgrading, see Changing SKUs.

  • Your Azure Container Registry must be in the same virtual network and subnet as the storage account and compute targets used for training or inference.

  • Your Azure Machine Learning workspace must contain an Azure Machine Learning compute cluster.

Limitations

Azure Storage Account

If both the Azure Machine Learning workspace and the Azure Storage Account use a private endpoint to connect to the VNet, both must be within the same subnet.

Azure Container Registry

When ACR is behind a virtual network, Azure Machine Learning cannot use it to directly build Docker images. Instead, the compute cluster is used to build the images.

Important

The compute cluster used to build Docker images needs to be able to access the package repositories that are used to train and deploy your models. You may need to add network security rules that allow access to public repos, use private Python packages, or use custom Docker images that already include the packages.

Warning

If your Azure Container Registry uses a private endpoint to communicate with the virtual network, you cannot use a managed identity with an Azure Machine Learning compute cluster. To use a managed identity with a compute cluster, use a service endpoint with the Azure Container Registry for the workspace.

Required public internet access

Azure Machine Learning requires both inbound and outbound access to the public internet. The following tables provide an overview of what access is required and what it is for. The protocol for all items is TCP. For service tags that end in .region, replace region with the Azure region that contains your workspace. For example, Storage.westus:

Direction | Ports | Service tag | Purpose
Inbound | 29876-29877 | BatchNodeManagement | Create, update, and delete of Azure Machine Learning compute instance and compute cluster.
Inbound | 44224 | AzureMachineLearning | Create, update, and delete of Azure Machine Learning compute instance.
Outbound | * | AzureActiveDirectory | Authentication using Azure AD.
Outbound | 443 | AzureMachineLearning | Using Azure Machine Learning services.
Outbound | 443 | AzureResourceManager | Creation of Azure resources with Azure Machine Learning.
Outbound | 443 | Storage.region | Access data stored in the Azure Storage Account for the Azure Batch service.
Outbound | 443 | AzureFrontDoor.FrontEnd (not needed in Azure China) | Global entry point for Azure Machine Learning studio.
Outbound | 443 | ContainerRegistry.region | Access docker images provided by Microsoft.
Outbound | 443 | MicrosoftContainerRegistry.region | Access docker images provided by Microsoft. Setup of the Azure Machine Learning router for Azure Kubernetes Service.
Outbound | 443 | Keyvault.region | Access the key vault for the Azure Batch service. Only needed if your workspace was created with the hbi_workspace flag enabled.
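
To make the region substitution concrete, the following Python sketch holds the rows of the table above as data and fills in the region for the regional service tags. It is only an illustration: `Rule`, `RULE_TEMPLATES`, and `rules_for_region` are hypothetical names, and the output is a plain list that you would still have to translate into rules for your own network security group or firewall.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    direction: str
    ports: str
    service_tag: str


# Rows copied from the table above. "{region}" marks the service tags
# that end in ".region" and must be completed with the workspace region.
RULE_TEMPLATES = [
    Rule("Inbound", "29876-29877", "BatchNodeManagement"),
    Rule("Inbound", "44224", "AzureMachineLearning"),
    Rule("Outbound", "*", "AzureActiveDirectory"),
    Rule("Outbound", "443", "AzureMachineLearning"),
    Rule("Outbound", "443", "AzureResourceManager"),
    Rule("Outbound", "443", "Storage.{region}"),
    Rule("Outbound", "443", "AzureFrontDoor.FrontEnd"),
    Rule("Outbound", "443", "ContainerRegistry.{region}"),
    Rule("Outbound", "443", "MicrosoftContainerRegistry.{region}"),
    Rule("Outbound", "443", "Keyvault.{region}"),
]


def rules_for_region(region: str) -> List[Rule]:
    """Return the table rows with the region placeholder filled in."""
    return [
        rule._replace(service_tag=rule.service_tag.format(region=region))
        for rule in RULE_TEMPLATES
    ]


# Example: list the rules for a workspace in the westus region.
for rule in rules_for_region("westus"):
    print(f"{rule.direction:<9}{rule.ports:<13}{rule.service_tag}")
```

Keeping the rows as data makes it easier to review them against the table when the workspace moves to another region.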

Tip

If you need the IP addresses instead of service tags, use one of the following options:

The IP addresses may change periodically.

You may also need to allow outbound traffic to Visual Studio Code and non-Microsoft sites for the installation of packages required by your machine learning project. The following table lists commonly used repositories for machine learning:

Host name | Purpose
anaconda.com, *.anaconda.com | Used to install default packages.
*.anaconda.org | Used to get repo data.
pypi.org | Used to list dependencies from the default index, if any, and the index is not overwritten by user settings. If the index is overwritten, you must also allow *.pythonhosted.org.
cloud.r-project.org | Used when installing CRAN packages for R development.
*pytorch.org | Used by some examples based on PyTorch.
*.tensorflow.org | Used by some examples based on TensorFlow.
update.code.visualstudio.com, *.vo.msecnd.net | Used to retrieve VS Code server bits, which are installed on the compute instance through a setup script.
raw.githubusercontent.com/microsoft/vscode-tools-for-ai/master/azureml_remote_websocket_server/* | Used to retrieve websocket server bits, which are installed on the compute instance. The websocket server is used to transmit requests from the Visual Studio Code client (desktop application) to the Visual Studio Code server running on the compute instance.
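
When reviewing firewall logs, it can help to test a host name against the patterns in the table above. The sketch below does that with shell-style wildcard matching from the Python standard library. `ALLOWED_HOST_PATTERNS` and `is_host_allowed` are hypothetical names, and the wildcard semantics are an assumption: your firewall product may interpret `*` differently. The raw.githubusercontent.com entry is left out because it is a URL path rather than a host pattern.

```python
from fnmatch import fnmatchcase

# Host patterns copied from the table above.
ALLOWED_HOST_PATTERNS = [
    "anaconda.com",
    "*.anaconda.com",
    "*.anaconda.org",
    "pypi.org",
    "cloud.r-project.org",
    "*pytorch.org",
    "*.tensorflow.org",
    "update.code.visualstudio.com",
    "*.vo.msecnd.net",
]


def is_host_allowed(host: str, patterns=ALLOWED_HOST_PATTERNS) -> bool:
    """Return True if the host name matches any allowlist pattern."""
    # Host names are case-insensitive; a trailing dot is a DNS root label.
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


# Example: pythonhosted.org is not covered unless you add it yourself.
print(is_host_allowed("repo.anaconda.com"))
print(is_host_allowed("files.pythonhosted.org"))
```

Note that with this matching a pattern without a leading dot, such as `*pytorch.org`, matches any host name that ends in that text.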

When using Azure Kubernetes Service (AKS) with Azure Machine Learning, allow the following traffic to the AKS VNet:

For information on using a firewall solution, see Use a firewall with Azure Machine Learning.

Secure the workspace with private endpoint

Azure Private Link lets you connect to your workspace using a private endpoint. The private endpoint is a set of private IP addresses within your virtual network. You can then limit access to your workspace to only occur over the private IP addresses. A private endpoint helps reduce the risk of data exfiltration.

For more information on configuring a private endpoint for your workspace, see How to configure a private endpoint.

Warning

Securing a workspace with private endpoints does not ensure end-to-end security by itself. You must follow the steps in the rest of this article, and the VNet series, to secure individual components of your solution. For example, if you use a private endpoint for the workspace, but your Azure Storage Account is not behind the VNet, traffic between the workspace and storage does not use the VNet for security.

Secure Azure storage accounts

Azure Machine Learning supports storage accounts configured to use either a private endpoint or service endpoint.

  1. In the Azure portal, select the Azure Storage Account.

  2. Use the information in Use private endpoints for Azure Storage to add private endpoints for the following storage sub-resources:

    • Blob
    • File
    • Queue - Only needed if you plan to use ParallelRunStep in an Azure Machine Learning pipeline.
    • Table - Only needed if you plan to use ParallelRunStep in an Azure Machine Learning pipeline.

    Screenshot showing private endpoint configuration page with blob and file options

    Tip

    When configuring a storage account that is not the default storage, select the Target subresource type that corresponds to the storage account you want to add.

  3. After creating the private endpoints for the sub-resources, select the Firewalls and virtual networks tab under Networking for the storage account.

  4. Select Selected networks, and then under Resource instances, select Microsoft.MachineLearningServices/Workspace as the Resource type. Select your workspace using Instance name. For more information, see Trusted access based on system-assigned managed identity.

    Tip

    Alternatively, you can select Allow Azure services on the trusted services list to access this storage account to more broadly allow access from trusted services. For more information, see Configure Azure Storage firewalls and virtual networks.

    The networking area on the Azure Storage page in the Azure portal when using private endpoint

  5. Select Save to save the configuration.
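
The choice of sub-resources in step 2 can be summarized in a small helper. This is a sketch only: `required_storage_subresources` is a hypothetical function, and the lowercase names are assumed to correspond to the target sub-resource values you select when creating each private endpoint.

```python
from typing import List


def required_storage_subresources(uses_parallel_run_step: bool) -> List[str]:
    """List the storage sub-resources that need a private endpoint."""
    # Blob and File are always needed.
    subresources = ["blob", "file"]
    if uses_parallel_run_step:
        # Queue and Table are only needed for ParallelRunStep pipelines.
        subresources += ["queue", "table"]
    return subresources


# Example: a workspace whose pipelines use ParallelRunStep.
print(required_storage_subresources(uses_parallel_run_step=True))
```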

Tip

When using a private endpoint, you can also disable public access. For more information, see disallow public read access.

Secure Azure Key Vault

Azure Machine Learning uses an associated Key Vault instance to store the following credentials:

  • The associated storage account connection string
  • Passwords to Azure Container Registry instances
  • Connection strings to data stores

Azure Key Vault can be configured to use either a private endpoint or service endpoint. To use Azure Machine Learning experimentation capabilities with Azure Key Vault behind a virtual network, use the following steps:

Tip

Regardless of whether you use a private endpoint or service endpoint, the key vault must be in the same network as the private endpoint of the workspace.

For information on using a private endpoint with Azure Key Vault, see Integrate Key Vault with Azure Private Link.

Enable Azure Container Registry (ACR)

Tip

If you did not use an existing Azure Container Registry when creating the workspace, one may not exist. By default, the workspace will not create an ACR instance until it needs one. To force the creation of one, train or deploy a model using your workspace before using the steps in this section.

Azure Container Registry can be configured to use a private endpoint. Use the following steps to configure your workspace to use ACR when it is in the virtual network:

  1. Find the name of the Azure Container Registry for your workspace, using one of the following methods:

    Azure portal

    From the overview section of your workspace, the Registry value links to the Azure Container Registry.

    Azure Container Registry for the workspace

    Azure CLI

    If you have installed the Machine Learning extension for Azure CLI, you can use the az ml workspace show command to show the workspace information.

    az ml workspace show -w yourworkspacename -g resourcegroupname --query 'containerRegistry'
    

    This command returns a value similar to "/subscriptions/{GUID}/resourceGroups/{resourcegroupname}/providers/Microsoft.ContainerRegistry/registries/{ACRname}". The last part of the string is the name of the Azure Container Registry for the workspace.

  2. Limit access to your virtual network using the steps in Connect privately to an Azure Container Registry. When adding the virtual network, select the virtual network and subnet for your Azure Machine Learning resources.

  3. Configure the ACR for the workspace to Allow access by trusted services.

  4. Create an Azure Machine Learning compute cluster. This is used to build Docker images when ACR is behind a VNet. For more information, see Create a compute cluster.

  5. Use the Azure Machine Learning Python SDK to configure the workspace to build Docker images using the compute cluster. The following code snippet demonstrates how to update the workspace to set a build compute. Replace mycomputecluster with the name of the cluster to use:

    from azureml.core import Workspace
    # Load workspace from an existing config file
    ws = Workspace.from_config()
    # Update the workspace to use an existing compute cluster
    ws.update(image_build_compute = 'mycomputecluster')
    # To switch back to using ACR to build (if ACR is not in the VNet):
    # ws.update(image_build_compute = '')
    

    Important

    Your storage account, compute cluster, and Azure Container Registry must all be in the same subnet of the virtual network.

    For more information, see the update() method reference.
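
Step 1 notes that the registry name is the last part of the resource ID returned by the CLI. A minimal Python sketch of that extraction follows. `acr_name_from_resource_id` is a hypothetical helper, and the subscription, resource group, and registry names in the example are placeholders.

```python
def acr_name_from_resource_id(resource_id: str) -> str:
    """Return the registry name, the last segment of the ACR resource ID."""
    # The CLI may print the value wrapped in double quotes.
    segments = resource_id.strip().strip('"').rstrip("/").split("/")
    # Expect ".../providers/Microsoft.ContainerRegistry/registries/<name>".
    if len(segments) < 2 or segments[-2].lower() != "registries":
        raise ValueError(f"not a container registry resource ID: {resource_id!r}")
    return segments[-1]


# Example with placeholder values.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/myresourcegroup"
    "/providers/Microsoft.ContainerRegistry/registries/myregistry"
)
print(acr_name_from_resource_id(example_id))
```

The helper raises a ValueError instead of returning a guess when the ID does not end in a registries segment.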

Tip

When ACR is behind a VNet, you can also disable public access to it.

Datastores and datasets

The following table lists the services that you need to skip validation for:

Service | Skip validation required?
Azure Blob storage | Yes
Azure File share | Yes
Azure Data Lake Store Gen1 | No
Azure Data Lake Store Gen2 | No
Azure SQL Database | Yes
PostgreSQL | Yes

Note

Azure Data Lake Store Gen1 and Azure Data Lake Store Gen2 skip validation by default, so you don't have to do anything.

The following code sample creates a new Azure Blob datastore and sets skip_validation=True.

from azureml.core import Datastore

# ws is the Workspace object loaded earlier, for example with Workspace.from_config()
blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                         datastore_name=blob_datastore_name,
                                                         container_name=container_name,
                                                         account_name=account_name,
                                                         account_key=account_key,
                                                         skip_validation=True)  # Set skip_validation to True

Use datasets

The syntax to skip dataset validation is similar for the following dataset types:

  • Delimited file
  • JSON
  • Parquet
  • SQL
  • File

The following code creates a new JSON dataset and sets validate=False.

from azureml.core import Dataset

json_ds = Dataset.Tabular.from_json_lines_files(path=datastore_paths,
                                                validate=False)

Securely connect to your workspace

To connect to a workspace that's secured behind a VNet, use one of the following methods:

  • Azure VPN gateway - Connects on-premises networks to the VNet over a private connection. Connection is made over the public internet. There are two types of VPN gateways that you might use:

    • Point-to-site: Each client computer uses a VPN client to connect to the VNet.
    • Site-to-site: A VPN device connects the VNet to your on-premises network.
  • ExpressRoute - Connects on-premises networks into the cloud over a private connection. Connection is made using a connectivity provider.

  • Azure Bastion - In this scenario, you create an Azure Virtual Machine (sometimes called a jump box) inside the VNet. You then connect to the VM using Azure Bastion. Bastion allows you to connect to the VM using either an RDP or SSH session from your local web browser. You then use the jump box as your development environment. Since it is inside the VNet, it can directly access the workspace. For an example of using a jump box, see Tutorial: Create a secure workspace.

Important

When using a VPN gateway or ExpressRoute, you will need to plan how name resolution works between your on-premises resources and those in the VNet. For more information, see Use a custom DNS server.

Workspace diagnostics

From Azure Machine Learning studio, you can run diagnostics on your workspace to check your setup. To run diagnostics, select the '?' icon from the upper right corner of the page. Then select Run workspace diagnostics.

Screenshot of the workspace diagnostics button

After diagnostics run, a list of any detected problems is returned. This list includes links to possible solutions.

Next steps

This article is part of a series on securing an Azure Machine Learning workflow. See the other articles in this series: