Manage an Azure Machine Learning compute instance

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Learn how to manage a compute instance in your Azure Machine Learning workspace.

Use a compute instance as your fully configured and managed development environment in the cloud. For development and testing, you can also use the instance as a training compute target. A compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a compute instance can't be shared with other users in your workspace.

In this article, you learn how to start, stop, restart, delete a compute instance. See Create an Azure Machine Learning compute instance to learn how to create a compute instance.

Note

This article shows CLI v2 in the sections below. If you are still using CLI v1, see Create an Azure Machine Learning compute cluster CLI v1.

Prerequisites

  • An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning workspace. In the storage account, the "Allow storage account key access" option must be enabled for compute instance creation to be successful.

  • The Azure CLI extension for Machine Learning service (v2), Azure Machine Learning Python SDK (v2), or the Azure Machine Learning Visual Studio Code extension.

  • If using the Python SDK, set up your development environment with a workspace. Once your environment is set up, attach to the workspace in your Python script:

    APPLIES TO: Python SDK azure-ai-ml v2 (current)

    Run this code to connect to your Azure ML workspace.

    Replace your Subscription ID, Resource Group name and Workspace name in the code below. To find these values:

    1. Sign in to Azure Machine Learning studio.
    2. Open the workspace you wish to use.
    3. In the upper right Azure Machine Learning studio toolbar, select your workspace name.
    4. Copy the value for workspace, resource group and subscription ID into the code.
    5. If you're using a notebook inside studio, you'll need to copy one value, close the area and paste, then come back for the next one.
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"
    # get a handle to the workspace
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace
    )

    ml_client is a handler to the workspace that you'll use to manage other resources and jobs.

Manage

Start, stop, restart, and delete a compute instance. A compute instance doesn't always automatically scale down, so make sure to stop the resource to prevent ongoing charges. Stopping a compute instance deallocates it. Then start it again when you need it. While stopping the compute instance stops the billing for compute hours, you'll still be billed for disk, public IP, and standard load balancer.

You can enable automatic shutdown to automatically stop the compute instance after a specified time.

You can also create a schedule for the compute instance to automatically start and stop based on a time and day of week.

Tip

The compute instance has 120GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before you stop or restart the compute instance. Please do not stop the compute instance by issuing sudo shutdown from the terminal. The temp disk size on compute instance depends on the VM size chosen and is mounted on /mnt.

APPLIES TO: Python SDK azure-ai-ml v2 (current)

In the examples below, the name of the compute instance is stored in the variable ci_basic_name.

  • Get status

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Get compute
    ci_basic_state = ml_client.compute.get(ci_basic_name)
  • Stop

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Stop compute
    ml_client.compute.begin_stop(ci_basic_name).wait()
  • Start

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Start compute
    ml_client.compute.begin_start(ci_basic_name).wait()
  • Restart

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Restart compute
    ml_client.compute.begin_restart(ci_basic_name).wait()
  • Delete

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    ml_client.compute.begin_delete(ci_basic_name).wait()

Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, restart a compute instance. All users in the workspace contributor and owner role can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user who has root access. That user has access to Jupyter/JupyterLab/RStudio running on the instance. Compute instance will have single-user sign-in and all actions will use that user's identity for Azure RBAC and attribution of experiment jobs. SSH access is controlled through public/private key mechanism.

These actions can be controlled by Azure RBAC:

  • Microsoft.MachineLearningServices/workspaces/computes/read
  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/computes/delete
  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action

To create a compute instance, you'll need permissions for the following actions:

  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action

Audit and observe compute instance version

Once a compute instance is deployed, it does not get automatically updated. Microsoft releases new VM images on a monthly basis. To understand options for keeping recent with the latest version, see vulnerability management.

To keep track of whether an instance's operating system version is current, you could query its version using the CLI, SDK or Studio UI.

APPLIES TO: Python SDK azure-ai-ml v2 (current)

from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Display operating system version
instance = ml_client.compute.get("myci")
print instance.os_image_metadata

For more information on the classes, methods, and parameters used in this example, see the following reference documents:

IT administrators can use Azure Policy to monitor the inventory of instances across workspaces in Azure Policy compliance portal. Assign the built-in policy Audit Azure Machine Learning Compute Instances with an outdated operating system on an Azure subscription or Azure management group scope.

Next steps