Create and manage an Azure Machine Learning compute instance

Learn how to create and manage a compute instance in your Azure Machine Learning workspace.

Use a compute instance as your fully configured and managed development environment in the cloud. For development and testing, you can also use the instance as a training compute target or as an inference target. A compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a compute instance cannot be shared with other users in your workspace.

In this article, you learn how to create a compute instance and how to manage it (start, stop, restart, and delete).

Compute instances can run jobs securely in a virtual network environment, without requiring enterprises to open up SSH ports. The job executes in a containerized environment and packages your model dependencies in a Docker container.

Prerequisites

Create

Important

Items marked (preview) below are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Time estimate: Approximately 5 minutes.

Creating a compute instance is a one-time process for your workspace. You can reuse the compute as a development workstation or as a compute target for training. You can have multiple compute instances attached to your workspace.

The dedicated cores per region per VM family quota and the total regional quota, which apply to compute instance creation, are unified and shared with the Azure Machine Learning training compute cluster quota. Stopping the compute instance does not release quota, which ensures that you can restart the compute instance. You cannot change the virtual machine size of a compute instance once it is created.

The following example demonstrates how to create a compute instance:

from azureml.core import Workspace
from azureml.core.compute import ComputeInstance
from azureml.core.compute_target import ComputeTargetException

# Load the workspace (assumes a config.json for your workspace is available locally)
ws = Workspace.from_config()

# Choose a name for your instance
# Compute instance name should be unique across the Azure region
compute_name = "ci{}".format(ws._workspace_id)[:10]

# Verify that instance does not exist already
try:
    instance = ComputeInstance(workspace=ws, name=compute_name)
    print('Found existing instance, use it.')
except ComputeTargetException:
    # No instance with this name was found, so create one
    compute_config = ComputeInstance.provisioning_configuration(
        vm_size='STANDARD_D3_V2',
        ssh_public_access=False,
        # vnet_resourcegroup_name='<my-resource-group>',
        # vnet_name='<my-vnet-name>',
        # subnet_name='default',
        # admin_user_ssh_public_key='<my-sshkey>'
    )
    instance = ComputeInstance.create(ws, compute_name, compute_config)
    # Block until provisioning finishes and print progress
    instance.wait_for_completion(show_output=True)

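For more information on the classes, methods, and parameters used in this example, see the reference documentation for the ComputeInstance class and its provisioning_configuration, create, and wait_for_completion methods.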

You can also create a compute instance with an Azure Resource Manager template.

Enable SSH access

SSH access is disabled by default. SSH access cannot be changed after creation. Make sure to enable access if you plan to debug interactively with VS Code Remote.

After you have selected Next: Advanced Settings:

  1. Turn on Enable SSH access.
  2. In the SSH public key source, select one of the options from the dropdown:
    • If you Generate new key pair:
      1. Enter a name for the key in Key pair name.
      2. Select Create.
      3. Select Download private key and create compute. The key is usually downloaded into the Downloads folder.
    • If you select Use existing public key stored in Azure, search for and select the key in Stored key.
    • If you select Use existing public key, provide an RSA public key in the single-line format (starting with "ssh-rsa") or the multi-line PEM format. You can generate SSH keys using ssh-keygen on Linux and macOS, or PuTTYGen on Windows.

Once the compute instance is created and running, see Connect with SSH access.
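
Before you paste an existing key into the form, it can help to check that it is in the single-line format described above. The following is a minimal sketch that uses only the Python standard library. The function name looks_like_ssh_rsa_public_key is hypothetical and is not part of any Azure SDK. It checks the "ssh-rsa" prefix and that the base64 payload begins with the matching algorithm name, as in the OpenSSH public key encoding. It does not verify that the key is cryptographically valid.

```python
import base64
import binascii
import struct


def looks_like_ssh_rsa_public_key(line: str) -> bool:
    """Return True if `line` looks like a single-line OpenSSH RSA public key."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        return False
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except (binascii.Error, ValueError):
        return False
    # The decoded blob starts with a 4-byte big-endian length followed by
    # the algorithm name, which must match the text prefix.
    if len(blob) < 4:
        return False
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == b"ssh-rsa"


# Build a structurally valid (but cryptographically meaningless) sample key.
sample_blob = struct.pack(">I", 7) + b"ssh-rsa" + b"\x00" * 16
sample_key = "ssh-rsa " + base64.b64encode(sample_blob).decode("ascii") + " user@host"

print(looks_like_ssh_rsa_public_key(sample_key))   # True
print(looks_like_ssh_rsa_public_key("not a key"))  # False
```

A check like this only catches copy-and-paste mistakes, such as pasting the private key or a truncated line.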

Create on behalf of (preview)

As an administrator, you can create a compute instance on behalf of a data scientist and assign the instance to them.

The data scientist you create the compute instance for needs the following Azure role-based access control (Azure RBAC) permissions:

  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action
  • Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action
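
One way to grant exactly these permissions is a custom role. The following sketch builds a role definition document in the JSON shape commonly used for Azure custom roles (Name, IsCustom, Description, Actions, NotActions, AssignableScopes). The role name, the description, and the scope placeholder are hypothetical and must be replaced with your own values.

```python
import json

# The five actions listed above.
REQUIRED_ACTIONS = [
    "Microsoft.MachineLearningServices/workspaces/computes/start/action",
    "Microsoft.MachineLearningServices/workspaces/computes/stop/action",
    "Microsoft.MachineLearningServices/workspaces/computes/restart/action",
    "Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action",
    "Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action",
]


def build_role_definition(role_name: str, scope: str) -> dict:
    """Build a custom role definition that grants only the required actions."""
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Use a compute instance that was created on the user's behalf.",
        "Actions": list(REQUIRED_ACTIONS),
        "NotActions": [],
        "AssignableScopes": [scope],
    }


# Hypothetical name and placeholder scope; replace before use.
role = build_role_definition(
    "Compute Instance User (example)",
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>",
)
print(json.dumps(role, indent=2))
```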

The data scientist can start, stop, and restart the compute instance. They can use the compute instance for:

  • Jupyter
  • JupyterLab
  • RStudio
  • Integrated notebooks

Schedule automatic start and stop (preview)

Define multiple schedules for auto-shutdown and auto-start. For instance, create a schedule to start at 9 AM and stop at 6 PM from Monday through Thursday, and a second schedule to start at 9 AM and stop at 4 PM on Friday. You can create a total of four schedules per compute instance.

Schedules can also be defined for create on behalf of compute instances. You can create a schedule to create a compute instance in a stopped state. This is particularly useful when a user creates a compute instance on behalf of another user.
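
As an illustration, the weekday example above can be written as cron entries in the schedule format that this article uses for Resource Manager templates. This is a sketch under assumptions: the helper name cron_entry is hypothetical, UTC is chosen arbitrarily as the time zone, and each start or stop entry is counted as one schedule.

```python
import json


def cron_entry(action: str, hour: int, days_of_week: str, time_zone: str = "UTC") -> dict:
    """Build one start or stop entry; days_of_week uses cron numbering (Sunday=0)."""
    return {
        "triggerType": "Cron",
        "cron": {
            "timeZone": time_zone,
            "expression": f"0 {hour} * * {days_of_week}",
        },
        "action": action,
        "status": "Enabled",
    }


schedules = {
    "computeStartStop": [
        cron_entry("Start", 9, "1-4"),   # Monday through Thursday, 9 AM
        cron_entry("Stop", 18, "1-4"),   # Monday through Thursday, 6 PM
        cron_entry("Start", 9, "5"),     # Friday, 9 AM
        cron_entry("Stop", 16, "5"),     # Friday, 4 PM
    ]
}

print(json.dumps(schedules, indent=2))
```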

Create a schedule in studio

  1. Fill out the form.

  2. On the second page of the form, open Show advanced settings.

  3. Select Add schedule to add a new schedule.

    Screenshot: Add schedule in advanced settings.

  4. Select Start compute instance or Stop compute instance.

  5. Select the Time zone.

  6. Select the Startup time or Shutdown time.

  7. Select the days when this schedule is active.

    Screenshot: schedule a compute instance to shut down.

  8. Select Add schedule again if you want to create another schedule.

Once the compute instance is created, you can view, edit, or add new schedules from the compute instance details section. Note that time zone labels don't account for daylight saving time. For instance, (UTC+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna is actually UTC+02:00 during daylight saving time.

Create a schedule with a Resource Manager template

You can schedule the automatic start and stop of a compute instance by using a Resource Manager template.

In the Resource Manager template, add:

"schedules": "[parameters('schedules')]"

Then use either cron or Logic Apps expressions to define the schedule that starts or stops the instance in your parameter file:

"schedules": {
  "value": {
    "computeStartStop": [
      {
        "triggerType": "Cron",
        "cron": {
          "timeZone": "UTC",
          "expression": "0 18 * * *"
        },
        "action": "Stop",
        "status": "Enabled"
      },
      {
        "triggerType": "Cron",
        "cron": {
          "timeZone": "UTC",
          "expression": "0 8 * * *"
        },
        "action": "Start",
        "status": "Enabled"
      },
      {
        "triggerType": "Recurrence",
        "recurrence": {
          "frequency": "Day",
          "interval": 1,
          "timeZone": "UTC",
          "schedule": {
            "hours": [17],
            "minutes": [0]
          }
        },
        "action": "Stop",
        "status": "Enabled"
      }
    ]
  }
}
  • action can have a value of “Start” or “Stop”.

  • For a trigger type of Recurrence, use the same syntax as a logic app, following the Logic Apps recurrence schema.

  • For trigger type of cron, use standard cron syntax:

    // Crontab expression format: 
    // 
    // * * * * * 
    // - - - - - 
    // | | | | | 
    // | | | | +----- day of week (0 - 6) (Sunday=0) 
    // | | | +------- month (1 - 12) 
    // | | +--------- day of month (1 - 31) 
    // | +----------- hour (0 - 23) 
    // +------------- min (0 - 59) 
    // 
    // Star (*) in the value field above means all legal values, as shown in
    // parentheses for that column. The value column can have a * or a list
    // of elements separated by commas. An element is either a number in 
    // the ranges shown above or two numbers in the range separated by a 
    // hyphen (meaning an inclusive range). 
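
To check what a given expression selects, the field rules described above (a star, a comma-separated list, or an inclusive range) can be sketched in Python. This is an illustrative parser for the five-field format shown here only. It is not the parser the service uses, and the function names are hypothetical.

```python
# (field name, lowest legal value, highest legal value), in crontab order.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(text: str, low: int, high: int) -> set:
    """Expand one cron field into the set of values it selects."""
    if text == "*":
        return set(range(low, high + 1))
    values = set()
    for element in text.split(","):
        if "-" in element:
            start, end = (int(part) for part in element.split("-", 1))
        else:
            start = end = int(element)
        if not (low <= start <= end <= high):
            raise ValueError(f"{element!r} is outside {low}-{high}")
        values.update(range(start, end + 1))
    return values


def expand_cron(expression: str) -> dict:
    """Expand a five-field cron expression into a dict of value sets."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected five space-separated fields")
    return {
        name: expand_field(part, low, high)
        for part, (name, low, high) in zip(parts, FIELDS)
    }


expanded = expand_cron("0 18 * * 1-5")
print(sorted(expanded["hour"]))         # [18]
print(sorted(expanded["day_of_week"]))  # [1, 2, 3, 4, 5]
```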
    

Azure Policy support to default a schedule

Use Azure Policy to enforce that a shutdown schedule exists for every compute instance in a subscription, or to apply a default schedule if none exists. The following sample policy applies a default shutdown schedule at 10 PM PST.

{
    "mode": "All",
    "policyRule": {
     "if": {
      "allOf": [
       {
        "field": "Microsoft.MachineLearningServices/workspaces/computes/computeType",
        "equals": "ComputeInstance"
       },
       {
        "field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
        "exists": "false"
       }
      ]
     },
     "then": {
      "effect": "append",
      "details": [
       {
        "field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
        "value": {
         "computeStartStop": [
          {
           "triggerType": "Cron",
           "cron": {
            "startTime": "2021-03-10T21:21:07",
            "timeZone": "Pacific Standard Time",
            "expression": "0 22 * * *"
           },
           "action": "Stop",
           "status": "Enabled"
          }
         ]
        }
       }
      ]
     }
    }
}    
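
Read as plain logic, the if block above matches compute resources whose computeType is ComputeInstance and that have no schedules property, and the append effect then adds the default schedule. The following sketch mimics that logic on a plain dictionary for illustration only. It is not how Azure Policy evaluates resources, and the dictionary shape and function name are assumptions.

```python
import copy

# The default schedule taken from the sample policy above.
DEFAULT_SCHEDULES = {
    "computeStartStop": [
        {
            "triggerType": "Cron",
            "cron": {
                "startTime": "2021-03-10T21:21:07",
                "timeZone": "Pacific Standard Time",
                "expression": "0 22 * * *",
            },
            "action": "Stop",
            "status": "Enabled",
        }
    ]
}


def apply_default_schedule(properties: dict) -> dict:
    """Return a copy of `properties` with the default schedule appended if missing."""
    result = copy.deepcopy(properties)
    is_compute_instance = result.get("computeType") == "ComputeInstance"
    if is_compute_instance and "schedules" not in result:
        result["schedules"] = copy.deepcopy(DEFAULT_SCHEDULES)
    return result


without_schedule = apply_default_schedule({"computeType": "ComputeInstance"})
print(without_schedule["schedules"]["computeStartStop"][0]["action"])  # Stop

existing = {"computeType": "ComputeInstance", "schedules": {"computeStartStop": []}}
print(apply_default_schedule(existing) == existing)  # True
```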

Customize the compute instance with a script (preview)

Use a setup script for an automated way to customize and configure the compute instance at provisioning time. As an administrator, you can write a customization script to be used to provision all compute instances in the workspace according to your requirements.

Some examples of what you can do in a setup script:

  • Install packages, tools, and software
  • Mount data
  • Create custom conda environment and Jupyter kernels
  • Clone git repositories and set git config
  • Set network proxies
  • Set environment variables
  • Install JupyterLab extensions

Create the setup script

The setup script is a shell script that runs as the root user. Create or upload the script into your Notebooks files:

  1. Sign in to the studio and select your workspace.
  2. On the left, select Notebooks.
  3. Use the Add files tool to create or upload your setup shell script. Make sure the script filename ends in ".sh". When you create a new file, also change the File type to bash(.sh).

Screenshot: Create or upload your setup script to Notebooks files in studio.

When the script runs, the current working directory of the script is the directory where it was uploaded. For example, if you upload the script to Users>admin, the location of the script on the compute instance and the current working directory when the script runs is /home/azureuser/cloudfiles/code/Users/admin. This lets you use relative paths in the script.
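
The mapping from the upload folder to the working directory can be sketched as a simple path join. The root path is taken from the example above, and the helper name is hypothetical.

```python
import posixpath

# Root of the notebooks file share on the compute instance, per the example above.
CODE_ROOT = "/home/azureuser/cloudfiles/code"


def script_working_directory(upload_folder: str) -> str:
    """Map a Notebooks folder such as 'Users/admin' to its path on the instance."""
    segments = [segment for segment in upload_folder.split("/") if segment]
    return posixpath.join(CODE_ROOT, *segments)


print(script_working_directory("Users/admin"))
# /home/azureuser/cloudfiles/code/Users/admin
```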

Script arguments can be referred to in the script as $1, $2, etc.

If your script does something specific to azureuser, such as installing a conda environment or a Jupyter kernel, put those commands within a sudo -u azureuser block like this:

#!/bin/bash

set -e

# This script installs a pip package in compute instance azureml_py38 environment.

sudo -u azureuser -i <<'EOF'

PACKAGE=numpy
ENVIRONMENT=azureml_py38 
conda activate "$ENVIRONMENT"
pip install "$PACKAGE"
conda deactivate
EOF

The command sudo -u azureuser -i starts a login shell for azureuser, which changes the current working directory to /home/azureuser. You also can't access the script arguments in this block.

For other example scripts, see azureml-examples.

You can also use the following environment variables in your script:

  1. CI_RESOURCE_GROUP
  2. CI_WORKSPACE
  3. CI_NAME
  4. CI_LOCAL_UBUNTU_USER. This points to azureuser.

You can use a setup script in conjunction with Azure Policy to either enforce or default a setup script for every compute instance creation. The default value for the setup script timeout is 15 minutes. You can change it through the studio UI or through ARM templates by using the DURATION parameter. DURATION is a floating-point number with an optional suffix: 's' for seconds (the default), 'm' for minutes, 'h' for hours, or 'd' for days.
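
To make the DURATION format concrete, the following sketch converts a DURATION string to seconds according to the rules stated above. It is an illustration only. The function name is hypothetical, and details such as whether the service accepts surrounding whitespace are assumptions.

```python
import re

# Seconds per unit for each supported suffix.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# A non-negative decimal number followed by an optional unit suffix.
_DURATION_PATTERN = re.compile(r"^(\d+(?:\.\d*)?|\.\d+)([smhd]?)$")


def duration_to_seconds(text: str) -> float:
    """Convert a DURATION string such as '15m' or '1.5h' to seconds."""
    match = _DURATION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid duration: {text!r}")
    number, suffix = match.groups()
    # A missing suffix means seconds, which is the default unit.
    return float(number) * _UNIT_SECONDS[suffix or "s"]


print(duration_to_seconds("15m"))   # 900.0
print(duration_to_seconds("1.5h"))  # 5400.0
print(duration_to_seconds("90"))    # 90.0
```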

Use the script in the studio

Once you store the script, specify it during creation of your compute instance:

  1. Sign in to the studio and select your workspace.
  2. On the left, select Compute.
  3. Select +New to create a new compute instance.
  4. Fill out the form.
  5. On the second page of the form, open Show advanced settings.
  6. Turn on Provision with setup script.
  7. Browse to the shell script you saved, or upload a script from your computer.
  8. Add command arguments as needed.

Screenshot: Provision a compute instance with a setup script in the studio.

If workspace storage is attached to a virtual network, you might not be able to access the setup script file unless you are accessing the studio from within the virtual network.

Use script in a Resource Manager template

In a Resource Manager template, add setupScripts to invoke the setup script when the compute instance is provisioned. For example:

"setupScripts":{
    "scripts":{
        "creationScript":{
            "scriptSource":"workspaceStorage",
            "scriptData":"[parameters('creationScript.location')]",
            "scriptArguments":"[parameters('creationScript.cmdArguments')]"
        }
    }
}

In the example above, scriptData specifies the location of the creation script in the notebooks file share, such as Users/admin/testscript.sh. scriptArguments is optional and specifies the arguments for the creation script.

You could instead provide the script inline for a Resource Manager template. The shell command can refer to any dependencies uploaded into the notebooks file share. When you use an inline string, the working directory for the script is /mnt/batch/tasks/shared/LS_root/mounts/clusters/ciname/code/Users.

For example, specify a base64-encoded command string for scriptData:

"setupScripts":{
    "scripts":{
        "creationScript":{
            "scriptSource":"inline",
            "scriptData":"[base64(parameters('inlineCommand'))]",
            "scriptArguments":"[parameters('creationScript.cmdArguments')]"
        }
    }
}
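
In the template, the base64() template function does the encoding for you. To see what an encoded value looks like, or to decode one while debugging, the equivalent can be sketched in Python. The inline command below is a hypothetical example, and UTF-8 is assumed as the text encoding.

```python
import base64

# Hypothetical inline command; replace with your own.
inline_command = "pip install numpy"

# Encode the command the way a base64-encoded scriptData value would carry it.
encoded = base64.b64encode(inline_command.encode("utf-8")).decode("ascii")

# Decode it again to confirm that the round trip preserves the command.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == inline_command)  # True
```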

Setup script logs

Logs from the setup script execution appear in the logs folder on the compute instance details page. Logs are stored back to your notebooks file share under the Logs<compute instance name> folder. The script file and command arguments for a particular compute instance are shown on the details page.

Manage

Start, stop, restart, and delete a compute instance. A compute instance does not automatically scale down, so make sure to stop the resource to prevent ongoing charges. Stopping a compute instance deallocates it. Then start it again when you need it. While stopping the compute instance stops the billing for compute hours, you will still be billed for disk, public IP, and standard load balancer.

You can create a schedule for the compute instance to automatically start and stop based on a time and day of week.

Tip

The compute instance has a 120 GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before you stop or restart the compute instance. Do not stop the compute instance by issuing sudo shutdown from the terminal. The temp disk size on a compute instance depends on the VM size chosen, and the temp disk is mounted on /mnt.
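
To see how much space is left before you stop or restart, you can check free space from a notebook or terminal on the compute instance. The following is a minimal sketch that uses only the Python standard library. The helper name is hypothetical, and /mnt is checked only if it exists.

```python
import os
import shutil

GIB = 1024 ** 3


def free_gib(path: str) -> float:
    """Return the free space on the filesystem that contains `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


# "/" is the OS disk; "/mnt" is the temp disk on a compute instance.
for mount in ("/", "/mnt"):
    if os.path.exists(mount):
        print(f"{mount}: {free_gib(mount):.1f} GiB free")
```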

In the examples below, the compute instance is referenced by the variable instance.

  • Get status

    # get_status() gets the latest status of the ComputeInstance target
    instance.get_status()
    
  • Stop

    # stop() is used to stop the ComputeInstance
    # Stopping ComputeInstance will stop the billing meter and persist the state on the disk.
    # Available Quota will not be changed with this operation.
    instance.stop(wait_for_completion=True, show_output=True)
    
  • Start

    # start() is used to start the ComputeInstance if it is in stopped state
    instance.start(wait_for_completion=True, show_output=True)
    
  • Restart

    # restart() is used to restart the ComputeInstance
    instance.restart(wait_for_completion=True, show_output=True)
    
  • Delete

    # delete() is used to delete the ComputeInstance target. Useful if you want to re-use the compute name
    instance.delete(wait_for_completion=True, show_output=True)
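
The calls above can be combined. Because a compute instance does not scale down on its own, a script might stop it only when it is running. The following sketch is duck-typed so that it runs without the SDK: FakeInstance is a hypothetical stand-in for a ComputeInstance, and the assumption that get_status() returns an object with a state attribute equal to "Running" for a running instance should be checked against the SDK reference.

```python
from types import SimpleNamespace


def stop_if_running(instance) -> bool:
    """Stop the instance if its reported state is 'Running'; return True if stopped."""
    status = instance.get_status()
    # Assumption: the status object exposes a `state` string.
    if getattr(status, "state", None) == "Running":
        instance.stop(wait_for_completion=True, show_output=True)
        return True
    return False


class FakeInstance:
    """Hypothetical stand-in that mimics the calls used in this article."""

    def __init__(self, state: str):
        self._state = state

    def get_status(self):
        return SimpleNamespace(state=self._state)

    def stop(self, wait_for_completion=False, show_output=False):
        self._state = "Stopped"


fake = FakeInstance("Running")
print(stop_if_running(fake))  # True
print(stop_if_running(fake))  # False
```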
    

Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, or restart a compute instance. All users in the workspace contributor and owner roles can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user who has root access and can open a terminal through Jupyter, JupyterLab, or RStudio. The compute instance has single-user sign-in, and all actions use that user's identity for Azure RBAC and attribution of experiment runs. SSH access is controlled through a public/private key mechanism.

These actions can be controlled by Azure RBAC:

  • Microsoft.MachineLearningServices/workspaces/computes/read
  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/computes/delete
  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action

To create a compute instance you'll need permissions for the following actions:

  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action

Next steps