What is an Azure Machine Learning compute instance?
An Azure Machine Learning compute instance is a managed cloud-based workstation for data scientists.
Compute instances make it easy to get started with Azure Machine Learning development as well as provide management and enterprise readiness capabilities for IT administrators.
Use a compute instance as your fully configured and managed development environment in the cloud for machine learning. Compute instances can also be used as a compute target for training and inferencing for development and testing purposes.
For production-grade model training, use an Azure Machine Learning compute cluster with multi-node scaling capabilities. For production-grade model deployment, use an Azure Kubernetes Service cluster.
For compute instance Jupyter functionality to work, ensure that WebSocket communication is not disabled. Your network must allow WebSocket connections to *.instances.azureml.net and *.instances.azureml.ms.
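When auditing a proxy or firewall allowlist, it can help to check candidate hostnames against the two wildcard patterns named above. The following is a minimal sketch using only the Python standard library; the function name and the sample hostnames are hypothetical and chosen for illustration.

```python
from fnmatch import fnmatchcase

# Host patterns quoted in the text above.
WEBSOCKET_HOST_PATTERNS = ("*.instances.azureml.net", "*.instances.azureml.ms")


def is_compute_instance_host(hostname: str) -> bool:
    """Return True if hostname matches one of the WebSocket host patterns."""
    # Hostnames are case-insensitive; drop any trailing root dot.
    host = hostname.strip().lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in WEBSOCKET_HOST_PATTERNS)


# Hypothetical hostnames, for illustration only.
for candidate in ("myci.westus2.instances.azureml.ms", "example.com"):
    print(candidate, is_compute_instance_host(candidate))
```

This only checks names against the patterns. It does not test whether the network actually lets WebSocket traffic through.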
Why use a compute instance?
A compute instance is a fully managed cloud-based workstation optimized for your machine learning development environment. It provides the following benefits:
Key benefits | Description |
---|---|
Productivity | You can build and deploy models using integrated notebooks and the following tools in Azure Machine Learning studio: Jupyter, JupyterLab, and RStudio (preview). Compute instance is fully integrated with Azure Machine Learning workspace and studio. You can share notebooks and data with other data scientists in the workspace. You can also use VS Code with compute instances. |
Managed & secure | Reduce your security footprint and add compliance with enterprise security requirements. Compute instances provide robust management policies and secure networking configurations such as: autoprovisioning from Resource Manager templates or Azure Machine Learning SDK; Azure role-based access control (Azure RBAC); virtual network support; SSH policy to enable/disable SSH access; TLS 1.2 enabled. |
Preconfigured for ML | Save time on setup tasks with preconfigured and up-to-date ML packages, deep learning frameworks, and GPU drivers. |
Fully customizable | Broad support for Azure VM types, including GPUs, and persisted low-level customization, such as installing packages and drivers, make advanced scenarios a breeze. |
You can create a compute instance yourself, or an administrator can create a compute instance for you.
Tools and environments
Important
Items marked (preview) in this article are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Azure Machine Learning compute instance enables you to author, train, and deploy models in a fully integrated notebook experience in your workspace.
You can run Jupyter notebooks in VS Code using a compute instance as the remote server with no SSH needed. You can also enable VS Code integration through the remote SSH extension.
You can install packages and add kernels to your compute instance.
The following tools and environments are already installed on the compute instance:
General tools & environments | Details |
---|---|
Drivers | CUDA, cuDNN, NVIDIA, Blob FUSE |
Intel MPI library | |
Azure CLI | |
Azure Machine Learning samples | |
Docker | |
Nginx | |
NCCL 2.0 | |
Protobuf | |
R tools & environments | Details |
---|---|
RStudio Server Open Source Edition (preview) | |
R kernel | |
Azure Machine Learning SDK for R | azuremlsdk, SDK samples |
Python tools & environments | Details |
---|---|
Anaconda Python | |
Jupyter and extensions | |
JupyterLab and extensions | |
Azure Machine Learning SDK for Python from PyPI | Includes most of the azureml extra packages. To see the full list, open a terminal window on your compute instance and run: conda list -n azureml_py36 azureml* |
Other PyPI packages | jupytext, tensorboard, nbconvert, notebook, Pillow |
Conda packages | cython, numpy, ipykernel, scikit-learn, matplotlib, tqdm, joblib, nodejs, nb_conda_kernels |
Deep learning packages | PyTorch, TensorFlow, Keras, Horovod, MLflow, pandas-ml, scrapbook |
ONNX packages | keras2onnx, onnx, onnxconverter-common, skl2onnx, onnxmltools |
Azure Machine Learning Python & R SDK samples | |
Python packages are all installed in the Python 3.6 - AzureML environment.
Accessing files
Notebooks and R scripts are stored in the default storage account of your workspace in Azure file share. These files are located under your “User files” directory. This storage makes it easy to share notebooks between compute instances. The storage account also keeps your notebooks safely preserved when you stop or delete a compute instance.
The Azure file share account of your workspace is mounted as a drive on the compute instance. This drive is the default working directory for Jupyter, JupyterLab, and RStudio. This means that the notebooks and other files you create in Jupyter, JupyterLab, or RStudio are automatically stored on the file share and available to use in other compute instances as well.
The files in the file share are accessible from all compute instances in the same workspace. Any changes to these files on the compute instance will be reliably persisted back to the file share.
You can also clone the latest Azure Machine Learning samples to your folder under the user files directory in the workspace file share.
Writing small files can be slower on network drives than writing to the compute instance local disk itself. If you are writing many small files, try using a directory directly on the compute instance, such as the /tmp directory. Note that these files will not be accessible from other compute instances.
You can use the /tmp directory on the compute instance for your temporary data. However, do not write large data files to the OS disk of the compute instance. Use datastores instead. Installing the JupyterLab git extension can also slow down compute instance performance.
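The small-file advice above can be sketched as follows: write the many small files to local scratch space, then copy a single archive to the shared location. Bundling the results into one archive is an assumption of this sketch, not something the article prescribes. The function name and the destination directory are hypothetical. `tempfile.TemporaryDirectory()` uses the system temporary directory, which is typically /tmp on Linux.

```python
import shutil
import tempfile
from pathlib import Path


def write_small_files_locally(count: int, shared_dir: Path) -> Path:
    """Write `count` small files to local scratch, then copy one zip to shared_dir."""
    with tempfile.TemporaryDirectory() as scratch:
        work = Path(scratch) / "parts"
        work.mkdir()
        # Many small writes go to the local disk, not the network drive.
        for i in range(count):
            (work / f"part_{i:05d}.txt").write_text(f"record {i}\n")
        # Bundle everything so only one file is written to the shared drive.
        archive = shutil.make_archive(str(Path(scratch) / "parts"), "zip", root_dir=work)
        return Path(shutil.copy(archive, shared_dir))


# A second temporary directory stands in for a folder on the mounted file share.
with tempfile.TemporaryDirectory() as fake_share:
    result = write_small_files_locally(100, Path(fake_share))
    print(result.name)
```

The scratch directory is deleted when the function returns, so only the archive remains.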
Managing a compute instance
In your workspace in Azure Machine Learning studio, select Compute, then select Compute Instance on the top.
You can perform the following actions:
- Create a compute instance.
- Refresh the compute instances tab.
- Start, stop, and restart a compute instance. You pay for the instance whenever it is running, so stop it when you are not using it to reduce cost, and start it again when you need it. Stopping a compute instance deallocates it. Stopping the compute instance stops billing for compute hours, but you are still billed for disk, public IP, and standard load balancer.
- Delete a compute instance.
- Filter the list of compute instances to show only those you have created.
For each compute instance in your workspace that you can use, you can:
- Access Jupyter, JupyterLab, and RStudio on the compute instance.
- SSH into the compute instance. SSH access is disabled by default but can be enabled at compute instance creation time. SSH access uses a public/private key mechanism. The tab gives you details for the SSH connection, such as IP address, username, and port number.
- Get details about a specific compute instance, such as IP address and region.
Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, and restart a compute instance. All users in the workspace contributor and owner roles can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user who has root access and can open a terminal through Jupyter, JupyterLab, or RStudio. The compute instance has single-user sign-in, and all actions use that user’s identity for Azure RBAC and attribution of experiment runs. SSH access is controlled through a public/private key mechanism.
These actions can be controlled by Azure RBAC:
- Microsoft.MachineLearningServices/workspaces/computes/read
- Microsoft.MachineLearningServices/workspaces/computes/write
- Microsoft.MachineLearningServices/workspaces/computes/delete
- Microsoft.MachineLearningServices/workspaces/computes/start/action
- Microsoft.MachineLearningServices/workspaces/computes/stop/action
- Microsoft.MachineLearningServices/workspaces/computes/restart/action
To create a compute instance you need to have permissions for the following actions:
- Microsoft.MachineLearningServices/workspaces/computes/write
- Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action
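To check whether a set of granted actions covers the two actions required for creation, a small helper such as the one below may be useful. It is a simplified sketch. It treats `*` in a granted action as a wildcard, which Azure role definitions allow, but it ignores NotActions, scopes, and deny assignments. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Actions the text lists as required to create a compute instance.
REQUIRED_TO_CREATE = (
    "Microsoft.MachineLearningServices/workspaces/computes/write",
    "Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action",
)


def missing_actions(granted, required=REQUIRED_TO_CREATE):
    """Return the required actions not covered by any granted action pattern."""
    # Compare case-insensitively by lowercasing both sides.
    patterns = [g.lower() for g in granted]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# A grant on computes/* alone does not cover the name-availability check.
print(missing_actions(["Microsoft.MachineLearningServices/workspaces/computes/*"]))
```

An empty list means every required action is covered by at least one granted pattern.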
Create a compute instance
In your workspace in Azure Machine Learning studio, create a new compute instance from either the Compute section or in the Notebooks section when you are ready to run one of your notebooks.
You can also create an instance:
- Directly from the integrated notebooks experience
- In Azure portal
- From Azure Resource Manager template. For an example template, see the create an Azure Machine Learning compute instance template.
- With Azure Machine Learning SDK
- From the CLI extension for Azure Machine Learning
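As an illustration of the SDK option in the list above, the following sketch uses the `ComputeInstance` and `ComputeTarget` classes from the Azure Machine Learning SDK for Python (the `azureml-core` package). The wrapper function, the instance name, and the VM size are assumptions chosen for illustration. Running it requires the SDK and a workspace configuration.

```python
def create_compute_instance(workspace, name, vm_size="STANDARD_DS3_V2"):
    """Provision a compute instance and block until provisioning finishes."""
    # Imported lazily so this module still loads where azureml-core is absent.
    from azureml.core.compute import ComputeInstance, ComputeTarget

    config = ComputeInstance.provisioning_configuration(
        vm_size=vm_size,           # example size; pick one your quota allows
        ssh_public_access=False,   # SSH stays disabled, as it is by default
    )
    instance = ComputeTarget.create(workspace, name, config)
    instance.wait_for_completion(show_output=True)
    return instance


# Example usage (not executed here; the instance name is hypothetical):
# from azureml.core import Workspace
# ws = Workspace.from_config()
# create_compute_instance(ws, "my-compute-instance")
```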
The dedicated cores per region per VM family quota and the total regional quota, which apply to compute instance creation, are unified and shared with the Azure Machine Learning training compute cluster quota. Stopping the compute instance does not release quota, which ensures that you can restart the compute instance.
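The shared-quota rule can be shown with a small calculation. All numbers below are hypothetical. The point is that cluster cores and compute instance cores draw from the same VM-family quota, and that a stopped instance still counts.

```python
def remaining_cores(family_quota, cluster_cores, instance_cores):
    """Cores left in a VM-family quota shared by clusters and compute instances.

    instance_cores must include stopped instances, because stopping an
    instance does not release its quota.
    """
    used = sum(cluster_cores) + sum(instance_cores)
    return family_quota - used


# Hypothetical numbers: a 24-core quota, one 8-core cluster allocation,
# one running 4-core instance, and one stopped 4-core instance.
print(remaining_cores(24, cluster_cores=[8], instance_cores=[4, 4]))  # 8
```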
Create on behalf of (preview)
As an administrator, you can create a compute instance on behalf of a data scientist and assign the instance to them with:
- Azure Resource Manager template. For details on how to find the TenantID and ObjectID needed in this template, see Find identity object IDs for authentication configuration. You can also find these values in the Azure Active Directory portal.
- REST API
The data scientist you create the compute instance for needs the following Azure RBAC permissions:
- Microsoft.MachineLearningServices/workspaces/computes/start/action
- Microsoft.MachineLearningServices/workspaces/computes/stop/action
- Microsoft.MachineLearningServices/workspaces/computes/restart/action
- Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action
The data scientist can start, stop, and restart the compute instance. They can use the compute instance for:
- Jupyter
- JupyterLab
- RStudio
- Integrated notebooks
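One way to grant the four actions listed above is through a custom Azure role. The sketch below builds a role definition in the JSON shape that Azure custom roles use (`Name`, `IsCustom`, `Description`, `Actions`, `NotActions`, `AssignableScopes`). The role name, description, and scope placeholder are hypothetical and should be replaced with your own values.

```python
import json

# Actions the text lists for a data scientist assigned a compute instance.
ASSIGNED_USER_ACTIONS = [
    "Microsoft.MachineLearningServices/workspaces/computes/start/action",
    "Microsoft.MachineLearningServices/workspaces/computes/stop/action",
    "Microsoft.MachineLearningServices/workspaces/computes/restart/action",
    "Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action",
]


def build_role_definition(role_name, scope):
    """Return a custom role definition as a plain dict."""
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Start, stop, restart, and use an assigned compute instance.",
        "Actions": list(ASSIGNED_USER_ACTIONS),
        "NotActions": [],
        "AssignableScopes": [scope],
    }


# Hypothetical role name and placeholder scope.
role = build_role_definition("Compute Instance User (example)", "/subscriptions/<subscription-id>")
print(json.dumps(role, indent=2))
```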
Compute target
Compute instances can be used as a training compute target similar to Azure Machine Learning compute training clusters.
A compute instance:
- Has a job queue.
- Runs jobs securely in a virtual network environment, without requiring enterprises to open an SSH port. The job executes in a containerized environment and packages your model dependencies in a Docker container.
- Can run multiple small jobs in parallel (preview). Two jobs per core can run in parallel while the rest of the jobs are queued.
- Supports single-node multi-GPU distributed training jobs
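The parallel-job rule in the list above (two jobs per core, with the rest queued) can be illustrated with a short calculation. This is a simplified model for intuition only, not a description of the actual scheduler. The function name and inputs are hypothetical.

```python
def split_jobs(cores, submitted, jobs_per_core=2):
    """Return (running, queued) counts for `submitted` jobs on `cores` cores."""
    slots = cores * jobs_per_core
    running = min(submitted, slots)
    return running, submitted - running


# On a 4-core instance, 11 submitted jobs give 8 running and 3 queued.
print(split_jobs(cores=4, submitted=11))  # (8, 3)
```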
You can use a compute instance as a local inferencing deployment target for test/debug scenarios.
What happened to Notebook VM?
Compute instances are replacing the Notebook VM.
Any notebook files stored in the workspace file share and data in workspace datastores will be accessible from a compute instance. However, any custom packages previously installed on a Notebook VM will need to be reinstalled on the compute instance. Quota limitations that apply to compute cluster creation also apply to compute instance creation.
New Notebook VMs cannot be created. However, you can still access and use Notebook VMs you have created, with full functionality. Compute instances can be created in the same workspace as the existing Notebook VMs.
Next steps
- Create and manage a compute instance
- Tutorial: Train your first ML model shows how to use a compute instance with an integrated notebook.