Set up a Python development environment for Azure Machine Learning

Learn how to configure a Python development environment for Azure Machine Learning.

The following table shows each development environment covered in this article, along with pros and cons.

| Environment | Pros | Cons |
| --- | --- | --- |
| Local environment | Full control of your development environment and dependencies. Run with any build tool, environment, or IDE of your choice. | Takes longer to get started. Necessary SDK packages must be installed, and an environment must also be installed if you don't already have one. |
| The Data Science Virtual Machine (DSVM) | Similar to the cloud-based compute instance (Python and the SDK are pre-installed), but with additional popular data science and machine learning tools pre-installed. Easy to scale and combine with other custom tools and workflows. | A slower getting started experience compared to the cloud-based compute instance. |
| Azure Machine Learning compute instance | Easiest way to get started. The entire SDK is already installed in your workspace VM, and notebook tutorials are pre-cloned and ready to run. | Lack of control over your development environment and dependencies. Additional cost incurred for Linux VM (VM can be stopped when not in use to avoid charges). See pricing details. |
| Azure Databricks | Ideal for running large-scale intensive machine learning workflows on the scalable Apache Spark platform. | Overkill for experimental machine learning, or smaller-scale experiments and workflows. Additional cost incurred for Azure Databricks. See pricing details. |

This article also provides additional usage tips for the following tools:

  • Jupyter Notebooks: If you're already using Jupyter Notebooks, the SDK has some extras that you should install.

  • Visual Studio Code: If you use Visual Studio Code, the Azure Machine Learning extension includes extensive language support for Python, as well as features that make working with Azure Machine Learning much more convenient and productive.

Prerequisites

Local and DSVM only: Create a workspace configuration file

The workspace configuration file is a JSON file that tells the SDK how to communicate with your Azure Machine Learning workspace. The file is named config.json, and it has the following format:

{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}

This JSON file must be in the directory structure that contains your Python scripts or Jupyter Notebooks. It can be in the same directory, a subdirectory named .azureml, or in a parent directory.
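
The search order can be sketched as follows. Note that `find_config` is an illustrative helper written for this article, not part of the SDK:

```python
import json
from pathlib import Path

def find_config(start="."):
    """Sketch of the search order described above: look for config.json in the
    starting directory, in an .azureml subdirectory, then repeat for each
    parent directory. Returns the parsed JSON, or None if no file is found."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        for candidate in (directory / "config.json",
                          directory / ".azureml" / "config.json"):
            if candidate.is_file():
                return json.loads(candidate.read_text())
    return None  # no configuration file found
```

This is why a single config.json placed at the root of a project is picked up by scripts and notebooks anywhere in the directory tree below it.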

To use this file from your code, use the Workspace.from_config method. This code loads the information from the file and connects to your workspace.

Create a workspace configuration file using one of the following methods:

  • Azure portal

    Download the file: In the Azure portal, select Download config.json from the Overview section of your workspace.


  • Azure Machine Learning Python SDK

    Create a script to connect to your Azure Machine Learning workspace and use the write_config method to generate your file and save it as .azureml/config.json. Make sure to replace subscription_id, resource_group, and workspace_name with your own values.

    from azureml.core import Workspace
    
    subscription_id = '<subscription-id>'
    resource_group = '<resource-group>'
    workspace_name = '<workspace-name>'
    
    try:
        ws = Workspace(subscription_id=subscription_id,
                       resource_group=resource_group,
                       workspace_name=workspace_name)
        # Writes .azureml/config.json; later scripts can reconnect with Workspace.from_config()
        ws.write_config()
        print('Library configuration succeeded')
    except Exception:
        print('Workspace not found')
    

Local computer or remote VM environment

You can set up an environment on a local computer or remote virtual machine, such as an Azure Machine Learning compute instance or Data Science VM.

To configure a local development environment or remote VM:

  1. Create a Python virtual environment (virtualenv, conda).

    Note

    Although not required, it's recommended you use Anaconda or Miniconda to manage Python virtual environments and install packages.

    Important

    If you're on Linux or macOS and use a shell other than bash (for example, zsh) you might receive errors when you run some commands. To work around this problem, use the bash command to start a new bash shell and run the commands there.

  2. Activate your newly created Python virtual environment.

  3. Install the Azure Machine Learning Python SDK.

  4. To configure your local environment to use your Azure Machine Learning workspace, create a workspace configuration file or use an existing one.
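
Steps 1-3 can be scripted as follows, here using the built-in venv module as an alternative to conda (the directory name .venv is just an example):

```shell
# 1. Create a Python virtual environment (venv shown here; conda also works)
python3 -m venv .venv

# 2. Activate the newly created environment
. .venv/bin/activate

# 3. Install the Azure Machine Learning Python SDK
pip install azureml-core
```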

Now that you have your local environment set up, you're ready to start working with Azure Machine Learning. See the Azure Machine Learning Python getting started guide to get started.

Jupyter Notebooks

When running a local Jupyter Notebook server, it's recommended that you create an IPython kernel for your Python virtual environment. This helps ensure the expected kernel and package import behavior.

  1. Enable environment-specific IPython kernels

    conda install notebook ipykernel
    
  2. Create a kernel for your Python virtual environment. Make sure to replace <myenv> with the name of your Python virtual environment.

    ipython kernel install --user --name <myenv> --display-name "Python (myenv)"
    
  3. Launch the Jupyter Notebook server:

    jupyter notebook

See the Azure Machine Learning notebooks repository to get started with Azure Machine Learning and Jupyter Notebooks.

Note

A community-driven repository of examples can be found at https://github.com/Azure/azureml-examples.

Visual Studio Code

To use Visual Studio Code for development:

  1. Install Visual Studio Code.
  2. Install the Azure Machine Learning Visual Studio Code extension (preview).

Once you have the Visual Studio Code extension installed, you can manage your Azure Machine Learning resources, run and debug experiments, and deploy trained models.

Azure Machine Learning compute instance

The Azure Machine Learning compute instance is a secure, cloud-based Azure workstation that provides data scientists with a Jupyter Notebook server, JupyterLab, and a fully managed machine learning environment.

There is nothing to install or configure for a compute instance.

Create one anytime from within your Azure Machine Learning workspace. Provide just a name and specify an Azure VM type. Try it now with this Tutorial: Setup environment and workspace.

To learn more about compute instances, including how to install packages, see Create and manage an Azure Machine Learning compute instance.

Tip

To prevent incurring charges for an unused compute instance, stop the compute instance.

In addition to a Jupyter Notebook server and JupyterLab, you can use compute instances with the integrated notebook feature inside Azure Machine Learning studio.

You can also use the Azure Machine Learning Visual Studio Code extension to configure an Azure Machine Learning compute instance as a remote Jupyter Notebook server.

Data Science Virtual Machine

The Data Science VM is a customized virtual machine (VM) image you can use as a development environment. It's designed for data science work and comes pre-configured with tools and software like:

  • Packages such as TensorFlow, PyTorch, Scikit-learn, XGBoost, and the Azure Machine Learning SDK
  • Popular data science tools such as Spark Standalone and Drill
  • Azure tools such as the Azure CLI, AzCopy, and Storage Explorer
  • Integrated development environments (IDEs) such as Visual Studio Code and PyCharm
  • Jupyter Notebook Server

For a more comprehensive list of the tools, see the Data Science VM tools guide.

Important

If you plan to use the Data Science VM as a compute target for your training or inferencing jobs, only Ubuntu is supported.

To use the Data Science VM as a development environment:

  1. Create a Data Science VM using one of the following methods:

    • Use the Azure portal to create an Ubuntu or Windows DSVM.

    • Create a Data Science VM using ARM templates.

    • Use the Azure CLI

      To create an Ubuntu Data Science VM, use the following command:

      # create an Ubuntu Data Science VM in your resource group
      # note you need to be at least a contributor to the resource group in order to execute this command successfully
      # If you need to create a new resource group use: "az group create --name YOUR-RESOURCE-GROUP-NAME --location YOUR-REGION (For example: westus2)"
      az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --generate-ssh-keys --authentication-type password
      

      To create a Windows DSVM, use the following command:

      # create a Windows Server 2016 DSVM in your resource group
      # note you need to be at least a contributor to the resource group in order to execute this command successfully
      az vm create --resource-group YOUR-RESOURCE-GROUP-NAME --name YOUR-VM-NAME --image microsoft-dsvm:dsvm-windows:server-2016:latest --admin-username YOUR-USERNAME --admin-password YOUR-PASSWORD --authentication-type password
      
  2. Activate the conda environment containing the Azure Machine Learning SDK.

    • For Ubuntu Data Science VM:

      conda activate py36
      
    • For Windows Data Science VM:

      conda activate AzureML
      
  3. To configure the Data Science VM to use your Azure Machine Learning workspace, create a workspace configuration file or use an existing one.

Similar to local environments, you can use Visual Studio Code and the Azure Machine Learning Visual Studio Code extension to interact with Azure Machine Learning.

For more information, see Data Science Virtual Machines.

Next steps