Tutorial: Get started with Azure Machine Learning in your development environment (part 1 of 4)
In this four-part tutorial series, you'll learn the fundamentals of Azure Machine Learning and complete jobs-based Python machine learning tasks on the Azure cloud platform.
In part 1 of this tutorial series, you will:
- Install the Azure Machine Learning SDK.
- Set up the directory structure for code.
- Create an Azure Machine Learning workspace.
- Configure your local development environment.
- Set up a compute cluster.
This tutorial series focuses on the Azure Machine Learning concepts required to submit batch jobs, where code is submitted to the cloud to run in the background without any user interaction. Batch jobs are useful for finished scripts or code that you want to run repeatedly, and for compute-intensive machine learning tasks. If you're more interested in an exploratory workflow, you can instead use Jupyter or RStudio on an Azure Machine Learning compute instance.
Prerequisites
- An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try Azure Machine Learning.
- Anaconda or Miniconda to manage Python virtual environments and install packages.
- If you're not familiar with using conda, see Getting started with conda.
Install the Azure Machine Learning SDK
Throughout this tutorial, you'll use the Azure Machine Learning SDK for Python. To avoid Python dependency issues, you'll create an isolated environment. This tutorial series uses conda to create that environment. If you prefer another solution, such as venv, virtualenv, or Docker, make sure you use a Python version >=3.5 and <3.9.
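If you manage the environment with a tool other than conda, a quick check like the following can confirm that the interpreter falls within the range stated above. This is a minimal sketch: the helper name `is_supported_python` is hypothetical and is not part of the Azure Machine Learning SDK.

```python
import sys

# Supported range stated in this tutorial: >=3.5 and <3.9.
MIN_VERSION = (3, 5)
MAX_VERSION_EXCLUSIVE = (3, 9)


def is_supported_python(version_info=None):
    """Return True if the (major, minor, ...) version is in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported_python():
        print("Python " + current + " is within the supported range.")
    else:
        print("Python " + current + " is outside the supported range (>=3.5, <3.9).")
```

Run the script with the interpreter from the environment you plan to use, so that it reports that environment's version rather than the system default.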
Check if you have conda installed on your system:
conda --version
If this command returns a conda not found error, download and install Miniconda.
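The same check can be scripted, for example as part of a setup script. The sketch below uses only the Python standard library; `find_conda` is a hypothetical helper name. It only searches PATH, so it may report "not found" on systems where conda is available solely inside an Anaconda Prompt or as a shell function.

```python
import shutil


def find_conda():
    """Return the full path to the conda executable, or None if it is not on PATH."""
    return shutil.which("conda")


if __name__ == "__main__":
    conda_path = find_conda()
    if conda_path is None:
        print("conda was not found on PATH; install Miniconda first.")
    else:
        print("Found conda at " + conda_path)
```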
After you've installed conda, use a terminal or Anaconda Prompt window to create a new environment:
conda create -n tutorial python=3.8
Next, install the Azure Machine Learning SDK into the conda environment you created:
conda activate tutorial
pip install azureml-core
It takes approximately 2 minutes for the Azure Machine Learning SDK install to complete.
If you get a timeout error, try pip install --default-timeout=100 azureml-core instead.
Create a directory structure for code
We recommend that you set up the following simple directory structure for this tutorial:
- tutorial: Top-level directory of the project.
- .azureml: Hidden subdirectory for storing Azure Machine Learning configuration files.
You can create the hidden .azureml subdirectory in a terminal window. Or use the following:
- In a Mac Finder window use Command + Shift + . to toggle the ability to see and create directories that begin with a dot.
- In a Windows 10 File Explorer, see how to view hidden files and folders.
- In the Linux Graphical Interface, use Ctrl + h or the View menu and check the box to Show hidden files.
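As an alternative to the file-manager steps above, the directory structure can be created from Python, which behaves the same way on Windows, macOS, and Linux. This is a minimal sketch; `create_tutorial_layout` is a hypothetical helper name, and the `parent` argument (where the tutorial directory is placed) is an assumption for illustration.

```python
from pathlib import Path


def create_tutorial_layout(parent="."):
    """Create <parent>/tutorial/.azureml and return the path to .azureml."""
    config_dir = Path(parent) / "tutorial" / ".azureml"
    # parents=True also creates tutorial/; exist_ok=True makes reruns harmless.
    config_dir.mkdir(parents=True, exist_ok=True)
    return config_dir


if __name__ == "__main__":
    print("Created " + str(create_tutorial_layout()))
```

Because of `exist_ok=True`, running the script a second time leaves an existing structure untouched.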
Create an Azure Machine Learning workspace
A workspace is a top-level resource for Azure Machine Learning and is a centralized place to:
- Manage resources such as compute.
- Store assets like notebooks, environments, datasets, pipelines, models, and endpoints.
- Collaborate with other team members.
In the top-level directory, tutorial, add a new Python file called 01-create-workspace.py by using the following code. Adapt the parameters (name, subscription ID, resource group, and location) to your preferences.
You can run the code in an interactive session or as a Python file.
When you're using a local development environment (for example, your computer), you'll be asked to authenticate to your workspace by using a device code the first time you run the following code. Follow the on-screen instructions.
# tutorial/01-create-workspace.py
from azureml.core import Workspace

ws = Workspace.create(
    name='<my_workspace_name>',  # provide a name for your workspace
    subscription_id='<azure-subscription-id>',  # provide your subscription ID
    resource_group='<myresourcegroup>',  # provide a resource group name
    create_resource_group=True,
    location='<NAME_OF_REGION>',  # For example: 'westeurope' or 'eastus2' or 'westus2' or 'southeastasia'.
)

# write out the workspace details to a configuration file: .azureml/config.json
ws.write_config(path='.azureml')
In the window that has the activated tutorial conda environment, run this code from the tutorial directory:
cd <path/to/tutorial>
python ./01-create-workspace.py
If running this code gives you an error that you do not have access to the subscription, see Create a workspace for information on authentication options.
After you've successfully run 01-create-workspace.py, your folder structure will look like:
tutorial
└──.azureml
|  └──config.json
└──01-create-workspace.py
.azureml/config.json contains the metadata necessary to connect to your Azure Machine Learning workspace. Namely, it contains your subscription ID, resource group, and workspace name.
The contents of config.json are not secrets. It's fine to share these details.
Authentication is still required to interact with your Azure Machine Learning workspace.
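To see what was written, you can inspect the file with the standard library. The sketch below assumes the three values are stored under the keys `subscription_id`, `resource_group`, and `workspace_name`; check your own config.json if the names differ. `read_workspace_config` is a hypothetical helper, not an SDK function.

```python
import json
from pathlib import Path

# Assumed key names for the three values described above.
EXPECTED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path=".azureml/config.json"):
    """Load config.json and return only the expected connection details."""
    with Path(path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in EXPECTED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return {key: config[key] for key in EXPECTED_KEYS}


if __name__ == "__main__":
    config_path = Path(".azureml/config.json")
    if config_path.exists():
        for key, value in read_workspace_config(config_path).items():
            print(key + ": " + str(value))
    else:
        print("No config.json found; run 01-create-workspace.py first.")
```

Run it from the tutorial directory so that the relative path resolves to the file created by `ws.write_config`.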
Create an Azure Machine Learning compute cluster
Create a Python script in the tutorial top-level directory called 02-create-compute.py. Populate it with the following code to create an Azure Machine Learning compute cluster that will autoscale between zero and four nodes:
# tutorial/02-create-compute.py
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()  # This automatically looks for a directory .azureml

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that the cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_D2_V2',
        idle_seconds_before_scaledown=2400,
        min_nodes=0,
        max_nodes=4,
    )
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
In the window that has the activated tutorial conda environment, run the Python file:
python ./02-create-compute.py
When the cluster is created, it will have 0 nodes provisioned. The cluster does not incur costs until you submit a job. This cluster will scale down when it has been idle for 2,400 seconds (40 minutes).
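The idle timeout is passed in seconds, which is easy to mistype. If you'd rather think in minutes, a small conversion like the following can produce the value for `idle_seconds_before_scaledown`. This is only a sketch; `idle_seconds` is a hypothetical helper name.

```python
from datetime import timedelta


def idle_seconds(minutes):
    """Convert an idle timeout in minutes to whole seconds."""
    return int(timedelta(minutes=minutes).total_seconds())


print(idle_seconds(40))  # 2400, the value used in 02-create-compute.py
```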
Your folder structure will now look as follows:
tutorial
└──.azureml
|  └──config.json
└──01-create-workspace.py
└──02-create-compute.py
View in the studio
Sign in to Azure Machine Learning studio to view the workspace and compute cluster you created.
- Select the Subscription you used to create the workspace.
- Select the Machine Learning workspace you created (the name you supplied in 01-create-workspace.py, for example tutorial-ws).
- Once the workspace loads, on the left side, select Compute.
- At the top, select the Compute clusters tab.
This view shows the provisioned compute cluster, along with the number of idle nodes, busy nodes, and unprovisioned nodes. Since you haven't used the cluster yet, all the nodes are currently unprovisioned.
In this setup tutorial, you have:
- Created an Azure Machine Learning workspace.
- Set up your local development environment.
- Created an Azure Machine Learning compute cluster.
In the other parts of this tutorial you will learn:
- Part 2. Run code in the cloud by using the Azure Machine Learning SDK for Python.
- Part 3. Manage the Python environment that you use for model training.
- Part 4. Upload data to Azure and consume that data in training.
Continue to the next tutorial to walk through submitting a script to the Azure Machine Learning compute cluster.