Train models with Azure Machine Learning CLI, SDK, and REST API

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

Azure Machine Learning provides multiple ways to submit ML training jobs. In this article, you'll learn how to submit jobs using the following methods:

  • Azure CLI extension for machine learning: The ml extension, also referred to as CLI v2.
  • Python SDK v2 for Azure Machine Learning.
  • REST API: The API that the CLI and SDK are built on.

Prerequisites

To follow the SDK examples, install the Azure Machine Learning SDK v2 for Python.

Clone the examples repository

The code snippets in this article are based on examples in the Azure Machine Learning examples GitHub repo. To clone the repository to your development environment, use the following command:

git clone --depth 1 https://github.com/Azure/azureml-examples

Tip

Use --depth 1 to clone only the latest commit of the repository, which reduces the time needed to complete the operation.

Example job

The examples in this article use the iris flower dataset to train an MLflow model.

Train in the cloud

When training in the cloud, you must connect to your Azure Machine Learning workspace and select a compute resource that will be used to run the training job.

1. Connect to the workspace

To connect to the workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. Pass these details to MLClient from the azure.ai.ml namespace to get a handle to the required Azure Machine Learning workspace. To authenticate, the example uses the default Azure authentication, DefaultAzureCredential from azure.identity. Check this example for more details on how to configure credentials and connect to a workspace.

# Import the required libraries
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Enter the details of your Azure Machine Learning workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AZUREML_WORKSPACE_NAME>'

# Connect to the workspace
ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)
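If you'd rather not hard-code the workspace identifiers, one option is to read them from environment variables. The following is a minimal sketch, not part of the original sample: the helper workspace_details_from_env and the three environment variable names are hypothetical and chosen only for illustration. The dictionary keys match the MLClient keyword arguments subscription_id, resource_group_name, and workspace_name.

```python
import os

# Hypothetical environment variable names (an assumption for this sketch);
# the keys are the keyword arguments that MLClient accepts.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def workspace_details_from_env(env=None):
    """Return MLClient keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    # Report every missing or empty variable at once instead of failing on the first.
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in REQUIRED_VARS.items()}
```

With the variables set, the client could then be created with MLClient(DefaultAzureCredential(), **workspace_details_from_env()).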

2. Create a compute resource for training

Note

To try serverless compute, skip this step and proceed to 3. Submit the training job.

An Azure Machine Learning compute cluster is a fully managed compute resource that can be used to run the training job. The following example checks whether a compute cluster named cpu-cluster exists and creates it if it doesn't.

from azure.ai.ml.entities import AmlCompute

# specify aml compute name.
cpu_compute_target = "cpu-cluster"

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print("Creating a new cpu compute target...")
    compute = AmlCompute(
        name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4
    )
    ml_client.compute.begin_create_or_update(compute).result()

3. Submit the training job

To run the training script, you use a command that executes the main.py Python script located under ./sdk/python/jobs/single-step/lightgbm/iris/src/. You run the command by submitting it as a job to Azure Machine Learning.

Note

To use serverless compute, delete compute="cpu-cluster" in this code.

from azure.ai.ml import command, Input

# define the command
command_job = command(
    code="./src",
    command="python main.py --iris-csv ${{inputs.iris_csv}} --learning-rate ${{inputs.learning_rate}} --boosting ${{inputs.boosting}}",
    environment="AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest",
    inputs={
        "iris_csv": Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/iris.csv",
        ),
        "learning_rate": 0.9,
        "boosting": "gbdt",
    },
    compute="cpu-cluster",
)
# submit the command
returned_job = ml_client.jobs.create_or_update(command_job)
# print a URL for the status of the job (a bare expression only displays in a notebook)
print(returned_job.studio_url)
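The note above says that serverless compute is selected by leaving out the compute argument. If you switch between a cluster and serverless compute often, a small helper can make that choice explicit. This is a sketch under assumptions: with_compute is a hypothetical helper, and job_args stands for a dictionary holding the same keyword arguments that the sample passes to command().

```python
def with_compute(job_args, compute=None):
    """Return a copy of job_args with the compute target set or left out.

    Leaving out the "compute" key corresponds to the serverless option
    described in the note above.
    """
    # Copy everything except any existing compute entry.
    args = {key: value for key, value in job_args.items() if key != "compute"}
    if compute:
        args["compute"] = compute
    return args


# Placeholder arguments for illustration; a real job also needs command,
# environment, and inputs as in the sample above.
job_args = {"code": "./src", "compute": "cpu-cluster"}
serverless_args = with_compute(job_args)
cluster_args = with_compute(job_args, compute="cpu-cluster")
```

The result would then be unpacked into the call, for example command(**serverless_args).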

In the above examples, you configured:

  • code - the path to the folder that contains the code to run.
  • command - the command to run.
  • environment - the environment needed to run the training script. This example uses a curated (ready-made) environment provided by Azure Machine Learning called AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu. The @latest directive selects the latest version of this environment. You can also use a custom environment by specifying a base Docker image and a conda YAML file on top of it.
  • inputs - a dictionary of name-value pairs passed to the command. The key is the name of the input within the context of the job, and the value is the input value. Inputs are referenced in the command with the ${{inputs.<input_name>}} expression. To use files or folders as inputs, use the Input class. For more information, see SDK and CLI v2 expressions.
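To make the ${{inputs.<input_name>}} expression more concrete, the following sketch substitutes literal input values into a command string. It only illustrates the idea and is not how the service resolves expressions: render_command is a hypothetical helper, and inputs created with the Input class (files or folders) are resolved by the service at run time, so they are left out here.

```python
import re

# Matches ${{inputs.<name>}}, allowing optional spaces inside the braces.
_INPUT_REF = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def render_command(template, inputs):
    """Substitute ${{inputs.<name>}} placeholders with literal input values."""

    def replace(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"No input named {name!r}")
        return str(inputs[name])

    return _INPUT_REF.sub(replace, template)


template = (
    "python main.py --learning-rate ${{inputs.learning_rate}} "
    "--boosting ${{inputs.boosting}}"
)
rendered = render_command(template, {"learning_rate": 0.9, "boosting": "gbdt"})
print(rendered)
# prints: python main.py --learning-rate 0.9 --boosting gbdt
```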

For more information, see the reference documentation.

When you submit the job, the returned job object includes a URL to the job's status page in Azure Machine Learning studio. Use the studio UI to view the job's progress. You can also use returned_job.status to check the status of the job at the time it was returned.
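If you want a script to wait for the job to finish, you can poll the status in a loop. The sketch below is one way to do that, under assumptions: wait_for_job is a hypothetical helper, and the set of terminal status strings is assumed rather than taken from this article. The helper takes a callable so it stays independent of the SDK.

```python
import time

# Status values assumed to mean the job has finished (an assumption of this sketch).
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status.

    Raises TimeoutError if the job has not finished after timeout_seconds
    of accumulated polling delay.
    """
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in status {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the SDK, a call might look like wait_for_job(lambda: ml_client.jobs.get(returned_job.name).status). As an alternative, ml_client.jobs.stream(returned_job.name) blocks and streams the job logs until the job ends.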

Register the trained model

The following examples demonstrate how to register a model in your Azure Machine Learning workspace.

Tip

The name property returned by the training job is used as part of the path to the model.

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

run_model = Model(
    path="azureml://jobs/{}/outputs/artifacts/paths/model/".format(returned_job.name),
    name="run-model-example",
    description="Model created from run.",
    type=AssetTypes.MLFLOW_MODEL
)

ml_client.models.create_or_update(run_model)
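As the tip above notes, the job name becomes part of the model path. If you register models from several jobs, building that path in one place can help avoid typos. This is a small sketch: job_output_model_path is a hypothetical helper, and the default folder name model is taken from the path used in the sample above.

```python
def job_output_model_path(job_name, artifact_path="model"):
    """Build the azureml:// URI of a model folder in a job's output artifacts."""
    if not job_name:
        raise ValueError("job_name must be a non-empty string")
    # Normalize slashes so the URI always ends with exactly one "/".
    folder = artifact_path.strip("/")
    return f"azureml://jobs/{job_name}/outputs/artifacts/paths/{folder}/"


# "my_job_name" is a placeholder; in the sample above it would be returned_job.name.
example_path = job_output_model_path("my_job_name")
print(example_path)
# prints: azureml://jobs/my_job_name/outputs/artifacts/paths/model/
```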

Next steps

Now that you have a trained model, learn how to deploy it using an online endpoint.

For more examples, see the Azure Machine Learning examples GitHub repository.

For more information on the Azure CLI commands, Python SDK classes, or REST APIs used in this article, see the corresponding reference documentation.