Train models with Azure Machine Learning

APPLIES TO: Python SDK azure-ai-ml v2 (current)

Azure Machine Learning provides several ways to train your models, from code-first solutions using the SDK to low-code solutions such as automated machine learning and the visual designer. Use the following list to determine which training method is right for you:

  • Azure Machine Learning SDK for Python: The Python SDK provides several ways to train models, each with different capabilities.

    Training method Description
    command() A typical way to train models is to submit a command() that includes a training script, environment, and compute information.
    Automated machine learning Automated machine learning allows you to train models without extensive data science or programming knowledge. For people with a data science and programming background, it provides a way to save time and resources by automating algorithm selection and hyperparameter tuning. You don't have to worry about defining a job configuration when using automated machine learning.
    Machine learning pipeline Pipelines are not a different training method, but a way of defining a workflow using modular, reusable steps that can include training as part of the workflow. Machine learning pipelines support using automated machine learning and run configuration to train models. Since pipelines are not focused specifically on training, the reasons for using a pipeline are more varied than the other training methods. Generally, you might use a pipeline when:
    * You want to schedule unattended processes such as long running training jobs or data preparation.
    * Use multiple steps that are coordinated across heterogeneous compute resources and storage locations.
    * Use the pipeline as a reusable template for specific scenarios, such as retraining or batch scoring.
    * Track and version data sources, inputs, and outputs for your workflow.
    * Your workflow is implemented by different teams that work on specific steps independently. Steps can then be joined together in a pipeline to implement the workflow.
  • Designer: Azure Machine Learning designer provides an easy entry-point into machine learning for building proof of concepts, or for users with little coding experience. It allows you to train models using a drag and drop web-based UI. You can use Python code as part of the design, or train models without writing any code.

  • Azure CLI: The machine learning CLI provides commands for common tasks with Azure Machine Learning, and is often used for scripting and automating tasks. For example, once you've created a training script or pipeline, you might use the Azure CLI to start a training job on a schedule or when the data files used for training are updated. For training models, it provides commands that submit training jobs. It can submit jobs using run configurations or pipelines.

Each of these training methods can use different types of compute resources for training. Collectively, these resources are referred to as compute targets. A compute target can be a local machine or a cloud resource, such as an Azure Machine Learning Compute, Azure HDInsight, or a remote virtual machine.

Python SDK

The Azure Machine Learning SDK for Python allows you to build and run machine learning workflows with Azure Machine Learning. You can interact with the service from an interactive Python session, Jupyter Notebooks, Visual Studio Code, or other IDE.

Submit a command

A generic training job with Azure Machine Learning can be defined using the command(). The command is then used, along with your training script(s) to train a model on the specified compute target.

You may start with a command for your local computer, and then switch to one for a cloud-based compute target as needed. When changing the compute target, you only change the compute parameter in the command that you use. A run also logs information about the training job, such as the inputs, outputs, and logs.

Automated Machine Learning

Define the iterations, hyperparameter settings, featurization, and other settings. During training, Azure Machine Learning tries different algorithms and parameters in parallel. Training stops once it hits the exit criteria you defined.

Tip

In addition to the Python SDK, you can also use Automated ML through Azure Machine Learning studio.

Machine learning pipeline

Machine learning pipelines can use the previously mentioned training methods. Pipelines are more about creating a workflow, so they encompass more than just the training of models.

Understand what happens when you submit a training job

The Azure training lifecycle consists of:

  1. Zipping the files in your project folder and upload to the cloud.

    Tip

    To prevent unnecessary files from being included in the snapshot, make an ignore file (.gitignore or .amlignore) in the directory. Add the files and directories to exclude to this file. For more information on the syntax to use inside this file, see syntax and patterns for .gitignore. The .amlignore file uses the same syntax. If both files exist, the .amlignore file is used and the .gitignore file is unused.

  2. Scaling up your compute cluster (or serverless compute

  3. Building or downloading the dockerfile to the compute node

    1. The system calculates a hash of:
    2. The system uses this hash as the key in a lookup of the workspace Azure Container Registry (ACR)
    3. If it is not found, it looks for a match in the global ACR
    4. If it is not found, the system builds a new image (which will be cached and registered with the workspace ACR)
  4. Downloading your zipped project file to temporary storage on the compute node

  5. Unzipping the project file

  6. The compute node executing python <entry script> <arguments>

  7. Saving logs, model files, and other files written to ./outputs to the storage account associated with the workspace

  8. Scaling down compute, including removing temporary storage

Azure Machine Learning designer

The designer lets you train models using a drag and drop interface in your web browser.

Azure CLI

The machine learning CLI is an extension for the Azure CLI. It provides cross-platform CLI commands for working with Azure Machine Learning. Typically, you use the CLI to automate tasks, such as training a machine learning model.

VS Code

You can use the VS Code extension to run and manage your training jobs. See the VS Code resource management how-to guide to learn more.

Next steps

Learn how to Tutorial: Create production ML pipelines with Python SDK v2 in a Jupyter notebook.