Train models with Azure Machine Learning using estimators

With Azure Machine Learning, you can easily submit your training script to various compute targets by using a RunConfiguration object and a ScriptRunConfig object. That pattern gives you flexibility and maximum control.

The estimator class makes it easier to train models with deep learning and reinforcement learning. It provides a high-level abstraction that lets you easily construct run configurations. You can create and use a generic Estimator to submit a training script that uses any learning framework you choose (such as scikit-learn) on any compute target you choose, whether it's your local machine, a single VM in Azure, or a GPU cluster in Azure. For PyTorch, TensorFlow, Chainer, and reinforcement learning tasks, Azure Machine Learning also provides corresponding framework-specific estimators to simplify using those frameworks.

Train with an estimator

Once you've created your workspace and set up your development environment, training a model in Azure Machine Learning involves the following steps:

  1. Create a remote compute target (or use your local computer as the compute target)
  2. (Optional) Upload your training data to a datastore
  3. Create your training script
  4. Create an Estimator object
  5. Submit the estimator to an Experiment object in the workspace

This article focuses on steps 4-5. For steps 1-3, refer to the train a model tutorial for an example.

Single-node training

Use an Estimator for a single-node training run of a scikit-learn model on remote compute in Azure. You should already have created your compute target object compute_target and your FileDataset object ds.

from azureml.train.estimator import Estimator

script_params = {
    # to mount files referenced by mnist dataset
    '--data-folder': ds.as_named_input('mnist').as_mount(),
    '--regularization': 0.8
}

sk_est = Estimator(source_directory='./my-sklearn-proj',
                   script_params=script_params,
                   compute_target=compute_target,
                   entry_script='train.py',
                   conda_packages=['scikit-learn'])

This code snippet specifies the following parameters to the Estimator constructor.

| Parameter | Description |
| --- | --- |
| source_directory | Local directory that contains all of the code needed for the training job. This folder gets copied from your local machine to the remote compute. |
| script_params | Dictionary specifying the command-line arguments to pass to your training script entry_script, in the form of <command-line argument, value> pairs. To specify a verbose flag in script_params, use <command-line argument, "">. |
| compute_target | Remote compute target that your training script will run on, in this case an Azure Machine Learning Compute (AmlCompute) cluster. (Note that although an AmlCompute cluster is the most commonly used target, you can also choose other compute target types, such as Azure VMs or even your local computer.) |
| entry_script | Filepath (relative to source_directory) of the training script to run on the remote compute. This file, and any additional files it depends on, should be located in this folder. |
| conda_packages | List of Python packages needed by your training script, to be installed via conda. |

The constructor has another parameter called pip_packages that you use for any pip packages needed.
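To show how script_params and entry_script fit together, here is a hypothetical sketch of what the entry script might look like. The argument names mirror the dictionary keys above; the "model" itself is a placeholder, not the training code from this article.

```python
# train.py -- hypothetical entry script matching the script_params above
import argparse
import os
import pickle

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # '--data-folder' receives the mounted path of the 'mnist' dataset
    parser.add_argument('--data-folder', dest='data_folder', type=str)
    # '--regularization' receives the value 0.8 from script_params
    parser.add_argument('--regularization', type=float, default=0.5)
    return parser.parse_args(argv)

def main():
    args = parse_args()
    # ... load training data from args.data_folder and train a model here ...
    model = {'regularization': args.regularization}  # placeholder for a trained model
    # Files written under ./outputs are uploaded to the run history automatically.
    os.makedirs('outputs', exist_ok=True)
    with open('outputs/model.pkl', 'wb') as f:
        pickle.dump(model, f)

if __name__ == '__main__':
    main()
```

Because script_params passes plain command-line arguments, the same script also runs unchanged on your local machine for debugging.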

Now that you've created your Estimator object, submit the training job to be run on the remote compute with a call to the submit function on your Experiment object experiment.

run = experiment.submit(sk_est)


Special folders

Two folders, outputs and logs, receive special treatment by Azure Machine Learning. During training, when you write files to folders named outputs and logs relative to the root directory (./outputs and ./logs, respectively), the files automatically upload to your run history so that you have access to them once your run is finished.

To create artifacts during training (such as model files, checkpoints, data files, or plotted images) write these to the ./outputs folder.

Similarly, you can write any logs from your training run to the ./logs folder. To use Azure Machine Learning's TensorBoard integration, make sure you write your TensorBoard logs to this folder. While your run is in progress, you can launch TensorBoard and stream these logs. Later, you can also restore the logs from any of your previous runs.
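As a minimal sketch (the file names and metric value here are hypothetical), writing artifacts to these special folders needs nothing more than ordinary file I/O:

```python
import json
import os

# Anything written under ./outputs or ./logs is uploaded to the run history.
os.makedirs('outputs', exist_ok=True)
os.makedirs('logs', exist_ok=True)

metrics = {'accuracy': 0.92}  # hypothetical training metric
with open('outputs/metrics.json', 'w') as f:
    json.dump(metrics, f)

with open('logs/train.log', 'a') as f:
    f.write('epoch 1 complete\n')  # hypothetical log line
```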

For example, to download a file written to the outputs folder to your local machine after your remote training run: run.download_file(name='outputs/my_output_file', output_file_path='my_destination_path')

Distributed training and custom Docker images

There are two additional training scenarios you can carry out with the Estimator:

  • Using a custom Docker image
  • Distributed training on a multi-node cluster

The following code shows how to carry out distributed training for a Keras model. In addition, instead of using the default Azure Machine Learning images, it specifies a custom Docker image from Docker Hub, continuumio/miniconda, for training.

You should have already created your compute target object compute_target. You create the estimator as follows:

from azureml.train.estimator import Estimator
from azureml.core.runconfig import MpiConfiguration

estimator = Estimator(source_directory='./my-keras-proj',
                      compute_target=compute_target,
                      entry_script='train.py',
                      node_count=2,
                      process_count_per_node=2,
                      distributed_training=MpiConfiguration(),
                      conda_packages=['tensorflow', 'keras'],
                      custom_docker_image='continuumio/miniconda')

The above code exposes the following new parameters to the Estimator constructor:

| Parameter | Description | Default |
| --- | --- | --- |
| custom_docker_image | Name of the image you want to use. Only provide images available in public Docker repositories (in this case Docker Hub). To use an image from a private Docker repository, use the constructor's environment_definition parameter instead. | None |
| node_count | Number of nodes to use for the training job. | 1 |
| process_count_per_node | Number of processes (or "workers") to run on each node. In this case, you use the two GPUs available on each node. | 1 |
| distributed_training | MpiConfiguration object for launching distributed training with the MPI backend. | None |
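The total number of MPI worker processes launched is node_count multiplied by process_count_per_node. A quick arithmetic sketch, assuming a two-node cluster with two workers per node (matching the per-node GPU note above):

```python
node_count = 2               # nodes in the cluster (assumed for this example)
process_count_per_node = 2   # one worker per GPU on each node
total_workers = node_count * process_count_per_node
print(total_workers)  # 4 MPI worker processes across the cluster
```

Sizing process_count_per_node to the number of GPUs per node is the usual choice, so that each worker gets a dedicated GPU.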

Finally, submit the training job:

run = experiment.submit(estimator)

Registering a model

Once you've trained the model, you can save and register it to your workspace. Model registration lets you store and version your models in your workspace to simplify model management and deployment.

Running the following code registers the model to your workspace and makes it available to reference by name in remote compute contexts or deployment scripts. See register_model in the reference docs for more information and additional parameters.

model = run.register_model(model_name='sklearn-sample', model_path=None)

GitHub tracking and integration

When you start a training run where the source directory is a local Git repository, information about the repository is stored in the run history. For more information, see Git integration for Azure Machine Learning.


For a notebook that trains a scikit-learn model by using estimator, see:

For notebooks on training models by using deep-learning-framework specific estimators, see:

Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.

Next steps