Train and register a Keras classification model with Azure Machine Learning

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

This article shows you how to train and register a Keras classification model built on TensorFlow using Azure Machine Learning. It uses the popular MNIST dataset to classify handwritten digits with a deep neural network (DNN) built with the Keras Python library running on top of TensorFlow.

Keras is a high-level neural network API capable of running on top of other popular DNN frameworks to simplify development. With Azure Machine Learning, you can rapidly scale out training jobs using elastic cloud compute resources. You can also track your training runs, version models, deploy models, and much more.

Whether you're developing a Keras model from the ground up or bringing an existing model into the cloud, Azure Machine Learning can help you build production-ready models.

See the conceptual article for information on the differences between machine learning and deep learning.


Run this code on either of these environments:

Set up the experiment

This section sets up the training experiment by loading the required Python packages, initializing a workspace, creating an experiment, and uploading the training data and training scripts.

Import packages

First, import the necessary Python libraries.

import os
import azureml
from azureml.core import Experiment
from azureml.core import Workspace, Run
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

Initialize a workspace

The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a workspace object.

Create a workspace object from the config.json file created in the prerequisites section.

ws = Workspace.from_config()
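
Workspace.from_config() reads the workspace coordinates from config.json. As a minimal sketch, assuming the file holds the keys subscription_id, resource_group, and workspace_name, the hypothetical helpers below (they are not part of the SDK) check a configuration for missing values before you call from_config():

```python
import json

# Keys a workspace config.json is assumed to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_config(path="config.json"):
    """Read a workspace configuration file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_config_keys(config):
    """Return the required keys that are absent or empty in config.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```

An empty list from missing_config_keys(load_config()) suggests the file has every value that the sketch checks for.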

Create an experiment

Create an experiment called "keras-mnist" in your workspace.

exp = Experiment(workspace=ws, name='keras-mnist')

Create a file dataset

A FileDataset object references one or multiple files in your workspace datastore or public URLs. The files can be of any format, and the class lets you download or mount the files to your compute. By creating a FileDataset, you create a reference to the data source location. If you applied any transformations to the dataset, they are stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred. See the how-to guide on the Dataset package for more information.

from azureml.core.dataset import Dataset

web_paths = [
dataset = Dataset.File.from_files(path=web_paths)

Use the register() method to register the dataset to your workspace so it can be shared with others, reused across various experiments, and referred to by name in your training script.

dataset = dataset.register(workspace=ws,
                           name='mnist dataset',
                           description='training and test dataset',

Create a compute target

Create a compute target for your TensorFlow job to run on. In this example, create a GPU-enabled Azure Machine Learning compute cluster.

cluster_name = "gpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',

    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)


You may choose to use low-priority VMs to run some or all of your workloads. See how to create a low-priority VM.

For more information on compute targets, see the what is a compute target article.

Create a TensorFlow estimator and import Keras

The TensorFlow estimator provides a simple way of launching TensorFlow training jobs on a compute target. Since Keras runs on top of TensorFlow, you can use the TensorFlow estimator and import the Keras library using the pip_packages argument.

First, get the data from the workspace datastore using the Dataset class.

dataset = Dataset.get_by_name(ws, 'mnist dataset')

# list the files referenced by mnist dataset

The TensorFlow estimator is implemented through the generic estimator class, which can be used to support any framework. Additionally, create a dictionary script_params that contains the DNN hyperparameter settings. For more information about training models using the generic estimator, see train models with Azure Machine Learning using estimator.

from azureml.train.dnn import TensorFlow

script_params = {
    '--data-folder': dataset.as_named_input('mnist').as_mount(),
    '--batch-size': 50,
    '--first-layer-neurons': 300,
    '--second-layer-neurons': 100,
    '--learning-rate': 0.001
}

est = TensorFlow(source_directory=script_folder,
                 pip_packages=['keras', 'matplotlib'],
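
Each key in script_params reaches the training script as a command-line argument. The training script itself is not shown in this article. As a hedged sketch, a script could read those values with argparse as follows. The function name, the dest names, and the defaults are assumptions made for illustration.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the hyperparameters named in script_params.

    The dest names and default values are assumptions for this sketch.
    """
    parser = argparse.ArgumentParser(description="Keras MNIST training (sketch)")
    parser.add_argument("--data-folder", dest="data_folder", type=str,
                        help="folder where the dataset is mounted")
    parser.add_argument("--batch-size", dest="batch_size", type=int, default=50)
    parser.add_argument("--first-layer-neurons", dest="n_hidden_1",
                        type=int, default=300)
    parser.add_argument("--second-layer-neurons", dest="n_hidden_2",
                        type=int, default=100)
    parser.add_argument("--learning-rate", dest="learning_rate",
                        type=float, default=0.001)
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Inside the training script, args = parse_training_args() would then expose the mounted data path as args.data_folder and the hyperparameters as typed attributes.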

Submit a run

The Run object provides the interface to the run history while the job is running and after it has completed.

run = exp.submit(est)

As the Run is executed, it goes through the following stages:

  • Preparing: A Docker image is created according to the TensorFlow estimator. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.

  • Scaling: The cluster attempts to scale up if the Batch AI cluster requires more nodes to execute the run than are currently available.

  • Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the entry_script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.

  • Post-Processing: The ./outputs folder of the run is copied over to the run history.
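
To watch a run move through these stages from a script, one option is to poll the Run object's get_status() method. The sketch below assumes that "Completed", "Failed", and "Canceled" are the terminal status strings; watch_run and its parameters are hypothetical names, not part of the SDK. The SDK's own run.wait_for_completion(show_output=True) is the built-in blocking alternative.

```python
import time

# Status strings assumed to mean the run has finished.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def watch_run(run, poll_seconds=10.0, max_polls=360, sleep=time.sleep):
    """Poll run.get_status() and return each status change in order.

    Stops at a terminal status or after max_polls polls, whichever
    comes first. Hypothetical helper for illustration only.
    """
    seen = []
    for _ in range(max_polls):
        status = run.get_status()
        # Record and print a status only when it differs from the last one.
        if not seen or seen[-1] != status:
            seen.append(status)
            print("Run status:", status)
        if status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)
    return seen
```

Calling watch_run(run) after exp.submit(est) prints one line per status change. The sleep parameter exists only so the wait can be replaced in tests.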

Register the model

Once you've trained the DNN model, you can register it to your workspace. Model registration lets you store and version your models in your workspace to simplify model management and deployment.

model = run.register_model(model_name='keras-dnn-mnist', model_path='outputs/model')


The model you just registered is deployed exactly the same way as any other registered model in Azure Machine Learning, regardless of which estimator you used for training. The deployment how-to contains a section on registering models, but you can skip directly to creating a compute target for deployment, since you already have a registered model.

You can also download a local copy of the model, which can be useful for additional model validation work locally. In the training script, a TensorFlow saver object persists the model to a local folder (local to the compute target). You can use the Run object to download a copy from the datastore.

# Create a model folder in the current directory
os.makedirs('./model', exist_ok=True)

for f in run.get_file_names():
    if f.startswith('outputs/model'):
        output_file_path = os.path.join('./model', f.split('/')[-1])
        print('Downloading from {} to {} ...'.format(f, output_file_path))
        run.download_file(name=f, output_file_path=output_file_path)
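
The loop above keeps only the last path segment of each file, so two files with the same name in different sub-folders of outputs/model would overwrite each other locally. If the saved model contains nested folders, a variant that preserves the relative layout may be preferable. The helper below is a hypothetical sketch of that mapping and does not call the SDK.

```python
import os


def local_path_for(run_file, prefix="outputs/model/", target_dir="./model"):
    """Map a run file name to a local path, keeping sub-folders under prefix.

    Returns None when run_file is not located under prefix.
    Hypothetical helper for illustration only.
    """
    if not run_file.startswith(prefix):
        return None
    relative = run_file[len(prefix):]
    if not relative:
        return None
    # Run file names use forward slashes; rebuild with the local separator.
    return os.path.join(target_dir, *relative.split("/"))
```

In the download loop, you could skip files for which the helper returns None, create the parent folder with os.makedirs(os.path.dirname(path), exist_ok=True), and pass the result as output_file_path to run.download_file().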

Next steps

In this article, you trained and registered a Keras model on Azure Machine Learning. To learn how to deploy a model, continue on to our model deployment article.