Build scikit-learn models at scale with Azure Machine Learning

APPLIES TO: yesBasic edition yesEnterprise edition                    (Upgrade to Enterprise edition)

In this article, learn how to run your scikit-learn training scripts at enterprise scale using Azure Machine Learning's SKlearn estimator class.

The example scripts in this article are used to classify iris flower images to build a machine learning model based on scikit-learn's iris dataset.

Whether you're training a machine learning scikit-learn model from the ground-up or you're bringing an existing model into the cloud, you can use Azure Machine Learning to scale out open-source training jobs using elastic cloud compute resources. You can build, deploy, version and monitor production-grade models with Azure Machine Learning.

Prerequisites

Run this code on either of these environments:

  • Azure Machine Learning Notebook VM - no downloads or installation necessary

    • Complete the Tutorial: Setup environment and workspace to create a dedicated notebook server pre-loaded with the SDK and the sample repository.
    • In the samples training folder on the notebook server, find a completed and expanded notebook by navigating to this directory: how-to-use-azureml > ml-frameworks > scikit-learn > training > train-hyperparameter-tune-deploy-with-sklearn folder.
  • Your own Jupyter Notebook server

Set up the experiment

This section sets up the training experiment by loading the required python packages, initializing a workspace, creating an experiment, and uploading the training data and training scripts.

Import packages

First, import the necessary Python libraries.

import os
import urllib
import shutil
import azureml

from azureml.core import Experiment
from azureml.core import Workspace, Run

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

Initialize a workspace

The Azure Machine Learning workspace is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a workspace object.

Create a workspace object from the config.json file created in the prerequisites section.

ws = Workspace.from_config()

Create a machine learning experiment

Create an experiment and a folder to hold your training scripts. In this example, create an experiment called "sklearn-iris".

project_folder = './sklearn-iris'
os.makedirs(project_folder, exist_ok=True)

exp = Experiment(workspace=ws, name='sklearn-iris')

Prepare training script

In this tutorial, the training script train_iris.py is already provided for you. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.

To use Azure ML's tracking and metrics capabilities, add a small amount of Azure ML code inside your training script. The training script train_iris.py shows how to log some metrics to your Azure ML run using the Run object within the script.

The provided training script uses example data from the iris = datasets.load_iris() function. For your own data, you may need to use steps such as Upload dataset and scripts to make data available during training.

Copy the training script train_iris.py into your project directory.

import shutil
shutil.copy('./train_iris.py', project_folder)

Create or get a compute target

Create a compute target for your scikit-learn job to run on. Scikit-learn only supports single node, CPU computing.

The following code, creates an Azure Machine Learning managed compute (AmlCompute) for your remote training compute resource. Creation of AmlCompute takes approximately 5 minutes. If the AmlCompute with that name is already in your workspace, this code will skip the creation process.

cluster_name = "cpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', 
                                                           max_nodes=4)

    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

For more information on compute targets, see the what is a compute target article.

Create a scikit-learn estimator

The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a compute target. It is implemented through the SKLearn class, which can be used to support single-node CPU training.

If your training script needs additional pip or conda packages to run, you can have the packages installed on the resulting docker image by passing their names through the pip_packages and conda_packages arguments.

from azureml.train.sklearn import SKLearn

script_params = {
    '--kernel': 'linear',
    '--penalty': 1.0,
}

estimator = SKLearn(source_directory=project_folder, 
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='train_iris.py'
                    pip_packages=['joblib']
                   )

Submit a run

The Run object provides the interface to the run history while the job is running and after it has completed.

run = experiment.submit(estimator)
run.wait_for_completion(show_output=True)

As the Run is executed, it goes through the following stages:

  • Preparing: A docker image is created according to the TensorFlow estimator. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.

  • Scaling: The cluster attempts to scale up if the Batch AI cluster requires more nodes to execute the run than are currently available.

  • Running: All scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the entry_script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.

  • Post-Processing: The ./outputs folder of the run is copied over to the run history.

Save and register the model

Once you've trained the model, you can save and register it to your workspace. Model registration lets you store and version your models in your workspace to simplify model management and deployment.

Add the following code to your training script, train_iris.py, to save the model.

import joblib

joblib.dump(svm_model_linear, 'model.joblib')

Register the model to your workspace with the following code. By specifying the parameters model_framework, model_framework_version, and resource_configuration, no-code model deployment becomes available. This allows you to directly deploy your model as a web service from the registered model, and the ResourceConfiguration object defines the compute resource for the web service.

from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

model = run.register_model(model_name='sklearn-iris', 
                           model_path='model.joblib',
                           model_framework=Model.Framework.SCIKITLEARN,
                           model_framework_version='0.19.1',
                           resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))

Deployment

The model you just registered can be deployed the exact same way as any other registered model in Azure Machine Learning, regardless of which estimator you used for training. The deployment how-to contains a section on registering models, but you can skip directly to creating a compute target for deployment, since you already have a registered model.

(Preview) No-code model deployment

Instead of the traditional deployment route, you can also use the no-code deployment feature (preview)for scikit-learn. No-code model deployment is supported for all built-in scikit-learn model types. By registering your model as shown above with the model_framework, model_framework_version, and resource_configuration parameters, you can simply use the deploy() static function to deploy your model.

web_service = Model.deploy(ws, "scikit-learn-service", [model])

NOTE: These dependencies are included in the pre-built scikit-learn inference container.

    - azureml-defaults
    - inference-schema[numpy-support]
    - scikit-learn
    - numpy

The full how-to covers deployment in Azure Machine Learning in greater depth.

Next steps

In this article, you trained and registered a scikit-learn model, and learned about deployment options. See these other articles to learn more about Azure Machine Learning.