Enable logging in Azure ML training runs

APPLIES TO: Basic edition, Enterprise edition                    (Upgrade to Enterprise edition)

The Azure Machine Learning Python SDK lets you log real-time information using both the default Python logging package and SDK-specific functionality. You can log locally and send logs to your workspace in the portal.

Logs can help you diagnose errors and warnings, or track performance metrics like parameters and model performance. In this article, you learn how to enable logging in the following scenarios:

  • Interactive training sessions
  • Submitting training jobs using ScriptRunConfig
  • Python native logging settings
  • Logging from additional sources

Tip

This article shows you how to monitor the model training process. If you're interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.

Data types

You can log multiple data types including scalar values, lists, tables, images, directories, and more. For more information, and Python code examples for different data types, see the Run class reference page.
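
As a quick illustration, the following sketch logs a few of these types on an active run object (the run variable is assumed to come from one of the sessions described in the sections below, and the image path is a hypothetical example):

# 'run' is an active Run object, e.g. from experiment.start_logging()
# Scalar value
run.log('accuracy', 0.91)

# List of values, rendered as a single-variable chart
run.log_list('losses', [0.9, 0.6, 0.4, 0.3])

# One row of a table, passed as keyword arguments
run.log_row('hyperparameters', alpha=0.03, l1_ratio=0.5)

# Image file from disk
run.log_image('confusion_matrix', path='./confusion_matrix.png')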

Interactive logging session

Interactive logging sessions are typically used in notebook environments. The method Experiment.start_logging() starts an interactive logging session. Any metrics logged during the session are added to the run record in the experiment. The method run.complete() ends the session and marks the run as completed.

The following code snippet uses an interactive logging session to log training parameters and performance metrics with the run.log() method. It also uploads the trained model to a specified output location.

import os

import joblib
from azureml.core import Experiment
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Get an experiment object from Azure Machine Learning
# (ws is an existing Workspace object; data holds the train/test splits)
experiment = Experiment(workspace=ws, name="train-within-notebook")

# Create a run object in the experiment
run = experiment.start_logging()
# Log the algorithm parameter alpha to the run
run.log('alpha', 0.03)

# Create, fit, and test the scikit-learn Ridge regression model
regression_model = Ridge(alpha=0.03)
regression_model.fit(data['train']['X'], data['train']['y'])
preds = regression_model.predict(data['test']['X'])

# Output the Mean Squared Error to the notebook and to the run
print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))
run.log('mse', mean_squared_error(data['test']['y'], preds))

# Save the model to the outputs directory for capture
os.makedirs('outputs', exist_ok=True)
model_file_name = 'outputs/model.pkl'
joblib.dump(value=regression_model, filename=model_file_name)

# Upload the model file explicitly into the run's artifacts
run.upload_file(name=model_file_name, path_or_stream=model_file_name)

# Complete the run
run.complete()
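
To jump straight from the notebook to the run record this session created in the studio, you can print the run's portal link (a small convenience sketch using the Run class's get_portal_url() method):

# Print a direct link to this run in Azure Machine Learning studio
print(run.get_portal_url())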

For a complete sample notebook that uses interactive logging, see Train a model in a notebook.

ScriptRunConfig logs

In this section, you learn how to add logging code inside ScriptRunConfig runs. You can use the ScriptRunConfig class to encapsulate scripts and environments for repeatable runs. You can also use this option to show a visual Jupyter Notebook widget for monitoring.

This example performs a parameter sweep over alpha values and captures the results using the run.log() method.

  1. Create a training script that includes the logging logic, train.py.

    # Copyright (c) Microsoft. All rights reserved.
    # Licensed under the MIT license.
    
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from azureml.core.run import Run
    import os
    import numpy as np
    import mylib  # helper module in the script folder; returns the alphas to sweep
    # sklearn.externals.joblib is removed in 0.23
    try:
        from sklearn.externals import joblib
    except ImportError:
        import joblib
    
    os.makedirs('./outputs', exist_ok=True)
    
    X, y = load_diabetes(return_X_y=True)
    
    run = Run.get_context()
    
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=0.2,
                                                        random_state=0)
    data = {"train": {"X": X_train, "y": y_train},
            "test": {"X": X_test, "y": y_test}}
    
    # list of numbers from 0.0 to 1.0 with a 0.05 interval
    alphas = mylib.get_alphas()
    
    for alpha in alphas:
        # Use Ridge algorithm to create a regression model
        reg = Ridge(alpha=alpha)
        reg.fit(data["train"]["X"], data["train"]["y"])
    
        preds = reg.predict(data["test"]["X"])
        mse = mean_squared_error(data["test"]["y"], preds)
        run.log('alpha', alpha)
        run.log('mse', mse)
    
        model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
        # save the model in the outputs folder so it automatically gets uploaded
        joblib.dump(value=reg, filename=os.path.join('./outputs/', model_file_name))
    
        print('alpha is {0:.2f}, and mse is {1:0.2f}'.format(alpha, mse))
    
  2. Submit the train.py script to run in a user-managed environment. The entire script folder is submitted for training.

    from azureml.core import ScriptRunConfig

    # exp is an existing Experiment object in your workspace
    src = ScriptRunConfig(source_directory='./', script='train.py')
    src.run_config.environment = user_managed_env
    run = exp.submit(src)
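
    The user_managed_env object isn't created in this snippet. A minimal sketch, assuming the packages the training script needs are already installed in your local Python environment:

    from azureml.core import Environment

    # A user-managed environment tells Azure ML not to manage dependencies;
    # you're responsible for installing the packages the script needs
    user_managed_env = Environment("user-managed-env")
    user_managed_env.python.user_managed_dependencies = True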

    The show_output parameter turns on verbose logging, which lets you see details from the training process as well as information about any remote resources or compute targets. Use the following code to turn on verbose logging when you submit the experiment.

run = exp.submit(src, show_output=True)

You can also use the same parameter in the wait_for_completion function on the resulting run.

run.wait_for_completion(show_output=True)
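
After the run finishes, you can pull the logged values back from the run record to confirm what was captured. A minimal sketch using the Run class's get_metrics() method:

# Returns a dictionary of everything logged with run.log(),
# for example {'alpha': [...], 'mse': [...]} for the sweep above
metrics = run.get_metrics()
print(metrics)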

For a complete sample notebook that uses ScriptRunConfig logs, see Train a model locally.

Native Python logging

Some SDK errors instruct you to set the logging level to DEBUG for more detailed output. To set the logging level, add the following code to your script.

import logging
logging.basicConfig(level=logging.DEBUG)
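
If root-level DEBUG output is too noisy, standard Python logging also lets you raise verbosity for the SDK's loggers only. A minimal sketch, assuming the SDK's loggers live under the azureml namespace:

import logging

# Keep the default level for everything else
logging.basicConfig(level=logging.INFO)

# Emit DEBUG detail only from loggers under the 'azureml' namespace
logging.getLogger('azureml').setLevel(logging.DEBUG)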

Additional logging sources

Azure Machine Learning can also log information from other sources during training, such as automated machine learning runs, or Docker containers that run the jobs. These logs aren't documented, but if you encounter problems and contact Microsoft support, they may be able to use these logs during troubleshooting.

For information on logging metrics in Azure Machine Learning designer (preview), see How to log metrics in the designer (preview).

Example notebooks

The following notebooks demonstrate concepts in this article:

  • Train a model in a notebook (interactive logging)
  • Train a model locally (ScriptRunConfig logs)

Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.

Next steps

See these articles to learn more about how to use Azure Machine Learning: