SKLearn Class

Creates an estimator for training in Scikit-learn experiments.

DEPRECATED. Use the ScriptRunConfig object with your own defined environment or the AzureML-Tutorial curated environment. For an introduction to configuring SKLearn experiment runs with ScriptRunConfig, see Train scikit-learn models at scale with Azure Machine Learning.
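
As a rough migration sketch, the snippet below shows how an estimator-style script_params dictionary might be flattened into the arguments list that ScriptRunConfig accepts. The helper names flatten_script_params and build_script_run_config, the script name train.py, and the argument values are hypothetical. The azureml-core package is assumed to be installed before build_script_run_config is called.

```python
def flatten_script_params(script_params):
    """Turn {'--flag': value} into ['--flag', 'value', ...] (hypothetical helper)."""
    arguments = []
    for name, value in script_params.items():
        arguments.append(name)
        arguments.append(str(value))
    return arguments


def build_script_run_config(workspace, compute_target, source_directory, script_params):
    """Sketch of a ScriptRunConfig replacement; requires azureml-core."""
    # Imported lazily so the helper above stays usable without the SDK installed.
    from azureml.core import Environment, ScriptRunConfig

    # 'AzureML-Tutorial' is the curated environment named in the deprecation note.
    environment = Environment.get(workspace=workspace, name="AzureML-Tutorial")
    return ScriptRunConfig(
        source_directory=source_directory,
        script="train.py",  # hypothetical entry script
        arguments=flatten_script_params(script_params),
        compute_target=compute_target,
        environment=environment,
    )


print(flatten_script_params({"--data-folder": "./data", "--regularization": 0.5}))
# ['--data-folder', './data', '--regularization', '0.5']
```

Keeping the flattening step separate from the SDK call makes it possible to check the argument list locally before anything is submitted.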

This estimator only supports single-node CPU training.

Supported versions: 0.20.3

Initialize a Scikit-learn estimator.

Inheritance
azureml.train.estimator._framework_base_estimator._FrameworkBaseEstimator
SKLearn

Constructor

SKLearn(source_directory, *, compute_target=None, vm_size=None, vm_priority=None, entry_script=None, script_params=None, use_docker=True, custom_docker_image=None, image_registry_details=None, user_managed=False, conda_packages=None, pip_packages=None, conda_dependencies_file_path=None, pip_requirements_file_path=None, conda_dependencies_file=None, pip_requirements_file=None, environment_variables=None, environment_definition=None, inputs=None, shm_size=None, resume_from=None, max_run_duration_seconds=None, framework_version=None, _enable_optimized_mode=False, _disable_validation=True, _show_lint_warnings=False, _show_package_warnings=False)
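
The signature above takes source_directory positionally and everything else by keyword. A minimal sketch of a call, under stated assumptions: the folder ./src, the script train.py, the extra pandas package, and the helper names are all hypothetical, and the SDK package that provides azureml.train.sklearn is assumed to be installed before make_estimator is called.

```python
def sklearn_estimator_kwargs(compute_target):
    """Collect keyword arguments for the SKLearn constructor (hypothetical helper)."""
    return {
        "source_directory": "./src",       # hypothetical project folder
        "compute_target": compute_target,  # a compute target object or the string "local"
        "entry_script": "train.py",        # hypothetical script inside source_directory
        "script_params": {"--regularization": 0.5},
        "pip_packages": ["pandas"],        # extra package, chosen only for illustration
        "framework_version": "0.20.3",
    }


def make_estimator(compute_target):
    """Build the deprecated estimator; requires the Azure ML training SDK."""
    # Imported lazily so the kwargs helper can be inspected without the SDK installed.
    from azureml.train.sklearn import SKLearn

    kwargs = sklearn_estimator_kwargs(compute_target)
    source_directory = kwargs.pop("source_directory")
    # source_directory is the only positional parameter; the rest are keyword-only.
    return SKLearn(source_directory, **kwargs)


kwargs = sklearn_estimator_kwargs("local")
print(sorted(kwargs))
```

The resulting estimator would typically be passed to the submit method of an Experiment object to start the run.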

Parameters

source_directory
str
Required

A local directory containing experiment configuration files.

compute_target
AbstractComputeTarget or str
Default value: None

The compute target where training will happen. This can either be an object or the string "local".

vm_size
str
Default value: None

The VM size of the compute target that will be created for the training.

Supported values: Any Azure VM size.

vm_priority
str
Default value: None

The VM priority of the compute target that will be created for the training. If not specified, 'dedicated' is used.

Supported values: 'dedicated' and 'lowpriority'.

This takes effect only when the vm_size parameter is specified.

entry_script
str
Default value: None

A string representing the relative path to the file used to start training.

script_params
dict
Default value: None

A dictionary of command-line arguments to pass to your training script specified in entry_script.
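
On the receiving side, the training script has to parse these arguments itself. The sketch below uses argparse with two hypothetical flags, --data-folder and --regularization. It assumes that each dictionary key arrives as a flag followed by its value.

```python
import argparse


def parse_training_args(argv):
    """Parse arguments the way a training script might (hypothetical flags)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-folder", type=str, default=".")
    parser.add_argument("--regularization", type=float, default=1.0)
    return parser.parse_args(argv)


# Assumption: {'--data-folder': './data', '--regularization': 0.5} reaches the
# script as the argument list below.
args = parse_training_args(["--data-folder", "./data", "--regularization", "0.5"])
print(args.data_folder, args.regularization)
# ./data 0.5
```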

custom_docker_image
str
Default value: None

The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image.

image_registry_details
ContainerRegistry
Default value: None

The details of the Docker image registry.

user_managed
bool
Default value: False

Specifies whether Azure ML reuses an existing Python environment. False means that Azure ML will create a Python environment based on the conda dependencies specification.

conda_packages
list
Default value: None

A list of strings representing conda packages to be added to the Python environment for the experiment.

pip_packages
list
Default value: None

A list of strings representing pip packages to be added to the Python environment for the experiment.

conda_dependencies_file_path
str
Default value: None

A string representing the relative path to the conda dependencies YAML file. If specified, Azure ML will not install any framework-related packages. This can be provided in combination with the conda_packages parameter. DEPRECATED. Use the conda_dependencies_file parameter.

pip_requirements_file_path
str
Default value: None

A string representing the relative path to the pip requirements text file. This can be provided in combination with the pip_packages parameter. DEPRECATED. Use the pip_requirements_file parameter.

conda_dependencies_file
str
Default value: None

A string representing the relative path to the conda dependencies YAML file. If specified, Azure ML will not install any framework-related packages. This can be provided in combination with the conda_packages parameter.

pip_requirements_file
str
Default value: None

A string representing the relative path to the pip requirements text file. This can be provided in combination with the pip_packages parameter.

environment_variables
dict
Default value: None

A dictionary of environment variable names and values. These environment variables are set on the process where the user script is executed.
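
A minimal sketch of how a training script might read such a variable. N_JOBS is a hypothetical variable name, and the assignment to os.environ only simulates what passing environment_variables={'N_JOBS': '4'} would make visible to the script. Values arrive as strings, so the script converts them itself.

```python
import os


def read_setting(name, default):
    """Return an environment variable, falling back to a default when unset."""
    return os.environ.get(name, default)


# Simulate the variable that the estimator would set on the script's process.
os.environ["N_JOBS"] = "4"
n_jobs = int(read_setting("N_JOBS", "1"))
print(n_jobs)
# 4
```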

environment_definition
Environment
Default value: None

The environment definition for the experiment. It includes PythonSection, DockerSection, and environment variables. Any environment option not directly exposed through other parameters of the estimator constructor can be set using the environment_definition parameter. If this parameter is specified, it takes precedence over other environment-related parameters such as custom_docker_image, conda_packages, or pip_packages. Errors are reported for invalid combinations.

inputs
list
Default value: None

A list of DataReference or DatasetConsumptionConfig objects to use as input.

shm_size
str
Default value: None

The size of the Docker container's shared memory block. If not set, the default azureml.core.environment._DEFAULT_SHM_SIZE is used.

resume_from
DataPath
Default value: None

The data path containing the checkpoint or model files from which to resume the experiment.

max_run_duration_seconds
int
Default value: None

The maximum allowed time for the run. Azure ML will attempt to automatically cancel the run if it takes longer than this value.
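
Because the limit is an integer number of seconds, it can be easier to express it as a timedelta and convert. The helper below is a hypothetical convenience, not part of the SDK, and its rejection of non-positive values is an assumption made for this sketch.

```python
from datetime import timedelta


def to_run_duration_seconds(limit):
    """Convert a timedelta into whole seconds for max_run_duration_seconds."""
    seconds = int(limit.total_seconds())
    # Assumption: a zero or negative limit is treated as a caller error.
    if seconds <= 0:
        raise ValueError("the run duration limit must be positive")
    return seconds


print(to_run_duration_seconds(timedelta(hours=1, minutes=30)))
# 5400
```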

framework_version
str
Default value: None

The Scikit-learn version to be used for executing training code. SKLearn.get_supported_versions() returns a list of the versions supported by the current SDK.

use_docker
bool
Default value: True

A bool value indicating if the environment to run the experiment should be Docker-based.

_enable_optimized_mode
bool
Default value: False

Enable incremental environment build with pre-built framework images for faster environment preparation. A pre-built framework image is built on top of Azure ML default CPU/GPU base images with framework dependencies pre-installed.

_disable_validation
bool
Default value: True

Disable script validation before run submission. The default is True.

_show_lint_warnings
bool
Default value: False

Show script linting warnings. The default is False.

_show_package_warnings
bool
Default value: False

Show package validation warnings. The default is False.

Remarks

When submitting a training job, Azure ML runs your script in a conda environment within a Docker container. SKLearn containers have the following dependencies installed.

| Dependencies     | Scikit-learn 0.20.3 |
| ---------------- | ------------------- |
| Python           | 3.6.2               |
| azureml-defaults | Latest              |
| IntelMpi         | 2018.3.222          |
| scikit-learn     | 0.20.3              |
| numpy            | 1.16.2              |
| miniconda        | 4.5.11              |
| scipy            | 1.2.1               |
| joblib           | 0.13.2              |
| git              | 2.7.4               |

The Docker images extend Ubuntu 16.04.

If you need to install additional dependencies, you can use the pip_packages or conda_packages parameters, or provide a pip_requirements_file or conda_dependencies_file. Alternatively, you can build your own image and pass it to the estimator constructor through the custom_docker_image parameter.
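
As an illustration of the requirements-file route, the sketch below writes a pinned requirements.txt into a source directory, whose relative path could then be given as pip_requirements_file. The helper name write_pip_requirements and the pandas pin are hypothetical, and a temporary directory stands in for the real source directory.

```python
from pathlib import Path
import tempfile


def write_pip_requirements(directory, pins):
    """Write 'name==version' lines to requirements.txt and return its path."""
    path = Path(directory) / "requirements.txt"
    lines = [f"{name}=={version}" for name, version in pins.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# pandas and its version are a hypothetical extra dependency.
with tempfile.TemporaryDirectory() as source_directory:
    requirements = write_pip_requirements(source_directory, {"pandas": "0.25.3"})
    print(requirements.read_text(encoding="utf-8"), end="")
# pandas==0.25.3
```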

Attributes

DEFAULT_VERSION

DEFAULT_VERSION = '0.20.3'

FRAMEWORK_NAME

FRAMEWORK_NAME = 'SKLearn'