A PyTorch estimator is used to train a PyTorch-based experiment.
Supported versions: 1.0, 1.1
PyTorch(source_directory, *, compute_target=None, vm_size=None, vm_priority=None, entry_script=None, script_params=None, node_count=1, process_count_per_node=1, distributed_backend=None, distributed_training=None, use_gpu=False, use_docker=True, custom_docker_image=None, image_registry_details=None, user_managed=False, conda_packages=None, pip_packages=None, conda_dependencies_file_path=None, pip_requirements_file_path=None, environment_variables=None, environment_definition=None, inputs=None, source_directory_data_store=None, shm_size=None, max_run_duration_seconds=None, framework_version=None, _enable_optimized_mode=False)
source_directory: A local directory containing experiment configuration files.
compute_target: The ComputeTarget where training will happen. This can either be an object or the string "local".
vm_size: The VM size of the compute target that will be created for the training.
Supported values: any Azure VM size.
The list of available VM sizes is documented here: https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-sizes-specs
vm_priority: The VM priority of the compute target that will be created for the training. If not specified, this defaults to 'dedicated'.
Supported values: 'dedicated' and 'lowpriority'.
This takes effect only when the vm_size parameter is specified.
entry_script: A string representing the relative path to the file used to start training.
script_params: A dictionary of command-line arguments to pass to the training script specified in entry_script.
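The values in script_params are delivered to entry_script as command-line arguments. The following sketch shows how an entry script might read them; the argument names --epochs and --lr are illustrative, not part of the API, and the explicit argument list stands in for what the compute target would pass on the real command line:

```python
import argparse

# script_params such as {'--epochs': 10, '--lr': 0.01} arrive at
# entry_script as ordinary command-line arguments.
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=1)
parser.add_argument('--lr', type=float, default=0.001)

# On the compute target this would be parser.parse_args(); here an
# explicit list is parsed for illustration.
args = parser.parse_args(['--epochs', '10', '--lr', '0.01'])
print(args.epochs, args.lr)
```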
node_count: The number of nodes in the compute target used for training. If greater than 1, an MPI distributed job will be run. Only the AmlCompute target is supported for distributed jobs.
process_count_per_node: The number of processes per node. If greater than 1, an MPI distributed job will be run. Only the AmlCompute target is supported for distributed jobs.
distributed_backend: The communication backend for distributed training. Deprecated; use the distributed_training parameter instead.
Supported values: 'mpi'.
This parameter is required when any of node_count, process_count_per_node, worker_count, or parameter_server_count is greater than 1.
When node_count == 1 and process_count_per_node == 1, no backend will be used unless one is explicitly set. Only the AmlCompute target is supported for distributed training.
distributed_training: Parameters for running a distributed training job. Use this parameter instead of the deprecated distributed_backend.
For running a distributed job with the MPI backend, use an MpiConfiguration object to specify process_count_per_node.
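As a sketch of the MPI option, a two-node job might be configured as follows. The workspace config file, the compute target name 'gpu-cluster', the './src' layout, and the process count are all assumptions, and running this requires the azureml-sdk package and a live workspace, so treat it as a configuration sketch rather than a definitive recipe:

```python
# Sketch only: assumes azureml-sdk is installed and a workspace with an
# AmlCompute target named 'gpu-cluster' (hypothetical) already exists.
from azureml.core import Workspace
from azureml.core.runconfig import MpiConfiguration
from azureml.train.dnn import PyTorch

ws = Workspace.from_config()

distributed = MpiConfiguration()
distributed.process_count_per_node = 2   # two worker processes on each node

estimator = PyTorch(
    source_directory='./src',            # assumed project layout
    compute_target=ws.compute_targets['gpu-cluster'],
    entry_script='train.py',
    node_count=2,                        # > 1 triggers an MPI distributed job
    distributed_training=distributed,
    use_gpu=True,
)
```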
use_gpu: A bool value indicating whether the environment to run the experiment should support GPUs. If set to True, a GPU-based default Docker image will be used in the environment; if set to False, a CPU-based image will be used. Default Docker images (CPU or GPU) are used only if the custom_docker_image parameter is not set. This setting is used only in Docker-enabled compute targets.
use_docker: A bool value indicating whether the environment to run the experiment should be Docker-based.
custom_docker_image: The name of the Docker image from which the image to use for training will be built. If not set, a default CPU-based image will be used as the base image.
image_registry_details: The details of the Docker image registry.
user_managed: True means that Azure ML reuses an existing Python environment; False means that Azure ML will create a Python environment based on the conda dependencies specification.
conda_packages: A list of strings representing conda packages to be added to the Python environment for the experiment.
pip_packages: A list of strings representing pip packages to be added to the Python environment for the experiment.
conda_dependencies_file_path: A string representing the relative path to the conda dependencies yaml file.
pip_requirements_file_path: A string representing the relative path to the pip requirements file. This can be provided in combination with the pip_packages parameter.
environment_variables: A dictionary of environment variable names and values. These environment variables are set on the process where the user script is being executed.
environment_definition: The EnvironmentDefinition for the experiment. It includes PythonSection, DockerSection, and environment variables. Any environment option not directly exposed through other parameters of the estimator's construction can be set using this parameter. If this parameter is specified, it takes precedence over other environment-related parameters such as use_gpu, custom_docker_image, conda_packages, or pip_packages, and errors will be reported on these invalid combinations.
inputs: A list of data references to use as input.
source_directory_data_store: The backing datastore for the project share.
shm_size: The size of the Docker container's shared memory block. Refer to https://docs.docker.com/engine/reference/run/ for more information. If not set, the default is 1G.
max_run_duration_seconds: The maximum allowed time for the run. The system will attempt to automatically cancel the run if it takes longer than this value.
framework_version: The PyTorch version to be used for executing training code. PyTorch.get_supported_versions() returns a list of the versions supported by the current SDK.
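Putting the parameters above together, a minimal single-node run might look like the following sketch. The workspace config file, the 'gpu-cluster' compute target, the experiment name 'pytorch-demo', and the script paths are all assumptions for illustration, and the code requires the azureml-sdk package and a live workspace:

```python
# Sketch only: assumes a workspace config.json in the working directory
# and an AmlCompute target named 'gpu-cluster' (hypothetical).
from azureml.core import Workspace, Experiment
from azureml.train.dnn import PyTorch

ws = Workspace.from_config()             # reads the workspace config.json

estimator = PyTorch(
    source_directory='./src',            # contains train.py
    compute_target=ws.compute_targets['gpu-cluster'],
    entry_script='train.py',
    script_params={'--epochs': 10, '--lr': 0.01},
    use_gpu=True,
    framework_version='1.1',
)

run = Experiment(ws, 'pytorch-demo').submit(estimator)
run.wait_for_completion(show_output=True)
```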
When you submit a training job, Azure ML runs your script in a conda environment within a Docker container. The PyTorch containers have the following dependencies installed.
| Dependency | Version |
| ---------- | ------- |
| CUDA (GPU image only) | 10.0 |
| cuDNN (GPU image only) | 7.5 |
| NCCL (GPU image only) | 2.4.2 |
The Docker images extend Ubuntu 16.04.
If you need to install additional dependencies, you can either use the pip_packages/conda_packages parameters or provide a pip requirements.txt or conda environment.yml file. Alternatively, you can build your own image and pass the custom_docker_image parameter to the estimator constructor.
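For example, a conda environment file referenced via conda_dependencies_file_path might look like the following fragment. The file name and the package list are illustrative, not prescribed by the SDK:

```yaml
# Hypothetical ./src/environment.yml passed as conda_dependencies_file_path
name: project_environment
dependencies:
- python=3.6
- pip:
  - pillow
  - scikit-learn
```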
DEFAULT_VERSION = '1.1'
FRAMEWORK_NAME = 'PyTorch'