DatasetConsumptionConfig Class

Represent how to deliver the dataset to a compute target.

Inheritance
DatasetConsumptionConfig

Constructor

DatasetConsumptionConfig(name, dataset, mode='direct', path_on_compute=None)

Parameters

name
str

The name of the dataset in the run, which can be different to the registered name. The name will be registered as environment variable and can be used in data plane.

dataset
AbstractDataset or PipelineParameter or OutputDatasetConfig

The dataset that will be consumed in the run.

mode
str

Defines how the dataset should be delivered to the compute target. There are three modes:

  1. 'direct': consume the dataset as dataset.
  2. 'download': download the dataset and consume the dataset as downloaded path.
  3. 'mount': mount the dataset and consume the dataset as mount path.
path_on_compute
str

The target path on the compute to make the data available at. The folder structure of the source data will be kept, however, we might add prefixes to this folder structure to avoid collision. Use tabular_dataset.to_path to see the output folder structure.

Methods

as_download

Set the mode to download.

In the submitted run, files in the dataset will be downloaded to local path on the compute target. The download location can be retrieved from argument values and the input_datasets field of the run context.


   # Given a run submitted with dataset input like this:
   dataset_input = dataset.as_named_input('input_1').as_download()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # Following are sample codes running in context of the submitted run:

   # The download location can be retrieved from argument values
   import sys
   download_location = sys.argv[1]

   # The download location can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   download_location = Run.get_context().input_datasets['input_1']
as_mount

Set the mode to mount.

In the submitted run, files in the datasets will be mounted to local path on the compute target. The mount point can be retrieved from argument values and the input_datasets field of the run context.


   # Given a run submitted with dataset input like this:
   dataset_input = dataset.as_named_input('input_1').as_mount()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # Following are sample codes running in context of the submitted run:

   # The mount point can be retrieved from argument values
   import sys
   mount_point = sys.argv[1]

   # The mount point can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   mount_point = Run.get_context().input_datasets['input_1']

as_download

Set the mode to download.

In the submitted run, files in the dataset will be downloaded to local path on the compute target. The download location can be retrieved from argument values and the input_datasets field of the run context.


   # Given a run submitted with dataset input like this:
   dataset_input = dataset.as_named_input('input_1').as_download()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # Following are sample codes running in context of the submitted run:

   # The download location can be retrieved from argument values
   import sys
   download_location = sys.argv[1]

   # The download location can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   download_location = Run.get_context().input_datasets['input_1']
as_download(path_on_compute=None)

Parameters

path_on_compute
str
default value: None

The target path on the compute to make the data available at.

Remarks

When the dataset is created from path of a single file, the download location will be path of the single downloaded file. Otherwise, the download location will be path of the enclosing folder for all the downloaded files.

If path_on_compute starts with a /, then it will be treated as an absolute path. If it doesn't start with a /, then it will be treated as a relative path relative to the working directory. If you have specified an absolute path, please make sure that the job has permission to write to that directory.

as_mount

Set the mode to mount.

In the submitted run, files in the datasets will be mounted to local path on the compute target. The mount point can be retrieved from argument values and the input_datasets field of the run context.


   # Given a run submitted with dataset input like this:
   dataset_input = dataset.as_named_input('input_1').as_mount()
   experiment.submit(ScriptRunConfig(source_directory, arguments=[dataset_input]))


   # Following are sample codes running in context of the submitted run:

   # The mount point can be retrieved from argument values
   import sys
   mount_point = sys.argv[1]

   # The mount point can also be retrieved from input_datasets of the run context.
   from azureml.core import Run
   mount_point = Run.get_context().input_datasets['input_1']
as_mount(path_on_compute=None)

Parameters

path_on_compute
str
default value: None

The target path on the compute to make the data available at.

Remarks

When the dataset is created from path of a single file, the mount point will be path of the single mounted file. Otherwise, the mount point will be path of the enclosing folder for all the mounted files.

If path_on_compute starts with a /, then it will be treated as an absolute path. If it doesn't start with a /, then it will be treated as a relative path relative to the working directory. If you have specified an absolute path, please make sure that the job has permission to write to that directory.

Attributes

name

Name of the input.

Returns

Name of the input.