DataReference class

Definition

Represents a reference to data in a datastore.

A DataReference represents a path in a datastore and can be used to describe how and where data should be made available in a run.

For more information on how to use DataReference in two common scenarios, see the articles:

DataReference(datastore, data_reference_name=None, path_on_datastore=None, mode='mount', path_on_compute=None, overwrite=False)

Inheritance
builtins.object
DataReference

Parameters

datastore
AbstractAzureStorageDatastore or AzureDataLakeDatastore

The datastore to reference.

data_reference_name
str

The name of the data reference.

path_on_datastore
str

The relative path in the backing storage for the data reference.

mode
str

The operation to perform on the data reference. Supported values are "mount" (the default) and "download".

path_on_compute
str

The path on the compute target for the data reference.

overwrite
bool

Indicates whether to overwrite existing data.
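
The constructor parameters above can be assembled as in the following minimal sketch. The helper names `reference_kwargs` and `make_reference` are hypothetical and not part of the SDK; the mode check only mirrors the two values listed for the `mode` parameter, and `make_reference` assumes the azureml-core package is installed and that a datastore object is already available.

```python
# Values listed for the constructor's `mode` parameter in this reference.
SUPPORTED_MODES = ("mount", "download")


def reference_kwargs(name, path_on_datastore, mode="mount",
                     path_on_compute=None, overwrite=False):
    """Hypothetical helper: validate and collect DataReference arguments."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(
            "mode must be one of {}, got {!r}".format(SUPPORTED_MODES, mode))
    return {
        "data_reference_name": name,
        "path_on_datastore": path_on_datastore,
        "mode": mode,
        "path_on_compute": path_on_compute,
        "overwrite": overwrite,
    }


def make_reference(datastore, **kwargs):
    """Hypothetical helper: build a DataReference for an existing datastore."""
    # Imported lazily so the validation helper works without azureml-core.
    from azureml.data.data_reference import DataReference
    return DataReference(datastore=datastore, **reference_kwargs(**kwargs))


# Example: arguments for a reference that is downloaded to the compute target.
kwargs = reference_kwargs(
    "input_data", "20newsgroups/20news.pkl",
    mode="download", path_on_compute="/tmp/data")
```

Keeping the validation separate from the SDK call lets the arguments be checked locally before any workspace or datastore is contacted.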

Remarks

A DataReference defines both the data location and how the data is made available on the target compute (mount, download, or upload). The path to the data in the datastore can be the root /, a directory within the datastore, or a file in the datastore.

The following example shows how to work with a DataReference object in a pipeline that uses an estimator step.


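   from azureml.core import Datastore, Workspace
   from azureml.data.data_reference import DataReference
   from azureml.pipeline.core import PipelineData

   # Load the workspace (assumes a config.json file is available locally).
   ws = Workspace.from_config()

   # Get the datastore named "workspaceblobstore" from the workspace.
   def_blob_store = Datastore(ws, "workspaceblobstore")

   # Reference a single file in the datastore as input data.
   input_data = DataReference(
       datastore=def_blob_store,
       data_reference_name="input_data",
       path_on_datastore="20newsgroups/20news.pkl")

   # Pipeline output written to the same datastore.
   output = PipelineData("output", datastore=def_blob_store)

   source_directory = 'estimator_train'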

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb

Methods

as_download(path_on_compute=None, overwrite=False)

Switch data reference operation to download.

For more information on which computes and datastores support downloading of the data, see: https://aka.ms/datastore-matrix.

as_mount()

Switch data reference operation to mount.

For more information on which computes and datastores support mounting of the data, see: https://aka.ms/datastore-matrix.

as_upload(path_on_compute=None, overwrite=False)

Switch data reference operation to upload.

For more information on which computes and datastores support uploading of the data, see: https://aka.ms/datastore-matrix.

create(data_reference_name=None, datapath=None, datapath_compute_binding=None)

Create a DataReference using DataPath and DataPathComputeBinding.

path(path=None, data_reference_name=None)

Create a DataReference instance based on the given path.

to_config()

Convert the DataReference object to a DataReferenceConfiguration object.

as_download(path_on_compute=None, overwrite=False)

Switch data reference operation to download.

For more information on which computes and datastores support downloading of the data, see: https://aka.ms/datastore-matrix.

Parameters

path_on_compute
str

The path on the compute for the data reference.

default value: None

overwrite
bool

Indicates whether to overwrite existing data.

default value: False

Returns

A new data reference object.

Return type

DataReference

as_mount()

Switch data reference operation to mount.

For more information on which computes and datastores support mounting of the data, see: https://aka.ms/datastore-matrix.

Returns

A new data reference object.

Return type

DataReference

as_upload(path_on_compute=None, overwrite=False)

Switch data reference operation to upload.

For more information on which computes and datastores support uploading of the data, see: https://aka.ms/datastore-matrix.

Parameters

path_on_compute
str

The path on the compute for the data reference.

default value: None

overwrite
bool

Indicates whether to overwrite existing data.

default value: False

Returns

A new data reference object.

Return type

DataReference
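
The three mode-switching methods above each return a new data reference object. The following sketch models that behavior with a small stand-in class so it can be run without a workspace. `RefSketch` is hypothetical and is not the SDK class; that the original reference is left unchanged, and that `as_mount` keeps the other fields, are assumptions inferred from the "A new data reference object" return description.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RefSketch:
    """Hypothetical stand-in for DataReference (not part of azureml)."""
    name: str
    path_on_datastore: str
    mode: str = "mount"
    path_on_compute: Optional[str] = None
    overwrite: bool = False

    def as_download(self, path_on_compute=None, overwrite=False):
        # Return a copy with the operation switched to download.
        return replace(self, mode="download",
                       path_on_compute=path_on_compute, overwrite=overwrite)

    def as_mount(self):
        # Return a copy with the operation switched to mount.
        return replace(self, mode="mount")

    def as_upload(self, path_on_compute=None, overwrite=False):
        # Return a copy with the operation switched to upload.
        return replace(self, mode="upload",
                       path_on_compute=path_on_compute, overwrite=overwrite)


ref = RefSketch("input_data", "20newsgroups/20news.pkl")
downloaded = ref.as_download(path_on_compute="/tmp/data", overwrite=True)
uploaded = ref.as_upload()
```

With the SDK class, the corresponding calls would be `input_data.as_download(...)`, `input_data.as_mount()`, and `input_data.as_upload(...)` on an existing DataReference.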

create(data_reference_name=None, datapath=None, datapath_compute_binding=None)

Create a DataReference using DataPath and DataPathComputeBinding.

Parameters

data_reference_name
str

The name for the data reference to create.

default value: None

datapath
DataPath

[Required] The datapath to use.

default value: None

datapath_compute_binding
DataPathComputeBinding

[Required] The datapath compute binding to use.

default value: None

Returns

A DataReference object.

Return type

DataReference
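
Because `datapath` and `datapath_compute_binding` are marked [Required] even though their default value is None, a caller may want to check them before calling `create`. The wrapper below is a minimal sketch of that idea; `create_reference` is a hypothetical name, and the SDK call inside it assumes the azureml-core package is installed.

```python
def create_reference(name, datapath, binding):
    """Hypothetical wrapper around DataReference.create.

    Checks the two arguments this reference marks as [Required]
    before handing them to the SDK.
    """
    required = (("datapath", datapath),
                ("datapath_compute_binding", binding))
    missing = [label for label, value in required if value is None]
    if missing:
        raise ValueError(
            "missing required argument(s): " + ", ".join(missing))

    # Imported lazily so the argument check works without azureml-core.
    from azureml.data.data_reference import DataReference
    return DataReference.create(
        data_reference_name=name,
        datapath=datapath,
        datapath_compute_binding=binding)


# Usage sketch (requires azureml-core and an existing datastore object):
#   from azureml.data.datapath import DataPath, DataPathComputeBinding
#   datapath = DataPath(datastore=def_blob_store,
#                       path_on_datastore="20newsgroups/20news.pkl")
#   binding = DataPathComputeBinding(mode="download")
#   input_data = create_reference("input_data", datapath, binding)
```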

path(path=None, data_reference_name=None)

Create a DataReference instance based on the given path.

Parameters

path
str

The path on the datastore.

default value: None

data_reference_name
str

The name of the data reference.

default value: None

Returns

The data reference object.

Return type

DataReference

to_config()

Convert the DataReference object to a DataReferenceConfiguration object.

Returns

A new DataReferenceConfiguration object.

Return type

DataReferenceConfiguration