CLI (v2) dataset YAML schema

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/dataset.schema.json.

Important

This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

YAML syntax

| Key | Type | Description | Allowed values |
| --- | --- | --- | --- |
| $schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including $schema at the top of your file enables you to invoke schema and resource completions. | |
| name | string | Required. Name of the dataset. | |
| version | string | Version of the dataset. If omitted, Azure ML will autogenerate a version. | |
| description | string | Description of the dataset. | |
| tags | object | Dictionary of tags for the dataset. | |
| local_path | string | Absolute or relative path of a single local file or folder from which the dataset is created. One of local_path or paths is required. | |
| paths | array | A list of URI sources from which the dataset is created. Each entry in the list should adhere to the schema defined in Dataset source path. Currently, only a single source is supported. One of local_path or paths is required. | |
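
As an illustration of the optional keys in the table above, the following sketch sets version, description, and tags alongside the required name and a local source. The dataset name, version value, and tag keys are hypothetical placeholders, not values taken from the examples repository.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: tagged-dataset-example        # hypothetical name; name is the only required metadata key
version: "1"                        # quoted so that YAML reads the value as a string
description: Dataset with an explicit version and tags.
tags:                               # hypothetical key/value pairs
  source: example
  owner: data-team
local_path: data/titanic.csv        # local_path is used here, so paths is omitted
```

If version is left out, the table above indicates that Azure ML autogenerates one.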

Dataset source path

| Key | Type | Description |
| --- | --- | --- |
| file | string | URI to a single file used as a source for the dataset. Supported URI types are azureml, https, wasbs, abfss, and adl. See Core yaml syntax for more information on how to use the azureml:// URI format. One of file or folder is required. |
| folder | string | URI to a folder used as a source for the dataset. Supported URI types are azureml, https, wasbs, abfss, and adl. See Core yaml syntax for more information on how to use the azureml:// URI format. One of file or folder is required. |
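
The constraints described above (a single entry under paths, and either file or folder within that entry) can be sketched as follows. The azureml:// pattern in the comment is inferred from the example URIs in this article rather than from a formal grammar, so treat it as an assumption and consult Core yaml syntax for the authoritative form.

```yaml
paths:
  # Only one list entry is currently supported.
  # The entry sets file for a single file, or folder for a directory.
  # Assumed pattern: azureml://datastores/<datastore-name>/paths/<path-within-datastore>
  - file: azureml://datastores/workspaceblobstore/paths/example-data/titanic.csv
```

To point at a directory instead, the same entry would use the folder key with a folder URI in place of the file key.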

Remarks

Use the az ml dataset commands to manage Azure Machine Learning datasets.

Examples

Examples are available in the examples GitHub repository. Several are shown below.

YAML: datastore file

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-file-example
description: Dataset created from file in cloud.
paths:
  - file: azureml://datastores/workspaceblobstore/paths/example-data/titanic.csv

YAML: datastore folder

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-folder-example
description: Dataset created from folder in cloud.
paths:
  - folder: azureml://datastores/workspaceblobstore/paths/example-data/

YAML: https file

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-file-https-example
description: Dataset created from a file in cloud using https URL.
paths:
  - file: https://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/titanic.csv

YAML: https folder

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-folder-https-example
description: Dataset created from folder in cloud using https URL.
paths:
  - folder: https://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/

YAML: wasbs file

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-file-wasbs-example
description: Dataset created from a file in cloud using wasbs URL.
paths:
  - file: wasbs://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/titanic.csv

YAML: wasbs folder

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-folder-wasbs-example
description: Dataset created from folder in cloud using wasbs URL.
paths:
  - folder: wasbs://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/

YAML: local file

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: local-file-example
description: Dataset created from local file.
local_path: data/titanic.csv

YAML: local folder

$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: local-folder-example
description: Dataset created from local folder.
local_path: data

Next steps