CLI (v2) dataset YAML schema
The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/dataset.schema.json.
Important
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Note
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
| Key | Type | Description | Allowed values |
|---|---|---|---|
| $schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including $schema at the top of your file enables you to invoke schema and resource completions. | |
| name | string | Required. Name of the dataset. | |
| version | string | Version of the dataset. If omitted, Azure ML will autogenerate a version. | |
| description | string | Description of the dataset. | |
| tags | object | Dictionary of tags for the dataset. | |
| local_path | string | Absolute or relative path of a single local file or folder from which the dataset is created. One of local_path or paths is required. | |
| paths | array | A list of URI sources from which the dataset is created. Each entry in the list should adhere to the schema defined in Dataset source path. Currently, only a single source is supported. One of local_path or paths is required. | |
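As a rough illustration of how the optional keys combine with a source, the following sketch sets version and tags alongside local_path. The dataset name, version string, tag keys and values, and local path are hypothetical placeholders chosen for illustration; they are not values prescribed by the schema.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: titanic-local            # required; hypothetical name
version: "1"                   # optional; quoted so YAML keeps it a string
description: Dataset created from a local file with an explicit version and tags.
tags:                          # optional; hypothetical key/value pairs
  source: local
  owner: data-team
local_path: data/titanic.csv   # one of local_path or paths is required
```

If version is left out, Azure ML autogenerates one, as described in the table above.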
Dataset source path
| Key | Type | Description |
|---|---|---|
| file | string | URI to a single file used as a source for the dataset. Supported URI types are azureml, https, wasbs, abfss, and adl. See Core yaml syntax for more information on how to use the azureml:// URI format. One of file or folder is required. |
| folder | string | URI to a folder used as a source for the dataset. Supported URI types are azureml, https, wasbs, abfss, and adl. See Core yaml syntax for more information on how to use the azureml:// URI format. One of file or folder is required. |
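The examples in this article cover the azureml, https, and wasbs URI types. For abfss, a source path would presumably follow the usual Azure Data Lake Storage Gen2 form, `abfss://<file-system>@<account-name>.dfs.core.windows.net/<path>`. The sketch below assumes that layout; the dataset name and the angle-bracket placeholders are hypothetical and must be replaced with your own values.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-folder-abfss-example   # hypothetical name
description: Dataset created from folder in cloud using abfss URL.
paths:
  # Only a single source entry is currently supported.
  # Use either file or folder in the entry, not both.
  - folder: abfss://<file-system>@<account-name>.dfs.core.windows.net/example-data/
```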
Remarks
Use the az ml dataset commands to manage Azure Machine Learning datasets.
Examples
Examples are available in the examples GitHub repository. Several are shown below.
YAML: datastore file
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-file-example
description: Dataset created from file in cloud.
paths:
- file: azureml://datastores/workspaceblobstore/paths/example-data/titanic.csv
YAML: datastore folder
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-folder-example
description: Dataset created from folder in cloud.
paths:
- folder: azureml://datastores/workspaceblobstore/paths/example-data/
YAML: https file
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-file-https-example
description: Dataset created from a file in cloud using https URL.
paths:
- file: https://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/titanic.csv
YAML: https folder
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-folder-https-example
description: Dataset created from folder in cloud using https URL.
paths:
- folder: https://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/
YAML: wasbs file
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-file-wasbs-example
description: Dataset created from a file in cloud using wasbs URL.
paths:
- file: wasbs://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/titanic.csv
YAML: wasbs folder
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: cloud-folder-wasbs-example
description: Dataset created from folder in cloud using wasbs URL.
paths:
- folder: wasbs://mainstorage9c05dabf5c924.blob.core.windows.net/azureml-blobstore-54887b46-3cb0-485b-bb15-62e7b5578ee6/example-data/
YAML: local file
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: local-file-example
description: Dataset created from local file.
local_path: data/titanic.csv
YAML: local folder
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: local-folder-example
description: Dataset created from local folder.
local_path: data