Datastore class

Definition

Represents a storage abstraction over an Azure Machine Learning storage account.

Datastores are attached to workspaces and store connection information for Azure storage services, so you can refer to a storage service by name instead of remembering the connection information and secrets used to connect to it.

Examples of supported Azure storage services that can be registered as datastores are:

  • Azure Blob Container

  • Azure File Share

  • Azure Data Lake

  • Azure Data Lake Gen2

  • Azure SQL Database

  • Azure Database for PostgreSQL

  • Databricks File System

  • Azure Database for MySQL

Use this class to perform management operations: registering, listing, getting, and removing datastores. A datastore for each service is created with the register* methods of this class. When using a datastore to access data, you must have permission to access that data, which depends on the credentials registered with the datastore.
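As the parameter descriptions below note, datastore names are case insensitive and may contain only alphanumeric characters and underscores. A small sketch of a pre-registration check (the helper `is_valid_datastore_name` is hypothetical, not part of the SDK):

```python
import re

def is_valid_datastore_name(name: str) -> bool:
    """Return True if the name contains only alphanumeric characters
    and underscores. Datastore names are case insensitive, so
    'MyStore' and 'mystore' refer to the same datastore."""
    return bool(re.fullmatch(r"[A-Za-z0-9_]+", name))

print(is_valid_datastore_name("MyBlobDatastore"))    # True
print(is_valid_datastore_name("my-blob-datastore"))  # False: '-' is not allowed
```

Validating the name up front gives a clearer error than a failed registration call.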

For more information on datastores and how they can be used in machine learning, see the Azure Machine Learning documentation.

Datastore(workspace, name=None)
Inheritance
builtins.object
Datastore

Remarks

The following example shows how to create a Datastore connected to Azure Blob Container.


   import os

   from azureml.core import Datastore, Workspace
   from azureml.data.data_reference import DataReference
   from msrest.exceptions import HttpOperationError

   ws = Workspace.from_config()  # Load the workspace from a saved config file

   blob_datastore_name='MyBlobDatastore'
   account_name=os.getenv("BLOB_ACCOUNTNAME_62", "<my-account-name>") # Storage account name
   container_name=os.getenv("BLOB_CONTAINER_62", "<my-container-name>") # Name of Azure blob container
   account_key=os.getenv("BLOB_ACCOUNT_KEY_62", "<my-account-key>") # Storage account key

   try:
       blob_datastore = Datastore.get(ws, blob_datastore_name)
       print("Found Blob Datastore with name: %s" % blob_datastore_name)
   except HttpOperationError:
       blob_datastore = Datastore.register_azure_blob_container(
           workspace=ws,
           datastore_name=blob_datastore_name,
           account_name=account_name, # Storage account name
           container_name=container_name, # Name of Azure blob container
           account_key=account_key) # Storage account key
       print("Registered blob datastore with name: %s" % blob_datastore_name)

   blob_data_ref = DataReference(
       datastore=blob_datastore,
       data_reference_name="blob_test_data",
       path_on_datastore="testdata")

The full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb

Methods

get(workspace, datastore_name)

Get a datastore by name. This is the same as calling the constructor.

get_default(workspace)

Get the default datastore for the workspace.

register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)

Register an Azure Blob Container to the datastore.

You can use either a SAS token or a storage account key.

register_azure_data_lake(workspace, datastore_name, store_name, tenant_id, client_id, client_secret, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False)

Initialize a new Azure Data Lake Datastore.

register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id, client_id, client_secret, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False)

Initialize a new Azure Data Lake Gen2 Datastore.

register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)

Register an Azure File Share to the datastore.

You can use either a SAS token or a storage account key.

register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False)

Initialize a new Azure MySQL Datastore.

register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False)

Initialize a new Azure PostgreSQL Datastore.

register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None)

Initialize a new Azure SQL database Datastore.

register_dbfs(workspace, datastore_name)

Initialize a new Databricks File System (DBFS) datastore.

set_as_default()

Set this datastore as the default datastore for the workspace.

unregister()

Unregisters the datastore. The underlying storage service will not be deleted.

get(workspace, datastore_name)

Get a datastore by name. This is the same as calling the constructor.

get(workspace, datastore_name)

Parameters

workspace
Workspace

The workspace.

datastore_name
str, optional

The name of the datastore, defaults to None, which gets the default datastore.

Returns

The corresponding datastore for that name.

Return type

The datastore subclass corresponding to the registered storage service, for example AzureBlobDatastore or AzureFileDatastore.

get_default(workspace)

Get the default datastore for the workspace.

get_default(workspace)

Parameters

workspace
Workspace

The workspace.

Returns

The default datastore for the workspace.

Return type
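A common pattern combines get and get_default: look a datastore up by name and fall back to the workspace default if it is missing. The sketch below injects the lookup functions as parameters (standing in for Datastore.get and Datastore.get_default) purely so the pattern can be shown without an Azure connection; resolve_datastore is a hypothetical helper, not part of the SDK:

```python
def resolve_datastore(workspace, name, *, get, get_default):
    """Return the named datastore, or the workspace default when the
    name is None or the named datastore cannot be found."""
    if name is None:
        return get_default(workspace)
    try:
        return get(workspace, name)
    except Exception:
        # The named datastore was not found; fall back to the default.
        return get_default(workspace)

# Stubs standing in for the real SDK calls:
stores = {"images": "images-datastore"}
fake_get = lambda ws, n: stores[n]            # raises KeyError when missing
fake_default = lambda ws: "workspaceblobstore"

print(resolve_datastore(None, "images", get=fake_get, get_default=fake_default))
print(resolve_datastore(None, "missing", get=fake_get, get_default=fake_default))
```

In real code you would pass `get=Datastore.get` and `get_default=Datastore.get_default`, or simply inline the try/except around the SDK calls.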

register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)

Register an Azure Blob Container to the datastore.

You can use either a SAS token or a storage account key.

register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)

Parameters

workspace
Workspace

The workspace.

datastore_name
str

The name of the datastore. The name is case insensitive and can only contain alphanumeric characters and underscores (_).

container_name
str

The name of the Azure blob container.

account_name
str

The storage account name.

sas_token
str, optional

An account SAS token, defaults to None.

default value: None
account_key
str, optional

A storage account key, defaults to None.

default value: None
protocol
str, optional

Protocol to use to connect to the blob container. If None, defaults to https.

default value: None
endpoint
str, optional

The endpoint of the blob container. If None, defaults to core.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, one will be created. The default is False.

default value: False
create_if_not_exists
bool, optional

Whether to create the blob container if it does not exist. The default is False.

default value: False
skip_validation
bool, optional

Whether to skip validation of storage keys. The default is False.

default value: False
blob_cache_timeout
int, optional

When this blob is mounted, set the cache timeout to this many seconds. If None, defaults to no timeout (that is, blobs will be cached for the duration of the job when read).

default value: None
grant_workspace_access
bool, optional

Whether to grant the workspace managed identity (MSI) access to the user storage account. The default is False. Set this if the storage account is in a VNet; when True, the workspace MSI token is used to grant access to the user storage account. It may take a while for the granted access to take effect.

default value: False
subscription_id
str, optional

The subscription id of the storage account, defaults to None.

default value: None
resource_group
str, optional

The resource group of the storage account, defaults to None.

default value: None

Returns

The blob datastore.

Return type
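Since register_azure_blob_container (and register_azure_file_share below) accepts either sas_token or account_key, a small guard can make the choice explicit before calling the SDK. The helper storage_credential_kwargs is hypothetical, shown only to illustrate the either/or contract:

```python
def storage_credential_kwargs(sas_token=None, account_key=None):
    """Build the credential keyword argument for
    register_azure_blob_container or register_azure_file_share.
    Exactly one of the two secrets is expected; passing both
    (or neither) is almost always a mistake."""
    if (sas_token is None) == (account_key is None):
        raise ValueError("Provide exactly one of sas_token or account_key.")
    if sas_token is not None:
        return {"sas_token": sas_token}
    return {"account_key": account_key}

creds = storage_credential_kwargs(account_key="<my-account-key>")
print(creds)  # {'account_key': '<my-account-key>'}
```

The result can then be splatted into the registration call, e.g. `Datastore.register_azure_blob_container(workspace=ws, datastore_name=..., container_name=..., account_name=..., **creds)`.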

register_azure_data_lake(workspace, datastore_name, store_name, tenant_id, client_id, client_secret, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False)

Initialize a new Azure Data Lake Datastore.

register_azure_data_lake(workspace, datastore_name, store_name, tenant_id, client_id, client_secret, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False)

Parameters

workspace
Workspace

The workspace this datastore belongs to.

datastore_name
str

The datastore name.

store_name
str

The ADLS store name.

tenant_id
str

The Directory ID/Tenant ID of the service principal.

client_id
str

The Client ID/Application ID of the service principal.

client_secret
str

The secret of the service principal.

resource_url
str, optional

The resource URL, which determines what operations will be performed on the Data Lake store. If None, defaults to https://datalake.azure.net/, which allows filesystem operations.

default value: None
authority_url
str, optional

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

default value: None
subscription_id
str, optional

The ID of the subscription the ADLS store belongs to.

default value: None
resource_group
str, optional

The resource group the ADLS store belongs to.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False

Returns

Returns the Azure Data Lake Datastore.

Return type

Remarks

Note

Azure Data Lake Datastore supports data transfer and running U-SQL jobs using Azure Machine Learning Pipelines.

It does not provide upload and download through the SDK.
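Registering an ADLS datastore requires a service principal (tenant ID, client ID, and secret). The sketch below assembles the keyword arguments for register_azure_data_lake, making the documented defaults for the optional URLs explicit; adls_register_args is a hypothetical helper, not part of the SDK:

```python
def adls_register_args(workspace, datastore_name, store_name,
                       tenant_id, client_id, client_secret,
                       resource_url=None, authority_url=None):
    """Assemble keyword arguments for Datastore.register_azure_data_lake,
    filling in the documented defaults when the optional URLs are None."""
    return {
        "workspace": workspace,
        "datastore_name": datastore_name,
        "store_name": store_name,                 # the ADLS store name
        "tenant_id": tenant_id,                   # Directory (tenant) ID of the service principal
        "client_id": client_id,                   # Application (client) ID of the service principal
        "client_secret": client_secret,
        "resource_url": resource_url or "https://datalake.azure.net/",
        "authority_url": authority_url or "https://login.microsoftonline.com",
    }

args = adls_register_args(None, "my_adls", "myadlsstore",
                          "<tenant-id>", "<client-id>", "<client-secret>")
print(args["resource_url"])   # https://datalake.azure.net/
```

With a real workspace, the call would then be `Datastore.register_azure_data_lake(**args)`.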

register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id, client_id, client_secret, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False)

Initialize a new Azure Data Lake Gen2 Datastore.

register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id, client_id, client_secret, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False)

Parameters

workspace
Workspace

The workspace this datastore belongs to.

datastore_name
str

The datastore name.

filesystem
str

The name of the Data Lake Gen2 filesystem.

account_name
str

The storage account name.

tenant_id
str

The Directory ID/Tenant ID of the service principal.

client_id
str

The Client ID/Application ID of the service principal.

client_secret
str

The secret of the service principal.

resource_url
str, optional

The resource URL, which determines what operations will be performed on the Data Lake store. If None, defaults to https://storage.azure.com/, which allows filesystem operations.

default value: None
authority_url
str, optional

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

default value: None
protocol
str, optional

Protocol to use to connect to the blob container. If None, defaults to https.

default value: None
endpoint
str, optional

The endpoint of the blob container. If None, defaults to core.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False

Returns

Returns the Azure Data Lake Gen2 Datastore.

Return type

register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)

Register an Azure File Share to the datastore.

You can use either a SAS token or a storage account key.

register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)

Parameters

workspace
Workspace

The workspace this datastore belongs to.

datastore_name
str

The name of the datastore. The name is case insensitive and can only contain alphanumeric characters and underscores (_).

file_share_name
str

The name of the Azure file share.

account_name
str

The storage account name.

sas_token
str, optional

An account SAS token, defaults to None.

default value: None
account_key
str, optional

A storage account key, defaults to None.

default value: None
protocol
str, optional

The protocol to use to connect to the file share. If None, defaults to https.

default value: None
endpoint
str, optional

The endpoint of the file share. If None, defaults to core.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
create_if_not_exists
bool, optional

Whether to create the file share if it does not exist. The default is False.

default value: False
skip_validation
bool, optional

Whether to skip validation of storage keys. The default is False.

default value: False

Returns

The file datastore.

Return type

register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False)

Initialize a new Azure MySQL Datastore.

register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False)

Parameters

workspace
Workspace

The workspace this datastore belongs to.

datastore_name
str

The datastore name.

server_name
str

The MySQL server name.

database_name
str

The MySQL database name.

user_id
str

The User ID of the MySQL server.

user_password
str

The user password of the MySQL server.

port_number
str

The port number of the MySQL server.

default value: None
endpoint
str, optional

The endpoint of the MySQL server. If None, defaults to mysql.database.azure.com.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False

Returns

Returns the MySQL database Datastore.

Return type

register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False)

Initialize a new Azure PostgreSQL Datastore.

register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False)

Parameters

workspace
Workspace

The workspace this datastore belongs to.

datastore_name
str

The datastore name.

server_name
str

The PostgreSQL server name.

database_name
str

The PostgreSQL database name.

user_id
str

The User ID of the PostgreSQL server.

user_password
str

The user password of the PostgreSQL server.

port_number
str

The port number of the PostgreSQL server.

default value: None
endpoint
str, optional

The endpoint of the PostgreSQL server. If None, defaults to postgres.database.azure.com.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False

Returns

Returns the PostgreSQL database Datastore.

Return type

register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None)

Initialize a new Azure SQL database Datastore.

register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None)

Parameters

workspace
Workspace

The workspace this datastore belongs to.

datastore_name
str

The datastore name.

server_name
str

The SQL server name.

database_name
str

The SQL database name.

tenant_id
str

The Directory ID/Tenant ID of the service principal.

default value: None
client_id
str

The Client ID/Application ID of the service principal.

default value: None
client_secret
str

The secret of the service principal.

default value: None
resource_url
str, optional

The resource URL, which determines what operations will be performed on the SQL database store. If None, defaults to https://database.windows.net/.

default value: None
authority_url
str, optional

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

default value: None
endpoint
str, optional

The endpoint of the SQL server. If None, defaults to database.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
username
str

The username of the database user to access the database.

default value: None
password
str

The password of the database user to access the database.

default value: None

Returns

Returns the SQL database Datastore.

Return type
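register_azure_sql_database accepts two authentication modes: a service principal (tenant_id, client_id, client_secret) or SQL credentials (username, password). A guard that rejects mixed or incomplete combinations before calling the SDK can save a confusing failure later; sql_auth_kwargs is a hypothetical helper, not part of the SDK:

```python
def sql_auth_kwargs(tenant_id=None, client_id=None, client_secret=None,
                    username=None, password=None):
    """Pick one authentication mode for register_azure_sql_database:
    a complete service principal, or SQL username/password credentials."""
    sp = (tenant_id, client_id, client_secret)
    basic = (username, password)
    if all(v is not None for v in sp) and all(v is None for v in basic):
        return {"tenant_id": tenant_id, "client_id": client_id,
                "client_secret": client_secret}
    if all(v is not None for v in basic) and all(v is None for v in sp):
        return {"username": username, "password": password}
    raise ValueError("Provide either a complete service principal "
                     "(tenant_id, client_id, client_secret) or "
                     "username and password, not a mixture.")
```

The returned dict can be splatted into the registration call alongside workspace, datastore_name, server_name, and database_name.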

register_dbfs(workspace, datastore_name)

Initialize a new Databricks File System (DBFS) datastore.

register_dbfs(workspace, datastore_name)

Parameters

workspace
Workspace

The workspace this datastore belongs to.

datastore_name
str

The datastore name.

set_as_default()

Set this datastore as the default datastore for the workspace.

set_as_default()

This method takes no parameters.

unregister()

Unregisters the datastore. The underlying storage service will not be deleted.

unregister()