Datastore Class

Represents a storage abstraction over an Azure Machine Learning storage account.

Datastores are attached to workspaces and are used to store connection information to Azure storage services, so you can refer to them by name without having to remember the connection information and secrets used to connect to the storage services.

Examples of supported Azure storage services that can be registered as datastores are:

  • Azure Blob Container

  • Azure File Share

  • Azure Data Lake

  • Azure Data Lake Gen2

  • Azure SQL Database

  • Azure Database for PostgreSQL

  • Databricks File System

  • Azure Database for MySQL

Use this class to perform management operations, including register, list, get, and remove datastores. Datastores for each service are created with the register* methods of this class. When using a datastore to access data, you must have permission to access that data, which depends on the credentials registered with the datastore.
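
As a minimal sketch of these management operations (the workspace configuration file and the datastore name 'my_datastore' below are placeholders), you might list, get, and unregister datastores as follows:


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()  # loads an existing workspace from its config file

   # List all datastores registered in the workspace (a name -> Datastore mapping)
   for name, datastore in ws.datastores.items():
       print(name, datastore.datastore_type)

   # Get a specific datastore by name
   datastore = Datastore.get(ws, 'my_datastore')

   # Remove the registration; the underlying storage service is not deleted
   datastore.unregister()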

For more information on datastores and how they can be used in machine learning, see the Azure Machine Learning documentation on connecting to and accessing data.

Calling the constructor gets a datastore by name; this call will make a request to the datastore service.

Inheritance
builtins.object
Datastore

Constructor

Datastore(workspace, name=None)

Parameters

workspace
Workspace
Required

The workspace.

name
str, <xref:optional>
default value: None

The name of the datastore, defaults to None, which gets the default datastore.

Remarks

To interact with data in your datastores for machine learning tasks, like training, create an Azure Machine Learning dataset. Datasets provide functions that load tabular data into a pandas or Spark DataFrame. Datasets also provide the ability to download or mount files of any format from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL. Learn more about how to train with datasets.
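
For example, a minimal sketch of creating datasets from paths on an already registered datastore (the datastore name and the file paths below are placeholders):


   from azureml.core import Dataset, Datastore, Workspace

   ws = Workspace.from_config()
   datastore = Datastore.get(ws, 'my_blob_datastore')  # placeholder datastore name

   # Tabular dataset: load delimited files into a pandas DataFrame
   tabular_ds = Dataset.Tabular.from_delimited_files(path=(datastore, 'data/train.csv'))
   df = tabular_ds.to_pandas_dataframe()

   # File dataset: download or mount files of any format
   file_ds = Dataset.File.from_files(path=(datastore, 'images/**'))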

The following example shows how to create a Datastore connected to an Azure blob container.


   import os

   from azureml.core import Datastore
   from azureml.data.data_reference import DataReference
   from azureml.exceptions import UserErrorException

   # ws: an existing azureml.core.Workspace object

   blob_datastore_name='MyBlobDatastore'
   account_name=os.getenv("BLOB_ACCOUNTNAME_62", "<my-account-name>") # Storage account name
   container_name=os.getenv("BLOB_CONTAINER_62", "<my-container-name>") # Name of Azure blob container
   account_key=os.getenv("BLOB_ACCOUNT_KEY_62", "<my-account-key>") # Storage account key

   try:
       blob_datastore = Datastore.get(ws, blob_datastore_name)
       print("Found Blob Datastore with name: %s" % blob_datastore_name)
   except UserErrorException:
       blob_datastore = Datastore.register_azure_blob_container(
           workspace=ws,
           datastore_name=blob_datastore_name,
           account_name=account_name, # Storage account name
           container_name=container_name, # Name of Azure blob container
           account_key=account_key) # Storage account key
       print("Registered blob datastore with name: %s" % blob_datastore_name)

   blob_data_ref = DataReference(
       datastore=blob_datastore,
       data_reference_name="blob_test_data",
       path_on_datastore="testdata")

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb

Methods

get

Get a datastore by name. This is the same as calling the constructor.

get_default

Get the default datastore for the workspace.

register_azure_blob_container

Register an Azure Blob Container to the datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

register_azure_data_lake

Initialize a new Azure Data Lake Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   adlsgen1_datastore_name='adlsgen1datastore'

   store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
   subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
   resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
   tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
   client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
   client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

   adls_datastore = Datastore.register_azure_data_lake(
       workspace=ws,
       datastore_name=adlsgen1_datastore_name,
       subscription_id=subscription_id, # subscription id of ADLS account
       resource_group=resource_group, # resource group of ADLS account
       store_name=store_name, # ADLS account name
       tenant_id=tenant_id, # tenant id of service principal
       client_id=client_id, # client id of service principal
       client_secret=client_secret) # the secret of service principal

register_azure_data_lake_gen2

Initialize a new Azure Data Lake Gen2 Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

register_azure_file_share

Register an Azure File Share to the datastore.

You can choose to use a SAS token or a storage account key.

register_azure_my_sql

Initialize a new Azure MySQL Datastore.

A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.

Please see below for an example of how to register an Azure MySQL database as a Datastore.

register_azure_postgre_sql

Initialize a new Azure PostgreSQL Datastore.

Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.

register_azure_sql_database

Initialize a new Azure SQL database Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a service principal or a username and password. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure SQL database as a Datastore.

register_dbfs

Initialize a new Databricks File System (DBFS) datastore.

The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.

register_hdfs

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Initialize a new HDFS datastore.

set_as_default

Set the default datastore.

unregister

Unregisters the datastore. The underlying storage service will not be deleted.

get

Get a datastore by name. This is the same as calling the constructor.

static get(workspace, datastore_name)

Parameters

workspace
Workspace
Required

The workspace.

datastore_name
str, <xref:optional>
Required

The name of the datastore, defaults to None, which gets the default datastore.

Returns

The corresponding datastore for that name.

Return type

get_default

Get the default datastore for the workspace.

static get_default(workspace)

Parameters

workspace
Workspace
Required

The workspace.

Returns

The default datastore for the workspace.

Return type
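
A minimal usage sketch (the workspace configuration file is a placeholder):


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()

   # Every workspace comes with a default datastore (an Azure blob container)
   default_datastore = Datastore.get_default(ws)
   print(default_datastore.name)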

register_azure_blob_container

Register an Azure Blob Container to the datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

static register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)

Parameters

workspace
Workspace
Required

The workspace.

datastore_name
str
Required

The name of the datastore, case insensitive; it can only contain alphanumeric characters and underscores (_).

container_name
str
Required

The name of the Azure blob container.

account_name
str
Required

The storage account name.

sas_token
str, <xref:optional>
default value: None

An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects; for data write, we additionally require Write & Add permissions.

account_key
str, <xref:optional>
default value: None

Access keys of your storage account, defaults to None.

protocol
str, <xref:optional>
default value: None

Protocol to use to connect to the blob container. If None, defaults to https.

endpoint
str, <xref:optional>
default value: None

The endpoint of the storage account. If None, defaults to core.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False.

create_if_not_exists
bool, <xref:optional>
default value: False

Whether to create the blob container if it does not exist. Defaults to False.

skip_validation
bool, <xref:optional>
default value: False

Whether to skip validation of storage keys. Defaults to False.

blob_cache_timeout
int, <xref:optional>
default value: None

When this blob is mounted, set the cache timeout to this many seconds. If None, defaults to no timeout (i.e. blobs will be cached for the duration of the job when read).

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

subscription_id
str, <xref:optional>
default value: None

The subscription id of the storage account, defaults to None.

resource_group
str, <xref:optional>
default value: None

The resource group of the storage account, defaults to None.

Returns

The blob datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
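
As an alternative to the account-key example shown earlier, the following sketch registers a blob container with a SAS token instead; all names and the token below are placeholders.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   sas_token = os.getenv("BLOB_SAS_TOKEN", "<my-sas-token>")  # account SAS token with at least List & Read permissions

   blob_datastore = Datastore.register_azure_blob_container(
       workspace=ws,
       datastore_name='my_sas_blob_datastore',
       container_name='my-container',        # name of the Azure blob container
       account_name='my-storage-account',    # storage account name
       sas_token=sas_token)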

register_azure_data_lake

Initialize a new Azure Data Lake Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   adlsgen1_datastore_name='adlsgen1datastore'

   store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
   subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
   resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
   tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
   client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
   client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

   adls_datastore = Datastore.register_azure_data_lake(
       workspace=ws,
       datastore_name=adlsgen1_datastore_name,
       subscription_id=subscription_id, # subscription id of ADLS account
       resource_group=resource_group, # resource group of ADLS account
       store_name=store_name, # ADLS account name
       tenant_id=tenant_id, # tenant id of service principal
       client_id=client_id, # client id of service principal
       client_secret=client_secret) # the secret of service principal

static register_azure_data_lake(workspace, datastore_name, store_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False, grant_workspace_access=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

store_name
str
Required

The ADLS store name.

tenant_id
str, <xref:optional>
default value: None

The Directory ID/Tenant ID of the service principal used to access data.

client_id
str, <xref:optional>
default value: None

The Client ID/Application ID of the service principal used to access data.

client_secret
str, <xref:optional>
default value: None

The Client Secret of the service principal used to access data.

resource_url
str, <xref:optional>
default value: None

The resource URL, which determines what operations will be performed on the Data Lake store. If None, defaults to https://datalake.azure.net/, which allows us to perform filesystem operations.

authority_url
str, <xref:optional>
default value: None

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

subscription_id
str, <xref:optional>
default value: None

The ID of the subscription the ADLS store belongs to.

resource_group
str, <xref:optional>
default value: None

The resource group the ADLS store belongs to.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

Returns

Returns the Azure Data Lake Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.

Note

Azure Data Lake Datastore supports data transfer and running U-SQL jobs using Azure Machine Learning Pipelines.

You can also use it as a data source for an Azure Machine Learning dataset, which can be downloaded or mounted on any supported compute.

register_azure_data_lake_gen2

Initialize a new Azure Data Lake Gen2 Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

static register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False, subscription_id=None, resource_group=None, grant_workspace_access=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

filesystem
str
Required

The name of the Data Lake Gen2 filesystem.

account_name
str
Required

The storage account name.

tenant_id
str, <xref:optional>
default value: None

The Directory ID/Tenant ID of the service principal.

client_id
str, <xref:optional>
default value: None

The Client ID/Application ID of the service principal.

client_secret
str, <xref:optional>
default value: None

The secret of the service principal.

resource_url
str, <xref:optional>
default value: None

The resource URL, which determines what operations will be performed on the data lake store. Defaults to https://storage.azure.com/, which allows us to perform filesystem operations.

authority_url
str, <xref:optional>
default value: None

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

protocol
str, <xref:optional>
default value: None

Protocol to use to connect to the blob container. If None, defaults to https.

endpoint
str, <xref:optional>
default value: None

The endpoint of the storage account. If None, defaults to core.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

subscription_id
str, <xref:optional>
default value: None

The ID of the subscription the ADLS store belongs to.

resource_group
str, <xref:optional>
default value: None

The resource group the ADLS store belongs to.

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

Returns

Returns the Azure Data Lake Gen2 Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
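
The following sketch registers an Azure Data Lake Gen2 filesystem with a service principal; all names and secrets below are placeholders.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
       workspace=ws,
       datastore_name='adlsgen2datastore',
       filesystem='my-filesystem',         # name of the Data Lake Gen2 filesystem
       account_name='my-storage-account',  # storage account name
       tenant_id=os.getenv("ADLSGEN2_TENANT", "<my_tenant_id>"),          # tenant id of service principal
       client_id=os.getenv("ADLSGEN2_CLIENTID", "<my_client_id>"),        # client id of service principal
       client_secret=os.getenv("ADLSGEN2_SECRET", "<my_client_secret>"))  # secret of service principal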

register_azure_file_share

Register an Azure File Share to the datastore.

You can choose to use a SAS token or a storage account key.

static register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The name of the datastore, case insensitive; it can only contain alphanumeric characters and underscores (_).

file_share_name
str
Required

The name of the Azure file share.

account_name
str
Required

The storage account name.

sas_token
str, <xref:optional>
default value: None

An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects; for data write, we additionally require Write & Add permissions.

account_key
str, <xref:optional>
default value: None

Access keys of your storage account, defaults to None.

protocol
str, <xref:optional>
default value: None

The protocol to use to connect to the file share. If None, defaults to https.

endpoint
str, <xref:optional>
default value: None

The endpoint of the file share. If None, defaults to core.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

create_if_not_exists
bool, <xref:optional>
default value: False

Whether to create the file share if it does not exist. The default is False.

skip_validation
bool, <xref:optional>
default value: False

Whether to skip validation of storage keys. The default is False.

Returns

The file datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
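
The following sketch registers an Azure file share with a storage account key; all names and the key below are placeholders.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   file_datastore = Datastore.register_azure_file_share(
       workspace=ws,
       datastore_name='myfiledatastore',
       file_share_name='my-file-share',                                # name of the Azure file share
       account_name='my-storage-account',                              # storage account name
       account_key=os.getenv("FILE_ACCOUNT_KEY", "<my-account-key>"))  # storage account key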

register_azure_my_sql

Initialize a new Azure MySQL Datastore.

A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.

Please see below for an example of how to register an Azure MySQL database as a Datastore.

static register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, **kwargs)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

server_name
str
Required

The MySQL server name.

database_name
str
Required

The MySQL database name.

user_id
str
Required

The User ID of the MySQL server.

user_password
str
Required

The user password of the MySQL server.

port_number
str
default value: None

The port number of the MySQL server.

endpoint
str, <xref:optional>
default value: None

The endpoint of the MySQL server. If None, defaults to mysql.database.azure.com.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

Returns

Returns the MySQL database Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   mysql_datastore_name="mysqldatastore"
   server_name=os.getenv("MYSQL_SERVERNAME", "<my_server_name>") # FQDN name of the MySQL server
   database_name=os.getenv("MYSQL_DATABASENAME", "<my_database_name>") # Name of the MySQL database
   user_id=os.getenv("MYSQL_USERID", "<my_user_id>") # The User ID of the MySQL server
   user_password=os.getenv("MYSQL_USERPW", "<my_user_password>") # The user password of the MySQL server.

   mysql_datastore = Datastore.register_azure_my_sql(
       workspace=ws,
       datastore_name=mysql_datastore_name,
       server_name=server_name,
       database_name=database_name,
       user_id=user_id,
       user_password=user_password)

register_azure_postgre_sql

Initialize a new Azure PostgreSQL Datastore.

Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.

static register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, enforce_ssl=True, **kwargs)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

server_name
str
Required

The PostgreSQL server name.

database_name
str
Required

The PostgreSQL database name.

user_id
str
Required

The User ID of the PostgreSQL server.

user_password
str
Required

The User Password of the PostgreSQL server.

port_number
str
default value: None

The port number of the PostgreSQL server.

endpoint
str, <xref:optional>
default value: None

The endpoint of the PostgreSQL server. If None, defaults to postgres.database.azure.com.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

enforce_ssl
bool
default value: True

Indicates the SSL requirement of the PostgreSQL server. Defaults to True.

Returns

Returns the PostgreSQL database Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   psql_datastore_name="postgresqldatastore"
   server_name=os.getenv("PSQL_SERVERNAME", "<my_server_name>") # FQDN name of the PostgreSQL server
   database_name=os.getenv("PSQL_DATABASENAME", "<my_database_name>") # Name of the PostgreSQL database
   user_id=os.getenv("PSQL_USERID", "<my_user_id>") # The database user id
   user_password=os.getenv("PSQL_USERPW", "<my_user_password>") # The database user password

   psql_datastore = Datastore.register_azure_postgre_sql(
       workspace=ws,
       datastore_name=psql_datastore_name,
       server_name=server_name,
       database_name=database_name,
       user_id=user_id,
       user_password=user_password)

register_azure_sql_database

Initialize a new Azure SQL database Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a service principal or a username and password. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure SQL database as a Datastore.

static register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None, subscription_id=None, resource_group=None, grant_workspace_access=False, **kwargs)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

server_name
str
Required

The SQL server name. For fully qualified domain name like "sample.database.windows.net", the server_name value should be "sample", and the endpoint value should be "database.windows.net".

database_name
str
Required

The SQL database name.

tenant_id
str
default value: None

The Directory ID/Tenant ID of the service principal.

client_id
str
default value: None

The Client ID/Application ID of the service principal.

client_secret
str
default value: None

The secret of the service principal.

resource_url
str, <xref:optional>
default value: None

The resource URL, which determines what operations will be performed on the SQL database store. If None, defaults to https://database.windows.net/.

authority_url
str, <xref:optional>
default value: None

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

endpoint
str, <xref:optional>
default value: None

The endpoint of the SQL server. If None, defaults to database.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

username
str
default value: None

The username of the database user to access the database.

password
str
default value: None

The password of the database user to access the database.

skip_validation
bool, <xref:optional>
Required

Whether to skip validation of connecting to the SQL database. Defaults to False.

subscription_id
str, <xref:optional>
default value: None

The ID of the subscription the ADLS store belongs to.

resource_group
str, <xref:optional>
default value: None

The resource group the ADLS store belongs to.

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

Returns

Returns the SQL database Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   sql_datastore_name="azuresqldatastore"
   server_name=os.getenv("SQL_SERVERNAME", "<my_server_name>") # Name of the Azure SQL server
   database_name=os.getenv("SQL_DATABASENAME", "<my_database_name>") # Name of the Azure SQL database
   username=os.getenv("SQL_USER_NAME", "<my_sql_user_name>") # The username of the database user.
   password=os.getenv("SQL_USER_PASSWORD", "<my_sql_user_password>") # The password of the database user.

   sql_datastore = Datastore.register_azure_sql_database(
       workspace=ws,
       datastore_name=sql_datastore_name,
       server_name=server_name,  # name should not contain fully qualified domain endpoint
       database_name=database_name,
       username=username,
       password=password,
       endpoint='database.windows.net')

register_dbfs

Initialize a new Databricks File System (DBFS) datastore.

The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.

static register_dbfs(workspace, datastore_name)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

Returns

Returns the DBFS Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
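
The following sketch registers a DBFS datastore and uses it to create a DataReference input for a DatabricksStep; the datastore name and path below are placeholders.


   from azureml.core import Datastore
   from azureml.data.data_reference import DataReference

   # ws: an existing azureml.core.Workspace object
   dbfs_datastore = Datastore.register_dbfs(ws, datastore_name='dbfsdatastore')

   # Reference a DBFS path for use as input to a DatabricksStep in a pipeline
   dbfs_input = DataReference(
       datastore=dbfs_datastore,
       data_reference_name="dbfs_test_data",
       path_on_datastore="testdata")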

register_hdfs

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Initialize a new HDFS datastore.

static register_hdfs(workspace, datastore_name, protocol, namenode_address, hdfs_server_certificate, kerberos_realm, kerberos_kdc_address, kerberos_principal, kerberos_keytab=None, kerberos_password=None, overwrite=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

protocol
str or <xref:_restclient.models.enum>
Required

The protocol to use when communicating with the HDFS cluster, either http or https. Possible values include: 'http', 'https'.

namenode_address
str
Required

The IP address or DNS hostname of the HDFS namenode. Optionally includes a port.

hdfs_server_certificate
str, <xref:optional>
Required

The path to the TLS signing certificate of the HDFS namenode, if using TLS with a self-signed cert.

kerberos_realm
str
Required

The Kerberos realm.

kerberos_kdc_address
str
Required

The IP address or DNS hostname of the Kerberos KDC.

kerberos_principal
str
Required

The Kerberos principal to use for authentication and authorization.

kerberos_keytab
str, <xref:optional>
Required

The path to the keytab file containing the key(s) corresponding to the Kerberos principal. Provide either this, or a password.

kerberos_password
str, <xref:optional>
Required

The password corresponding to the Kerberos principal. Provide either this, or the path to a keytab file.

overwrite
bool, <xref:optional>
Required

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False.
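
Since this method is experimental, the following is only a sketch of registering a Kerberos-secured HDFS cluster; every value below is a placeholder.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   hdfs_datastore = Datastore.register_hdfs(
       workspace=ws,
       datastore_name='hdfsdatastore',
       protocol='https',
       namenode_address='namenode.example.com:50470',           # namenode host and port
       hdfs_server_certificate='/path/to/namenode-cert.pem',    # TLS certificate if self-signed, otherwise None
       kerberos_realm='EXAMPLE.COM',
       kerberos_kdc_address='kdc.example.com',
       kerberos_principal='user@EXAMPLE.COM',
       kerberos_password=os.getenv("HDFS_KERBEROS_PASSWORD", "<my_password>"))  # or pass kerberos_keytab instead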

set_as_default

Set the default datastore.

set_as_default()

Parameters

datastore_name
str
Required

The name of the datastore.
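
A minimal usage sketch (the datastore name is a placeholder):


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()

   # Make an already-registered datastore the workspace default
   Datastore.get(ws, 'my_blob_datastore').set_as_default()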

unregister

Unregisters the datastore. The underlying storage service will not be deleted.

unregister()
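
A minimal usage sketch (the datastore name is a placeholder):


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()

   # Remove the datastore registration from the workspace;
   # the data in the underlying storage service is left untouched.
   Datastore.get(ws, 'my_blob_datastore').unregister()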