Datastore Class

Represents a storage abstraction over an Azure Machine Learning storage account.

Datastores are attached to workspaces and store the connection information for Azure storage services, so you can refer to a storage service by name instead of remembering the connection information and secrets used to connect to it.

Examples of supported Azure storage services that can be registered as datastores are:

  • Azure Blob Container

  • Azure File Share

  • Azure Data Lake

  • Azure Data Lake Gen2

  • Azure SQL Database

  • Azure Database for PostgreSQL

  • Databricks File System

  • Azure Database for MySQL

Use this class to perform management operations, including register, list, get, and remove datastores. Datastores for each service are created with the register* methods of this class. When using a datastore to access data, you must have permission to access that data, which depends on the credentials registered with the datastore.
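
As a quick sketch of these operations, the following assumes a workspace config file is available and a datastore named "my_datastore" has already been registered (both are placeholders):


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()  # load an existing Azure ML workspace

   # List all datastores registered in the workspace (name -> Datastore object).
   for name, datastore in ws.datastores.items():
       print(name, datastore)

   # Get a datastore by name, then remove its registration; the underlying
   # storage service and its data are not deleted.
   datastore = Datastore.get(ws, "my_datastore")
   datastore.unregister()
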

For more information on datastores and how they can be used in machine learning, see the Azure Machine Learning documentation on connecting to data in Azure storage services.

Get a datastore by name. This call will make a request to the datastore service.

Inheritance
builtins.object
Datastore

Constructor

Datastore(workspace, name=None)

Parameters

Name Description
workspace
Required

The workspace.

name
str, optional

The name of the datastore, defaults to None, which gets the default datastore.

default value: None

Remarks

To interact with data in your datastores for machine learning tasks, like training, create an Azure Machine Learning dataset. Datasets provide functions that load tabular data into a pandas or Spark DataFrame. Datasets also provide the ability to download or mount files of any format from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL. Learn more about how to train with datasets.
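
For example, a registered datastore can be referenced by path when creating datasets. A minimal sketch, assuming an existing workspace ws, a registered blob datastore named "my_blob_datastore", and hypothetical file paths on that datastore:


   from azureml.core import Dataset, Datastore, Workspace

   ws = Workspace.from_config()
   datastore = Datastore.get(ws, "my_blob_datastore")

   # TabularDataset: load delimited files from the datastore into a pandas DataFrame.
   tabular_ds = Dataset.Tabular.from_delimited_files(path=(datastore, "weather/2018/*.csv"))
   df = tabular_ds.to_pandas_dataframe()

   # FileDataset: reference files of any format and download them locally.
   file_ds = Dataset.File.from_files(path=(datastore, "images/**/*.png"))
   file_ds.download(target_path=".", overwrite=True)
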

The following example shows how to create a Datastore connected to Azure Blob Container.


   import os

   from azureml.core import Datastore, Workspace
   from azureml.data.data_reference import DataReference
   from azureml.exceptions import UserErrorException

   ws = Workspace.from_config()  # load an existing Azure ML workspace

   blob_datastore_name='MyBlobDatastore'
   account_name=os.getenv("BLOB_ACCOUNTNAME_62", "<my-account-name>") # Storage account name
   container_name=os.getenv("BLOB_CONTAINER_62", "<my-container-name>") # Name of Azure blob container
   account_key=os.getenv("BLOB_ACCOUNT_KEY_62", "<my-account-key>") # Storage account key

   try:
       blob_datastore = Datastore.get(ws, blob_datastore_name)
       print("Found Blob Datastore with name: %s" % blob_datastore_name)
   except UserErrorException:
       blob_datastore = Datastore.register_azure_blob_container(
           workspace=ws,
           datastore_name=blob_datastore_name,
           account_name=account_name, # Storage account name
           container_name=container_name, # Name of Azure blob container
           account_key=account_key) # Storage account key
       print("Registered blob datastore with name: %s" % blob_datastore_name)

   blob_data_ref = DataReference(
       datastore=blob_datastore,
       data_reference_name="blob_test_data",
       path_on_datastore="testdata")

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb

Methods

get

Get a datastore by name. This is the same as calling the constructor.

get_default

Get the default datastore for the workspace.

register_azure_blob_container

Register an Azure Blob Container to the datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

register_azure_data_lake

Initialize a new Azure Data Lake Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can register the datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.


   import os

   adlsgen1_datastore_name='adlsgen1datastore'

   store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
   subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
   resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
   tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
   client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
   client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

   adls_datastore = Datastore.register_azure_data_lake(
       workspace=ws,
       datastore_name=adlsgen1_datastore_name,
       subscription_id=subscription_id, # subscription id of ADLS account
       resource_group=resource_group, # resource group of ADLS account
       store_name=store_name, # ADLS account name
       tenant_id=tenant_id, # tenant id of service principal
       client_id=client_id, # client id of service principal
       client_secret=client_secret) # the secret of service principal

register_azure_data_lake_gen2

Initialize a new Azure Data Lake Gen2 Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can register the datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

register_azure_file_share

Register an Azure File Share to the datastore.

You can choose to use a SAS token or a storage account key.

register_azure_my_sql

Initialize a new Azure MySQL Datastore.

A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.

Please see below for an example of how to register an Azure MySQL database as a Datastore.

register_azure_postgre_sql

Initialize a new Azure PostgreSQL Datastore.

Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.

register_azure_sql_database

Initialize a new Azure SQL database Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can choose to use a service principal or a username and password. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

Please see below for an example of how to register an Azure SQL database as a Datastore.

register_dbfs

Initialize a new Databricks File System (DBFS) datastore.

The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.

register_hdfs

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Initialize a new HDFS datastore.

set_as_default

Set the default datastore.

unregister

Unregister the datastore. The underlying storage service will not be deleted.

get

Get a datastore by name. This is the same as calling the constructor.

static get(workspace, datastore_name)

Parameters

Name Description
workspace
Required

The workspace.

datastore_name
Required
str, optional

The name of the datastore, defaults to None, which gets the default datastore.

Returns

Type Description

The corresponding datastore for that name.
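
For example (a sketch assuming an existing workspace object ws and a datastore registered as "my_blob_datastore"), the following two calls are equivalent:


   from azureml.core import Datastore

   datastore = Datastore.get(ws, "my_blob_datastore")
   datastore = Datastore(ws, name="my_blob_datastore")  # same result via the constructor
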

get_default

Get the default datastore for the workspace.

static get_default(workspace)

Parameters

Name Description
workspace
Required

The workspace.

Returns

Type Description

The default datastore for the workspace.

register_azure_blob_container

Register an Azure Blob Container to the datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

static register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)

Parameters

Name Description
workspace
Required

The workspace.

datastore_name
Required
str

The name of the datastore, case insensitive, can only contain alphanumeric characters and _.

container_name
Required
str

The name of the Azure blob container.

account_name
Required
str

The storage account name.

sas_token
str, optional

An account SAS token, defaults to None. For data reads we require a minimum of List & Read permissions for Containers & Objects, and for data writes we additionally require Write & Add permissions.

default value: None
account_key
str, optional

Access keys of your storage account, defaults to None.

default value: None
protocol
str, optional

Protocol to use to connect to the blob container. If None, defaults to https.

default value: None
endpoint
str, optional

The endpoint of the storage account. If None, defaults to core.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
create_if_not_exists
bool, optional

Whether to create the blob container if it does not exist. The default is False.

default value: False
skip_validation
bool, optional

Whether to skip validation of storage keys. The default is False.

default value: False
blob_cache_timeout
int, optional

When this blob is mounted, set the cache timeout to this many seconds. If None, defaults to no timeout (i.e. blobs will be cached for the duration of the job when read).

default value: None
grant_workspace_access
bool, optional

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network

default value: False
subscription_id
str, optional

The subscription id of the storage account, defaults to None.

default value: None
resource_group
str, optional

The resource group of the storage account, defaults to None.

default value: None

Returns

Type Description

The blob datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
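
The class-level example above registers a blob container with an account key; as an alternative sketch, the same method accepts an account SAS token instead (placeholder values, assuming an existing workspace ws):


   blob_datastore = Datastore.register_azure_blob_container(
       workspace=ws,
       datastore_name="my_sas_blob_datastore",
       container_name="mycontainer",        # name of the Azure blob container
       account_name="mystorageaccount",     # storage account name
       sas_token="<my-sas-token>")          # account SAS token with at least List & Read
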

register_azure_data_lake

Initialize a new Azure Data Lake Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can register the datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.


   import os

   adlsgen1_datastore_name='adlsgen1datastore'

   store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
   subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
   resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
   tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
   client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
   client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

   adls_datastore = Datastore.register_azure_data_lake(
       workspace=ws,
       datastore_name=adlsgen1_datastore_name,
       subscription_id=subscription_id, # subscription id of ADLS account
       resource_group=resource_group, # resource group of ADLS account
       store_name=store_name, # ADLS account name
       tenant_id=tenant_id, # tenant id of service principal
       client_id=client_id, # client id of service principal
       client_secret=client_secret) # the secret of service principal

static register_azure_data_lake(workspace, datastore_name, store_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False, grant_workspace_access=False)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The datastore name.

store_name
Required
str

The ADLS store name.

tenant_id
str, optional

The Directory ID/Tenant ID of the service principal used to access data.

default value: None
client_id
str, optional

The Client ID/Application ID of the service principal used to access data.

default value: None
client_secret
str, optional

The Client Secret of the service principal used to access data.

default value: None
resource_url
str, optional

The resource URL, which determines what operations will be performed on the Data Lake store. If None, defaults to https://datalake.azure.net/, which allows us to perform filesystem operations.

default value: None
authority_url
str, optional

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

default value: None
subscription_id
str, optional

The ID of the subscription the ADLS store belongs to.

default value: None
resource_group
str, optional

The resource group the ADLS store belongs to.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
grant_workspace_access
bool, optional

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network

default value: False

Returns

Type Description

Returns the Azure Data Lake Datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.

Note

Azure Data Lake Datastore supports data transfer and running U-SQL jobs using Azure Machine Learning pipelines.

You can also use it as a data source for an Azure Machine Learning dataset, which can be downloaded or mounted on any supported compute.

register_azure_data_lake_gen2

Initialize a new Azure Data Lake Gen2 Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can register the datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

static register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False, subscription_id=None, resource_group=None, grant_workspace_access=False)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The datastore name.

filesystem
Required
str

The name of the Data Lake Gen2 filesystem.

account_name
Required
str

The storage account name.

tenant_id
str, optional

The Directory ID/Tenant ID of the service principal.

default value: None
client_id
str, optional

The Client ID/Application ID of the service principal.

default value: None
client_secret
str, optional

The secret of the service principal.

default value: None
resource_url
str, optional

The resource URL, which determines what operations will be performed on the data lake store. Defaults to https://storage.azure.com/, which allows us to perform filesystem operations.

default value: None
authority_url
str, optional

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

default value: None
protocol
str, optional

Protocol to use to connect to the blob container. If None, defaults to https.

default value: None
endpoint
str, optional

The endpoint of the storage account. If None, defaults to core.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
subscription_id
str, optional

The ID of the subscription the ADLS store belongs to.

default value: None
resource_group
str, optional

The resource group the ADLS store belongs to.

default value: None
grant_workspace_access
bool, optional

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network

default value: False

Returns

Type Description

Returns the Azure Data Lake Gen2 Datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
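
No example is shown above for Gen2; a minimal sketch using service principal credentials, with placeholder values and an existing workspace ws, might look like this:


   adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
       workspace=ws,
       datastore_name="adlsgen2datastore",
       filesystem="myfilesystem",           # name of the ADLS Gen2 filesystem
       account_name="mystorageaccount",     # storage account name
       tenant_id="<my_tenant_id>",          # tenant id of the service principal
       client_id="<my_client_id>",          # client id of the service principal
       client_secret="<my_client_secret>")  # secret of the service principal
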

register_azure_file_share

Register an Azure File Share to the datastore.

You can choose to use a SAS token or a storage account key.

static register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The name of the datastore, case insensitive, can only contain alphanumeric characters and _.

file_share_name
Required
str

The name of the Azure file share.

account_name
Required
str

The storage account name.

sas_token
str, optional

An account SAS token, defaults to None. For data reads we require a minimum of List & Read permissions for Containers & Objects, and for data writes we additionally require Write & Add permissions.

default value: None
account_key
str, optional

Access keys of your storage account, defaults to None.

default value: None
protocol
str, optional

The protocol to use to connect to the file share. If None, defaults to https.

default value: None
endpoint
str, optional

The endpoint of the file share. If None, defaults to core.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
create_if_not_exists
bool, optional

Whether to create the file share if it does not exist. The default is False.

default value: False
skip_validation
bool, optional

Whether to skip validation of storage keys. The default is False.

default value: False

Returns

Type Description

The file datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
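
No example is shown above for file shares; a minimal sketch using a storage account key, with placeholder values and an existing workspace ws:


   file_datastore = Datastore.register_azure_file_share(
       workspace=ws,
       datastore_name="my_file_datastore",
       file_share_name="myfileshare",       # name of the Azure file share
       account_name="mystorageaccount",     # storage account name
       account_key="<my-account-key>")      # or pass sas_token=... instead
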

register_azure_my_sql

Initialize a new Azure MySQL Datastore.

A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.

Please see below for an example of how to register an Azure MySQL database as a Datastore.

static register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, **kwargs)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The datastore name.

server_name
Required
str

The MySQL server name.

database_name
Required
str

The MySQL database name.

user_id
Required
str

The User ID of the MySQL server.

user_password
Required
str

The user password of the MySQL server.

port_number
str

The port number of the MySQL server.

default value: None
endpoint
str, optional

The endpoint of the MySQL server. If None, defaults to mysql.database.azure.com.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False

Returns

Type Description

Returns the MySQL database Datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   mysql_datastore_name="mysqldatastore"
   server_name=os.getenv("MYSQL_SERVERNAME", "<my_server_name>") # FQDN name of the MySQL server
   database_name=os.getenv("MYSQL_DATABASENAME", "<my_database_name>") # Name of the MySQL database
   user_id=os.getenv("MYSQL_USERID", "<my_user_id>") # The User ID of the MySQL server
   user_password=os.getenv("MYSQL_USERPW", "<my_user_password>") # The user password of the MySQL server.

   mysql_datastore = Datastore.register_azure_my_sql(
       workspace=ws,
       datastore_name=mysql_datastore_name,
       server_name=server_name,
       database_name=database_name,
       user_id=user_id,
       user_password=user_password)

register_azure_postgre_sql

Initialize a new Azure PostgreSQL Datastore.

Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.

static register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, enforce_ssl=True, **kwargs)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The datastore name.

server_name
Required
str

The PostgreSQL server name.

database_name
Required
str

The PostgreSQL database name.

user_id
Required
str

The User ID of the PostgreSQL server.

user_password
Required
str

The User Password of the PostgreSQL server.

port_number
str

The port number of the PostgreSQL server.

default value: None
endpoint
str, optional

The endpoint of the PostgreSQL server. If None, defaults to postgres.database.azure.com.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
enforce_ssl
bool, optional

Indicates the SSL requirement of the PostgreSQL server. Defaults to True.

default value: True

Returns

Type Description

Returns the PostgreSQL database Datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   psql_datastore_name="postgresqldatastore"
   server_name=os.getenv("PSQL_SERVERNAME", "<my_server_name>") # FQDN name of the PostgreSQL server
   database_name=os.getenv("PSQL_DATABASENAME", "<my_database_name>") # Name of the PostgreSQL database
   user_id=os.getenv("PSQL_USERID", "<my_user_id>") # The database user id
   user_password=os.getenv("PSQL_USERPW", "<my_user_password>") # The database user password

   psql_datastore = Datastore.register_azure_postgre_sql(
       workspace=ws,
       datastore_name=psql_datastore_name,
       server_name=server_name,
       database_name=database_name,
       user_id=user_id,
       user_password=user_password)

register_azure_sql_database

Initialize a new Azure SQL database Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported; you can choose to use a service principal or a username and password. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program that directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. In jobs submitted by Experiment.submit, the identity of the compute target is used for data access authentication. Learn more here.

Please see below for an example of how to register an Azure SQL database as a Datastore.

static register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None, subscription_id=None, resource_group=None, grant_workspace_access=False, **kwargs)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The datastore name.

server_name
Required
str

The SQL server name. For a fully qualified domain name like "sample.database.windows.net", the server_name value should be "sample", and the endpoint value should be "database.windows.net".

database_name
Required
str

The SQL database name.

tenant_id
str

The Directory ID/Tenant ID of the service principal.

default value: None
client_id
str

The Client ID/Application ID of the service principal.

default value: None
client_secret
str

The secret of the service principal.

default value: None
resource_url
str, optional

The resource URL, which determines what operations will be performed on the SQL database store. If None, defaults to https://database.windows.net/.

default value: None
authority_url
str, optional

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

default value: None
endpoint
str, optional

The endpoint of the SQL server. If None, defaults to database.windows.net.

default value: None
overwrite
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

default value: False
username
str

The username of the database user to access the database.

default value: None
password
str

The password of the database user to access the database.

default value: None
skip_validation
bool, optional

Whether to skip validation of connecting to the SQL database. Defaults to False.

subscription_id
str, optional

The ID of the subscription the SQL server belongs to.

default value: None
resource_group
str, optional

The resource group the SQL server belongs to.

default value: None
grant_workspace_access
bool, optional

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as Reader of the storage. You have to be Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network

default value: False

Returns

Type Description

Returns the SQL database Datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   sql_datastore_name="azuresqldatastore"
   server_name=os.getenv("SQL_SERVERNAME", "<my_server_name>") # Name of the Azure SQL server
   database_name=os.getenv("SQL_DATABASENAME", "<my_database_name>") # Name of the Azure SQL database
   username=os.getenv("SQL_USER_NAME", "<my_sql_user_name>") # The username of the database user.
   password=os.getenv("SQL_USER_PASSWORD", "<my_sql_user_password>") # The password of the database user.

   sql_datastore = Datastore.register_azure_sql_database(
       workspace=ws,
       datastore_name=sql_datastore_name,
       server_name=server_name,  # name should not contain fully qualified domain endpoint
       database_name=database_name,
       username=username,
       password=password,
       endpoint='database.windows.net')

register_dbfs

Initialize a new Databricks File System (DBFS) datastore.

The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.

static register_dbfs(workspace, datastore_name)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The datastore name.

Returns

Type Description

Returns the DBFS Datastore.

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
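
Registration itself takes only the workspace and a name; a minimal sketch, assuming an existing workspace ws:


   dbfs_datastore = Datastore.register_dbfs(
       workspace=ws,
       datastore_name="mydbfsdatastore")
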

register_hdfs

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Initialize a new HDFS datastore.

static register_hdfs(workspace, datastore_name, protocol, namenode_address, hdfs_server_certificate, kerberos_realm, kerberos_kdc_address, kerberos_principal, kerberos_keytab=None, kerberos_password=None, overwrite=False)

Parameters

Name Description
workspace
Required

The workspace this datastore belongs to.

datastore_name
Required
str

The datastore name.

protocol
Required
str or enum

The protocol to use when communicating with the HDFS cluster. Possible values include: 'http', 'https'.

namenode_address
Required
str

The IP address or DNS hostname of the HDFS namenode. Optionally includes a port.

hdfs_server_certificate
Required
str, optional

The path to the TLS signing certificate of the HDFS namenode, if using TLS with a self-signed cert.

kerberos_realm
Required
str

The Kerberos realm.

kerberos_kdc_address
Required
str

The IP address or DNS hostname of the Kerberos KDC.

kerberos_principal
Required
str

The Kerberos principal to use for authentication and authorization.

kerberos_keytab
Required
str, optional

The path to the keytab file containing the key(s) corresponding to the Kerberos principal. Provide either this, or a password.

kerberos_password
Required
str, optional

The password corresponding to the Kerberos principal. Provide either this, or the path to a keytab file.

overwrite
Required
bool, optional

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False.
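
Following the signature above, a registration call might look like the sketch below; all values are placeholders, and because the method is experimental the exact behavior may change:


   hdfs_datastore = Datastore.register_hdfs(
       workspace=ws,
       datastore_name="myhdfsdatastore",
       protocol="https",
       namenode_address="namenode.example.com:50470",
       hdfs_server_certificate="/path/to/namenode_cert.pem",  # None if not using a self-signed cert
       kerberos_realm="EXAMPLE.COM",
       kerberos_kdc_address="kdc.example.com",
       kerberos_principal="user@EXAMPLE.COM",
       kerberos_keytab="/path/to/user.keytab")  # or pass kerberos_password=... instead
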

set_as_default

Set the default datastore.

set_as_default()

Parameters

Name Description
datastore_name
Required
str

The name of the datastore.
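
For example (a sketch assuming an existing workspace ws and a registered datastore named "my_blob_datastore"):


   datastore = Datastore.get(ws, "my_blob_datastore")
   datastore.set_as_default()
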

unregister

Unregister the datastore. The underlying storage service will not be deleted.

unregister()
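
For example (a sketch assuming an existing workspace ws; the underlying storage service and its data are left untouched):


   datastore = Datastore.get(ws, "my_blob_datastore")
   datastore.unregister()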