Datastore Class

Represents a storage abstraction over an Azure Machine Learning storage account.

Datastores are attached to workspaces and are used to store connection information to Azure storage services, so you can refer to them by name without having to remember the connection information and secrets used to connect to the storage services.

Examples of supported Azure storage services that can be registered as datastores are:

  • Azure Blob Container

  • Azure File Share

  • Azure Data Lake

  • Azure Data Lake Gen2

  • Azure SQL Database

  • Azure Database for PostgreSQL

  • Databricks File System

  • Azure Database for MySQL

Use this class to perform management operations, including register, list, get, and remove datastores. Datastores for each service are created with the register* methods of this class. When using a datastore to access data, you must have permission to access that data, which depends on the credentials registered with the datastore.
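
As a minimal sketch of these management operations (the workspace configuration file and the datastore name 'my_datastore' below are placeholders), you might list, get, and unregister datastores as follows:


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()  # loads an existing workspace from its config file

   # List all datastores registered in the workspace (a name -> Datastore mapping)
   for name, datastore in ws.datastores.items():
       print(name, datastore.datastore_type)

   # Get a specific datastore by name
   datastore = Datastore.get(ws, 'my_datastore')

   # Remove the registration; the underlying storage service is not deleted
   datastore.unregister()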

For more information on datastores and how they can be used in machine learning, see the Azure Machine Learning documentation on connecting to and accessing data.

Calling the constructor gets a datastore by name; this call will make a request to the datastore service.

Inheritance
builtins.object
Datastore

Constructor

Datastore(workspace, name=None)

Parameters

workspace
Workspace
Required

The workspace.

name
str, <xref:optional>
default value: None

The name of the datastore, defaults to None, which gets the default datastore.

Remarks

To interact with data in your datastores for machine learning tasks, like training, create an Azure Machine Learning dataset. Datasets provide functions that load tabular data into a pandas or Spark DataFrame. Datasets also provide the ability to download or mount files of any format from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for PostgreSQL. Learn more about how to train with datasets.
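
For example, a minimal sketch of creating datasets from paths on an already registered datastore (the datastore name and the file paths below are placeholders):


   from azureml.core import Dataset, Datastore, Workspace

   ws = Workspace.from_config()
   datastore = Datastore.get(ws, 'my_blob_datastore')  # placeholder datastore name

   # Tabular dataset: load delimited files into a pandas DataFrame
   tabular_ds = Dataset.Tabular.from_delimited_files(path=(datastore, 'data/train.csv'))
   df = tabular_ds.to_pandas_dataframe()

   # File dataset: download or mount files of any format
   file_ds = Dataset.File.from_files(path=(datastore, 'images/**'))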

The following example shows how to create a Datastore connected to an Azure blob container.


   import os

   from azureml.core import Datastore
   from azureml.data.data_reference import DataReference
   from azureml.exceptions import UserErrorException

   # ws: an existing azureml.core.Workspace object

   blob_datastore_name='MyBlobDatastore'
   account_name=os.getenv("BLOB_ACCOUNTNAME_62", "<my-account-name>") # Storage account name
   container_name=os.getenv("BLOB_CONTAINER_62", "<my-container-name>") # Name of Azure blob container
   account_key=os.getenv("BLOB_ACCOUNT_KEY_62", "<my-account-key>") # Storage account key

   try:
       blob_datastore = Datastore.get(ws, blob_datastore_name)
       print("Found Blob Datastore with name: %s" % blob_datastore_name)
   except UserErrorException:
       blob_datastore = Datastore.register_azure_blob_container(
           workspace=ws,
           datastore_name=blob_datastore_name,
           account_name=account_name, # Storage account name
           container_name=container_name, # Name of Azure blob container
           account_key=account_key) # Storage account key
       print("Registered blob datastore with name: %s" % blob_datastore_name)

   blob_data_ref = DataReference(
       datastore=blob_datastore,
       data_reference_name="blob_test_data",
       path_on_datastore="testdata")

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb

Methods

get

Get a datastore by name. This is the same as calling the constructor.

get_default

Get the default datastore for the workspace.

register_azure_blob_container

Register an Azure Blob Container to the datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

register_azure_data_lake

Initialize a new Azure Data Lake Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   adlsgen1_datastore_name='adlsgen1datastore'

   store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
   subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
   resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
   tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
   client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
   client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

   adls_datastore = Datastore.register_azure_data_lake(
       workspace=ws,
       datastore_name=adlsgen1_datastore_name,
       subscription_id=subscription_id, # subscription id of ADLS account
       resource_group=resource_group, # resource group of ADLS account
       store_name=store_name, # ADLS account name
       tenant_id=tenant_id, # tenant id of service principal
       client_id=client_id, # client id of service principal
       client_secret=client_secret) # the secret of service principal

register_azure_data_lake_gen2

Initialize a new Azure Data Lake Gen2 Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

register_azure_file_share

Register an Azure File Share to the datastore.

You can choose to use a SAS token or a storage account key.

register_azure_my_sql

Initialize a new Azure MySQL Datastore.

A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.

Please see below for an example of how to register an Azure MySQL database as a Datastore.

register_azure_postgre_sql

Initialize a new Azure PostgreSQL Datastore.

Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.

register_azure_sql_database

Initialize a new Azure SQL database Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a service principal or a username and password. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure SQL database as a Datastore.

register_dbfs

Initialize a new Databricks File System (DBFS) datastore.

The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.

register_hdfs

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Initialize a new HDFS datastore.

set_as_default

Set the default datastore.

unregister

Unregisters the datastore. The underlying storage service will not be deleted.

get

Get a datastore by name. This is the same as calling the constructor.

static get(workspace, datastore_name)

Parameters

workspace
Workspace
Required

The workspace.

datastore_name
str, <xref:optional>
Required

The name of the datastore, defaults to None, which gets the default datastore.

Returns

The corresponding datastore for that name.

Return type

get_default

Get the default datastore for the workspace.

static get_default(workspace)

Parameters

workspace
Workspace
Required

The workspace.

Returns

The default datastore for the workspace.

Return type
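
A minimal usage sketch (the workspace configuration file is a placeholder):


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()

   # Every workspace comes with a default datastore (an Azure blob container)
   default_datastore = Datastore.get_default(ws)
   print(default_datastore.name)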

register_azure_blob_container

Register an Azure Blob Container to the datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a SAS token or a storage account key. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

static register_azure_blob_container(workspace, datastore_name, container_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False, blob_cache_timeout=None, grant_workspace_access=False, subscription_id=None, resource_group=None)

Parameters

workspace
Workspace
Required

The workspace.

datastore_name
str
Required

The name of the datastore, case insensitive; it can only contain alphanumeric characters and underscores (_).

container_name
str
Required

The name of the Azure blob container.

account_name
str
Required

The storage account name.

sas_token
str, <xref:optional>
default value: None

An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects; for data write, we additionally require Write & Add permissions.

account_key
str, <xref:optional>
default value: None

Access keys of your storage account, defaults to None.

protocol
str, <xref:optional>
default value: None

Protocol to use to connect to the blob container. If None, defaults to https.

endpoint
str, <xref:optional>
default value: None

The endpoint of the storage account. If None, defaults to core.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False.

create_if_not_exists
bool, <xref:optional>
default value: False

Whether to create the blob container if it does not exist. Defaults to False.

skip_validation
bool, <xref:optional>
default value: False

Whether to skip validation of storage keys. Defaults to False.

blob_cache_timeout
int, <xref:optional>
default value: None

When this blob is mounted, set the cache timeout to this many seconds. If None, defaults to no timeout (i.e. blobs will be cached for the duration of the job when read).

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

subscription_id
str, <xref:optional>
default value: None

The subscription id of the storage account, defaults to None.

resource_group
str, <xref:optional>
default value: None

The resource group of the storage account, defaults to None.

Returns

The blob datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
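
As an alternative to the account-key example shown earlier, the following sketch registers a blob container with a SAS token instead; all names and the token below are placeholders.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   sas_token = os.getenv("BLOB_SAS_TOKEN", "<my-sas-token>")  # account SAS token with at least List & Read permissions

   blob_datastore = Datastore.register_azure_blob_container(
       workspace=ws,
       datastore_name='my_sas_blob_datastore',
       container_name='my-container',        # name of the Azure blob container
       account_name='my-storage-account',    # storage account name
       sas_token=sas_token)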

register_azure_data_lake

Initialize a new Azure Data Lake Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure Data Lake Gen1 as a Datastore.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   adlsgen1_datastore_name='adlsgen1datastore'

   store_name=os.getenv("ADL_STORENAME", "<my_datastore_name>") # the ADLS name
   subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of the ADLS
   resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS
   tenant_id=os.getenv("ADL_TENANT", "<my_tenant_id>") # tenant id of service principal
   client_id=os.getenv("ADL_CLIENTID", "<my_client_id>") # client id of service principal
   client_secret=os.getenv("ADL_CLIENT_SECRET", "<my_client_secret>") # the secret of service principal

   adls_datastore = Datastore.register_azure_data_lake(
       workspace=ws,
       datastore_name=adlsgen1_datastore_name,
       subscription_id=subscription_id, # subscription id of ADLS account
       resource_group=resource_group, # resource group of ADLS account
       store_name=store_name, # ADLS account name
       tenant_id=tenant_id, # tenant id of service principal
       client_id=client_id, # client id of service principal
       client_secret=client_secret) # the secret of service principal

static register_azure_data_lake(workspace, datastore_name, store_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, subscription_id=None, resource_group=None, overwrite=False, grant_workspace_access=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

store_name
str
Required

The ADLS store name.

tenant_id
str, <xref:optional>
default value: None

The Directory ID/Tenant ID of the service principal used to access data.

client_id
str, <xref:optional>
default value: None

The Client ID/Application ID of the service principal used to access data.

client_secret
str, <xref:optional>
default value: None

The Client Secret of the service principal used to access data.

resource_url
str, <xref:optional>
default value: None

The resource URL, which determines what operations will be performed on the Data Lake store. If None, defaults to https://datalake.azure.net/, which allows us to perform filesystem operations.

authority_url
str, <xref:optional>
default value: None

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

subscription_id
str, <xref:optional>
default value: None

The ID of the subscription the ADLS store belongs to.

resource_group
str, <xref:optional>
default value: None

The resource group the ADLS store belongs to.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

Returns

Returns the Azure Data Lake Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.

Note

Azure Data Lake Datastore supports data transfer and running U-SQL jobs using Azure Machine Learning Pipelines.

You can also use it as a data source for an Azure Machine Learning dataset, which can be downloaded or mounted on any supported compute.

register_azure_data_lake_gen2

Initialize a new Azure Data Lake Gen2 Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can register a datastore with a service principal for credential-based data access. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

static register_azure_data_lake_gen2(workspace, datastore_name, filesystem, account_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, protocol=None, endpoint=None, overwrite=False, subscription_id=None, resource_group=None, grant_workspace_access=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

filesystem
str
Required

The name of the Data Lake Gen2 filesystem.

account_name
str
Required

The storage account name.

tenant_id
str, <xref:optional>
default value: None

The Directory ID/Tenant ID of the service principal.

client_id
str, <xref:optional>
default value: None

The Client ID/Application ID of the service principal.

client_secret
str, <xref:optional>
default value: None

The secret of the service principal.

resource_url
str, <xref:optional>
default value: None

The resource URL, which determines what operations will be performed on the data lake store. Defaults to https://storage.azure.com/, which allows us to perform filesystem operations.

authority_url
str, <xref:optional>
default value: None

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

protocol
str, <xref:optional>
default value: None

Protocol to use to connect to the blob container. If None, defaults to https.

endpoint
str, <xref:optional>
default value: None

The endpoint of the storage account. If None, defaults to core.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

subscription_id
str, <xref:optional>
default value: None

The ID of the subscription the ADLS store belongs to.

resource_group
str, <xref:optional>
default value: None

The resource group the ADLS store belongs to.

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

Returns

Returns the Azure Data Lake Gen2 Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
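
The following sketch registers an Azure Data Lake Gen2 filesystem with a service principal; all names and secrets below are placeholders.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
       workspace=ws,
       datastore_name='adlsgen2datastore',
       filesystem='my-filesystem',         # name of the Data Lake Gen2 filesystem
       account_name='my-storage-account',  # storage account name
       tenant_id=os.getenv("ADLSGEN2_TENANT", "<my_tenant_id>"),          # tenant id of service principal
       client_id=os.getenv("ADLSGEN2_CLIENTID", "<my_client_id>"),        # client id of service principal
       client_secret=os.getenv("ADLSGEN2_SECRET", "<my_client_secret>"))  # secret of service principal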

register_azure_file_share

Register an Azure File Share to the datastore.

You can choose to use a SAS token or a storage account key.

static register_azure_file_share(workspace, datastore_name, file_share_name, account_name, sas_token=None, account_key=None, protocol=None, endpoint=None, overwrite=False, create_if_not_exists=False, skip_validation=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The name of the datastore, case insensitive; it can only contain alphanumeric characters and underscores (_).

file_share_name
str
Required

The name of the Azure file share.

account_name
str
Required

The storage account name.

sas_token
str, <xref:optional>
default value: None

An account SAS token, defaults to None. For data read, we require a minimum of List & Read permissions for Containers & Objects; for data write, we additionally require Write & Add permissions.

account_key
str, <xref:optional>
default value: None

Access keys of your storage account, defaults to None.

protocol
str, <xref:optional>
default value: None

The protocol to use to connect to the file share. If None, defaults to https.

endpoint
str, <xref:optional>
default value: None

The endpoint of the file share. If None, defaults to core.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

create_if_not_exists
bool, <xref:optional>
default value: False

Whether to create the file share if it does not exist. The default is False.

skip_validation
bool, <xref:optional>
default value: False

Whether to skip validation of storage keys. The default is False.

Returns

The file datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
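
The following sketch registers an Azure file share with a storage account key; all names and the key below are placeholders.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   file_datastore = Datastore.register_azure_file_share(
       workspace=ws,
       datastore_name='myfiledatastore',
       file_share_name='my-file-share',                                # name of the Azure file share
       account_name='my-storage-account',                              # storage account name
       account_key=os.getenv("FILE_ACCOUNT_KEY", "<my-account-key>"))  # storage account key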

register_azure_my_sql

Initialize a new Azure MySQL Datastore.

A MySQL datastore can only be used to create a DataReference as input and output to a DataTransferStep in Azure Machine Learning pipelines. More details can be found here.

Please see below for an example of how to register an Azure MySQL database as a Datastore.

static register_azure_my_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, **kwargs)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

server_name
str
Required

The MySQL server name.

database_name
str
Required

The MySQL database name.

user_id
str
Required

The User ID of the MySQL server.

user_password
str
Required

The user password of the MySQL server.

port_number
str
default value: None

The port number of the MySQL server.

endpoint
str, <xref:optional>
default value: None

The endpoint of the MySQL server. If None, defaults to mysql.database.azure.com.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

Returns

Returns the MySQL database Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   mysql_datastore_name="mysqldatastore"
   server_name=os.getenv("MYSQL_SERVERNAME", "<my_server_name>") # FQDN name of the MySQL server
   database_name=os.getenv("MYSQL_DATABASENAME", "<my_database_name>") # Name of the MySQL database
   user_id=os.getenv("MYSQL_USERID", "<my_user_id>") # The User ID of the MySQL server
   user_password=os.getenv("MYSQL_USERPW", "<my_user_password>") # The user password of the MySQL server.

   mysql_datastore = Datastore.register_azure_my_sql(
       workspace=ws,
       datastore_name=mysql_datastore_name,
       server_name=server_name,
       database_name=database_name,
       user_id=user_id,
       user_password=user_password)

register_azure_postgre_sql

Initialize a new Azure PostgreSQL Datastore.

Please see below for an example of how to register an Azure PostgreSQL database as a Datastore.

static register_azure_postgre_sql(workspace, datastore_name, server_name, database_name, user_id, user_password, port_number=None, endpoint=None, overwrite=False, enforce_ssl=True, **kwargs)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

server_name
str
Required

The PostgreSQL server name.

database_name
str
Required

The PostgreSQL database name.

user_id
str
Required

The User ID of the PostgreSQL server.

user_password
str
Required

The User Password of the PostgreSQL server.

port_number
str
default value: None

The port number of the PostgreSQL server.

endpoint
str, <xref:optional>
default value: None

The endpoint of the PostgreSQL server. If None, defaults to postgres.database.azure.com.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

enforce_ssl
bool
default value: True

Indicates the SSL requirement of the PostgreSQL server. Defaults to True.

Returns

Returns the PostgreSQL database Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   psql_datastore_name="postgresqldatastore"
   server_name=os.getenv("PSQL_SERVERNAME", "<my_server_name>") # FQDN name of the PostgreSQL server
   database_name=os.getenv("PSQL_DATABASENAME", "<my_database_name>") # Name of the PostgreSQL database
   user_id=os.getenv("PSQL_USERID", "<my_user_id>") # The database user id
   user_password=os.getenv("PSQL_USERPW", "<my_user_password>") # The database user password

   psql_datastore = Datastore.register_azure_postgre_sql(
       workspace=ws,
       datastore_name=psql_datastore_name,
       server_name=server_name,
       database_name=database_name,
       user_id=user_id,
       user_password=user_password)

register_azure_sql_database

Initialize a new Azure SQL database Datastore.

Credential-based (GA) and identity-based (Preview) data access are supported. You can choose to use a service principal or a username and password. If no credential is saved with the datastore, the user's AAD token will be used in a notebook or local Python program if it directly calls one of these functions: FileDataset.mount, FileDataset.download, FileDataset.to_path, TabularDataset.to_pandas_dataframe, TabularDataset.to_dask_dataframe, TabularDataset.to_spark_dataframe, TabularDataset.to_parquet_files, or TabularDataset.to_csv_files. The identity of the compute target will be used in jobs submitted by Experiment.submit for data access authentication. Learn more here.

Please see below for an example of how to register an Azure SQL database as a Datastore.

static register_azure_sql_database(workspace, datastore_name, server_name, database_name, tenant_id=None, client_id=None, client_secret=None, resource_url=None, authority_url=None, endpoint=None, overwrite=False, username=None, password=None, subscription_id=None, resource_group=None, grant_workspace_access=False, **kwargs)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

server_name
str
Required

The SQL server name. For fully qualified domain name like "sample.database.windows.net", the server_name value should be "sample", and the endpoint value should be "database.windows.net".

database_name
str
Required

The SQL database name.

tenant_id
str
default value: None

The Directory ID/Tenant ID of the service principal.

client_id
str
default value: None

The Client ID/Application ID of the service principal.

client_secret
str
default value: None

The secret of the service principal.

resource_url
str, <xref:optional>
default value: None

The resource URL, which determines what operations will be performed on the SQL database store. If None, defaults to https://database.windows.net/.

authority_url
str, <xref:optional>
default value: None

The authority URL used to authenticate the user, defaults to https://login.microsoftonline.com.

endpoint
str, <xref:optional>
default value: None

The endpoint of the SQL server. If None, defaults to database.windows.net.

overwrite
bool, <xref:optional>
default value: False

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. The default is False.

username
str
default value: None

The username of the database user to access the database.

password
str
default value: None

The password of the database user to access the database.

skip_validation
bool, <xref:optional>
Required

Whether to skip validation of connecting to the SQL database. Defaults to False.

subscription_id
str, <xref:optional>
default value: None

The ID of the subscription the ADLS store belongs to.

resource_group
str, <xref:optional>
default value: None

The resource group the ADLS store belongs to.

grant_workspace_access
bool, <xref:optional>
default value: False

Defaults to False. Set it to True to access data behind a virtual network from Machine Learning Studio. This makes data access from Machine Learning Studio use the workspace managed identity for authentication, and adds the workspace managed identity as a Reader of the storage. You have to be an Owner or User Access Administrator of the storage to opt in. Ask your administrator to configure it for you if you do not have the required permission. Learn more at https://docs.microsoft.com/azure/machine-learning/how-to-enable-studio-virtual-network.

Returns

Returns the SQL database Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   sql_datastore_name="azuresqldatastore"
   server_name=os.getenv("SQL_SERVERNAME", "<my_server_name>") # Name of the Azure SQL server
   database_name=os.getenv("SQL_DATABASENAME", "<my_database_name>") # Name of the Azure SQL database
   username=os.getenv("SQL_USER_NAME", "<my_sql_user_name>") # The username of the database user.
   password=os.getenv("SQL_USER_PASSWORD", "<my_sql_user_password>") # The password of the database user.

   sql_datastore = Datastore.register_azure_sql_database(
       workspace=ws,
       datastore_name=sql_datastore_name,
       server_name=server_name,  # name should not contain fully qualified domain endpoint
       database_name=database_name,
       username=username,
       password=password,
       endpoint='database.windows.net')

register_dbfs

Initialize a new Databricks File System (DBFS) datastore.

The DBFS datastore can only be used to create a DataReference as input and PipelineData as output to a DatabricksStep in Azure Machine Learning pipelines. More details can be found here.

static register_dbfs(workspace, datastore_name)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

Returns

Returns the DBFS Datastore.

Return type

Remarks

If you are attaching storage from a different region than the workspace region, it can result in higher latency and additional network usage costs.
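
The following sketch registers a DBFS datastore and uses it to create a DataReference input for a DatabricksStep; the datastore name and path below are placeholders.


   from azureml.core import Datastore
   from azureml.data.data_reference import DataReference

   # ws: an existing azureml.core.Workspace object
   dbfs_datastore = Datastore.register_dbfs(ws, datastore_name='dbfsdatastore')

   # Reference a DBFS path for use as input to a DatabricksStep in a pipeline
   dbfs_input = DataReference(
       datastore=dbfs_datastore,
       data_reference_name="dbfs_test_data",
       path_on_datastore="testdata")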

register_hdfs

Note

This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.

Initialize a new HDFS datastore.

static register_hdfs(workspace, datastore_name, protocol, namenode_address, hdfs_server_certificate, kerberos_realm, kerberos_kdc_address, kerberos_principal, kerberos_keytab=None, kerberos_password=None, overwrite=False)

Parameters

workspace
Workspace
Required

The workspace this datastore belongs to.

datastore_name
str
Required

The datastore name.

protocol
str or <xref:_restclient.models.enum>
Required

The protocol to use when communicating with the HDFS cluster, either http or https. Possible values include: 'http', 'https'.

namenode_address
str
Required

The IP address or DNS hostname of the HDFS namenode. Optionally includes a port.

hdfs_server_certificate
str, <xref:optional>
Required

The path to the TLS signing certificate of the HDFS namenode, if using TLS with a self-signed cert.

kerberos_realm
str
Required

The Kerberos realm.

kerberos_kdc_address
str
Required

The IP address or DNS hostname of the Kerberos KDC.

kerberos_principal
str
Required

The Kerberos principal to use for authentication and authorization.

kerberos_keytab
str, <xref:optional>
Required

The path to the keytab file containing the key(s) corresponding to the Kerberos principal. Provide either this, or a password.

kerberos_password
str, <xref:optional>
Required

The password corresponding to the Kerberos principal. Provide either this, or the path to a keytab file.

overwrite
bool, <xref:optional>
Required

Whether to overwrite an existing datastore. If the datastore does not exist, it will create one. Defaults to False.
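
Since this method is experimental, the following is only a sketch of registering a Kerberos-secured HDFS cluster; every value below is a placeholder.


   import os

   from azureml.core import Datastore

   # ws: an existing azureml.core.Workspace object
   hdfs_datastore = Datastore.register_hdfs(
       workspace=ws,
       datastore_name='hdfsdatastore',
       protocol='https',
       namenode_address='namenode.example.com:50470',           # namenode host and port
       hdfs_server_certificate='/path/to/namenode-cert.pem',    # TLS certificate if self-signed, otherwise None
       kerberos_realm='EXAMPLE.COM',
       kerberos_kdc_address='kdc.example.com',
       kerberos_principal='user@EXAMPLE.COM',
       kerberos_password=os.getenv("HDFS_KERBEROS_PASSWORD", "<my_password>"))  # or pass kerberos_keytab instead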

set_as_default

Set the default datastore.

set_as_default()

Parameters

datastore_name
str
Required

The name of the datastore.
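
A minimal usage sketch (the datastore name is a placeholder):


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()

   # Make an already-registered datastore the workspace default
   Datastore.get(ws, 'my_blob_datastore').set_as_default()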

unregister

Unregisters the datastore. The underlying storage service will not be deleted.

unregister()
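
A minimal usage sketch (the datastore name is a placeholder):


   from azureml.core import Datastore, Workspace

   ws = Workspace.from_config()

   # Remove the datastore registration from the workspace;
   # the data in the underlying storage service is left untouched.
   Datastore.get(ws, 'my_blob_datastore').unregister()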