데이터 저장소 만들기

아티클
02/21/2024

적용 대상:Azure CLI ml 확장 v2(현재)Python SDK azure-ai-ml v2(현재)

이 문서에서는 Azure Machine Learning 데이터 저장소를 사용하여 Azure 데이터 스토리지 서비스에 연결하는 방법을 알아봅니다.

필수 구성 요소

Azure 구독 Azure 구독이 아직 없는 경우 시작하기 전에 체험 계정을 만듭니다. Azure Machine Learning 평가판 또는 유료 버전을 사용해 보세요.
Python용 Azure Machine Learning SDK.
Azure Machine Learning 작업 영역

참고 항목

Azure Machine Learning 데이터 저장소는 기본 스토리지 계정 리소스를 만들지 않습니다. 대신 Azure Machine Learning 사용을 위해 기존 스토리지 계정을 연결합니다. Azure Machine Learning 데이터 저장소는 필요하지 않습니다. 기본 데이터에 액세스할 수 있는 경우 스토리지 URI를 직접 사용할 수 있습니다.

Azure Blob 데이터 저장소 만들기

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="",
    description="",
    account_name="",
    container_name=""
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="blob_protocol_example",
    description="Datastore pointing to a blob container using https protocol.",
    account_name="mytestblobstore",
    container_name="data-container",
    protocol="https",
    credentials=AccountKeyConfiguration(
        account_key="XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    ),
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="blob_sas_example",
    description="Datastore pointing to a blob container using SAS token.",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials=SasTokenConfiguration(
        sas_token= "?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
    ),
)

ml_client.create_or_update(store)

다음 YAML 파일을 만듭니다(적절한 값을 업데이트해야 합니다.)

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: my_blob_ds # add your datastore name here
type: azure_blob
description: here is a description # add a datastore description here
account_name: my_account_name # add the storage account name here
container_name: my_container_name # add the storage container name here

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_blob_datastore.yml

이 YAML 파일을 만듭니다(적절한 값을 업데이트해야 합니다.)

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_example
type: azure_blob
description: Datastore pointing to a blob container.
account_name: mytestblobstore
container_name: data-container
credentials:
  account_key: XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_blob_datastore.yml

이 YAML 파일을 만듭니다(적절한 값을 업데이트해야 합니다.)

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_sas_example
type: azure_blob
description: Datastore pointing to a blob container using SAS token.
account_name: mytestblobstore
container_name: data-container
credentials:
  sas_token: ?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_blob_datastore.yml

Azure Data Lake Gen2 데이터 저장소 만들기

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen2Datastore(
    name="",
    description="",
    account_name="",
    filesystem=""
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials

from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen2Datastore(
    name="adls_gen2_example",
    description="Datastore pointing to an Azure Data Lake Storage Gen2.",
    account_name="mytestdatalakegen2",
    filesystem="my-gen2-container",
     credentials=ServicePrincipalCredentials(
        tenant_id= "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_id= "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_secret= "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    ),
)

ml_client.create_or_update(store)

이 YAML 파일(값 업데이트)을 만듭니다.

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_credless_example
type: azure_data_lake_gen2
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen2.
account_name: mytestdatalakegen2
filesystem: my-gen2-container

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_adls_datastore.yml

이 YAML 파일(값 업데이트)을 만듭니다.

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_example
type: azure_data_lake_gen2
description: Datastore pointing to an Azure Data Lake Storage Gen2.
account_name: mytestdatalakegen2
filesystem: my-gen2-container
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_adls_datastore.yml

Azure Files 데이터 저장소 만들기

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureFileDatastore(
    name="file_example",
    description="Datastore pointing to an Azure File Share.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=AccountKeyConfiguration(
        account_key= "XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    ),
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import SasTokenConfiguration
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureFileDatastore(
    name="file_sas_example",
    description="Datastore pointing to an Azure File Share using SAS token.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=SasTokenConfiguration(
        sas_token="?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
    ),
)

ml_client.create_or_update(store)

이 YAML 파일(값 업데이트)을 만듭니다.

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_example
type: azure_file
description: Datastore pointing to an Azure File Share.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  account_key: XxXxXxXXXXXXXxXxXxxXxxXXXXXXXXxXxxXXxXXXXXXXxxxXxXXxXXXXXxXXxXXXxXxXxxxXXxXXxXXXXXxXxxXX

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_files_datastore.yml

이 YAML 파일(값 업데이트)을 만듭니다.

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_sas_example
type: azure_file
description: Datastore pointing to an Azure File Share using SAS token.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  sas_token: ?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_files_datastore.yml

Azure Data Lake Gen1 데이터 저장소 만들기

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen1Datastore(
    name="",
    store_name="",
    description="",
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen1Datastore(
    name="adls_gen1_example",
    description="Datastore pointing to an Azure Data Lake Storage Gen1.",
    store_name="mytestdatalakegen1",
    credentials=ServicePrincipalCredentials(
        tenant_id= "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_id= "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_secret= "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    ),
)

ml_client.create_or_update(store)

이 YAML 파일(값 업데이트)을 만듭니다.

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: alds_gen1_credless_example
type: azure_data_lake_gen1
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen1.
store_name: mytestdatalakegen1

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_adls_datastore.yml

이 YAML 파일(값 업데이트)을 만듭니다.

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: adls_gen1_example
type: azure_data_lake_gen1
description: Datastore pointing to an Azure Data Lake Storage Gen1.
store_name: mytestdatalakegen1
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_adls_datastore.yml

OneLake(Microsoft Fabric) 데이터 저장소 만들기(미리 보기)

이 섹션에서는 OneLake 데이터 저장소를 만드는 다양한 옵션에 대해 설명합니다. OneLake 데이터 저장소는 Microsoft Fabric의 일부입니다. 현재 Azure Machine Learning은 폴더/파일 및 Amazon S3 바로 가기를 포함하는 Microsoft Fabric Lakehouse 아티팩트 연결을 지원합니다. Lakehouse 에 대한 자세한 내용은 Microsoft Fabric의 레이크하우스란?

OneLake 데이터 저장소를 만들려면

엔드포인트
패브릭 작업 영역 이름 또는 GUID
아티팩트 이름 또는 GUID

Microsoft Fabric 인스턴스의 정보입니다. 이 세 스크린샷은 Microsoft Fabric 인스턴스에서 이러한 필수 정보 리소스를 검색하는 방법을 설명합니다.

OneLake 작업 영역 이름

Microsoft Fabric 인스턴스에서 이 스크린샷과 같이 작업 영역 정보를 찾을 수 있습니다. GUID 값 또는 "친숙한 이름"을 사용하여 Azure Machine Learning OneLake 데이터 저장소를 만들 수 있습니다.

OneLake 엔드포인트

이 스크린샷은 Microsoft Fabric 인스턴스에서 엔드포인트 정보를 찾는 방법을 보여줍니다.

OneLake 아티팩트 이름

이 스크린샷은 Microsoft Fabric 인스턴스에서 아티팩트 정보를 찾는 방법을 보여줍니다. 스크린샷은 GUID 값 또는 "친숙한 이름"을 사용하여 Azure Machine Learning OneLake 데이터 저장소를 만드는 방법도 보여 줍니다.

OneLake 데이터 저장소 만들기

from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = OneLakeDatastore(
    name="onelake_example_id",
    description="Datastore pointing to an Microsoft fabric artifact.",
    one_lake_workspace_name="AzureML_Sample_OneLakeWS",
    endpoint="msit-onelake.dfs.fabric.microsoft.com"
    artifact = OneLakeArtifact(
        name="AzML_Sample_LH",
        type="lake_house"
    )
)

ml_client.create_or_update(store)

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

rom azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = OneLakeDatastore(
    name="onelake_example_sp",
    description="Datastore pointing to an Microsoft fabric artifact.",
    one_lake_workspace_name="AzureML_Sample_OneLakeWS",
    endpoint="msit-onelake.dfs.fabric.microsoft.com"
    artifact = OneLakeArtifact(
    name="AzML_Sample_LH",
    type="lake_house"
    )
    credentials=ServicePrincipalCredentials(
        tenant_id= "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_id= "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        client_secret= "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    ),
)

ml_client.create_or_update(store)

다음 YAML 파일(값 업데이트)을 만듭니다.

# my_onelake_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_id
type: one_lake
description: Credential-less datastore pointing to an OneLake Lakehouse.
one_lake_workspace_name: "AzureML_Sample_OneLakeWS"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
  type: lake_house
  name: "AzML_Sample_LH"

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_onelake_datastore.yml

다음 YAML 파일(값 업데이트)을 만듭니다.

# my_onelakesp_datastore.yml
$schema: http://azureml/sdk-2-0/OneLakeDatastore.json
name: onelake_example_id
type: one_lake
description: Credential-less datastore pointing to an OneLake Lakehouse.
one_lake_workspace_name: "AzureML_Sample_OneLakeWS"
endpoint: "msit-onelake.dfs.fabric.microsoft.com"
artifact:
  type: lake_house
  name: "AzML_Sample_LH"
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

CLI에서 Azure Machine Learning 데이터 저장소를 만듭니다.

az ml datastore create --file my_onelakesp_datastore.yml

데이터 저장소 만들기

필수 구성 요소

Azure Blob 데이터 저장소 만들기

Azure Data Lake Gen2 데이터 저장소 만들기

Azure Files 데이터 저장소 만들기

Azure Data Lake Gen1 데이터 저장소 만들기

OneLake(Microsoft Fabric) 데이터 저장소 만들기(미리 보기)

OneLake 작업 영역 이름

OneLake 엔드포인트

OneLake 아티팩트 이름

OneLake 데이터 저장소 만들기

다음 단계

추가 리소스