Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 (also known as ADLS Gen2) is a next-generation data lake solution for big data analytics. Azure Data Lake Storage Gen2 builds Azure Data Lake Storage Gen1 capabilities (file system semantics, file-level security, and scale) into Azure Blob storage, with its low-cost tiered storage, high availability, and disaster recovery features.

There are four ways of accessing Azure Data Lake Storage Gen2:

  1. Pass your Azure Active Directory credentials, also known as credential passthrough.
  2. Mount an Azure Data Lake Storage Gen2 filesystem to DBFS using a service principal and OAuth 2.0.
  3. Use a service principal directly.
  4. Use the Azure Data Lake Storage Gen2 storage account access key directly.

This article explains how to access Azure Data Lake Storage Gen2 using the Azure Blob File System (ABFS) driver built into Databricks Runtime. It also covers all the ways you can access Azure Data Lake Storage Gen2, frequently asked questions, and known issues.

Create an Azure Data Lake Storage Gen2 account and initialize a filesystem

If you want to use Azure Data Lake Storage credential passthrough or mount an Azure Data Lake Storage Gen2 filesystem, and you have not yet created an Azure Data Lake Storage Gen2 account and initialized a filesystem, do the following:

  1. Create your Azure Data Lake Storage Gen2 storage account, enabling the hierarchical namespace, which provides improved filesystem performance, POSIX ACLs, and filesystem semantics that are familiar to analytics engines and frameworks.

    Important

    • When the hierarchical namespace is enabled for an Azure Data Lake Storage Gen2 account, you do not need to create any Blob containers through the Azure portal.
    • When the hierarchical namespace is enabled, Azure Blob storage APIs are not available. See this Known issue description. For example, you cannot use the wasb or wasbs scheme to access the blob.core.windows.net endpoint.
    • If you enable the hierarchical namespace, there is no interoperability of data or operations between the Azure Blob storage and Azure Data Lake Storage Gen2 REST APIs.
  2. Initialize a filesystem before you can access it. If you haven't already initialized it from within the Azure portal, enter the following in the first cell of a notebook:

     spark.conf.set(
       "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
       dbutils.secrets.get(scope="<scope-name>",key="<storage-account-access-key-name>"))
     spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
     dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/")
     spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
    

    where <storage-account-name> is the name of your storage account, dbutils.secrets.get(scope="<scope-name>",key="<storage-account-access-key-name>") retrieves the storage account access key that you have stored as a secret in a secret scope, and <file-system-name> is the name of the filesystem to create in the Azure Data Lake Storage Gen2 storage account.

    This example uses an Azure storage account access key to authenticate to the storage account. If you are using another authentication method, such as credential passthrough, remove the first statement; a sketch of this variant appears below.

    You need to run this initialization only once per filesystem, not each time you run the notebook or attach to a new cluster.

    Important

    The Azure Data Lake Storage Gen2 filesystem validates all provided configuration keys, regardless of whether they will be used for a mount or for direct access.
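
    If the cluster is instead enabled for Azure Data Lake Storage credential passthrough, a minimal sketch of the same initialization (assuming your Azure AD identity has sufficient permissions on the storage account) simply omits the account-key statement:

     spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
     # With credential passthrough, no account key or service principal is configured here;
     # the Azure AD identity of the user running the command is used for authentication.
     dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/")
     spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")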

Access automatically with your Azure Active Directory credentials

You can configure your Azure Databricks cluster to authenticate to Azure Data Lake Storage Gen2 automatically, using the same Azure Active Directory (Azure AD) identity that you use to log in to Azure Databricks. When you enable your cluster for Azure Data Lake Storage credential passthrough, commands that you run on that cluster can read and write data in Azure Data Lake Storage Gen2 without requiring you to configure service principal credentials for access to storage.

For complete setup and usage instructions, see Secure access to Azure Data Lake Storage using Azure Active Directory credential passthrough.
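
As a minimal sketch (assuming the cluster has credential passthrough enabled and the placeholder path below is filled in), no credential configuration is needed in the notebook at all:

# On a passthrough-enabled cluster, no spark.conf.set(...) credential calls are required;
# your own Azure AD identity is used when the path is accessed.
df = spark.read.csv("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")
display(df)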

Create and grant permissions to a service principal

If your selected access method requires a service principal with adequate permissions, and you do not have one, follow these steps:

  1. Create an Azure AD application and service principal that can access resources. Note the following properties:
    • application-id: An ID that uniquely identifies the application.
    • directory-id: An ID that uniquely identifies the Azure AD instance.
    • storage-account-name: The name of the storage account.
    • service-credential: A string that the application uses to prove its identity.
  2. Register the service principal, granting the correct role assignment, such as Storage Blob Data Contributor, on the Azure Data Lake Storage Gen2 account.

Mount an Azure Data Lake Storage Gen2 account using a service principal and OAuth 2.0

You can mount an Azure Data Lake Storage Gen2 account to DBFS, authenticating with a service principal and OAuth 2.0. The mount is a pointer to data lake storage, so the data is never synced locally.

Important

  • Mounting an Azure Data Lake Storage Gen2 account is supported only with OAuth credentials. Mounting with an account access key is not supported.
  • All users in the Azure Databricks workspace have access to the mounted Azure Data Lake Storage Gen2 account. The service principal that you use to access the Azure Data Lake Storage Gen2 account should be granted access only to that Azure Data Lake Storage Gen2 account; it should not be granted access to other resources in Azure.
  • Once a mount point is created through a cluster, users of that cluster can immediately access the mount point. To use the mount point in another running cluster, you must run dbutils.fs.refreshMounts() on that running cluster to make the newly created mount point available for use (see the sketch below).
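
To check which mount points a running cluster currently sees, here is a minimal sketch using the dbutils.fs utilities mentioned above:

# Refresh this cluster's view of mount points, then list them.
dbutils.fs.refreshMounts()
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)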

Mount an Azure Data Lake Storage Gen2 filesystem

  1. To mount an Azure Data Lake Storage Gen2 filesystem or a folder inside it, use the following command:

    Python

    configs = {"fs.azure.account.auth.type": "OAuth",
               "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
               "fs.azure.account.oauth2.client.id": "<application-id>",
               "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
               "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}
    
    # Optionally, you can add <directory-name> to the source URI of your mount point.
    dbutils.fs.mount(
      source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
      mount_point = "/mnt/<mount-name>",
      extra_configs = configs)
    

    Scala

    val configs = Map(
      "fs.azure.account.auth.type" -> "OAuth",
      "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
      "fs.azure.account.oauth2.client.id" -> "<application-id>",
      "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
      "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token")
    
    // Optionally, you can add <directory-name> to the source URI of your mount point.
    dbutils.fs.mount(
      source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
      mountPoint = "/mnt/<mount-name>",
      extraConfigs = configs)
    

    where

    • <mount-name> is a DBFS path that represents where the Data Lake Store or a folder inside it (specified in source) will be mounted in DBFS.
    • dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>") retrieves your service credential that has been stored as a secret in a secret scope.
  2. Access files in your Azure Data Lake Storage Gen2 filesystem as if they were files in DBFS; for example:

    Python

    df = spark.read.text("/mnt/%s/...." % <mount-name>)
    df = spark.read.text("dbfs:/mnt/<mount-name>/....")
    

    Scala

    val df = spark.read.text("/mnt/<mount-name>/....")
    val df = spark.read.text("dbfs:/mnt/<mount-name>/....")
    

Unmount a mount point

To unmount a mount point, use the following command:

dbutils.fs.unmount("/mnt/<mount-name>")

Access directly with a service principal and OAuth 2.0

You can access an Azure Data Lake Storage Gen2 storage account directly (as opposed to mounting it with DBFS) using OAuth 2.0 and a service principal. You can directly access any Azure Data Lake Storage Gen2 storage account that the service principal has permissions on, and you can add multiple storage accounts and service principals in the same Spark session (see the sketch after the session-configuration example below).

Set credentials

The way you set credentials depends on which API you plan to use when accessing Azure Data Lake Storage Gen2: DataFrame, Dataset, or RDD.

DataFrame or Dataset API

If you are using the Spark DataFrame or Dataset APIs, we recommend that you set your account credentials in your notebook's session configs:

spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

where dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>") retrieves your service credential that has been stored as a secret in a secret scope.
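
Because each configuration key is suffixed with a storage account name, you can repeat the same settings for additional accounts without overwriting the first. A minimal sketch, assuming two hypothetical storage accounts and a separate secret for each service credential:

# Each setting is scoped to one storage account by the suffix on its key, so
# configuring a second account does not disturb the first.
accounts = {
    "<storage-account-name-1>": "<service-credential-key-name-1>",
    "<storage-account-name-2>": "<service-credential-key-name-2>",
}
for account, secret_key in accounts.items():
    suffix = f"{account}.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", "<application-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}",
                   dbutils.secrets.get(scope="<scope-name>", key=secret_key))
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
                   "https://login.microsoftonline.com/<directory-id>/oauth2/token")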

RDD API

If you are using the RDD API to access Azure Data Lake Storage Gen2, you cannot access Hadoop configuration options set using spark.conf.set(...). Therefore you must set the credentials using one of the following methods:

  • Specify the Hadoop configuration options as Spark options when you create the cluster. You must add the spark.hadoop. prefix to the corresponding Hadoop configuration keys to propagate them to the Hadoop configurations that are used for your RDD jobs:

    fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net OAuth
    fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net <application-id>
    fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net <service-credential>
    fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token
    
  • Scala users can set the credentials in spark.sparkContext.hadoopConfiguration:

    spark.sparkContext.hadoopConfiguration.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
    spark.sparkContext.hadoopConfiguration.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.sparkContext.hadoopConfiguration.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
    spark.sparkContext.hadoopConfiguration.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"))
    spark.sparkContext.hadoopConfiguration.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
    

    where dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>") retrieves your service credential that has been stored as a secret in a secret scope.

Warning

These credentials are available to all users who access the cluster.

Once your credentials are set up, you can use standard Spark and Databricks APIs to read from the storage account. For example:

val df = spark.read.parquet("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")

dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")

Access directly using the storage account access key

You can access an Azure Data Lake Storage Gen2 storage account using the storage account access key.

Set your credentials

The way you set credentials depends on which API you plan to use when accessing Azure Data Lake Storage Gen2: DataFrame, Dataset, or RDD.

DataFrame or Dataset API

If you are using the Spark DataFrame or Dataset APIs, we recommend that you set your account credentials in your notebook's session configs:

spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
  dbutils.secrets.get(scope="<scope-name>",key="<storage-account-access-key-name>"))

where dbutils.secrets.get(scope="<scope-name>",key="<storage-account-access-key-name>") retrieves your storage account access key that has been stored as a secret in a secret scope.

RDD API

If you are using the RDD API to access Azure Data Lake Storage Gen2, you cannot access Hadoop configuration options set using spark.conf.set(...). Therefore you must set the credentials using one of the following methods:

  • Specify the Hadoop configuration options as Spark options when you create the cluster. You must add the spark.hadoop. prefix to the corresponding Hadoop configuration keys to propagate them to the Hadoop configurations that are used for your RDD jobs:

    # Using an account access key
    spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key-name>
    
  • Scala users can set the credentials in spark.sparkContext.hadoopConfiguration:

    // Using an account access key
    spark.sparkContext.hadoopConfiguration.set(
      "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
      dbutils.secrets.get(scope="<scope-name>",key="<storage-account-access-key-name>")
    )
    

    where dbutils.secrets.get(scope="<scope-name>",key="<storage-account-access-key-name>") retrieves your storage account access key that has been stored as a secret in a secret scope.

Warning

These credentials are available to all users who access the cluster.

Once your credentials are set up, you can use standard Spark and Databricks APIs to read from the storage account. For example:

val df = spark.read.parquet("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")

dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")

The following notebook demonstrates accessing Azure Data Lake Storage Gen2 directly and with a mount.

ADLS Gen2 service principal notebook

Get notebook

Frequently asked questions (FAQ)

Does ABFS support Shared Access Signature (SAS) token authentication?

ABFS does not support SAS token authentication, but the Azure Data Lake Storage Gen2 service itself does support SAS keys.

Can I use the abfs scheme to access Azure Data Lake Storage Gen2?

Yes. However, we recommend that you use the abfss scheme, which uses SSL-encrypted access, wherever possible. You must use abfss with OAuth or Azure Active Directory-based authentication, because the tokens involved must always be exchanged over a secure connection.

When I accessed an Azure Data Lake Storage Gen2 account with the hierarchical namespace enabled, I got a java.io.FileNotFoundException error, and the error message includes FilesystemNotFound.

If the error message includes the following information, it is because your command is trying to access a Blob storage container created through the Azure portal:

StatusCode=404
StatusDescription=The specified filesystem does not exist.
ErrorCode=FilesystemNotFound
ErrorMessage=The specified filesystem does not exist.

When the hierarchical namespace is enabled, you do not need to create containers through the Azure portal. If you see this issue, delete the Blob container through the Azure portal. After a few minutes, you will be able to access the container. Alternatively, you can change your abfss URI to use a different container, as long as that container was not created through the Azure portal.

I observe the error This request is not authorized to perform this operation using this permission when I try to mount an Azure Data Lake Storage Gen2 filesystem.

This error occurs if the service principal you are using for Azure Data Lake Storage Gen2 has not been granted the appropriate role assignment. See Create and grant permissions to a service principal.

Known issues

See Known issues with Azure Data Lake Storage Gen2 in the Microsoft documentation.