Introduction to Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob storage. Data Lake Storage Gen2 converges the capabilities of our two existing storage services, Azure Blob storage and Azure Data Lake Storage Gen1. Features from Azure Data Lake Storage Gen1, such as file system semantics, directory- and file-level security, and scale, are combined with the low-cost, tiered storage and high availability/disaster recovery capabilities of Azure Blob storage.

Designed for enterprise big data analytics

Data Lake Storage Gen2 makes Azure Storage the foundation for building enterprise data lakes on Azure. Designed from the start to service multiple petabytes of information while sustaining hundreds of gigabits of throughput, Data Lake Storage Gen2 lets you manage massive amounts of data easily.

A fundamental part of Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob storage. The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access. A common object store naming convention uses slashes in the name to mimic a hierarchical directory structure. With Data Lake Storage Gen2, this structure becomes real. Operations such as renaming or deleting a directory become single atomic metadata operations on the directory, rather than requiring the service to enumerate and process every object that shares the directory's name prefix.
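The difference between the two models can be pictured with a small in-memory sketch. It is an illustration only: `FlatStore` and `HierarchicalStore` are hypothetical classes written for this example, not part of any Azure SDK, and the `operations` counter simply stands in for the amount of per-object work a rename requires.

```python
class FlatStore:
    """Flat namespace: every object is keyed by its full name."""

    def __init__(self):
        self.objects = {}    # full object name -> data
        self.operations = 0  # per-object operations performed by renames

    def put(self, name, data):
        self.objects[name] = data

    def rename_directory(self, old_prefix, new_prefix):
        # A "directory" is only a shared name prefix, so every matching
        # object has to be enumerated and rewritten one by one.
        old_prefix = old_prefix.rstrip("/") + "/"
        new_prefix = new_prefix.rstrip("/") + "/"
        for name in [n for n in self.objects if n.startswith(old_prefix)]:
            self.objects[new_prefix + name[len(old_prefix):]] = self.objects.pop(name)
            self.operations += 1


class HierarchicalStore:
    """Hierarchical namespace: directories are real nodes (nested dicts)."""

    def __init__(self):
        self.root = {}
        self.operations = 0

    def _parent(self, path):
        # Walk to the parent directory of `path`, creating nodes as needed.
        *dirs, leaf = path.strip("/").split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        return node, leaf

    def put(self, path, data):
        node, leaf = self._parent(path)
        node[leaf] = data

    def rename_directory(self, old_path, new_path):
        # Only the directory entry moves; its children are untouched.
        old_parent, old_leaf = self._parent(old_path)
        new_parent, new_leaf = self._parent(new_path)
        new_parent[new_leaf] = old_parent.pop(old_leaf)
        self.operations += 1


flat, tree = FlatStore(), HierarchicalStore()
for i in range(1000):
    flat.put(f"logs/2020/file{i}.csv", b"")
    tree.put(f"logs/2020/file{i}.csv", b"")

flat.rename_directory("logs/2020", "logs/archive")
tree.rename_directory("logs/2020", "logs/archive")
print(flat.operations, tree.operations)  # 1000 1
```

In the flat model the cost of the rename grows with the number of objects under the prefix, and a failure partway through leaves the objects split between two names. In the hierarchical model the rename is a single change to one directory entry, which is what allows the real service to treat it as one atomic metadata operation.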

Data Lake Storage Gen2 builds on Blob storage and enhances performance, management, and security in the following ways:

  • Performance is optimized because you do not need to copy or transform data as a prerequisite for analysis. Compared to the flat namespace on Blob storage, the hierarchical namespace greatly improves the performance of directory management operations, which improves overall job performance.

  • Management is easier because you can organize and manipulate files through directories and subdirectories.

  • Security is enforceable because you can define POSIX permissions on directories or individual files.
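As a reminder of what POSIX permissions on a directory or file look like, the following sketch uses only the Python standard library. It illustrates the general POSIX owner/group/other model, not how Data Lake Storage Gen2 evaluates access internally; the helper names `describe` and `can_access` and the example modes are assumptions made for this illustration.

```python
import stat


def describe(mode, is_directory=False):
    """Render permission bits in the familiar `ls -l` style."""
    file_type = stat.S_IFDIR if is_directory else stat.S_IFREG
    return stat.filemode(file_type | mode)


def can_access(mode, who, permission):
    """Check one permission ('r', 'w' or 'x') for 'owner', 'group' or 'other'."""
    shift = {"owner": 6, "group": 3, "other": 0}[who]
    bit = {"r": 4, "w": 2, "x": 1}[permission]
    return bool((mode >> shift) & bit)


directory_mode = 0o750  # owner: rwx, group: r-x, other: ---
file_mode = 0o640       # owner: rw-, group: r--, other: ---

print(describe(directory_mode, is_directory=True))  # drwxr-x---
print(describe(file_mode))                          # -rw-r-----
print(can_access(directory_mode, "group", "w"))     # False
```

Setting permissions at both levels means a directory can be opened to a group for listing and traversal while individual files inside it remain more tightly restricted.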

Also, Data Lake Storage Gen2 is very cost effective because it is built on top of the low-cost Azure Blob storage. The additional features further lower the total cost of ownership for running big data analytics on Azure.

Key features of Data Lake Storage Gen2

  • Hadoop compatible access: Data Lake Storage Gen2 allows you to manage and access data just as you would with a Hadoop Distributed File System (HDFS). The new ABFS driver, used to access data stored in Data Lake Storage Gen2, is available within all Apache Hadoop environments, including Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.

  • A superset of POSIX permissions: The security model for Data Lake Storage Gen2 supports ACL and POSIX permissions, along with some extra granularity specific to Data Lake Storage Gen2. Settings may be configured through Storage Explorer or through frameworks like Hive and Spark.

  • Cost effective: Data Lake Storage Gen2 offers low-cost storage capacity and transactions. As data transitions through its complete lifecycle, billing rates change, and built-in features such as Azure Blob storage lifecycle management keep costs to a minimum.

  • Optimized driver: The ABFS driver is optimized specifically for big data analytics. The corresponding REST APIs are surfaced through a dedicated service endpoint.
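The lifecycle behavior described under "Cost effective" above can be pictured as an age-based rule that moves data to cheaper access tiers as it goes unmodified. The sketch below is a toy model under stated assumptions: the 30-day and 180-day thresholds and the `target_tier` function are hypothetical, chosen only for illustration, and real policies are configured in the storage account rather than evaluated in client code.

```python
from datetime import date

# Hypothetical thresholds (days since last modification), oldest rule first.
TIER_RULES = [
    (180, "archive"),
    (30, "cool"),
]


def target_tier(last_modified, today):
    """Return the tier a file would be moved to under the rules above."""
    age_days = (today - last_modified).days
    for threshold, tier in TIER_RULES:
        if age_days > threshold:
            return tier
    return "hot"  # recently modified data stays in the default tier


today = date(2024, 1, 15)
print(target_tier(date(2024, 1, 1), today))   # hot
print(target_tier(date(2023, 11, 1), today))  # cool
print(target_tier(date(2023, 1, 1), today))   # archive
```

Checking the oldest rule first ensures that a file old enough for several tiers lands in the cheapest one that applies.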


Azure Storage is scalable by design, whether you access it via Data Lake Storage Gen2 or Blob storage interfaces. It is able to store and serve many exabytes of data. This amount of storage is available with throughput measured in gigabits per second (Gbps) at high levels of input/output operations per second (IOPS). Beyond just persistence, processing is executed at near-constant per-request latencies that are measured at the service, account, and file levels.

Cost effectiveness

One of the many benefits of building Data Lake Storage Gen2 on top of Azure Blob storage is the low cost of storage capacity and transactions. Unlike other cloud storage services, Data Lake Storage Gen2 does not require data to be moved or transformed before analysis. For more information about pricing, see Azure Storage pricing.

Additionally, features such as the hierarchical namespace significantly improve the overall performance of many analytics jobs. This improvement in performance means that you require less compute power to process the same amount of data, resulting in a lower total cost of ownership (TCO) for the end-to-end analytics job.

One service, multiple concepts

Data Lake Storage Gen2 is an additional capability for big data analytics, built on top of Azure Blob storage. While there are many benefits in leveraging existing platform components of Blobs to create and operate data lakes for analytics, it does mean that the same shared things are described by more than one concept.

The following are the equivalent entities, as described by different concepts. Unless specified otherwise, these entities are directly synonymous:

Concept | Top Level Organization | Lower Level Organization | Data Container
Blobs – General purpose object storage | Container | Virtual directory (SDK only – does not provide atomic manipulation) | Blob
Azure Data Lake Storage Gen2 – Analytics Storage | Container | Directory | File

Supported Blob storage features

Blob storage features such as diagnostic logging, access tiers, and Blob storage lifecycle management policies now work with accounts that have a hierarchical namespace. Therefore, you can enable hierarchical namespaces on your Blob storage accounts without losing access to these features.

For a list of supported Blob storage features, see Blob Storage features available in Azure Data Lake Storage Gen2.

Supported Azure service integrations

Data Lake Storage Gen2 supports several Azure services that you can use to ingest data, perform analytics, and create visual representations. For a list of supported Azure services, see Azure services that support Azure Data Lake Storage Gen2.

Supported open source platforms

Several open source platforms support Data Lake Storage Gen2. For a complete list, see Open source platforms that support Azure Data Lake Storage Gen2.

See also