What is Azure Data Lake Storage Gen1?


Azure Data Lake Storage Gen2 is now generally available. We recommend that you start using it today. For more information, see the product page.

Azure Data Lake Storage Gen1 is an enterprise-wide hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in a single place for operational and exploratory analytics.

Data Lake Storage Gen1 can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs. It's designed to enable analytics on the stored data and is tuned for performance in data analytics scenarios. Data Lake Storage Gen1 includes all enterprise-grade capabilities: security, manageability, scalability, reliability, and availability.
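
As a rough illustration of the WebHDFS-compatible REST access described above, the following sketch builds (but does not send) a directory-listing request. The account name `mydatalake`, the folder path, and the token value are placeholders. The URL layout follows the usual WebHDFS convention (`/webhdfs/v1/<path>?op=...`), and the bearer-token header assumes an OAuth 2.0 access token that was obtained separately.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_liststatus_request(account: str, path: str, token: str) -> Request:
    """Build a WebHDFS LISTSTATUS request for a Data Lake Storage Gen1 account.

    Nothing is sent over the network; the caller decides when to open it.
    """
    # Assumed endpoint layout:
    # https://<account>.azuredatalakestore.net/webhdfs/v1/<path>
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1"
    query = urlencode({"op": "LISTSTATUS"})
    url = f"{base}/{quote(path.strip('/'))}?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical account, folder, and token, for illustration only.
request = build_liststatus_request("mydatalake", "/clickstream/2019", "<access-token>")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; that step is left out here because it needs a real account and a valid token.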


Key capabilities

Some of the key capabilities of Data Lake Storage Gen1 include the following.

Built for Hadoop

Data Lake Storage Gen1 is an Apache Hadoop file system that's compatible with Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem. Your existing HDInsight applications or services that use the WebHDFS API can easily integrate with Data Lake Storage Gen1. Data Lake Storage Gen1 also exposes a WebHDFS-compatible REST interface for applications.

You can easily analyze data stored in Data Lake Storage Gen1 using Hadoop analytic frameworks such as MapReduce or Hive. You can provision Azure HDInsight clusters and configure them to directly access data stored in Data Lake Storage Gen1.

Unlimited storage, petabyte files

Data Lake Storage Gen1 provides unlimited storage and can store a variety of data for analytics. It doesn't impose any limits on account sizes, file sizes, or the amount of data that can be stored in a data lake. Individual files can range from kilobytes to petabytes in size. Data is stored durably by making multiple copies. There is no limit on how long data can be stored in the data lake.

Performance-tuned for big data analytics

Data Lake Storage Gen1 is built for running large-scale analytic systems that require massive throughput to query and analyze large amounts of data. The data lake spreads parts of a file over a number of individual storage servers. This improves read throughput when the file is read in parallel for data analytics.
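
The throughput benefit comes from readers fetching different parts of a file at the same time. A minimal sketch of how a client might divide a file into byte ranges for parallel readers is shown here. The function name and the choice of four readers are illustrative, and the `offset`/`length` pair mirrors the parameters of the standard WebHDFS `OPEN` operation rather than any internal detail of how the service places data on its servers.

```python
from typing import List, Tuple


def split_into_ranges(file_size: int, readers: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover file_size bytes exactly once."""
    if file_size < 0 or readers < 1:
        raise ValueError("file_size must be >= 0 and readers must be >= 1")
    chunk, remainder = divmod(file_size, readers)
    ranges = []
    offset = 0
    for i in range(readers):
        # The first `remainder` readers take one extra byte each.
        length = chunk + (1 if i < remainder else 0)
        if length > 0:
            ranges.append((offset, length))
        offset += length
    return ranges


# Hypothetical file size and reader count, for illustration only.
for offset, length in split_into_ranges(1_000_000_003, 4):
    print(f"offset={offset} length={length}")
```

Each pair could then be handed to a separate worker that issues its own ranged read, so no two workers request the same bytes.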

Enterprise ready: Highly available and secure

Data Lake Storage Gen1 provides industry-standard availability and reliability. Your data assets are stored durably by making redundant copies to guard against any unexpected failures.

Data Lake Storage Gen1 also provides enterprise-grade security for the stored data. For more information, see Securing data in Azure Data Lake Storage Gen1.

All data

Data Lake Storage Gen1 can store any data in its native format, without requiring any prior transformations. Data Lake Storage Gen1 does not require a schema to be defined before the data is loaded, leaving it up to the individual analytic framework to interpret the data and define a schema at the time of the analysis. The ability to store files of arbitrary sizes and formats makes it possible for Data Lake Storage Gen1 to handle structured, semi-structured, and unstructured data.
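
The idea of deferring the schema to analysis time can be sketched in a few lines. In this example the raw bytes stand in for a file as it might sit in the store, and the column names and types are a hypothetical schema chosen by the analysis job; nothing here is specific to any Data Lake Storage Gen1 API.

```python
import csv
import io

# Raw bytes exactly as they might be stored; the store attaches no schema.
raw = b"2019-03-01,sensor-7,21.5\n2019-03-02,sensor-7,22.1\n"

# Hypothetical schema supplied by the analysis job, not by the storage layer.
schema = [("date", str), ("device", str), ("temperature_c", float)]


def read_with_schema(data: bytes, columns):
    """Interpret raw CSV bytes using a schema supplied at read time."""
    rows = []
    for record in csv.reader(io.StringIO(data.decode("utf-8"))):
        # Pair each (name, type) with the matching field and convert it.
        rows.append({name: cast(value) for (name, cast), value in zip(columns, record)})
    return rows


rows = read_with_schema(raw, schema)
print(rows[0])
```

A different job could read the same bytes with a different schema, for example treating every field as a string, without the stored file changing.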

Data Lake Storage Gen1 containers for data are essentially folders and files. You operate on the stored data using SDKs, the Azure portal, and Azure PowerShell. If you put your data into the store using these interfaces and the appropriate containers, you can store any type of data. Data Lake Storage Gen1 does not perform any special handling of data based on the type of data it stores.

Securing data

Data Lake Storage Gen1 uses Azure Active Directory (Azure AD) for authentication, and access control lists (ACLs) to manage access to your data.

Feature | Description
Authentication | Data Lake Storage Gen1 integrates with Azure AD for identity and access management for all the data stored in Data Lake Storage Gen1. Because of the integration, Data Lake Storage Gen1 benefits from all Azure AD features, such as multi-factor authentication, Conditional Access, role-based access control, application usage monitoring, security monitoring and alerting, and so on. Data Lake Storage Gen1 supports the OAuth 2.0 protocol for authentication within the REST interface. See Data Lake Storage Gen1 authentication.
Access control | Data Lake Storage Gen1 provides access control by supporting POSIX-style permissions exposed by the WebHDFS protocol. You can enable ACLs on the root folder, on subfolders, and on individual files. For more information about how ACLs work in the context of Data Lake Storage Gen1, see Access control in Data Lake Storage Gen1.
Encryption | Data Lake Storage Gen1 also provides encryption for data that's stored in the account. You specify the encryption settings while creating a Data Lake Storage Gen1 account. You can choose to have your data encrypted or opt for no encryption. For more information, see Encryption in Data Lake Storage Gen1. For instructions on how to provide encryption-related configuration, see Get started with Data Lake Storage Gen1 using the Azure portal.

For instructions on how to secure data in Data Lake Storage Gen1, see Securing data in Azure Data Lake Storage Gen1.
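
To make the POSIX-style permissions mentioned under access control more concrete, the sketch below parses a comma-separated ACL specification in the common `scope:name:permissions` form (for example `user:alice:r-x`). The user name and the helper names are hypothetical, and the parser is deliberately small: it does not handle `default:` entries, and it is not a description of how the service evaluates ACLs.

```python
from typing import List, NamedTuple


class AclEntry(NamedTuple):
    scope: str        # "user", "group", "other", or "mask"
    name: str         # empty for the owning user or owning group
    permissions: str  # three characters, e.g. "rwx" or "r-x"


def parse_acl_spec(spec: str) -> List[AclEntry]:
    """Parse entries such as 'user::rwx,user:alice:r-x,other::---'."""
    entries = []
    for item in spec.split(","):
        scope, name, permissions = item.strip().split(":")
        # Position 0 must be 'r' or '-', position 1 'w' or '-', position 2 'x' or '-'.
        valid = len(permissions) == 3 and all(
            char in (flag, "-") for char, flag in zip(permissions, "rwx")
        )
        if not valid:
            raise ValueError(f"invalid permission string: {permissions!r}")
        entries.append(AclEntry(scope, name, permissions))
    return entries


# Hypothetical ACL for a folder, for illustration only.
acl = parse_acl_spec("user::rwx,user:alice:r-x,group::r-x,other::---")
for entry in acl:
    print(entry)
```

Here the named user `alice` can read and traverse the folder but not write to it, while everyone outside the owning user and group has no access.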

Application compatibility

Data Lake Storage Gen1 is compatible with most open-source components in the Hadoop ecosystem. It also integrates well with other Azure services. To learn more about how you can use Data Lake Storage Gen1 with open-source components and other Azure services, use the following links:

Data Lake Storage Gen1 file system

Data Lake Storage Gen1 can be accessed via the filesystem AzureDataLakeFilesystem (adl://) in Hadoop environments (available with an HDInsight cluster). Applications and services that use adl:// can take advantage of further performance optimizations that aren't currently available in WebHDFS. As a result, Data Lake Storage Gen1 gives you the flexibility to either get the best performance with the recommended option of using adl://, or maintain existing code by continuing to use the WebHDFS API directly. Azure HDInsight fully leverages the AzureDataLakeFilesystem to provide the best performance on Data Lake Storage Gen1.

You can access your data in Data Lake Storage Gen1 using adl://<data_lake_storage_gen1_name>.azuredatalakestore.net. For more information about how to access the data in Data Lake Storage Gen1, see View properties of the stored data.
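
The adl:// address format above can be assembled and taken apart with standard URL handling. The following is a small sketch; the account name `mydatalake`, the file path, and both helper functions are hypothetical and exist only to show the shape of the URI.

```python
from urllib.parse import urlsplit

ADL_SUFFIX = ".azuredatalakestore.net"


def build_adl_uri(account: str, path: str) -> str:
    """Compose adl://<account>.azuredatalakestore.net/<path>."""
    return f"adl://{account}{ADL_SUFFIX}/{path.lstrip('/')}"


def parse_adl_uri(uri: str):
    """Split an adl:// URI into (account name, path within the store)."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if parts.scheme != "adl" or not host.endswith(ADL_SUFFIX):
        raise ValueError(f"not a Data Lake Storage Gen1 adl:// URI: {uri!r}")
    return host[: -len(ADL_SUFFIX)], parts.path


# Hypothetical account and file, for illustration only.
uri = build_adl_uri("mydatalake", "/clickstream/2019/events.csv")
print(uri)
print(parse_adl_uri(uri))
```

A URI built this way is the kind of path a Hadoop job on an HDInsight cluster would be given as its input or output location.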

Next steps