Comparing Azure Data Lake Storage Gen1 and Azure Blob Storage

Note

Azure Data Lake Storage Gen1 is now retired. See the retirement announcement here. Data Lake Storage Gen1 resources are no longer accessible. If you require special assistance, please contact us.

The table in this article summarizes the differences between Azure Data Lake Storage Gen1 and Azure Blob Storage along some key aspects of big data processing. Azure Blob Storage is a general purpose, scalable object store that is designed for a wide variety of storage scenarios. Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads.

| Category | Azure Data Lake Storage Gen1 | Azure Blob Storage |
| --- | --- | --- |
| Purpose | Optimized storage for big data analytics workloads | General purpose object store for a wide variety of storage scenarios, including big data analytics |
| Use Cases | Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets | Any type of text or binary data, such as application back end, backup data, media storage for streaming and general purpose data. Additionally, full support for analytics workloads: batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets |
| Key Concepts | A Data Lake Storage Gen1 account contains folders, which in turn contain data stored as files | A storage account has containers, which in turn hold data in the form of blobs |
| Structure | Hierarchical file system | Object store with flat namespace |
| API | REST API over HTTPS | REST API over HTTP/HTTPS |
| Server-side API | WebHDFS-compatible REST API | Azure Blob Storage REST API |
| Hadoop File System Client | Yes | Yes |
| Data Operations - Authentication | Based on Microsoft Entra identities | Based on shared secrets: Account Access Keys and Shared Access Signature Keys |
| Data Operations - Authentication Protocol | OpenID Connect. Calls must contain a valid JWT (JSON web token) issued by Microsoft Entra ID. | Hash-based Message Authentication Code (HMAC). Calls must contain a Base64-encoded SHA-256 hash over a part of the HTTP request. |
| Data Operations - Authorization | POSIX Access Control Lists (ACLs). ACLs based on Microsoft Entra identities can be set at the file and folder level. | For account-level authorization, use Account Access Keys. For account, container, or blob authorization, use Shared Access Signature Keys. |
| Data Operations - Auditing | Available. See here for information. | Available |
| Encryption of data at rest | Transparent, server side: with service-managed keys; with customer-managed keys in Azure Key Vault | Transparent, server side: with service-managed keys; with customer-managed keys in Azure Key Vault (preview). Client-side encryption. |
| Management operations (for example, Account Create) | Azure role-based access control (Azure RBAC) for account management | Azure role-based access control (Azure RBAC) for account management |
| Developer SDKs | .NET, Java, Python, Node.js | .NET, Java, Python, Node.js, C++, Ruby, PHP, Go, Android, iOS |
| Analytics Workload Performance | Optimized performance for parallel analytics workloads. High throughput and IOPS. | Optimized performance for parallel analytics workloads. |
| Size limits | No limits on account sizes, file sizes, or number of files | For specific limits, see Scalability targets for standard storage accounts and Scalability and performance targets for Blob storage. Larger account limits available by contacting Azure Support. |
| Geo-redundancy | Locally redundant (multiple copies of data in one Azure region) | Locally redundant (LRS), zone redundant (ZRS), globally redundant (GRS), read-access globally redundant (RA-GRS). See here for more information. |
| Service state | Generally available | Generally available |
| Regional availability | See here | Available in all Azure regions |
| Price | See Pricing | See Pricing |
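
To make the Authentication Protocol row more concrete, the following is a minimal Python sketch of the HMAC approach described for Azure Blob Storage: a Base64-encoded HMAC-SHA256 value computed over part of the request, using a shared account key. The function name `sign_request`, the demo key, and the simplified string-to-sign are assumptions for illustration only. The real service defines its own canonical string-to-sign format, which is not reproduced here.

```python
import base64
import hashlib
import hmac


def sign_request(string_to_sign: str, account_key_b64: str) -> str:
    """Return a Base64-encoded HMAC-SHA256 signature over string_to_sign.

    account_key_b64 is the shared secret, itself Base64-encoded.
    """
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values: not a real account key or a real canonical request string.
demo_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")
demo_string_to_sign = "GET\n/myaccount/mycontainer/myblob"

signature = sign_request(demo_string_to_sign, demo_key)
print(signature)
```

Because the signature depends on both the shared key and the signed portion of the request, anyone holding the key can produce valid signatures. This is the main contrast with the identity-based JWT model listed for Data Lake Storage Gen1.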