Comparing Azure Data Lake Storage Gen1 and Azure Blob Storage
Note
Azure Data Lake Storage Gen1 is now retired. See the retirement announcement here. Data Lake Storage Gen1 resources are no longer accessible. If you require special assistance, please contact us.
The table in this article summarizes the differences between Azure Data Lake Storage Gen1 and Azure Blob Storage along some key aspects of big data processing. Azure Blob Storage is a general purpose, scalable object store that is designed for a wide variety of storage scenarios. Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads.
Category | Azure Data Lake Storage Gen1 | Azure Blob Storage |
---|---|---|
Purpose | Optimized storage for big data analytics workloads | General purpose object store for a wide variety of storage scenarios, including big data analytics |
Use Cases | Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets | Any type of text or binary data, such as application back end, backup data, media storage for streaming, and general purpose data. Additionally, full support for analytics workloads: batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets |
Key Concepts | A Data Lake Storage Gen1 account contains folders, which in turn contain data stored as files | A storage account has containers, which in turn hold data in the form of blobs |
Structure | Hierarchical file system | Object store with flat namespace |
API | REST API over HTTPS | REST API over HTTP/HTTPS |
Server-side API | WebHDFS-compatible REST API | Azure Blob Storage REST API |
Hadoop File System Client | Yes | Yes |
Data Operations - Authentication | Based on Microsoft Entra identities | Based on shared secrets - Account Access Keys and Shared Access Signature Keys. |
Data Operations - Authentication Protocol | OpenID Connect. Calls must contain a valid JWT (JSON web token) issued by Microsoft Entra ID. | Hash-based Message Authentication Code (HMAC). Calls must contain a Base64-encoded SHA-256 hash over a part of the HTTP request. |
Data Operations - Authorization | POSIX Access Control Lists (ACLs). ACLs based on Microsoft Entra identities can be set at the file and folder level. | For account-level authorization: use Account Access Keys. For account, container, or blob authorization: use Shared Access Signature Keys. |
Data Operations - Auditing | Available. See here for information. | Available |
Encryption data at rest | | |
Management operations (for example, Account Create) | Azure role-based access control (Azure RBAC) for account management | Azure role-based access control (Azure RBAC) for account management |
Developer SDKs | .NET, Java, Python, Node.js | .NET, Java, Python, Node.js, C++, Ruby, PHP, Go, Android, iOS |
Analytics Workload Performance | Optimized performance for parallel analytics workloads. High throughput and IOPS. | Optimized performance for parallel analytics workloads. |
Size limits | No limits on account sizes, file sizes, or number of files | For specific limits, see Scalability targets for standard storage accounts and Scalability and performance targets for Blob storage. Larger account limits available by contacting Azure Support |
Geo-redundancy | Locally redundant (multiple copies of data in one Azure region) | Locally redundant (LRS), zone redundant (ZRS), geo-redundant (GRS), read-access geo-redundant (RA-GRS). See here for more information |
Service state | Generally available | Generally available |
Regional availability | See here | Available in all Azure regions |
Price | See Pricing | See Pricing |
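
The authentication protocol row describes Blob Storage calls as carrying a Base64-encoded HMAC SHA-256 value computed over part of the HTTP request. The following is a minimal sketch of that computation only, using the Python standard library. The function name `sign_request`, the demo key, and the simplified string-to-sign are assumptions made for illustration; the exact string-to-sign layout is defined by the Azure Blob Storage REST API and is not reproduced here.

```python
import base64
import hashlib
import hmac


def sign_request(account_key_b64: str, string_to_sign: str) -> str:
    """Return a Base64-encoded HMAC-SHA256 signature over string_to_sign.

    account_key_b64 is assumed to be a Base64-encoded shared secret,
    as is typical for storage account access keys.
    """
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values: not a real key, and not the real string-to-sign format.
demo_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")
demo_string_to_sign = "GET\n/myaccount/mycontainer/myblob"

signature = sign_request(demo_key, demo_string_to_sign)
print(signature)
```

Because the signature depends on both the shared secret and the request content, a change to either produces a different value. This is the sense in which the table calls the scheme "based on shared secrets", in contrast to the token-based Microsoft Entra model used by Data Lake Storage Gen1.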