Compare storage options for use with Azure HDInsight clusters

06/12/2023

You can choose between a few different Azure storage services when creating HDInsight clusters: Azure Storage, Azure Data Lake Storage Gen2, and Azure Data Lake Storage Gen1.

This article provides an overview of these storage types and their unique features.

Storage types and features

The following table summarizes the Azure Storage services that are supported with different versions of HDInsight:

Storage service	Account type	Namespace type	Supported services	Supported performance tiers	Supported access tiers	HDInsight version	Cluster type
Azure Data Lake Storage Gen2	General-purpose V2	Hierarchical (filesystem)	Blob	Standard	Hot, Cool, Archive	3.6+	All except Spark 2.1 and 2.2
Azure Storage	General-purpose V2	Object	Blob	Standard	Hot, Cool, Archive	3.6+	All
Azure Storage	General-purpose V1	Object	Blob	Standard	N/A	All	All
Azure Storage	Blob Storage**	Object	Block Blob	Standard	Hot, Cool, Archive	All	All
Azure Data Lake Storage Gen1	N/A	Hierarchical (filesystem)	N/A	N/A	N/A	3.6 Only	All except HBase
Azure Storage	Block Blob	Object	Block Blob	Premium	N/A	3.6+	Only HBase with accelerated writes
Azure Data Lake Storage Gen2	Block Blob	Hierarchical (filesystem)	Block Blob	Premium	N/A	3.6+	Only HBase with accelerated writes

** For HDInsight clusters, only secondary storage accounts can be of type BlobStorage, and page blobs aren't a supported storage option.
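The restrictions in the table above can be encoded as data so that a planned configuration is checked against them before a cluster is created. The following is a minimal sketch, not an Azure API: the row keys (such as `adls_gen2_gpv2`), the cluster-type labels (such as `"Hadoop"`), and the `is_supported` helper are hypothetical names chosen for illustration. It also assumes that an HBase cluster with accelerated writes counts as HBase for the Data Lake Storage Gen1 exclusion.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


def _parse(version: str) -> Tuple[int, ...]:
    # "3.6" -> (3, 6), so versions compare numerically rather than as strings
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class StorageRow:
    min_version: Optional[str] = None  # None means the table says "All"
    max_version: Optional[str] = None  # set only for the "3.6 Only" row
    excluded: FrozenSet[str] = frozenset()  # cluster types after "All except"
    only: Optional[str] = None  # the single cluster type allowed, if any
    secondary_only: bool = False  # BlobStorage accounts, per the ** note


# Hypothetical row keys and cluster-type labels; values transcribed from the table.
ROWS = {
    "adls_gen2_gpv2": StorageRow(
        min_version="3.6", excluded=frozenset({"Spark 2.1", "Spark 2.2"})
    ),
    "blob_gpv2": StorageRow(min_version="3.6"),
    "blob_gpv1": StorageRow(),
    "blob_storage_account": StorageRow(secondary_only=True),
    "adls_gen1": StorageRow(
        min_version="3.6",
        max_version="3.6",
        # Assumption: accelerated-writes HBase is still HBase for this exclusion.
        excluded=frozenset({"HBase", "HBase with accelerated writes"}),
    ),
    "blob_premium_block_blob": StorageRow(
        min_version="3.6", only="HBase with accelerated writes"
    ),
    "adls_gen2_premium_block_blob": StorageRow(
        min_version="3.6", only="HBase with accelerated writes"
    ),
}


def is_supported(row_key: str, version: str, cluster_type: str, role: str) -> bool:
    """Return True if the table row allows this version, cluster type, and role.

    role is "primary" or "secondary".
    """
    row = ROWS[row_key]
    v = _parse(version)
    if row.min_version is not None and v < _parse(row.min_version):
        return False
    if row.max_version is not None and v > _parse(row.max_version):
        return False
    if cluster_type in row.excluded:
        return False
    if row.only is not None and cluster_type != row.only:
        return False
    if row.secondary_only and role == "primary":
        return False
    return True
```

For example, `is_supported("adls_gen2_gpv2", "4.0", "Spark 2.2", "primary")` returns `False`, matching the "All except Spark 2.1 and 2.2" entry. Keeping the rules as data rather than as nested conditionals makes it easier to compare the sketch line by line with the table.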

For more information on Azure Storage account types, see Azure storage account overview.

For more information on Azure Storage access tiers, see Azure Blob storage: Premium (preview), Hot, Cool, and Archive storage tiers.

You can create clusters using combinations of services for primary and optional secondary storage. The following table summarizes the cluster storage configurations that are currently supported in HDInsight:

HDInsight version	Primary storage	Secondary storage	Supported
3.6 & 4.0	General-purpose V1, General-purpose V2	General-purpose V1, General-purpose V2, BlobStorage (block blobs)	Yes
3.6 & 4.0	General-purpose V1, General-purpose V2	Data Lake Storage Gen2	No
3.6 & 4.0	Data Lake Storage Gen2*	Data Lake Storage Gen2	Yes
3.6 & 4.0	Data Lake Storage Gen2*	General-purpose V1, General-purpose V2, BlobStorage (block blobs)	Yes
3.6 & 4.0	Data Lake Storage Gen2	Data Lake Storage Gen1	No
3.6	Data Lake Storage Gen1	Data Lake Storage Gen1	Yes
3.6	Data Lake Storage Gen1	General-purpose V1, General-purpose V2, BlobStorage (block blobs)	Yes
3.6	Data Lake Storage Gen1	Data Lake Storage Gen2	No
4.0	Data Lake Storage Gen1	Any	No
4.0	General-purpose V1, General-purpose V2	Data Lake Storage Gen1	No

* This can be one or more Data Lake Storage Gen2 accounts, as long as they're all set up to use the same managed identity for cluster access.
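The configuration table can likewise be expressed as a list of rules that is scanned in order. This is a minimal sketch under stated assumptions: the short labels (`"GPv1"`, `"ADLS Gen2"`, and so on) and the helper names are hypothetical, and `None` is returned for any combination the table doesn't list, rather than guessing. The second helper reflects the * note by checking that every Data Lake Storage Gen2 account reports the same managed identity; the identity strings it compares are placeholders for whatever identifier you use for the managed identity.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical short labels for the account types named in the table.
GP = frozenset({"GPv1", "GPv2"})
GP_OR_BLOB = frozenset({"GPv1", "GPv2", "BlobStorage"})
GEN1 = frozenset({"ADLS Gen1"})
GEN2 = frozenset({"ADLS Gen2"})
ANY = GP_OR_BLOB | GEN1 | GEN2

V36_40 = frozenset({"3.6", "4.0"})
V36 = frozenset({"3.6"})
V40 = frozenset({"4.0"})

# (versions, primary types, secondary types, supported): one tuple per table row.
RULES: List[Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str], bool]] = [
    (V36_40, GP, GP_OR_BLOB, True),
    (V36_40, GP, GEN2, False),
    (V36_40, GEN2, GEN2, True),
    (V36_40, GEN2, GP_OR_BLOB, True),
    (V36_40, GEN2, GEN1, False),
    (V36, GEN1, GEN1, True),
    (V36, GEN1, GP_OR_BLOB, True),
    (V36, GEN1, GEN2, False),
    (V40, GEN1, ANY, False),
    (V40, GP, GEN1, False),
]


def combination_supported(version: str, primary: str, secondary: str) -> Optional[bool]:
    """Return the table's Yes/No as True/False, or None if the row isn't listed."""
    for versions, primaries, secondaries, supported in RULES:
        if version in versions and primary in primaries and secondary in secondaries:
            return supported
    return None


def gen2_identities_consistent(identity_ids: Sequence[str]) -> bool:
    """True if all Data Lake Storage Gen2 accounts use one managed identity."""
    return len(set(identity_ids)) <= 1
```

For instance, `combination_supported("3.6", "GPv2", "ADLS Gen1")` returns `None`, because that pairing doesn't appear in the table; treating unlisted combinations as unknown rather than as supported keeps the sketch from claiming more than the article does.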

Note

Data Lake Storage Gen2 primary storage is not supported for Spark 2.1 or 2.2 clusters.

Data replication

Azure HDInsight does not store customer data. The primary means of storage for a cluster are its associated storage accounts. You can attach your cluster to an existing storage account, or create a new storage account during the cluster creation process. If a new account is created, it's created as a locally redundant storage (LRS) account, which satisfies in-region data residency requirements, including those specified in the Trust Center.

You can validate that HDInsight is properly configured to store data in a single region by ensuring that the storage account associated with your HDInsight cluster is LRS or uses another storage option listed in the Trust Center.
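One way to script this validation is to read each attached account's SKU name and check its redundancy suffix. The sketch below assumes you already have the SKU names: in the Azure SDK for Python (`azure-mgmt-storage`), the storage account returned by `storage_accounts.get_properties(resource_group_name, account_name)` exposes it as `sku.name` (for example `"Standard_LRS"`). The fetching step is left out so the sketch runs without Azure credentials, and the account names in the example are hypothetical. It allows only LRS by default, as this article describes; whether other options such as ZRS also meet your requirements is something to confirm in the Trust Center before adding them.

```python
from typing import Dict, Iterable, List


def redundancy_of(sku_name: str) -> str:
    """Extract the redundancy part of a storage SKU name, e.g. "Standard_LRS" -> "LRS"."""
    return sku_name.split("_", 1)[1]


def non_compliant_accounts(
    account_skus: Dict[str, str], allowed: Iterable[str] = ("LRS",)
) -> List[str]:
    """Return, sorted, the accounts whose redundancy isn't in `allowed`.

    account_skus maps account name -> SKU name (as read from sku.name).
    Defaults to LRS only; extend `allowed` only after checking the Trust Center.
    """
    allowed_set = set(allowed)
    return sorted(
        name
        for name, sku in account_skus.items()
        if redundancy_of(sku) not in allowed_set
    )
```

Calling `non_compliant_accounts({"primaryacct": "Standard_LRS", "secondaryacct": "Standard_GRS"})` flags only `secondaryacct`, since geo-redundant storage copies data to a second region. Checking every attached account, not just the primary one, matters because secondary accounts also hold cluster data.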

Note

Upgrading the primary or secondary storage account of a running cluster with Azure Data Lake Storage Gen2 capabilities is not supported. To change the storage type of an existing HDInsight cluster to Data Lake Storage Gen2, you need to recreate the cluster and select a storage account that has hierarchical namespace enabled.
