Add additional storage accounts to HDInsight

Learn how to use script actions to add additional Azure Storage accounts to HDInsight. The steps in this document add a storage account to an existing HDInsight cluster. This article applies to additional Azure Storage accounts only, not to the default cluster storage account. It doesn't cover other types of additional storage, such as Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2.

Important

The information in this document is about adding additional storage account(s) to a cluster after it has been created. For information on adding storage accounts during cluster creation, see Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more.

Prerequisites

How it works

During processing, the script performs the following actions:

  • Checks whether the storage account already exists in the core-site.xml configuration for the cluster. If it does, the script exits without taking any further action.

  • Verifies that the storage account exists and can be accessed using the key.

  • Encrypts the key using the cluster credential.

  • Adds the storage account to the core-site.xml file.

  • Stops and restarts the Apache Oozie, Apache Hadoop YARN, Apache Hadoop MapReduce2, and Apache Hadoop HDFS services. Stopping and starting these services allows them to use the new storage account.
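The existence check in the first step is why re-running the script for the same account changes nothing. The following is a minimal Python sketch of that check, assuming the core-site.xml properties have already been loaded into a dictionary. The function name and the sample dictionary are hypothetical and aren't taken from the actual script. The property name follows the fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net pattern used elsewhere in this article.

```python
def account_already_configured(core_site: dict, account_name: str) -> bool:
    """Return True if core-site already holds a key entry for the account."""
    # Hypothetical helper: mirrors the check described above,
    # not the script's real implementation.
    prop = f"fs.azure.account.key.{account_name}.blob.core.windows.net"
    return prop in core_site


# Illustrative core-site contents (values are placeholders)
core_site = {
    "fs.defaultFS": "wasbs://container@defaultaccount.blob.core.windows.net",
    "fs.azure.account.key.defaultaccount.blob.core.windows.net": "<encrypted key>",
}

print(account_already_configured(core_site, "defaultaccount"))
print(account_already_configured(core_site, "newaccount"))
```

In this sketch, the first call reports an existing entry, so a script following the same logic would exit. The second call finds no entry, so the remaining steps would run.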

Warning

Using a storage account in a different location than the HDInsight cluster is not supported.

Add storage account

Use Script Action to apply the changes with the following considerations:

Property           Value
Bash script URI    https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh
Node type(s)       Head
Parameters         ACCOUNTNAME ACCOUNTKEY -p (optional)
  • ACCOUNTNAME is the name of the storage account to add to the HDInsight cluster.
  • ACCOUNTKEY is the access key for ACCOUNTNAME.
  • -p is optional. If specified, the key isn't encrypted and is stored in the core-site.xml file as plain text.
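One way to submit these values is through the Azure CLI. The following Python sketch only builds the argument list for az hdinsight script-action execute and doesn't run it. It assumes the Azure CLI is installed and signed in. The function name, the action label add-storage-account, and the resource group parameter are illustrative assumptions, so check the flags against your CLI version before relying on them.

```python
SCRIPT_URI = (
    "https://hdiconfigactions.blob.core.windows.net/"
    "linuxaddstorageaccountv01/add-storage-account-v01.sh"
)


def build_script_action_command(cluster, resource_group, account_name,
                                account_key, plain_text_key=False):
    """Build an argument list for `az hdinsight script-action execute`."""
    # Parameters are passed to the script as: ACCOUNTNAME ACCOUNTKEY [-p]
    params = f"{account_name} {account_key}"
    if plain_text_key:
        params += " -p"  # key is stored unencrypted in core-site.xml
    return [
        "az", "hdinsight", "script-action", "execute",
        "--cluster-name", cluster,
        "--resource-group", resource_group,
        "--name", "add-storage-account",  # arbitrary label (assumption)
        "--script-uri", SCRIPT_URI,
        "--roles", "headnode",            # node type: Head
        "--script-parameters", params,
    ]


cmd = build_script_action_command(
    "CLUSTERNAME", "RESOURCEGROUP", "ACCOUNTNAME", "ACCOUNTKEY"
)
print(" ".join(cmd))
```

To run the command, pass the list to subprocess.run(cmd, check=True). Passing a list rather than a single shell string keeps characters in the key, such as + or /, from being interpreted by a shell.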

Verification

When viewing the HDInsight cluster in the Azure portal, selecting the Storage Accounts entry under Properties doesn't display storage accounts added through this script action. Azure PowerShell and Azure CLI don't display the additional storage account either. The storage information isn't displayed because the script only modifies the core-site.xml configuration for the cluster. This information isn't used when retrieving the cluster information using Azure management APIs.

To verify the additional storage, use one of the following methods:

PowerShell

The following script returns the names of the storage accounts associated with the given cluster. Replace CLUSTERNAME with the actual cluster name, and then run the script.

# Update values
$clusterName = "CLUSTERNAME"

$creds = Get-Credential -UserName "admin" -Message "Enter the cluster login credentials"

# Normalize the cluster name to lowercase for use in the REST API URLs
$clusterName = $clusterName.ToLower()

# Get the current HDFS service_config_version from Ambari
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName`?fields=Clusters/desired_service_config_versions/HDFS" `
    -Credential $creds -UseBasicParsing
$respObj = ConvertFrom-Json $resp.Content

$configVersion = $respObj.Clusters.desired_service_config_versions.HDFS.service_config_version

# Get the HDFS configuration for that version
$resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.net/api/v1/clusters/$clusterName/configurations/service_config_versions?service_name=HDFS&service_config_version=$configVersion" `
    -Credential $creds -UseBasicParsing
$respObj = ConvertFrom-Json $resp.Content

# Extract account names from the fs.azure.account.key.<account>.blob.core.windows.net property names
$value = ($respObj.items.configurations | Where-Object type -EQ "core-site").properties | Get-Member -MemberType Properties | Where-Object Name -Like "fs.azure.account.key.*"
foreach ($name in $value) { $name.Name.Split(".")[4] }
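The extraction step at the end of the script can also be sketched in Python. This example assumes the Ambari response has already been parsed from JSON into a dictionary with the same items > configurations > properties shape that the PowerShell script reads. The function name and the sample data are illustrative, not real cluster output.

```python
def extract_storage_accounts(service_config: dict) -> list:
    """Return sorted storage account names found in core-site key properties."""
    prefix = "fs.azure.account.key."
    accounts = set()
    for item in service_config.get("items", []):
        for config in item.get("configurations", []):
            if config.get("type") != "core-site":
                continue
            for name in config.get("properties", {}):
                if name.startswith(prefix):
                    # fs.azure.account.key.<account>.blob.core.windows.net
                    accounts.add(name.split(".")[4])
    return sorted(accounts)


# Illustrative sample data (not real cluster output)
sample = {
    "items": [{
        "configurations": [
            {"type": "hdfs-site", "properties": {"dfs.replication": "3"}},
            {"type": "core-site", "properties": {
                "fs.azure.account.key.primarystore.blob.core.windows.net": "<key>",
                "fs.azure.account.key.extrastore.blob.core.windows.net": "<key>",
                "fs.azure.account.keyprovider.extrastore.blob.core.windows.net": "<provider>",
            }},
        ]
    }]
}

print(extract_storage_accounts(sample))
```

The trailing dot in the prefix matters: it keeps fs.azure.account.keyprovider entries from being counted as account keys, which matches the "fs.azure.account.key.*" filter in the PowerShell script.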

Apache Ambari

  1. From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.net, where CLUSTERNAME is the name of your cluster.

  2. Navigate to HDFS > Configs > Advanced > Custom core-site.

  3. Observe the keys that begin with fs.azure.account.key. The account name is part of each key, as in fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net.

    [Image: verification through Apache Ambari]

Remove storage account

  1. From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.net, where CLUSTERNAME is the name of your cluster.

  2. Navigate to HDFS > Configs > Advanced > Custom core-site.

  3. Remove the following keys:

    • fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
    • fs.azure.account.keyprovider.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net

After you remove these keys and save the configuration, restart Oozie, YARN, MapReduce2, HDFS, and Hive one at a time.
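To make the removal concrete, the following Python sketch shows which entries disappear from the core-site properties for a given account. It works on an in-memory dictionary only and doesn't contact Ambari. The function name and sample values are hypothetical.

```python
def remove_storage_account(core_site: dict, account_name: str) -> dict:
    """Return a copy of the core-site properties without the account's keys."""
    suffix = f"{account_name}.blob.core.windows.net"
    to_remove = {
        f"fs.azure.account.key.{suffix}",
        f"fs.azure.account.keyprovider.{suffix}",
    }
    return {k: v for k, v in core_site.items() if k not in to_remove}


# Illustrative core-site contents (values are placeholders)
before = {
    "fs.azure.account.key.primarystore.blob.core.windows.net": "<key>",
    "fs.azure.account.key.extrastore.blob.core.windows.net": "<key>",
    "fs.azure.account.keyprovider.extrastore.blob.core.windows.net": "<provider>",
}

after = remove_storage_account(before, "extrastore")
print(sorted(after))
```

Only the two properties for the named account are dropped. Entries for other accounts, including the default storage account, are left unchanged.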

Known issues

Storage firewall

If you choose to secure your storage account with the Firewalls and virtual networks restrictions on Selected networks, be sure to enable the exception Allow trusted Microsoft services... so that HDInsight can access your storage account.

Unable to access storage after changing key

If you change the key for a storage account, HDInsight can no longer access the storage account. HDInsight uses a cached copy of the key stored in the core-site.xml configuration for the cluster. This cached copy must be updated to match the new key.

Running the script action again doesn't update the key, as the script checks to see if an entry for the storage account already exists. If an entry already exists, it doesn't make any changes.

To work around this problem:

  1. Remove the storage account.
  2. Add the storage account again, using the new key.

Important

Rotating the storage key for the primary storage account attached to a cluster is not supported.

Poor performance

If the storage account is in a different region than the HDInsight cluster, you may experience poor performance. Accessing data in a different region sends network traffic outside the regional Azure data center and across the public internet, which can introduce latency.

Additional charges

If the storage account is in a different region than the HDInsight cluster, you may notice additional egress charges on your Azure billing. An egress charge is applied when data leaves a regional data center. This charge is applied even if the traffic is destined for another Azure data center in a different region.

Next steps

You've learned how to add additional storage accounts to an existing HDInsight cluster. For more information on script actions, see Customize Linux-based HDInsight clusters using script action.