Create Apache Hadoop cluster with secure transfer storage accounts in Azure HDInsight
The Secure transfer required feature enhances the security of your Azure Storage account by requiring that all requests to the account use a secure connection (HTTPS). This feature and the wasbs scheme are supported only by HDInsight cluster version 3.6 or newer.
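To make the wasbs scheme concrete, the following is a minimal sketch of how a wasbs URI for a blob path is put together. The function name `wasbs_uri` and the container, account, and path values are hypothetical and used only for illustration; the sketch assumes the default public Azure blob endpoint suffix `blob.core.windows.net`.

```python
def wasbs_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasbs:// URI for a path in a blob container.

    Assumes the default public endpoint suffix; other Azure clouds
    use different suffixes.
    """
    host = f"{account}.blob.core.windows.net"
    # Drop any leading slash so the URI has exactly one after the host.
    return f"wasbs://{container}@{host}/{path.lstrip('/')}"


# Hypothetical container, account, and path names.
print(wasbs_uri("mycontainer", "mystorageaccount", "example/data/sample.log"))
```

The only difference from an unencrypted wasb URI is the scheme; the container, account, and path parts stay the same.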
Before you begin this tutorial, you must have:
- An Azure subscription. To create a free one-month trial account, browse to azure.microsoft.com/free.
- An Azure Storage account with secure transfer enabled. For the instructions, see Create a storage account and Require secure transfer.
- A Blob container on the storage account.
Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.
In this section, you create a Hadoop cluster in HDInsight using an Azure Resource Manager template. The template is located in GitHub. You don't need Resource Manager template experience to follow this tutorial. For other cluster creation methods and an explanation of the properties used in this tutorial, see Create HDInsight clusters.
Click the following image to sign in to Azure and open the Resource Manager template in the Azure portal.
Follow the instructions to create the cluster with the following specifications:
- Specify HDInsight version 3.6. Version 3.6 or newer is required.
- Specify a secure transfer enabled storage account.
- Use the short name of the storage account.
Both the storage account and the blob container must be created beforehand.
For the instructions, see Create cluster.
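As a rough illustration of the short-name requirement above, the sketch below normalizes a storage account value before it is entered in the template. It assumes that "short name" means the account name without the `.blob.core.windows.net` endpoint suffix, which the tutorial does not spell out; the helper `short_account_name` is hypothetical. The validation reflects the Azure Storage naming rule of 3 to 24 lowercase letters and digits.

```python
import re

BLOB_SUFFIX = ".blob.core.windows.net"  # assumed default public endpoint suffix


def short_account_name(name: str) -> str:
    """Return the storage account name without the blob endpoint suffix.

    Raises ValueError if the result is not a valid storage account name
    (3 to 24 characters, lowercase letters and digits only).
    """
    name = name.strip()
    if name.endswith(BLOB_SUFFIX):
        name = name[: -len(BLOB_SUFFIX)]
    if not re.fullmatch(r"[a-z0-9]{3,24}", name):
        raise ValueError(f"not a valid storage account short name: {name!r}")
    return name


# Hypothetical account name.
print(short_account_name("mystorageaccount.blob.core.windows.net"))
```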
If you use a script action to provide your own configuration files, you must use wasbs in the following settings:
- fs.defaultFS (core-site)
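A custom configuration could be checked for this before it is deployed. The following is a minimal sketch that treats the core-site settings as a plain dictionary; the function `require_wasbs` and the sample value are hypothetical, and how a script action actually reads or writes core-site is outside the scope of this sketch.

```python
def require_wasbs(core_site: dict) -> dict:
    """Return a copy of core-site settings whose fs.defaultFS uses wasbs://.

    A wasb:// value is rewritten to wasbs://. Any other scheme, or a
    missing value, is rejected with ValueError.
    """
    updated = dict(core_site)  # leave the caller's dictionary untouched
    value = updated.get("fs.defaultFS", "")
    if value.startswith("wasbs://"):
        return updated
    if value.startswith("wasb://"):
        updated["fs.defaultFS"] = "wasbs://" + value[len("wasb://"):]
        return updated
    raise ValueError(f"fs.defaultFS must use the wasbs scheme, got {value!r}")


# Hypothetical container and account names.
settings = {"fs.defaultFS": "wasb://mycontainer@mystorageaccount.blob.core.windows.net"}
print(require_wasbs(settings)["fs.defaultFS"])
```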
Add additional storage accounts
There are several ways to add more secure transfer enabled storage accounts:
- Modify the Azure Resource Manager template in the previous section.
- Create a cluster using the Azure portal and specify a linked storage account.
- Use a script action to add secure transfer enabled storage accounts to an existing HDInsight cluster. For more information, see Add additional storage accounts to HDInsight.
In this tutorial, you have learned how to create an HDInsight cluster that uses storage accounts with secure transfer enabled.
To learn more about analyzing data with HDInsight, see the following articles:
- To learn more about using Apache Hive with HDInsight, including how to perform Hive queries from Visual Studio, see Use Apache Hive with HDInsight.
- To learn about Apache Pig, a language used to transform data, see Use Apache Pig with HDInsight.
- To learn about Apache Hadoop MapReduce, a way to write programs that process data on Hadoop, see Use Apache Hadoop MapReduce with HDInsight.
- To learn about using the HDInsight Tools for Visual Studio to analyze data on HDInsight, see Get started using Visual Studio Apache Hadoop tools for HDInsight.
To learn more about how HDInsight stores data or how to get data into HDInsight, see the following articles:
- For information on how HDInsight uses Azure Storage, see Use Azure Storage with HDInsight.
- For information on how to upload data to HDInsight, see Upload data to HDInsight.
To learn more about creating or managing an HDInsight cluster, see the following articles:
- To learn about managing your Linux-based HDInsight cluster, see Manage HDInsight clusters using Apache Ambari.
- To learn more about the options you can select when creating an HDInsight cluster, see Creating HDInsight on Linux using custom options.
If you are familiar with Linux and Apache Hadoop but want to know the specifics of Hadoop on HDInsight, see Working with HDInsight on Linux.