Create Apache Hadoop cluster with secure transfer storage accounts in Azure HDInsight
The Secure transfer required feature enhances the security of your Azure Storage account by allowing only requests made over a secure connection (HTTPS) and rejecting requests made over HTTP. This feature and the wasbs scheme are supported only by HDInsight cluster version 3.6 or newer.
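With secure transfer enabled, cluster storage paths use the wasbs scheme instead of wasb. The following is a minimal sketch of what that looks like from the cluster; CONTAINER and ACCOUNT are hypothetical placeholders for your own blob container and storage account names.

```bash
# With secure transfer enabled, storage URIs use wasbs:// instead of wasb://.
# CONTAINER and ACCOUNT are placeholders for your container and storage account names.
hdfs dfs -ls wasbs://CONTAINER@ACCOUNT.blob.core.windows.net/example/data
```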
Prerequisites
Before you begin this article, you must have:
- An Azure subscription. To create a free one-month trial account, browse to azure.microsoft.com/free.
- An Azure Storage account with secure transfer enabled. For instructions, see Create a storage account and Require secure transfer. Enabling secure transfer after a cluster has been created requires additional steps that aren't covered in this article. (A CLI sketch of these prerequisites follows this list.)
- A Blob container on the storage account.
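If you prefer the command line, both prerequisites can be created with Azure CLI. The following is a minimal sketch; the resource group, account, and container names are placeholders, and eastus is an arbitrary example region.

```bash
# Create a resource group to hold the storage account (names and region are placeholders).
az group create --name myResourceGroup --location eastus

# Create the storage account; --https-only true enables the Secure transfer required feature.
az storage account create \
  --name mystorageaccount \
  --resource-group myResourceGroup \
  --https-only true

# Create the blob container the cluster will use.
az storage container create \
  --name mycontainer \
  --account-name mystorageaccount
```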
Create cluster
Warning
Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.
In this section, you create a Hadoop cluster in HDInsight using an Azure Resource Manager template. The template is located in GitHub. Experience with Resource Manager templates isn't required to follow this article. For other cluster creation methods and an explanation of the properties used in this article, see Create HDInsight clusters.
Select the deployment link to sign in to Azure and open the Resource Manager template in the Azure portal.
Follow the instructions to create the cluster with the following specifications:
- Specify HDInsight version 3.6. Version 3.6 or newer is required.
- Specify a secure transfer enabled storage account.
- Use the short name for the storage account, not the full blob endpoint address.
- Both the storage account and the blob container must be created beforehand.
For instructions, see Create cluster.
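The template can also be deployed from Azure CLI. This is a hypothetical sketch; the resource group name and the template URI are placeholders for your own resource group and the GitHub-hosted template mentioned above.

```bash
# Deploy the Resource Manager template from the command line.
# myResourceGroup and the template URI are placeholders.
az deployment group create \
  --resource-group myResourceGroup \
  --template-uri https://example.com/path/to/azuredeploy.json
```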
If you use a script action to provide your own configuration files, you must use wasbs in the following settings:
- fs.defaultFS (core-site)
- spark.eventLog.dir
- spark.history.fs.logDirectory
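For illustration, such settings might look like the following key/value pairs; CONTAINER, ACCOUNT, and the sparklogs path are placeholders rather than values prescribed by this article.

```
fs.defaultFS=wasbs://CONTAINER@ACCOUNT.blob.core.windows.net
spark.eventLog.dir=wasbs://CONTAINER@ACCOUNT.blob.core.windows.net/sparklogs
spark.history.fs.logDirectory=wasbs://CONTAINER@ACCOUNT.blob.core.windows.net/sparklogs
```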
Add additional storage accounts
There are several options for adding additional secure transfer enabled storage accounts:
- Modify the Azure Resource Manager template from the previous section.
- Create a cluster using the Azure portal and specify a linked storage account.
- Use a script action to add additional secure transfer enabled storage accounts to an existing HDInsight cluster. For more information, see Add additional storage accounts to HDInsight.
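A script action can be run from Azure CLI, as in the following sketch. The script URI, script parameters, and resource names are placeholders; use the script and arguments documented in Add additional storage accounts to HDInsight.

```bash
# Run a script action against an existing cluster (all names below are placeholders).
# The --script-uri and --script-parameters values must come from the
# "Add additional storage accounts to HDInsight" documentation.
az hdinsight script-action execute \
  --resource-group myResourceGroup \
  --cluster-name mycluster \
  --name AddStorageAccount \
  --script-uri https://example.com/add-storage-account.sh \
  --script-parameters "ACCOUNT.blob.core.windows.net ACCOUNT_KEY" \
  --roles headnode workernode
```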
Next steps
In this article, you learned how to create an HDInsight cluster that uses secure transfer enabled storage accounts.
To learn more about analyzing data with HDInsight, see the following articles:
- To learn more about using Apache Hive with HDInsight, including how to perform Hive queries from Visual Studio, see Use Apache Hive with HDInsight.
- To learn about Apache Hadoop MapReduce, a way to write programs that process data on Hadoop, see Use Apache Hadoop MapReduce with HDInsight.
- To learn about using the HDInsight Tools for Visual Studio to analyze data on HDInsight, see Get started using Visual Studio Apache Hadoop tools for HDInsight.
To learn more about how HDInsight stores data or how to get data into HDInsight, see the following articles:
- For information on how HDInsight uses Azure Storage, see Use Azure Storage with HDInsight.
- For information on how to upload data to HDInsight, see Upload data to HDInsight.
To learn more about creating or managing an HDInsight cluster, see the following articles:
- To learn about managing your Linux-based HDInsight cluster, see Manage HDInsight clusters using Apache Ambari.
- To learn more about the options you can select when creating an HDInsight cluster, see Creating HDInsight on Linux using custom options.
- If you're familiar with Linux and Apache Hadoop, but want to know the specifics of Hadoop on HDInsight, see Working with HDInsight on Linux. This article provides information such as:
- URLs for services hosted on the cluster, such as Apache Ambari and WebHCat
- The location of Apache Hadoop files and examples on the local file system
- The use of Azure Storage (WASB) instead of Apache Hadoop HDFS as the default data store