Some Frequently Asked Questions on Microsoft Azure HDInsight


We have seen some common questions on HDInsight when interacting with customers and partners. In this blog post, we answer some of those common questions.

1. What is Microsoft Azure HDInsight?

HDInsight is a Hadoop-based service from Microsoft that brings a 100 percent Apache Hadoop solution to the cloud. Through deep integration with BI tools such as PowerPivot and Power View, HDInsight enables end users to easily gain insight into big data. HDInsight also makes it very easy to deploy a Hadoop cluster within minutes with a few clicks, and it provides programmatic interfaces such as PowerShell and the .NET SDK for customized cluster provisioning. For more information, please visit the landing page of the Microsoft Azure HDInsight service here. Dan has also written a nice article that gives an architectural overview of Microsoft Azure HDInsight here.

2. Where can I find product documentation on Microsoft Azure HDInsight?

Please visit our product documentation page, where you will find working samples and demos to provision and interact with HDInsight. Please also visit our Big Data support page to read articles on HDInsight and on core Hadoop topics.

3. How do I get Support for HDInsight?

There are several options to get support for HDInsight. The technical product and billing support options available are detailed here. You can also access our online forums here to collaborate with the rest of the community on Azure HDInsight-related questions. A service dashboard is available here, which shows the current health of all Azure services, including Microsoft Azure HDInsight.

4. What versions of Hadoop and HDP are available on Microsoft Azure HDInsight?

The version page here is kept up to date with the cluster versions provided by HDInsight and the new features in each.

5. How do I provision an HDInsight cluster?

The blog post here outlines the two common approaches for provisioning HDInsight: the Management Portal and PowerShell. It is also possible to create a customized HDInsight cluster with custom configuration options to suit your needs. This blog post here from Azim outlines a PowerShell approach for creating a customized cluster.

6. Where does HDInsight store data and metadata?

HDInsight supports both HDFS and Windows Azure Storage BLOB (WASB) for data storage. However, using WASB is recommended because of several benefits, such as data reuse and sharing, archiving, storage cost, and elastic scale-out, as described in more detail here. By default, the metastore for Hive and Oozie is provisioned on Azure SQL Database.
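As an illustration of how data in WASB is addressed from a cluster, here is a minimal Python sketch that builds a WASB URI of the form `wasb://<container>@<account>.blob.core.windows.net/<path>`. The helper name `wasb_uri` and the container, account, and path values are hypothetical and used only for this example.

```python
def wasb_uri(container: str, account: str, path: str = "", secure: bool = False) -> str:
    """Build a WASB URI for a blob path inside an Azure storage container.

    This is an illustrative helper, not part of any HDInsight SDK.
    """
    # "wasbs" is the SSL-encrypted variant of the "wasb" scheme.
    scheme = "wasbs" if secure else "wasb"
    # Strip leading slashes so there is exactly one separator after the host.
    clean_path = path.lstrip("/")
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{clean_path}"


if __name__ == "__main__":
    # Hypothetical container, account, and path names.
    print(wasb_uri("mycontainer", "mystorageaccount", "example/data/sample.log"))
```

Because the URI names the storage account and container rather than the cluster, the same path can be used from any cluster that has the storage account attached.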

7. Is the WASB storage account retained even when the HDInsight cluster is dropped?

Yes, the WASB storage account is left behind even when the HDInsight cluster is dropped. This is really nice because the same Azure storage account can be attached to another HDInsight cluster to reuse the data in it. It also lets you save compute hours by dropping the cluster when it is not needed, without losing the data.

8. Is the metastore database on Azure SQL Database retained when the HDInsight cluster is dropped?

The short answer is no. By default, the metastore databases on Azure SQL Database are dropped when the HDInsight cluster is dropped. However, when you provision the HDInsight cluster using the custom create option, there is an explicit option to ask for a custom metastore, as shown in the screenshot below.

If you explicitly defined a custom metastore when provisioning the HDInsight cluster, then the metastore database is also left behind when the cluster is dropped.

9. Is there a connect site for requesting or voting on feature requests for Microsoft Azure HDInsight?

Yes, you can vote on feature requests or add your own feature request for Microsoft Azure HDInsight here.

10. Is there a System Center Management Pack available for HDInsight?

Yes, it is available for download here.

11. How do I connect BI tools to HDInsight for gaining insights into the data?

You can download the Microsoft Hive ODBC driver from here to connect from Excel or PowerPivot and gain insight into the data. A walkthrough on how to connect Excel to HDInsight using the Hive ODBC driver is available here.

12. How do I access logs on HDInsight?

Hadoop service logs and Templeton logs, among other things, are stored in the Windows Azure Storage BLOB account. Brian's blog here goes over logging on HDInsight in more detail.

13. How can an HDInsight cluster be upgraded?

Upgrading an HDInsight cluster is very simple. If you have attached one or more Azure BLOB storage accounts and a custom metastore to HDInsight, all you need to do is drop the existing HDInsight cluster, create a new HDInsight cluster on the version you need, and attach the existing Azure BLOB storage account(s) and the metastore to get an upgraded HDInsight cluster. Note that this is possible only if the data and the metastore are kept external to the HDInsight compute cluster, by using Azure BLOB storage for the file system and Azure SQL Database for the metastores.
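The precondition and the steps above can be sketched as a small check. This is a plain Python model for illustration only; it does not call any Azure API, and the `ClusterSetup` class, the `upgrade_plan` function, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterSetup:
    """Hypothetical description of where a cluster keeps its data and metadata."""
    name: str
    version: str
    blob_storage_accounts: List[str] = field(default_factory=list)
    custom_metastore: bool = False  # True if a custom metastore was chosen at provisioning


def upgrade_plan(cluster: ClusterSetup, target_version: str) -> List[str]:
    """Return the drop-and-recreate steps, or raise if state is not externalized."""
    if not cluster.blob_storage_accounts:
        raise ValueError("data is not in Azure BLOB storage, so it would not survive the drop")
    if not cluster.custom_metastore:
        raise ValueError("the default metastore is dropped with the cluster; a custom metastore is needed")
    accounts = ", ".join(cluster.blob_storage_accounts)
    return [
        f"Drop cluster '{cluster.name}' (version {cluster.version})",
        f"Create a new cluster on version {target_version}",
        f"Attach existing storage account(s): {accounts}",
        "Attach the existing custom metastore database",
    ]


if __name__ == "__main__":
    # Placeholder names and version labels, chosen only for the example.
    setup = ClusterSetup("mycluster", "X", ["mystorageaccount"], custom_metastore=True)
    for step in upgrade_plan(setup, "Y"):
        print(step)
```

A cluster that was created with the default metastore fails the second check, which mirrors the point made in question 8: the default metastore databases do not survive the drop.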

14. What are the different options available to move data into a Windows Azure Storage BLOB account?

If you are looking to move a large amount of data, you can use the Microsoft Azure Import/Export Service to transfer data to Azure BLOB storage. More details on that are available here. For smaller, incremental data, uploads into Azure BLOB storage can be scheduled using one of the tools detailed in this article here. Also, ExpressRoute enables a faster, private connection into Azure. A technical overview of ExpressRoute can be found here.

15. Is there a local development platform available for HDInsight?

Yes, the HDInsight emulator provides a local development platform and comes with the same components from the Hadoop ecosystem as Azure HDInsight. It is available for download here. Some samples for working with the emulator are available here.

This concludes the list of some common questions for this post. Hope you find this helpful!

Thank you.

Dharshana Bharadwaj (@dharshb)

Thanks to JasonH for reviewing this!