Work in the Hadoop ecosystem on HDInsight from a Windows PC

Learn about development and management options on a Windows PC for working in the Hadoop ecosystem on HDInsight.

HDInsight is based on Apache Hadoop and Hadoop components, open-source technologies developed on Linux. HDInsight versions 3.4 and higher use the Ubuntu Linux distribution as the underlying OS for the cluster. However, you can still work with HDInsight from a Windows client or Windows development environment.

Use PowerShell for deployment and management tasks

Azure PowerShell is a scripting environment that you can use to control and automate deployment and management tasks in HDInsight from Windows.

Examples of tasks you can do with PowerShell:

To get the latest version, follow the steps to install and configure Azure PowerShell. If you have scripts that need to be modified to use the new cmdlets for Azure Resource Manager, see Migrate to Azure Resource Manager-based development tools for HDInsight clusters.

Utilities you can run in a browser

The following utilities have a web UI that runs in a browser:

Data Lake (Hadoop) Tools for Visual Studio

Use Data Lake Tools for Visual Studio to deploy and manage Storm topologies. Data Lake Tools also installs the SCP.NET SDK, which allows you to develop C# Storm topologies with Visual Studio.

Before you work through the following examples, install and try Data Lake Tools for Visual Studio.

Examples of tasks you can do with Visual Studio and Data Lake Tools for Visual Studio:

Visual Studio and the .NET SDK

You can use Visual Studio with the .NET SDK to manage clusters and develop big data applications. You can use other IDEs for the following tasks, but examples are shown in Visual Studio.

Examples of tasks you can do with the .NET SDK in Visual Studio:

TIP If you're running .NET solutions with Windows-based HDInsight clusters, it's a good time to plan a migration to Linux-based clusters. For more information, see Migrate .NET solution for Windows-based HDInsight to Linux-based HDInsight.

IntelliJ IDEA and Eclipse IDE for Spark clusters

Both IntelliJ IDEA and the Eclipse IDE can be used to:

  • Develop and submit a Scala Spark application on an HDInsight Spark cluster.
  • Access Spark cluster resources.
  • Develop and run a Scala Spark application locally.

These articles show how:

Notebooks on Spark for data scientists

Apache Spark clusters in HDInsight include Zeppelin notebooks and kernels that can be used with Jupyter notebooks.

Run Linux-based tools and technologies on Windows

If you encounter a situation where you must use a tool or technology that is only available on Linux, consider the following options:

  • Bash (beta) on Windows 10 provides a Linux subsystem on Windows. Bash allows you to directly run Linux utilities without having to maintain a dedicated Linux installation. For more information, see Install and run the Bash beta on Windows 10.
  • Docker for Windows provides access to many Linux-based tools, and can be run directly from Windows. For example, you can use Docker to run the Beeline client for Hive directly from Windows. You can also use Docker to run a local Jupyter notebook and remotely connect to Spark on HDInsight. For more information, see Get started with Docker for Windows.
  • MobaXterm allows you to graphically browse the cluster file system over an SSH connection.
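
As a rough illustration of the Docker option, the following Python sketch assembles a docker run command that starts Beeline against a cluster's public HiveServer2 endpoint. The image name example/beeline, the cluster name mycluster, and the admin login are placeholder assumptions, not values from this article; substitute an image that actually contains the Beeline client. The JDBC URL form (port 443, HTTP transport, /hive2 path) follows the commonly documented HDInsight pattern, so verify it against your own cluster before relying on it.

```python
import shlex


def hive_jdbc_url(cluster_name: str) -> str:
    """Build the HiveServer2 JDBC URL for a cluster's public HTTPS endpoint."""
    return (
        f"jdbc:hive2://{cluster_name}.azurehdinsight.net:443/"
        ";ssl=true;transportMode=http;httpPath=/hive2"
    )


def beeline_docker_command(cluster_name, user, password, image="example/beeline"):
    """Return the argument list for running Beeline inside a container.

    `image` is a hypothetical placeholder, not a real published image.
    """
    return [
        "docker", "run", "--rm", "-it", image,
        "beeline", "-u", hive_jdbc_url(cluster_name),
        "-n", user, "-p", password,
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your cluster name and login.
    cmd = beeline_docker_command("mycluster", "admin", "<password>")
    # shlex.join quotes the JDBC URL so the semicolons survive a POSIX shell.
    print(shlex.join(cmd))
```

The printed command is quoted for a POSIX-style shell such as Bash on Windows; in PowerShell or cmd.exe the quoting rules differ, so pass the argument list to subprocess.run directly or adjust the quoting by hand.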

Next steps

If you're new to working in Linux-based clusters, see the following articles: