Connect to HDInsight (Hadoop) using SSH
Learn how to use Secure Shell (SSH) to securely connect to HDInsight. HDInsight can use Linux (Ubuntu) as the operating system for nodes within the cluster. SSH can be used to connect to the head and edge nodes of a Linux-based cluster and run commands directly on those nodes.
The following table contains the address and port information needed when connecting to HDInsight using SSH:
|Address||Port||Connects to|
|<edgenodename>.<clustername>-ssh.azurehdinsight.net||22||Edge node (if one exists)|
Replace <edgenodename> with the name of the edge node. For more information on using edge nodes, see Use edge nodes in HDInsight.
Replace <clustername> with the name of your HDInsight cluster.
We recommend always connecting to the edge node if you have one. The head nodes host services that are critical to the health of the cluster. The edge node runs only what you put on it.
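As a minimal sketch of how the edge node address is assembled, the following shell snippet builds it from the two placeholders and prints the resulting ssh command. The values mycluster, myedge, and sshuser are hypothetical and only illustrate the pattern; substitute your own cluster name, edge node name, and SSH user.

```shell
# Hypothetical values for illustration; replace them with your own.
CLUSTER_NAME="mycluster"
EDGE_NODE_NAME="myedge"
SSH_USER="sshuser"
SSH_PORT=22

# Address format: <edgenodename>.<clustername>-ssh.azurehdinsight.net
EDGE_HOST="${EDGE_NODE_NAME}.${CLUSTER_NAME}-ssh.azurehdinsight.net"

# Print the command instead of running it, so the sketch works without a cluster.
echo "ssh -p ${SSH_PORT} ${SSH_USER}@${EDGE_HOST}"
```

To open a session, run the printed command in a terminal. The -p option can be omitted when the port is 22, because that is the ssh default.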
Most operating systems provide the ssh client. Microsoft Windows does not provide an SSH client by default. An SSH client for Windows is available in each of the following packages:
Bash on Ubuntu on Windows 10: The ssh command is provided through the Bash on Windows command line.
Git (https://git-scm.com/): The ssh command is provided through the GitBash command line.
GitHub Desktop (https://desktop.github.com/): The ssh command is provided through the Git Shell command line. GitHub Desktop can be configured to use Bash, the Windows Command Prompt, or PowerShell as the command line for the Git Shell.
OpenSSH (https://github.com/PowerShell/Win32-OpenSSH/wiki/Install-Win32-OpenSSH): The PowerShell team is porting OpenSSH to Windows, and provides test releases.
The OpenSSH package includes the SSH server component, sshd. This component starts an SSH server on your system, allowing others to connect to it. Do not configure this component or open port 22 unless you want to host an SSH server on your system. It is not required to communicate with HDInsight.
There are also several graphical SSH clients, such as PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/) and MobaXterm (http://mobaxterm.mobatek.net/). While these clients can be used to connect to HDInsight, the process of connecting to a server differs from using the ssh utility. For more information, see the documentation of the graphical client you are using.
SSH keys use public-key cryptography to secure the cluster. SSH keys are more secure than passwords, and provide an easy way to secure your HDInsight cluster.
If your SSH account is secured using a key, the client must provide the matching private key when you connect:
Most clients can be configured to use a default key. For example, the ssh client looks for a private key at ~/.ssh/id_rsa on Linux and Unix environments.
You can specify the path to a private key. With the ssh client, the -i parameter specifies the path to the private key. For example, ssh -i ~/.ssh/hdinsight <username>@<clustername>-ssh.azurehdinsight.net.
If you have multiple private keys for use with different servers, utilities such as ssh-agent (https://en.wikipedia.org/wiki/Ssh-agent) can be used to automatically select the key to use.
If you secure your private key with a passphrase, you must enter the passphrase when using the key. Utilities such as ssh-agent can cache the passphrase for your convenience.
Create an SSH key pair
Use the ssh-keygen command to create public and private key files. The following command generates a 2048-bit RSA key pair that can be used with HDInsight:
ssh-keygen -t rsa -b 2048
You are prompted for information during the key creation process, such as where the keys are stored and whether to use a passphrase. After the process completes, two files are created: a public key and a private key.
The public key is used to create an HDInsight cluster. The public key has an extension of .pub.
The private key is used to authenticate your client to the HDInsight cluster.
You can secure your keys using a passphrase. This is effectively a password on your private key. Even if someone obtains your private key, they must have the passphrase to use the key.
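The key creation step can be sketched non-interactively as follows. This is an illustration under stated assumptions, not a recommended setup: the scratch directory and the file name hdinsight are hypothetical, and the empty passphrase (-N "") is used only so the sketch runs without prompts. For real keys, prefer a passphrase as described above.

```shell
# Hypothetical scratch location, so the sketch never touches ~/.ssh.
KEY_DIR="$(mktemp -d)"
KEY_FILE="${KEY_DIR}/hdinsight"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -f sets the output file, -N "" sets an empty passphrase (demo only),
    # -q suppresses the progress output.
    ssh-keygen -t rsa -b 2048 -f "${KEY_FILE}" -N "" -q

    echo "Private key: ${KEY_FILE}"
    echo "Public key:  ${KEY_FILE}.pub"

    # Print the key length and fingerprint of the new public key.
    ssh-keygen -l -f "${KEY_FILE}.pub"
else
    echo "ssh-keygen is not installed" >&2
fi
```

The contents of the .pub file are what you supply when creating the cluster; the file without an extension is the private key and stays on your client.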
Create HDInsight using the public key
|Creation method||How to use the public key|
|Azure portal||Uncheck Use same password as cluster login, and then select Public Key as the SSH authentication type. Finally, select the public key file or paste the text contents of the file in the SSH public key field.|
|Azure PowerShell||Use the
|Azure CLI 1.0||Use the
|Resource Manager Template||For an example of using SSH keys with a template, see Deploy HDInsight on Linux with SSH key. The
SSH accounts can be secured using a password. When you connect to HDInsight using SSH, you are prompted to enter the password.
We do not recommend using password authentication for SSH. Passwords can be guessed and are vulnerable to brute force attacks. Instead, we recommend that you use SSH keys for authentication.
Create HDInsight using a password
|Creation method||How to specify the password|
|Azure portal||By default, the SSH user account has the same password as the cluster login account. To use a different password, uncheck Use same password as cluster login, and then enter the password in the SSH password field.|
|Azure PowerShell||Use the
|Azure CLI 1.0||Use the
|Resource Manager Template||For an example of using a password with a template, see Deploy HDInsight on Linux with SSH password. The
Change the SSH password
For information on changing the SSH user account password, see the Change passwords section of the Manage HDInsight document.
If you are using a domain-joined HDInsight cluster, you must use the kinit command after connecting with SSH. This command prompts you for a domain user and password, and authenticates your session with the Azure Active Directory domain associated with the cluster.
For more information, see Configure domain-joined HDInsight.
Connect to worker and Zookeeper nodes
The worker nodes and Zookeeper nodes are not directly accessible from the internet, but they can be accessed from the cluster head nodes or edge nodes. The following are the general steps to connect to other nodes:
Use SSH to connect to a head or edge node:
From the SSH connection to the head or edge node, use the ssh command to connect to a worker node in the cluster:
To retrieve a list of the domain names of the nodes in the cluster, see the examples in the Manage HDInsight by using the Ambari REST API document.
If the SSH account is secured using a password, you are asked to enter the password and the connection is established.
If the SSH account is secured using SSH keys, you must make sure that your local environment is configured for SSH agent forwarding.
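The two-hop connection described above can be sketched as follows. All values are hypothetical: sshuser, myedge, and mycluster stand in for your own names, and the worker node name is a made-up placeholder for a domain name retrieved from Ambari. The snippet prints the two commands rather than running them, so it works without a cluster.

```shell
# Hypothetical values for illustration; replace them with your own.
SSH_USER="sshuser"
CLUSTER_NAME="mycluster"
EDGE_NODE_NAME="myedge"
# Made-up placeholder; use a node domain name retrieved from Ambari.
WORKER_NODE="wn0-mycluster.example.internal"

EDGE_HOST="${EDGE_NODE_NAME}.${CLUSTER_NAME}-ssh.azurehdinsight.net"

# Step 1: run on your local machine to reach the edge node.
STEP1="ssh ${SSH_USER}@${EDGE_HOST}"
# Step 2: run inside the session opened by step 1 to reach the worker node.
STEP2="ssh ${SSH_USER}@${WORKER_NODE}"

echo "Local machine: ${STEP1}"
echo "On edge node:  ${STEP2}"
```

The second command only succeeds from inside the cluster, because the worker node is not reachable from the internet.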
Another way to directly access all nodes in the cluster is to install HDInsight into an Azure Virtual Network. Then, you can join your remote machine to the same virtual network and directly access all nodes in the cluster.
For more information, see Use a virtual network with HDInsight.
Configure SSH agent forwarding
The following steps assume a Linux/UNIX based system, and work with Bash on Windows 10. If these steps do not work for your system, you may need to consult the documentation for your SSH client.
Using a text editor, open ~/.ssh/config. If this file doesn't exist, you can create it by entering touch ~/.ssh/config at a command line.
Add the following text to the config file:
Host <edgenodename>.<clustername>-ssh.azurehdinsight.net
  ForwardAgent yes
Replace the Host information with the address of the node you connect to using SSH. The previous example uses the edge node. This entry configures SSH agent forwarding for the specified node.
Test SSH agent forwarding by using the following command from the terminal:
This command returns information similar to the following text:
If nothing is returned, then ssh-agent is not running. See the agent startup scripts information at Using ssh-agent with ssh (http://mah.everybody.org/docs/ssh) or consult your SSH client documentation for specific steps on installing and configuring ssh-agent.
Once you have verified that ssh-agent is running, use the following to add your SSH private key to the agent:
ssh-add ~/.ssh/id_rsa
If your private key is stored in a different file, replace ~/.ssh/id_rsa with the path to the file.
Connect to the cluster edge node or head nodes using SSH. Then use the ssh command to connect to a worker or Zookeeper node. The connection is established using the forwarded key.
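The agent check and key registration steps can be sketched as a small script. It assumes that ssh-agent advertises itself through the SSH_AUTH_SOCK environment variable, which is how OpenSSH's agent normally works; the agent_status helper is a hypothetical name introduced here for illustration, and ~/.ssh/id_rsa is the default key path mentioned above.

```shell
# Default private key path; change it if your key is stored elsewhere.
PRIVATE_KEY="${HOME}/.ssh/id_rsa"

# Hypothetical helper: report whether an agent socket is visible to this shell.
agent_status() {
    if [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "${SSH_AUTH_SOCK}" ]; then
        echo "running"
    else
        echo "not running"
    fi
}

# An empty value here means no agent is available to this shell.
echo "SSH_AUTH_SOCK=${SSH_AUTH_SOCK:-}"

if [ "$(agent_status)" != "running" ]; then
    echo "ssh-agent is not running; start it before adding keys"
elif [ ! -f "${PRIVATE_KEY}" ]; then
    echo "No private key found at ${PRIVATE_KEY}"
else
    # Register the key with the agent, then list the keys it holds.
    ssh-add "${PRIVATE_KEY}"
    ssh-add -l
fi
```

If the key is protected by a passphrase, ssh-add prompts for it once, and the agent then supplies the key for later connections, including forwarded ones.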