Connect to HDInsight (Apache Hadoop) using SSH

Learn how to use Secure Shell (SSH) to securely connect to Apache Hadoop on Azure HDInsight. For information on connecting through a virtual network, see Azure HDInsight virtual network architecture and Plan a virtual network deployment for Azure HDInsight clusters.

The following table contains the address and port information needed when connecting to HDInsight using an SSH client:

| Address | Port | Connects to... |
| --- | --- | --- |
| <clustername>-ssh.azurehdinsight.net | 22 | Primary head node |
| <clustername>-ssh.azurehdinsight.net | 23 | Secondary head node |
| <clustername>-ed-ssh.azurehdinsight.net | 22 | Edge node (ML Services on HDInsight) |
| <edgenodename>.<clustername>-ssh.azurehdinsight.net | 22 | Edge node (any other cluster type, if an edge node exists) |

Replace <clustername> with the name of your cluster. Replace <edgenodename> with the name of the edge node.

If your cluster contains an edge node, we recommend that you __always connect to the edge node__ using SSH. The head nodes host services that are critical to the health of Hadoop. The edge node runs only what you put on it. For more information on using edge nodes, see Use edge nodes in HDInsight.
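The address patterns in the table above can be assembled with a small shell helper. This is a minimal sketch: the function name hdi_ssh_host and the sample names mycluster and myedge are assumptions made for illustration, not part of any HDInsight tooling.

```shell
# Build the SSH host name for an HDInsight node (hypothetical helper).
# Usage: hdi_ssh_host <clustername> [edgenodename]
hdi_ssh_host() {
    if [ -n "${2:-}" ]; then
        # Edge node: <edgenodename>.<clustername>-ssh.azurehdinsight.net
        printf '%s.%s-ssh.azurehdinsight.net\n' "$2" "$1"
    else
        # Head nodes: <clustername>-ssh.azurehdinsight.net
        printf '%s-ssh.azurehdinsight.net\n' "$1"
    fi
}

hdi_ssh_host mycluster          # prints mycluster-ssh.azurehdinsight.net
hdi_ssh_host mycluster myedge   # prints myedge.mycluster-ssh.azurehdinsight.net
```

The result can be passed straight to the client, for example ssh "sshuser@$(hdi_ssh_host mycluster myedge)".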

Tip

When you first connect to HDInsight, your SSH client may display a warning that the authenticity of the host can't be established. When prompted, select 'yes' to add the host to your SSH client's trusted server list.

If you have previously connected to a server with the same name, you may receive a warning that the stored host key does not match the host key of the server. Consult the documentation for your SSH client on how to remove the existing entry for the server name.
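With the OpenSSH client, one way to remove a stale entry is ssh-keygen -R, which deletes all stored keys for a host from a known hosts file. The sketch below assumes OpenSSH; the function name forget_host_key and the host name in the example are placeholders, and other clients store host keys differently.

```shell
# Remove stored host keys for a host from an OpenSSH known_hosts file.
# Usage: forget_host_key <hostname> [known_hosts_file]
forget_host_key() {
    if [ -n "${2:-}" ]; then
        ssh-keygen -R "$1" -f "$2"
    else
        ssh-keygen -R "$1"     # operates on ~/.ssh/known_hosts by default
    fi
}

# Example (not run here):
#   forget_host_key mycluster-ssh.azurehdinsight.net
```

The next connection then prompts you to accept the new host key, as described in the first paragraph of this tip.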

SSH clients

Linux, Unix, and macOS systems provide the ssh and scp commands. The ssh client is commonly used to create a remote command-line session with a Linux or Unix-based system. The scp client is used to securely copy files between your client and the remote system.

Microsoft Windows doesn't install any SSH clients by default. The ssh and scp clients are available for Windows through the following packages:

There are also several graphical SSH clients, such as PuTTY and MobaXterm. While these clients can be used to connect to HDInsight, the process of connecting is different from using the ssh utility. For more information, see the documentation of the graphical client you're using.

Authentication: SSH Keys

SSH keys use public-key cryptography to authenticate SSH sessions. SSH keys are more secure than passwords, and provide an easy way to secure access to your Hadoop cluster.

If your SSH account is secured using a key, the client must provide the matching private key when you connect:

  • Most clients can be configured to use a __default key__. For example, the ssh client looks for a private key at ~/.ssh/id_rsa on Linux and Unix environments.

  • You can specify the __path to a private key__. With the ssh client, the -i parameter specifies the path to the private key. For example, ssh -i ~/.ssh/id_rsa sshuser@myedge.mycluster-ssh.azurehdinsight.net.

  • If you have __multiple private keys__ for use with different servers, consider using a utility such as ssh-agent (https://en.wikipedia.org/wiki/Ssh-agent). The ssh-agent utility can be used to automatically select the key to use when establishing an SSH session.

Important

If you secure your private key with a passphrase, you must enter the passphrase when using the key. Utilities such as ssh-agent can cache the passphrase for your convenience.

Create an SSH key pair

Use the ssh-keygen command to create public and private key files. The following command generates a 2048-bit RSA key pair that can be used with HDInsight:

ssh-keygen -t rsa -b 2048

You're prompted for information during the key creation process, such as where the keys are stored and whether to use a passphrase. After the process completes, two files are created: a public key and a private key.

  • The __public key__ is used to create an HDInsight cluster. The public key has an extension of .pub.

  • The __private key__ is used to authenticate your client to the HDInsight cluster.

Important

You can secure your keys using a passphrase. A passphrase is effectively a password on your private key. Even if someone obtains your private key, they must have the passphrase to use the key.
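For scripted setups, ssh-keygen also accepts the output file, a comment, and the passphrase as options instead of prompting. The following is a sketch only: the function name make_hdi_key, the comment text, and the file name in the example are assumptions. An empty passphrase is used when none is given purely to keep the example non-interactive; a passphrase supplied on the command line can be visible to other local users, so the interactive prompt is usually the safer choice.

```shell
# Generate a 2048-bit RSA key pair without prompts (hypothetical helper).
# Usage: make_hdi_key <private_key_path> [passphrase]
make_hdi_key() {
    # -q: quiet, -C: comment stored in the public key, -N: passphrase, -f: output file
    ssh-keygen -q -t rsa -b 2048 -C "hdinsight" -N "${2:-}" -f "$1" || return 1
    chmod 600 "$1"             # private key readable by its owner only
    ssh-keygen -l -f "$1.pub"  # print the fingerprint of the new public key
}

# Example (not run here):
#   make_hdi_key ~/.ssh/hdinsight_rsa
```

The function writes the private key to the given path and the public key to the same path with .pub appended.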

Create HDInsight using the public key

| Creation method | How to use the public key |
| --- | --- |
| Azure portal | Uncheck __Use cluster login password for SSH__, and then select __Public Key__ as the SSH authentication type. Finally, select the public key file or paste the text contents of the file in the __SSH public key__ field. |
| Azure PowerShell | Use the -SshPublicKey parameter of the New-AzHDInsightCluster cmdlet and pass the contents of the public key as a string. |
| Azure CLI | Use the --ssh-public-key parameter of the az hdinsight create command and pass the contents of the public key as a string. |
| Resource Manager Template | For an example of using SSH keys with a template, see Deploy HDInsight on Linux with SSH key. The publicKeys element in the azuredeploy.json file is used to pass the keys to Azure when creating the cluster. |

Authentication: Password

SSH accounts can be secured using a password. When you connect to HDInsight using SSH, you're prompted to enter the password.

Warning

Microsoft does not recommend using password authentication for SSH. Passwords can be guessed and are vulnerable to brute force attacks. Instead, we recommend that you use SSH keys for authentication.

Important

The SSH account password expires 70 days after the HDInsight cluster is created. If your password expires, you can change it using the information in the Manage HDInsight document.

Create HDInsight using a password

| Creation method | How to specify the password |
| --- | --- |
| Azure portal | By default, the SSH user account has the same password as the cluster login account. To use a different password, uncheck __Use cluster login password for SSH__, and then enter the password in the __SSH password__ field. |
| Azure PowerShell | Use the -SshCredential parameter of the New-AzHDInsightCluster cmdlet and pass a PSCredential object that contains the SSH user account name and password. |
| Azure CLI | Use the --ssh-password parameter of the az hdinsight create command and provide the password value. |
| Resource Manager Template | For an example of using a password with a template, see Deploy HDInsight on Linux with SSH password. The linuxOperatingSystemProfile element in the azuredeploy.json file is used to pass the SSH account name and password to Azure when creating the cluster. |

Change the SSH password

For information on changing the SSH user account password, see the __Change passwords__ section of the Manage HDInsight document.

Authentication: Domain-joined HDInsight

If you're using a __domain-joined HDInsight cluster__, you must use the kinit command after connecting as the SSH local user. This command prompts you for a domain user and password, and authenticates your session with the Azure Active Directory domain associated with the cluster.

You can also enable Kerberos authentication on each domain-joined node (for example, head node, edge node) so that you can connect with ssh using the domain account. To do this, edit the sshd config file:

sudo vi /etc/ssh/sshd_config

Uncomment KerberosAuthentication and change its value to yes, then restart the SSH service:

sudo service sshd restart

To verify at any time whether Kerberos authentication succeeded, use the klist command.
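If you prefer not to edit the file by hand, the change to KerberosAuthentication can be scripted. The following is a minimal sketch, assuming the directive is already present in the file (commented out or not); the function name enable_kerberos_auth is hypothetical. It rewrites the file passed as an argument, so try it on a copy first.

```shell
# Set "KerberosAuthentication yes" in an sshd_config-style file.
# Usage: enable_kerberos_auth <config_file>
enable_kerberos_auth() {
    tmp="$(mktemp)" || return 1
    # Replace the directive whether or not it is commented out.
    sed -E 's/^#?[[:space:]]*KerberosAuthentication[[:space:]].*$/KerberosAuthentication yes/' \
        "$1" > "$tmp" || return 1
    # Write back through the original path so ownership and mode are kept.
    cat "$tmp" > "$1" && rm -f "$tmp"
}

# On a node, as root (not run here):
#   cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
#   enable_kerberos_auth /etc/ssh/sshd_config
#   sshd -t && service sshd restart   # sshd -t checks the file before restarting
```

Other directives, including the remaining Kerberos options, are left unchanged.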

For more information, see Configure domain-joined HDInsight.

Connect to nodes

The head nodes and edge node (if there is one) can be accessed over the internet on ports 22 and 23.

  • When connecting to the __head nodes__, use port 22 to connect to the primary head node and port 23 to connect to the secondary head node. The fully qualified domain name to use is clustername-ssh.azurehdinsight.net, where clustername is the name of your cluster.

    # Connect to primary head node
    # port not specified since 22 is the default
    ssh sshuser@clustername-ssh.azurehdinsight.net
    
    # Connect to secondary head node
    ssh -p 23 sshuser@clustername-ssh.azurehdinsight.net
    
  • When connecting to the __edge node__, use port 22. The fully qualified domain name is edgenodename.clustername-ssh.azurehdinsight.net, where edgenodename is the name you provided when creating the edge node and clustername is the name of the cluster.

    # Connect to edge node
    ssh sshuser@edgenodename.clustername-ssh.azurehdinsight.net
    

Important

The previous examples assume that you are using password authentication, or that key authentication is occurring automatically. If you use an SSH key pair for authentication, and the key is not used automatically, use the -i parameter to specify the private key. For example, ssh -i ~/.ssh/mykey sshuser@clustername-ssh.azurehdinsight.net.

Once connected, the prompt changes to indicate the SSH user name and the node you're connected to. For example, when connected to the primary head node as sshuser, the prompt is sshuser@hn0-clustername:~$.
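To avoid retyping the port and user name, the connection details for the head nodes can be stored as Host entries in the OpenSSH client configuration. The sketch below appends two such entries to a file. The function name add_hdi_aliases and the alias suffixes -hn0 and -hn1 are assumptions chosen for illustration, and sshuser stands in for your SSH account name.

```shell
# Append Host aliases for both head nodes to an OpenSSH client config file.
# Usage: add_hdi_aliases <clustername> <config_file>
add_hdi_aliases() {
    cat >> "$2" <<EOF
Host $1-hn0
    HostName $1-ssh.azurehdinsight.net
    Port 22
    User sshuser

Host $1-hn1
    HostName $1-ssh.azurehdinsight.net
    Port 23
    User sshuser
EOF
}

# Example (not run here):
#   add_hdi_aliases clustername ~/.ssh/config
#   ssh clustername-hn1   # same as: ssh -p 23 sshuser@clustername-ssh.azurehdinsight.net
```

An IdentityFile line can be added to each entry if the key is not picked up automatically.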

Connect to worker and Apache ZooKeeper nodes

The worker nodes and ZooKeeper nodes aren't directly accessible from the internet. They can be accessed from the cluster head nodes or edge nodes. The following are the general steps to connect to other nodes:

  1. Use SSH to connect to a head or edge node:

    ssh sshuser@myedge.mycluster-ssh.azurehdinsight.net
    
  2. From the SSH connection to the head or edge node, use the ssh command to connect to a worker node in the cluster:

    ssh sshuser@wn0-myhdi
    

    To retrieve a list of the node names, see the Manage HDInsight by using the Apache Ambari REST API document.
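As a rough illustration of the node-name lookup mentioned in step 2, the Ambari REST API exposes the cluster's hosts under /api/v1/clusters/<clustername>/hosts. The sketch below assumes that curl and jq are installed, that the cluster login (HTTP) account is named admin unless you pass another name, and that worker node names start with wn as in the example above. The function names are hypothetical.

```shell
# List node host names through the Ambari REST API (requires curl and jq).
# Usage: hdi_list_nodes <clustername> [cluster_login_user]
hdi_list_nodes() {
    # curl prompts for the cluster login (HTTP) password.
    curl -u "${2:-admin}" -sS -G \
        "https://$1.azurehdinsight.net/api/v1/clusters/$1/hosts" |
        jq -r '.items[].Hosts.host_name'
}

# Keep only worker nodes from a list of host names read on standard input.
hdi_worker_nodes() {
    grep '^wn'
}

# Example (not run here):
#   hdi_list_nodes mycluster | hdi_worker_nodes
```

Any name it returns can be used as the target of the ssh command in step 2.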

If the SSH account is secured using a __password__, enter the password when connecting.

If the SSH account is secured using __SSH keys__, make sure that SSH agent forwarding is enabled on the client.

Note

Another way to directly access all nodes in the cluster is to install HDInsight into an Azure Virtual Network. Then, you can join your remote machine to the same virtual network and directly access all nodes in the cluster.

For more information, see Plan a virtual network for HDInsight.

Configure SSH agent forwarding

Important

The following steps assume a Linux or UNIX-based system, and work with Bash on Windows 10. If these steps do not work for your system, you may need to consult the documentation for your SSH client.

  1. Using a text editor, open ~/.ssh/config. If this file doesn't exist, you can create it by entering touch ~/.ssh/config at a command line.

  2. Add the following text to the config file.

    Host <edgenodename>.<clustername>-ssh.azurehdinsight.net
        ForwardAgent yes
    

    Replace the Host information with the address of the node you connect to using SSH. The previous example uses the edge node. This entry configures SSH agent forwarding for the specified node.

  3. Test SSH agent forwarding by using the following command from the terminal:

    echo "$SSH_AUTH_SOCK"
    

    This command returns information similar to the following text:

    /tmp/ssh-rfSUL1ldCldQ/agent.1792
    

    If nothing is returned, then ssh-agent isn't running. For more information, see the agent startup scripts information at Using ssh-agent with ssh (http://mah.everybody.org/docs/ssh) or consult your SSH client documentation.

  4. Once you've verified that ssh-agent is running, use the following command to add your SSH private key to the agent:

    ssh-add ~/.ssh/id_rsa
    

    If your private key is stored in a different file, replace ~/.ssh/id_rsa with the path to the file.

  5. Connect to the cluster edge node or head nodes using SSH. Then use the ssh command to connect to a worker or ZooKeeper node. The connection is established using the forwarded key.
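The check in step 3 and the agent startup can be combined so that an agent is started only when none is available. This is a sketch for POSIX-style shells with OpenSSH installed; the function name ensure_ssh_agent is an assumption, and an agent started this way lasts only as long as you keep it running.

```shell
# Start ssh-agent for the current shell if no agent socket is available.
ensure_ssh_agent() {
    if [ -z "${SSH_AUTH_SOCK:-}" ] || [ ! -S "$SSH_AUTH_SOCK" ]; then
        # ssh-agent -s prints Bourne-style commands that export
        # SSH_AUTH_SOCK and SSH_AGENT_PID; eval applies them to this shell.
        eval "$(ssh-agent -s)" >/dev/null
    fi
}

# Example (not run here):
#   ensure_ssh_agent
#   ssh-add ~/.ssh/id_rsa   # prompts for the passphrase, if the key has one
#   ssh-add -l              # lists the keys the agent currently holds
#   ssh sshuser@myedge.mycluster-ssh.azurehdinsight.net
```

Running ssh-agent -k stops an agent that was started this way.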

Copy files

The scp utility can be used to copy files to and from individual nodes in the cluster. For example, the following command copies the test.txt file from the local system to the primary head node:

scp test.txt sshuser@clustername-ssh.azurehdinsight.net:

Since no path is specified after the :, the file is placed in the sshuser home directory.

The following example copies the test.txt file from the sshuser home directory on the primary head node to the local system:

scp sshuser@clustername-ssh.azurehdinsight.net:test.txt .

Important

scp can only access the file system of individual nodes within the cluster. It cannot be used to access data in the HDFS-compatible storage for the cluster.

Use scp when you need to upload a resource for use from an SSH session. For example, upload a Python script and then run the script from an SSH session.
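The upload-then-run pattern can be sketched as a pair of commands wrapped in a function. The function name upload_and_run is hypothetical, and the sketch assumes that the node has an interpreter named python on its path and that the script name contains no spaces.

```shell
# Copy a script to a node's home directory, then run it there over SSH.
# Usage: upload_and_run <local_script> <user@host>
upload_and_run() {
    # No path after the colon, so the file lands in the remote home directory.
    scp "$1" "$2:" || return 1
    # basename drops the local directory part to get the remote file name.
    ssh "$2" "python $(basename "$1")"
}

# Example (not run here):
#   upload_and_run ./jobs/test.py sshuser@clustername-ssh.azurehdinsight.net
```

The script's output appears in your local terminal because ssh runs the command remotely and relays its output.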

For information on directly loading data into the HDFS-compatible storage, see the following documents:

Next steps