Quickstart: Create an Apache Hadoop cluster in Azure HDInsight using an ARM template

09/15/2023

In this quickstart, you use an Azure Resource Manager template (ARM template) to create an Apache Hadoop cluster in Azure HDInsight. Hadoop was the original open-source framework for distributed processing and analysis of big data sets on clusters. The Hadoop ecosystem includes Apache Hive, Apache HBase, Spark, Kafka, and many other related software packages and utilities.

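An Azure Resource Manager template is a JavaScript Object Notation (JSON) file that defines the infrastructure and configuration for your project. The template uses declarative syntax: you describe what you intend to deploy without writing the sequence of programming commands that creates the deployment.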

HDInsight currently offers seven cluster types. Each cluster type supports a different set of components, but all cluster types support Hive. For a list of the components supported in HDInsight, see "What's new in the Hadoop cluster versions provided by HDInsight?"

If your environment meets the prerequisites and you're familiar with using ARM templates, select the Deploy to Azure button. The template opens in the Azure portal.

Prerequisites

If you don't have an Azure subscription, create a free account before you begin.

Review the template

The template used in this quickstart is from Azure Quickstart Templates.

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "metadata": {
    "_generator": {
      "name": "bicep",
      "version": "0.26.54.24096",
      "templateHash": "1839820966662864707"
    }
  },
  "parameters": {
    "clusterName": {
      "type": "string",
      "metadata": {
        "description": "The name of the HDInsight cluster to create."
      }
    },
    "clusterType": {
      "type": "string",
      "allowedValues": [
        "hadoop",
        "interactivehive",
        "hbase",
        "storm",
        "spark"
      ],
      "metadata": {
        "description": "The type of the HDInsight cluster to create."
      }
    },
    "clusterLoginUserName": {
      "type": "string",
      "metadata": {
        "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
      }
    },
    "clusterLoginPassword": {
      "type": "securestring",
      "minLength": 10,
      "metadata": {
        "description": "The password must be at least 10 characters in length and must contain at least one digit, one upper case letter, one lower case letter, and one non-alphanumeric character except (single-quote, double-quote, backslash, right-bracket, full-stop). Also, the password must not contain 3 consecutive characters from the cluster username or SSH username."
      }
    },
    "sshUserName": {
      "type": "string",
      "metadata": {
        "description": "These credentials can be used to remotely access the cluster. The username cannot be admin."
      }
    },
    "sshPassword": {
      "type": "securestring",
      "minLength": 6,
      "maxLength": 72,
      "metadata": {
        "description": "SSH password must be 6-72 characters long and must contain at least one digit, one upper case letter, and one lower case letter. It must not contain any 3 consecutive characters from the cluster login name."
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "Location for all resources."
      }
    },
    "HeadNodeVirtualMachineSize": {
      "type": "string",
      "defaultValue": "Standard_E4_v3",
      "allowedValues": [
        "Standard_A4_v2",
        "Standard_A8_v2",
        "Standard_E2_v3",
        "Standard_E4_v3",
        "Standard_E8_v3",
        "Standard_E16_v3",
        "Standard_E20_v3",
        "Standard_E32_v3",
        "Standard_E48_v3"
      ],
      "metadata": {
        "description": "This is the headnode Azure Virtual Machine size, and will affect the cost. If you don't know, just leave the default value."
      }
    },
    "WorkerNodeVirtualMachineSize": {
      "type": "string",
      "defaultValue": "Standard_E4_v3",
      "allowedValues": [
        "Standard_A4_v2",
        "Standard_A8_v2",
        "Standard_E2_v3",
        "Standard_E4_v3",
        "Standard_E8_v3",
        "Standard_E16_v3",
        "Standard_E20_v3",
        "Standard_E32_v3",
        "Standard_E48_v3"
      ],
      "metadata": {
        "description": "This is the worker node Azure Virtual Machine size, and will affect the cost. If you don't know, just leave the default value."
      }
    }
  },
  "variables": {
    "defaultStorageAccount": {
      "name": "[uniqueString(resourceGroup().id)]",
      "type": "Standard_LRS"
    }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2021-08-01",
      "name": "[variables('defaultStorageAccount').name]",
      "location": "[parameters('location')]",
      "sku": {
        "name": "[variables('defaultStorageAccount').type]"
      },
      "kind": "StorageV2",
      "properties": {}
    },
    {
      "type": "Microsoft.HDInsight/clusters",
      "apiVersion": "2021-06-01",
      "name": "[parameters('clusterName')]",
      "location": "[parameters('location')]",
      "properties": {
        "clusterVersion": "4.0",
        "osType": "Linux",
        "clusterDefinition": {
          "kind": "[parameters('clusterType')]",
          "configurations": {
            "gateway": {
              "restAuthCredential.isEnabled": true,
              "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
              "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
            }
          }
        },
        "storageProfile": {
          "storageaccounts": [
            {
              "name": "[replace(replace(concat(reference(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2021-08-01').primaryEndpoints.blob), 'https:', ''), '/', '')]",
              "isDefault": true,
              "container": "[parameters('clusterName')]",
              "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2021-08-01').keys[0].value]"
            }
          ]
        },
        "computeProfile": {
          "roles": [
            {
              "name": "headnode",
              "targetInstanceCount": 2,
              "hardwareProfile": {
                "vmSize": "[parameters('HeadNodeVirtualMachineSize')]"
              },
              "osProfile": {
                "linuxOperatingSystemProfile": {
                  "username": "[parameters('sshUserName')]",
                  "password": "[parameters('sshPassword')]"
                }
              }
            },
            {
              "name": "workernode",
              "targetInstanceCount": 2,
              "hardwareProfile": {
                "vmSize": "[parameters('WorkerNodeVirtualMachineSize')]"
              },
              "osProfile": {
                "linuxOperatingSystemProfile": {
                  "username": "[parameters('sshUserName')]",
                  "password": "[parameters('sshPassword')]"
                }
              }
            }
          ]
        }
      },
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name)]"
      ]
    }
  ],
  "outputs": {
    "storage": {
      "type": "object",
      "value": "[reference(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2021-08-01')]"
    },
    "cluster": {
      "type": "object",
      "value": "[reference(resourceId('Microsoft.HDInsight/clusters', parameters('clusterName')), '2021-06-01')]"
    }
  }
}

The template defines two Azure resources:

Microsoft.Storage/storageAccounts: creates an Azure storage account.
Microsoft.HDInsight/clusters: creates an HDInsight cluster.
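
In the template's storageProfile, the storage account name handed to the cluster is derived from the account's blob endpoint by two nested replace() calls. As a minimal sketch of what that expression evaluates to, the Python below mirrors the same two replacements. The function name and the sample endpoint are hypothetical and are not part of the template.

```python
def blob_host_from_endpoint(blob_endpoint: str) -> str:
    """Mirror the template's replace(replace(endpoint, 'https:', ''), '/', '')."""
    # Remove the scheme first, then every remaining slash,
    # which leaves only the host name of the blob endpoint.
    return blob_endpoint.replace("https:", "").replace("/", "")


# Hypothetical endpoint, used for illustration only.
example_endpoint = "https://examplestorage.blob.core.windows.net/"
print(blob_host_from_endpoint(example_endpoint))
```

The result is the bare host name of the blob endpoint, which is the form the storageaccounts entry in the template uses for its name value.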

Deploy the template

Select the Deploy to Azure button below to sign in to Azure and open the ARM template.

Enter or select the following values:

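Property	Description
Subscription	From the drop-down list, select the Azure subscription to use for this cluster.
Resource group	From the drop-down list, select an existing resource group, or select Create new.
Location	This value is autopopulated with the location used for the resource group.
Cluster Name	Enter a globally unique name. For this template, use only lowercase letters and numbers.
Cluster Type	Select hadoop.
Cluster Login User Name	Provide the username. The default is `admin`.
Cluster Login Password	Provide a password. The password must be at least 10 characters long and must contain at least one digit, one uppercase letter, one lowercase letter, and one non-alphanumeric character (except ' ` ").
SSH User Name	Provide the username. The default is `sshuser`.
SSH Password	Provide the password.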
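
Because a password that breaks the rules is only rejected after the template is submitted, it can help to check it locally first. The following is a minimal sketch, not an official validator. It follows one reading of the clusterLoginPassword description in the template: excluded characters (single quote, double quote, backslash, right bracket, full stop) simply do not count toward the non-alphanumeric requirement. The table above lists a slightly different set of excluded characters. The function name, the case-insensitive username comparison, and the sample values are assumptions.

```python
# Characters the template description excludes from the special-character rule.
EXCLUDED_SPECIALS = set("'\"\\].")


def password_problems(password: str, usernames: list[str]) -> list[str]:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if len(password) < 10:
        problems.append("shorter than 10 characters")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(not c.isalnum() and c not in EXCLUDED_SPECIALS for c in password):
        problems.append("no allowed non-alphanumeric character")
    # Assumption: the 3-consecutive-character rule is checked case-insensitively.
    lowered = password.lower()
    for name in usernames:
        lowered_name = name.lower()
        for i in range(len(lowered_name) - 2):
            if lowered_name[i:i + 3] in lowered:
                problems.append(f"contains 3 consecutive characters from '{name}'")
                break
    return problems


# Hypothetical sample values, for illustration only.
print(password_problems("Sample#Pass2024", ["admin", "sshuser"]))
print(password_problems("admin123", ["admin"]))
```

An empty list only means these local checks passed. The service remains the authority on whether a password is accepted.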

Some properties are hardcoded in the template. You can change these values by editing the template. For more information about these properties, see "Create Apache Hadoop clusters in HDInsight."

Note

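The values you provide must be unique and must follow the naming guidelines. The template doesn't perform validation checks. If a value you provide is already in use or doesn't follow the guidelines, you get an error after you submit the template.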
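
Since the template itself performs no validation, some of the rules stated in this article can be checked before submitting. The sketch below builds a deployment parameters document for the non-secret parameters of the template and rejects a cluster name that contains anything other than lowercase letters and digits. The function name and sample values are hypothetical. Passwords are deliberately left out so that they are supplied at deployment time, and a local check cannot tell whether a name is already in use.

```python
import json
import re

PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"
)


def build_parameters(cluster_name: str, cluster_type: str,
                     login_user: str, ssh_user: str) -> dict:
    """Build an ARM parameters document for the template's non-secret parameters."""
    # This article states that, for this template, the cluster name may
    # contain only lowercase letters and numbers.
    if not re.fullmatch(r"[a-z0-9]+", cluster_name):
        raise ValueError("cluster name must contain only lowercase letters and digits")
    # The template's sshUserName description says the username cannot be admin.
    if ssh_user == "admin":
        raise ValueError("SSH username cannot be admin")
    values = {
        "clusterName": cluster_name,
        "clusterType": cluster_type,
        "clusterLoginUserName": login_user,
        "sshUserName": ssh_user,
    }
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {key: {"value": value} for key, value in values.items()},
    }


# Hypothetical values, for illustration only.
document = build_parameters("mycluster01", "hadoop", "admin", "sshuser")
print(json.dumps(document, indent=2))
```

Keeping the secure string parameters out of the file avoids writing passwords to disk. They still have to be provided when the template is deployed.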

Screenshot: the HDInsight Linux getting-started Resource Manager template in the Azure portal.

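Review the Terms and Conditions. Then select "I agree to the terms and conditions stated above," and select Purchase. You receive a notification that your deployment is in progress. Creating a cluster takes about 20 minutes.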

Review deployed resources

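After the cluster is created, you receive a "Deployment succeeded" notification with a Go to resource link. Your Resource group page lists your new HDInsight cluster and the default storage associated with the cluster. Each cluster depends on an Azure Blob Storage account, an Azure Data Lake Storage Gen1 account, or an Azure Data Lake Storage Gen2 account. This account is called the default storage account. The HDInsight cluster and its default storage account must be located in the same Azure region. Deleting the cluster doesn't delete the storage account.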

Note

For other cluster creation methods and for details about the properties used in this quickstart, see "Create HDInsight clusters."

Clean up resources

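After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters that aren't in use.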

Note

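If you're proceeding immediately to the next tutorial to learn how to run ETL operations using Hadoop on HDInsight, you may want to keep the cluster running, because otherwise you have to create a Hadoop cluster again for that tutorial. If you aren't going through the next tutorial right away, delete the cluster now.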

From the Azure portal, go to your cluster and select Delete.

Screenshot: deleting an HDInsight cluster from the Azure portal.

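You can also select the resource group name to open the resource group page, and then select Delete resource group. Deleting the resource group deletes both the HDInsight cluster and the default storage account.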

Next steps

In this quickstart, you learned how to create an Apache Hadoop cluster in HDInsight using an ARM template. In the next article, you learn how to perform an extract, transform, and load (ETL) operation using Hadoop on HDInsight.

Extract, transform, and load data using Interactive Query on HDInsight
