Tutorial: Create an Apache Kafka REST proxy enabled cluster in HDInsight using Azure CLI

In this tutorial, you learn how to create an Apache Kafka REST proxy enabled cluster in Azure HDInsight using the Azure CLI. Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. Apache Kafka is an open-source, distributed streaming platform. It's often used as a message broker, as it provides functionality similar to a publish-subscribe message queue. Kafka REST Proxy enables you to interact with your Kafka cluster via a REST API over HTTP. The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources.

The Apache Kafka API can only be accessed by resources inside the same virtual network. You can access the cluster directly using SSH. To connect other services, networks, or virtual machines to Apache Kafka, you must first create a virtual network and then create the resources within the network. For more information, see Connect to Apache Kafka using a virtual network.
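
As a rough illustration of the "create a virtual network first" requirement, the sketch below wraps az network vnet create in a small Bash function. The function name create_kafka_vnet, its argument order, and the 10.0.0.0/16 and 10.0.0.0/24 address ranges are assumptions made for this example and are not part of the tutorial; adjust them to your own network plan.

```shell
# Hypothetical helper (not part of the tutorial): creates a virtual network
# with a single subnet that a Kafka cluster and its clients could share.
# The address ranges are placeholder assumptions.
create_kafka_vnet() {
    local resource_group="$1" location="$2" vnet_name="$3" subnet_name="$4"

    az network vnet create \
        --resource-group "$resource_group" \
        --location "$location" \
        --name "$vnet_name" \
        --address-prefixes 10.0.0.0/16 \
        --subnet-name "$subnet_name" \
        --subnet-prefixes 10.0.0.0/24
}

# Example call (commented out so that defining the function has no side effects):
# create_kafka_vnet "$resourceGroupName" "$location" kafka-vnet default
```

The virtual network and subnet names can then be supplied to az hdinsight create through the --vnet-name and --subnet parameters.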

In this tutorial, you learn about:

  • The prerequisites for Kafka REST proxy
  • Creating an Apache Kafka cluster using the Azure CLI

If you don't have an Azure subscription, create a free account before you begin.

Prerequisites

Create an Apache Kafka cluster

  1. Sign in to your Azure subscription.

    az login
    
    # If you have multiple subscriptions, set the one to use
    # az account set --subscription "SUBSCRIPTIONID"
    
  2. Set environment variables. The use of variables in this tutorial is based on Bash. Slight variations will be needed for other environments.

    Variable            Description
    resourceGroupName   Replace RESOURCEGROUPNAME with the name for your new resource group.
    location            Replace LOCATION with the region where the cluster will be created. For a list of valid locations, use the az account list-locations command.
    clusterName         Replace CLUSTERNAME with a globally unique name for your new cluster.
    storageAccount      Replace STORAGEACCOUNTNAME with a name for your new storage account.
    httpPassword        Replace PASSWORD with a password for the cluster login, admin.
    sshPassword         Replace PASSWORD with a password for the secure shell username, sshuser.
    securityGroupName   Replace SECURITYGROUPNAME with the client Microsoft Entra security group name for Kafka REST Proxy. The variable is passed to the --kafka-client-group-name parameter of az hdinsight create.
    securityGroupID     Replace SECURITYGROUPID with the client Microsoft Entra security group ID for Kafka REST Proxy. The variable is passed to the --kafka-client-group-id parameter of az hdinsight create.
    storageContainer    The storage container the cluster will use. Leave as-is for this tutorial. This variable is set to the cluster name, converted to lowercase.
    workernodeCount     The number of worker nodes in the cluster. Leave as-is for this tutorial. To guarantee high availability, Kafka requires a minimum of 3 worker nodes.
    clusterType         The type of HDInsight cluster. Leave as-is for this tutorial.
    clusterVersion      The HDInsight cluster version. Leave as-is for this tutorial. Kafka REST Proxy requires a minimum cluster version of 4.0.
    componentVersion    The Kafka version. Leave as-is for this tutorial. Kafka REST Proxy requires a minimum component version of 2.1.

    Update the variables with desired values. Then enter the CLI commands to set the environment variables.

    export resourceGroupName=RESOURCEGROUPNAME
    export location=LOCATION
    export clusterName=CLUSTERNAME
    export storageAccount=STORAGEACCOUNTNAME
    export httpPassword='PASSWORD'
    export sshPassword='PASSWORD'
    export securityGroupName=SECURITYGROUPNAME
    export securityGroupID=SECURITYGROUPID
    
    export storageContainer=$(echo "$clusterName" | tr "[:upper:]" "[:lower:]")
    export workernodeCount=3
    export clusterType=kafka
    export clusterVersion=4.0
    export componentVersion=kafka=2.1
    
  3. Create the resource group by entering the command below:

    az group create \
        --location $location \
        --name $resourceGroupName
    
  4. Create an Azure Storage account by entering the command below:

    # Note: kind BlobStorage is not available as the default storage account.
    az storage account create \
        --name $storageAccount \
        --resource-group $resourceGroupName \
        --https-only true \
        --kind StorageV2 \
        --location $location \
        --sku Standard_LRS
    
  5. Extract the primary key from the Azure Storage account and store it in a variable by entering the command below:

    export storageAccountKey=$(az storage account keys list \
        --account-name $storageAccount \
        --resource-group $resourceGroupName \
        --query "[0].value" -o tsv)
    
  6. Create an Azure Storage container by entering the command below:

    az storage container create \
        --name $storageContainer \
        --account-key $storageAccountKey \
        --account-name $storageAccount
    
  7. Create the HDInsight cluster. Before entering the command, note the following parameters:

    1. Required parameters for Kafka clusters:

      Parameter                          Description
      --type                             The value must be Kafka.
      --workernode-data-disks-per-node   The number of data disks to use per worker node. HDInsight Kafka is only supported with data disks. This tutorial uses a value of 2.
    2. Required parameters for Kafka REST proxy:

      Parameter                      Description
      --kafka-management-node-size   The size of the node. This tutorial uses the value Standard_D4_v2.
      --kafka-client-group-id        The client Microsoft Entra security group ID for Kafka REST Proxy. The value is passed from the variable $securityGroupID.
      --kafka-client-group-name      The client Microsoft Entra security group name for Kafka REST Proxy. The value is passed from the variable $securityGroupName.
      --version                      The HDInsight cluster version must be at least 4.0. The value is passed from the variable $clusterVersion.
      --component-version            The Kafka version must be at least 2.1. The value is passed from the variable $componentVersion.

      To create the cluster without REST proxy, omit --kafka-management-node-size, --kafka-client-group-id, and --kafka-client-group-name from the az hdinsight create command.

    3. If you have an existing virtual network, add the parameters --vnet-name and --subnet, and their values.

    Enter the following command to create the cluster:

    az hdinsight create \
        --name $clusterName \
        --resource-group $resourceGroupName \
        --type $clusterType \
        --component-version $componentVersion \
        --http-password "$httpPassword" \
        --http-user admin \
        --location $location \
        --ssh-password "$sshPassword" \
        --ssh-user sshuser \
        --storage-account $storageAccount \
        --storage-account-key $storageAccountKey \
        --storage-container $storageContainer \
        --version $clusterVersion \
        --workernode-count $workernodeCount \
        --workernode-data-disks-per-node 2 \
        --kafka-management-node-size "Standard_D4_v2" \
        --kafka-client-group-id $securityGroupID \
        --kafka-client-group-name "$securityGroupName"
    

    Cluster creation may take several minutes to complete, usually around 15.
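
Because creation takes a while, it can help to poll the cluster state rather than guess when it is ready. The following is a minimal sketch, assuming that az hdinsight show reports the state under properties.clusterState and that Running and Error are the success and failure states worth stopping on. The function name wait_for_cluster and the default interval and attempt count are hypothetical choices for this example.

```shell
# Hypothetical helper (not part of the tutorial): polls the cluster state
# until it is Running (returns 0), Error (returns 1), or the attempts run
# out (returns 2).
# Arguments: cluster name, resource group, seconds between polls (default 60),
# maximum number of polls (default 30).
wait_for_cluster() {
    local name="$1" group="$2" interval="${3:-60}" attempts="${4:-30}"
    local state i=0

    while [ "$i" -lt "$attempts" ]; do
        # Assumption: the state is exposed as properties.clusterState.
        state=$(az hdinsight show \
            --name "$name" \
            --resource-group "$group" \
            --query properties.clusterState \
            --output tsv)
        echo "Cluster state: $state"

        case "$state" in
            Running) return 0 ;;
            Error)   return 1 ;;
        esac

        sleep "$interval"
        i=$((i + 1))
    done

    return 2
}

# Example call (commented out so that defining the function has no side effects):
# wait_for_cluster "$clusterName" "$resourceGroupName"
```

With the defaults, the function checks once a minute for up to 30 minutes, which leaves headroom over the typical creation time.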

Clean up resources

After you complete the tutorial, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're charged for an HDInsight cluster even when it isn't in use. Because the charges for the cluster are many times higher than the charges for storage, it makes economic sense to delete clusters when they aren't in use.

Enter all or some of the following commands to remove resources:

# Remove cluster
az hdinsight delete \
    --name $clusterName \
    --resource-group $resourceGroupName

# Remove storage container
az storage container delete \
    --account-name $storageAccount \
    --name $storageContainer

# Remove storage account
az storage account delete \
    --name $storageAccount \
    --resource-group $resourceGroupName

# Remove resource group
az group delete \
    --name $resourceGroupName
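
If the resource group holds nothing besides the resources created in this tutorial, deleting the group removes the cluster, storage container, and storage account in one step. The sketch below is one possible way to script that. The function name delete_tutorial_resource_group is hypothetical, and the --yes and --no-wait flags skip the confirmation prompt and return without waiting for the deletion to finish, so use it only when you are sure that nothing else lives in the group.

```shell
# Hypothetical helper (not part of the tutorial): deletes the resource group,
# and everything in it, only if the group exists.
delete_tutorial_resource_group() {
    local group="$1"

    # az group exists prints "true" or "false".
    if [ "$(az group exists --name "$group")" = "true" ]; then
        az group delete --name "$group" --yes --no-wait
    else
        echo "Resource group '$group' not found; nothing to delete."
    fi
}

# Example call (commented out so that defining the function has no side effects):
# delete_tutorial_resource_group "$resourceGroupName"
```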

Next steps

Now that you've successfully created an Apache Kafka REST proxy enabled cluster in Azure HDInsight using Azure CLI, use Python code to interact with the REST proxy: