Use Azure Kubernetes Service with Apache Kafka on HDInsight

Learn how to use Azure Kubernetes Service (AKS) with Apache Kafka on an HDInsight cluster. The steps in this document use a Node.js application hosted in AKS to verify connectivity with Kafka. This application uses the kafka-node package to communicate with Kafka. It uses Socket.io for event-driven messaging between the browser client and the back end hosted in AKS.

Apache Kafka is an open-source distributed streaming platform that can be used to build real-time streaming data pipelines and applications. Azure Kubernetes Service manages your hosted Kubernetes environment, and makes it quick and easy to deploy containerized applications. Using an Azure Virtual Network, you can connect the two services.

Note

The focus of this document is on the steps required to enable Azure Kubernetes Service to communicate with Kafka on HDInsight. The example itself is just a basic Kafka client to demonstrate that the configuration works.

Prerequisites

This document assumes that you are familiar with creating and using the following Azure services:

  • Kafka on HDInsight
  • Azure Kubernetes Service
  • Azure Virtual Networks

This document also assumes that you have walked through the Azure Kubernetes Service tutorial. This tutorial creates a container service, a Kubernetes cluster, and a container registry, and configures the kubectl utility.

Architecture

Network topology

Both HDInsight and AKS use an Azure Virtual Network as a container for compute resources. To enable communication between HDInsight and AKS, you must enable communication between their networks. The steps in this document use virtual network peering to connect the networks. Other connection types, such as VPN, should also work. For more information on peering, see the Virtual network peering document.

The following diagram illustrates the network topology used in this document:

HDInsight in one virtual network, AKS in another, and the networks connected using peering

Important

Name resolution is not enabled between the peered networks, so IP addressing is used. By default, Kafka on HDInsight is configured to return host names instead of IP addresses when clients connect. The steps in this document modify Kafka to use IP advertising instead.
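
Because host names do not resolve across the peering, any broker address that a client in the AKS network uses has to be an IP address and port. As a minimal sketch of that rule, the following shell function checks whether a value has the IPv4:port shape. The function name is_ip_endpoint and the sample host name are illustrative assumptions and are not part of the example application.

```shell
# Return 0 if the argument looks like IPv4:port (for example 10.0.0.5:9092).
# This only checks the shape of the string; it does not validate octet ranges
# or test whether the broker is reachable.
is_ip_endpoint() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}$'
}

# Example with placeholder values: the first is accepted, the second rejected.
for endpoint in 10.0.0.5:9092 wn0-kafka.example.internal:9092; do
  if is_ip_endpoint "$endpoint"; then
    echo "$endpoint: IP endpoint"
  else
    echo "$endpoint: host name, not resolvable from the peered network"
  fi
done
```

A check like this could be run against the broker address you later place in the client configuration.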

Create an Azure Kubernetes Service (AKS) cluster

If you do not already have an AKS cluster, use one of the following documents to learn how to create one:

Note

AKS creates a virtual network during installation. This network is peered to the one created for HDInsight in the next section.

Configure virtual network peering

  1. From the Azure portal, select Resource groups, and then find the resource group that contains the virtual network for your AKS cluster. The resource group name is MC_<resourcegroup>_<akscluster>_<location>. The resourcegroup and akscluster entries are the names of the resource group you created the cluster in and of the cluster itself. The location entry is the region in which the cluster was created.

  2. In the resource group, select the Virtual network resource.

  3. Select Address space. Note the address space listed.

  4. To create a virtual network for HDInsight, select + Create a resource, Networking, and then Virtual network.

    Important

    When entering the values for the new virtual network, you must use an address space that does not overlap the one used by the AKS cluster network.

    Use the same Location for the virtual network that you used for the AKS cluster.

    Wait until the virtual network has been created before going to the next step.

  5. To configure the peering between the HDInsight network and the AKS cluster network, select the virtual network and then select Peerings. Select + Add and use the following values to populate the form:

    • Name: Enter a unique name for this peering configuration.

    • Virtual network: Use this field to select the virtual network for the AKS cluster.

      Leave all other fields at the default value, then select OK to configure the peering.

  6. To configure the peering between the AKS cluster network and the HDInsight network, select the AKS cluster virtual network, and then select Peerings. Select + Add and use the following values to populate the form:

    • Name: Enter a unique name for this peering configuration.

    • Virtual network: Use this field to select the virtual network for the HDInsight cluster.

      Leave all other fields at the default value, then select OK to configure the peering.
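
The address-space requirement in step 4 can be checked before you create the HDInsight network. The following is a minimal POSIX shell sketch, not an Azure tool. The function names are hypothetical, and the sample ranges are placeholders for the address space you noted in step 3 and the one you plan to use for HDInsight.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print the first and last address (as integers) of a CIDR block
# such as 10.0.0.0/8.
cidr_range() (
  base=$(ip_to_int "${1%/*}")
  prefix=${1#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( base & mask ))
  last=$(( first | (~mask & 0xFFFFFFFF) ))
  echo "$first $last"
)

# Return 0 if the two CIDR blocks share at least one address.
cidr_overlap() (
  # After set: $1/$2 are the first/last of block A, $3/$4 of block B.
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Example with placeholder ranges: 10.240.0.0/16 lies inside 10.0.0.0/8,
# so this prints "overlap".
if cidr_overlap 10.0.0.0/8 10.240.0.0/16; then
  echo "overlap"
else
  echo "no overlap"
fi
```

If the check reports an overlap, choose a different address space for the HDInsight network before creating it.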

Install Apache Kafka on HDInsight

When creating the Kafka on HDInsight cluster, you must join the virtual network created earlier for HDInsight. For more information on creating a Kafka cluster, see the Create an Apache Kafka cluster document.

Important

When creating the cluster, you must use the Advanced settings to join the virtual network that you created for HDInsight.

Configure Apache Kafka IP Advertising

Use the following steps to configure Kafka to advertise IP addresses instead of domain names:

  1. Using a web browser, go to https://CLUSTERNAME.azurehdinsight.net. Replace CLUSTERNAME with the name of the Kafka on HDInsight cluster.

    When prompted, use the HTTPS user name and password for the cluster. The Ambari Web UI for the cluster is displayed.

  2. To view information on Kafka, select Kafka from the list on the left.

    Service list with Kafka highlighted

  3. To view Kafka configuration, select Configs from the top middle.

    Configs links for Kafka

  4. To find the kafka-env configuration, enter kafka-env in the Filter field on the upper right.

    Kafka configuration, for kafka-env

  5. To configure Kafka to advertise IP addresses, add the following text to the bottom of the kafka-env-template field:

    # Configure Kafka to advertise IP addresses instead of FQDN
    IP_ADDRESS=$(hostname -i)
    echo advertised.listeners=$IP_ADDRESS
    sed -i.bak -e '/advertised/{/advertised@/!d;}' /usr/hdp/current/kafka-broker/conf/server.properties
    echo "advertised.listeners=PLAINTEXT://$IP_ADDRESS:9092" >> /usr/hdp/current/kafka-broker/conf/server.properties
    
  6. To configure the interface that Kafka listens on, enter listeners in the Filter field on the upper right.

  7. To configure Kafka to listen on all network interfaces, change the value in the listeners field to PLAINTEXT://0.0.0.0:9092.

  8. To save the configuration changes, use the Save button. Enter a text message describing the changes. Select OK once the changes have been saved.

    Save configuration button

  9. To prevent errors when restarting Kafka, use the Service Actions button and select Turn On Maintenance Mode. Select OK to complete this operation.

    Service actions, with turn on maintenance highlighted

  10. To restart Kafka, use the Restart button and select Restart All Affected. Confirm the restart, and then use the OK button after the operation has completed.

    Restart button with restart all affected highlighted

  11. To disable maintenance mode, use the Service Actions button and select Turn Off Maintenance Mode. Select OK to complete this operation.
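
To see what the sed and echo lines added in step 5 do, you can try them on a scratch file rather than on a broker. The sketch below wraps the same two commands in a function that takes the file path and IP address as arguments. The function name, the sample file contents, and the host name wn0-kafka.example.internal are assumptions made for illustration; on the cluster, the template itself supplies the path and the output of hostname -i.

```shell
# Apply the same edit as the kafka-env template to a properties file.
# $1: path to a server.properties-style file, $2: IP address to advertise.
apply_ip_advertising() {
  props=$1
  ip=$2
  # Delete every line containing "advertised" unless it contains "advertised@".
  # A backup of the original file is kept next to it with a .bak suffix.
  sed -i.bak -e '/advertised/{/advertised@/!d;}' "$props"
  # Append the replacement entry that advertises the IP address.
  echo "advertised.listeners=PLAINTEXT://$ip:9092" >> "$props"
}

# Scratch file with hypothetical contents.
sample=$(mktemp)
cat > "$sample" <<'EOF'
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://wn0-kafka.example.internal:9092
log.dirs=/tmp/kafka-logs
EOF

apply_ip_advertising "$sample" 10.0.0.5
cat "$sample"
```

After the call, the listeners and log.dirs lines are unchanged, the host-name entry is gone, and the file ends with a single advertised.listeners line that carries the IP address.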

Test the configuration

At this point, Kafka and Azure Kubernetes Service can communicate through the peered virtual networks. To test this connection, use the following steps:

  1. Create a Kafka topic that is used by the test application. For information on creating Kafka topics, see the Create an Apache Kafka cluster document.

  2. Download the example application from https://github.com/Blackmist/Kafka-AKS-Test.

  3. Edit the index.js file and change the following lines:

    • var topic = 'mytopic': Replace mytopic with the name of the Kafka topic used by this application.

    • var brokerHost = '176.16.0.13:9092': Replace 176.16.0.13 with the internal IP address of one of the broker hosts for your cluster.

      To find the internal IP address of the broker hosts (worker nodes) in the cluster, see the Apache Ambari REST API document. Pick the IP address of one of the entries where the domain name begins with wn.

  4. From a command line in the src directory, use Docker to build an image for deployment:

    docker build -t kafka-aks-test .
    

    Note

    Packages required by this application are checked into the repository, so you do not need to use the npm utility to install them.

  5. Log in to your Azure Container Registry (ACR) and find the loginServer name:

    az acr login --name <acrName>
    az acr list --resource-group myResourceGroup --query "[].{acrLoginServer:loginServer}" --output table
    

    Note

    If you don't know your Azure Container Registry name, or are unfamiliar with using the Azure CLI to work with the Azure Kubernetes Service, see the AKS tutorials.

  6. Tag the local kafka-aks-test image with the loginServer of your ACR. Also add :v1 to the end to indicate the image version:

    docker tag kafka-aks-test <acrLoginServer>/kafka-aks-test:v1
    
  7. Push the image to the registry:

    docker push <acrLoginServer>/kafka-aks-test:v1
    

    This operation takes several minutes to complete.

  8. Edit the Kubernetes manifest file (kafka-aks-test.yaml) and replace microsoft with the ACR loginServer name retrieved in step 5.

  9. Use the following command to deploy the application settings from the manifest:

    kubectl create -f kafka-aks-test.yaml
    
  10. Use the following command to watch for the EXTERNAL-IP of the application:

    kubectl get service kafka-aks-test --watch
    

    Once an external IP address is assigned, use CTRL + C to exit the watch.

  11. Open a web browser and enter the external IP address for the service. You arrive at a page similar to the following image:

    Image of the web page

  12. Enter text into the field and then select the Send button. The data is sent to Kafka. Then the Kafka consumer in the application reads the message and adds it to the Messages from Kafka section.

    Warning

    You may receive multiple copies of a message. This problem usually happens when you refresh your browser after connecting, or open multiple browser connections to the application.
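
Steps 6 through 8 all derive from the ACR loginServer value, so the image reference and the manifest edit can be produced from that one input. The following sketch makes two assumptions that you should verify against your own files: that the manifest refers to the image as microsoft/kafka-aks-test:v1, and that the registry is named myregistry.azurecr.io (a hypothetical value). The function names are also illustrative.

```shell
# Build the registry-qualified image reference used in steps 6 and 7.
# $1: ACR loginServer, $2: image name, $3: version tag
image_ref() {
  echo "$1/$2:$3"
}

# Print the manifest with every "microsoft" replaced by the loginServer (step 8).
# $1: manifest path, $2: ACR loginServer
rewrite_manifest() {
  sed "s|microsoft|$2|g" "$1"
}

# Scratch manifest fragment with an assumed image line.
manifest=$(mktemp)
echo "        image: microsoft/kafka-aks-test:v1" > "$manifest"

image_ref myregistry.azurecr.io kafka-aks-test v1
rewrite_manifest "$manifest" myregistry.azurecr.io
```

The first command prints the reference to pass to docker tag and docker push, and the second prints the manifest line as it should read before you run kubectl create.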

Next steps

Use the following links to learn how to use Apache Kafka on HDInsight: