Frequently asked questions about Apache Kafka in Azure HDInsight

This article addresses some common questions about using Apache Kafka on Azure HDInsight.

What Kafka versions are supported by HDInsight?

For more information about the officially supported component versions in HDInsight, see What are the Apache Hadoop components and versions available with HDInsight? We recommend always using the latest version for the best possible performance and user experience.

What resources are provided in an HDInsight Kafka cluster and what resources am I charged for?

An HDInsight Kafka cluster includes the following resources:

  • Head nodes
  • Zookeeper nodes
  • Broker (worker) nodes
  • Azure Managed Disks attached to the broker nodes
  • Gateway nodes

All of these resources are charged based on the HDInsight pricing model, except for the gateway nodes, which are free of charge.

For a more detailed description of the various node types, see Azure HDInsight virtual network architecture. Pricing is based on per-minute node usage and varies by node size, number of nodes, type of managed disk, and region.

Do Apache Kafka APIs work with HDInsight?

Yes, HDInsight uses native Kafka APIs, so your client application code doesn't need to change. See Tutorial: Use the Apache Kafka Producer and Consumer APIs to learn how to use the Java-based producer/consumer APIs with your cluster.
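
For example, a producer built on the standard kafka-clients library runs unchanged against an HDInsight cluster. The following is a minimal sketch, assuming kafka-clients is on the classpath; the broker FQDNs and topic name are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {
    public static void main(String[] args) {
        // Broker FQDNs are placeholders; retrieve yours as described in
        // "Get the Apache Zookeeper and Broker host information".
        Properties props = new Properties();
        props.put("bootstrap.servers",
            "wn0-kafka.example.com:9092,wn1-kafka.example.com:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The same ProducerRecord/send calls you'd use against any Kafka cluster.
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}
```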

Can I change cluster configurations?

Yes, through the Ambari portal. Each component in the portal has a Configs section that you can use to change its configuration. Some changes may require broker restarts.
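
Configuration can also be inspected or scripted against the Ambari REST API that backs the portal. The following is a rough sketch, assuming the standard HDInsight Ambari endpoint; the cluster name and credentials are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AmbariConfigCheck {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute your cluster name and Ambari admin credentials.
        String cluster = "CLUSTERNAME";
        String auth = Base64.getEncoder()
            .encodeToString("admin:PASSWORD".getBytes());

        // Lists the kafka-broker configuration versions that Ambari tracks.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://" + cluster + ".azurehdinsight.net"
                + "/api/v1/clusters/" + cluster
                + "/configurations?type=kafka-broker"))
            .header("Authorization", "Basic " + auth)
            .GET()
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```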

What type of authentication does HDInsight support for Apache Kafka?

With the Enterprise Security Package (ESP), you can get topic-level security for your Kafka clusters. For more information, see Tutorial: Configure Apache Kafka policies in HDInsight with Enterprise Security Package (Preview).

Is my data encrypted? Can I use my own keys?

All Kafka messages on the managed disks are encrypted with Azure Storage Service Encryption (SSE). Data in transit (for example, data transmitted from clients to brokers and back) isn't encrypted by default; you can encrypt such traffic by setting up TLS yourself. Additionally, HDInsight lets you bring your own keys to encrypt the data at rest. For more information, see Customer-managed key disk encryption.
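
Once TLS is set up on the brokers, clients opt in through the standard Kafka SSL settings. The following is a minimal sketch of the client side, assuming a TLS listener is already configured; the broker address, truststore path, and password are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class TlsProducerConfig {
    public static void main(String[] args) {
        // Assumes you've already set up TLS on the brokers and distributed
        // a truststore to the client; paths and passwords are placeholders.
        Properties props = new Properties();
        props.put("bootstrap.servers", "wn0-kafka.example.com:9093");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/path/to/kafka.client.truststore.jks");
        props.put("ssl.truststore.password", "TRUSTSTORE_PASSWORD");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Producer traffic to the brokers is now encrypted with TLS.
        }
    }
}
```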

How do I connect clients to my cluster?

For Kafka clients to communicate with Kafka brokers, they must be able to reach the brokers over the network. For HDInsight clusters, the virtual network (VNet) is the security boundary, so the easiest way to connect clients to your HDInsight cluster is to create them within the same VNet as the cluster. Other scenarios include the following; a quick reachability check is sketched after the list:

  • Connecting clients in a different Azure VNet – Peer the cluster VNet and the client VNet, and configure the cluster for IP advertising. With IP advertising, Kafka clients must use broker IP addresses to connect to the brokers instead of fully qualified domain names (FQDNs).

  • Connecting on-premises clients – Use a VPN and set up custom DNS servers as described in Plan a virtual network for Azure HDInsight.

  • Creating a public endpoint for your Kafka service – If your enterprise security requirements allow it, you can deploy a public endpoint for your Kafka brokers, or a self-managed open-source REST endpoint with a public endpoint.
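
Whichever path you choose, a quick reachability check from the client network can confirm connectivity before you wire up real producers and consumers. The following sketch uses the Kafka AdminClient; the bootstrap address is a placeholder, and with IP advertising you'd use a broker IP address instead:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ConnectivityCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // In the same or a peered VNet, use the broker FQDNs; with IP
        // advertising enabled, use the broker IP addresses instead.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
            "wn0-kafka.example.com:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Listing topics is a quick way to confirm the brokers are reachable.
            System.out.println(admin.listTopics().names().get());
        }
    }
}
```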

Can I add more disk space on an existing cluster?

To increase the amount of space available for Kafka messages, you can increase the number of nodes. Currently, adding more disks to an existing cluster isn't supported.

Can a Kafka cluster work with Databricks?

Yes, Kafka clusters can work with Azure Databricks as long as they're in the same VNet. To use a Kafka cluster with Databricks, create a VNet with an HDInsight Kafka cluster in it, then specify that VNet when you create your Databricks workspace with VNet injection. For more information, see Deploy Azure Databricks in your Azure Virtual Network (VNet Injection). You need to provide the bootstrap broker names of the Kafka cluster when creating the Databricks workspace. For information on retrieving the broker names, see Get the Apache Zookeeper and Broker host information.
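
On the Databricks side, consuming a topic then looks like any Spark Structured Streaming job with a Kafka source. The following is a minimal sketch, assuming the workspace was deployed with VNet injection; the broker names and topic are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaFromDatabricks {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Databricks reaches the brokers over the injected VNet, so the
        // HDInsight broker FQDNs resolve as they would for any in-VNet client.
        Dataset<Row> stream = spark.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers",
                "wn0-kafka.example.com:9092,wn1-kafka.example.com:9092")
            .option("subscribe", "test-topic")
            .load();

        stream.printSchema();
    }
}
```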

How can I have maximum data durability?

Maximizing data durability minimizes the risk of message loss. To achieve maximum data durability, we recommend the following settings:

  • use a minimum replication factor of 3 in most regions
  • use a minimum replication factor of 4 in regions with only two fault domains
  • disable unclean leader elections
  • set min.insync.replicas to 2 or more - this raises the number of replicas that must be fully in sync with the leader before a write can proceed
  • set the acks property to all - this property requires all replicas to acknowledge all messages (see the configuration sketch after this list)
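
The following sketch shows how these recommendations map onto topic and producer configuration; the broker address and topic name are placeholders, and min.insync.replicas and unclean leader election can also be set cluster-wide through Ambari:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", "wn0-kafka.example.com:9092");

        try (AdminClient admin = AdminClient.create(adminProps)) {
            // Replication factor 3 with min.insync.replicas=2 and unclean
            // leader election disabled, per the recommendations above.
            NewTopic topic = new NewTopic("durable-topic", 3, (short) 3)
                .configs(Map.of(
                    "min.insync.replicas", "2",
                    "unclean.leader.election.enable", "false"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "wn0-kafka.example.com:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        // acks=all: the leader waits for the full in-sync replica set.
        producerProps.put("acks", "all");

        try (KafkaProducer<String, String> producer =
                new KafkaProducer<>(producerProps)) {
            // Writes now fail fast if fewer than 2 replicas are in sync.
        }
    }
}
```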

Configuring Kafka for stronger data consistency affects the availability of brokers to serve produce requests.

Can I replicate my data to multiple clusters?

Yes, data can be replicated to multiple clusters using Kafka MirrorMaker. Details on setting up MirrorMaker are in Mirror Apache Kafka topics. Other self-managed open-source technologies and vendors, such as Brooklin, can also help you replicate to multiple clusters.

Can I upgrade my cluster? How should I upgrade my cluster?

We don't currently support in-place cluster version upgrades. To update your cluster to a higher Kafka version, create a new cluster with the version that you want and migrate your Kafka clients to use the new cluster.

How do I monitor my Kafka cluster?

Use Azure Monitor to analyze your Kafka logs.