Configure outbound network traffic for Azure HDInsight clusters using Firewall
This article describes how to secure outbound traffic from your HDInsight cluster by using Azure Firewall. The following steps assume that you're configuring Azure Firewall for an existing cluster. If you're deploying a new cluster behind a firewall, create your HDInsight cluster and subnet first, and then follow the steps in this guide.
Azure HDInsight clusters are normally deployed in your own virtual network. The cluster has dependencies on services outside of that virtual network that require network access to function properly.
There are several dependencies that require inbound traffic. The inbound management traffic can't be sent through a firewall device. The source addresses for this traffic are known and are published here. You can also create Network Security Group (NSG) rules with this information to secure inbound traffic to the clusters.
The HDInsight outbound traffic dependencies are almost entirely defined with FQDNs, which don't have static IP addresses behind them. Without static addresses, you can't use Network Security Groups (NSGs) to lock down outbound traffic from a cluster. The addresses change often enough that you can't resolve the names once and build NSG rules from the results.
The solution to securing outbound addresses is to use a firewall device that can control outbound traffic based on domain names. Azure Firewall can restrict outbound HTTP and HTTPS traffic based on the FQDN of the destination or FQDN tags.
Configuring Azure Firewall with HDInsight
To lock down egress from your existing HDInsight cluster with Azure Firewall, complete these steps:
- Create a subnet named AzureFirewallSubnet.
- Create a firewall.
- Add application rules to the firewall.
- Add network rules to the firewall.
- Create a route table.
Create new subnet
Create a subnet named AzureFirewallSubnet in the virtual network where your cluster exists.
Create a new firewall for your cluster
Create a firewall named Test-FW01 by following the steps in the "Deploy the firewall" section of Tutorial: Deploy and configure Azure Firewall using the Azure portal.
Configure the firewall with application rules
Create an application rule collection that allows the cluster to send and receive important communications.
Select the new firewall Test-FW01 from the Azure portal.
Navigate to Settings > Rules > Application rule collection > + Add application rule collection.
On the Add application rule collection screen, provide the following information:
| Property | Value |
|---|---|
| Name | FwAppRule |
| Priority | 200 |
| Action | Allow |
FQDN tags section
| Name | Source address | FQDN tag | Notes |
|---|---|---|---|
| Rule_1 | * | WindowsUpdate and HDInsight | Required for HDI services |
Target FQDNs section
| Name | Source addresses | Protocol:Port | Target FQDNs | Notes |
|---|---|---|---|---|
| Rule_2 | * | https:443 | login.windows.net | Allows Windows login activity |
| Rule_3 | * | https:443 | login.microsoftonline.com | Allows Windows login activity |
| Rule_4 | * | https:443,http:80 | storage_account_name.blob.core.windows.net | Replace storage_account_name with your actual storage account name. If your cluster is backed by WASB, add a rule for WASB. To use only HTTPS connections, make sure "secure transfer required" is enabled on the storage account. |
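To make the application rule tables easier to review, the following Python sketch models the FwAppRule collection as plain data and checks whether a given FQDN and protocol:port pair would be allowed. It's a simplified model, not the Azure Firewall API or its matching algorithm: the ApplicationRule class and the allows() helper are hypothetical, the storage account name is a placeholder, FQDN tags (Rule_1) are expanded by Azure Firewall itself and aren't matched here, and only exact, case-insensitive FQDN matches are considered.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationRule:
    """One row of an application rule collection (hypothetical model)."""
    name: str
    source_addresses: list
    protocols: list                     # (protocol, port) pairs, e.g. ("https", 443)
    target_fqdns: list = field(default_factory=list)
    fqdn_tags: list = field(default_factory=list)


def fw_app_rules(storage_account_name: str) -> list:
    """Mirror the FwAppRule collection (priority 200, action Allow)."""
    return [
        # FQDN tags are expanded by Azure Firewall; they are recorded but not matched here.
        ApplicationRule("Rule_1", ["*"], [], fqdn_tags=["WindowsUpdate", "HDInsight"]),
        ApplicationRule("Rule_2", ["*"], [("https", 443)], ["login.windows.net"]),
        ApplicationRule("Rule_3", ["*"], [("https", 443)], ["login.microsoftonline.com"]),
        ApplicationRule("Rule_4", ["*"], [("https", 443), ("http", 80)],
                        [f"{storage_account_name}.blob.core.windows.net"]),
    ]


def allows(rules: list, fqdn: str, protocol: str, port: int) -> bool:
    """Return True if any rule lists this exact FQDN (case-insensitive) on protocol:port.

    Every rule uses source "*", so source addresses aren't checked.
    """
    fqdn = fqdn.lower()
    for rule in rules:
        targets = [t.lower() for t in rule.target_fqdns]
        if fqdn in targets and (protocol.lower(), port) in rule.protocols:
            return True
    return False


rules = fw_app_rules("mystorageacct")  # hypothetical storage account name
print(allows(rules, "login.microsoftonline.com", "https", 443))  # True
print(allows(rules, "login.windows.net", "http", 80))            # False
```

Modeling the rules as data like this can help you review a rule set before entering it in the portal; the firewall remains the authority on what is actually allowed.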
Configure the firewall with network rules
Create the network rules to correctly configure your HDInsight cluster.
Continuing from the prior step, navigate to Network rule collection > + Add network rule collection.
On the Add network rule collection screen, provide the following information:
| Property | Value |
|---|---|
| Name | FwNetRule |
| Priority | 200 |
| Action | Allow |
IP Addresses section
| Name | Protocol | Source addresses | Destination addresses | Destination ports | Notes |
|---|---|---|---|---|---|
| Rule_1 | UDP | * | * | 123 | Time service |
| Rule_2 | Any | * | DC_IP_Address_1, DC_IP_Address_2 | * | If you're using Enterprise Security Package (ESP), add a network rule in the IP Addresses section that allows communication with AAD-DS for ESP clusters. You can find the IP addresses of the domain controllers in the AAD-DS section of the portal. |
| Rule_3 | TCP | * | IP address of your Data Lake Storage account | * | If you're using Azure Data Lake Storage, you can add a network rule in the IP Addresses section to address an SNI issue with ADLS Gen1 and Gen2. This option routes the traffic through the firewall, which might result in higher costs for large data loads, but the traffic is logged and auditable in the firewall logs. To determine the IP address of your Data Lake Storage account, you can use a PowerShell command such as [System.Net.DNS]::GetHostAddresses("STORAGEACCOUNTNAME.blob.core.windows.net") to resolve the FQDN to an IP address. |
| Rule_4 | TCP | * | * | 12000 | (Optional) If you're using Log Analytics, create a network rule in the IP Addresses section to enable communication with your Log Analytics workspace. |
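If you'd rather not use PowerShell for the Rule_3 lookup, the same resolution can be sketched in Python with the standard library's socket.getaddrinfo. The resolve_ipv4() helper is a hypothetical name, it returns IPv4 addresses only, and STORAGEACCOUNTNAME remains a placeholder for your account. The result reflects current DNS resolution only, so it's worth rechecking the address if the rule stops matching.

```python
import socket


def resolve_ipv4(fqdn: str) -> list:
    """Return the sorted, de-duplicated IPv4 addresses that fqdn resolves to right now.

    Roughly equivalent to [System.Net.DNS]::GetHostAddresses(...), limited to IPv4.
    Raises socket.gaierror if the name can't be resolved.
    """
    infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP string.
    return sorted({info[4][0] for info in infos})
```

For example, resolve_ipv4("STORAGEACCOUNTNAME.blob.core.windows.net") returns the addresses you would enter as the destination of Rule_3.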
Service Tags section
| Name | Protocol | Source addresses | Service tags | Destination ports | Notes |
|---|---|---|---|---|---|
| Rule_7 | TCP | * | SQL | 1433 | Configure a network rule in the Service Tags section for SQL so that you can log and audit SQL traffic. If you configured service endpoints for SQL Server on the HDInsight subnet, that traffic bypasses the firewall. |
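The IP Addresses rules can be sketched as a small Python model that shows how protocol, destination, and port combine to match a flow. This is a simplified illustration, not Azure Firewall's implementation: the NetworkRule class and allowed() helper are hypothetical, the domain controller and storage addresses are placeholder values from private and documentation ranges, and the SQL service tag rule (Rule_7) isn't modeled because service tags expand to Microsoft-managed address prefixes.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkRule:
    """One row of the FwNetRule IP Addresses section (hypothetical model)."""
    name: str
    protocol: str        # "TCP", "UDP", or "Any"
    destinations: tuple  # "*" or IP addresses / CIDR prefixes
    ports: tuple         # "*" or port numbers

    def matches(self, protocol: str, dest_ip: str, port: int) -> bool:
        if self.protocol != "Any" and self.protocol != protocol:
            return False
        if "*" not in self.ports and port not in self.ports:
            return False
        if "*" in self.destinations:
            return True
        addr = ipaddress.ip_address(dest_ip)
        return any(addr in ipaddress.ip_network(d, strict=False) for d in self.destinations)


# Placeholder addresses: replace with your domain controller and storage account IPs.
FW_NET_RULES = [
    NetworkRule("Rule_1", "UDP", ("*",), (123,)),                    # time service
    NetworkRule("Rule_2", "Any", ("10.0.5.4", "10.0.5.5"), ("*",)),  # AAD-DS (ESP only)
    NetworkRule("Rule_3", "TCP", ("203.0.113.10",), ("*",)),         # Data Lake Storage
    NetworkRule("Rule_4", "TCP", ("*",), (12000,)),                  # Log Analytics (optional)
]


def allowed(rules, protocol: str, dest_ip: str, port: int) -> bool:
    """Return True if any rule in the Allow collection matches the flow."""
    return any(rule.matches(protocol, dest_ip, port) for rule in rules)


print(allowed(FW_NET_RULES, "UDP", "192.0.2.1", 123))  # True (Rule_1)
print(allowed(FW_NET_RULES, "TCP", "192.0.2.1", 443))  # False
```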
Create and configure a route table
Create a route table with the following entries:
- All IP addresses listed under "Health and management services: All regions", with a next hop type of Internet.
- The two IP addresses for the region where the cluster is created, listed under "Health and management services: Specific regions", with a next hop type of Internet.
- One virtual appliance route for the address prefix 0.0.0.0/0, with the next hop set to your Azure Firewall private IP address.
For example, to configure the route table for a cluster created in the East US region, use the following steps:
Select your Azure firewall Test-FW01. Copy the Private IP address listed on the Overview page. For this example, we'll use a sample address of 10.0.2.4.
Then navigate to All services > Networking > Route tables and select Create route table.
From your new route table, navigate to Settings > Routes > + Add, and add the following routes:
| Route name | Address prefix | Next hop type | Next hop address |
|---|---|---|---|
Complete the route table configuration:
Assign the route table you created to your HDInsight subnet by selecting Subnets under Settings.
Select + Associate.
On the Associate subnet screen, select the virtual network that your cluster was created in and the subnet you used for your HDInsight cluster.
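Azure picks among routes by longest prefix match, which is why the single-address (/32) management routes take precedence over the 0.0.0.0/0 route to the firewall. The Python sketch below illustrates that selection for the example route table. The management addresses are placeholders from a documentation range, not real HDInsight addresses (use the published list for your region), 10.0.2.4 is the sample firewall address used above, and system routes, such as the route for the virtual network's own address space, are left out.

```python
import ipaddress

FIREWALL_PRIVATE_IP = "10.0.2.4"  # sample firewall address from the steps above

# Placeholders from a documentation range, NOT real HDInsight management addresses.
MANAGEMENT_IPS = ["198.51.100.10", "198.51.100.11"]

# (prefix, next hop type, next hop address); system routes are not modeled.
ROUTES = [(ipaddress.ip_network(f"{ip}/32"), "Internet", None) for ip in MANAGEMENT_IPS]
ROUTES.append((ipaddress.ip_network("0.0.0.0/0"), "VirtualAppliance", FIREWALL_PRIVATE_IP))


def next_hop(dest_ip: str, routes=ROUTES):
    """Return (next hop type, next hop address) of the longest matching prefix."""
    addr = ipaddress.ip_address(dest_ip)
    candidates = [route for route in routes if addr in route[0]]
    _, hop_type, hop_address = max(candidates, key=lambda route: route[0].prefixlen)
    return hop_type, hop_address


print(next_hop("198.51.100.10"))  # ('Internet', None)
print(next_hop("192.0.2.50"))     # ('VirtualAppliance', '10.0.2.4')
```

Management traffic therefore leaves directly, while all other outbound traffic is sent to the firewall.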
Edge-node or custom application traffic
The preceding steps allow the cluster to operate without issues. If you run custom applications on the edge nodes, you still need to configure their dependencies.
- Identify your application dependencies and add them to Azure Firewall or the route table.
- Create routes for the application traffic to avoid asymmetric routing issues.
If your applications have other dependencies, they need to be added to your Azure Firewall. Create Application rules to allow HTTP/HTTPS traffic and Network rules for everything else.
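The guidance above (application rules for HTTP/HTTPS, network rules for everything else) can be sketched as a small Python helper that sorts a list of edge-node dependencies into the two rule collections. The split_dependencies() helper and the sample dependencies are hypothetical.

```python
def split_dependencies(dependencies):
    """Sort (protocol, destination, port) tuples into the two kinds of firewall rules.

    HTTP/HTTPS dependencies go to application rules; everything else goes to network rules.
    """
    groups = {"application": [], "network": []}
    for protocol, destination, port in dependencies:
        kind = "application" if protocol.lower() in ("http", "https") else "network"
        groups[kind].append((protocol, destination, port))
    return groups


# Hypothetical dependencies of a custom edge-node application.
deps = [
    ("https", "packages.example.com", 443),
    ("TCP", "10.1.0.5", 5432),
    ("UDP", "10.1.0.6", 514),
]
print(split_dependencies(deps))
```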
Logging and scale
Azure Firewall can send logs to several destinations, such as a storage account, an event hub, or a Log Analytics workspace. For instructions on configuring logging for your firewall, follow the steps in Tutorial: Monitor Azure Firewall logs and metrics.
Once you've completed the logging setup, if you're logging data to Log Analytics, you can view blocked traffic with a query such as the following:
AzureDiagnostics | where msg_s contains "Deny" | where TimeGenerated >= ago(1h)
Integrating Azure Firewall with Azure Monitor logs is useful when you first get an application working and aren't yet aware of all of its dependencies. You can learn more about Azure Monitor logs from Analyze log data in Azure Monitor.
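To show what the blocked-traffic query filters, the following Python sketch applies the same two conditions to exported log records: a case-insensitive substring match on msg_s (KQL's contains operator is case-insensitive) and a TimeGenerated cutoff one hour before now. The record shape, a dict with TimeGenerated and msg_s keys, the sample message text, and the recent_denies() helper are assumptions for illustration; in practice, run the KQL query in Log Analytics.

```python
from datetime import datetime, timedelta, timezone


def recent_denies(records, now=None, window=timedelta(hours=1)):
    """Mirror: AzureDiagnostics | where msg_s contains "Deny" | where TimeGenerated >= ago(1h)

    KQL's contains is case-insensitive, so the match lowercases msg_s first.
    TimeGenerated values are assumed to be timezone-aware UTC datetimes.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    return [
        r for r in records
        if "deny" in r["msg_s"].lower() and r["TimeGenerated"] >= cutoff
    ]


# Illustrative records only; real msg_s text depends on the firewall log category.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"TimeGenerated": now - timedelta(minutes=5), "msg_s": "Request to example.com:443. Action: Deny."},
    {"TimeGenerated": now - timedelta(minutes=9), "msg_s": "Request to example.org:443. Action: Allow."},
    {"TimeGenerated": now - timedelta(hours=2), "msg_s": "Request to example.net:443. Action: Deny."},
]
print(len(recent_denies(sample, now=now)))  # 1
```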
Access to the cluster
After the firewall is set up, you can use the internal endpoint (https://CLUSTERNAME-int.azurehdinsight.net) to access Ambari from inside the virtual network.
To use the public endpoint (https://CLUSTERNAME.azurehdinsight.net) or the SSH endpoint (CLUSTERNAME-ssh.azurehdinsight.net), make sure you have the right routes in the route table and the right NSG rules to avoid the asymmetric routing issue explained here. Specifically, you need to allow the client IP address in the inbound NSG rules and also add it to the user-defined route table with the next hop set to Internet. If this isn't set up correctly, you'll see a timeout error.
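The endpoints and the client route described above can be sketched in Python as follows. The helper names and the route naming scheme are hypothetical; the dictionary keys are modeled on the addressPrefix and nextHopType properties of an Azure route, and the client is represented as a single-address /32 prefix, which assumes an IPv4 client.

```python
import ipaddress


def cluster_endpoints(cluster_name: str) -> dict:
    """Build the three HDInsight endpoints named in this section."""
    return {
        "internal": f"https://{cluster_name}-int.azurehdinsight.net",  # from inside the VNet
        "public": f"https://{cluster_name}.azurehdinsight.net",
        "ssh": f"{cluster_name}-ssh.azurehdinsight.net",
    }


def client_route(client_ip: str) -> dict:
    """Build a user-defined route that sends traffic for one IPv4 client directly to the Internet."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError for non-IPv4 input
    return {
        "name": f"client-{str(addr).replace('.', '-')}",  # hypothetical naming scheme
        "addressPrefix": f"{addr}/32",
        "nextHopType": "Internet",
    }


print(cluster_endpoints("mycluster")["public"])
print(client_route("192.0.2.25"))
```

Remember that the route alone isn't enough: the same client address also needs an inbound NSG rule.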
Configure another network virtual appliance
The following information is only required if you wish to configure a network virtual appliance (NVA) other than Azure Firewall.
The previous instructions help you configure Azure Firewall for restricting outbound traffic from your HDInsight cluster. Azure Firewall is automatically configured to allow traffic for many of the common important scenarios. If you want to use another network virtual appliance, you'll need to manually configure a number of additional features. Keep the following in mind as you configure your network virtual appliance:
- Service Endpoint capable services should be configured with service endpoints.
- IP Address dependencies are for non-HTTP/S traffic (both TCP and UDP traffic).
- FQDN HTTP/HTTPS endpoints can be placed in your NVA device.
- Wildcard HTTP/HTTPS endpoints are dependencies that can vary based on a number of qualifiers.
- Assign the route table that you create to your HDInsight subnet.
Service endpoint capable dependencies
- Azure Active Directory
IP address dependencies
| Endpoint | Details |
|---|---|
| *:123 | NTP clock check. Traffic is checked at multiple endpoints on port 123 |
| IPs published here | These are HDInsight service |
| AAD-DS private IPs for ESP clusters | |
| *:16800 for KMS Windows Activation | |
| *:12000 for Log Analytics | |
FQDN HTTP/HTTPS dependencies
The list below only gives a few of the most important FQDNs. You can get the full list of FQDNs for configuring your NVA in this file.