DarrenSlaughter-7065 asked:

Cannot communicate with AKS cluster DNS name

Hi. Hoping you can assist with an AKS comms issue. I am new to Kubernetes. I can successfully deploy an AKS private cluster using Terraform from a self-hosted Azure DevOps agent, but when Terraform then attempts to add Kubernetes namespaces, it fails to connect to the cluster DNS name on port 443. It can, however, communicate with the private IP address of the cluster on 443.

The Terraform works 100% when run locally, but fails when run from the ADO agent:
Error: Post "https://<MYCLUSTERNAME>.privatelink.northeurope.azmk8s.io:443/api/v1/namespaces": dial tcp: lookup <MYCLUSTERNAME>.privatelink.northeurope.azmk8s.io: no such host

Test-NetConnection to the FQDN on 443 fails, whereas Test-NetConnection to the private IP address on 443 passes.
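
The two Test-NetConnection results above point at name resolution rather than a blocked port. A minimal sketch of the same check in Python, separating a DNS failure from a TCP failure, is shown here; the function name `diagnose` and the 3-second timeout are assumptions for illustration, not part of any Azure tooling.

```python
import socket


def diagnose(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Return 'dns_failure', 'tcp_failure', or 'ok' for host:port."""
    try:
        # Step 1: name resolution only. A literal IP address skips DNS entirely.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try a TCP connect to each resolved address.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "tcp_failure"
```

Run on the agent, `diagnose("<cluster FQDN>")` returning `dns_failure` while `diagnose("<private IP>")` returns `ok` would match the symptoms described in the question: the port is reachable, but the agent's DNS server cannot resolve the privatelink name.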

Is there a specific Azure IP range/subnet that I need to open 443 to from the ADO agent in order to reach the AKS cluster?

I have tried some manual steps to test connectivity:
az login
-- I can log in successfully via CLI
az aks get-credentials --name <MYCLUSTERNAME> --resource-group <CLUSTERRESOURCEGROUP>
-- Credentials successfully loaded into .kube/config
kubectl get nodes
-- I log in with the Microsoft Device Code login, but then receive an error: Unable to connect to the server: dial tcp: lookup <MYCLUSTERNAME>.privatelink.northeurope.azmk8s.io: no such host

Any advice will be appreciated.

Thanks
Darren


azure-kubernetes-service

Not really a solution, but I managed to make some progress with this:
After the AKS cluster is created, I add a hosts file entry on the Azure DevOps agent that maps the cluster FQDN to the AKS cluster private IP, and then re-run the Terraform pipeline to create the namespaces, since the AKS cluster is then reachable by FQDN.
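
As a rough sketch of how the hosts file workaround above could be scripted, the helpers below build the entry and replace any stale line for the same name. The function names, the private-address check, and the example values are assumptions for illustration; writing the result to the agent's hosts file would still need elevated rights and is left out.

```python
import ipaddress


def hosts_entry(private_ip: str, fqdn: str) -> str:
    """Build one hosts-file line mapping the cluster FQDN to its private IP."""
    ip = ipaddress.ip_address(private_ip)  # raises ValueError on a malformed address
    if not ip.is_private:
        raise ValueError(f"{private_ip} is not a private address")
    name = fqdn.strip().rstrip(".").lower()
    if not name or " " in name:
        raise ValueError("FQDN must be a single non-empty host name")
    return f"{ip}\t{name}"


def add_entry(hosts_text: str, private_ip: str, fqdn: str) -> str:
    """Return hosts_text with the entry appended, dropping any old line for fqdn."""
    name = fqdn.strip().rstrip(".").lower()
    kept = [
        line
        for line in hosts_text.splitlines()
        # Ignore comments, skip the address column, compare the host names.
        if name not in line.lower().split("#", 1)[0].split()[1:]
    ]
    kept.append(hosts_entry(private_ip, fqdn))
    return "\n".join(kept) + "\n"
```

Because the private FQDN changes with every cluster deployment, replacing the old line rather than only appending keeps the file from accumulating dead entries across destroy/deploy cycles.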

I am continuing to troubleshoot and look for more automated solutions that don't involve hosts files or multiple pipeline runs...

Thanks
Darren

srbose-msft replied to DarrenSlaughter-7065:

@DarrenSlaughter-7065 , Thank you for your response. Please do let us know if you need help with the approach you are following.


@DarrenSlaughter-7065 , I wanted to follow up quickly to see if there were any updates. Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.


To resolve this issue, one needs to be able to create a custom subdomain for the Private DNS Zone that gets generated.

Currently Azure automatically generates a GUID that it inserts in front of the .privatelink.northeurope.azmk8s.io URL. The MS documentation mentions a switch (--fqdn-subdomain) when using the AZ CLI, but the same ability does not seem to exist in the AzureRM Terraform provider (v2.60.0).

If a custom domain could be created, then on-prem DNS servers could have conditional forwarders set to point to that custom domain.

With the GUID in the URL, the name also changes every time a cluster is deployed, whereas a custom domain name would persist across destroys/deploys (with just a unique AKS cluster name preceding the domain).

Example:
my-aks-cluster01.mycompanyname.privatelink.northeurope.azmk8s.io
my-aks-cluster02.mycompanyname.privatelink.northeurope.azmk8s.io
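
To illustrate why a persistent custom domain helps with conditional forwarders, here is a small sketch that finds the longest label-aligned suffix shared by a set of cluster names. The function name `common_zone` is hypothetical, and the sketch only compares strings; it makes no claim about how Azure lays out its private DNS zones.

```python
def common_zone(fqdns: list[str]) -> str:
    """Longest suffix, in whole DNS labels, shared by every name in fqdns."""
    if not fqdns:
        raise ValueError("need at least one name")
    # Compare labels right to left so that only whole labels are matched.
    reversed_labels = [
        name.strip().rstrip(".").lower().split(".")[::-1] for name in fqdns
    ]
    shared = []
    for labels in zip(*reversed_labels):
        if len(set(labels)) != 1:
            break
        shared.append(labels[0])
    return ".".join(reversed(shared))
```

For the two example names above, the shared suffix is `mycompanyname.privatelink.northeurope.azmk8s.io`, which is the domain a forwarder could be pointed at once and then left alone. With a fresh GUID per deployment, the only suffix shared between deployments would be `privatelink.northeurope.azmk8s.io`.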

srbose-msft answered:

@DarrenSlaughter-7065 , Thank you for the question.

A simple solution in your situation would be to use the AKS Run Command feature (Preview).

AKS run command allows you to remotely invoke commands in an AKS cluster through the AKS API. This feature provides an API that allows you to, for example, execute just-in-time commands from a remote laptop for a private cluster. This can greatly assist with quick just-in-time access to a private cluster when the client machine is not on the cluster private network while still retaining and enforcing the same RBAC controls and private API server.

Please find the instructions to register the RunCommandPreview Feature here.

Here are a few examples of how to use the feature.
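
As one possible illustration, a pipeline step could wrap `az aks command invoke` so that namespace creation goes through the AKS API instead of the private FQDN. This is a sketch only: the helper names are hypothetical, the resource group, cluster, and namespace values are placeholders, and it assumes the Azure CLI is installed and already logged in on the agent.

```python
import shutil
import subprocess


def invoke_args(resource_group: str, cluster: str, command: str) -> list[str]:
    """Build the argv for `az aks command invoke`."""
    return [
        "az", "aks", "command", "invoke",
        "--resource-group", resource_group,
        "--name", cluster,
        "--command", command,
    ]


def run_in_cluster(resource_group: str, cluster: str, command: str) -> str:
    """Run the command through the AKS API and return the CLI's stdout."""
    az = shutil.which("az")  # also finds az.cmd on Windows agents
    if az is None:
        raise RuntimeError("Azure CLI (az) not found on PATH")
    argv = [az] + invoke_args(resource_group, cluster, command)[1:]
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return completed.stdout
```

A call such as `run_in_cluster("<CLUSTERRESOURCEGROUP>", "<MYCLUSTERNAME>", "kubectl create namespace demo")` would then not depend on the agent being able to resolve the privatelink name, since the request is sent to the Azure management endpoint.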


Hope this helps!

Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.


DarrenSlaughter-7065 answered:

Hi @srbose-msft

I have a new/fresh subscription and am trying to use the same Terraform code to deploy an AKS cluster. I am hitting the same problem: the DNS name <MYCLUSTERNAME>-SHORTUUID.LONGUUID.privatelink.northeurope.azmk8s.io is not resolvable during the terraform apply, which means the rest of the Terraform fails (it tries to create namespaces on the cluster, but cannot resolve the name in the newly created DNS zone).

I followed your instruction to enable the AKS Run Command Function (as per MS documentation: https://docs.microsoft.com/en-us/azure/aks/private-clusters#aks-run-command-preview). I was able to successfully register the extension (screenshot: 109015-image.png).


However, when I try the simple example command from the MS documentation, it fails (screenshot: 109093-image.png).

Hoping you can advise how I can create the namespaces, during the terraform creation of the cluster.

Thank you



Full error from Terraform here:

 Error: waiting for creation of Managed Kubernetes Cluster "MYCLUSTERNAME" (Resource Group "MYCLUSTERRESOURCEGROUP"): Code="CreateVMSSAgentPoolFailed" Message="Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns for more information. Details: Code=\"VMExtensionProvisioningError\" Message=\"VM has reported a failure when processing extension 'vmssCSE'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=52\\n[stdout]\\n{ \\\"ExitCode\\\": \\\"52\\\", \\\"Output\\\": \\\"Thu Jun 24 13:18:08 UTC 2021,aks-default-11336002-vmss000000\\\\nConnection to mcr.microsoft.com 443 port [tcp/https] succeeded!\\\\n? kubelet.service - Kubelet\\\\n Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)\\\\n Active: active (running) since Thu 2021-06-24 13:18:29 UTC; 3min 22s ago\\\\n Main PID: 3076 (kubelet)\\\\n Tasks: 13 (limit: 4915)\\\\n CGroup: /system.slice/kubelet.service\\\\n +-3076 /usr/local/bin/kubelet --enable-server --node-labels=kubernetes.azure.com/role=agent,agentpool=default,storageprofile=managed,storagetier=Standard_LRS,kubernetes.azure.com/os-sku=Ubuntu,kubernetes.azure.com/cluster=MYCLUSTERNAME-NODES-RG,kubernetes.azure.com/mode=system,kubernetes.azure.com/node-image-version=AKSUbuntu-1804gen2containerd-2021.06.02 --v=2 --container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock --volume-plugin-dir=/etc/kubernetes/volumeplugins --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --azure-container-registry-config=/etc/kubernetes/azure.json --cgroups-per-qos=true --client-ca-file=/etc/kubernetes/certs/ca.crt --cloud-config=/etc/kubernetes/azure.json --cloud-provider=azure --cluster-dns=100.1.0.10 --cluster-domain=cluster.local 
--dynamic-config-dir=/var/lib/kubelet --enforce-node-allocatable=pods --event-qps=0 --eviction-hard=memory.available<750Mi,nodefs.available<10%!,(MISSING)nodefs.inodesFree<5%!f(MISSING)eature-gates=RotateKubeletServerCertificate=true --image-gc-high-threshold=85 --image-gc-low-threshold=80 --image-pull-progress-deadline=30m --keep-terminated-pod-volumes=false --kube-reserved=cpu=100m,memory=1843Mi --kubeconfig=/var/lib/kubelet/kubeconfig --max-pods=110 --network-plugin=cni --node-status-update-frequency=10s --non-masquerade-cidr=100.0.0.0/16 --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.5 --pod-manifest-path=/etc/kubernetes/manifests --pod-max-pids=-1 --protect-kernel-defaults=true --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --rotate-certificates=false --streaming-connection-idle-timeout=4h --tls-cert-file=/etc/kubernetes/certs/kubeletserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-private-key-file=/etc/kubernetes/certs/kubeletserver.key\\\\n\\\\nJun 24 13:21:51 aks-default-11336002-vmss000000 kubelet[3076]: E0624 13:21:51.273190 3076 kubelet.go:2209] node \\\\\\\"aks-default-11336002-vmss000000\\\\\\\" not found\\\\nJun 24 13:21:51 aks-default-11336002-vmss000000 kubelet[3076]: E0624 13:21:51.373229 3076 kubelet.go:2209] node \\\\\\\"aks-default-11336002-vmss000000\\\\\\\" not found\\\\nJun 24 13:21:51 aks-default-11336002-vmss000000 kubelet[3076]: E0624 13:21:51.473380 \\\", \\\"Error\\\": \\\"\\\", \\\"ExecDuration\\\": \\\"224\\\" }\\n\\n[stderr]\\n\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \""




@DarrenSlaughter-7065 , Can you please check which version of Azure CLI you are using? I recommend upgrading to the latest version (How-to guide).

The Terraform output shows VMExtensionProvisioningError and exit status=52. Are you using a custom DNS? This is expected behavior when creating a private AKS cluster with custom DNS. When the private cluster is created with custom DNS, a DNS zone is created. That DNS zone has to be linked to the VNet, which only happens after the cluster is created. So creating a private cluster with custom DNS will fail at creation time, and the cluster must be brought back to a succeeded state by a later PUT operation.
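
For reference, the linking step mentioned above can be expressed with `az network private-dns link vnet create`. The sketch below only assembles that command; the helper name, the link name, and all argument values are assumptions, and which VNet needs the link (for example the one hosting the custom DNS server) depends on the network design.

```python
import shlex


def link_zone_args(
    resource_group: str, zone_name: str, link_name: str, vnet_id: str
) -> list[str]:
    """Build the argv that links a private DNS zone to a virtual network."""
    return [
        "az", "network", "private-dns", "link", "vnet", "create",
        "--resource-group", resource_group,  # group that holds the DNS zone
        "--zone-name", zone_name,
        "--name", link_name,
        "--virtual-network", vnet_id,  # name or full resource ID of the VNet
        "--registration-enabled", "false",  # resolution only, no auto-registration
    ]


def as_command_line(args: list[str]) -> str:
    """Render the argv as one shell-quoted line, e.g. for a pipeline log."""
    return shlex.join(args)
```

Printing `as_command_line(link_zone_args(...))` first gives a dry run that can be reviewed before the command is actually executed on the agent.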

LeonardoBispo-1118 answered:

@DarrenSlaughter-7065

Are you running the Terraform on your local machine?

If so, you must run it inside the private network (using a VPN or a VM). It is trying to resolve the cluster FQDN from your local machine, and for that to work you need to be inside the VNet.

I hope this answer helps you solve your problem.

I spent half a day figuring out that this was my problem.


This is being run from an Azure DevOps build agent (outside of the Azure subscription/private network), as the deployment is automated in an Azure DevOps pipeline. Sadly never got this to work...
