question

SapozhnikovPavel-6991 asked srbose-msft edited

DNS doesn't remove not ready pod in AKS with Azure CNI enabled

How does AKS make a not-ready pod unavailable to accept requests? Does that only work if you have a Service in front of the Deployment?

I'd like to start off by explaining what I noticed in AKS that is not configured with Azure CNI, then describe what I have been seeing in AKS with Azure CNI enabled, and ask why they differ.

In AKS without Azure CNI enabled, if I curl a not-ready pod behind a Service, like this: curl -I some-pod.some-service.some-namespace.svc.cluster.local:8080, the response is an unresolvable-hostname error (or similar). In my understanding that means DNS does not have this entry, and this is how AKS normally keeps not-ready pods from receiving requests.

In AKS with Azure CNI enabled, if I execute the same request against a not-ready pod, it is able to resolve the hostname and send the request into the pod. There is one caveat: when I execute a request through the external private IP of that Service, the request does not reach the not-ready pod, which is expected and seems to work correctly. But when I execute the direct request above, curl -I some-pod.some-service.some-namespace.svc.cluster.local:8080, it works, and it shouldn't. Why does DNS have that entry in the Azure CNI case?

Is there anything I can do to configure Azure CNI to behave more like the default AKS behavior, where a curl request like that either fails to resolve the hostname or refuses the connection?

azure-kubernetes-service

@SapozhnikovPavel-6991 , thank you for your question. Can you please confirm whether by a "not ready pod" you mean a Pod whose Readiness Probe has failed/is failing?

One key observation is that the DNS records for general Pods in a Kubernetes cluster have the format pod-ip-address.my-namespace.pod.cluster-domain.example. For example, if a pod in the default namespace has the IP address 172.17.0.3, and the domain name for your cluster is cluster.local, then the Pod has the DNS name:

 172-17-0-3.default.pod.cluster.local

Any Pods exposed by a Service have the following DNS resolution available:

 pod-ip-address.service-name.my-namespace.svc.cluster-domain.example

[Reference]
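As a quick illustration, the dot-to-dash rewrite above can be sketched in plain shell (no cluster needed; the namespace default and the domain cluster.local are assumed):

```shell
# Derive a Pod's DNS name from its IP: dots in the IP become dashes,
# then the namespace, "pod", and the cluster domain are appended.
# Namespace "default" and domain "cluster.local" are assumptions here.
POD_IP="172.17.0.3"
POD_DNS="$(echo "$POD_IP" | tr '.' '-').default.pod.cluster.local"
echo "$POD_DNS"   # 172-17-0-3.default.pod.cluster.local
```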


1 Answer

srbose-msft answered srbose-msft edited

@SapozhnikovPavel-6991 , thank you for your question.

Assuming that "not ready pod" refers to a Pod whose Readiness Probe is failing: the kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers. [Reference]

However, the logic that determines the readiness of the pod may have nothing to do with whether the pod can actually serve requests; it depends entirely on the user.

For instance, consider a Pod with the following manifest:

 apiVersion: v1
 kind: Pod
 metadata:
   labels:
     test: readiness
   name: readiness-pod
 spec:
   containers:
   - name: readiness-container
     image: nginx
     readinessProbe:
       exec:
         command:
         - cat
         - /tmp/healthy
       initialDelaySeconds: 5
       periodSeconds: 5

readiness is decided by the existence of the file /tmp/healthy, irrespective of whether nginx is serving the application. So, after running the Pod and exposing it with a Service named readiness-svc:

 kubectl exec readiness-pod -- /bin/bash -c 'if [ -f /tmp/healthy ]; then echo "/tmp/healthy file is present";else echo "/tmp/healthy file is absent";fi'
 /tmp/healthy file is absent

 kubectl get pods -o wide
 NAME            READY   STATUS    RESTARTS   AGE    IP            NODE                                NOMINATED NODE   READINESS GATES
 readiness-pod   0/1     Running   0          11m    10.240.0.28   aks-nodepool1-29819654-vmss000000   <none>           <none>
 source-pod      1/1     Running   0          6h8m   10.240.0.27   aks-nodepool1-29819654-vmss000000   <none>           <none>

 kubectl describe svc readiness-svc
 Name:              readiness-svc
 Namespace:         default
 Labels:            test=readiness
 Annotations:       <none>
 Selector:          test=readiness
 Type:              ClusterIP
 IP Family Policy:  SingleStack
 IP Families:       IPv4
 IP:                10.0.23.194
 IPs:               10.0.23.194
 Port:              <unset>  80/TCP
 TargetPort:        80/TCP
 Endpoints:
 Session Affinity:  None
 Events:            <none>

 kubectl exec -it source-pod -- bash
 root@source-pod:/# curl -I readiness-svc.default.svc.cluster.local:80
 curl: (7) Failed to connect to readiness-svc.default.svc.cluster.local port 80: Connection refused
 root@source-pod:/# curl -I 10-240-0-28.default.pod.cluster.local:80
 HTTP/1.1 200 OK
 Server: nginx/1.21.3
 Date: Mon, 13 Sep 2021 14:50:17 GMT
 Content-Type: text/html
 Content-Length: 615
 Last-Modified: Tue, 07 Sep 2021 15:21:03 GMT
 Connection: keep-alive
 ETag: "6137835f-267"
 Accept-Ranges: bytes
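The exec probe's contract behind this transcript can be sketched without a cluster: the kubelet runs the configured command inside the container and treats exit code 0 as Ready, anything else as NotReady. A minimal stand-in, using a temporary file in place of /tmp/healthy:

```shell
# Stand-in for the exec readiness probe above: the kubelet runs
# `cat /tmp/healthy`; exit code 0 marks the container Ready.
HEALTH_FILE="$(mktemp -u)"   # path only, file not created yet

probe() {
  if cat "$HEALTH_FILE" >/dev/null 2>&1; then
    echo "Ready"
  else
    echo "NotReady"
  fi
}

probe                  # NotReady - the file does not exist yet
touch "$HEALTH_FILE"   # like: kubectl exec readiness-pod -- touch /tmp/healthy
probe                  # Ready - cat now succeeds with exit code 0
rm -f "$HEALTH_FILE"
```

Touching /tmp/healthy inside the real readiness-pod would flip it to 1/1 Ready on the next probe period in the same way.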

Thus, when we try to connect from source-pod to the Service readiness-svc.default.svc.cluster.local on port 80, the connection is refused. This is because the kubelet did not find the /tmp/healthy file in the readiness-pod container to perform the cat operation, so it marked the Pod readiness-pod not ready to serve traffic and removed it from the backends of the Service readiness-svc. However, the nginx server in the pod can still serve the web application, and it will continue to do so if you connect to the pod directly.
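That endpoint filtering can be modeled in a few lines of plain shell (no cluster required; the name:ready pairs below are made up to mirror the output above): only pods whose readiness is true stay in the Service's endpoint list, so traffic through the Service can never reach readiness-pod while its probe fails.

```shell
# Toy model: a Service keeps only Ready pods as endpoints.
# The name:ready pairs are illustrative, not from a real cluster.
PODS="readiness-pod:false source-pod:true"
ENDPOINTS=""
for entry in $PODS; do
  name="${entry%%:*}"
  ready="${entry##*:}"
  if [ "$ready" = "true" ]; then
    ENDPOINTS="$ENDPOINTS $name"
  fi
done
echo "Endpoints:$ENDPOINTS"   # Endpoints: source-pod
```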

Readiness probe failures of containers do not remove the DNS records of Pods. A Pod's DNS records share their lifespan with the Pod itself.

This behavior is characteristic of Kubernetes and does not change with the network plugin. We attempted to reproduce the issue and observed the same behavior on AKS clusters using both the kubenet and Azure CNI network plugins.


Hope this helps.

Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.




