JimmyHeeWoonSiong-6455 asked AnagnostopoulosTassos-4649 commented

Connect Openshift Cluster to Azure Arc. Secret "kube-aad-proxy-certificate" not found

Hi guys,

I have a running Red Hat OpenShift cluster and am trying to connect it to Azure Arc. I followed the guide at https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli and successfully registered the providers and created the resource group.

However, when I run the command "az connectedk8s connect", I get the following error:

174035-image.png

After checking the status of the pods in the azure-arc namespace, I found that one of the pods could not be created:

 [crc@crc ~]$ kubectl get pod --namespace azure-arc
 NAME                                         READY   STATUS              RESTARTS      AGE
 cluster-metadata-operator-74c5b94d47-jz2mf   2/2     Running             0             6m41s
 clusterconnect-agent-57496ddf98-pxdwb        2/3     CrashLoopBackOff    6 (45s ago)   6m40s
 clusteridentityoperator-5595dbf759-npgj7     2/2     Running             0             6m40s
 config-agent-85745b6f89-ktcgn                2/2     Running             0             6m40s
 controller-manager-78cf8484c4-bkdrz          2/2     Running             0             6m40s
 extension-manager-599cd7b644-c9sqw           2/2     Running             0             6m40s
 flux-logs-agent-6cbd59f69d-8sqpj             1/1     Running             0             6m40s
 kube-aad-proxy-6ddf6b7b6d-2tpxm              0/2     ContainerCreating   0             6m41s
 metrics-agent-5d985f9b9c-t6pjd               2/2     Running             0             6m41s
 resource-sync-agent-8444f5fc44-zlx8q         2/2     Running             0             6m40s
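
With this many agents in the namespace, it can help to filter the listing down to the pods that are not fully ready. The following is a minimal sketch, not part of the original post; the helper name `list_unready_pods` is hypothetical, and it assumes the column layout shown in the listing above.

```shell
# Reads `kubectl get pod --no-headers` style lines on stdin and prints the
# name and status of every pod whose READY column is not n/n.
list_unready_pods() {
  awk '{ split($2, ready, "/"); if (ready[1] != ready[2]) print $1, $3 }'
}

# Usage against a live cluster (requires cluster access):
#   kubectl get pod --namespace azure-arc --no-headers | list_unready_pods
```

For the listing above, this would print only the clusterconnect-agent and kube-aad-proxy pods.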

Looking at the details, I found that pod creation fails because the secret "kube-aad-proxy-certificate" is not found, as shown in the following events:

 [crc@crc ~]$ kubectl describe pod kube-aad-proxy-6ddf6b7b6d-2tpxm
 Error from server (NotFound): pods "kube-aad-proxy-6ddf6b7b6d-2tpxm" not found
 [crc@crc ~]$ kubectl describe pod kube-aad-proxy-6ddf6b7b6d-2tpxm -n azure-arc
 Name:           kube-aad-proxy-6ddf6b7b6d-2tpxm
 Namespace:      azure-arc
 Priority:       0
 Node:           crc-x4qnm-master-0/192.168.126.11
 Start Time:     Mon, 14 Feb 2022 20:44:22 +0800
 Labels:         app.kubernetes.io/component=kube-aad-proxy
                 app.kubernetes.io/name=azure-arc-k8s
                 pod-template-hash=6ddf6b7b6d
 Annotations:    checksum/proxysecret: 316deeb28892b1cdebfe5c12c2cd620b5b8f29289c1ffe3d4f5fc1b2e6a4ea7d
                 openshift.io/scc: kube-aad-proxy-scc
                 prometheus.io/port: 8080
                 prometheus.io/scrape: true
 Status:         Pending
 IP:             
 IPs:            <none>
 Controlled By:  ReplicaSet/kube-aad-proxy-6ddf6b7b6d
 Containers:
   kube-aad-proxy:
     Container ID:  
     Image:         mcr.microsoft.com/azurearck8s/kube-aad-proxy:1.6.1-preview
     Image ID:      
     Ports:         8443/TCP, 8080/TCP
     Host Ports:    0/TCP, 0/TCP
     Args:
       run
       --secure-port=8443
       --tls-cert-file=/etc/kube-aad-proxy/tls.crt
       --tls-private-key-file=/etc/kube-aad-proxy/tls.key
       --azure.client-id=6256c85f-0aad-4d50-b960-e6e9b21efe35
       --azure.tenant-id=c58bdaa9-7ab0-40c5-9b0f-64b2c1fe2ef1
       --azure.enforce-PoP=true
       --azure.skip-host-check=false
       -v=info
       --azure.environment=AZUREPUBLICCLOUD
     State:          Waiting
       Reason:       ContainerCreating
     Ready:          False
     Restart Count:  0
     Limits:
       cpu:     100m
       memory:  350Mi
     Requests:
       cpu:      10m
       memory:   20Mi
     Readiness:  http-get http://:8080/readiness delay=10s timeout=1s period=15s #success=1 #failure=3
     Environment Variables from:
       azure-clusterconfig  ConfigMap  Optional: false
     Environment:           <none>
     Mounts:
       /etc/kube-aad-proxy from kube-aad-proxy-tls (ro)
       /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-khrkl (ro)
   fluent-bit:
     Container ID:   
     Image:          mcr.microsoft.com/azurearck8s/fluent-bit:1.6.1
     Image ID:       
     Port:           2020/TCP
     Host Port:      0/TCP
     State:          Waiting
       Reason:       ContainerCreating
     Ready:          False
     Restart Count:  0
     Limits:
       cpu:     20m
       memory:  100Mi
     Requests:
       cpu:     5m
       memory:  25Mi
     Environment Variables from:
       azure-clusterconfig  ConfigMap  Optional: false
     Environment:
       POD_NAME:    kube-aad-proxy-6ddf6b7b6d-2tpxm (v1:metadata.name)
       AGENT_TYPE:  ConnectAgent
       AGENT_NAME:  kube-aad-proxy
     Mounts:
       /fluent-bit/etc/ from fluentbit-clusterconfig (rw)
       /var/lib/docker/containers from varlibdockercontainers (ro)
       /var/log from varlog (ro)
       /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-khrkl (ro)
 Conditions:
   Type              Status
   Initialized       True 
   Ready             False 
   ContainersReady   False 
   PodScheduled      True 
 Volumes:
   kube-aad-proxy-tls:
     Type:        Secret (a volume populated by a Secret)
     SecretName:  kube-aad-proxy-certificate
     Optional:    false
   varlog:
     Type:          HostPath (bare host directory volume)
     Path:          /var/log
     HostPathType:  
   varlibdockercontainers:
     Type:          HostPath (bare host directory volume)
     Path:          /var/lib/docker/containers
     HostPathType:  
   fluentbit-clusterconfig:
     Type:      ConfigMap (a volume populated by a ConfigMap)
     Name:      azure-fluentbit-config
     Optional:  false
   kube-api-access-khrkl:
     Type:                    Projected (a volume that contains injected data from multiple sources)
     TokenExpirationSeconds:  3607
     ConfigMapName:           kube-root-ca.crt
     ConfigMapOptional:       <nil>
     DownwardAPI:             true
     ConfigMapName:           openshift-service-ca.crt
     ConfigMapOptional:       <nil>
 QoS Class:                   Burstable
 Node-Selectors:              kubernetes.io/arch=amd64
                              kubernetes.io/os=linux
 Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
 Events:
   Type     Reason       Age                   From               Message
   ----     ------       ----                  ----               -------
   Normal   Scheduled    17m                   default-scheduler  Successfully assigned azure-arc/kube-aad-proxy-6ddf6b7b6d-2tpxm to crc-x4qnm-master-0
   Warning  FailedMount  15m                   kubelet            Unable to attach or mount volumes: unmounted volumes=[kube-aad-proxy-tls], unattached volumes=[varlibdockercontainers fluentbit-clusterconfig kube-aad-proxy-tls kube-api-access-khrkl varlog]: timed out waiting for the condition
   Warning  FailedMount  8m32s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[kube-aad-proxy-tls], unattached volumes=[fluentbit-clusterconfig kube-aad-proxy-tls kube-api-access-khrkl varlog varlibdockercontainers]: timed out waiting for the condition
   Warning  FailedMount  4m2s (x3 over 13m)    kubelet            Unable to attach or mount volumes: unmounted volumes=[kube-aad-proxy-tls], unattached volumes=[kube-aad-proxy-tls kube-api-access-khrkl varlog varlibdockercontainers fluentbit-clusterconfig]: timed out waiting for the condition
   Warning  FailedMount  107s (x2 over 6m18s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[kube-aad-proxy-tls], unattached volumes=[kube-api-access-khrkl varlog varlibdockercontainers fluentbit-clusterconfig kube-aad-proxy-tls]: timed out waiting for the condition
   Warning  FailedMount  59s (x16 over 17m)    kubelet            MountVolume.SetUp failed for volume "kube-aad-proxy-tls" : secret "kube-aad-proxy-certificate" not found
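
The FailedMount events point at a missing Secret rather than at the pod itself, so a direct check of the Secret can confirm whether it has been created yet. This is a hedged sketch: the function names `secret_exists` and `report_proxy_secret` are made up for illustration, while the namespace and Secret name are taken from the output above.

```shell
# Returns 0 if the named Secret exists in the namespace, non-zero otherwise.
secret_exists() {
  ns="$1"; name="$2"
  kubectl get secret "$name" --namespace "$ns" >/dev/null 2>&1
}

# Prints whether the TLS Secret that kube-aad-proxy mounts is present.
report_proxy_secret() {
  if secret_exists azure-arc kube-aad-proxy-certificate; then
    echo "secret present"
  else
    echo "secret missing"
  fi
}

# Usage (requires cluster access):
#   report_proxy_secret
```

Note that a non-zero result from kubectl can also mean the cluster is unreachable or access is denied, so "secret missing" should be read with that in mind.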

In addition, here are the details of the clusterconnect-agent-xxx pod for further troubleshooting:

 [crc@crc ~]$ kubectl describe pod clusterconnect-agent-57496ddf98-wxwl4 -n azure-arc
  Name:         clusterconnect-agent-57496ddf98-wxwl4
  Namespace:    azure-arc
  Priority:     0
  Node:         crc-x4qnm-master-0/192.168.126.11
  Start Time:   Wed, 16 Feb 2022 15:49:16 +0800
  Labels:       app.kubernetes.io/component=clusterconnect-agent
                app.kubernetes.io/name=azure-arc-k8s
                pod-template-hash=57496ddf98
  Annotations:  checksum/proxysecret: 316deeb28892b1cdebfe5c12c2cd620b5b8f29289c1ffe3d4f5fc1b2e6a4ea7d
                k8s.v1.cni.cncf.io/network-status:
                  [{
                      "name": "openshift-sdn",
                      "interface": "eth0",
                      "ips": [
                          "10.217.0.180"
                      ],
                      "default": true,
                      "dns": {}
                  }]
                k8s.v1.cni.cncf.io/networks-status:
                  [{
                      "name": "openshift-sdn",
                      "interface": "eth0",
                      "ips": [
                          "10.217.0.180"
                      ],
                      "default": true,
                      "dns": {}
                  }]
                openshift.io/scc: kube-aad-proxy-scc
                prometheus.io/port: 8080
                prometheus.io/scrape: true
  Status:       Running
  IP:           10.217.0.180
  IPs:
    IP:           10.217.0.180
  Controlled By:  ReplicaSet/clusterconnect-agent-57496ddf98
  Containers:
    clusterconnect-agent:
      Container ID:   cri-o://d724fea24e4f54d6f619684ad0c7c705bc83978aa272c06962225db6841091cf
      Image:          mcr.microsoft.com/azurearck8s/clusterconnect-agent:1.6.1
      Image ID:       mcr.microsoft.com/azurearck8s/clusterconnect-agent@sha256:58a223db621a78d837b144d8d50f2faa8af65f2a8f46f24a3fc331deba28c33c
      Port:           <none>
      Host Port:      <none>
      State:          Waiting
        Reason:       CrashLoopBackOff
      Last State:     Terminated
        Reason:       Error
        Exit Code:    137
        Started:      Wed, 16 Feb 2022 16:00:19 +0800
        Finished:     Wed, 16 Feb 2022 16:00:19 +0800
      Ready:          False
      Restart Count:  7
      Environment Variables from:
        azure-clusterconfig  ConfigMap  Optional: false
      Environment:
        CONNECT_DP_ENDPOINT_OVERRIDE:       
        PROXY_VERSION:                      v2
        NOTIFICATION_DP_ENDPOINT_OVERRIDE:  
        TARGET_SERVICE_HOST:                KUBEAADPROXY_SERVICE_HOST
        TARGET_SERVICE_PORT:                KUBEAADPROXY_SERVICE_PORT
        KUBEAADPROXY_SERVICE_HOST:          kube-aad-proxy.azure-arc
        KUBEAADPROXY_SERVICE_PORT:          443
      Mounts:
        /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d22f5 (ro)
    fluent-bit:
      Container ID:   cri-o://945fac844efcb50278f4b64554ae1af8efd77fccc22e6bf1f03b0af1125c8ba9
      Image:          mcr.microsoft.com/azurearck8s/fluent-bit:1.6.1
      Image ID:       mcr.microsoft.com/azurearck8s/fluent-bit@sha256:a60b89ca44e1b70f205ba21920b867a000828df42ba83bde343fc3e9eed0825c
      Port:           2020/TCP
      Host Port:      0/TCP
      State:          Running
        Started:      Wed, 16 Feb 2022 15:49:20 +0800
      Ready:          True
      Restart Count:  0
      Limits:
        cpu:     20m
        memory:  100Mi
      Requests:
        cpu:     5m
        memory:  25Mi
      Environment Variables from:
        azure-clusterconfig  ConfigMap  Optional: false
      Environment:
        POD_NAME:    clusterconnect-agent-57496ddf98-wxwl4 (v1:metadata.name)
        AGENT_TYPE:  ConnectAgent
        AGENT_NAME:  ClusterConnectAgent
      Mounts:
        /fluent-bit/etc/ from fluentbit-clusterconfig (rw)
        /var/lib/docker/containers from varlibdockercontainers (ro)
        /var/log from varlog (ro)
        /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d22f5 (ro)
    clusterconnectservice-operator:
      Container ID:   cri-o://4066bf63c6a5f0f38928992986405127fcc8c76e6ba76f9fe501907e5600c1e4
      Image:          mcr.microsoft.com/azurearck8s/clusterconnectservice-operator:1.6.1
      Image ID:       mcr.microsoft.com/azurearck8s/clusterconnectservice-operator@sha256:6d8cc5f1798441ae322c5989dfdc34a5702ce0a8ca569926b1274aa147e66da0
      Port:           9443/TCP
      Host Port:      0/TCP
      State:          Running
        Started:      Wed, 16 Feb 2022 15:49:20 +0800
      Ready:          True
      Restart Count:  0
      Limits:
        cpu:     100m
        memory:  400Mi
      Requests:
        cpu:     10m
        memory:  20Mi
      Environment Variables from:
        azure-clusterconfig  ConfigMap  Optional: false
      Environment:           <none>
      Mounts:
        /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d22f5 (ro)
  Conditions:
    Type              Status
    Initialized       True 
    Ready             False 
    ContainersReady   False 
    PodScheduled      True 
  Volumes:
    varlog:
      Type:          HostPath (bare host directory volume)
      Path:          /var/log
      HostPathType:  
    varlibdockercontainers:
      Type:          HostPath (bare host directory volume)
      Path:          /var/lib/docker/containers
      HostPathType:  
    fluentbit-clusterconfig:
      Type:      ConfigMap (a volume populated by a ConfigMap)
      Name:      azure-fluentbit-config
      Optional:  false
    kube-api-access-d22f5:
      Type:                    Projected (a volume that contains injected data from multiple sources)
      TokenExpirationSeconds:  3607
      ConfigMapName:           kube-root-ca.crt
      ConfigMapOptional:       <nil>
      DownwardAPI:             true
      ConfigMapName:           openshift-service-ca.crt
      ConfigMapOptional:       <nil>
  QoS Class:                   Burstable
  Node-Selectors:              kubernetes.io/arch=amd64
                               kubernetes.io/os=linux
  Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                               node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                               node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  Events:
    Type     Reason          Age                 From               Message
    ----     ------          ----                ----               -------
    Normal   Scheduled       11m                 default-scheduler  Successfully assigned azure-arc/clusterconnect-agent-57496ddf98-wxwl4 to crc-x4qnm-master-0
    Normal   AddedInterface  11m                 multus             Add eth0 [10.217.0.180/23] from openshift-sdn
    Normal   Pulled          11m                 kubelet            Container image "mcr.microsoft.com/azurearck8s/fluent-bit:1.6.1" already present on machine
    Normal   Pulled          11m                 kubelet            Container image "mcr.microsoft.com/azurearck8s/clusterconnectservice-operator:1.6.1" already present on machine
    Normal   Created         11m                 kubelet            Created container clusterconnectservice-operator
    Normal   Started         11m                 kubelet            Started container clusterconnectservice-operator
    Normal   Created         11m                 kubelet            Created container fluent-bit
    Normal   Started         11m                 kubelet            Started container fluent-bit
    Normal   Pulled          10m (x4 over 11m)   kubelet            Container image "mcr.microsoft.com/azurearck8s/clusterconnect-agent:1.6.1" already present on machine
    Normal   Created         10m (x4 over 11m)   kubelet            Created container clusterconnect-agent
    Normal   Started         10m (x4 over 11m)   kubelet            Started container clusterconnect-agent
    Warning  BackOff         87s (x47 over 11m)  kubelet            Back-off restarting failed container

The clusterconnect-agent shows the following error in its log:

174942-screenshot-2022-02-16-at-40150-pm.png
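
Because the log is attached only as a screenshot, a text copy of the crashed container's output may be easier to share. The sketch below is an illustration, not from the original post: `exit_signal` and `previous_logs` are hypothetical helper names. It relies on the convention that an exit code above 128 equals 128 plus the signal number, so the Exit Code 137 reported above corresponds to signal 9 (SIGKILL). It does not tell you what sent the signal.

```shell
# Maps a container exit code above 128 to the name of the signal that ended
# the process (exit code = 128 + signal number). Prints nothing otherwise.
exit_signal() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    kill -l "$((code - 128))"
  fi
}

# Prints the log of the previous (crashed) instance of a container.
previous_logs() {
  pod="$1"; container="$2"
  kubectl logs "$pod" --namespace azure-arc --container "$container" --previous
}

# Usage (requires cluster access):
#   exit_signal 137
#   previous_logs clusterconnect-agent-57496ddf98-wxwl4 clusterconnect-agent
```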

Any help would be much appreciated. Thank you!


azure-arc

I have experienced identical issues lately on Azure RedHat OpenShift (ARO) version 4.8.18.

The hack below temporarily fixed the issue with clusterconnect-agent but it keeps reporting "Back-off restarting failed container" every 10 minutes.

Also, I'm still unable to get past the error on kube-aad-proxy: 'MountVolume.SetUp failed for volume "kube-aad-proxy-tls" : secret "kube-aad-proxy-certificate" not found'. Multiple Arc connects and pod restarts have failed identically over the last few days.

Happy to see I'm not the only one :)

I had a successful k8s Arc onboarding experience earlier with agent version 1.5.9. I am now using the latest, 1.6.1.


Hi @AnttiSaarela-5366,

Good day. I believe Microsoft has fixed this issue in the latest release. Could you retry the onboarding command on your side to see whether the problem persists? Thanks.


We were experiencing the same issue, and it turned out that the problem lay with the configuration of our proxy server: we had not added the "https://*.his.arc.azure.com" URL (as described here) to the list of endpoints allowed by our proxy server. We determined this by using oc debug node/... to get into a worker node, enabling the proxy server on the node, and checking that the above-mentioned URL (with "weu" instead of "*") was indeed returning the HTTP error "407 Proxy Authentication Required".

Once we added the https://*.his.arc.azure.com URL to the list of endpoints allowed by our proxy server, the issue was resolved. We are using ARO v. 4.8.18.
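
The proxy check described above could be scripted roughly as follows. This is a sketch under assumptions: the function name `probe_via_proxy` and the proxy address in the usage line are placeholders, and it relies on curl's `%{http_connect}` write-out variable, which reports the proxy's answer to the CONNECT request curl sends for an HTTPS target.

```shell
# Sends a request for URL $2 through the HTTP proxy $1 and reports how the
# proxy answered the CONNECT request.
probe_via_proxy() {
  proxy="$1"; url="$2"
  code=$(curl --silent --output /dev/null --max-time 15 \
              --proxy "$proxy" --write-out '%{http_connect}' "$url")
  case "$code" in
    200) echo "tunnel established" ;;
    407) echo "proxy authentication required" ;;
    *)   echo "unexpected proxy response: $code" ;;
  esac
}

# Usage from a node or debug pod (proxy address is a placeholder):
#   probe_via_proxy http://proxy.example.com:3128 https://weu.his.arc.azure.com
```

A result other than "tunnel established" suggests the endpoint is not yet allowed by the proxy, though the exact status code will depend on how the proxy is configured.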


Sulz answered JimmyHeeWoonSiong-6455 commented

I'm having a similar issue.
However, it is intermittent: the same connect command against the same cluster sometimes works and sometimes does not.
I had assumed it was due to proxy authentication or network timeouts, but this does not seem to be the case.

Note that if the clusterconnect-agent-xx pod errors within the first 10 seconds of running the command, kube-aad-proxy will never finish creating and the arc-connect will fail.


Hi @Sulz, I see the same behavior on my side. Are you able to onboard successfully now? I have tried around 20 times, and only one attempt onboarded to Azure Arc successfully. I have attached more details on the clusterconnect-agent-xxx pod for further troubleshooting, and I hope someone from Microsoft can investigate.

Sulz replied to JimmyHeeWoonSiong-6455:

G'day @JimmyHeeWoonSiong-6455,
I've had success when arc-connecting an OCP cluster version 4.9.17 rather than the latest stable release (4.9.18). Which version are you running?
I've only tried once against this version so far; I will run the az connectedk8s delete command and reconnect a few times to check consistency.

The first two connects out of five were successful.
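
A repeat test like this can be scripted so the success rate is counted consistently. The following is only a sketch: `reconnect_trials` is a hypothetical helper, the cluster name and resource group are placeholders, and it judges success purely by the exit status of `az connectedk8s connect`, which may not reflect whether every agent pod ended up healthy.

```shell
# Deletes and re-creates the Arc connection $3 times and counts successes.
# $1 = connected cluster name, $2 = resource group (both placeholders).
reconnect_trials() {
  name="$1"; group="$2"; trials="$3"; ok=0; i=0
  while [ "$i" -lt "$trials" ]; do
    az connectedk8s delete --name "$name" --resource-group "$group" --yes
    if az connectedk8s connect --name "$name" --resource-group "$group"; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  echo "$ok/$trials connects succeeded"
}

# Usage (requires az login and cluster access):
#   reconnect_trials my-cluster my-resource-group 5
```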

Not really a fix, but it seems the clusterconnect-agent pod can be healed by adding the following environment variable:
COMPlus_EnableDiagnostics with a value of '0'.

I'm not sure whether this really is a fix, as I don't know whether it impacts other Arc functionality.

Here's a one-liner to apply the "fix":

 oc patch deployment clusterconnect-agent -n azure-arc -p '{"spec":{"template":{"spec":{"containers":[{"name":"clusterconnect-agent","env":[{"name":"COMPlus_EnableDiagnostics","value":"0"}]}]}}}}'

Give it a few minutes and the kube-aad-proxy pod will come up too.
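
To confirm that the patch took effect, or to undo it later, something along these lines may help. It is a sketch, not part of the original answer: the helper names are hypothetical, and it assumes that `oc set env --list` prints one NAME=value pair per line for the selected container.

```shell
# Succeeds if the clusterconnect-agent container lists the variable set to 0.
diagnostics_disabled() {
  oc set env deployment/clusterconnect-agent --namespace azure-arc \
     --containers clusterconnect-agent --list |
    grep -q '^COMPlus_EnableDiagnostics=0$'
}

# Removes the variable again (a trailing "-" unsets it), undoing the workaround.
revert_workaround() {
  oc set env deployment/clusterconnect-agent --namespace azure-arc \
     --containers clusterconnect-agent COMPlus_EnableDiagnostics-
}

# Usage (requires cluster access):
#   diagnostics_disabled && echo "workaround is applied"
#   oc rollout status deployment/clusterconnect-agent --namespace azure-arc
```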


Dear @Sulz,
Currently I am using OCP cluster version 4.9.8, where most attempts fail. Using the oc patch command you provided, I started all the pods successfully without errors. One note for my case: if the kube-aad-proxy pod does not start up, you can simply delete the pod, and OpenShift will automatically create a new kube-aad-proxy pod that starts successfully.

Although it might not be the real fix, it is a workaround that allows the pods to start successfully. Thank you for sharing your findings; I will mark this as the accepted answer. If I get any input from Microsoft on a proper fix, I will update here as well. Thanks again!
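
The pod deletion mentioned above can be done by label instead of by the generated pod name. This is a minimal sketch: `recreate_kube_aad_proxy` is a hypothetical helper, and the label selector is taken from the Labels field of the kube-aad-proxy pod description in the question.

```shell
# Deletes the kube-aad-proxy pod(s) by label so the ReplicaSet recreates them.
recreate_kube_aad_proxy() {
  kubectl delete pod --namespace azure-arc \
    --selector app.kubernetes.io/component=kube-aad-proxy
}

# Usage (requires cluster access):
#   recreate_kube_aad_proxy
#   kubectl get pod --namespace azure-arc \
#     --selector app.kubernetes.io/component=kube-aad-proxy --watch
```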

AnttiSaarela-5366 answered AnttiSaarela-5366 published

To add to the troubleshooting details: in my Arc-connected ARO case at least, the first pod with issues after running az connectedk8s connect seems to be config-agent, with the following error lines in the logs:

{"Message":"In clusterIdentityCRDInteraction status not populated","LogType":"ConfigAgentTrace","LogLevel":"Error", "Environment":"prod","Role":"ClusterConfigAgent" ...
{"Message":"get token from status error: status not populated","LogType":"ConfigAgentTrace","LogLevel":"Error", ...
{"Message":"2022/02/20 09:39:12 Error : Retry for given duration didn't get any results with err {status not populated}","LogType":"ConfigAgentTrace","LogLevel":"Information" ...
{"Message":"2022/02/20 09:39:12 Error in getting Token for clusterType: {ConnectedClusters}: error {Error : Retry for given duration didn't get any results with err {status not populated}}", ...
{"Message":"2022/02/20 09:39:12 Error: in getting auth header : error {Error : Retry for given duration didn't get any results with err {status not populated}}", ...
{"Message":"get token error: Error : Retry for given duration didn't get any results with err {status not populated}","LogType":"ConfigAgentTrace","LogLevel":"Error", ... ,"AgentName":"ConfigAgent","AgentVersion":"1.6.1",


This leaves the config-agent container in an unready status.

containers with unready status: [config-agent]

This may or may not lead to kube-aad-proxy and clusterconnect-agent pods having their own issues down the road.
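
Since the agent logs are JSON lines, the error-level messages can be pulled out with a small filter. The sketch below is an illustration only: `error_messages` is a hypothetical helper, and it assumes one JSON object per line, with "Message" appearing before "LogLevel" and no escaped quotes inside the message, as in the excerpt above. The deployment name in the usage line is inferred from the pod name and is also an assumption.

```shell
# Reads JSON log lines on stdin and prints the "Message" value of each line
# whose "LogLevel" is "Error".
error_messages() {
  sed -n 's/.*"Message":"\([^"]*\)".*"LogLevel":"Error".*/\1/p'
}

# Usage (requires cluster access):
#   kubectl logs deployment/config-agent --namespace azure-arc \
#     --container config-agent | error_messages
```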
