question

JakubMuszyski-7200 asked JakubMuszyski-7200 edited

Arc-enabled Kubernetes - Envoy loses (?) its certificate after cluster restart and returns connection reset on listeners

Related to my original question (more logs and the installation command there): https://stackoverflow.com/questions/68544812/arc-enabled-kubernetes-envoy-looses-certificate-after-cluster-restart-and
and https://github.com/envoyproxy/envoy/issues/17484


 connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: CERTIFICATE_VERIFY_FAILED

After the first deployment, before the cluster restart:
118286-image.png

A few minutes after the cluster restart, when 'something' syncs:
118266-image.png

THE PROBLEM
You deploy Arc-enabled k8s, connect the location, enable the extension, and everything works fine: you can access your appservice via its URL, nice!
Then you restart the cluster, the pods are recreated, and within a few minutes everything breaks.
Based on my observations, either
- app-controller syncs (or fails to sync) the certificate/token with Azure,
- OR envoy has a problem mounting the XDR certificates provided by the app-service pod (they share a volume),
- OR maybe the service account token is not refreshed?

I'm guessing here, as it is not clear to me what is missing. I thought it was some secret, but I did not catch any...
But this is strange:

"envoy","msg":"error reading default cert","error":"140261821150336:error:0D06B08E:asn1 encoding routines:asn1_d2i_read_bio:not enough data


There is little documentation on Arc :/ For an open project I'd love to see the source helm charts.
I've noticed you store one in a cluster secret, yet I failed to decode it. Any suggestions there?

 kubectl get secret sh.helm.release.v1.appservice-ext-node1-v1.v3   --namespace appservice-ns-node1-v1  -o=jsonpath={.data.release}   |base64 -d > /tmp/helm.base
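
A note on the decoding attempt: Helm 3 keeps each release in a secret as gzip-compressed JSON that Helm base64-encodes itself, and Kubernetes then base64-encodes the value again under .data. The command above removes only the outer layer, so /tmp/helm.base is still base64 text. The sketch below undoes all layers. The function names are made up for illustration, and the chart["templates"] layout is the usual Helm 3 release shape, assumed here rather than confirmed for this extension.

```python
import base64
import gzip
import json


def decode_helm_release(value, kubernetes_encoded=True):
    """Decode a Helm 3 release payload into a dict.

    value: the `.data.release` string from the secret (default), or the
    output of one `base64 -d` pass when kubernetes_encoded is False.
    """
    if kubernetes_encoded:
        # Layer 1: Kubernetes base64-encodes every value under .data.
        value = base64.b64decode(value)
    # Layer 2: Helm base64-encodes its own payload.
    compressed = base64.b64decode(value)
    # Layer 3: the payload is gzip-compressed JSON.
    return json.loads(gzip.decompress(compressed))


def chart_templates(release):
    """Return {template name: template text}, assuming the usual
    release["chart"]["templates"] list of {"name", "data"} entries
    where "data" is base64-encoded."""
    templates = release.get("chart", {}).get("templates", [])
    return {
        item["name"]: base64.b64decode(item["data"]).decode("utf-8")
        for item in templates
    }
```

For the file written by the command above, something like `decode_helm_release(open("/tmp/helm.base").read(), kubernetes_encoded=False)` should apply, since one layer is already gone.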


Why Envoy returns ERROR_CONN_RESET:

https://www.envoyproxy.io/docs/envoy/latest/configuration/security/secret#config-secret-discovery-service

If a listener server certificate needs to be fetched by SDS remotely,
it will NOT be marked as active, its port will not be opened before
the certificates are fetched. If Envoy fails to fetch the certificates
due to connection failures, or bad response data, the listener will be
marked as active, and the port will be open, but the connection to the
port will be reset.
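
The quoted passage distinguishes a port that is not open yet from a port that is open but resets connections. A small client-side probe can tell the two apart. This is only a sketch: the function names and the short result labels are invented for illustration, and mapping "connection refused" to "listener not active" is an assumption about how that state looks from a client.

```python
import socket
import ssl


def classify_probe_error(exc):
    """Map a connection error to a short label."""
    if isinstance(exc, ConnectionRefusedError):
        return "closed"      # nothing is listening on the port
    if isinstance(exc, ConnectionResetError):
        return "reset"       # port is open, peer reset the connection
    if isinstance(exc, ssl.SSLError):
        return "tls-error"   # TCP connected, TLS handshake failed
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    return "other"


def probe_listener(host, port, timeout=3.0):
    """Try a TLS handshake against host:port and report what happened."""
    context = ssl.create_default_context()
    # Certificate validation is disabled on purpose: the probe only
    # checks whether the listener completes a handshake at all.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return "ok"
    except OSError as exc:
        return classify_probe_error(exc)
```

A result of "reset" right after the restart would be consistent with the second case in the quote, where the listener is active but its certificate could not be fetched.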


The most interesting error, found in the logs of appservice-ext-node1-v1-k8se-app-controller-85cb587976-nwr6h:

 {"level":"info","ts":1627301876.563232,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"127.0.0.1:8080"}
 {"level":"error","ts":1627301876.6502664,"logger":"envoy","msg":"Error reading default cert","error":"140453565756544:error:0D06B08E:asn1 encoding routines:asn1_d2i_read_bio:not enough data:../crypto/asn1/a_d2i_fp.c:198:\n\nexit status 1","stacktrace":"main.main\n\t/__w/k4apps/k4apps/cmd/appcontroller/main.go:124\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
 {"level":"error","ts":1627301876.6528075,"logger":"envoy","msg":"error reading dapr cert","error":"secrets \"dapr-trust-bundle\" not found","stacktrace":"main.main\n\t/__w/k4apps/k4apps/cmd/appcontroller/main.go:124\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
 {"level":"info","ts":1627301876.7418113,"logger":"setup","msg":"starting manager"}
 I0726 12:17:56.741905 1 leaderelection.go:243] attempting to acquire leader lease appservice-ns-node1-v1/appservice-ns-node1-v1-appservice-ns-node1-v1...
 {"level":"info","ts":1627301876.7421775,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
 {"level":"error","ts":1627301877.841797,"logger":"envoy","msg":"error reading default cert","error":"140261821150336:error:0D06B08E:asn1 encoding routines:asn1_d2i_read_bio:not enough data:../crypto/asn1/a_d2i_fp.c:198:\n\nexit status 1","stacktrace":"github.com/microsoft/k4apps/pkg/envoy.(*XDSManagementServer).updateTLSCert\n\t/__w/k4apps/k4apps/pkg/envoy/envoy.go:379\ngithub.com/microsoft/k4apps/pkg/envoy.(*XDSManagementServer).watchDefaultTLSCert.func3\n\t/__w/k4apps/k4apps/pkg/envoy/envoy.go:336\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/controller.go:231\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/shared_informer.go:777\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.5/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.5/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.5/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.5/pkg/util/wait/wait.go:90\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/shared_informer.go:771\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.20.5/pkg/util/wait/wait.go:73"}
 {"level":"info","ts":1627301877.8429492,"logger":"envoy","msg":"Processed on startup","count":2}
 {"level":"info","ts":1627301877.8429906,"logger":"envoy.stopwatch","msg":"measured: ","Initializing snapshot":1101}
 {"level":"info","ts":1627301877.8430026,"logger":"envoy","msg":"starting xds and auth server on port 9090"}
 I0726 12:18:13.706604 1 leaderelection.go:253] successfully acquired lease appservice-ns-node1-v1/appservice-ns-node1-v1-appservice-ns-node1-v1
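
The controller output above mixes JSON log lines with klog-style lines (the ones starting with I0726), so pulling out just the errors takes a tolerant parser. A minimal sketch, assuming the log has been saved one entry per line; the function name is illustrative only.

```python
import json


def error_entries(lines):
    """Yield (logger, msg, first line of error) for each JSON log line
    whose level is "error". Non-JSON and malformed lines are skipped."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # klog-style lines such as "I0726 12:17:56 ..."
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # wrapped or otherwise damaged JSON
        if entry.get("level") != "error":
            continue
        # Keep only the first line of the error text; the rest is
        # usually noise like "exit status 1".
        error_text = str(entry.get("error", ""))
        first_line = error_text.splitlines()[0] if error_text else ""
        yield entry.get("logger"), entry.get("msg"), first_line
```

Feeding it the lines of a file produced by `kubectl logs` would list only the envoy errors, which makes it easier to compare a healthy start with a start after the restart.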


azure-arc

@JakubMuszyski-7200 , Thank you for your question.

Azure Arc is not an Open Source Project. Hence source code of Azure Arc is not publicly available. However, Azure Arc Jumpstart has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments. Reference

To connect a Kubernetes cluster to Azure, the cluster administrator needs to deploy agents. These agents run in the azure-arc Kubernetes namespace as standard Kubernetes deployments. Reference

The secret sh.helm.release.v1.appservice-ext-node1-v1.v3, based on your screenshot from the K9s interface, seems to be part of an application that you have deployed in the namespace appservice-ns-node1-v1 on the Azure Arc-enabled Kubernetes cluster.

The architecture of the application deployed in the namespace appservice-ns-node1-v1, must be checked in order to understand if app-controller syncs (or does not sync) certificate/token with Azure.




@JakubMuszyski-7200 , If envoy has a problem mounting the XDR certificates provided by the app-service pod because of the order of initialization and readiness, then you could try merging the pods using init containers.

In Kubernetes, every time a pod is restarted the service account token currently valid for the service account associated with the pod is mounted onto it if automountServiceAccountToken is not set to false.
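
To check whether the mounted token is still valid, its claims can be read without any cluster access, because a service account token is a JWT. The sketch below decodes the payload only and does not verify the signature. The helper names are made up for illustration, and an `exp` claim is only present on tokens that carry an expiry, hence the None case.

```python
import base64
import json


def jwt_claims(token):
    """Return the payload claims of a JWT as a dict (no signature check)."""
    payload_b64 = token.strip().split(".")[1]
    # JWT segments are base64url without padding; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(claims, now):
    """Seconds left before the token expires, or None if the token
    has no "exp" claim. `now` is a Unix timestamp."""
    if "exp" not in claims:
        return None
    return claims["exp"] - now
```

Inside a pod the token normally lives at /var/run/secrets/kubernetes.io/serviceaccount/token, so reading that file and passing its content to `jwt_claims` shows which service account and expiry the pod actually got after the restart.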




Thanks for the reply
I plan to experiment with the certificates. Any chance you can tell me what the source of the mentioned "default cert" is? It seemed to be available after the first deployment and then disappeared, but I have not caught it yet (that was my plan for the next step, BUT I think someone at MS is changing things as we speak: now even the initial deployment on a clean cluster fails... eh...)
Also, I'm curious about "dapr cert","error":"secrets "dapr-trust-bundle", but I've never seen a dapr-trust-bundle secret, so I assume this error is a normal 'false positive'.
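
On the "default cert" question: the `asn1_d2i_read_bio:not enough data` error from the logs is commonly reported when OpenSSL is handed empty or truncated binary input, so one thing worth checking is whether the certificate data is actually there after the restart. The sketch below classifies raw bytes by looking at the DER length header. It is an assumption that the controller feeds DER-like data (a certificate or PKCS#12 bundle) to OpenSSL; the function name and labels are invented for illustration.

```python
def der_length_status(data):
    """Classify bytes as "empty", "pem", "truncated", "complete" or "not-der"."""
    if not data.strip():
        return "empty"
    if data.lstrip().startswith(b"-----BEGIN"):
        return "pem"
    # DER certificates and PKCS#12 bundles start with a SEQUENCE tag
    # (0x30) followed by a length field.
    if data[0] != 0x30:
        return "not-der"
    if len(data) < 2:
        return "truncated"
    first = data[1]
    if first < 0x80:
        # Short form: this byte is the content length.
        header_len, content_len = 2, first
    else:
        # Long form: the low 7 bits give the number of length bytes.
        num_len_bytes = first & 0x7F
        if num_len_bytes == 0:
            return "not-der"  # indefinite length is not valid DER
        header_len = 2 + num_len_bytes
        if len(data) < header_len:
            return "truncated"
        content_len = int.from_bytes(data[2:header_len], "big")
    if len(data) >= header_len + content_len:
        return "complete"
    return "truncated"
```

Running this on the decoded secret value or on the file in the shared volume, once before and once after the restart, would show whether the data went from "complete" to "empty" or "truncated".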

How would you apply the concept of init containers?
My guess is that I'd need to modify the envoy or the app-service deployment, and since I do not have access to the source helm chart... well...
I already modify some resources 'live' anyway, so that is not a problem (I add debug logging to envoy, or change the number of replicas...), but in the long term some process overrides my changes...

My question is, roughly: what is the high-level procedure for publishing an appservice with Arc? Something like:
1) publish the app to Arc
2) app-controller pushes or pulls (which is it?) the resource definition and applies it to the cluster
3) app-controller pulls the certificates and saves them to volume XYZ...
4) ...

What credentials does the app-service use? Against what service?
Is it the JWT token from the serviceAccount, or this 'mystical' default certificate...

so many unknowns...


Actually, I'd also like to know how the pods in the namespace relate to each other,

like:
does app-controller need k8se-activator to publish an app-service?
does envoy need any other pod to work (e.g. http-scaler or metrics-apiserver)?

When, how, why, and with what credentials does app-controller refresh the certificates?
What should the location and the content of the certificates be?

JakubMuszyski-7200 answered

Installation script: https://pastebin.com/igCc5KwR

JakubMuszyski-7200 answered JakubMuszyski-7200 edited

BTW, are the helm charts or the azure-arc components publicly available? Is it open source, or accessible to customers?
