A few days ago, a couple of our Azure release pipelines began to fail.
The pipelines deploy to our client's self-hosted machines (a test and a production environment), and neither pipeline works anymore. We haven't changed anything in the pipelines, and the client says nothing has changed in their infrastructure. We're taking that with a grain of salt, because deployments to our own self-hosted machines still go through.
The error occurs in the deployment group job when it attempts to download the artifacts. This is the log output:
2024-04-24T01:28:35.4107721Z Error: in getBuildApi, so retrying => retries pending : 4
2024-04-24T01:29:20.5325958Z Error: in getBuildApi, so retrying => retries pending : 3
2024-04-24T01:31:35.6424623Z Error: in getBuildApi, so retrying => retries pending : 2
2024-04-24T01:37:35.7312572Z Error: in getBuildApi, so retrying => retries pending : 1
2024-04-24T01:43:35.9335199Z ##[error]Failed in getBuildApi with error: Error: unable to verify the first certificate
at TLSSocket.onConnectSecure (node:_tls_wrap:1674:34)
at TLSSocket.emit (node:events:518:28)
at TLSSocket._finishInit (node:_tls_wrap:1085:8)
at ssl.onhandshakedone (node:_tls_wrap:871:12) {
code: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE'
}
2024-04-24T01:43:36.0200040Z ##[error]Error: unable to verify the first certificate
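One way to narrow this down further is to attempt a verified TLS handshake from a target machine and print OpenSSL's verify code. Below is a minimal Python sketch of that idea; we haven't run it against the client's machines. "unable to verify the first certificate" is OpenSSL's message for verify code 21, which Node reports as UNABLE_TO_VERIFY_LEAF_SIGNATURE. Python may report the same situation as code 20 ("unable to get local issuer certificate"). The host dev.azure.com, the `probe` and `hint_for` helpers, and the hint wording are all assumptions for illustration; substitute the host your organization URL actually uses. Python on Windows loads the Windows certificate store, while Node uses its own bundled CA list. A success here combined with the agent's failure would therefore suggest a trust-store difference, for example a TLS-inspecting proxy whose root CA is only installed in Windows.

```python
import socket
import ssl

# Rough hints for OpenSSL verify codes likely to appear here (values from
# OpenSSL's x509_vfy.h). The wording is an assumption, not a diagnosis.
VERIFY_HINTS = {
    10: "certificate has expired",
    18: "server presented a self-signed leaf certificate",
    19: "chain ends in a self-signed root that is not trusted locally",
    20: "an issuer in the chain is not in the local trust store "
        "(missing intermediate or an untrusted private/proxy CA)",
    21: "the leaf certificate's issuer could not be found "
        "(often a missing intermediate or a TLS-inspecting proxy)",
}


def hint_for(code: int) -> str:
    """Map an OpenSSL verify code to a rough, human-readable hint."""
    return VERIFY_HINTS.get(code, "no specific hint; see the OpenSSL verify documentation")


def probe(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Try a TLS handshake WITH certificate verification and summarize the result."""
    # create_default_context() enables hostname checking and CERT_REQUIRED and
    # loads the platform's default CA certificates (the Windows store on Windows).
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"OK: verified, negotiated {tls.version()}"
    except ssl.SSLCertVerificationError as err:
        return f"FAILED ({err.verify_code}): {err.verify_message} -> {hint_for(err.verify_code)}"


# Hypothetical usage on a target machine (assumes dev.azure.com is the host the agent calls):
#   print(probe("dev.azure.com"))
```

Running it both on one of the client's machines and on one of our working machines would show whether the failure follows the client's network or is specific to Node.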
So far we've tried:
- Installing all Windows updates on the target machines.
- Removing the Azure Pipelines agent from the target machines and reinstalling it with a new token.
- Running the TLS checker and applying its mitigations: https://github.com/microsoft/azure-devops-tls12
- Setting the environment variable NODE_TLS_REJECT_UNAUTHORIZED=0 on the test environment to see if the deployment would go through. It does, but we can't leave it like this, since it disables certificate verification.
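A narrower alternative to disabling verification with NODE_TLS_REJECT_UNAUTHORIZED=0 is to fetch the certificate the endpoint presents without verifying it, then compare it between a client machine and one of our machines where deployments still work. Below is a minimal Python sketch of that idea. The helper names and the dev.azure.com host are hypothetical. `ssl.get_server_certificate` does not verify the certificate when no `ca_certs` is passed, so the fetched PEM is only for inspection. A large service can legitimately serve more than one certificate, so a differing fingerprint alone is not conclusive. The issuer shown when inspecting the saved file is the more telling comparison.

```python
import hashlib
import ssl


def leaf_fingerprint(pem: str) -> str:
    """SHA-256 fingerprint of a PEM certificate as colon-separated uppercase hex."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_leaf_pem(host: str, port: int = 443) -> str:
    """Fetch the leaf certificate the endpoint presents, WITHOUT verifying it.

    Inspection only: nothing else is sent over this connection.
    """
    return ssl.get_server_certificate((host, port))


# Hypothetical usage, run on a client machine and on one of our working machines
# (assumes dev.azure.com is the host the agent talks to):
#   pem = fetch_leaf_pem("dev.azure.com")
#   print(leaf_fingerprint(pem))
#   pathlib.Path("leaf.pem").write_text(pem)  # then inspect the issuer, e.g. certutil -dump leaf.pem
```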
Any ideas on how we can fix this or narrow down the problem further? We're puzzled as to why certificate verification suddenly started failing.
Thanks in advance!