Hi,
At one of our customers, I think we might be running into a certificate autorenewal bug on our Windows Server 2019 web servers, which are configured to use SSL certificates with autorenewal.
Automatic rebinding of the certificate is enabled in IIS (via Task Scheduler) and SNI is activated for each website we host.
Deployment Scenario
On a single webserver we created a few websites.
Each website was configured with SNI enabled.
Certificate autoenrollment was configured and applied via Group Policy on the web server.
In IIS the 'Certificate Rebind' feature was enabled.
For each website an SSL certificate was enrolled using the same custom 'Webserver' certificate template. The template had a validity period of 1 year and a renewal period of 2 months before expiration.
The enrollment of all SSL certificates occurred in the same timeframe (less than 30 minutes).
Each certificate was bound to its proper website in IIS.
Renewal Time
At the moment of renewal, the Group Policy client-side engine kicked off the autorenewal of the certificates. It should renew all certificates that will expire in less than 2 months, as per the custom 'Webserver' template.
However, only one certificate remained after the renewal, and that single certificate was then bound to all websites, which is incorrect.
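To make the expected behavior concrete, here is a minimal sketch of the date arithmetic we expected to apply to every certificate. It is not the actual autoenrollment logic; the site names and dates are hypothetical, and 60 days is only an approximation of the 2-month renewal period on the template.

```python
from datetime import datetime, timedelta


def is_due_for_renewal(not_after, now, renewal_window):
    """True when the certificate is inside its renewal window and not yet expired."""
    return not_after - renewal_window <= now < not_after


# Assumption: 60 days stands in for the template's 2-month renewal period.
renewal_window = timedelta(days=60)
now = datetime(2024, 1, 10, 12, 0)

# Hypothetical expiry times for certificates enrolled within the same 30 minutes.
expiries = {
    "site-a": datetime(2024, 3, 1, 9, 0),
    "site-b": datetime(2024, 3, 1, 9, 10),
    "site-c": datetime(2024, 3, 1, 9, 25),
}

due = [
    name
    for name, not_after in expiries.items()
    if is_due_for_renewal(not_after, now, renewal_window)
]
print(due)  # ['site-a', 'site-b', 'site-c']
```

Because all certificates were enrolled within 30 minutes of each other, they all enter the renewal window together, so we expected each of them to be renewed individually.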
Investigation
When looking into the event log "CertificateServicesClient-Lifecycle-System", we found the events that were logged when the old certificates were replaced.
We noticed the following for the first certificate:
Event 1006 was logged: A new certificate has been installed.
The Enroll action informed us of the thumbprint of the new certificate.
Event 1001 was logged: A certificate has been replaced.
The renew action informed us of the old thumbprint and the new thumbprint.
Event 1005 was logged: A certificate has been archived.
The log entry shows the thumbprint of the certificate which has been replaced.
The next action should be the renewal of the next certificate. However:
No event 1006 is logged, and we also see no pending certificate request at the CA.
No event 1001 is logged.
Event 1005 is logged, and the certificate is archived.
This happens for each subsequent certificate, and at the end the rebinding occurs with the single replaced certificate.
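For anyone who wants to check their own logs for the same pattern, this is roughly how we correlated the events. It is only a sketch: it assumes the events have already been exported and reduced to (event ID, old certificate thumbprint) pairs, and the thumbprints shown are placeholders, not real values.

```python
from collections import defaultdict

REPLACED, ARCHIVED = 1001, 1005


def archived_without_replacement(events):
    """Return old thumbprints that have an archive event (1005) but no replace event (1001)."""
    seen = defaultdict(set)
    for event_id, old_thumbprint in events:
        seen[old_thumbprint].add(event_id)
    return sorted(
        thumbprint
        for thumbprint, ids in seen.items()
        if ARCHIVED in ids and REPLACED not in ids
    )


# Hypothetical, simplified export: (event ID, thumbprint of the OLD certificate).
# Event 1006 only carries the new thumbprint, so it is left out here.
events = [
    (1001, "AAAA"),  # first certificate: replaced ...
    (1005, "AAAA"),  # ... and then archived, as expected
    (1005, "BBBB"),  # archived, but never replaced
    (1005, "CCCC"),  # archived, but never replaced
]

print(archived_without_replacement(events))  # ['BBBB', 'CCCC']
```

Every thumbprint in the result belongs to a certificate that was archived without a renewed certificate taking its place.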
Reproducing the error
We have created a test-webserver and lowered the lifetime on the template to one day, with a renewal of 2 hours prior to expiration.
We are able to continuously reproduce the behavior.
Even when a certificate is not renewed at the same time but more than 2 hours later, neither event 1006 nor event 1001 is logged. The certificate is not replaced but only archived, and due to the lack of event 1001, automatic rebinding does not occur.
It only occurs for certificates based on the same template. During the test we also used single certificates based upon another template and those were replaced correctly.
Assumption
What we assume, based upon the tests we performed, is that in the renewal algorithm a verification occurs based upon the template oid/name.
It looks like the algorithm decides that, as long as the certificate store still contains a certificate with a valid lifetime that is based upon the same template, other certificates based upon this template do not need renewal.
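To illustrate what we suspect, here is a toy model of that assumption. This is purely our guess expressed as code, not the real Windows renewal algorithm; the `Cert` class, the function name, and the numbers are all made up to mirror our test setup (one-day lifetime, 2-hour renewal window).

```python
from dataclasses import dataclass


@dataclass
class Cert:
    subject: str
    template: str
    hours_left: float
    archived: bool = False


def suspected_renewal_pass(store, renewal_window_hours):
    """Toy model of the SUSPECTED behavior, not the actual Windows implementation."""
    renewed = []
    for cert in list(store):  # iterate over a snapshot; new certs are appended to store
        if cert.archived or cert.hours_left > renewal_window_hours:
            continue
        # Suspected check: does the store already hold a valid certificate
        # from the same template that is outside its renewal window?
        covered = any(
            other is not cert
            and not other.archived
            and other.template == cert.template
            and other.hours_left > renewal_window_hours
            for other in store
        )
        if not covered:
            # Enroll a replacement with the one-day lifetime of the test template.
            store.append(Cert(cert.subject, cert.template, hours_left=24.0))
            renewed.append(cert.subject)
        cert.archived = True  # the old certificate is archived either way
    return renewed


store = [
    Cert("site-a", "Webserver", 1.5),
    Cert("site-b", "Webserver", 1.5),
    Cert("site-c", "Webserver", 1.5),
    Cert("site-d", "OtherTemplate", 1.5),
]
renewed = suspected_renewal_pass(store, renewal_window_hours=2)
print(renewed)  # ['site-a', 'site-d']
```

In this model only the first 'Webserver' certificate and the certificate from the other template are renewed, while the remaining 'Webserver' certificates are archived without a replacement, which matches what we observe.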
Can someone have a look into this issue? Thanks!