Designing and Implementing a PKI: Part V Disaster Recovery
Chris here again. We are now going to move on to Disaster Recovery. One of the many tasks you want to complete during the planning phase is to plan for disaster recovery. When planning for disaster recovery, not only is the backup/restore process important, but the actual design of the PKI can affect how resilient your PKI infrastructure is. Additionally, proper planning can alleviate the impact of a system failure.
When the system hosting Certificate Services becomes unusable due to a failure, there are several consequences of that failure.
1. The CA can no longer sign its Certificate Revocation List (CRL) or delta CRL (dCRL).
2. The CA can no longer issue certificates.
3. The CA database, which includes a record of the certificates that have been issued or revoked, is unavailable until the CA is recovered.
Signing CRLs and Delta CRLs
CRLs and delta CRLs are used by clients to determine if a certificate has been revoked. In general, applications will fail when they cannot determine the revocation status for a certificate, though some applications have the ability to disable revocation checking while others do not.
Like certificates, CRLs and delta CRLs have a period during which they are valid. Once the CRL and/or delta CRL expires, an application checking the revocation status of a certificate against the expired CRL will fail. The point of this discussion is that typically the first impact you will see when a Certification Authority fails is the inability of applications to check the revocation status of any certificates.
When you design and implement a PKI you configure the validity period of the CA’s CRL and delta CRL. This design consideration has an impact in terms of disaster recovery. The maximum time you have after a CA failure to institute your recovery process without impacting certificate validation is determined by these settings.
Example 1. You have an issuing Certification Authority and it is publishing a base CRL once every 7 days and a delta CRL once every day. You have approximately 24 hours from the time the last delta CRL was published to either restore the CA or re-sign the delta CRL before certificate validation starts failing.
Example 2. You have an issuing Certification Authority and it signs a CRL once every 7 days, but is not configured to publish a delta CRL. In this scenario you have 7 days – (the number of days since the base CRL was signed) before validation will begin to fail due to the inability to check revocation status against a valid CRL.
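The arithmetic in these two examples can be sketched in a few lines of Python. This is only an illustration: it assumes each CRL's validity period is exactly equal to its publication interval, and the function name and the sample timestamps are made up for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def recovery_window(now: datetime,
                    base_published: datetime,
                    base_interval: timedelta,
                    delta_published: Optional[datetime] = None,
                    delta_interval: Optional[timedelta] = None) -> timedelta:
    """Time left before the first published CRL expires.

    Assumption: validity period == publication interval.
    """
    expirations = [base_published + base_interval]
    if delta_published is not None and delta_interval is not None:
        expirations.append(delta_published + delta_interval)
    # Validation starts failing as soon as the earliest CRL expires.
    remaining = min(expirations) - now
    return max(remaining, timedelta(0))


# Hypothetical failure: base CRL signed 3 days ago, delta CRL 6 hours ago.
now = datetime(2011, 1, 10, 12, 0)
base_published = now - timedelta(days=3)
delta_published = now - timedelta(hours=6)

# Example 1: 7-day base CRL plus 1-day delta CRL.
example1 = recovery_window(now, base_published, timedelta(days=7),
                           delta_published, timedelta(days=1))
# Example 2: 7-day base CRL only.
example2 = recovery_window(now, base_published, timedelta(days=7))

print(example1)  # 18:00:00
print(example2)  # 4 days, 0:00:00
```

With a delta CRL in play, the window is governed by whichever CRL expires first, which is why Example 1 leaves hours rather than days.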
There are several ways that you can minimize the impact that a CA failure will have on certificate validation.
One way is to install a clustered issuing certification authority. If the active node of the cluster fails the CA can be failed over to the second node. Clustering, however, will not protect against the failure of a shared component such as storage or a Hardware Security Module (HSM). So these devices should have methods to provide failover as well, if possible.
Another option is to increase the base and delta CRL publication intervals (and hence, their validity periods). This can potentially give you more time to kick off your recovery process, but if the CA fails shortly before the new base or delta CRL is about to be published, increasing the publication interval has done little good. One must also realize there is a trade-off involved here. Increasing the publication interval means that it will take longer for certificate consumers to become aware that a certificate has been revoked and added to the CRL.
A more complicated strategy is to set the automatic publishing interval to a longer period, and then manually publish the CRL more often. In other words, you set the CRL publication interval to 7 days, and then publish a new CRL every day. This way, if the CA fails you have 6 or 7 days to recognize the problem and start your recovery process. The Windows CA does not automatically publish CRLs in this fashion, but you can set up a scheduled task on the CA server to publish the CRL every 24 hours using the command line utility, certutil.exe. The command certutil -crl will instruct the CA to publish a new base CRL with the validity period defined in the CA configuration.
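One way to register such a scheduled task is with schtasks.exe. The Python sketch below only assembles the command line. The task name, start time, and run-as account are assumptions chosen for illustration, and whichever account you use must be allowed to manage the CA.

```python
import subprocess
import sys


def build_crl_task_command(task_name: str = "Publish CA CRL",
                           start_time: str = "02:00",
                           run_as: str = "SYSTEM") -> list:
    """Build a schtasks.exe command that runs 'certutil -crl' once a day.

    All three defaults are illustrative placeholders, not CA settings.
    """
    return [
        "schtasks", "/Create",
        "/TN", task_name,        # name shown in Task Scheduler
        "/TR", "certutil -crl",  # command from the text above
        "/SC", "DAILY",          # run every 24 hours
        "/ST", start_time,       # start time in 24-hour HH:MM format
        "/RU", run_as,           # assumption: account with rights on the CA
    ]


command = build_crl_task_command()
print(subprocess.list2cmdline(command))

# Only register the task when explicitly asked to, and only on Windows.
if sys.platform == "win32" and "--create" in sys.argv:
    subprocess.run(command, check=True)
```

Printing the command first makes it easy to review before running it with --create from an elevated prompt on the CA server.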
There are also some group policies that you can consider as part of your overall disaster recovery planning. If you have workstations and servers running Windows Vista, 7, Server 2008, or Server 2008 R2 there is a group policy setting that extends the period of time for which the OS will consider a given CRL valid, independent of the actual validity period of the CRL. The group policy setting is located in the following location:
Computer Configuration\Windows Settings\Security Settings\Public Key Policies\Certificate Path Validation Settings.
This setting forces the client to consider the CRL or OCSP response to be valid for longer than it actually is. Below is a screenshot of the specific settings:
In terms of recovery there is a short term workaround and a long term resolution. The short term workaround is to use a process called CRL re-signing to manually re-sign an existing CRL and extend its validity period. By doing this, you can give yourself additional time to recover the CA. CRL re-signing requires that you have a backup of the CA’s public/private key pair. I will be covering this process later in this blog posting.
The longer term fix is to restore the certification authority. This of course is not possible unless you have previously backed up the certification authority. I will also cover this later in the blog post.
CA can no longer issue certificates
Another issue that occurs when you have a CA failure is that it can no longer issue certificates. In some scenarios where certificates are issued less frequently, the inability to issue certificates may not have a business impact. In other cases, however, the impact could be considerable. For example, if a CA dedicated to issuing certificates for Network Access Protection (NAP) fails the problem would be almost immediately noticeable. NAP certificates have a lifetime of only 24 hours, so a failed CA can be a considerable problem.
One way to eliminate this issue completely is to have multiple CAs that are issuing certificates based on the same certificate templates. In this way, if one CA fails, clients can still enroll for certificates on one of the other certification authorities.
A clustered issuing certification authority is another way to mitigate against a failed CA. If the active node in the cluster fails, the CA will fail over to the second node. Clustering, as mentioned earlier, will not protect against the failure of a shared component such as storage or an HSM. I’ll re-iterate the need for these devices to have methods for failover as well.
Ultimately, recovering from the inability to issue certificates can be resolved by recovering the failed certification authority or installing a new issuing certification authority to issue certificates. The preferred method would be to restore the failed certification authority since it already has information about issued certificates in its CA Database.
By default, the CA database contains a copy of every certificate issued, every certificate that has been revoked, and a copy of failed and pending requests. The CA Manager may decide, however, to clear out any expired certificates from the CA database in order to recover free space in the database.
Note: In Windows Server 2008 R2 you can configure a template such that issued certificates based on that template are not stored in the CA database. These so-called “ephemeral certificates” generally have validity periods shorter than the publication interval of the issuing CA, so recording them so they can be later revoked makes little sense. Further, these short-lived certificates may be issued in great numbers and with great frequency. Storing them in the database can dramatically increase the database’s rate of growth. Certificates issued for NAP are examples of these ephemeral certificates.
If a CA is configured for key Archival and Recovery, the CA database will also contain the private keys for any certificates whose templates are configured for archival. Failure to recover the CA database in this case would result in losing all of these archived keys.
When a certification authority fails, the database is unavailable, which makes it difficult to revoke certificates that were previously issued by the CA. It also makes it impossible to recover any private keys that have been archived in the database. Again, the database will be unavailable whenever the CA is unavailable. In rare circumstances, however, the CA database itself can become corrupted. Like all ESE databases, the CA database can be affected by hardware or disk issues that impact the database or log files.
One option to mitigate the database becoming unavailable due to a CA failure is to set up a clustered certification authority. Another option is to take regular backups of the CA. If the CA fails, you can then restore the CA from the backup. Below I discuss options for backing up the CA as well as for restoring the CA.
For corrupt databases, repairs can be made with esentutl.exe. However, in most cases it is preferable to restore from a backup, to avoid the data loss that can be incurred when using some of the functions in esentutl.exe. Esentutl.exe can repair the structure of the database, but usually at the expense of the data stored within that structure.
There are two different ways to back up the Certification Authority. The first is through a System State backup. A System State backup will back up the entire CA as well as its configuration. If the private key is stored on the CA and not on an HSM, the private key will be backed up as well. Here is additional information on System State. A System State backup should be used when you will need to restore to the same hardware.
Backing up system state in Windows Server 2003.
1. To start NT Backup, click Start then Run, type ntbackup.exe and press Enter.
2. If this is the first time you’ve run this tool, it will start the Welcome to the Backup or Restore Wizard.
3. Uncheck the Always start in wizard mode, and then click Cancel.
4. Launch NT Backup again.
5. Once NT Backup launches, select the Backup Tab, and check just System State as the item to backup.
6. Under the Backup media or file name section, select your backup media or file location where you wish to save the backup.
7. Click the Start Backup button. This will bring up the Backup Job Information dialogue box.
8. If you wish to start the backup immediately, click Start Backup.
9. If you wish to schedule the backup, click the Schedule button.
10. When prompted with "You must save the backup selections before you can schedule a backup. Do you want to save your current selections now?", click Yes.
11. Save the selection script.
12. After you save the selection script, the Scheduled Job Options dialogue box will open. Give the Job a name. Then click the Properties button.
13. Configure the desired schedule, and click OK. Then enter the credentials for the user that you wish the backup to run under. This account will need to either have the Back up files and directories right or be a member of the Backup Operators group on the CA. Then click OK again. Click OK once more, and you will be prompted for the credentials again.
14. You can then click on the Schedule Jobs tab in NT Backup to check the schedule.
Restore System State in Windows Server 2003
1. On the Windows Server 2003 system on which you plan on restoring system state, open the NT Backup utility.
2. Click on the Restore and Manage Media tab.
3. Navigate to the backup of the system state, make sure that System State is checked. Under Restore files to, make sure Original location is selected, and click Start Restore.
4. You will then be warned that restoring System State will always overwrite the current System State unless restoring to an alternate location. Click OK. Then click OK again to confirm the restore.
5. When the Restore completes, click Close.
6. You will then be prompted to restart your computer, click Yes.
Performing System State Backup Windows Server 2008 R2
1. If you have not installed Windows Backup, you will first have to install this feature. Open Server Manager, select the Features node, then click Add Features.
2. In the Add Features Wizard, select Windows Server Backup Features, then click Next, and then Install. When the installation completes, click Close.
3. You can then launch the Windows Server Backup tool, by clicking Start, then Administrative Tools, then Windows Server Backup.
4. Also, to use Windows Server Backup, you have to have an additional drive or a network location to backup to. In other words you cannot save the backup on the system drive.
5. The wizard allows you to configure a one-time backup, or schedule a backup.
6. To schedule a backup, click Backup Schedule… under the Actions section of the Windows Server Backup tool.
7. This will start the Backup Schedule Wizard, click Next.
8. On the Select Backup Configuration page, select Custom, and then click Next.
9. On the Select Items for Backup page of the wizard, click the Add Items button.
10. Select System State, and click OK, then click Next.
11. On the Specify Backup Time page of the wizard, select the time that you would like the backup to be scheduled for, and click Next.
12. On the Specify Destination Type page of the wizard, select either Hard Disk, Volume, or Shared Network Folder, and click Next. In this example, I am selecting Hard Disk.
13. Select the hard disk you would like to use for backup. If it is not listed, click Show All Available Disks…, select the appropriate disk, and click OK. Click Next.
14. You will be prompted that the disk will be reformatted and existing volumes will be deleted. Click Yes if you are using this disk solely for backups; if not, choose another backup destination.
15. On the Confirmation page, click Finish.
16. On the Summary page, click Close.
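A system state backup can also be started from the command line with wbadmin.exe instead of the wizard. The Python sketch below is a minimal example that builds a one-time wbadmin start systemstatebackup command. The drive letter is a placeholder, and the helper name is made up for the example.

```python
def build_system_state_backup(target_volume: str) -> list:
    """Build a wbadmin command for a one-time system state backup.

    target_volume is a drive letter such as "E:" (a placeholder here);
    as noted in the steps above, it cannot be the system drive.
    """
    if not (len(target_volume) == 2
            and target_volume[0].isalpha()
            and target_volume[1] == ":"):
        raise ValueError("expected a drive letter such as 'E:'")
    return [
        "wbadmin", "start", "systemstatebackup",
        "-backupTarget:" + target_volume.upper(),
        "-quiet",  # run without prompting for confirmation
    ]


command = build_system_state_backup("e:")
print(" ".join(command))
# wbadmin start systemstatebackup -backupTarget:E: -quiet
```

The printed command would be run from an elevated command prompt on the CA, or wrapped in a scheduled task if you prefer not to use the Backup Schedule Wizard.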
Restoring System State in Windows Server 2008 R2.
1. In the Actions pane of the Windows Server Backup tool, click Recover…
2. This will start the Recovery Wizard. Select the location of the backup, and click Next.
3. On the Select Backup Date page of the wizard, select the date and time of the backup and click Next.
4. On the Select Recovery Type page, select System state, and click Next.
5. On the Select Location for System State Recovery page, select Original location, and click Next.
6. On the Confirmation page of the wizard, click the Recover button.
7. You will be prompted that the recovery cannot be paused or cancelled once started, click Yes.
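The recovery can likewise be started with wbadmin.exe. Running wbadmin get versions lists the available backups along with their version identifiers, and one of those identifiers is then passed to wbadmin start systemstaterecovery. The Python sketch below only validates the identifier format and builds the command. The sample identifier and the helper name are hypothetical.

```python
import re
from typing import Optional

# Version identifiers are reported by 'wbadmin get versions'
# in MM/DD/YYYY-HH:MM form.
VERSION_ID = re.compile(r"^\d{2}/\d{2}/\d{4}-\d{2}:\d{2}$")


def build_system_state_recovery(version_id: str,
                                backup_target: Optional[str] = None) -> list:
    """Build a wbadmin command that restores system state from one backup."""
    if not VERSION_ID.match(version_id):
        raise ValueError("expected a version identifier like 03/31/2011-09:00")
    command = ["wbadmin", "start", "systemstaterecovery",
               "-version:" + version_id]
    if backup_target:
        # Location that holds the backup, for example a drive letter.
        command.append("-backupTarget:" + backup_target)
    return command


# Hypothetical identifier copied from 'wbadmin get versions' output.
command = build_system_state_recovery("03/31/2011-09:00", "E:")
print(" ".join(command))
# wbadmin start systemstaterecovery -version:03/31/2011-09:00 -backupTarget:E:
```

As with the wizard, the recovery cannot be paused or cancelled once it has started, so it is worth double-checking the version identifier first.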
Manual Backup of the Certification Authority
A good guide to use for backing up and restoring a certification authority is:
298138 How to move a certification authority to another server
Steps 1 through 3 of this document cover manually backing up the CA.
Essentially, you want to do a manual backup of the private key, CA certificate, and CA database. If you are using an HSM to protect the private key pair, you will either need to back up the private key through a method provided by the HSM vendor or have a highly available configuration for the HSMs. In general, if the private key is stored on an HSM, you do not want to back up the private key to any type of media, as this will degrade the overall security and protection of the private key. The configuration for the Certification Authority is stored in the registry, so you will want to back up that registry location as well. The registry location is HKLM\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>.
Generally the private key, CA certificate and CA configuration are going to remain relatively static. You will, however, need to perform a fresh backup should you ever renew the CA certificate or update the configuration. The CA database, on the other hand, is going to grow over time as certificates are issued, requests are denied, and certificates are revoked, so you are going to want to back up the database periodically. How often you perform this backup will depend on how rapidly changes to the database are made and how tolerant you are of discrepancies between the backup and the live data.
The first time you run the backup you will want to back up the CA’s certificate and private key, the CA database, and the certificate database log. To perform this task through the GUI, open up the Certification Authority MMC snap-in (certsrv.msc).
1. Right click on the certification authority name and select All Tasks from the context menu, and then select Back up CA…
2. This will launch the Certification Authority Backup Wizard, click Next.
3. Select Private key and CA certificate and Certificate database and certificate database log. Browse to a local or network location to save the backup; the backup location must be an empty folder. Click Next.
4. Enter a password to protect the private key, and click Next, then Finish.
To back up the CA via the command line, open an elevated command prompt and type certutil -backup Path. Path is the empty directory where the backed up information will be stored. You will then be prompted for a password to protect the private key. Enter the password and then press the Enter key. You will then be prompted to confirm the password. Confirm the password and press the Enter key. A message will be sent to the console indicating what has been backed up and that the certutil -backup command completed successfully.
To back up the registry, run the following command: REG EXPORT "HKLM\System\CurrentControlSet\Services\CertSvc\Configuration\<CA Name>" caconfig.reg
Copy caconfig.reg to your backup directory so that all the necessary data is in the same place.
Once you have completed a full back up of the Certification Authority, you can perform incremental backups of the CA database. Alternatively, you could choose to periodically backup the entire CA database.
Although you can back up the database through the Certification Authority console, you will most likely want to use some sort of script or scheduled task to perform the backup periodically.
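A periodic database backup of this kind could be scripted along the following lines. This Python sketch assumes a fresh dated folder for each run, so that the backup target is always empty. It uses certutil -backupdb for the database and the REG EXPORT command shown earlier for the configuration. The backup root, CA name, and function name are placeholders.

```python
import os
import subprocess
import sys
from datetime import date
from pathlib import PureWindowsPath

CA_CONFIG_KEY = r"HKLM\System\CurrentControlSet\Services\CertSvc\Configuration"


def plan_backup(backup_root: str, ca_name: str, today: date):
    """Return the dated target folder and the commands to run against it."""
    folder = PureWindowsPath(backup_root) / today.strftime("%Y-%m-%d")
    commands = [
        # Database and log backup; the target folder must be empty.
        ["certutil", "-backupdb", str(folder)],
        # CA configuration from the registry, kept next to the database.
        ["reg", "export", CA_CONFIG_KEY + "\\" + ca_name,
         str(folder / "caconfig.reg")],
    ]
    return folder, commands


# Hypothetical backup root, CA name, and date.
folder, commands = plan_backup(r"D:\CABackup", "Contoso Issuing CA",
                               date(2011, 3, 31))
for command in commands:
    print(subprocess.list2cmdline(command))

# Only touch the system when explicitly asked to, and only on Windows.
if sys.platform == "win32" and "--run" in sys.argv:
    os.makedirs(str(folder))  # raises if the folder already exists
    for command in commands:
        subprocess.run(command, check=True)
```

Because the private key and CA certificate rarely change, the sketch leaves them out; they still need their own password-protected backup, taken as described above.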
Manual Restore of the Certification Authority
Once you have located the server that will serve as the replacement for the failed CA, you must do some initial configuration of the server. Give that server the same name as the failed CA and join it to the same domain.
Configure AD permissions
Since you have brought online a new machine to be the CA we need to modify the security of Active Directory to allow the new machine to be able to update PKI configuration information in AD. This is because the new machine will have a new SID associated with the machine account, even though the machine account has the same name.
Open ADSIEDIT.MSC. Open the Configuration container of the Active Directory database. Browse to CN=Public Key Services, CN=Services, CN=Configuration. Next open the AIA container. Locate the object that is associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control. This will associate the permissions with the new account.
Next open the CDP container. Locate the container associated with the failed CA. Open that container and then select the CRL object contained within that container. Right click on the CRL object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control.
Next open the Enrollment Services container. Locate the object associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then click Advanced. In the Permissions tab of the Advanced Security Settings dialog box, click Add… Add the computer object for the CA. On the Permission Entry screen, select Allow for all Permissions except Full Control. Click OK 3 times.
Next open the KRA container. Locate the object that is associated with the failed CA. Right click on that object, and select Properties from the context menu. Click on the Security Tab. Remove the CA's computer account. Then re-add the CA's computer account, and give it full control. This will associate the permissions with the new account.
Installing the Certification Authority Role
Next we need to restore the Certification Authority. Log on with an account that has Enterprise Admin credentials. The first thing we will need to do is to install the Certification Authority Role. The instructions below are for a Windows Server 2008 and Windows Server 2008 R2 based CA. For exact procedures in Windows Server 2003, please see the following article:
298138 How to move a certification authority to another server
1. Open Server Manager.
2. Click on the Roles Node, then click Add Roles.
3. When the Add Roles Wizard opens, click Next.
4. Select Active Directory Certificate Services and click Next.
5. Then Click Next Again.
6. On the Select Role Services page of the wizard, select Certification Authority, and then click Next.
7. On the Specify Setup Type page of the wizard, Select Enterprise or Standalone depending on the configuration of the failed CA, and then click Next.
8. On the Specify CA Type page of the wizard, select either Root CA or Subordinate CA, depending on the configuration of the failed CA, and then click Next.
9. On the Set Up private key page of the wizard, select Use existing private key, and the sub-option of Select a certificate and use its associated private key, then click Next.
10. On the Select Existing Certificate page of the wizard, click Import.
11. Browse to the backup of the failed CA and select the P12 file from the backup, click Open. Then enter the password for the P12 file, and click OK.
12. Then click Next.
13. On the Configure Certificate Database page of the wizard, select the same database and log file locations as were specified on the failed CA, then click Next, then Install.
14. When the installation completes, click Close.
Open an elevated command prompt and use the following command to import the previously backed up CA configuration : REG IMPORT <Previously backed up registry file> .
Restore the CA Database
At this point, you can restore the CA database from your backup.
1. Right click on the certification authority name and select All Tasks from the context menu, and then select Restore CA…
2. You will be prompted to stop Certificate Services. Click OK.
3. When the Certification Authority Restore Wizard starts, click Next.
4. Select Certificate database and certificate database log. Browse to a local or network location of your previously saved backup.
5. Click Next.
6. Click Finish.
7. You will be prompted to restart the CA. Unless you have further incremental backups to restore, click Yes. If you have incremental backups then click No, and walk through the steps above to restore your incremental backups.
Now if there were any additional Certificate Services roles such as Online Responder (OCSP) or Web Enrollment, you can go ahead and install those at this point.
CRL re-signing is a manual process whereby the Administrator can use the CA's backed up certificate and private keys to re-sign an existing CRL file. This process allows you to extend the lifetime of the existing CRL, and even add certificates to the CRL, effectively revoking them.
Importing the CA certificate and private key
To begin, you will need to have a backup of the private key of the CA. If you have the private key stored on an HSM, you will have to follow the HSM vendor’s instructions for making the private key available to another machine. If you are not using an HSM, perform the following to import the CA public and private key pair to the machine where you will be re-signing the CRLs.
1. Click Start, then Run, type MMC, and then press Enter.
2. Select the File Menu, and then select Add/Remove Snap-in…
3. Select Certificates, and then click Add > .
4. Then select Computer account, and click Next.
5. Then select Local computer, and then click Finish.
6. Then click OK.
7. Expand the Certificates (Local Computer) node.
8. Right click on the Personal node, then select All Tasks from the context menu, and then select Import…
9. This will open the Certificate Import Wizard, click Next.
10. Click the Browse button, to browse to the P12 file located in the CA's backup location.
11. In the drop down for the extension type, select Personal Information Exchange (*.pfx;*.p12)
12. Locate the P12 file that was previously backed up, and click Open.
13. Click Next.
14. Type the Password for the P12 file and click Next, click Next again, and click Finish.
15. Click OK to acknowledge that the import was successful.
To re-sign the CRL and delta CRL with the same validity period they had when they were previously published, use the following command:
certutil -sign <existing CRL file name> <re-signed CRL file name>
You will then have to manually publish the CRL to all CDP locations.
If you wish to adjust the validity period, you can specify it at the end of the command in the format DD:HH, where D=Days and H=Hours. For example, the following command would re-sign a CRL so that it is valid for 14 days:
certutil -sign <existing CRL file name> <re-signed CRL file name> 14:00
If you wish to add one or more issued certificates to the CRL, you specify the serial numbers in a comma separated list on the command line. For example, the following command would add serial numbers to the CRL:
certutil -sign <existing CRL file name> <re-signed CRL file name> +SerialNumber1,SerialNumber2,SerialNumber3
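To avoid typing mistakes under pressure, the re-signing commands above could be generated by a small helper. The Python sketch below builds the two forms shown in this post separately, one with a DD:HH validity period and one with a list of serial numbers to add. The file names and serial numbers are made up for the example.

```python
from datetime import timedelta


def format_validity(period: timedelta) -> str:
    """Format a validity period in the DD:HH form used by the commands above."""
    total_hours = int(period.total_seconds() // 3600)
    days, hours = divmod(total_hours, 24)
    return "%d:%02d" % (days, hours)


def resign_with_validity(existing_crl: str, resigned_crl: str,
                         validity: timedelta) -> list:
    """certutil -sign <existing> <re-signed> DD:HH"""
    return ["certutil", "-sign", existing_crl, resigned_crl,
            format_validity(validity)]


def resign_with_revocations(existing_crl: str, resigned_crl: str,
                            serial_numbers: list) -> list:
    """certutil -sign <existing> <re-signed> +Serial1,Serial2,..."""
    if not serial_numbers:
        raise ValueError("at least one serial number is required")
    return ["certutil", "-sign", existing_crl, resigned_crl,
            "+" + ",".join(serial_numbers)]


# Hypothetical file names and serial numbers.
extend = resign_with_validity("IssuingCA.crl", "IssuingCA-resigned.crl",
                              timedelta(days=14))
revoke = resign_with_revocations("IssuingCA.crl", "IssuingCA-resigned.crl",
                                 ["611e23c2000000000005", "611e2a5b000000000006"])

print(" ".join(extend))  # certutil -sign IssuingCA.crl IssuingCA-resigned.crl 14:00
print(" ".join(revoke))
```

Either command must be run on the machine where the CA certificate and private key were imported, and the re-signed file still has to be copied manually to every CDP location.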
When building a PKI infrastructure it is critical to take into consideration how your design will have an effect on the availability of your PKI. However, the design also affects the way in which you may have to recover the CAs in the PKI.
You should definitely consider the criticality of PKI to your environment, and how much downtime is acceptable. This will help drive your decisions when designing the PKI and implementing the Certification Authorities.
Also, many customers make the mistake of either not knowing how to recover a Certification Authority or not having a documented process for doing so. When designing and implementing your PKI, I recommend that you test recovery and document the recovery steps for the CAs in your PKI.
Chris "CLEAR!" Delay