NOBLERobert-3280 asked Crypt32 answered

How many HSMs do we need?

Hi,
We are building a new PKI, and will use HSMs for the root and issuing CAs.
We are seeing advice (on forums, and from Microsoft support and Thales) that the HSMs need high availability, and so will need at least two, and that we should use at least two for back-up as well.
Q1 - Do we need two HSMs for high availability? If the most frequent use is for issuing certificates, then will we lose the ability to issue and renew certificates for a long time if a solo HSM goes down?
Q2 - Do we have to use a Thales HSM for backup if we have a Thales HSM in live service supporting our CAs, or can we use a USB key for backup?

Options: I would like to know if we need
- 4 HSMs (2 in Azure, 2 in on-prem backup locations),
- 3 HSMs (2 in Azure, 1 on-prem backup),
- 2 HSMs (2 in Azure, USB backup),
- 2 HSMs (1 in Azure, 1 backup), or
- 1 HSM (in Azure, USB backup)
Any recommendations?

Thanks and regards,
Rob

windows-server-security azure-dedicated-hsm

Thameur-BOURBITA answered

Hi,

In this link you will find an example of high availability: high-availability




Please don't forget to mark a helpful reply as the answer.


Crypt32 answered NOBLERobert-3280 commented

Do we need two HSMs for high availability?

Yes, you should.

If the most frequent use is for issuing certificates

There is CRL signing as well. If you lose the HSM, you lose the ability to sign CRLs. Once the last published CRL expires, revocation checking fails (revocation offline), which effectively invalidates previously issued certificates.

I would like to know if we need

No one here can answer this for you. It depends on your budget and recovery options. For example, how easily can you fall back from Azure to on-prem if the cloud HSM fails?


OK, thanks Crypt32. I appreciate that the best deployment is a choice each company makes to balance security, availability and cost.

CRL validity:
I understand that if the HSM goes down, we lose the ability to sign baseline and delta CRLs (as well as certificates for all users, devices and applications). However, if our CRL publishing process always releases a new CRL ahead of the old one's expiry by at least the time it takes to recover an HSM, and we don't use the HSM for any other part of the validation process, the OCSP responder service should always have a valid CRL to validate certificates against, right?
e.g.
- Old baseline CRL expires every Wednesday night
- New baseline CRL published every Monday night (two days before)
- HSM goes down Tuesday morning for 36 hours (we actually expect to be able to restore HSM service in a much shorter time, but for the example, let's say it takes 1.5 days to restore service, which is less than the two-day overlap between the new and old CRLs)
- OCSP responders can access the new CRL published Monday
The only risk I see is that any revocations issued between Tuesday morning and Wednesday evening will not be published in a delta CRL until Wednesday evening at the earliest.
Am I missing anything?

Thanks again for your help :-)
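
The scenario above can be checked with a small sketch. It is only an illustration under simplifying assumptions: one scheduled publish is considered, the new CRL is published as soon as the HSM is back, and the dates and the `crl_gap` helper are hypothetical names chosen for the example (they are not part of AD CS or any HSM tooling).

```python
from datetime import datetime, timedelta


def crl_gap(scheduled_publish, old_crl_expiry, outage_start, outage_duration):
    """Return how long clients would be left without a valid CRL.

    Assumes a missed publish is retried the moment the HSM is restored.
    """
    outage_end = outage_start + outage_duration
    # HSM is available at the scheduled publish time: the new CRL goes out on time.
    if not (outage_start <= scheduled_publish < outage_end):
        return timedelta(0)
    # Publish was missed: there is a gap only if recovery finishes after the old CRL expires.
    return max(outage_end - old_crl_expiry, timedelta(0))


# Illustrative week: 2024-01-01 is a Monday.
scheduled_publish = datetime(2024, 1, 1, 22, 0)  # new baseline CRL, Monday night
old_crl_expiry = datetime(2024, 1, 3, 22, 0)     # old baseline CRL, Wednesday night

# Outage from Tuesday morning for 36 hours: Monday's publish already happened.
print(crl_gap(scheduled_publish, old_crl_expiry,
              datetime(2024, 1, 2, 8, 0), timedelta(hours=36)))

# Worst case: outage starts just before the Monday publish.
print(crl_gap(scheduled_publish, old_crl_expiry,
              datetime(2024, 1, 1, 21, 0), timedelta(hours=36)))
print(crl_gap(scheduled_publish, old_crl_expiry,
              datetime(2024, 1, 1, 21, 0), timedelta(hours=60)))
```

The worst case is an outage that begins just before a scheduled publish, so the overlap between the new and old CRLs is the figure to compare against the HSM recovery time.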


Crypt32 answered

if our CRL publishing process always releases a new CRL ahead of the old one's expiry by at least the time it takes to recover an HSM, and we don't use the HSM for any other part of the validation process, the OCSP responder service should always have a valid CRL to validate certificates against, right?

That's correct.

I would plan the CRL schedule in conjunction with HSM DR (disaster recovery). You can make your CRL valid for any reasonable period and add the maximum DR time as the overlap period. For example, say the CRL is valid for 3 days (72 hours) and your DR period is 36 hours. You would then set CRLPeriod to 3 days and CRLOverlapPeriod to 36 hours. This will result in:

This Update: Monday, 00:00
Next CRL Publish: Thursday 00:00
Next Update: Friday 12:00

This means that your CRL is effectively valid from Monday 00:00 till Friday 12:00. The next CRL publish will be attempted on Thursday at 00:00. If that publish fails, you have an extra 36 hours to recover the HSM and publish a new CRL before any cached copy of the previous CRL expires. For more information, please check my blog post: How ThisUpdate, NextUpdate and NextCRLPublish are calculated (v2)
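
As a rough illustration of the arithmetic above, the three timestamps can be reproduced with a few lines of Python. This is a simplified sketch, not how the CA computes them internally: it ignores any clock-skew adjustment the CA may apply, and the `crl_timeline` function and the concrete date are assumptions made for the example.

```python
from datetime import datetime, timedelta


def crl_timeline(this_update, crl_period, crl_overlap):
    """Return (this_update, next_publish, next_update) for one base CRL.

    next_publish = this_update + CRLPeriod
    next_update  = next_publish + CRLOverlapPeriod
    """
    next_publish = this_update + crl_period
    next_update = next_publish + crl_overlap
    return this_update, next_publish, next_update


# 2024-01-01 is a Monday; the date itself is arbitrary.
monday = datetime(2024, 1, 1, 0, 0)
this_update, next_publish, next_update = crl_timeline(
    monday,
    crl_period=timedelta(days=3),      # CRLPeriod = 3 days
    crl_overlap=timedelta(hours=36),   # CRLOverlapPeriod = 36 hours
)

print("This Update:     ", this_update.isoformat(sep=" "))
print("Next CRL Publish:", next_publish.isoformat(sep=" "))
print("Next Update:     ", next_update.isoformat(sep=" "))

# The window available for HSM recovery after a failed publish.
print("Recovery window: ", next_update - next_publish)
```

With these inputs, the next publish lands on Thursday 00:00 and the CRL expires on Friday 12:00, matching the schedule listed above.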

