Manage Certificates for an HPC Pack 2016 Update 2 or Later Cluster

An HPC Pack cluster uses several certificates for different purposes. Here is the full list:

Each entry below gives the certificate name, its purpose, and its requirements.

Microsoft HPC Azure Client
    • Purpose: Communication with Azure PaaS deployments.
    • Requirements: Generated automatically when the cluster is set up; the cluster admin does not need to create it. If the certificate is missing from the certificate store, the hpcmanagement service adds it again.

Microsoft HPC Azure Management
    • Purpose: Communication with the Azure management service for Azure PaaS node deployment.
    • Requirements: The admin needs to import this certificate into the Azure subscription as a management certificate. The certificate with its private key is imported in LocalMachine\My.

HPC Pack Communication for Headnode
    • Purpose: Lets cluster services communicate with the services running on the head node.
    • Head node(s): Imported in LocalMachine\My with the private key. If it is a self-signed certificate, it also needs to be imported to LocalMachine\Root with the public key.
    • Windows compute node: If it is a self-signed certificate, it needs to be imported to LocalMachine\Root with the public key.
    • Linux nodes: N/A

HPC Pack Communication for Compute node
    • Purpose: Lets head node services communicate with the services running on the compute node.
    • Requirements: This certificate needs to have the same CN as the certificate above.
    • Head node(s): Imported in LocalMachine\Root if the certificate is self-signed.
    • Windows compute node: Imported to LocalMachine\My with the private key, and imported to LocalMachine\Root if the certificate is self-signed.
    • Linux nodes: Install folder

Certificate for Service Fabric
    • Purpose: Service Fabric cluster.
    • Requirements: For simplicity, the "HPC Pack Communication" certificate is reused during cluster setup. You can add an additional certificate for the Service Fabric cluster.

Certificate on Client
    • Purpose: Authentication and secure communication to the cluster.
    • Requirements: On any client, you need to import the certificate prepared for "HPC Pack Communication for Headnode" to LocalMachine\Root if it is a self-signed certificate. You can skip this step if you choose "Skip CA and CN validation" during client setup or add the following registry value:
      HKLM\SOFTWARE\Microsoft\HPC\CertificateValidationType = 0
    • If you want to manage the cluster from a remote client, you need to import the certificate to LocalMachine\My with the private key and to CurrentUser\Root with the public key, then specify the thumbprint in the following registry value:
      HKLM\SOFTWARE\Microsoft\HPC\0386B1198B956BBAAA4154153B6CA1F44B6D1016 = <thumbprint here>

Prepare new certificate

A Microsoft HPC Pack 2016 cluster requires a Personal Information Exchange (PFX) certificate to secure the communication between the HPC nodes. The certificate must meet the following requirements:

  1. Have a private key capable of key exchange;

  2. Key usage includes Digital Signature and Key Encipherment;

  3. Enhanced key usage includes Client Authentication and Server Authentication;

  4. Certificates used for head node communication and compute node communication must have the same CN.
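
The requirements above can be checked against a candidate certificate before it is rolled out. The following is a minimal sketch, not part of HPC Pack: `Get-HpcCertificateProblem` is a hypothetical helper name, it only confirms that a private key is present (it does not verify that the key supports key exchange), and it does not compare CNs between two certificates.

```powershell
# Hypothetical helper (not part of HPC Pack): returns a list of the requirements
# that a certificate fails to meet. An empty result means all checks passed.
function Get-HpcCertificateProblem {
    param(
        [Parameter(Mandatory)]
        [System.Security.Cryptography.X509Certificates.X509Certificate2]$Certificate
    )

    $problems = @()

    # Requirement 1 (partial check): a private key must be present.
    if (-not $Certificate.HasPrivateKey) {
        $problems += 'No private key'
    }

    # Requirement 2: key usage includes Digital Signature and Key Encipherment.
    $required = [System.Security.Cryptography.X509Certificates.X509KeyUsageFlags]'DigitalSignature, KeyEncipherment'
    $keyUsage = $Certificate.Extensions |
        Where-Object { $_ -is [System.Security.Cryptography.X509Certificates.X509KeyUsageExtension] }
    if (-not $keyUsage -or -not $keyUsage.KeyUsages.HasFlag($required)) {
        $problems += 'Key usage lacks Digital Signature and/or Key Encipherment'
    }

    # Requirement 3: enhanced key usage includes Server and Client Authentication.
    $eku = $Certificate.Extensions |
        Where-Object { $_ -is [System.Security.Cryptography.X509Certificates.X509EnhancedKeyUsageExtension] }
    $ekuOids = @($eku.EnhancedKeyUsages | ForEach-Object { $_.Value })
    $requiredEku = @{
        '1.3.6.1.5.5.7.3.1' = 'Server Authentication'
        '1.3.6.1.5.5.7.3.2' = 'Client Authentication'
    }
    foreach ($oid in $requiredEku.Keys) {
        if ($ekuOids -notcontains $oid) {
            $problems += "Enhanced key usage lacks $($requiredEku[$oid])"
        }
    }

    $problems
}
```

For example, `@(Get-HpcCertificateProblem -Certificate $cert).Count -eq 0` is true when `$cert` passes all of these checks.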

You can generate a self-signed certificate that meets these requirements with the following command and then export it as a PFX file. On Windows 10 or Windows Server 2016, run the built-in New-SelfSignedCertificate cmdlet as follows:

New-SelfSignedCertificate -Subject "CN=HPC Pack 2016 Communication" -KeySpec KeyExchange -TextExtension @("2.5.29.37={text}1.3.6.1.5.5.7.3.1,1.3.6.1.5.5.7.3.2") -CertStoreLocation cert:\CurrentUser\My -KeyExportPolicy Exportable -NotAfter (Get-Date).AddYears(10) -NotBefore (Get-Date).AddDays(-1)
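
The command above leaves the certificate in CurrentUser\My; it still has to be exported. The following is a minimal sketch of that export, assuming the subject used above. `Export-HpcCommunicationCertificate` and the output file names `HpcCommunication.pfx` and `HpcCommunication.cer` are hypothetical and chosen only for illustration.

```powershell
# Hypothetical helper (not part of HPC Pack): exports the self-signed certificate
# as a PFX file (with private key) and a CER file (public part only).
function Export-HpcCommunicationCertificate {
    param(
        [string]$Subject = 'CN=HPC Pack 2016 Communication',
        [Parameter(Mandatory)][securestring]$Password,
        [string]$OutputDirectory = '.'
    )

    # Pick the newest matching certificate that has a private key.
    $cert = Get-ChildItem -Path Cert:\CurrentUser\My |
        Where-Object { $_.Subject -eq $Subject -and $_.HasPrivateKey } |
        Sort-Object -Property NotBefore -Descending |
        Select-Object -First 1
    if (-not $cert) {
        throw "No certificate with subject '$Subject' found in CurrentUser\My."
    }

    # PFX: certificate plus private key, protected by the password.
    Export-PfxCertificate -Cert $cert -Password $Password `
        -FilePath (Join-Path $OutputDirectory 'HpcCommunication.pfx')

    # CER: public part only, for stores that just need to trust the certificate.
    Export-Certificate -Cert $cert `
        -FilePath (Join-Path $OutputDirectory 'HpcCommunication.cer')
}
```

A possible call is `Export-HpcCommunicationCertificate -Password (Read-Host -AsSecureString -Prompt 'PFX password')`, which prompts for the password instead of storing it in the script.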

Refresh certificate on Windows compute node

  1. In HPC Cluster Manager, go to the Deployment To-do List and click Import a certificate for deployment to import the new CN certificate.

  2. Import the new certificate to LocalMachine\My, either manually or with the following PowerShell commands, on all Windows compute nodes, including the head node(s):

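    $PfxCertificatePath = "<your-cert-location>"
    # Import-PfxCertificate expects the password as a SecureString
    $password = ConvertTo-SecureString -String "<your-cert-password>" -AsPlainText -Force
    $cert = Import-PfxCertificate -FilePath $PfxCertificatePath -Exportable -Password $password -CertStoreLocation cert:\localMachine\my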
    

    If you're using a self-signed certificate, you also need to import the certificate into LocalMachine\Root (only the public key is needed there) so that the certificate is trusted. For example:

    Import-PfxCertificate -FilePath $PfxCertificatePath -Password $password -CertStoreLocation cert:\localMachine\root
    
  3. Update the registry value so that the HPC services running on the compute node pick up the new certificate for communication:

    Set-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\HPC -Name SSLThumbPrint -Value $cert.Thumbprint
    
  4. Restart the HPC services on the compute node, including the hpcmanagement, hpcnodemanager, hpcmonitoringclient, and hpcsoadiagmon services.
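
Step 4 can also be scripted. The following is a minimal sketch, assuming the service names listed in step 4 are the Windows service names on the node; `Restart-HpcNodeService` is a hypothetical helper and not part of HPC Pack.

```powershell
# Hypothetical helper (not part of HPC Pack): restarts the HPC services named
# in step 4. Assumption: those names are the Windows service names on the node.
function Restart-HpcNodeService {
    param(
        [string[]]$Name = @('hpcmanagement', 'hpcnodemanager',
                            'hpcmonitoringclient', 'hpcsoadiagmon')
    )

    # Not every service exists on every node role, so lookup errors are
    # suppressed and only services that are currently running are restarted.
    Get-Service -Name $Name -ErrorAction SilentlyContinue |
        Where-Object { $_.Status -eq 'Running' } |
        Restart-Service -Force -PassThru
}
```

Run `Restart-HpcNodeService` from an elevated PowerShell console on the node. The output lists the services that were restarted, which makes it easy to spot one that was skipped because it was not running.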

Tip

You can use clusrun commands to import the certificate on all compute nodes. To do this, download the PowerShell script Update-HpcCommunicationCert.ps1 and copy it to a network share that all HPC nodes can access, for example \\headnode\REMINST. Then, in HPC Cluster Manager, click Resource Management -> Nodes. Select all the compute, broker, and workstation nodes, but not the head node(s), and click Run Command. In the Command line field, enter the following command line (fill in the correct values for headnode and password), and click Run:

```CLI
PowerShell.exe -ExecutionPolicy ByPass -Command "\\<headnode>\REMINST\Update-HpcCommunicationCert.ps1 -PfxFilePath \\<headnode>\REMINST\Certificates\HpcCnCommunication.pfx -Password <password> -RunAsScheduledTask"
```

Refresh certificate on Linux compute node

Use the setup script \\<headnode>\REMINST\LinuxNodeAgent\setup.py to refresh the certificate:

mkdir /LinuxNodeAgent
mount -t cifs //<headnode>/REMINST/LinuxNodeAgent /LinuxNodeAgent -o vers=2.1,username=<username>,dir_mode=0777,file_mode=0777,password='<password>'
cd /LinuxNodeAgent
python setup.py -updatecert -certfile:<certfile> -certpasswd:<certpass>

Refresh certificate for single head node

  1. Prepare the new HN certificate and export it as a PFX file with the private key (for example NewHnCert.pfx). If the new HN certificate is self-signed and different from the certificate you used on the compute nodes, also export it as a CER file without the private key (for example NewHnCert.cer).

  2. If the new HN certificate is self-signed and different from the certificate you used on the compute nodes, do the following steps to make the compute nodes trust the new HN certificate:

    • Copy the NewHnCert.cer to \\<headnode>\REMINST\Certificates\HpcHnPublicCert.cer
    • In HPC Cluster Manager, click Resource Management -> Nodes. Select all the compute, broker, and workstation nodes (do not select the head node), and click Run Command. In the Command line field, enter the following command line (fill in the correct value for headnode), and click Run:
      PowerShell.exe -ExecutionPolicy ByPass -Command "Import-certificate -FilePath \\<headnode>\REMINST\Certificates\HpcHnPublicCert.cer -CertStoreLocation cert:\LocalMachine\Root"
      
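  3. Open a Windows PowerShell console as administrator on the head node and import NewHnCert.pfx to LocalMachine\My with its private key (and to LocalMachine\Root if it is self-signed), in the same way as on the Windows compute nodes. Then run the following command, where $cert is the certificate object returned by Import-PfxCertificate, and reboot the head node or restart all HPC services on the head node: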

    Set-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\HPC -Name SSLThumbPrint -Value $cert.Thumbprint
    
  4. If you are using the Burst to Azure IaaS nodes feature, upload NewHnCert.pfx to Azure Key Vault.

  5. In HPC Cluster Manager, click Configuration -> Set Azure Deployment Configuration, and enter the information for the new Azure Key Vault secret on the Azure Key Vault Certificate page.

Refresh certificate for three head nodes

  1. On every head node, import the new certificate to LocalMachine\My with its private key and to LocalMachine\Root, and update the SSLThumbPrint registry value in the same way as for a single head node in the steps above.

  2. Reboot one head node and wait until it becomes healthy in Service Fabric Explorer before rebooting the next one. When all head nodes have been rebooted, the cluster is using the new certificate.

  3. Remove the old certificate from the system.

Refresh Service Fabric certificate

Upgrade Service Fabric

You may first upgrade the Service Fabric runtime to the latest version. For this, all head nodes must have internet connectivity to http://download.microsoft.com. Then do the following steps on one of your head nodes:

  1. Open a PowerShell console as administrator

  2. Run the following commands to list the available Service Fabric versions. The version installed with HPC Pack 2016 Update 2 should be 6.3.176.9494:

    Connect-ServiceFabricCluster
    Get-ServiceFabricRegisteredClusterCodeVersion
    
  3. Run the following command to upgrade the Service Fabric cluster

    Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion 6.3.176.9494 -Monitored -FailureAction Rollback
    

Refresh certificate for Service Fabric cluster

  1. Log on to one head node and copy C:\ProgramData\SF\ClusterConfig.json to ClusterConfigNew.json.

  2. Edit ClusterConfigNew.json as follows:

    • Change "clusterConfigurationVersion": "1.0.0" to "clusterConfigurationVersion": "1.0.1"

    • Change "apiVersion": "2015-01-01-alpha" to "apiVersion": "01-2017"

    • Update the thumbprints in the following section to the new one:

      "security": {
            "metadata": "The Credential type X509 indicates this is cluster is secured using X509 Certificates. The thumbprint format is - d5 ec 42 3b 79 cb e5 07 fd 83 59 3c 56 b9 d5 31 24 25 42 64.",
            "ClusterCredentialType": "X509",
            "ServerCredentialType": "X509",
            "CertificateInformation": {
              "ClusterCertificate": {
                "Thumbprint": "c440e0a9034372f053503d8e10523a67b1492156",
                "X509StoreName": "My"
              },
              "ServerCertificate": {
                "Thumbprint": "c440e0a9034372f053503d8e10523a67b1492156",
                "X509StoreName": "My"
              }
            }
      }
      
  3. Start the Service Fabric cluster configuration upgrade. Open PowerShell as administrator and run the following commands:

    Connect-ServiceFabricCluster
    Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath .\ClusterConfigNew.json
    

    Then run the following command to query the upgrade status. The upgrade may take several minutes (less than 30 minutes in our testing):

    Get-ServiceFabricClusterConfigurationUpgradeStatus
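
The three edits in step 2 can also be applied with a script instead of by hand. The following is a minimal sketch: `Update-ClusterConfigText` is a hypothetical helper, not part of HPC Pack or Service Fabric. It works on the file text rather than on parsed JSON so that it makes no assumption about where the "security" section sits in the file, and its default version values are the ones given in step 2.

```powershell
# Hypothetical helper: applies the three edits from step 2 to the text of
# ClusterConfig.json and returns the updated text.
function Update-ClusterConfigText {
    param(
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][string]$OldThumbprint,
        [Parameter(Mandatory)][string]$NewThumbprint,
        [string]$ConfigVersion = '1.0.1',
        [string]$ApiVersion = '01-2017'
    )

    # Thumbprints are 40 hex characters; drop spaces or stray characters
    # that were picked up when copying them from the certificate UI.
    $old = $OldThumbprint -replace '[^0-9a-fA-F]', ''
    $new = $NewThumbprint -replace '[^0-9a-fA-F]', ''
    if ($old.Length -ne 40 -or $new.Length -ne 40) {
        throw 'Both thumbprints must contain exactly 40 hexadecimal characters.'
    }
    if ($Text -notmatch [regex]::Escape($old)) {
        throw "Thumbprint $old was not found in the configuration text."
    }

    # Replace the two version values, keeping the key and its spacing.
    $Text = $Text -replace '("clusterConfigurationVersion"\s*:\s*)"[^"]*"', ('${1}"' + $ConfigVersion + '"')
    $Text = $Text -replace '("apiVersion"\s*:\s*)"[^"]*"', ('${1}"' + $ApiVersion + '"')

    # Replace every occurrence of the old thumbprint (case-insensitive).
    $Text -replace [regex]::Escape($old), $new
}

# Example usage (paths as in step 1; the thumbprints are placeholders):
# $text = Get-Content -Raw C:\ProgramData\SF\ClusterConfig.json
# Update-ClusterConfigText -Text $text -OldThumbprint '<old>' -NewThumbprint '<new>' |
#     Set-Content .\ClusterConfigNew.json
```

Review the resulting ClusterConfigNew.json before starting the configuration upgrade, since the helper changes text only and does not validate the file against Service Fabric.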