HPC 2019 Linux worker node relocation error

ZoliSz 1 Reputation point
2022-06-30T12:21:17.673+00:00

Hi all,

We have an HPC 2019 deployment on-prem.
When trying to add an Ubuntu 20.04 worker to the cluster, the hpcagent service starts, but the node does not join the cluster because nodemanager keeps crashing.

hpclinuxagent.log:
ERROR:HPC node manager process crashes: 127
Restart HPC node manager process after 60 seconds

When starting the node manager manually I can see the following error:

relocation error: /lib/x86_64-linux-gnu/libnss_files.so.2: symbol __libc_readline_unlocked version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
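
For anyone comparing against a similar message, the loader error above can be split into its parts: the library that needs the symbol, the symbol and its version tag, and the file that was expected to provide it. The following is only a minimal sketch; the function name `parse_relocation_error` and the regular expression are illustrative assumptions written for this one message format, not part of HPC Pack or glibc.

```python
import re

# Assumed pattern for glibc loader messages of the form quoted above.
# It is written for this message only and may not match other variants.
RELOC_RE = re.compile(
    r"relocation error: (?P<referrer>\S+): "
    r"symbol (?P<symbol>\S+) version (?P<version>\S+) "
    r"not defined in file (?P<provider>\S+) with link time reference"
)


def parse_relocation_error(line):
    """Return the parts of a relocation error as a dict, or None if no match."""
    match = RELOC_RE.search(line)
    return match.groupdict() if match else None


# The exact message from the post, split across lines for readability.
msg = (
    "relocation error: /lib/x86_64-linux-gnu/libnss_files.so.2: "
    "symbol __libc_readline_unlocked version GLIBC_PRIVATE "
    "not defined in file libc.so.6 with link time reference"
)

info = parse_relocation_error(msg)
for key, value in info.items():
    print(f"{key}: {value}")
```

Here the referrer is the system `libnss_files.so.2` and the provider is `libc.so.6`. `GLIBC_PRIVATE` symbols are internal to glibc and may differ between glibc versions, so one possible reading (an assumption, not confirmed in this thread) is that the process loaded a `libc.so.6` from a different glibc build than the system's NSS module. Running `ldd` on the node manager binary and checking `LD_LIBRARY_PATH` would show which `libc.so.6` is actually resolved.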

Versions:
HPC: HPC 2019 6.1.7531.0
OS: Ubuntu 20.04.4 LTS
libc: libc6/focal,now 2.31-0ubuntu9.9

Before this we used the same node in HPC 2016 without issues.
Also tried installing a completely fresh Ubuntu and got the same error.

Any ideas on what is causing this and how to fix it?

Thanks,
Zoltan

Azure HPC Cache

1 answer

  1. vipullag-MSFT 24,196 Reputation points Microsoft Employee
    2022-07-13T11:24:31.617+00:00

    @ZoliSz

    Apologies for the delayed response on this.

    I checked with the internal team and got confirmation that the current Linux node manager in HPC Pack 2019 cannot run on Ubuntu 20.04.

    Yes, you are correct that this works on HPC 2016.

    Hope that helps.
    If the suggested response helped you resolve your issue, please 'Accept as answer', so that it can help others in the community looking for help on similar topics.