Known issues with H-series and N-series VMs

This article attempts to list recent common issues and their solutions when using the H-series and N-series HPC and GPU VMs.

Cache topology on Standard_HB120rs_v3

lstopo displays incorrect cache topology on the Standard_HB120rs_v3 VM size. It may display only 32 MB of L3 per NUMA domain. However, in practice there is indeed 120 MB of L3 per NUMA domain as expected, since the same 480 MB of L3 for the entire VM is available as with the other constrained-core HBv3 VM sizes. This is a cosmetic error in displaying the correct value and should not impact workloads.
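
As a quick check (a minimal sketch, assuming the hwloc package that provides lstopo is installed), you can list the L3 caches that the guest reports:

# Print one line per L3 cache reported to the guest (requires hwloc)
lstopo-no-graphics | grep -i "L3"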

qp0 Access Restriction

To prevent low-level hardware access that can result in security vulnerabilities, Queue Pair 0 is not accessible to guest VMs. This should only affect actions typically associated with administration of the ConnectX InfiniBand NIC, and running some InfiniBand diagnostics such as ibdiagnet, but not end-user applications.
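
Basic link-state queries do not need Queue Pair 0; as a hedged sketch, commands such as the following are still expected to work inside the guest:

# Show local InfiniBand port state and rate (does not require QP0 access)
ibstat
# List device attributes as seen by the guest
ibv_devinfo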

MOFED installation on Ubuntu

On Ubuntu-18.04 based marketplace VM images with kernel versions 5.4.0-1039-azure #42 and newer, some older Mellanox OFED versions are incompatible, causing an increase in VM boot time of up to 30 minutes in some cases. This has been reported for both Mellanox OFED versions 5.2-1.0.4.0 and 5.2-2.2.0.0. The issue is resolved with Mellanox OFED 5.3-1.0.0.1. If it is necessary to use the incompatible OFED, a solution is to use the Canonical:UbuntuServer:18_04-lts-gen2:18.04.202101290 marketplace VM image or older and not to update the kernel.
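
To see which combination you have (a minimal sketch; ofed_info ships with the Mellanox OFED installation), compare the running kernel against the installed OFED version:

# Running kernel version
uname -r
# Installed Mellanox OFED version (available once MLNX_OFED is installed)
ofed_info -s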

MPI QP creation errors

If InfiniBand QP creation errors such as the one shown below are thrown in the midst of running any MPI workloads, we suggest rebooting the VM and re-trying the workload. This issue will be fixed in the future.

ib_mlx5_dv.c:150  UCX  ERROR mlx5dv_devx_obj_create(QP) failed, syndrome 0: Invalid argument

When the issue is observed, you can verify the value of the maximum number of queue pairs as follows.

[user@azurehpc-vm ~]$ ibv_devinfo -vv | grep qp
max_qp: 4096

Accelerated Networking on HB, HC, HBv2, and NDv2

Azure Accelerated Networking is now available on the RDMA and InfiniBand capable and SR-IOV enabled VM sizes HB, HC, HBv2, and NDv2. This capability now allows enhanced throughput (up to 30 Gbps) and latencies over the Azure Ethernet network. Though this is separate from the RDMA capabilities over the InfiniBand network, some platform changes for this capability may impact the behavior of certain MPI implementations when running jobs over InfiniBand. Specifically, the InfiniBand interface on some VMs may have a slightly different name (mlx5_1 as opposed to the earlier mlx5_0), and this may require tweaking of the MPI command lines, especially when using the UCX interface (commonly with OpenMPI and HPC-X). The simplest solution currently may be to use the latest HPC-X on the CentOS-HPC VM images or disable Accelerated Networking if not required. More details, with instructions on how to address any observed issues, are available in this TechCommunity article.
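
As an illustrative sketch (the device name mlx5_1:1 and the application name are assumptions; check the actual device name reported on your VM first), the UCX transport can be pinned to the InfiniBand interface explicitly on the MPI command line:

# List the InfiniBand device names visible to the guest
ibstat -l
# Example with OpenMPI/HPC-X: pin UCX to the IB device reported above
mpirun -np 16 -x UCX_NET_DEVICES=mlx5_1:1 ./my_mpi_app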

InfiniBand driver installation on non-SR-IOV VMs

Currently H16r, H16mr, and NC24r are not SR-IOV enabled. Some details on the InfiniBand stack bifurcation are here. InfiniBand can be configured on the SR-IOV enabled VM sizes with the OFED drivers, while the non-SR-IOV VM sizes require ND drivers. This IB support is available appropriately for CentOS, RHEL, and Ubuntu.
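
As a hedged check (not an official test), SR-IOV enabled sizes typically expose a Mellanox Virtual Function to the guest, which you can look for with lspci:

# On SR-IOV enabled sizes, a Mellanox ConnectX Virtual Function is typically listed
lspci | grep -i mellanox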

Duplicate MAC with cloud-init with Ubuntu on H-series and N-series VMs

There is a known issue with cloud-init on Ubuntu VM images as it tries to bring up the IB interface. This can happen either on VM reboot or when trying to create a VM image after generalization. The VM boot logs may show an error like so:

“Starting Network Service...RuntimeError: duplicate mac found! both 'eth1' and 'ib0' have mac”.

This 'duplicate MAC with cloud-init on Ubuntu' is a known issue. This will be resolved in newer kernels. If the issue is encountered, the workaround is:

  1. Deploy the (Ubuntu 18.04) marketplace VM image
  2. Install the necessary software packages to enable IB (instructions here)
  3. Edit waagent.conf to change EnableRDMA=y (a sample command is sketched after this list)
  4. Disable networking in cloud-init
    echo network: {config: disabled} | sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
    
  5. Edit netplan's networking configuration file generated by cloud-init to remove the MAC
    sudo bash -c "cat > /etc/netplan/50-cloud-init.yaml" <<'EOF'
    network:
      ethernets:
        eth0:
          dhcp4: true
      version: 2
    EOF
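
For step 3 above, a minimal sketch assuming the agent configuration file is at /etc/waagent.conf and the RDMA option is present but commented out (verify the exact key name and path on your image):

# Uncomment/enable the RDMA option in the Azure Linux agent configuration (assumed key name)
sudo sed -i 's/^# *OS.EnableRDMA=y/OS.EnableRDMA=y/' /etc/waagent.conf
# Confirm the resulting setting
grep EnableRDMA /etc/waagent.conf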
    

DRAM on HB-series VMs

HB-series VMs can only expose 228 GB of RAM to guest VMs at this time. Similarly, HBv2 VMs expose 458 GB and HBv3 VMs 448 GB. This is due to a known limitation of the Azure hypervisor to prevent pages from being assigned to the local DRAM of AMD CCXs (NUMA domains) reserved for the guest VM.
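
You can confirm how much memory is actually exposed to the guest with free (a trivial check; output varies by VM size):

# Total memory visible to the guest, in gigabytes
free -g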

GSS Proxy

GSS Proxy has a known bug in CentOS/RHEL 7.5 that can manifest as a significant performance and responsiveness penalty when used with NFS. This can be mitigated with:

sed -i 's/GSS_USE_PROXY="yes"/GSS_USE_PROXY="no"/g' /etc/sysconfig/nfs

Cache Cleaning

On HPC systems, it is often useful to clean up the memory after a job has finished, before the next user is assigned the same node. After running applications in Linux you may find that your available memory reduces while your buffer memory increases, despite not running any applications.

Screenshot of the command prompt before cleaning

Using numactl -H will show which NUMA node(s) the memory is buffered with (possibly all). In Linux, users can clean the caches in three ways to return buffered or cached memory to 'free'. You need to be root or have sudo permissions.

echo 1 > /proc/sys/vm/drop_caches [frees page-cache]
echo 2 > /proc/sys/vm/drop_caches [frees slab objects e.g. dentries, inodes]
echo 3 > /proc/sys/vm/drop_caches [cleans page-cache and slab objects]
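
Note that with sudo the redirection above runs as the unprivileged user and fails with a permission error; a minimal sketch of a form that works under sudo:

# Flush dirty pages to disk first, then drop page cache and slab objects (requires root)
sync
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'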

Screenshot of the command prompt after cleaning

Kernel warnings

You may ignore the following kernel warning messages when booting an HB-series VM under Linux. This is due to a known limitation of the Azure hypervisor that will be addressed over time.

[  0.004000] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:376 topology_sane.isra.3+0x80/0x90
[  0.004000] sched: CPU #4's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[  0.004000] Modules linked in:
[  0.004000] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-957.el7.x86_64 #1
[  0.004000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 05/18/2018
[  0.004000] Call Trace:
[  0.004000] [<ffffffffb8361dc1>] dump_stack+0x19/0x1b
[  0.004000] [<ffffffffb7c97648>] __warn+0xd8/0x100
[  0.004000] [<ffffffffb7c976cf>] warn_slowpath_fmt+0x5f/0x80
[  0.004000] [<ffffffffb7c02b34>] ? calibrate_delay+0x3e4/0x8b0
[  0.004000] [<ffffffffb7c574c0>] topology_sane.isra.3+0x80/0x90
[  0.004000] [<ffffffffb7c57782>] set_cpu_sibling_map+0x172/0x5b0
[  0.004000] [<ffffffffb7c57ce1>] start_secondary+0x121/0x270
[  0.004000] [<ffffffffb7c000d5>] start_cpu+0x5/0x14
[  0.004000] ---[ end trace 73fc0e0825d4ca1f ]---

Next steps