Azure VMSS NVMe disk mount issue

Fanfan Wu 0 Reputation points
2024-03-12T04:40:00.6533333+00:00

Hi, we are hitting an NVMe disk mount issue in our VMSS cluster, and it occurs frequently.

We created a directory named "/ssd", mounted the "/dev/nvme0n1" device on it, and run Docker containers under this path. However, we observe that the mount disappears without any operation on our side. This happens very frequently and is blocking our services. We have two questions:

  1. What is the root cause of the missing mount?
  2. What should we do to prevent this issue?

More info:

vmss cluster name: prod-nam1-eastus-00-runner

subscription id:*******

environment: production

issue frequency:

[image not preserved]

affected VM names:

prod-nam1-eastus-00-runner-00000J

prod-nam1-eastus-00-runner-00000C

prod-nam1-eastus-00-runner-00001K

prod-nam1-eastus-00-runner-00000B


1 answer

  1. Nehruji R 2,126 Reputation points Microsoft Vendor
    2024-03-12T09:29:36.14+00:00

    Hello Fanfan Wu,

    Greetings! Welcome to Microsoft Q&A Forum.

    A mount that frequently disappears without any direct operation on your side can have several potential root causes. Here are some possibilities to check:

    Disk Resizing and VMSS Updates: When you expand a data disk (such as your NVMe disk), it’s essential to ensure that both the VMSS model and the individual VM instances are aware of the new disk size. Sometimes, the VMSS model might not propagate the updated disk size to all instances immediately.

    Azure Maintenance and Live Migration: VMSS instances might undergo planned maintenance or live migration. During these processes, the local NVMe disk might be temporarily unavailable, leading to mount issues.

    Filesystem Errors: If there are filesystem errors or inconsistencies on the NVMe disk, it could cause mounting problems.

    Resource Constraints: Insufficient resources (CPU, memory, etc.) on the VMSS instances could impact disk operations.
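
    To narrow down which of these causes applies, it may help to record exactly when the mount disappears on each instance. The following is a minimal sketch in Python, not an official tool: the helper name `find_mount` and the sample data (including the `xfs` filesystem type) are hypothetical, and only the "/ssd" path and "/dev/nvme0n1" device come from the question. It parses text in the `/proc/mounts` format and reports which device, if any, is mounted at a given path.

    ```python
    from typing import Optional


    def find_mount(mounts_text: str, mount_point: str) -> Optional[str]:
        """Return the device mounted at mount_point, or None if nothing is mounted there.

        mounts_text uses the /proc/mounts layout, one mount per line:
        "<device> <mount point> <fstype> <options> <dump> <pass>".
        """
        device = None
        for line in mounts_text.splitlines():
            fields = line.split()
            if len(fields) >= 2 and fields[1] == mount_point:
                device = fields[0]  # a later entry shadows an earlier one
        return device


    # Hypothetical sample data for illustration; on a real instance, pass
    # the content of /proc/mounts instead, e.g. open("/proc/mounts").read().
    SAMPLE = (
        "/dev/sda1 / ext4 rw,relatime 0 0\n"
        "/dev/nvme0n1 /ssd xfs rw,noatime 0 0\n"
    )

    print(find_mount(SAMPLE, "/ssd"))   # device currently backing /ssd
    print(find_mount(SAMPLE, "/data"))  # None: nothing mounted at this path
    ```

    Running such a check periodically and logging a timestamp whenever it returns `None` gives you a time window that you can compare against maintenance events, reboots, and kernel logs.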

    To prevent the frequent mount issues, consider the following steps:

    • Ensure Proper Disk Expansion: When expanding the NVMe disk, follow these steps:
        • Update the VMSS model to reflect the new disk size using the az vmss update command.
        • Deallocate and restart the VMSS instances (not just stop/start). This ensures that the updated disk size is recognized by all instances.
        • Verify that the expanded size is correctly displayed on the instances.
    • Monitor Azure Maintenance Events: Keep an eye on Azure maintenance events. If instances are being migrated or updated, it might impact disk availability.
    • Check for Filesystem Errors: Regularly check the filesystem integrity on the NVMe disk. Use tools like fsck or xfs_repair to identify and fix any issues.
    • Resource Monitoring: Monitor resource utilization (CPU, memory, disk I/O) on the VMSS instances. Ensure that there are no resource bottlenecks affecting disk operations.
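
    For the maintenance-monitoring step, one option is the Scheduled Events endpoint of the Azure Instance Metadata Service, which can be queried from inside a VM. The sketch below assumes that Scheduled Events is available for your instances; the function names and the sample document (including the resource name "example-vmss_3") are hypothetical, so check the resource names against real output from your own scale set before relying on the filter.

    ```python
    import json
    import urllib.request

    # Instance Metadata Service endpoint for Scheduled Events.
    # It is only reachable from inside an Azure VM.
    EVENTS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"


    def fetch_events(timeout: float = 5.0) -> dict:
        """Fetch the current Scheduled Events document (call this on the VM itself)."""
        request = urllib.request.Request(EVENTS_URL, headers={"Metadata": "true"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)


    def events_for_instance(document: dict, instance_name: str) -> list:
        """Return (type, status, not-before) tuples for events that list instance_name."""
        return [
            (e.get("EventType"), e.get("EventStatus"), e.get("NotBefore"))
            for e in document.get("Events", [])
            if instance_name in e.get("Resources", [])
        ]


    # Hypothetical sample document, shaped like a Scheduled Events response.
    sample = {
        "DocumentIncarnation": 1,
        "Events": [
            {
                "EventId": "hypothetical-event-id",
                "EventType": "Redeploy",
                "ResourceType": "VirtualMachine",
                "Resources": ["example-vmss_3"],
                "EventStatus": "Scheduled",
                "NotBefore": "Mon, 11 Mar 2024 22:00:00 GMT",
            }
        ],
    }

    print(events_for_instance(sample, "example-vmss_3"))
    ```

    On an instance you would call `events_for_instance(fetch_events(), name)` on a timer and log the result. If the logged events line up with the times at which "/ssd" goes missing, maintenance is a likely cause.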

     

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.