I am seeing the following issue on an AKS Kubernetes cluster with >100 nodes: I have every node lock the same file, which is located on a shared AzureFile mount, and the 100th node to request the lock gets back errno 122 ("Disk quota exceeded"). (I am doing this so a 100-node computation platform can parse a dataset in parallel.)
volumes:
- azureFile:
    readOnly: false
    secretName: azure-secret
    shareName: aksshare
This always happens on exactly the 100th node to request the concurrent file lock, and I have never seen it with fewer than 100 nodes, so I am assuming there is some hard limit on the number of concurrent locks allowed.
Specifically, I was curious whether anyone has seen this, and whether there is some configuration setting that could be increased to allow more concurrent locks.
To simplify the scenario, I wrote a small C program (shown below) that reproduces the problem.
(Other data points:
* I did try having 100 pids on a single Kubernetes node lock the same file on an Azure Files mount, and that worked.
* I also had the 100 separate Kubernetes nodes each lock a hostPath file, and that worked. (I would expect that to work, since those are different files on each host, but it was a sanity check.)
)
Simple repro-program:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct flock fltest = {0};   /* l_start/l_len of 0 => lock whole file */
    fltest.l_type = F_RDLCK;
    fltest.l_whence = SEEK_SET;

    int fd = open(argv[1], O_RDONLY, 0);
    printf("opened file: %s, fd:%d errno:%d\n", argv[1], fd, errno);
    if (fd < 0)
        return 1;

    int irc = fcntl(fd, F_SETLK, &fltest);
    printf("locked file: %s, %d errno %d\n", argv[1], irc, errno);
    if (irc == 0)
    {
        printf("Waiting 30 seconds while holding lock.\n");
        sleep(30);
    }

    fltest.l_type = F_UNLCK;
    irc = fcntl(fd, F_SETLK, &fltest);
    printf("Lock released irc:%d errno %d\n", irc, errno);
    return 0;
}