How to recover when an Azure batch pool fails to allocate nodes

Green, Jim 20 Reputation points
2024-05-06T17:14:28.05+00:00

Our application uses several types of Batch pools. There are two fixed pools (for tasks with and without MPI) and a continuous web job that manages scaling every minute. For larger tasks, we create a new pool with up to 20 nodes. Occasionally Batch fails to allocate all of the nodes, so the task does not start, but the pool remains in an error state indefinitely. What would be the best way to recover from this so that the waiting task can run? Would it be more efficient or robust to allocate all large-task nodes in a single pool instead of creating a new pool per task?

Thanks.

Azure Batch

Accepted answer
  1. vipullag-MSFT 25,126 Reputation points
    2024-05-07T10:21:56.29+00:00

    Hello Green, Jim

    Welcome to the Microsoft Q&A platform, and thanks for posting your query here.

    If Batch fails to allocate all nodes in a pool, the pool stays in the error state until you take action. To recover, resize the pool to a smaller size and, once that resize has completed, resize it back to the original size. This triggers a new allocation attempt for the nodes that failed to allocate earlier. If that does not work, delete the pool and create a new one with the same configuration.
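The resize-then-recreate procedure above could be automated roughly as follows. This is only a minimal sketch under stated assumptions: `PoolStatus`, `wait_until_steady`, `recover_pool` and the three callables (`get_status`, `resize`, `recreate`) are hypothetical names, not part of any Azure SDK. In a real application they would be thin wrappers around whatever Batch client you use to read the pool's allocation state and resize errors, resize the pool, and delete and recreate it. The choice of half the target as the "smaller size" is also an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PoolStatus:
    """Hypothetical snapshot of the pool fields the recovery logic needs."""
    allocation_state: str    # "steady", "resizing" or "stopping"
    current_nodes: int       # nodes actually allocated
    resize_error_count: int  # errors reported by the last resize


def wait_until_steady(get_status: Callable[[], PoolStatus],
                      poll_seconds: float, max_polls: int,
                      sleep: Callable[[float], None]) -> PoolStatus:
    """Poll until the pool is no longer resizing; a new resize is only
    issued once the previous one has finished."""
    for _ in range(max_polls):
        status = get_status()
        if status.allocation_state == "steady":
            return status
        sleep(poll_seconds)
    raise TimeoutError("pool did not reach the steady allocation state")


def recover_pool(get_status: Callable[[], PoolStatus],
                 resize: Callable[[int], None],
                 recreate: Callable[[], None],
                 target_nodes: int,
                 max_attempts: int = 3,
                 poll_seconds: float = 30.0,
                 max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Shrink and regrow the pool up to max_attempts times, then fall back
    to recreating it. Returns "ready" or "recreated"."""

    def is_ready(status: PoolStatus) -> bool:
        return (status.resize_error_count == 0
                and status.current_nodes >= target_nodes)

    smaller = max(1, target_nodes // 2)  # assumed "smaller size"
    for _ in range(max_attempts):
        status = wait_until_steady(get_status, poll_seconds, max_polls, sleep)
        if is_ready(status):
            return "ready"
        resize(smaller)                   # step 1: shrink
        wait_until_steady(get_status, poll_seconds, max_polls, sleep)
        resize(target_nodes)              # step 2: ask for the full size again

    status = wait_until_steady(get_status, poll_seconds, max_polls, sleep)
    if is_ready(status):
        return "ready"
    recreate()                            # last resort: delete and recreate
    return "recreated"
```

Passing the operations in as callables keeps the retry logic independent of the SDK version and makes it easy to exercise with a fake pool. If the function returns "recreated", the caller still needs to wait for the new pool to allocate before the waiting task can run.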

    Regarding your second question: yes, it would be more efficient and robust to allocate all large-task nodes in a single pool instead of creating a new pool per task. This avoids the overhead of creating and deleting a pool for each task and makes your resources easier to manage. You can use the same pool for multiple tasks and resize it as needed to accommodate the workload, which also helps you avoid individual pools being left in an error state.
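For the shared-pool approach, a scaling job like the one described in the question might compute the pool's target size from the tasks that are currently waiting. The following is only an illustrative sketch: `shared_pool_target` is a hypothetical helper, and the default cap of 20 nodes is an assumption borrowed from the per-task pool size mentioned in the question.

```python
from typing import Iterable


def shared_pool_target(pending_node_requests: Iterable[int],
                       min_nodes: int = 0,
                       max_nodes: int = 20) -> int:
    """Return the node count a shared pool should be resized to.

    pending_node_requests holds the number of nodes each waiting
    large task needs; the sum is clamped to [min_nodes, max_nodes].
    """
    wanted = sum(pending_node_requests)
    return max(min_nodes, min(max_nodes, wanted))
```

For example, two waiting tasks that need 8 and 6 nodes give a target of 14, while requests that exceed the cap are limited to `max_nodes`.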

    Hope this helps.

