Hello Green, Jim
Welcome to Microsoft Q&A Platform, thanks for posting your query here.
If Batch cannot allocate all the nodes you requested for a pool, the resize operation finishes with errors (reported in the pool's resize errors) and the pool stays short of its target until you act on it. In that case, try resizing the pool down to a smaller size and then back up to the original target. This triggers a fresh allocation attempt for the nodes that failed to allocate earlier. If that still doesn't work, delete the pool and create a new one with the same configuration.
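If you drive Batch from your own code, the shrink-and-regrow workaround can be scripted. Below is a minimal Python sketch under stated assumptions: `get_pool`, `resize`, `PoolSnapshot` and the pool ID are hypothetical names, injected so the logic can be tested without an Azure account. With the `azure-batch` Python SDK, `get_pool` could wrap `batch_client.pool.get(pool_id)` (reading `allocation_state`, `current_dedicated_nodes` and `resize_errors`), and `resize` could wrap `batch_client.pool.resize(pool_id, PoolResizeParameter(target_dedicated_nodes=n))`. Please check those names against the SDK version you use.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PoolSnapshot:
    # Hypothetical minimal view of a pool; build it from the SDK's pool object.
    allocation_state: str  # e.g. "steady" or "resizing"
    current_nodes: int
    resize_errors: List[str] = field(default_factory=list)


GetPool = Callable[[str], PoolSnapshot]
Resize = Callable[[str, int], None]


def wait_for_steady(get_pool: GetPool, pool_id: str,
                    poll_seconds: float, timeout_seconds: float) -> PoolSnapshot:
    """Poll until the pool has finished resizing, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        snapshot = get_pool(pool_id)
        if snapshot.allocation_state == "steady":
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pool {pool_id!r} did not reach steady state")
        time.sleep(poll_seconds)


def shrink_then_regrow(get_pool: GetPool, resize: Resize, pool_id: str,
                       target: int, shrink_to: int,
                       poll_seconds: float = 15.0,
                       timeout_seconds: float = 1800.0) -> PoolSnapshot:
    """Resize down to `shrink_to`, then back up to `target` to retry allocation."""
    if not 0 <= shrink_to < target:
        raise ValueError("shrink_to must be in [0, target)")
    # Let any in-flight resize finish before issuing a new one.
    wait_for_steady(get_pool, pool_id, poll_seconds, timeout_seconds)
    resize(pool_id, shrink_to)
    wait_for_steady(get_pool, pool_id, poll_seconds, timeout_seconds)
    resize(pool_id, target)
    # The caller compares current_nodes with target; if the pool is still
    # short, the fallback is to delete and recreate the pool.
    return wait_for_steady(get_pool, pool_id, poll_seconds, timeout_seconds)
```

The returned snapshot tells you whether the retry worked: if `current_nodes` is still below `target` and `resize_errors` is non-empty, move on to recreating the pool.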
Regarding your second question, it is more efficient and robust to allocate all large-task nodes in a single pool than to create a new pool per task. Reusing one pool avoids the overhead of creating and deleting a pool for every task and makes your resources easier to manage: you can run multiple tasks on the same pool and resize it as the workload changes. Because nodes are allocated once rather than for every task, you also hit allocation failures less often, and when one does occur you have a single pool to resize instead of many partially allocated ones.
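To illustrate sizing one shared pool to the workload instead of creating a pool per task, here is a hypothetical sizing rule in Python. It assumes each node can run `task_slots_per_node` large tasks at once (1 if every task needs a whole node) and that you cap the pool at `max_nodes` to stay within your quota. `nodes_needed` and `plan_resize` are illustrative names, not part of any Batch SDK; the value they return would be passed to the pool's resize call.

```python
from typing import Optional


def nodes_needed(pending_tasks: int, task_slots_per_node: int, max_nodes: int) -> int:
    """Enough nodes to run every pending task at once, capped at max_nodes."""
    if pending_tasks < 0 or task_slots_per_node < 1 or max_nodes < 0:
        raise ValueError("invalid sizing inputs")
    # Integer ceiling division: a partially filled node still counts as one node.
    return min(max_nodes, -(-pending_tasks // task_slots_per_node))


def plan_resize(current_target: int, pending_tasks: int,
                task_slots_per_node: int, max_nodes: int) -> Optional[int]:
    """Return a new target node count, or None if the pool is already sized right."""
    wanted = nodes_needed(pending_tasks, task_slots_per_node, max_nodes)
    return None if wanted == current_target else wanted
```

Skipping the resize when the target is unchanged avoids issuing a new allocation request (and its chance of failure) when nothing needs to change.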
Hope this helps.