When I create a Batch pool, a subset of the nodes becomes unusable without reporting any errors. Sometimes Batch reschedules these nodes and they start successfully; other times they remain unusable. I'm not sure what could be going wrong, since we have a pretty simple setup:
dedicated node count=1
low priority node count=79
node start task=None
We deploy our code with a custom Docker image, which works well and hasn't caused node startup issues before. Similar posts about unusable nodes generally trace the problem to application package or VM image issues, which aren't at play here.
I'm not sure where to begin troubleshooting, so any help or suggestions would be appreciated!
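One possible starting point is to tally the nodes by state and pull out whatever error details the unusable ones carry. The sketch below is only an illustration under assumptions: it assumes this is Azure Batch and that node objects expose `id`, `state`, and `errors` attributes, as the `azure-batch` Python SDK's compute node objects do. The helper name `summarize_nodes` and the sample data are hypothetical, and the sample stands in for a real node listing so the snippet runs without credentials.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_nodes(nodes):
    """Count nodes per state and collect (id, errors) for unusable ones.

    `nodes` is any iterable of objects with `id`, `state`, and (optionally)
    `errors` attributes. Assumption: with the azure-batch SDK these would
    come from something like `batch_client.compute_node.list(pool_id)`.
    """
    counts = Counter()
    unusable = []
    for node in nodes:
        # SDK states are enum members with a `.value`; plain strings are not.
        state = getattr(node.state, "value", node.state)
        counts[state] += 1
        if state == "unusable":
            # `errors` may be missing or None when nothing was reported.
            errors = getattr(node, "errors", None) or []
            unusable.append((node.id, errors))
    return counts, unusable


# Hypothetical stand-in data, not output from a real pool.
sample_nodes = [
    SimpleNamespace(id="node-1", state="idle", errors=None),
    SimpleNamespace(id="node-2", state="unusable", errors=None),
    SimpleNamespace(id="node-3", state="unusable", errors=["example error"]),
]

counts, unusable = summarize_nodes(sample_nodes)
print(dict(counts))
for node_id, errors in unusable:
    print(node_id, errors if errors else "no errors reported")
```

If the unusable nodes consistently show an empty error list, that at least confirms the "no errors" symptom at the API level rather than only in the portal.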