I'm trying to run an R script in parallel on Azure Batch. The script uses the xgboost package, pulls CSV files from my Azure Storage account, and writes its output as a CSV file back into the same storage account. After a node boots, it literally takes hours for my tasks to go from queued to running. This happens whether I use dedicated nodes or low-priority nodes. If I add .packages = c('xgboost') to my foreach call, RStudio tells me "Job Preparation Status: Package(s) being installed", and that step takes literally hours to finish, even if I only have 1 node in the pool.
I understand cloud computing resources take time to boot up, but is it supposed to take hours, or is there something I can do to speed it up? It takes about 20 minutes for my PC to run 1 task on its own.