Azure Databricks Spark not performing
Hi All,
My requirement is to process approximately 1 TB of data stored in an Azure container. The container holds millions of multi-part JSON files.
For this I was using HDInsight, which processed the data in approximately 45 minutes with this configuration:
Worker nodes (1-4, autoscale) - 16 cores, 112 GB
Head nodes (2) - 4 cores, 28 GB
We planned to migrate to an Azure Databricks Spark cluster.
The cluster configuration used:
Worker nodes (4-10, autoscale) - 8 cores, 56 GB, memory optimized
Head nodes - 4 cores, 28 GB
However, the job ran for more than 2.5 hours without completing. I can see that it uses the 4 worker nodes to the maximum, but it does not scale up to use the remaining worker nodes to speed up the process.
Can anyone help me figure out whether I am doing something wrong here?
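For reference, here is a rough tally of worker capacity using only the node sizes listed above. It is a sketch, not a benchmark: it ignores the head/driver nodes, it assumes the HDInsight cluster had scaled to its 4-worker maximum, and the helper name `worker_capacity` is made up for illustration.

```python
def worker_capacity(workers, cores_per_worker, mem_gb_per_worker):
    """Return (total cores, total memory in GB) across worker nodes only."""
    return workers * cores_per_worker, workers * mem_gb_per_worker


# HDInsight at its autoscale maximum (assumed: 4 workers, 16 cores, 112 GB each)
hdinsight_max = worker_capacity(4, 16, 112)

# Databricks at the observed 4 workers and at the configured maximum of 10
databricks_at_4 = worker_capacity(4, 8, 56)
databricks_at_10 = worker_capacity(10, 8, 56)

for name, (cores, mem_gb) in [
    ("HDInsight, 4 workers", hdinsight_max),
    ("Databricks, 4 workers", databricks_at_4),
    ("Databricks, 10 workers", databricks_at_10),
]:
    print(f"{name}: {cores} cores, {mem_gb} GB")
```

If the Databricks cluster stays at 4 workers, it has half the worker cores and memory that HDInsight had at its maximum, which may account for part of the difference in run time.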