Hi All,
My requirement was to process approximately 1 TB of data stored in an Azure storage container. The container holds millions of JSON files, which are multi-part in nature.
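For context, this is roughly how the data is read in Spark (a simplified sketch; the storage account, container name, and path below are placeholders, not the real ones):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder ABFS path; the real job points at the actual storage account and container
input_path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/json-data/"

# Read the millions of JSON part files under the container path into a DataFrame
df = spark.read.json(input_path)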
For this I was using HDInsight, which was able to process the data in approximately 45 minutes with the following configuration:
Worker nodes (1-4, autoscale): 16 cores, 112 GB
Head nodes (2): 4 cores, 28 GB
We planned to migrate to an Azure Databricks Spark cluster. The configuration of the cluster used is below (a rough sketch of the equivalent cluster spec follows the list):
Worker nodes (4-10, autoscale): 8 cores, 56 GB, memory optimized
Head (driver) node: 4 cores, 28 GB
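For reference, here is that cluster definition written out as the payload I believe the Databricks Clusters API would use (the cluster was actually created through the UI, so the exact node SKUs and runtime version below are my best guess):

# Approximate cluster spec as a Python dict; Standard_DS13_v2 (8 cores, 56 GB) and
# Standard_DS12_v2 (4 cores, 28 GB) are the SKUs that match the sizes listed above,
# and the runtime version is indicative only.
cluster_spec = {
    "cluster_name": "json-processing-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS13_v2",         # workers: 8 cores, 56 GB, memory optimized
    "driver_node_type_id": "Standard_DS12_v2",  # driver: 4 cores, 28 GB
    "autoscale": {"min_workers": 4, "max_workers": 10},
}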
However, this keeps running for more than 2.5 hours and the process still does not complete. I can see that it uses at most 4 worker nodes and never scales up to leverage the remaining workers to speed up the process.
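In case it helps with the diagnosis: my understanding is that autoscaling only adds workers when there are enough pending tasks, so I am checking the parallelism of the read along these lines (df is the DataFrame from the sketch above):

# Number of input partitions = maximum number of tasks that can run in parallel
# for the read stage; if this is low, autoscaling has no reason to add workers
print(df.rdd.getNumPartitions())

# Setting that controls how input files are packed into partitions (default 128 MB)
print(spark.conf.get("spark.sql.files.maxPartitionBytes"))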
Can anyone help me figure out whether I am doing something wrong here?