
91925677 asked ·

Getting different results when I run a Notebook in Data Factory vs manually.

Hi,

I have a pipeline with seven Notebooks, all of which execute different SQL scripts and generate CSV files. Two Notebooks work correctly, but the other five create CSV files with headers only (no rows). However, when I run those Notebooks manually, they generate CSV files with both headers and rows. I tried running the Notebooks in different pipelines, but the problem persists. I'm not sure what causes it, and I couldn't find a solution. Lastly, I'm using Spark 3.

azure-data-factory · azure-databricks

Hello @UfuktepeEren-9284,

Welcome to the Microsoft Q&A platform.

This issue needs deeper investigation. Could you please share the details below:

  • In order to repro the scenario, could you share the notebook for which you are seeing the different results?

  • Could you share the exact steps you are performing when running manually vs. via the Notebook activity in Data Factory?

In case you need immediate assistance on this issue, you may file a support ticket.

Regards,
PRADEEPCHEEKATLA-MSFT





Hello @UfuktepeEren-9284,

Just checking in to see whether you have had a chance to review the previous response. We need the requested information to investigate this issue further.

91925677 replied ·

Hello, sorry for the late response. I created a support ticket for this issue last week and sent them the required information.


1 Answer

MartinJaffer-MSFT answered ·

Summary of issues:

The results of a Databricks notebook were different when run directly from Databricks, versus Data Factory calling Databricks.

These notebooks used temporary tables. Temporary tables are scoped to the Spark session on the cluster that created them, so they do not survive a cluster shutdown.
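To illustrate the scoping, here is a minimal Spark SQL sketch (the table and view names are hypothetical, invented for this example):

```sql
-- A temporary view exists only in the Spark session that creates it.
CREATE OR REPLACE TEMPORARY VIEW staged_orders AS
SELECT * FROM raw_orders WHERE order_date >= '2021-01-01';

-- In the same session, this returns rows as expected.
SELECT COUNT(*) FROM staged_orders;

-- After the cluster shuts down (e.g. a fresh job cluster per pipeline
-- run), staged_orders no longer exists: a later query against it in a
-- new session fails, or downstream exports produce empty results.
```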

The Databricks Notebook activity in Data Factory was configured to create a new job cluster for each run. Because every run started on a fresh cluster, the temporary tables were not retained between notebook runs.

After changing the Data Factory settings to use an existing interactive cluster, the results of running directly from Databricks and running via Data Factory matched, because the temporary tables persisted on the retained cluster.
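The cluster choice lives on the Azure Databricks linked service in Data Factory. A hedged sketch of the relevant fragment (the workspace URL, token reference, and cluster ID are placeholders; exact property names should be checked against the current ADF schema):

```json
{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "accessToken": { "type": "SecureString", "value": "<databricks-token>" },
      "existingClusterId": "0123-456789-abcd123"
    }
  }
}
```

Specifying `existingClusterId` attaches every notebook run to the same interactive cluster; using the new-cluster properties (such as `newClusterVersion` and `newClusterNodeType`) instead spins up a fresh job cluster per run, which is where session-scoped state like temporary tables is lost.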
