
0 Votes
mayurikadam asked PRADEEPCHEEKATLA-MSFT commented

AnalysisException: Path does not exist: dbfs:/databricks/python/lib/python3.7/site-packages/sampleFolder/data;

I am packaging the following code in a whl file:

 from pkg_resources import resource_filename
 from pyspark.sql import DataFrame

 def path_to_model(anomaly_dir_name: str, data_path: str) -> str:
     # Resolve a data path bundled inside the installed package
     filepath = resource_filename(anomaly_dir_name, data_path)
     return filepath

 def read_data(spark) -> DataFrame:
     # Read the parquet files bundled under sampleFolder/data
     return spark.read.parquet(str(path_to_model("sampleFolder", "data")))

I confirmed that the whl file contains the parquet files under the sampleFolder/data/ directory. When I run this locally it works, but when I upload the whl file to DBFS and run it there, I get this error:

 AnalysisException: Path does not exist: dbfs:/databricks/python/lib/python3.7/site-packages/sampleFolder/data;

I confirmed that this directory does not actually exist: dbfs:/databricks/python. Any idea what could be causing this error?

Thanks.



azure-databricks

1 Vote
mayurikadam answered PRADEEPCHEEKATLA-MSFT commented

The issue was twofold:

  1. When you package a Python module in a whl file and deploy it to a Databricks job, you need to specify the 'file:' scheme to access data files inside the installed package with Spark. If the scheme is left unspecified, Spark defaults to 'dbfs:' and looks for the data files in DBFS, where they do not exist. The path has to point at the local file system where the whl was installed.

  2. When using UDFs in a Python whl, do not use decorators. Decorators work well when testing in a notebook because the Spark session is already available. In a whl they fail at runtime, because the decorated UDF definitions are evaluated when the module is imported, before the Spark session has been initialized.
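
A minimal sketch of both points, assuming the package layout from the question (sampleFolder/data). The helper names `to_local_file_uri`, `normalize_name` and `add_normalized_name`, and the column names, are hypothetical and only for illustration. It also assumes the whl is installed at the same local path on every node that reads the files.

```python
def to_local_file_uri(path: str) -> str:
    """Prefix an absolute local path with 'file:' so Spark does not
    resolve it against its default file system (DBFS on Databricks)."""
    if path.startswith(("file:", "dbfs:")):
        return path  # a scheme is already present, leave it alone
    return "file:" + path


def read_data(spark, package: str = "sampleFolder", resource: str = "data"):
    # Imported lazily so the helper above has no setuptools dependency.
    from pkg_resources import resource_filename

    local_path = resource_filename(package, resource)
    return spark.read.parquet(to_local_file_uri(local_path))


def normalize_name(value):
    """Plain Python function with no decorator, so importing this
    module never touches Spark."""
    return value.strip().lower() if value is not None else None


def add_normalized_name(df, source_col: str = "name",
                        target_col: str = "name_normalized"):
    # Wrap the function as a UDF at call time, when a Spark session exists.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    normalize_name_udf = udf(normalize_name, StringType())
    return df.withColumn(target_col, normalize_name_udf(col(source_col)))
```

Keeping the UDF body as a plain function also means it can be unit-tested without a Spark session.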



Hello @mayurikadam,

Glad to know that your issue has been resolved, and thanks for sharing the solution, which may help other community members reading this thread.

1 Vote
PRADEEPCHEEKATLA-MSFT answered PRADEEPCHEEKATLA-MSFT commented

Hello @mayurikadam,

Thanks for the question and for using the Microsoft Q&A platform.

You are seeing this error message because the path does not exist.

Make sure you have uploaded the file to DBFS, and pass the exact path of the whl file.

 Spark API Format - dbfs:/sampleFolder/data
 File API Format - /dbfs/sampleFolder/data
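
The two formats above refer to the same DBFS location and differ only in prefix. As a rough sketch, the conversion can be written as below; the function names `to_file_api_path` and `to_spark_api_path` are hypothetical, not part of any Databricks API.

```python
def to_file_api_path(spark_path: str) -> str:
    """Convert 'dbfs:/x/y' (Spark API format) to '/dbfs/x/y' (file API format)."""
    prefix = "dbfs:/"
    if not spark_path.startswith(prefix):
        raise ValueError("expected a path starting with 'dbfs:/': " + spark_path)
    return "/dbfs/" + spark_path[len(prefix):].lstrip("/")


def to_spark_api_path(file_path: str) -> str:
    """Convert '/dbfs/x/y' (file API format) to 'dbfs:/x/y' (Spark API format)."""
    prefix = "/dbfs/"
    if not file_path.startswith(prefix):
        raise ValueError("expected a path starting with '/dbfs/': " + file_path)
    return "dbfs:/" + file_path[len(prefix):]
```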

You may check out the answer provided by @Alex on your SO thread.

Hope this helps. Do let us know if you have any further queries.


Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


Hello @mayurikadam,

Just checking in to see if the above answer helped. If it answers your query, please click Accept Answer and Up-Vote. If you have any further queries, do let us know.


Hello @mayurikadam,

Following up to see if the above suggestion was helpful. If you have any further queries, do let us know.
