Data profiling on Azure Synapse using PySpark

Shivank.Agarwal 61 Reputation points
2021-06-01T08:06:33.313+00:00

I am trying to profile data in a Synapse database using PySpark. I was able to create a connection and load the data into a DataFrame.

import spark_df_profiling

report = spark_df_profiling.ProfileReport(jdbcDF)

But I am getting the error below.

101248-image.png


Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 77,676 Reputation points Microsoft Employee
    2021-06-02T06:15:35.29+00:00

    Hello @Shivank.Agarwal ,

    Welcome to the Microsoft Q&A platform.

    This looks like an issue with the package spark-df-profiling 1.1.13.

    The package spark-df-profiling 1.1.13 is quite old. Could you please try the newer package, spark-df-profiling-new 1.1.14?

    101565-image.png

    To reproduce the issue, I tried both versions.

    The Spark2 cluster has spark-df-profiling 1.1.13 installed. When I execute the query there, I get an error message similar to the one you showed.

    101559-image.png

    The Spark1 cluster has spark-df-profiling-new 1.1.14 installed. When I execute the query there, I get the output successfully.

    101602-image.png
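Before switching packages, it can help to confirm which of the two distributions is actually installed on the cluster. The snippet below is a minimal sketch that uses only the Python standard library (Python 3.8+); the helper name `installed_version` is hypothetical, and the two distribution names are the ones mentioned in this answer.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the answer above.
candidates = ("spark-df-profiling", "spark-df-profiling-new")
report = {name: installed_version(name) for name in candidates}

for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If the old distribution shows up, installing spark-df-profiling-new through the cluster's usual package management and restarting the session is presumably the next step. Both distributions appear to provide the same `spark_df_profiling` import name, so removing the old one first is likely safer.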

    Hope this helps. Do let us know if you have any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


1 additional answer

  1. Gladys Cardozo 1 Reputation point
    2021-07-30T19:23:19.16+00:00

    Hi all! I tried what you explained and it works. My problem is that I don't know how to read the object I obtained: <spark_df_profiling.ProfileReport object at 0x7fa1008dfb38>. I tried to save it to Azure Blob storage, but I don't know what I'm doing wrong. I tried this because I wanted to explore the generated HTML, but it seems Azure doesn't recognize it.

    How do you open the result after running the profiling library?

    Kind regards,

    Gladys.
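One possible way to inspect such an object is to pull the HTML out of it and write it to a file, or hand the string to the notebook's HTML display function. The sketch below assumes the report object exposes a `rendered_html()` method (full page) or an `html` attribute, which is how the spark_df_profiling `ProfileReport` appears to work; verify this against the installed version. The helper `save_report_html` and the `StandInReport` class are hypothetical and exist only so the example runs without a Spark cluster.

```python
import tempfile
from pathlib import Path


def save_report_html(report, path):
    """Write a profiling report's HTML to `path` and return the path."""
    # Assumption: the report has rendered_html() or, failing that, an .html attribute.
    if hasattr(report, "rendered_html"):
        html = report.rendered_html()
    else:
        html = report.html
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out


class StandInReport:
    """Hypothetical stand-in for a ProfileReport, used only for this demo."""

    def rendered_html(self):
        return "<html><body><h1>Profile</h1></body></html>"


demo_path = save_report_html(
    StandInReport(), Path(tempfile.gettempdir()) / "profile_report.html"
)
print(demo_path.read_text(encoding="utf-8"))
```

With a real report, the same helper would be called as `save_report_html(report, "/tmp/profile_report.html")`, and the file could then be copied to Blob storage from the driver's local disk. In a Synapse or Databricks notebook, passing the HTML string to `displayHTML(...)` may be a quicker way to view it inline.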
