Very slow, long-running simple query

Morpheuss 1 Reputation point
2021-08-30T08:51:12.857+00:00

from azureml.opendatasets import NycTlcGreen

data = NycTlcGreen()
df = data.to_spark_dataframe()

# Display 10 rows

display(df.limit(10))

This ran for over 40 minutes without ever finishing. Configuration: 8 vCores / 64 GB, 3 nodes.

Any help would be appreciated. Nothing is in the queue, there is no previous job, and the Spark pool uses the basic configuration.

Many thanks for any hint.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 78,576 Reputation points Microsoft Employee
    2021-08-31T07:27:29.487+00:00

    Hello @Morpheuss ,

    Welcome to the Microsoft Q&A platform.

    We haven't experienced the above behaviour with Synapse Apache Spark pools to date.

    This issue looks strange. For a deeper investigation and immediate assistance, you may file a support ticket if you have a support plan.

    I tested this on my end on a Synapse Apache Spark pool: Medium (8 vCores / 64 GB).

    On a new cluster, it took nearly 3 minutes 20 seconds.

    127833-image.png

    On an already running cluster, it took just 15 seconds.

    127851-image.png
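
If the full history is not needed, one thing that may be worth trying is to restrict the dataset to a date range with the `start_date` and `end_date` parameters of `NycTlcGreen`, and to time the cell so the result can be compared with the figures above. The following is only a minimal sketch, not a confirmed fix for the behaviour in the question: the helper names `date_window` and `load_green_taxi_sample` and the seven-day window are assumptions chosen for illustration.

```python
import time
from datetime import datetime, timedelta


def date_window(start, days):
    """Return a (start, end) pair covering `days` days from `start` (hypothetical helper)."""
    return start, start + timedelta(days=days)


def load_green_taxi_sample(start_date, end_date, n_rows=10):
    """Load a date-bounded slice of NycTlcGreen and time the first `n_rows` rows.

    Hypothetical helper: intended for a Synapse notebook attached to a Spark pool.
    """
    # Imported lazily so the helper can be defined outside a Spark session.
    from azureml.opendatasets import NycTlcGreen

    started = time.perf_counter()
    # Bounding the dates avoids asking for the whole dataset.
    data = NycTlcGreen(start_date=start_date, end_date=end_date)
    df = data.to_spark_dataframe()
    rows = df.limit(n_rows).collect()
    elapsed_seconds = time.perf_counter() - started
    return rows, elapsed_seconds
```

In a notebook cell this could be called as `rows, secs = load_green_taxi_sample(*date_window(datetime(2018, 5, 1), 7))`. If even a small window runs far longer than the timings above, that would suggest the problem lies with the pool or session rather than the data volume.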

    Hope this helps. Do let us know if you have any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.