
RyanAbbey-0701 asked RyanAbbey-0701 commented

Synapse Spark pipeline - choosing cluster

Say I have some small files and some very large files, all being processed via Synapse pipeline calls to Spark. How do I specify that the small files should run on a small cluster and the very large files on a bigger cluster? There does not seem to be much documentation available on where Spark sessions should run.

azure-synapse-analytics

1 Answer

PRADEEPCHEEKATLA-MSFT answered RyanAbbey-0701 commented

Hello @RyanAbbey-0701,

Thanks for the question and using MS Q&A platform.

Unfortunately, there is no built-in mechanism to prioritize jobs based on file size.

What Azure Synapse does provide out of the box in Apache Spark pools is autoscaling.

Apache Spark pools provide the ability to automatically scale up and down compute resources based on the amount of activity.

  • When the autoscale feature is enabled, you can set the minimum and maximum number of nodes to scale.

  • When the autoscale feature is disabled, the number of nodes set will remain fixed.
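To make the two modes above concrete, here is a minimal Python sketch that builds only the scaling part of a pool definition as a plain dictionary. The property names (`autoScale`, `enabled`, `minNodeCount`, `maxNodeCount`, `nodeCount`) follow my understanding of the Spark pool resource definition and should be checked against the linked documentation; the helper `pool_scaling_properties` is hypothetical and calls no Azure API.

```python
def pool_scaling_properties(autoscale, node_count=None, min_nodes=None, max_nodes=None):
    """Return the scaling settings of a Spark pool as a plain dict.

    Hypothetical helper for illustration only; property names are an
    assumption about the pool resource definition.
    """
    if autoscale:
        # Autoscale enabled: the pool scales between a minimum and a maximum.
        if min_nodes is None or max_nodes is None or min_nodes > max_nodes:
            raise ValueError("autoscale needs min_nodes <= max_nodes")
        return {
            "autoScale": {
                "enabled": True,
                "minNodeCount": min_nodes,
                "maxNodeCount": max_nodes,
            }
        }
    # Autoscale disabled: the number of nodes stays fixed.
    if node_count is None:
        raise ValueError("a fixed-size pool needs node_count")
    return {"autoScale": {"enabled": False}, "nodeCount": node_count}
```

For example, `pool_scaling_properties(True, min_nodes=3, max_nodes=10)` describes an autoscaling pool, while `pool_scaling_properties(False, node_count=5)` describes a fixed-size one.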

For more details, refer to Apache Spark pool configurations in Azure Synapse Analytics and Automatically scale Azure Synapse Analytics Apache Spark pools.

Hope this helps. Do let us know if you have any further queries.


Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


Apache Spark pools are what we configure, correct? Currently I have three pools defined. How do I direct the pipeline to use a specific pool?


Hello @RyanAbbey-0701,

Every Spark activity in a Synapse pipeline is associated with one of the Apache Spark pools.

Example: when you add a Synapse Notebook or Spark job definition activity to a pipeline, that activity is associated with an Apache Spark pool.
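As a rough sketch of how that association could be used for the small-file/large-file case, the Python below picks a pool name from a file size and builds a notebook activity definition that references it. It assumes the notebook activity JSON carries a `sparkPool` reference of type `BigDataPoolReference`; please verify this against the JSON view of your own pipeline. The pool names, size thresholds, and both helper functions are hypothetical.

```python
import json

# Hypothetical pools and size limits: files up to the limit go to that pool.
POOLS_BY_MAX_BYTES = [
    (100 * 1024**2, "smallpool"),   # up to 100 MiB
    (10 * 1024**3, "mediumpool"),   # up to 10 GiB
]
DEFAULT_POOL = "largepool"          # anything bigger


def choose_pool(file_size_bytes):
    """Return the name of the first pool whose size limit fits the file."""
    for max_bytes, pool_name in POOLS_BY_MAX_BYTES:
        if file_size_bytes <= max_bytes:
            return pool_name
    return DEFAULT_POOL


def notebook_activity(activity_name, notebook_name, pool_name):
    """Build a notebook activity definition bound to one Spark pool.

    The shape below is an assumption, not an official schema.
    """
    return {
        "name": activity_name,
        "type": "SynapseNotebook",
        "typeProperties": {
            "notebook": {"referenceName": notebook_name, "type": "NotebookReference"},
            "sparkPool": {"referenceName": pool_name, "type": "BigDataPoolReference"},
        },
    }


if __name__ == "__main__":
    pool = choose_pool(5 * 1024**3)  # a 5 GiB file
    print(json.dumps(notebook_activity("ProcessFile", "ProcessFileNotebook", pool), indent=2))
```

The same idea could be expressed inside the pipeline itself, for instance by branching on file size and placing one notebook activity per pool in each branch.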

RyanAbbey-0701 replied to PRADEEPCHEEKATLA-MSFT:

I have no idea what that means, sorry... I have 3 spark pools, can I choose, from the pipeline, which of those the notebook should run against? If so, how?
