RyanAbbey-0701 asked

Synapse Spark consumption

We have a series of executions to carry out in Spark, for which we have created a small Spark pool with min/max 3/10 nodes. We have observed that this configuration allows at most 3 notebooks to execute at once (in practice it does not appear to be more than 2, but that is not part of this issue). All the notebook processes start up at around the same time, but all except the first 2 or 3 are effectively in a "waiting" state.

Each of our notebooks takes approx. 2 minutes to run, so the tail-end notebooks appear to have taken 20-30 minutes even though their actual execution time is under 2 minutes. In the "consumption" box that comes with a Spark execution, those tail-end notebooks show a very high "External activities" value.
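To illustrate the waiting effect, here is a rough sketch of the arithmetic. It assumes every notebook is submitted at the same moment and that they run in strict first-come batches; the function name, the batch model, and the figure of 30 notebooks are hypothetical choices for illustration, not a description of how Synapse actually schedules sessions.

```python
def apparent_durations(num_notebooks, concurrency, run_minutes):
    """Return a (wait_minutes, apparent_minutes) pair per notebook.

    Assumes all notebooks are submitted at t=0 and run in fixed-size
    batches of `concurrency`, each taking `run_minutes` (a simplification).
    """
    results = []
    for i in range(num_notebooks):
        batch = i // concurrency          # how many batches run before this one
        wait = batch * run_minutes        # time spent queued, not executing
        results.append((wait, wait + run_minutes))
    return results


# Hypothetical workload: 30 notebooks, 3 at a time, 2 minutes each.
durations = apparent_durations(num_notebooks=30, concurrency=3, run_minutes=2)
last_wait, last_total = durations[-1]
print(f"Last notebook: waited {last_wait} min, appeared to take {last_total} min")
```

Under these assumptions the last notebook executes for only 2 minutes but appears to take 20, which is in line with the 20-30 minute figures we are seeing.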

199062-image.png

So what I'm trying to understand is: even though those notebooks are not actively running, are they generating cost? If they are all using the one Spark pool, should we expect the cost to be purely the number of active nodes on that pool for the duration they are running, rather than a cost associated with the individual consumption values?

We don't want to run these notebooks sequentially, and we don't want a Spark pool with 50/100 nodes just to avoid waiting notebooks, so seeing all of these notebooks with (relatively) high consumption values is a little disconcerting.


azure-synapse-analytics

0 Answers