question

RyanAbbey-0701 asked MartinJaffer-MSFT answered

Optimize Delta Lake

Can a Synapse delta lake (Spark 2.4, delta 0.6) table be optimised in a similar vein to a Databricks delta table?

If not, what optimisation options are available? We've loaded a lot of small files and found the deterioration quite significant (as we're only talking 80 files, the deterioration is pretty appalling, really!)

azure-synapse-analytics

1 Answer

MartinJaffer-MSFT answered

Hello again @RyanAbbey-0701 and welcome back.

The Databricks optimization documentation describes two main features: Z-ordering and compaction.

Compaction is part of the base open-source "Delta" package and is not unique to Databricks, so it should also be available in Synapse.
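To make that concrete: open-source Delta (since 0.5) supports compaction as a plain rewrite with the `dataChange` option set to `false`, so the small files are coalesced without the rewrite being treated as new data. A minimal sketch, assuming an active Spark session with Delta configured; the path, file count, and the 256 MB target size in the helper are illustrative assumptions:

```python
def compact_delta_table(path, num_files):
    """Rewrite a Delta table into num_files larger files.

    dataChange=false marks the commit as a compaction-only rewrite,
    so downstream streaming readers are not re-triggered.
    Requires a Spark environment with the Delta package loaded.
    """
    from pyspark.sql import SparkSession  # deferred: needs a Spark runtime
    spark = SparkSession.builder.getOrCreate()
    (spark.read.format("delta").load(path)
          .repartition(num_files)
          .write
          .option("dataChange", "false")
          .format("delta")
          .mode("overwrite")
          .save(path))

def target_file_count(total_bytes, target_file_bytes=256 * 1024 * 1024):
    """Ceil-divide the table size by a target file size (256 MB assumed here)
    to pick num_files for the compaction above."""
    return max(1, -(-total_bytes // target_file_bytes))
```

For example, 80 files of roughly 10 MB each (about 800 MB total) would compact down to 4 files at the assumed 256 MB target.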

I am less certain about the Z-ordering. It may be a Databricks-specific addition to Delta.

Aside from that, partitioning and file-size tuning are available on both Synapse and Databricks.

Ah, I just found another optimization: auto-optimize during write. Can you confirm which optimization you are asking about?



Thanks for the response... the compaction was the one I was thinking of, and I'm looking into whether that resolves our issue.

Z-ordering doesn't look to be an option (currently, anyway) and the auto-optimize seems a bit up in the air... would the following work in Synapse?

set spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true;
set spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true;
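One way to probe whether those Databricks-namespaced keys are even recognised (they may simply be ignored by open-source Delta 0.6) is to read them back from the session configuration. A minimal sketch; `check_auto_optimize` is a hypothetical helper, and `spark` is the active `SparkSession`:

```python
# The two Databricks auto-optimize keys from the question above.
AUTO_OPTIMIZE_KEYS = (
    "spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite",
    "spark.databricks.delta.properties.defaults.autoOptimize.autoCompact",
)

def check_auto_optimize(spark):
    """Return each auto-optimize setting's current value,
    or 'unset' if the session has no value for that key."""
    return {k: spark.conf.get(k, "unset") for k in AUTO_OPTIMIZE_KEYS}
```

Note that `spark.conf.get` returning a value only shows the key was stored, not that the Synapse Delta writer actually honours it; comparing file counts before and after a write is the real test.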


I'm not sure. Let's try it out together!
