Vacuum

Clean up files associated with a table. There are different versions of this command for Apache Spark and Delta tables.

Vacuum a Spark table

VACUUM ([db_name.]table_name|path) [RETAIN num HOURS]

RETAIN num HOURS

The retention threshold, in hours. Uncommitted files older than this threshold are removed.

Recursively vacuum directories associated with the Spark table and remove uncommitted files older than a retention threshold. The default threshold is 7 days. Azure Databricks automatically triggers VACUUM operations as data is written. See Clean up uncommitted files for more information.
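As a minimal sketch of the Spark table syntax above, the following statements vacuum a table with and without an explicit threshold. The database name sales_db and the table name events are hypothetical placeholders, not names from this reference.

```sql
-- Hypothetical names: sales_db and events are placeholders.

-- Remove uncommitted files older than the default 7-day threshold.
VACUUM sales_db.events;

-- Remove uncommitted files older than 24 hours.
VACUUM sales_db.events RETAIN 24 HOURS;
```

Because Azure Databricks triggers VACUUM automatically as data is written to Spark tables, a manual run like this is usually only needed for an explicit cleanup.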

Vacuum a Delta table (Delta Lake on Azure Databricks)

VACUUM [db_name.]table_name|path [RETAIN num HOURS] [DRY RUN]

Recursively vacuum directories associated with the Delta table and remove files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. The default threshold is 7 days. VACUUM operations on Delta tables are not triggered automatically. See Vacuum for more information.

If you run VACUUM on a Delta table, you lose the ability to time travel back to a version older than the default 7-day data retention period.

RETAIN num HOURS

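The retention threshold, in hours. Files that are no longer in the latest state of the transaction log and are older than this threshold are removed.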

DRY RUN

Return a list of files to be deleted, without deleting them.
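The Delta table options can be combined as in the sketch below. The table name events and the path /data/events are hypothetical placeholders, and the quoted-string form of the path is an assumption about how a path-based table is referenced.

```sql
-- Hypothetical names: events and /data/events are placeholders.

-- List the files that would be removed under the default 7-day threshold.
VACUUM events DRY RUN;

-- Remove files that are no longer in the latest table state
-- and are older than 240 hours (10 days).
VACUUM events RETAIN 240 HOURS;

-- Vacuum a path-based Delta table (quoted path assumed).
VACUUM '/data/events';
```

Running the DRY RUN form first is a cautious way to check which files a real VACUUM would remove, since time travel to versions that depend on removed files is no longer possible afterwards.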