Best Practices for Dropping a Managed Delta Lake Table
Regardless of how you drop a managed table, it can take a significant amount of time, depending on the data size. Delta Lake managed tables in particular contain a lot of metadata in the form of transaction logs, and they can contain duplicate data files. If a Delta table has been in use for a long time, it can accumulate a very large amount of data.
In the Azure Databricks environment, there are two ways to drop tables:
- Run DROP TABLE in a notebook cell.
- Click Delete in the UI.
Even though you can delete tables in the background without affecting workloads, it is good practice to run DELETE FROM and VACUUM before you drop any table. This cleans up the table's metadata and data files before you initiate the actual data deletion.
For example, if you are trying to delete the Delta table events, run the following commands before you start the DROP TABLE command:
- Run DELETE FROM:
DELETE FROM events
- Run VACUUM with an interval of zero:
VACUUM events RETAIN 0 HOURS
These two steps reduce the amount of metadata and number of uncommitted files that would otherwise increase the data deletion time.
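The cleanup and drop sequence can also be scripted from a Python notebook cell. The following is a minimal sketch, not an official Databricks utility: cleanup_and_drop and its run_sql parameter are hypothetical names chosen for illustration. It relies on one fact not stated above: Delta Lake rejects a VACUUM retention shorter than 7 days (168 hours) unless the session setting spark.databricks.delta.retentionDurationCheck.enabled is false. The sketch assumes that setting starts at its default of true and restores it afterwards. A zero-hour retention is only reasonable here because the table is about to be dropped and nothing else should be reading or writing it.

```python
from typing import Callable

# Delta Lake session setting that guards against short VACUUM retention periods.
RETENTION_CHECK = "spark.databricks.delta.retentionDurationCheck.enabled"


def cleanup_and_drop(table: str, run_sql: Callable[[str], object]) -> None:
    """Empty a Delta table, vacuum its files, then drop it.

    `run_sql` is any callable that executes one SQL statement
    (hypothetical parameter; in a notebook this would be `spark.sql`).
    """
    # Step 1: remove all rows so the data files become unreferenced.
    run_sql(f"DELETE FROM {table}")

    # Step 2: VACUUM with zero retention; the safety check must be off for this.
    run_sql(f"SET {RETENTION_CHECK} = false")
    try:
        run_sql(f"VACUUM {table} RETAIN 0 HOURS")
    finally:
        # Assumption: the check was enabled (the default) before this call.
        run_sql(f"SET {RETENTION_CHECK} = true")

    # Step 3: drop the now much smaller table.
    run_sql(f"DROP TABLE {table}")
```

In a Databricks notebook, where a spark session is predefined, this could be called as cleanup_and_drop("events", spark.sql). Passing the executor as a parameter keeps the helper free of a Spark dependency, so the statement order can be checked with a stand-in function before it touches a real table.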