Vacuum

Clean up files associated with a table. This command has different versions for Apache Spark tables and Delta tables.

Vacuum a Spark table

VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS]

RETAIN num HOURS

The retention threshold, in hours.

Recursively vacuum directories associated with the Spark table and remove uncommitted files that are older than the retention threshold. The default threshold is 7 days. Azure Databricks automatically triggers VACUUM operations as data is written. See Clean up uncommitted files.
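As a rough illustration of the rule above, the following Python sketch models a Spark-table vacuum as a filter over file records: a file is removed only if it is uncommitted and older than the threshold. The `SparkFile` record, its `committed` and `written_at` fields, and the use of write time as a file's age are assumptions made for illustration, not a description of how Azure Databricks tracks files internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_RETAIN_HOURS = 7 * 24  # default threshold: 7 days = 168 hours


@dataclass
class SparkFile:
    path: str
    committed: bool       # hypothetical: True if a committed write produced this file
    written_at: datetime  # hypothetical: timestamp used here to measure the file's age


def spark_files_to_vacuum(files, now, retain_hours=DEFAULT_RETAIN_HOURS):
    """Return paths of uncommitted files older than the retention threshold."""
    cutoff = now - timedelta(hours=retain_hours)
    return [f.path for f in files if not f.committed and f.written_at < cutoff]


now = datetime(2024, 1, 10)
files = [
    SparkFile("part-0", True, datetime(2024, 1, 1)),   # committed: always kept
    SparkFile("part-1", False, datetime(2024, 1, 1)),  # uncommitted and 9 days old
    SparkFile("part-2", False, datetime(2024, 1, 9)),  # uncommitted but only 1 day old
]
print(spark_files_to_vacuum(files, now))  # ['part-1']
```

Lowering `retain_hours` moves the cutoff closer to `now`, so more recent uncommitted files become eligible.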

Vacuum a Delta table (Delta Lake on Azure Databricks)

VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS] [DRY RUN]

Recursively vacuum directories associated with the Delta table and remove data files that are no longer in the latest state of the table's transaction log and are older than the retention threshold. Files are deleted based on the time they were logically removed from Delta's transaction log plus the retention period, not on their modification timestamps in the storage system.
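The Delta rule keys on when a file was logically removed from the transaction log, not on its storage modification time. The following Python sketch contrasts the two. The `DeltaFile` record and its `removed_at` field (`None` while the file is still part of the latest table state) are hypothetical stand-ins for information that Delta Lake keeps in its transaction log; the exact boundary comparison used by Delta Lake is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DeltaFile:
    path: str
    modified_at: datetime           # storage-system timestamp; the rule ignores it
    removed_at: Optional[datetime]  # hypothetical: when the log marked the file removed; None if live


def delta_files_to_vacuum(files, now, retain_hours=168):
    """Return paths whose logical removal time plus retention has elapsed."""
    retention = timedelta(hours=retain_hours)
    return [
        f.path
        for f in files
        # Files in the latest table state (removed_at is None) are never eligible.
        if f.removed_at is not None and f.removed_at + retention < now
    ]


now = datetime(2024, 1, 31)
files = [
    # Written long ago but still live: kept.
    DeltaFile("a.parquet", datetime(2023, 6, 1), None),
    # Old on storage, but removed from the log only 2 days ago: kept.
    DeltaFile("b.parquet", datetime(2023, 6, 1), datetime(2024, 1, 29)),
    # Removed from the log 10 days ago: eligible.
    DeltaFile("c.parquet", datetime(2024, 1, 15), datetime(2024, 1, 21)),
]
print(delta_files_to_vacuum(files, now))  # ['c.parquet']
```

Note that `b.parquet` survives even though its storage timestamp is months old, because the retention clock starts at its logical removal.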

The default threshold is 7 days. Azure Databricks does not automatically trigger VACUUM operations on Delta tables. See Remove files no longer referenced by a Delta table.

If you run VACUUM on a Delta table, you lose the ability to time travel back to a version older than the specified data retention period.
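To see why time travel is lost, consider a simplified model in which each table version lists the data files needed to read it, and a version stays readable only while all of those files still exist. The version-to-files mapping and the `readable_versions` helper below are hypothetical and ignore everything else Delta Lake stores.

```python
def readable_versions(version_files, deleted):
    """Return the versions whose data files all survived a vacuum."""
    deleted = set(deleted)
    return sorted(v for v, files in version_files.items() if not set(files) & deleted)


version_files = {
    0: ["c.parquet"],               # old version; depends on a file that was vacuumed
    1: ["b.parquet"],
    2: ["a.parquet", "b.parquet"],
    3: ["a.parquet"],               # latest version
}
print(readable_versions(version_files, ["c.parquet"]))  # [1, 2, 3]
```

Once `c.parquet` is physically deleted, version 0 can no longer be reconstructed, which is exactly the time-travel capability that VACUUM gives up.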

RETAIN num HOURS

The retention threshold, in hours.

DRY RUN

Return a list of files that would be deleted, without deleting them.
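When scripting maintenance jobs, the Delta syntax shown earlier can be assembled programmatically. The helper below is a hypothetical Python sketch that only formats a statement string following `VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS] [DRY RUN]`; it does not run anything, it assumes the caller passes the table name or path already in the form the SQL dialect expects, and `sales.events` is an illustrative table name.

```python
from typing import Optional


def build_vacuum_statement(target: str, retain_hours: Optional[int] = None,
                           dry_run: bool = False) -> str:
    """Format a VACUUM statement for a table name or an already-quoted path."""
    if retain_hours is not None and retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    parts = ["VACUUM", target]
    if retain_hours is not None:
        parts.append(f"RETAIN {retain_hours} HOURS")
    if dry_run:
        parts.append("DRY RUN")  # Delta tables: list the files instead of deleting them
    return " ".join(parts)


# Preview what a 7-day (168-hour) vacuum of a Delta table would delete.
print(build_vacuum_statement("sales.events", retain_hours=168, dry_run=True))
# VACUUM sales.events RETAIN 168 HOURS DRY RUN
```

Running the DRY RUN form first and reviewing the returned file list is a cautious way to confirm the retention setting before deleting anything.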