Convert To Delta (Delta Lake on Azure Databricks)
CONVERT TO DELTA [db_name.]table_name|parquet.`path/to/table` [NO STATISTICS] [PARTITIONED BY (col_name1 col_type1, col_name2 col_type2, ...)]
CONVERT TO DELTA parquet.`path/to/table` requires Databricks Runtime 5.2 or above.
CONVERT TO DELTA [db_name.]table_name requires Databricks Runtime 6.1 or above.
Convert an existing Parquet table to a Delta table in-place. This command lists all the files in the directory, creates a Delta Lake transaction log that tracks these files, and automatically infers the data schema by reading the footers of all Parquet files. The conversion process collects statistics to improve query performance on the converted Delta table. If you provide a table name, the metastore is also updated to reflect that the table is now a Delta table.
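For example, both forms of the command can be sketched as follows (the table name `events` and the path `/data/events` are illustrative, not from this page):

```sql
-- Convert a Parquet table registered in the metastore (table name is hypothetical)
CONVERT TO DELTA events

-- Convert an unmanaged Parquet directory by path (path is illustrative)
CONVERT TO DELTA parquet.`/data/events`
```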
NO STATISTICS
Bypass statistics collection during the conversion process and finish the conversion faster. After the table is converted to Delta Lake, you can use
OPTIMIZE … ZORDER BY to reorganize the data layout and generate statistics.
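A minimal sketch of that two-step workflow, with an illustrative path and a hypothetical column name:

```sql
-- Skip statistics collection so the conversion finishes faster
CONVERT TO DELTA parquet.`/data/events` NO STATISTICS

-- Later, reorganize the layout and generate statistics
-- (the column eventTime is a hypothetical example)
OPTIMIZE delta.`/data/events` ZORDER BY (eventTime)
```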
PARTITIONED BY
Partition the created table by the specified columns. Required if the data is partitioned. The conversion process aborts and throws an exception if the directory structure does not conform to the
PARTITIONED BY specification. If you do not provide the
PARTITIONED BY clause, the command assumes that the table is not partitioned.
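For instance, for a directory laid out as `/data/events/date=2020-01-01/...`, the partition column and its type must be declared (the path and column are illustrative):

```sql
CONVERT TO DELTA parquet.`/data/events`
  PARTITIONED BY (date DATE)
```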
Any file not tracked by Delta Lake is invisible and can be deleted when you run
VACUUM. You should avoid updating or appending data files during the conversion process. After the table is converted, make sure all writes go through Delta Lake.
It is possible that multiple external tables share the same underlying Parquet directory. In this case, if you run
CONVERT on one of the external tables, then you will not be able to access the other external tables because their underlying directory has been converted from Parquet to Delta Lake. To query or write to these external tables again, you must run
CONVERT on them as well.
CONVERT populates the catalog information, such as schema and table properties, to the Delta Lake transaction log. If the underlying directory has already been converted to Delta Lake and its metadata is different from the catalog metadata, a
convertMetastoreMetadataMismatchException will be thrown. If you want
CONVERT to overwrite the existing metadata in the Delta Lake transaction log, set the SQL configuration
spark.databricks.delta.convert.metadataCheck.enabled to false.
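A sketch of that override, assuming you intend to overwrite the existing Delta Lake metadata (the table name `events` is hypothetical):

```sql
-- Disable the metadata consistency check before re-running CONVERT
SET spark.databricks.delta.convert.metadataCheck.enabled = false;

CONVERT TO DELTA events
```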
Undo the conversion
If you have performed Delta Lake operations such as
OPTIMIZE that can change the data files, first run the following command for garbage collection:
VACUUM delta.`path/to/table` RETAIN 0 HOURS
Then, delete the `_delta_log` directory in the table path.
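Note that a retention of 0 hours is below the default safety threshold, so the retention duration check typically has to be disabled before the VACUUM above will run (a sketch, with an illustrative path):

```sql
-- Allow a retention interval below the default minimum
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

VACUUM delta.`/data/events` RETAIN 0 HOURS;
```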