Delta Lake is an open-source storage layer that brings reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Delta Lake on Azure Databricks allows you to configure Delta Lake based on your workload patterns and provides optimized layouts and indexes for fast interactive queries.
This is the documentation for Delta Lake on Azure Databricks.
- Introduction to Delta Lake
- Introductory Notebooks
- Table Batch Reads and Writes
- Table Streaming Reads and Writes
- Table Deletes, Updates, and Merges
- Table Utility Commands
- Delta Lake API Reference
- Concurrency Control
- Migrate Workloads to Delta Lake
- Best Practices
- Frequently Asked Questions (FAQ)
  - What is Delta Lake?
  - How is Delta Lake related to Apache Spark?
  - What format does Delta Lake use to store data?
  - How can I read and write data with Delta Lake?
  - Where does Delta Lake store the data?
  - Can I stream data directly into and from Delta tables?
  - Does Delta Lake support writes or reads using the Spark Streaming DStream API?
  - When I use Delta Lake, will I be able to port my code to other Spark platforms easily?
  - How do Delta tables compare to Hive SerDe tables?
  - What DDL and DML features does Delta Lake not support?
  - Does Delta Lake support multi-table transactions?
  - How can I change the type of a column?
  - What does it mean that Delta Lake supports multi-cluster writes?
  - Can I modify a Delta table from different workspaces?
  - Can I access Delta tables outside of Databricks Runtime?
- Additional Resources