Delta Lake is an open source storage layer that brings reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Delta Lake on Azure Databricks allows you to configure Delta Lake based on your workload patterns. Azure Databricks also includes Delta Engine, which provides optimized layouts and indexes for fast interactive queries.
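As a rough illustration of the Spark API compatibility described above, the following minimal sketch writes a small DataFrame in Delta format and reads it back with the standard DataFrame writer and reader. It assumes an existing `SparkSession` (such as the `spark` object available in Azure Databricks notebooks) with Delta Lake available. The function name `write_then_read_delta` and the example path are hypothetical and used only for illustration.

```python
def write_then_read_delta(spark, path):
    """Write a tiny DataFrame as a Delta table at `path`, then read it back.

    `spark` is assumed to be an existing SparkSession with Delta Lake
    available; `path` is any storage location the cluster can write to.
    """
    # Build a small DataFrame with a single `id` column (values 0..4).
    df = spark.range(0, 5)

    # Same DataFrameWriter API as for Parquet; only the format name changes.
    df.write.format("delta").mode("overwrite").save(path)

    # Read the table back with the regular DataFrameReader.
    return spark.read.format("delta").load(path)
```

In a notebook this might be called as `write_then_read_delta(spark, "/tmp/delta/example").show()`, where the path is a placeholder for a location you control.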
This section covers Delta Lake on Azure Databricks.
- Delta Lake quickstart
- Introductory notebooks
- Ingest data into Delta Lake
- Table batch reads and writes
- Table streaming reads and writes
- Table deletes, updates, and merges
- Table utility commands
- Table versioning
- API reference
- Concurrency control
- Migration guide
- Best practices
- Frequently asked questions (FAQ)
  - What is Delta Lake?
  - How is Delta Lake related to Apache Spark?
  - What format does Delta Lake use to store data?
  - How can I read and write data with Delta Lake?
  - Where does Delta Lake store the data?
  - Can I stream data directly into and from Delta tables?
  - Does Delta Lake support writes or reads using the Spark Streaming DStream API?
  - When I use Delta Lake, will I be able to port my code to other Spark platforms easily?
  - How do Delta tables compare to Hive SerDe tables?
  - What DDL and DML features does Delta Lake not support?
  - Does Delta Lake support multi-table transactions?
  - How can I change the type of a column?
  - What does it mean that Delta Lake supports multi-cluster writes?
  - Can I modify a Delta table from different workspaces?
  - Can I access Delta tables outside of Databricks Runtime?