Imagine that you work as a data engineer or data scientist for a large retail store. Your organization uses Azure Data Lake to store all its online shopping data. However, as the volume of data grows, updating and querying that data is becoming increasingly time-consuming. Your responsibility is to investigate the problem and find a solution.
You need a solution that matches the scalability of a data lake but is also reliable and fast.
Delta Lake can solve your problem. It's a storage layer that integrates with Spark and stores table data as Parquet files alongside a transaction log. It has both open-source and managed offerings. Delta Lake is provided as a managed offering as part of your Azure Databricks account, and it helps you combine the best capabilities of a data lake, a data warehouse, and a streaming ingestion system.
In this module, you will:
- Learn about the key features and use cases of Delta Lake.
- Use Delta Lake to create, append, and upsert tables.
- Perform optimizations in Delta Lake.
- Compare different versions of a Delta table using Time Travel.