Perform data engineering with Azure Databricks

Intermediate
Developer
Solution Architect
Data Scientist
Data Engineer
Azure
Databricks

Learn how to use Azure Databricks to get Databricks up and running quickly in Azure. You'll work with data in Azure SQL Data Warehouse by using the built-in connector services. Explore the data services available with Azure Data Factory. Build streamlined workflows, and work with the interactive analytics workspace powered by Apache Spark.

Prerequisites:

You'll need an Azure subscription. If you don't have an Azure subscription, create a free account and add a subscription before you begin.

Modules in this learning path

Introduction to Azure Databricks

Learn the fundamentals of Azure Databricks and Apache Spark notebooks.

Access SQL Data Warehouse instances with Azure Databricks

Learn how to access Azure SQL Data Warehouse from Azure Databricks by using the SQL Data Warehouse connector. This allows you to use Apache Spark with Azure Blob storage and PolyBase in SQL Data Warehouse to efficiently transfer large volumes of data between a Databricks cluster and a SQL Data Warehouse instance.
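
As a minimal sketch of what such a read might look like from a notebook, the snippet below collects the connector options in a plain dictionary and passes them to a Spark reader. The server, database, storage account, and table names are hypothetical placeholders, and build_sqldw_options and read_sqldw_table are illustrative helpers, not part of the connector. The format name and option keys are assumed to match the Databricks SQL Data Warehouse connector, so verify them against the documentation for your runtime.

```python
def build_sqldw_options(jdbc_url, temp_dir, table):
    """Collect the options the SQL Data Warehouse connector is assumed to need.

    temp_dir is the Azure Blob storage location used to stage data
    between the Databricks cluster and the warehouse.
    """
    return {
        "url": jdbc_url,
        "tempDir": temp_dir,
        # Let the connector reuse the storage credentials set on the Spark session.
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,
    }


def read_sqldw_table(spark, options):
    """Load a warehouse table as a DataFrame; needs a live SparkSession."""
    return (
        spark.read.format("com.databricks.spark.sqldw")
        .options(**options)
        .load()
    )


# Hypothetical values, for illustration only.
options = build_sqldw_options(
    jdbc_url="jdbc:sqlserver://example-server.database.windows.net:1433;database=exampledw",
    temp_dir="wasbs://staging@exampleaccount.blob.core.windows.net/tmp",
    table="dbo.Sales",
)
```

In a Databricks notebook, where a SparkSession named spark already exists, calling read_sqldw_table(spark, options) would return the table as a DataFrame.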

Data ingestion with Azure Data Factory

In this module, you use Azure Databricks to work with multiple data sources. Learn how to combine inputs from files and data stores, such as Azure SQL Database, and transform and store that data for advanced analytics.

Read and write data by using Azure Databricks

Use Azure Databricks to work with multiple data sources, combining inputs from files and data stores such as Azure SQL Database, and transform and store that data for advanced analytics.

Perform basic data transformations in Azure Databricks

Learn the tools and techniques to do basic data transformations in Azure Databricks.

Perform advanced data transformation in Azure Databricks

Learn how to perform advanced data transformations in Azure Databricks, and encapsulate transformation logic through user-defined functions (UDFs) and libraries.
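
To illustrate the idea of encapsulating transformation logic in a UDF, here is a small sketch. The function normalize_name and its cleanup rule are hypothetical examples, not taken from the module. The logic lives in an ordinary Python function so it can be checked without a cluster, and a separate helper registers it with Spark.

```python
def normalize_name(raw):
    """Collapse repeated whitespace and title-case a name; pass None through."""
    if raw is None:
        return None
    return " ".join(raw.split()).title()


def register_normalize_name(spark):
    """Register normalize_name as a Spark UDF; needs a live SparkSession."""
    # Imported here so the plain function above works without PySpark installed.
    from pyspark.sql.types import StringType

    return spark.udf.register("normalize_name", normalize_name, StringType())
```

After registration, the function can be called from Spark SQL by name, or the returned object can be applied to a DataFrame column. Python UDFs run row by row outside Spark's optimizer, so built-in functions are usually the better choice when one exists.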

Create data pipelines by using Databricks Delta

Learn how to use Databricks Delta in Azure to manage the flow of data (a data pipeline) to and from a data lake. Databricks Delta includes mechanisms to create, append, and upsert data to Apache Spark tables, taking advantage of built-in reliability and optimizations. Learn how the Databricks Delta architecture helps speed up reads, and how it lets multiple writers modify a dataset simultaneously while still seeing consistent views. Finally, implement a Lambda architecture by processing batch and streaming data with Delta.
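
An upsert of the kind described above might be sketched as follows, assuming the Delta Lake Python API (delta.tables.DeltaTable) is available on the cluster. The helper names merge_condition and upsert_to_delta, the aliases t and s, and the key columns are hypothetical choices for illustration.

```python
def merge_condition(key_columns, target="t", source="s"):
    """Build the join condition that matches target rows to incoming rows."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{col} = {source}.{col}" for col in key_columns)


def upsert_to_delta(spark, table_path, updates_df, key_columns):
    """Update rows that match on the keys and insert the rest.

    Needs a live SparkSession and an existing Delta table at table_path.
    """
    # Imported here so merge_condition can be used without Delta installed.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, table_path)
    (
        target.alias("t")
        .merge(updates_df.alias("s"), merge_condition(key_columns))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```

Because the merge runs as a single transaction, readers see the table either before or after the upsert, never a partial result.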

Work with streaming data in Azure Databricks

Learn how to analyze and process streaming data by using Azure Event Hubs, Spark Structured Streaming, and Databricks Delta.

Create data visualizations by using Azure Databricks and Power BI

Use Azure Databricks to create basic to advanced visualizations by using built-in charts and third-party libraries such as Matplotlib. Connect your Azure Databricks data to Power BI to create business-intelligence dashboards that can be shared with others.