Transform data with Spark in Azure Synapse Analytics

Module
7 Units

Intermediate

Data Engineer

Azure Synapse Analytics

Data engineers commonly need to transform large volumes of data. Apache Spark pools in Azure Synapse Analytics provide a distributed processing platform for this work.

Learning objectives

In this module, you will learn how to:

Use Apache Spark to modify and save dataframes
Partition data files for improved performance and scalability
Transform data with SQL

Prerequisites

Before taking this module, you should be familiar with Apache Spark pools in Azure Synapse Analytics. Consider completing the Analyze data with Apache Spark in Azure Synapse Analytics module first.

Introduction
Modify and save dataframes
Partition data files
Transform data with SQL
Exercise: Transform data with Spark in Azure Synapse Analytics
Knowledge check
Summary