Transform data with Spark in Azure Synapse Analytics

Intermediate
Data Engineer
Azure Synapse Analytics

Data engineers commonly need to transform large volumes of data. Apache Spark pools in Azure Synapse Analytics provide a distributed processing platform for performing these transformations at scale.

Learning objectives

In this module, you will learn how to:

  • Use Apache Spark to modify and save dataframes
  • Partition data files for improved performance and scalability
  • Transform data with SQL

Prerequisites

Before taking this module, you should be familiar with Apache Spark pools in Azure Synapse Analytics. Consider completing the Analyze data with Apache Spark in Azure Synapse Analytics module first.