Understand other Azure data services

To round out your understanding of offerings on the Azure data platform, consider Azure Databricks, Data Factory, and Microsoft Purview.

Databricks

Azure Databricks is a data analytics platform that's optimized for Azure. It provides one-click setup, streamlined workflows, and an interactive workspace for Apache Spark-based applications.

Databricks adds capabilities to Apache Spark, including fully managed Spark clusters and an interactive workspace. You can also create and manage clusters programmatically through REST APIs.
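As a rough illustration of programming clusters over REST, the following Python sketch builds a request to the Databricks Clusters API `clusters/create` endpoint without sending it. The workspace URL, token placeholder, cluster name, runtime version string, and node type are all assumptions made for this example; substitute values that are valid in your own workspace.

```python
import json
from urllib import request

# Hypothetical values for illustration only; replace with your workspace details.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"


def build_create_cluster_request(workspace_url: str, token: str) -> request.Request:
    """Build (but do not send) a POST request that asks Databricks to create a cluster."""
    payload = {
        "cluster_name": "example-cluster",    # hypothetical cluster name
        "spark_version": "13.3.x-scala2.12",  # assumed runtime version string
        "node_type_id": "Standard_DS3_v2",    # assumed Azure VM size
        "num_workers": 2,
    }
    return request.Request(
        url=f"{workspace_url}/api/2.0/clusters/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_cluster_request(WORKSPACE_URL, TOKEN)
print(req.get_method(), req.full_url)
```

To submit the request, you would pass `req` to `urllib.request.urlopen` with a real token. The sketch stops short of that so it can run without network access or credentials.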

In Databricks notebooks, you'll use familiar programming languages such as R, Python, Scala, and SQL. Role-based access control in Microsoft Entra ID (formerly Azure Active Directory) and in Databricks provides enterprise-grade security.

Data Factory

Data Factory is a cloud-based data integration service that orchestrates the movement of data between various data stores.

As a data engineer, you can use Data Factory to create and schedule data-driven workflows, called pipelines, that ingest data from multiple data stores and automate data movement and data transformation.

Data Factory processes and transforms data by using compute services such as Hadoop and Spark on Azure HDInsight, and Azure Machine Learning. You can then publish the output data to data stores such as Azure Synapse Analytics so that business intelligence applications can consume it. Ultimately, you use Data Factory to organize raw data into meaningful data stores and data lakes so your organization can make better business decisions.
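The ingest, transform, and publish flow described above can be sketched as an abridged pipeline definition. The Python example below builds a dictionary shaped like Data Factory's pipeline JSON, with three activities chained through `dependsOn`, and then derives the order in which they would run. Every pipeline, activity, dataset, and linked service name here is hypothetical, and the definition omits properties a deployable pipeline would need, so treat it as a sketch of the structure rather than a complete definition.

```python
import json


def dataset_ref(name: str) -> dict:
    """Reference a dataset by name (names in this sketch are hypothetical)."""
    return {"referenceName": name, "type": "DatasetReference"}


def after(activity_name: str) -> list:
    """Declare that an activity runs only after another one succeeds."""
    return [{"activity": activity_name, "dependencyConditions": ["Succeeded"]}]


pipeline = {
    "name": "IngestTransformPublish",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {   # Ingest: copy raw data into a staging area.
                "name": "IngestRawData",
                "type": "Copy",
                "inputs": [dataset_ref("RawBlobDataset")],
                "outputs": [dataset_ref("StagingDataset")],
                "typeProperties": {"source": {"type": "BlobSource"},
                                   "sink": {"type": "BlobSink"}},
            },
            {   # Transform: run a Spark job on an HDInsight cluster.
                "name": "TransformWithSpark",
                "type": "HDInsightSpark",
                "dependsOn": after("IngestRawData"),
                "linkedServiceName": {"referenceName": "HDInsightLinkedService",
                                      "type": "LinkedServiceReference"},
                "typeProperties": {"rootPath": "adfspark",
                                   "entryFilePath": "transform.py"},
            },
            {   # Publish: load the transformed data into Azure Synapse Analytics.
                "name": "PublishToSynapse",
                "type": "Copy",
                "dependsOn": after("TransformWithSpark"),
                "inputs": [dataset_ref("CuratedDataset")],
                "outputs": [dataset_ref("SynapseTableDataset")],
                "typeProperties": {"source": {"type": "BlobSource"},
                                   "sink": {"type": "SqlDWSink"}},
            },
        ]
    },
}


def execution_order(pipeline_def: dict) -> list:
    """Return activity names in an order that respects their dependsOn entries."""
    remaining = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in pipeline_def["properties"]["activities"]
    }
    order = []
    while remaining:
        ready = [name for name, deps in remaining.items() if deps <= set(order)]
        if not ready:
            raise ValueError("circular dependency between activities")
        for name in ready:
            order.append(name)
            del remaining[name]
    return order


print(execution_order(pipeline))
print(json.dumps(pipeline, indent=2)[:200])  # preview of the JSON form
```

Expressing the order through `dependsOn` rather than list position reflects how activities in a pipeline are chained: an activity with no dependencies can start immediately, and the others wait for their predecessors to succeed.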

Microsoft Purview

Microsoft Purview brings together the former Azure Purview and the former Microsoft 365 Compliance portfolio to form a comprehensive set of solutions that help you govern, protect, and manage your entire data estate. This unified data governance service helps you manage and govern your on-premises, multicloud, and software-as-a-service (SaaS) data. With Microsoft Purview, you can create a holistic, up-to-date map of your data landscape through automated data discovery, sensitive data classification, and end-to-end data lineage.
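To make the idea of end-to-end data lineage concrete, the toy Python model below records which assets are derived from which, and then walks that graph to find everything downstream of a given source. It does not use the Microsoft Purview API; the asset names and the `LINEAGE` structure are invented purely for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets derived from it.
LINEAGE = {
    "crm_db.customers": ["lake/raw/customers"],
    "lake/raw/customers": ["lake/curated/customers"],
    "lake/curated/customers": ["synapse.dim_customer", "ml.churn_features"],
    "synapse.dim_customer": ["powerbi.sales_report"],
}


def downstream(asset: str, lineage: dict) -> list:
    """Return every asset derived, directly or indirectly, from `asset`."""
    seen = []
    queue = deque([asset])
    while queue:  # breadth-first walk over the lineage graph
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


print(downstream("lake/raw/customers", LINEAGE))
```

A traversal like this is the kind of question lineage answers in practice: if a source table contains sensitive data or changes shape, which reports and models are affected?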