Introduction to Databricks

Azure Databricks is a fully managed service built on the open-source Apache Spark analytics and data processing engine. It is a secure, enterprise-grade, cloud-based platform for big data and machine learning.

Databricks provides a notebook-oriented workspace that delivers Apache Spark as a service, making it easy to manage clusters and explore data interactively.

Note

You can complete this module's labs for free using the Databricks 14-day trial, but you cannot use an Azure free trial subscription to create a Databricks workspace. To switch a free trial subscription to pay-as-you-go, go to your profile and change your subscription offer to pay-as-you-go. You may also need to remove the spending limit and request a vCPU quota increase for your region. When you create your Azure Databricks workspace, select the Trial (Premium - 14-Days Free DBUs) pricing tier to give the workspace access to free Premium Azure Databricks DBUs for 14 days.

Learning objectives

In this module, you will:

  • Create your own Azure Databricks workspace
  • Create a notebook inside your home folder in Databricks
  • Understand the fundamentals of Apache Spark notebooks
  • Create, or attach to, a Spark cluster
  • Identify the types of tasks well suited to Apache Spark, the unified analytics engine