Exercise - Use Apache Spark notebooks

You can use Apache Spark notebooks to:

  • Read and process huge files and data sets
  • Query, explore, and visualize data sets
  • Join disparate data sets found in data lakes
  • Train and evaluate machine learning models
  • Process live streams of data
  • Perform analysis on large graph data sets and social networks

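For example, a single notebook cell can read a file from a data lake, explore its schema, and query it with either the DataFrame API or SQL. The sketch below illustrates that pattern; the file path and column names (/mnt/data/sales.csv, region, amount) are placeholders you would replace with your own data, and it relies on the spark session and display() helper that Databricks notebooks provide automatically.

```python
# A typical notebook cell: read a file into a DataFrame, explore it, and query it.
# In a Databricks notebook the SparkSession is already available as `spark`.
from pyspark.sql import functions as F

# Hypothetical path and columns - substitute a file from your own data lake.
df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/mnt/data/sales.csv"))

df.printSchema()          # inspect the inferred schema
display(df.limit(10))     # display() renders an interactive table in Databricks

# Aggregate with the DataFrame API...
summary = (df.groupBy("region")
             .agg(F.sum("amount").alias("total_amount"))
             .orderBy(F.desc("total_amount")))
display(summary)

# ...or register a temporary view and query it with SQL.
df.createOrReplaceTempView("sales")
display(spark.sql("SELECT region, COUNT(*) AS orders FROM sales GROUP BY region"))
```
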
To learn more about using notebooks, clone the labs archive, which provides sample notebooks that show how to use them in your day-to-day tasks.

Clone the Databricks archive

  1. In the Azure portal, navigate to your deployed Azure Databricks workspace and select Launch Workspace.

  2. In the left pane, select Workspace > Users, and select your username (the entry with the house icon).

  3. In the pane that appears, select the arrow next to your name, and select Import.

     (Screenshot: the menu option to import the archive)

  4. In the Import Notebooks dialog box, select URL, and paste in the following URL:

     https://github.com/MicrosoftDocs/mslearn-azure-databricks-notebooks-fundamentals/blob/master/DBC/01-notebook-fundamentals.dbc?raw=true

  5. Select Import.
  6. Select the 01 Notebook Fundamentals folder that appears.
  7. Use the set of notebooks in this folder to complete this lab.

Complete the following notebooks

  • 01 Notebook Fundamentals - This notebook illustrates the fundamentals of a Databricks notebook.
  • Why Apache Spark? - In this notebook, you practice some of the common use cases for Databricks notebooks.
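
If you want a quick, self-contained warm-up before opening the archive notebooks, you can paste a cell like the one below into any notebook. It is not taken from the archive; it simply generates a synthetic dataset with spark.range and aggregates it across the cluster, similar in spirit to the parallel-processing examples those notebooks walk through.

```python
# A self-contained cell: build a large synthetic DataFrame and aggregate it in parallel.
from pyspark.sql import functions as F

numbers = spark.range(0, 10_000_000)                 # 10 million rows, distributed across the cluster
buckets = (numbers
           .withColumn("bucket", F.col("id") % 10)   # derive a grouping key from the built-in id column
           .groupBy("bucket")
           .count()
           .orderBy("bucket"))

display(buckets)   # ten rows, one per bucket, each counting roughly 1 million ids
```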