Create, develop, and maintain Synapse notebooks in Azure Synapse Analytics

A Synapse notebook is a web interface for you to create files that contain live code, visualizations, and narrative text. Notebooks are a good place to validate ideas and use quick experiments to get insights from your data. Notebooks are also widely used in data preparation, data visualization, machine learning, and other Big Data scenarios.

With a Synapse notebook, you can:

  • Get started with zero setup effort.
  • Keep data secure with built-in enterprise security features.
  • Analyze data across raw formats (CSV, txt, JSON, etc.), processed file formats (parquet, Delta Lake, ORC, etc.), and SQL tabular data files against Spark and SQL.
  • Be productive with enhanced authoring capabilities and built-in data visualization.

This article describes how to use notebooks in Synapse Studio.

Preview of the new notebook experience

The Synapse team brought the new notebooks component into Synapse Studio to provide a consistent notebook experience for Microsoft customers and to maximize discoverability, productivity, sharing, and collaboration. The new notebook experience is available in preview. Select the Preview Features button in the notebook toolbar to turn it on. The following table compares the existing notebook (the "classical notebook") with the new preview notebook.

Feature | Classical Notebook | Preview Notebook
%run | Not supported | Supported
%history | Not supported | Supported
%load | Not supported | Supported
%%html | Not supported | Supported
Drag and drop to move a cell | Not supported | Supported
Outline (Table of Contents) | Not supported | Supported
Variable explorer | Not supported | Supported
Format text cell with toolbar buttons | Not supported | Supported
Code cell commenting | Not supported | Supported

Note

Variable explorer supports only Python.

Create a notebook

There are two ways to create a notebook. You can create a new notebook or import an existing notebook to a Synapse workspace from the Object Explorer. Synapse notebooks recognize standard Jupyter Notebook IPYNB files.

Screenshot of create new or import notebook
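Because Synapse notebooks recognize standard Jupyter IPYNB files, a file that follows the Jupyter nbformat 4 schema should be a candidate for import. The following Python sketch builds a minimal notebook with the standard json module. The helper name minimal_ipynb is hypothetical, and Synapse may add its own metadata when it later saves the notebook.

```python
import json


def minimal_ipynb(sources):
    """Build a minimal Jupyter notebook (nbformat 4.2) with one code cell per source string."""
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {"language_info": {"name": "python"}},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
            for source in sources
        ],
    }


# Serialize to JSON; save this text as a .ipynb file, then import it into the workspace.
text = json.dumps(minimal_ipynb(["print('hello from an imported notebook')"]), indent=1)
print(text)
```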

Develop notebooks

Notebooks consist of cells, which are individual blocks of code or text that can be run independently or as a group.

We provide rich operations to develop notebooks:

Add a cell

There are multiple ways to add a new cell to your notebook.

  1. Expand the upper left + Cell button, and select Add code cell or Add text cell.

    Screenshot of add-cell-with-cell-button

  2. Hover over the space between two cells and select Add code or Add text.

    Screenshot of add-cell-between-space

  3. Use shortcut keys in command mode. Press A to insert a cell above the current cell. Press B to insert a cell below the current cell.

Set a primary language

Synapse notebooks support four Apache Spark languages:

  • PySpark (Python)
  • Spark (Scala)
  • Spark SQL
  • .NET Spark (C#)

You can set the primary language for newly added cells from the drop-down list in the top command bar.

Screenshot of default-synapse-language

Use multiple languages

You can use multiple languages in one notebook by specifying the correct language magic command at the beginning of a cell. The following table lists the magic commands to switch cell languages.

Magic command | Language | Description
%%pyspark | Python | Execute a Python query against Spark Context.
%%spark | Scala | Execute a Scala query against Spark Context.
%%sql | SparkSQL | Execute a SparkSQL query against Spark Context.
%%csharp | .NET for Spark C# | Execute a .NET for Spark C# query against Spark Context.

The following image is an example of how you can write a PySpark query using the %%pyspark magic command, or a SparkSQL query using the %%sql magic command, in a Spark (Scala) notebook. Notice that the primary language for the notebook is set to PySpark.

Screenshot of Synapse spark magic commands
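The magic command table implies a simple per-cell rule: a leading magic command picks the cell's language, and a cell without one runs in the notebook's primary language. The following Python sketch models only that rule. The CELL_MAGICS mapping and the cell_language helper are hypothetical illustrations, not Synapse's internal dispatch logic.

```python
# Magic commands and languages copied from the magic command table.
CELL_MAGICS = {
    "%%pyspark": "Python",
    "%%spark": "Scala",
    "%%sql": "SparkSQL",
    "%%csharp": ".NET for Spark C#",
}


def cell_language(cell_source, primary_language):
    """Return the language a cell runs in (hypothetical helper).

    A recognized magic command at the start of the first line wins; otherwise
    the cell falls back to the notebook's primary language.
    """
    lines = cell_source.strip().splitlines()
    first_token = lines[0].split()[0] if lines and lines[0].split() else ""
    return CELL_MAGICS.get(first_token, primary_language)


print(cell_language("%%sql\nSELECT * FROM mydataframetable", "Python"))  # SparkSQL
print(cell_language("df = spark.range(10)", "Python"))  # Python
```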

Use temp tables to reference data across languages

You cannot reference data or variables directly across different languages in a Synapse notebook. In Spark, a temporary table can be referenced across languages. Here is an example of how to read a Scala DataFrame in PySpark and SparkSQL using a Spark temp table as a workaround.

  1. In Cell 1, read a DataFrame from a SQL pool connector using Scala and create a temporary table.

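    %%spark
    // Read a table from the SQL pool into a Scala DataFrame
    val scalaDataFrame = spark.read.sqlanalytics("mySQLPoolDatabase.dbo.mySQLPoolTable")
    // Register the DataFrame as a temporary view that cells in other languages can query
    scalaDataFrame.createOrReplaceTempView("mydataframetable")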
    
  2. In Cell 2, query the data using Spark SQL.

    %%sql
    SELECT * FROM mydataframetable
    
  3. In Cell 3, use the data in PySpark.

    %%pyspark
    myNewPythonDataFrame = spark.sql("SELECT * FROM mydataframetable")
    

IDE-style IntelliSense

Synapse notebooks are integrated with the Monaco editor to bring IDE-style IntelliSense to the cell editor. Syntax highlighting, error markers, and automatic code completion help you write code and identify issues faster.

The IntelliSense features are at different levels of maturity for different languages. Use the following table to see what's supported.

Languages | Syntax Highlight | Syntax Error Marker | Syntax Code Completion | Variable Code Completion | System Function Code Completion | User Function Code Completion | Smart Indent | Code Folding
PySpark (Python) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Spark (Scala) | Yes | Yes | Yes | Yes | Yes | Yes | - | Yes
SparkSQL | Yes | Yes | Yes | Yes | Yes | - | - | -
.NET for Spark (C#) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes

Note

An active Spark session is required to use Variable Code Completion, System Function Code Completion, and User Function Code Completion for .NET for Spark (C#).

Code Snippets

Synapse notebooks provide code snippets that make it easier to enter commonly used code patterns, such as configuring your Spark session, reading data as a Spark DataFrame, or drawing charts with matplotlib.

Snippets appear in the IDE-style IntelliSense suggestion list alongside other suggestions. The snippet contents match the code cell language. To see the available snippets, type Snippet, or any keyword that appears in a snippet title, in the code cell editor. For example, typing read shows the list of snippets for reading data from various data sources.

Animated GIF of Synapse code snippets

Format text cell with toolbar buttons

You can use the format buttons in the text cell toolbar to perform common Markdown actions. These include bolding text, italicizing text, inserting code snippets, inserting unordered lists, inserting ordered lists, and inserting images from a URL.

Screenshot of Synapse text cell toolbar

Undo/Redo cell operation

Select the Undo / Redo button or press Ctrl+Z / Ctrl+Y to undo or redo the most recent cell operations. You can undo or redo up to the 20 most recent cell operations.

Screenshot of Synapse undo cells
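To make the 20-operation limit concrete, here is a Python sketch of a bounded undo/redo history. It models the behavior described above and is not Synapse's implementation; the CellHistory class is hypothetical, and clearing the redo stack when a new operation is recorded is an assumed convention borrowed from typical editors.

```python
from collections import deque


class CellHistory:
    """Undo/redo history that keeps at most `limit` operations (hypothetical)."""

    def __init__(self, limit=20):
        # deque(maxlen=...) silently drops the oldest entry once the limit is reached
        self.undo_stack = deque(maxlen=limit)
        self.redo_stack = []

    def record(self, operation):
        self.undo_stack.append(operation)
        self.redo_stack.clear()  # assumption: a new operation invalidates redo

    def undo(self):
        if not self.undo_stack:
            return None  # nothing left to undo
        operation = self.undo_stack.pop()
        self.redo_stack.append(operation)
        return operation

    def redo(self):
        if not self.redo_stack:
            return None  # nothing left to redo
        operation = self.redo_stack.pop()
        self.undo_stack.append(operation)
        return operation


history = CellHistory()
for i in range(25):
    history.record(f"add cell {i}")

# Only the 20 most recent operations were kept, so the 21st undo returns None.
undone = [history.undo() for _ in range(21)]
print(undone[0], undone[19], undone[20])  # add cell 24 add cell 5 None
```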

Code cell commenting

Not supported in the classical notebook.

Move a cell

Select the ellipsis (...) at the far right of the cell to open the additional cell actions menu. Then select Move cell up or Move cell down to move the current cell.

You can also use shortcut keys in command mode. Press Ctrl+Alt+↑ to move the current cell up. Press Ctrl+Alt+↓ to move the current cell down.

Screenshot of move-a-cell

Delete a cell

To delete a cell, select the ellipsis (...) at the far right of the cell to open the additional cell actions menu, and then select Delete cell.

You can also use shortcut keys in command mode. Press D twice (D, D) to delete the current cell.

Screenshot of delete-a-cell

Collapse a cell input

Select the arrow button at the bottom of the current cell to collapse it. To expand it, select the arrow button while the cell is collapsed.

Animated GIF of collapse-cell-input

Collapse a cell output

Select the collapse output button at the upper left of the current cell output to collapse it. To expand it, select Show cell output while the cell output is collapsed.

Animated GIF of collapse-cell-output

Notebook outline

Not supported in the classical notebook.

Run notebooks

You can run the code cells in your notebook individually or all at once. The status and progress of each cell are shown in the notebook.

Run a cell

There are several ways to run the code in a cell.

  1. Hover over the cell you want to run and select the Run Cell button, or press Ctrl+Enter.

    Screenshot of run-cell-1

  2. Use shortcut keys in command mode. Press Shift+Enter to run the current cell and select the cell below. Press Alt+Enter to run the current cell and insert a new cell below.


Run all cells

Select the Run All button to run all the cells in the current notebook in sequence.

Screenshot of run-all-cells

Run all cells above or below

To open the additional cell actions menu, select the ellipsis (...) at the far right of the cell. Then select Run cells above to run all the cells above the current cell in sequence, or select Run cells below to run all the cells below the current cell in sequence.

Screenshot of run-cells-above-or-below

Cancel all running cells

Select the Cancel All button to cancel the running cells and any cells waiting in the queue.

Screenshot of cancel-all-cells

Notebook reference

Not supported in the classical notebook.

Variable explorer

Not supported in the classical notebook.

Cell status indicator

A step-by-step cell execution status is displayed beneath the cell to show its current progress. When the cell run completes, an execution summary with the total duration and end time is shown and kept there for future reference.

Screenshot of cell-status

Spark progress indicator

Synapse notebooks are purely Spark-based. Code cells run remotely on a serverless Apache Spark pool. A Spark job progress indicator with a real-time progress bar helps you understand the job execution status. The number of tasks in each job or stage helps you identify the degree of parallelism of your Spark job. You can also drill into the Spark UI for a specific job (or stage) by selecting the link on the job (or stage) name.

Screenshot of spark-progress-indicator

Spark session config

In Configure session, you can specify the timeout duration and the number and size of executors for the current Spark session. Restart the Spark session for configuration changes to take effect. Restarting clears all cached notebook variables.

Screenshot of session-management

Spark session config magic command

You can also specify Spark session settings with the %%configure magic command. The Spark session must restart for the settings to take effect. We recommend running %%configure at the beginning of your notebook. Here is a sample; refer to https://github.com/cloudera/livy#request-body for the full list of valid parameters.

%%configure
{
    // You can get a list of valid parameters to configure the session from https://github.com/cloudera/livy#request-body.
    "driverMemory":"28g", // Recommended values: ["28g", "56g", "112g", "224g", "400g", "472g"]
    "driverCores":4, // Recommended values: [4, 8, 16, 32, 64, 80]
    "executorMemory":"28g",
    "executorCores":4, 
    "jars":["abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/myjar.jar","wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>/myjar1.jar"],
    "conf":{
    // Example of a standard Spark property. For more available properties, see https://spark.apache.org/docs/latest/configuration.html#application-properties.
        "spark.driver.maxResultSize":"10g",
    // Example of a custom property: "livy.rsc.sql.num-rows" sets the number of rows that Spark SQL returns.
        "livy.rsc.sql.num-rows":"3000"
    }
}

Note

  • We recommend setting "driverMemory" and "executorMemory" to the same value in %%configure, and likewise "driverCores" and "executorCores".
  • You can use the Spark session config magic command in Synapse pipelines. It takes effect only when called in the top-level notebook; %%configure in a referenced notebook is ignored.
  • Spark configuration properties must be specified in the "conf" body. Top-level references to Spark configuration properties are not supported.
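If you generate %%configure cells programmatically, for example to keep several notebooks on the same session size, a small helper can enforce the notes above. This Python sketch uses only the standard json module. The configure_cell name is hypothetical, and it assumes %%configure accepts plain JSON (the sample above is JSON with added // comments).

```python
import json


def configure_cell(memory, cores, spark_conf=None, jars=None):
    """Build the text of a %%configure cell (hypothetical helper).

    Applies the notes above: the driver and executors get the same memory and
    core values, and Spark properties are nested inside the "conf" body.
    """
    body = {
        "driverMemory": memory,
        "driverCores": cores,
        "executorMemory": memory,
        "executorCores": cores,
    }
    if jars:
        body["jars"] = list(jars)
    if spark_conf:
        body["conf"] = dict(spark_conf)
    # Plain JSON without // comments; the magic command is the cell's first line.
    return "%%configure\n" + json.dumps(body, indent=4)


cell = configure_cell(
    "28g",
    4,
    spark_conf={"spark.driver.maxResultSize": "10g", "livy.rsc.sql.num-rows": "3000"},
)
print(cell)
```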

Bring data to a notebook

You can load data from Azure Blob Storage, Azure Data Lake Storage Gen2, and SQL pools. The following code samples show how to read CSV files from Azure Data Lake Storage Gen2 and Azure Blob Storage.

Read a CSV from Azure Data Lake Storage Gen2 as a Spark DataFrame

# `spark` (the notebook's SparkSession) is predefined in Synapse notebooks.
account_name = "Your account name"
container_name = "Your container name"
relative_path = "Your path"

# ADLS Gen2 path format: abfss://<container>@<account>.dfs.core.windows.net/<path>
adls_path = f"abfss://{container_name}@{account_name}.dfs.core.windows.net/{relative_path}"

# Read a comma-delimited CSV file that has a header row into a Spark DataFrame
df1 = spark.read.option('header', 'true') \
                .option('delimiter', ',') \
                .csv(adls_path + '/Testfile.csv')
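The sample above appends '/Testfile.csv' to adls_path, which produces a double slash if relative_path ends with '/'. A minimal sketch of a path builder that normalizes the separators, assuming you want to assemble ADLS Gen2 URIs from parts; the abfss_uri helper is hypothetical, not part of any Synapse or Spark API.

```python
def abfss_uri(container, account, *path_parts):
    """Join ADLS Gen2 URI parts without doubled or trailing slashes (hypothetical helper)."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    # Drop leading/trailing slashes from each part and skip empty parts
    parts = [part.strip("/") for part in path_parts if part.strip("/")]
    return "/".join([base, *parts])


print(abfss_uri("mycontainer", "myaccount", "raw/sales/", "Testfile.csv"))
# abfss://mycontainer@myaccount.dfs.core.windows.net/raw/sales/Testfile.csv
```

The resulting string can be passed to spark.read.csv in place of the manual concatenation.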

Read a CSV from Azure Blob Storage as a Spark DataFrame


from notebookutils import mssparkutils

# Azure storage access info
blob_account_name = 'Your account name'  # replace with your storage account name
blob_container_name = 'Your container name'  # replace with your container name
blob_relative_path = 'Your path'  # replace with your relative folder path
linked_service_name = 'Your linked service name'  # replace with your linked service name

# Retrieve the SAS token from the linked service
blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)

# Allow Spark to access the blob container remotely with the SAS token
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set('fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name), blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# Read a pipe-delimited file with a header row.
# Without an explicit .schema(...), every column is read as a string.
df = spark.read.option("header", "true") \
            .option("delimiter", "|") \
            .csv(wasbs_path)
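In the Blob Storage sample, the SAS token is registered under a Spark configuration key that embeds the same container and account names as the wasbs path, so the two must match for the read to authenticate. The following sketch derives both strings from one set of inputs; the wasbs_sas_settings helper is hypothetical, and the key format is the one the sample already uses with spark.conf.set.

```python
def wasbs_sas_settings(container, account, relative_path=""):
    """Return the wasbs:// path and the matching SAS conf key (hypothetical helper)."""
    host = f"{account}.blob.core.windows.net"
    path = f"wasbs://{container}@{host}/{relative_path.lstrip('/')}"
    # The WASB driver looks up the SAS token under fs.azure.sas.<container>.<account>.blob.core.windows.net
    sas_conf_key = f"fs.azure.sas.{container}.{host}"
    return path, sas_conf_key


path, key = wasbs_sas_settings("logs", "mystorage", "2024/01")
print(path)  # wasbs://logs@mystorage.blob.core.windows.net/2024/01
print(key)  # fs.azure.sas.logs.mystorage.blob.core.windows.net
```

In a notebook you would then call spark.conf.set(key, blob_sas_token) and read from path.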

Read data from the primary storage account

You can access data in the primary storage account directly; you don't need to provide secret keys. In Data Explorer, right-click a file and select New notebook to open a new notebook with an autogenerated data extractor.

Screenshot of data-to-cell

IPython Widgets

Not supported.

Save notebooks

You can save a single notebook or all notebooks in your workspace.

  1. To save changes you made to a single notebook, select the Publish button on the notebook command bar.

    Screenshot of publish-notebook

  2. To save all notebooks in your workspace, select the Publish all button on the workspace command bar.

    Screenshot of publish-all

In the notebook properties, you can configure whether to include the cell output when saving.

Screenshot of notebook-properties

Magic commands

You can use familiar Jupyter magic commands in Synapse notebooks. Review the following list of currently available magic commands. Tell us your use cases on GitHub so that we can continue to build out more magic commands to meet your needs.

Note

Only the following magic commands are supported in Synapse pipelines: %%pyspark, %%spark, %%csharp, and %%sql.

Integrate a notebook

Add a notebook to a pipeline

Select the Add to pipeline button in the upper-right corner to add a notebook to an existing pipeline or to create a new pipeline.

Screenshot of Add notebook to pipeline

Designate a parameters cell

To parameterize your notebook, select the ellipsis (...) at the far right of the cell to open the additional cell actions menu. Then select Toggle parameter cell to designate the cell as the parameters cell.

Screenshot of toggle-parameter

Azure Data Factory looks for the parameters cell and treats the values in this cell as defaults for the parameters passed in at execution time. The execution engine adds a new cell beneath the parameters cell with the input parameters to overwrite the default values.
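The override behavior can be simulated in plain Python: the parameters cell defines defaults, and a cell executed after it reassigns whichever names the pipeline supplies. This is a sketch of the semantics only. The injected_cell helper and the use of exec are illustrative assumptions, not how the execution engine is implemented, and the parameter names are hypothetical.

```python
# Parameters cell as authored in the notebook: these values are the defaults.
parameters_cell = 'input_path = "raw/sales"\nbatch_size = 100\n'


def injected_cell(overrides):
    """Hypothetical: render pipeline-supplied values as assignment statements."""
    return "\n".join(f"{name} = {value!r}" for name, value in overrides.items())


# Run the cells in order. The injected cell executes after the parameters cell,
# so every name it assigns replaces the default; other names keep their defaults.
namespace = {}
exec(parameters_cell, namespace)
exec(injected_cell({"batch_size": 500}), namespace)

print(namespace["input_path"], namespace["batch_size"])  # raw/sales 500
```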

Assign parameters values from a pipeline

Once you've created a notebook with parameters, you can execute it from a pipeline with the Synapse Notebook activity. After you add the activity to your pipeline canvas, you can set the parameter values in the Base parameters section on the Settings tab.

Screenshot of Assign a parameter

When assigning parameter values, you can use the pipeline expression language or system variables.

Shortcut keys

Similar to Jupyter Notebooks, Synapse notebooks have a modal user interface. The keyboard does different things depending on which mode the notebook cell is in. Synapse notebooks support the following two modes for a given code cell: command mode and edit mode.

  1. A cell is in command mode when there is no text cursor prompting you to type. When a cell is in Command mode, you can edit the notebook as a whole but not type into individual cells. Enter command mode by pressing ESC or using the mouse to select outside of a cell's editor area.

    Screenshot of command-mode

  2. Edit mode is indicated by a text cursor prompting you to type in the editor area. When a cell is in edit mode, you can type into the cell. Enter edit mode by pressing Enter or using the mouse to select on a cell's editor area.

    Screenshot of edit-mode

Shortcut keys under command mode

Using the following keystroke shortcuts, you can more easily navigate and run code in Synapse notebooks.

Action | Synapse notebook shortcuts
Run the current cell and select below | Shift+Enter
Run the current cell and insert below | Alt+Enter
Select cell above | Up
Select cell below | Down
Insert cell above | A
Insert cell below | B
Extend selected cells above | Shift+Up
Extend selected cells below | Shift+Down
Move cell up | Ctrl+Alt+↑
Move cell down | Ctrl+Alt+↓
Delete selected cells | D, D
Switch to edit mode | Enter

Shortcut keys under edit mode

Using the following keystroke shortcuts, you can more easily navigate and run code in Synapse notebooks when in Edit mode.

Action | Synapse notebook shortcuts
Move cursor up | Up
Move cursor down | Down
Undo | Ctrl+Z
Redo | Ctrl+Y
Comment/Uncomment | Ctrl+/
Delete word before | Ctrl+Backspace
Delete word after | Ctrl+Delete
Go to cell start | Ctrl+Home
Go to cell end | Ctrl+End
Go one word left | Ctrl+Left
Go one word right | Ctrl+Right
Select all | Ctrl+A
Indent | Ctrl+]
Dedent | Ctrl+[
Switch to command mode | Esc

Next steps