Apache Spark notebooks
After creating your Databricks workspace, it's time to create your first notebook and Spark cluster.
What is an Apache Spark notebook?
A notebook is a collection of cells. These cells are run to execute code, to render formatted text, or to display graphical visualizations.
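For example, a Python code cell runs code on the attached cluster, while a cell that begins with the %md magic command renders Markdown as formatted text. A minimal code cell might look like this (the contents are illustrative; any valid Python works):

```python
# A code cell executes Python on the attached cluster and shows its
# output directly beneath the cell when run.
# (A separate cell starting with %md would render formatted text instead.)
squares = [n * n for n in range(1, 6)]
print(squares)  # [1, 4, 9, 16, 25]
```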
What is a cluster?
Notebooks are backed by clusters: networked computers that work together to process your data. The first step is to create a cluster.
Create a cluster
In the Azure portal, select the All resources menu in the left-side navigation, and then select the Databricks workspace you created in the last unit.
Select Launch Workspace to open your Databricks workspace in a new tab.
In the left-hand menu of your Databricks workspace, select Clusters.
Select Create Cluster to add a new cluster.
Enter a name for your cluster. Use your name or initials so you can easily distinguish your cluster from your coworkers' clusters.
Select the Databricks Runtime Version. We recommend the latest runtime and Scala 2.11.
Specify your cluster configuration. During the 14-day free trial, the defaults are sufficient. After the trial ends, you may prefer to set Min Workers to zero so that compute resources shut down when you are not working through a coding exercise, which reduces your charges.
Select Create Cluster.
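For reference, an equivalent cluster configuration can also be expressed as a JSON payload for the Databricks Clusters API. The field names below follow that API, but the specific values (cluster name, runtime version, node type, worker counts) are illustrative only:

```json
{
  "cluster_name": "jd-first-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": {
    "min_workers": 0,
    "max_workers": 2
  },
  "autotermination_minutes": 60
}
```

Setting `autotermination_minutes` is another way to keep charges down: the cluster stops itself after the specified period of inactivity.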
Create a notebook
On the left-hand menu of your Databricks workspace, select Home.
Right-click your home folder, and then select Create > Notebook.
Name your notebook First Notebook.
Set the Language to Python.
Select the cluster to which to attach this notebook.
This option displays only when a cluster is currently running. You can still create your notebook and attach it to a cluster later.
Now that you've created your notebook, let's use it to run some code.
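As a first exercise, you might paste a short Python snippet into a cell and press Shift+Enter to run it. The snippet below is illustrative; any valid Python works in a Python notebook:

```python
# A first cell: plain Python that runs on the cluster's driver node.
numbers = list(range(1, 11))
total = sum(numbers)
print(f"The sum of 1 through 10 is {total}")  # The sum of 1 through 10 is 55
```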
Attach and detach your notebook
To use your notebook to run code, you must attach it to a cluster. You can also detach your notebook from a cluster and attach it to another, depending on your organization's requirements.
If your notebook is attached to a cluster, you can:
- Detach your notebook from the cluster
- Restart the cluster
- Attach to another cluster
- Open the Spark UI
- View the log files of the driver