Use notebooks

A notebook is a collection of runnable cells (commands). When you use a notebook, you are primarily developing and running cells.

All notebook tasks are supported by UI actions, but you can also perform many tasks using keyboard shortcuts. Toggle the shortcut display by clicking the keyboard icon or selecting ? > Shortcuts.

[Image: Keyboard shortcuts]

Develop notebooks

This section describes how to develop notebook cells and navigate around a notebook.

About notebooks

A notebook has a toolbar that lets you manage the notebook and perform actions within the notebook:

[Image: Notebook toolbar]

and one or more cells (or commands) that you can run:

[Image: Notebook cells]

At the far right of a cell, the cell actions menu contains three menus: Run, Dashboard, and Edit:

[Image: Run, Dashboard, and Edit icons]

and two actions: Hide and Delete.

Add a cell

To add a cell, mouse over a cell at the top or bottom and click the Add Cell icon, or open the notebook cell menu at the far right, click the down caret, and select Add Cell Above or Add Cell Below.

Delete a cell

Go to the cell actions menu at the far right and click the Delete icon.

When you delete a cell, a delete confirmation dialog displays by default. To disable future confirmation dialogs, select the Do not show this again checkbox and click Confirm. You can also toggle the confirmation dialog setting with the Turn on command delete confirmation option in Account icon > User Settings > Notebook Settings.

To restore deleted cells, either select Edit > Undo Delete Cells or use the (Z) keyboard shortcut.

Cut a cell

Go to the cell actions menu at the far right, click the down caret, and select Cut Cell.

You can also use the (X) keyboard shortcut.

To restore cut cells, either select Edit > Undo Cut Cells or use the (Z) keyboard shortcut.

Select multiple cells or all cells

You can select adjacent notebook cells using Shift + Up or Down for the previous and next cell respectively. Multi-selected cells can be copied, cut, deleted, and pasted.

To select all cells, select Edit > Select All Cells or use the command mode shortcut Cmd+A.

Default language

The default language for each cell is shown in a (<language>) link next to the notebook name. In the following notebook, the default language is SQL.

[Image: Notebook default language]

To change the default language:

  1. Click the (<language>) link. The Change Default Language dialog displays.

    [Image: Notebook default language]

  2. Select the new language from the Default Language drop-down.

  3. Click Change.

  4. To ensure that existing commands continue to work, commands of the previous default language are automatically prefixed with a language magic command.

Mix languages

You can override the default language by specifying the language magic command %<language> at the beginning of a cell. The supported magic commands are: %python, %r, %scala, and %sql.

Note

When you invoke a language magic command, the command is dispatched to the REPL in the execution context for the notebook. Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language. REPLs can share state only through external resources such as files in DBFS or objects in object storage.
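As a minimal sketch of sharing state through an external resource, a Python cell could write a small JSON file that a cell in another language then reads. The file name, the helper functions, and the use of a temporary directory are assumptions made so the example runs anywhere; on a cluster the path would point at DBFS or object storage instead.

```python
import json
import os
import tempfile

# Hypothetical location: a temporary directory stands in for a DBFS-backed path.
shared_dir = tempfile.mkdtemp()
shared_path = os.path.join(shared_dir, "shared_state.json")


def save_state(state, path):
    """Write a JSON-serializable dict so another REPL can read it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)


def load_state(path):
    """Read the dict back, as a cell in any language with a JSON parser could."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


save_state({"threshold": 0.75, "table": "events"}, shared_path)
restored = load_state(shared_path)
print(restored["table"])
```

JSON is used here only because every supported notebook language can parse it; any file format readable from the other REPL would work the same way.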

Notebooks also support a few auxiliary magic commands:

  • %sh: Allows you to run shell code in your notebook. To fail the cell if the shell command has a non-zero exit status, add the -e option. This command runs only on the Apache Spark driver, and not the workers. To run a shell command on all nodes, use an init script.
  • %fs: Allows you to use dbutils filesystem commands. See dbutils.
  • %md: Allows you to include various types of documentation, including text, images, and mathematical formulas and equations. See the next section.

Include documentation

To include documentation in a notebook you can use the %md magic command to identify Markdown markup. The included Markdown markup is rendered into HTML. For example, this Markdown snippet contains markup for a level-one heading:

%md # Hello This is a Title

It is rendered as an HTML title:

[Image: Notebook HTML title]

Collapsible headings

Cells that appear after cells containing Markdown headings can be collapsed into the heading cell. The following image shows a level-one heading called Heading 1 with the following two cells collapsed into it.

[Image: Collapsed cells]

To expand and collapse headings, click the + and -.

Also see Hide and show cell content.

You can link to other notebooks or folders in Markdown cells using relative paths. Specify the href attribute of an anchor tag as the relative path, starting with a $ and then following the same pattern as in Unix file systems:

%md
<a href="$./myNotebook">Link to notebook in same folder as current notebook</a>
<a href="$../myFolder">Link to folder in parent folder of current notebook</a>
<a href="$./myFolder2/myNotebook2">Link to nested notebook</a>
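To make the Unix-style resolution concrete, the following sketch resolves such an href against the folder of the current notebook. It only illustrates the path arithmetic; the function name and the workspace paths are hypothetical, and this is not a description of how Azure Databricks resolves links internally.

```python
import posixpath


def resolve_notebook_link(current_notebook, href):
    """Resolve a '$'-prefixed relative href against the current notebook's folder."""
    if not href.startswith("$"):
        raise ValueError("relative notebook links start with '$'")
    folder = posixpath.dirname(current_notebook)
    # Drop the leading '$', then normalize './' and '../' as a Unix shell would.
    return posixpath.normpath(posixpath.join(folder, href[1:]))


current = "/Users/someone/project/currentNotebook"  # hypothetical path
same_folder = resolve_notebook_link(current, "$./myNotebook")
parent_folder = resolve_notebook_link(current, "$../myFolder")
nested = resolve_notebook_link(current, "$./myFolder2/myNotebook2")
print(same_folder)
print(parent_folder)
print(nested)
```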

Display images

To display images stored in the FileStore, use the syntax:

%md
![test](files/image.png)

For example, suppose you have the Databricks logo image file in FileStore:

dbfs ls dbfs:/FileStore/
databricks-logo-mobile.png

When you include the following code in a Markdown cell:

[Image: Image in Markdown cell]

the image is rendered in the cell:

[Image: Rendered image]

Display mathematical equations

Notebooks support KaTeX for displaying mathematical formulas and equations. For example,

%md
\\(c = \\pm\\sqrt{a^2 + b^2} \\)

\\(A{_i}{_j}=B{_i}{_j}\\)

$$c = \\pm\\sqrt{a^2 + b^2}$$

\\[A{_i}{_j}=B{_i}{_j}\\]

renders as:

[Image: Rendered equation 1]

and

%md
\\( f(\beta)= -Y_t^T X_t \beta + \sum log( 1+{e}^{X_t\bullet\beta}) + \frac{1}{2}\delta^t S_t^{-1}\delta\\)

where \\(\delta=(\beta - \mu_{t-1})\\)

renders as:

[Image: Rendered equation 2]

Include HTML

You can include HTML in a notebook by using the function displayHTML. See HTML, D3, and SVG in notebooks for an example of how to do this.
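A minimal sketch of the idea: build an HTML string in Python and hand it to displayHTML. The helper build_table_html and the sample rows are hypothetical, and the fallback to print is an assumption added so the snippet also runs outside a notebook, where displayHTML is not defined.

```python
import html


def build_table_html(rows):
    """Return a small HTML table, escaping each value so it renders as text."""
    body = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(key)), html.escape(str(value))
        )
        for key, value in rows
    )
    return "<table>{}</table>".format(body)


page = build_table_html([("cells", 3), ("status", "<ok>")])

# Inside a notebook displayHTML is predefined; elsewhere fall back to print.
render = globals().get("displayHTML", print)
render(page)
```

Escaping the values keeps data such as "<ok>" from being interpreted as markup inside the iframe.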

Note

The displayHTML iframe is served from the domain databricksusercontent.com and the iframe sandbox includes the allow-same-origin attribute. databricksusercontent.com must be accessible from your browser. If it is currently blocked by your corporate network, IT needs to add it to the allow list.

Command comments

You can have discussions with collaborators using command comments.

To toggle the Comments sidebar, click the Comments button at the top right of a notebook.

[Image: Toggle notebook comments]

To add a comment to a command:

  1. Highlight the command text and click the comment bubble:

    [Image: Open comments]

  2. Add your comment and click Comment.

    [Image: Add comment]

To edit, delete, or reply to a comment, click the comment and choose an action.

[Image: Edit comment]

Change cell display

There are three display options for notebooks:

  • Standard view: results are displayed immediately after code cells
  • Results only: only results are displayed
  • Side-by-side: code and results cells are displayed side by side, with results to the right

Go to the View menu to select your display option.

[Image: Side-by-side view]

Show line and command numbers

To show line numbers or command numbers, go to the View menu and select Show line numbers or Show command numbers. Once they're displayed, you can hide them again from the same menu. You can also enable line numbers with the keyboard shortcut Control+L.

[Image: Show line or command numbers via the View menu]

[Image: Line and command numbers enabled in notebook]

If you enable line or command numbers, Databricks saves your preference and shows them in all of your other notebooks for that browser.

Command numbers above cells link to that specific command. If you click the command number for a cell, it updates your URL to be anchored to that command. If you want to link to a specific command in your notebook, right-click the command number and choose Copy link address.

Find and replace text

To find and replace text within a notebook, select File > Find and Replace.

[Image: Find and replace text]

The current match is highlighted in orange and all other matches are highlighted in yellow.

[Image: Matching text]

You can replace matches on an individual basis by clicking Replace.

You can switch between matches by clicking the Prev and Next buttons or pressing shift+enter and enter to go to the previous and next matches, respectively.

Close the find and replace tool by clicking the Delete icon or by pressing esc.

Autocomplete

You can use Azure Databricks autocomplete features to automatically complete code segments as you enter them in cells. This reduces what you have to remember and minimizes the amount of typing you have to do. Azure Databricks supports two types of autocomplete in your notebook: local and server.

Local autocomplete completes words that exist in the notebook. Server autocomplete is more powerful because it accesses the cluster for defined types, classes, and objects, as well as SQL database and table names. To activate server autocomplete, you must attach your notebook to a cluster and run all cells that define completable objects.

Important

Server autocomplete in R notebooks is blocked during command execution.

You trigger autocomplete by pressing Tab after entering a completable object. For example, after you define and run the cells containing the definitions of MyClass and instance, the methods of instance are completable, and a list of valid completions displays when you press Tab.

[Image: Trigger autocomplete]

Type completion and SQL database and table name completion work in the same way.

[Images: Type completion; SQL completion]

Format SQL

Azure Databricks provides tools that allow you to format SQL code in notebook cells quickly and easily. These tools reduce the effort to keep your code formatted and help to enforce the same coding standards across your notebooks.

You can trigger the formatter in the following ways:

  • Single cells

    • Keyboard shortcut: Press Cmd+Shift+F.

    • Command context menu: Select Format SQL in the command context drop-down menu of a SQL cell. This item is visible only in SQL notebook cells and those with a %sql language magic.

      [Image: Formatting SQL from the command context menu]

  • Multiple cells

    Select multiple SQL cells and then select Edit > Format SQL Cells. If you select cells of more than one language, only SQL cells are formatted. This includes those that use %sql.

    [Image: Formatting SQL from the Edit menu]

Here's the first cell in the preceding example after formatting:

[Image: After formatting SQL]

Run notebooks

This section describes how to run one or more notebook cells.

Requirements

The notebook must be attached to a cluster. If the cluster is not running, the cluster is started when you run one or more cells.

Run a cell

In the cell actions menu at the far right, click the Run icon and select Run Cell, or press shift+enter.

Important

The maximum size for a notebook cell, both contents and output, is 16MB.

For example, try running this Python code snippet that references the predefined spark variable.

spark

Then run some real code:

1+1 # => 2

Note

Notebooks have a number of default settings:

  • When you run a cell, the notebook automatically attaches to a running cluster without prompting.
  • When you press shift+enter, the notebook auto-scrolls to the next cell if the cell is not visible.

To change these settings, select Account icon > User Settings > Notebook Settings and configure the respective checkboxes.

Run all above or below

To run all cells before or after a cell, go to the cell actions menu at the far right, click the Run menu, and select Run All Above or Run All Below.

Run All Below includes the cell you are in. Run All Above does not.
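The inclusive and exclusive boundaries can be modeled with list slicing. This is only an illustrative sketch: the function names and the list of cell labels are hypothetical, and cells are identified here by their zero-based position in the notebook.

```python
def run_all_above(cells, current_index):
    """Cells selected by Run All Above: everything before the current cell."""
    return cells[:current_index]


def run_all_below(cells, current_index):
    """Cells selected by Run All Below: the current cell and everything after it."""
    return cells[current_index:]


cells = ["imports", "load data", "transform", "plot"]  # hypothetical notebook
above = run_all_above(cells, 2)
below = run_all_below(cells, 2)
print(above)
print(below)
```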

Run all cells

To run all the cells in a notebook, select Run All in the notebook toolbar.

Important

Do not do a Run All if steps for mount and unmount are in the same notebook. It could lead to a race condition and possibly corrupt the mount points.

View multiple outputs per cell

Python notebooks and %python cells in non-Python notebooks support multiple outputs per cell.

[Image: Multiple outputs in one cell]

This feature requires Databricks Runtime 7.1 or above and can be enabled in Databricks Runtime 7.1-7.3 by setting spark.databricks.workspace.multipleResults.enabled to true. It is enabled by default in Databricks Runtime 7.4 and above.
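The version rule above can be summarized as a small lookup. The function name and the three status labels are hypothetical, and the sketch assumes the runtime version is given as a "major.minor" string.

```python
def multiple_outputs_status(runtime_version):
    """Classify a Databricks Runtime version string such as '7.3'."""
    major, minor = (int(part) for part in runtime_version.split(".")[:2])
    if (major, minor) < (7, 1):
        return "unsupported"  # feature requires 7.1 or above
    if (major, minor) < (7, 4):
        return "opt-in"       # 7.1-7.3: set the multipleResults flag to true
    return "default"          # 7.4 and above: enabled by default


for version in ("7.0", "7.3", "7.4"):
    print(version, multiple_outputs_status(version))
```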

Python and Scala error highlighting

Python and Scala notebooks support error highlighting. That is, the line of code that throws the error is highlighted in the cell. Additionally, if the error output is a stack trace, the cell in which the error is thrown is displayed in the stack trace as a link to the cell. You can click this link to jump to the offending code.

[Image: Python error highlighting]

[Image: Scala error highlighting]

Notifications

Notifications alert you to certain events, such as which command is currently running during Run all cells and which commands are in error state. When your notebook is showing multiple error notifications, the first one has a link that allows you to clear all notifications.

[Image: Notebook notifications]

Notebook notifications are enabled by default. You can disable them under Account icon > User Settings > Notebook Settings.

Databricks Advisor

Databricks Advisor automatically analyzes commands every time they are run and displays appropriate advice in the notebooks. The advice notices provide information that can assist you in improving the performance of workloads, reducing costs, and avoiding common mistakes.

View advice

A blue box with a lightbulb icon signals that advice is available for a command. The box displays the number of distinct pieces of advice.

[Image: Databricks advice]

Click the lightbulb to expand the box and view the advice. One or more pieces of advice become visible.

[Image: View advice]

Click the Learn more link to view documentation providing more information related to the advice.

Click the Don't show me this again link to hide the piece of advice. Advice of this type is no longer displayed. This action can be reversed in Notebook Settings.

Click the lightbulb again to collapse the advice box.

Advice settings

Access the Notebook Settings page by selecting Account icon > User Settings > Notebook Settings or by clicking the gear icon in the expanded advice box.

[Image: Notebook settings]

Toggle the Turn on Databricks Advisor option to enable or disable advice.

The Reset hidden advice link is displayed if one or more types of advice are currently hidden. Click the link to make that advice type visible again.

Run a notebook from another notebook

You can run a notebook from another notebook by using the %run <notebook> magic command. This is roughly equivalent to a :load command in a Scala REPL on your local machine or an import statement in Python. All variables defined in <notebook> become available in your current notebook.

%run must be in a cell by itself, because it runs the entire notebook inline.

Note

You cannot use %run to run a Python file and import the entities defined in that file into a notebook. To import from a Python file you must package the file into a Python library, create an Azure Databricks library from that Python library, and install the library into the cluster you use to run your notebook.

Example

Suppose you have notebookA and notebookB. notebookA contains a cell that has the following Python code:

x = 5

Even though you did not define x in notebookB, you can access x in notebookB after you run %run notebookA.

%run /Users/path/to/notebookA

print(x) # => 5

To specify a relative path, preface it with ./ or ../. For example, if notebookA and notebookB are in the same directory, you can alternatively run them from a relative path.

%run ./notebookA

print(x) # => 5

%run ../someDirectory/notebookA # up a directory and into another

print(x) # => 5
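As a rough mental model only (this is not how Azure Databricks implements %run), running a notebook inline resembles executing its source in the caller's namespace, which is why its variables become visible afterwards. The string notebook_a_source and the dict caller_namespace are hypothetical stand-ins for notebookA and for the state of notebookB.

```python
# Stand-in for the contents of notebookA.
notebook_a_source = "x = 5\n"

# Stand-in for the variables of the calling notebook (notebookB).
caller_namespace = {}

# Executing the source against that namespace defines x there.
exec(notebook_a_source, caller_namespace)

print(caller_namespace["x"])
```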

For more complex interactions between notebooks, see Notebook workflows.

Manage notebook state and results

After you attach a notebook to a cluster and run one or more cells, your notebook has state and displays results. This section describes how to manage notebook state and results.

Clear notebook state and results

To clear the notebook state and results, click Clear in the notebook toolbar and select the action:

[Image: Clear state and results]

Download results

By default, downloading results is enabled. To toggle this setting, see Manage the ability to download results from notebooks. If downloading results is disabled, the Download Result button is not visible.

Download a cell result

You can download a cell result that contains tabular output to your local machine. Click the Download Result button at the bottom of a cell.

[Image: Download cell results]

A CSV file named export.csv is downloaded to your default download directory.
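Once downloaded, export.csv can be read with any CSV tool. The sketch below uses Python's csv module; the temporary directory stands in for your download directory, and the sample header and rows are invented purely so the example is self-contained.

```python
import csv
import os
import tempfile

# Stand-in for the browser's default download directory.
download_dir = tempfile.mkdtemp()
csv_path = os.path.join(download_dir, "export.csv")

# Hypothetical contents; a real export.csv holds the cell's tabular output.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])
    writer.writerow([1, "alpha"])
    writer.writerow([2, "beta"])

# Read the file back as a list of dicts keyed by the header row.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows))
```

Note that csv.DictReader returns every value as a string, so numeric columns need explicit conversion.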

Download full results

By default, Azure Databricks returns 1000 rows of a DataFrame. When there are more than 1000 rows, a down arrow is added to the Download Result button. To download all the results of a query:

  1. Click the down arrow next to Download Result and select Download full results.

    [Image: Download full results]

  2. Select Re-execute and download.

    [Image: Re-run and download results]

    After you download full results, a CSV file named export.csv is downloaded to your local machine and the /databricks-results folder has a generated folder containing the full query results.

    [Image: Downloaded results]

Hide and show cell content

Cell content consists of cell code and the result of running the cell. You can hide and show the cell code and result using the cell actions menu at the top right of the cell.

To hide cell code:

  • Click the down caret and select Hide Code

To hide and show the cell result, do any of the following:

  • Click the down caret and select Hide Result
  • Select the Hide (cell minimize) icon
  • Type Esc > Shift + o

To show hidden cell code or results, click the Show links:

[Image: Show hidden code and results]

See also Collapsible headings.

Notebook isolation

Notebook isolation refers to the visibility of variables and classes between notebooks. Azure Databricks supports two types of isolation:

  • Variable and class isolation
  • Spark session isolation

Note

Since all notebooks attached to the same cluster execute on the same cluster VMs, even with Spark session isolation enabled there is no guaranteed user isolation within a cluster.

Variable and class isolation

Variables and classes are available only in the current notebook. For example, two notebooks attached to the same cluster can define variables and classes with the same name, but these objects are distinct.

To define a class that is visible to all notebooks attached to the same cluster, define the class in a package cell. Then you can access the class by using its fully qualified name, which is the same as accessing a class in an attached Scala or Java library.

Spark session isolation

Every notebook attached to a cluster running Apache Spark 2.0.0 and above has a pre-defined variable called spark that represents a SparkSession. SparkSession is the entry point for using Spark APIs as well as setting runtime configurations.

Spark session isolation is enabled by default. You can also use global temporary views to share temporary views across notebooks. See Create View or CREATE VIEW. To disable Spark session isolation, set spark.databricks.session.share to true in the Spark configuration.
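A minimal sketch of the global temporary view approach, assuming a PySpark DataFrame and the notebook's spark session are passed in. The helper name publish_global_view and the view name are hypothetical; createOrReplaceGlobalTempView and spark.table are standard PySpark calls, and global_temp is Spark's default database for global temporary views.

```python
GLOBAL_TEMP_DB = "global_temp"  # Spark's default database for global temporary views


def publish_global_view(spark, df, view_name):
    """Register df as a global temporary view and read it back by qualified name.

    Another notebook on the same cluster can query the same qualified name.
    """
    df.createOrReplaceGlobalTempView(view_name)
    return spark.table(GLOBAL_TEMP_DB + "." + view_name)
```

In a notebook this might be called as publish_global_view(spark, df, "shared_events"), after which a second notebook could read global_temp.shared_events. The view lasts only as long as the Spark application that created it.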

Important

Setting spark.databricks.session.share to true breaks the monitoring used by both streaming notebook cells and streaming jobs. Specifically:

  • The graphs in streaming cells are not displayed.
  • Jobs do not block as long as a stream is running (they just finish "successfully", stopping the stream).
  • Streams in jobs are not monitored for termination. Instead you must manually call awaitTermination().
  • Calling the display function on streaming DataFrames doesn't work.

Cells that trigger commands in other languages (that is, cells using %scala, %python, %r, and %sql) and cells that include other notebooks (that is, cells using %run) are part of the current notebook. Thus, these cells are in the same session as other notebook cells. By contrast, a notebook workflow runs a notebook with an isolated SparkSession, which means temporary views defined in such a notebook are not visible in other notebooks.

Version control

Azure Databricks has basic version control for notebooks. You can perform the following actions on revisions: add comments, restore and delete revisions, and clear revision history.

To access notebook revisions, click Revision History at the top right of the notebook toolbar.

Add a comment

To add a comment to the latest revision:

  1. Click the revision.

  2. Click the Save now link.

    [Image: Save comment]

  3. In the Save Notebook Revision dialog, enter a comment.

  4. Click Save. The notebook revision is saved with the entered comment.

Restore a revision

To restore a revision:

  1. Click the revision.

  2. Click Restore this revision.

    [Image: Restore revision]

  3. Click Confirm. The selected revision becomes the latest revision of the notebook.

Delete a revision

To delete a notebook's revision entry:

  1. Click the revision.

  2. Click the trash icon.

    [Image: Delete revision]

  3. Click Yes, erase. The selected revision is deleted from the notebook's revision history.

Clear a revision history

To clear a notebook's revision history:

  1. Select File > Clear Revision History.

  2. Click Yes, clear. The notebook revision history is cleared.

    Warning

    Once cleared, the revision history is not recoverable.

Git version control

Azure Databricks also integrates with these Git-based version control tools: