Azure Databricks supports various types of visualizations out of the box using the display and displayHTML functions.

Azure Databricks also natively supports visualization libraries in Python and R and lets you install and use third-party libraries.

display function

The display function supports several data and visualization types.

Data types

The easiest way to create a DataFrame visualization in Azure Databricks is to call display(<dataframe-name>). For example, suppose diamonds_df is a Spark DataFrame of a diamonds dataset. If you group it by diamond color, compute the average price, and pass the result to display:

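from pyspark.sql.functions import avg
diamonds_df = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")

# Group by color, average the price, and render the result as a table
display(diamonds_df.select("color", "price").groupBy("color").agg(avg("price")))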


A table of diamond color versus average price displays.

Figure: Diamond color versus average price

If you see OK with no rendering after calling the display function, most likely the DataFrame or collection you passed in is empty.

display() supports pandas DataFrames. If you reference a pandas or Koalas DataFrame without display, the table is rendered as it would be in Jupyter.

DataFrame display method

Available in Databricks Runtime 7.1 and above.

PySpark, pandas, and Koalas DataFrames have a display method that calls the Azure Databricks display function. You can call it after a simple DataFrame operation, for example:

diamonds_df = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")
diamonds_df.select("color", "price").display()

or at the end of a series of chained DataFrame operations, for example:

from pyspark.sql.functions import avg
diamonds_df = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")
diamonds_df.select("color", "price").groupBy("color").agg(avg("price")).display()


Images

display renders columns containing image data types as rich HTML. display attempts to render image thumbnails for DataFrame columns matching the Spark ImageSchema. Thumbnail rendering works for any images successfully read in through the readImages function. For image values generated through other means, Azure Databricks supports the rendering of 1, 3, or 4 channel images (where each channel consists of a single byte), with the following constraints:

  • One-channel images: mode field must be equal to 0. height, width, and nChannels fields must accurately describe the binary image data in the data field.
  • Three-channel images: mode field must be equal to 16. height, width, and nChannels fields must accurately describe the binary image data in the data field. The data field must contain pixel data in three-byte chunks, with the channel ordering (blue, green, red) for each pixel.
  • Four-channel images: mode field must be equal to 24. height, width, and nChannels fields must accurately describe the binary image data in the data field. The data field must contain pixel data in four-byte chunks, with the channel ordering (blue, green, red, alpha) for each pixel.
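The constraints above can be checked before a row is handed to display. The following is a minimal plain-Python sketch, not part of any Azure Databricks or Spark API: the function name check_image_fields and the EXPECTED_MODE table are hypothetical, and the sketch assumes that "accurately describe" means the data field holds exactly height × width × nChannels bytes.

```python
# Mode values are taken from the constraints listed above (0, 16, 24).
# EXPECTED_MODE and check_image_fields are hypothetical names for illustration.
EXPECTED_MODE = {1: 0, 3: 16, 4: 24}


def check_image_fields(height, width, n_channels, mode, data):
    """Return a list of problems; an empty list means the fields are consistent."""
    problems = []
    if n_channels not in EXPECTED_MODE:
        problems.append(f"unsupported channel count: {n_channels}")
        return problems
    if mode != EXPECTED_MODE[n_channels]:
        problems.append(
            f"mode {mode} does not match {n_channels} channel(s); "
            f"expected {EXPECTED_MODE[n_channels]}"
        )
    # Assumption: one byte per channel, so the payload size is fully determined.
    expected_len = height * width * n_channels
    if len(data) != expected_len:
        problems.append(f"data has {len(data)} bytes; expected {expected_len}")
    return problems


# A 2x2 three-channel image: each pixel is a (blue, green, red) triple.
blue_pixel = bytes([255, 0, 0])  # pure blue in BGR order
red_pixel = bytes([0, 0, 255])   # pure red in BGR order
data = blue_pixel + red_pixel + red_pixel + blue_pixel

print(check_image_fields(2, 2, 3, 16, data))  # consistent fields
print(check_image_fields(2, 2, 3, 24, data))  # wrong mode for three channels
```

Returning a list of problems rather than raising on the first one makes it easier to report every inconsistent field of a row at once.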

Suppose you have a folder containing some images:

Figure: Folder of image data

If you read the images into a DataFrame with ImageSchema.readImages and then display the DataFrame, display renders thumbnails of the images:

from pyspark.ml.image import ImageSchema
# sample_img_dir is the path of the folder that contains the images
image_df = ImageSchema.readImages(sample_img_dir)
display(image_df)

Figure: Display image DataFrame

Structured Streaming DataFrames

To visualize the result of a streaming query in real time, you can display a Structured Streaming DataFrame in Scala and Python.

Python:

streaming_df = spark.readStream.format("rate").load()
display(streaming_df.groupBy().count())

Scala:

val streaming_df = spark.readStream.format("rate").load()
display(streaming_df.groupBy().count())

display supports the following optional parameters:

  • streamName: the streaming query name.
  • trigger (Scala) and processingTime (Python): defines how often the streaming query is run. If not specified, the system checks for availability of new data as soon as the previous processing has completed. To reduce the cost in production, Databricks recommends that you always set a trigger interval. With Databricks Runtime 8.0 and above, the default trigger interval is 500 ms.
  • checkpointLocation: the location where the system writes all the checkpoint information. If it is not specified, the system automatically generates a temporary checkpoint location on DBFS. In order for your stream to continue processing data from where it left off, you must provide a checkpoint location. Databricks recommends that in production you always specify the checkpointLocation option.

Python:

streaming_df = spark.readStream.format("rate").load()
display(streaming_df.groupBy().count(), processingTime = "5 seconds", checkpointLocation = "dbfs:/<checkpoint-path>")

Scala:

import org.apache.spark.sql.streaming.Trigger

val streaming_df = spark.readStream.format("rate").load()
display(streaming_df.groupBy().count(), trigger = Trigger.ProcessingTime("5 seconds"), checkpointLocation = "dbfs:/<checkpoint-path>")

For more information about these parameters, see Starting Streaming Queries.

Plot types

The display function supports a rich set of plot types:

Figure: Chart types

Choose and configure a chart type

To choose a bar chart, click the bar chart icon:

Figure: Bar chart icon

To choose another plot type, click the down arrow to the right of the bar chart icon and choose the plot type.

Chart toolbar

Both line and bar charts have a built-in toolbar that supports a rich set of client-side interactions.

Figure: Chart toolbar

To configure a chart, click Plot Options….

Figure: Plot options

The line chart has a few custom chart options: setting a Y-axis range, showing and hiding points, and displaying the Y-axis with a log scale.

For information about legacy chart types, see:

Color consistency across charts

Azure Databricks supports two kinds of color consistency across charts: series set and global.

Series set color consistency assigns the same color to the same value if you have series with the same values but in different orders (for example, A = ["Apple", "Orange", "Banana"] and B = ["Orange", "Banana", "Apple"]). The values are sorted before plotting, so both legends are sorted the same way (["Apple", "Banana", "Orange"]), and the same values are given the same colors. However, if you have a series C = ["Orange", "Banana"], it would not be color consistent with set A because the set isn't the same. The sorting algorithm would assign the first color to "Banana" in set C but the second color to "Banana" in set A. If you want these series to be color consistent, you can specify that charts should have global color consistency.
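The sort-then-assign behavior described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Azure Databricks implementation: the PALETTE list and the series_set_colors function are hypothetical names chosen for this example.

```python
# Hypothetical palette; not the colors Azure Databricks actually uses.
PALETTE = ["blue", "orange", "green", "red"]


def series_set_colors(values):
    """Sort the distinct values, then give the i-th value the i-th palette color."""
    return {value: PALETTE[i] for i, value in enumerate(sorted(set(values)))}


a = series_set_colors(["Apple", "Orange", "Banana"])
b = series_set_colors(["Orange", "Banana", "Apple"])
c = series_set_colors(["Orange", "Banana"])

print(a == b)                    # same set of values, so the same colors
print(a["Banana"], c["Banana"])  # different sets, so "Banana" gets different colors
```

In this sketch "Banana" is the second sorted value in A but the first in C, which is why the two series disagree on its color.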

In global color consistency, each value is always mapped to the same color no matter what values the series have. To enable this for each chart, select the Global color consistency checkbox.

Figure: Global color consistency

To achieve this consistency, Azure Databricks hashes directly from values to colors. To avoid collisions (where two values go to the exact same color), the hash maps into a large set of colors. As a side effect, nice-looking or easily distinguishable colors cannot be guaranteed; with many colors there are bound to be some that look very similar.
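A rough way to picture hashing from values to colors is shown below. The source does not say which hash function or color set Azure Databricks uses, so the choice of SHA-256 and of the full 24-bit RGB space is an assumption made purely for illustration, and global_color is a hypothetical name.

```python
import hashlib


def global_color(value):
    """Map a value to a 24-bit RGB hex color using a stable hash of the value alone."""
    # Assumption: SHA-256 stands in for whatever hash the product really uses.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return "#" + digest[:3].hex()


series_a = ["Apple", "Orange", "Banana"]
series_c = ["Orange", "Banana"]

colors_a = {v: global_color(v) for v in series_a}
colors_c = {v: global_color(v) for v in series_c}

# The color depends only on the value, not on which other values are in the series.
print(colors_a["Banana"] == colors_c["Banana"])
```

Because the color is a function of the value alone, nothing steers two different values away from similar-looking colors, which matches the trade-off described above.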

Machine learning visualizations

In addition to the standard chart types, the display function supports visualizations of the following machine learning training parameters and results: residuals, ROC curves, and decision trees.

For linear and logistic regressions, display supports rendering a fitted versus residuals plot. To obtain this plot, you supply the model and DataFrame.

The following example runs a linear regression on city population to house sale price data and then displays the residuals versus the fitted data.

# Load data
pop_df = spark.read.csv("/databricks-datasets/samples/population-vs-price/data_geo.csv", header="true", inferSchema="true")

# Drop rows with missing values and rename the feature and label columns, replacing spaces with _
from pyspark.sql.functions import col
pop_df = pop_df.dropna() # drop rows with missing values
exprs = [col(column).alias(column.replace(' ', '_')) for column in pop_df.columns]

# Register a UDF to convert the feature (2014_Population_estimate) column vector to a VectorUDT type and apply it to the column.
from pyspark.ml.linalg import Vectors, VectorUDT

spark.udf.register("oneElementVec", lambda d: Vectors.dense([d]), returnType=VectorUDT())
tdata = pop_df.select(*exprs).selectExpr("oneElementVec(2014_Population_estimate) as features", "2015_median_sales_price as label")

# Run a linear regression
from pyspark.ml.regression import LinearRegression

lr = LinearRegression()
modelA = lr.fit(tdata, {lr.regParam:0.0})

# Plot residuals versus fitted data
display(modelA, tdata)

Figure: Display residuals
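As a rough illustration of what a fitted-versus-residuals plot contains, the following plain-Python sketch fits a one-variable least-squares line and computes the fitted values and residuals by hand. The data values and the fit_line helper are made up for this example and are not part of the Azure Databricks or Spark APIs; the residual is taken here as the observed label minus the fitted value.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up numbers for illustration; not taken from the population-vs-price dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Each (fitted, residual) pair is one point on a fitted-versus-residuals plot.
for f, r in zip(fitted, residuals):
    print(f"{f:.3f}  {r:+.3f}")
```

For a least-squares fit with an intercept, the residuals sum to zero, so a plot whose points drift systematically above or below zero suggests that a straight line is a poor model for the data.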

ROC curves

For logistic regressions, display supports rendering an ROC curve. To obtain this plot, you supply the model, the prepped data that is input to the fit method, and the parameter "ROC".
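For readers less familiar with ROC curves, this plain-Python sketch shows how the points of such a curve can be derived from true labels and predicted scores. It is a conceptual illustration only: roc_points is a hypothetical helper, the labels and scores are invented, and nothing here reflects how display computes the curve internally.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Invented labels (1 = positive class) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

for fpr, tpr in roc_points(labels, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); the closer it bends toward the top-left corner, the better the classifier separates the two classes.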

The following example develops a classifier that predicts whether an individual earns <=50K or >50K a year from various attributes of the individual. The Adult dataset derives from census data and consists of information about 48842 individuals and their annual income.

The example code in this section uses one-hot encoding. The function was renamed with Apache Spark 3.0, so the code is slightly different depending on the version of Databricks Runtime you are using. If you are using Databricks Runtime 6.x or below, you must adjust two lines in the code as described in the code comments.
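As background for the encoding step, here is a small plain-Python sketch of the idea behind indexing a categorical column and then one-hot encoding it. The helpers index_labels and one_hot are hypothetical; ordering categories by descending frequency and dropping the last category are assumptions meant to mirror the default behavior of StringIndexer and OneHotEncoder, not a reimplementation of them.

```python
from collections import Counter


def index_labels(values):
    """Assign indices by descending frequency (ties broken alphabetically)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Binary vector with a single 1; with drop_last, the last category is all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    return [1 if i == index else 0 for i in range(size)]


# Invented sample of the "sex" column.
sex = ["Male", "Female", "Male", "Male"]
mapping = index_labels(sex)
encoded = [one_hot(mapping[v], len(mapping)) for v in sex]

print(mapping)
print(encoded)
```

Dropping the last category keeps the encoded columns linearly independent, which matters for models such as logistic regression that fit an intercept.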

# This code uses one-hot encoding to convert all categorical variables into binary vectors.

schema = """`age` DOUBLE,
`workclass` STRING,
`fnlwgt` DOUBLE,
`education` STRING,
`education_num` DOUBLE,
`marital_status` STRING,
`occupation` STRING,
`relationship` STRING,
`race` STRING,
`sex` STRING,
`capital_gain` DOUBLE,
`capital_loss` DOUBLE,
`hours_per_week` DOUBLE,
`native_country` STRING,
`income` STRING"""

dataset = spark.read.csv("/databricks-datasets/adult/adult.data", schema=schema)

from pyspark.ml import Pipeline
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler
# If you are using Databricks Runtime 6.x or below, comment out the preceding line and uncomment the following line.
# from pyspark.ml.feature import OneHotEncoderEstimator, StringIndexer, VectorAssembler
categoricalColumns = ["workclass", "education", "marital_status", "occupation", "relationship", "race", "sex", "native_country"]

stages = [] # stages in the Pipeline
for categoricalCol in categoricalColumns:
    # Category indexing with StringIndexer
    stringIndexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "Index")
    # Use OneHotEncoder to convert categorical variables into binary SparseVectors
    encoder = OneHotEncoder(inputCols=[stringIndexer.getOutputCol()], outputCols=[categoricalCol + "classVec"])
    # If you are using Databricks Runtime 6.x or below, comment out the preceding line and uncomment the following line.
    # encoder = OneHotEncoderEstimator(inputCols=[stringIndexer.getOutputCol()], outputCols=[categoricalCol + "classVec"])
    # Add stages.  These are not run here, but will run all at once later on.
    stages += [stringIndexer, encoder]

# Convert label into label indices using the StringIndexer
label_stringIdx = StringIndexer(inputCol="income", outputCol="label")
stages += [label_stringIdx]

# Transform all features into a vector using VectorAssembler
numericCols = ["age", "fnlwgt", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
assemblerInputs = [c + "classVec" for c in categoricalColumns] + numericCols
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]

# Run the stages as a Pipeline. This puts the data through all of the feature transformations in a single call.

partialPipeline = Pipeline().setStages(stages)
pipelineModel = partialPipeline.fit(dataset)
preppedDataDF = pipelineModel.transform(dataset)

# Fit logistic regression model

from pyspark.ml.classification import LogisticRegression
lrModel = LogisticRegression().fit(preppedDataDF)

# ROC for data
display(lrModel, preppedDataDF, "ROC")

Figure: Display ROC

To display the residuals, omit the "ROC" parameter:

display(lrModel, preppedDataDF)

Figure: Display logistic regression residuals

Decision trees

The display function supports rendering a decision tree.

To obtain this visualization, you supply the decision tree model.

The following examples train a tree to recognize digits (0 - 9) from the MNIST dataset of images of handwritten digits and then display the tree.

Python:

trainingDF = spark.read.format("libsvm").load("/databricks-datasets/mnist-digits/data-001/mnist-digits-train.txt").cache()
testDF = spark.read.format("libsvm").load("/databricks-datasets/mnist-digits/data-001/mnist-digits-test.txt").cache()

from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer
from pyspark.ml import Pipeline

indexer = StringIndexer().setInputCol("label").setOutputCol("indexedLabel")

dtc = DecisionTreeClassifier().setLabelCol("indexedLabel")

# Chain indexer + dtc together into a single ML Pipeline.
pipeline = Pipeline().setStages([indexer, dtc])

model = pipeline.fit(trainingDF)

# The fitted decision tree is the last stage of the pipeline model.
display(model.stages[-1])

Scala:

val trainingDF = spark.read.format("libsvm").load("/databricks-datasets/mnist-digits/data-001/mnist-digits-train.txt").cache
val testDF = spark.read.format("libsvm").load("/databricks-datasets/mnist-digits/data-001/mnist-digits-test.txt").cache

import org.apache.spark.ml.classification.{DecisionTreeClassifier, DecisionTreeClassificationModel}
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.Pipeline

val indexer = new StringIndexer().setInputCol("label").setOutputCol("indexedLabel")
val dtc = new DecisionTreeClassifier().setLabelCol("indexedLabel")
val pipeline = new Pipeline().setStages(Array(indexer, dtc))

val model = pipeline.fit(trainingDF)
val tree = model.stages.last.asInstanceOf[DecisionTreeClassificationModel]

display(tree)


Figure: Display decision tree

displayHTML function

Azure Databricks programming language notebooks (Python, R, and Scala) support HTML graphics using the displayHTML function; you can pass the function any HTML, CSS, or JavaScript code. This function supports interactive graphics using JavaScript libraries such as D3.
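A minimal sketch of preparing markup for displayHTML is shown below. The rows_to_html_table helper and the sample rows are hypothetical; only the final, commented-out displayHTML call is the notebook function described above, and it is left as a comment so that the snippet also runs outside a notebook.

```python
import html


def rows_to_html_table(headers, rows):
    """Build an HTML table string, escaping every cell so data cannot inject markup."""
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Invented sample data for illustration.
markup = rows_to_html_table(["color", "note"], [["D", "a < b"], ["E", "ok"]])
print(markup)

# In an Azure Databricks notebook, the string would then be rendered with:
# displayHTML(markup)
```

Escaping cell values matters because displayHTML renders whatever HTML and JavaScript it is given, so unescaped data could otherwise alter the page.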

For examples of using displayHTML, see:

The displayHTML iframe is served from the domain databricksusercontent.com, and the iframe sandbox includes the allow-same-origin attribute. databricksusercontent.com must be accessible from your browser. If it is currently blocked by your corporate network, it must be added to an allow list.

Visualizations by language

Visualizations in Python

To plot data in Python, use the display function as follows:

diamonds_df = spark.read.csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header="true", inferSchema="true")

display(diamonds_df.groupBy("color").avg("price").orderBy("color"))


Figure: Python bar chart

Deep dive Python notebook

For a deep dive into Python visualizations using display, see the notebook:

You can also use other Python libraries to generate plots. The Databricks Runtime includes the seaborn visualization library. To create a seaborn plot, import the library, create a plot, and pass the plot to the display function.

import seaborn as sns

df = sns.load_dataset("iris")
g = sns.PairGrid(df, diag_sharey=False)
g.map_diag(sns.kdeplot, lw=3)

# Pass the grid's underlying Matplotlib figure to display
display(g.fig)



Figure: Seaborn plot

Other Python libraries

Visualizations in R

To plot data in R, use the display function as follows:

diamonds_df <- read.df("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", source = "csv", header="true", inferSchema = "true")

display(arrange(agg(groupBy(diamonds_df, "color"), "price" = "avg"), "color"))

You can use the default R plot function.

fit <- lm(Petal.Length ~., data = iris)
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)

Figure: R default plot

You can also use any R visualization package. The R notebook captures the resulting plot as a .png and displays it inline.

The Lattice package supports trellis graphs, which are graphs that display a variable or the relationship between variables, conditioned on one or more other variables.

library(lattice) # xyplot comes from lattice; the diamonds dataset comes from ggplot2
xyplot(price ~ carat | cut, diamonds, scales = list(log = TRUE), type = c("p", "g", "smooth"), ylab = "Log price")

Figure: R Lattice plot

The DandEFA package supports dandelion plots.

install.packages("DandEFA", repos = "https://cran.us.r-project.org")
library(DandEFA)
data(timss2011)
timss2011 <- na.omit(timss2011)
dandpal <- rev(rainbow(100, start = 0, end = 0.2))
facl <- factload(timss2011,nfac=5,method="prax",cormeth="spearman")
dandelion(facl,bound=0,mcex=c(1,1.2),palet=dandpal)
facl <- factload(timss2011,nfac=8,method="mle",cormeth="pearson")
dandelion(facl,bound=0,mcex=c(1,1.2),palet=dandpal)

Figure: R DandEFA plot

The Plotly R package relies on htmlwidgets for R. For installation instructions and a notebook, see htmlwidgets.

Other R libraries

Visualizations in Scala

To plot data in Scala, use the display function as follows:

val diamonds_df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")

display(diamonds_df.groupBy("color").avg("price").orderBy("color"))


Figure: Scala bar chart

Deep dive Scala notebook

For a deep dive into Scala visualizations using display, see the notebook:

Visualizations in SQL

When you run a SQL query, Azure Databricks automatically extracts some of the data and displays it as a table.

SELECT color, avg(price) AS price FROM diamonds GROUP BY color ORDER BY COLOR

Figure: SQL table

From there you can select different chart types.

Figure: SQL bar chart