Deep Learning Pipelines

Note

This page describes the open source Deep Learning Pipelines package included in Databricks Runtime 6.6 ML and below. It is not intended as a resource for general information about deep learning pipelines on Azure Databricks.

The Deep Learning Pipelines package is a high-level deep learning framework that facilitates common deep learning workflows through the Apache Spark MLlib Pipelines API and uses Spark to scale out deep learning on big data. It is an open source project released under the Apache 2.0 License.

The Deep Learning Pipelines package calls into lower-level deep learning libraries. It supports TensorFlow, and Keras with the TensorFlow backend.

Migration guide for Databricks Runtime 7.0 ML and above

Important

Parts of the Deep Learning Pipelines library, sparkdl, were removed in Databricks Runtime 7.0 ML (Unsupported), specifically the Transformers and Estimators used in Apache Spark ML pipelines. See the following sections for migration tips and workarounds.

Reading images

The Deep Learning Pipelines package includes an image reader, sparkdl.image.imageIO, which was removed in Databricks Runtime 7.0 ML (Unsupported).

Instead, use the image data source or the binary file data source from Apache Spark. Many of the example notebooks in Load data show use cases for these two data sources.
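The sketch below shows both replacement readers and, assuming 8-bit images, how to turn the data field of an image-source row into a NumPy array. It relies on Spark's documented image schema (origin, height, width, nChannels, mode, data, with pixel bytes stored row-wise in OpenCV-compatible BGR order in most cases). The helper names load_images, load_image_bytes, and image_struct_to_rgb are hypothetical, and the "*.jpg" filter is only an example.

```python
import numpy as np


def load_images(spark, image_dir):
    """Read images with Spark's built-in image data source.

    Each row holds one struct column, "image", with the fields
    origin, height, width, nChannels, mode and data.
    """
    # dropInvalid skips files that cannot be decoded as images.
    return spark.read.format("image").option("dropInvalid", True).load(image_dir)


def load_image_bytes(spark, image_dir):
    """Read raw, undecoded file bytes with the binary file data source.

    Columns: path, modificationTime, length, content.
    """
    return (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", "*.jpg")  # example filter: JPEG files only
        .load(image_dir)
    )


def image_struct_to_rgb(height, width, n_channels, data):
    """Hypothetical helper: decode an image-source `data` field to RGB(A).

    Assumes one byte per channel; pixels are stored row-wise in BGR order.
    """
    arr = np.frombuffer(data, dtype=np.uint8).reshape(height, width, n_channels)
    if n_channels >= 3:
        # Reverse B, G, R into R, G, B and keep any alpha channel last.
        arr = np.concatenate([arr[..., 2::-1], arr[..., 3:]], axis=-1)
    return arr
```

On a cluster, image_struct_to_rgb(row.image.height, row.image.width, row.image.nChannels, row.image.data) could be applied to rows collected from load_images, or called inside a pandas UDF.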

Transfer learning

The Deep Learning Pipelines package includes a Spark ML Transformer, sparkdl.DeepImageFeaturizer, for facilitating transfer learning with deep learning models. DeepImageFeaturizer was removed in Databricks Runtime 7.0 ML (Unsupported).

Instead, use pandas UDFs to perform featurization with deep learning models. pandas UDFs, and their newer variant, Scalar Iterator pandas UDFs, offer more flexible APIs, support more deep learning libraries, and deliver better performance.

See Featurization for transfer learning for examples of transfer learning with pandas UDFs.
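A minimal sketch of the Scalar Iterator pattern, assuming Spark 3.x type-hinted pandas UDFs: the function receives an iterator of pandas Series (one per Arrow batch), loads the model once, and yields one Series of feature vectors per batch. load_featurizer is a hypothetical placeholder that uses random weights; a real featurizer would load, for example, a pretrained Keras network without its classification head.

```python
from typing import Iterator

import numpy as np
import pandas as pd


def load_featurizer():
    """Hypothetical stand-in for loading a pretrained network.

    Uses a random 4 -> 3 projection so the sketch runs anywhere.
    """
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((4, 3)).astype(np.float32)

    def featurize(batch):
        # batch: (n_rows, 4) input array -> (n_rows, 3) feature array
        return np.asarray(batch, dtype=np.float32) @ weights

    return featurize


def featurize_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    """Body of a Scalar Iterator pandas UDF.

    The model is loaded once per task and reused for every batch,
    instead of being reloaded for each batch.
    """
    model = load_featurizer()
    for inputs in batches:
        # Each element of `inputs` is one row's input vector.
        features = model(np.stack(inputs.to_numpy()))
        yield pd.Series(list(features))


# Possible wiring on Databricks Runtime 7.0 ML and above (Spark 3.x):
#   from pyspark.sql.functions import pandas_udf
#   featurize_udf = pandas_udf(featurize_batches, returnType="array<float>")
#   features_df = df.select(featurize_udf("input_vector").alias("features"))
```

Keeping the model load outside the loop is the design point: with a plain scalar pandas UDF, an expensive model would be loaded again for every batch.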

Distributed hyperparameter tuning

The Deep Learning Pipelines package includes a Spark ML Estimator, sparkdl.KerasImageFileEstimator, for tuning hyperparameters with Spark ML tuning utilities. KerasImageFileEstimator was removed in Databricks Runtime 7.0 ML (Unsupported).

Instead, use Hyperopt to distribute hyperparameter tuning for deep learning models, as described in Hyperparameter tuning with Hyperopt.
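A sketch of the Hyperopt workflow under stated assumptions: train_and_evaluate is a hypothetical objective in which a toy quadratic stands in for training and validating a Keras model, and the search space and parameter names are illustrative only. Hyperopt accepts an objective that returns a dict with "loss" and "status" keys ("ok" is the value of hyperopt.STATUS_OK), and SparkTrials runs trials in parallel on the cluster's workers. Hyperopt is imported inside run_search so the objective can be tested without it.

```python
import math


def train_and_evaluate(params):
    """Hypothetical objective: train with `params`, return validation loss.

    A toy quadratic replaces real model training in this sketch.
    """
    lr, units = params["learning_rate"], params["units"]
    val_loss = (lr - 0.01) ** 2 + ((units - 64) / 64) ** 2
    # "ok" equals hyperopt.STATUS_OK; Hyperopt minimizes "loss".
    return {"loss": val_loss, "status": "ok"}


def run_search(max_evals=16, parallelism=4):
    """Distribute the trials across the cluster with SparkTrials."""
    from hyperopt import SparkTrials, fmin, hp, tpe

    space = {
        # loguniform bounds are given in log space.
        "learning_rate": hp.loguniform("learning_rate", math.log(1e-4), math.log(1e-1)),
        "units": hp.quniform("units", 16, 256, 16),
    }
    # Returns the best parameter values found.
    return fmin(
        fn=train_and_evaluate,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=SparkTrials(parallelism=parallelism),
    )
```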

Distributed inference

The Deep Learning Pipelines package includes several Spark ML Transformers for distributed inference, all of which were removed in Databricks Runtime 7.0 ML (Unsupported):

  • DeepImagePredictor
  • TFImageTransformer
  • KerasImageFileTransformer
  • TFTransformer
  • KerasTransformer

Instead, use pandas UDFs to run inference on Spark DataFrames, following the examples in Model inference.
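The following sketch shows one way an inference pandas UDF could replace a removed transformer such as KerasTransformer, assuming Spark 3.x's "iterator of multiple Series" form. load_model is a hypothetical placeholder (a fixed linear model here); a real implementation might call tf.keras.models.load_model. BATCH_SIZE is an assumed value that caps how many rows are passed to the model at once.

```python
from typing import Iterator, Tuple

import numpy as np
import pandas as pd

BATCH_SIZE = 256  # hypothetical mini-batch size for model calls


def load_model():
    """Hypothetical stand-in for loading a trained model."""
    coef = np.array([0.5, -1.0], dtype=np.float64)
    return lambda x: x @ coef


def predict_batches(
    batches: Iterator[Tuple[pd.Series, pd.Series]],
) -> Iterator[pd.Series]:
    """Body of an iterator-of-multiple-Series pandas UDF for inference."""
    model = load_model()  # loaded once per task, reused for every batch
    for x1, x2 in batches:
        features = np.column_stack([x1.to_numpy(), x2.to_numpy()])
        # Feed the model fixed-size chunks to bound memory use.
        preds = [
            model(features[i : i + BATCH_SIZE])
            for i in range(0, len(features), BATCH_SIZE)
        ]
        yield pd.Series(np.concatenate(preds) if preds else np.empty(0))


# Possible wiring on Spark 3.x:
#   from pyspark.sql.functions import pandas_udf
#   predict_udf = pandas_udf(predict_batches, returnType="double")
#   scored = df.withColumn("prediction", predict_udf("x1", "x2"))
```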

Deploy models as SQL UDFs

The Deep Learning Pipelines package includes a utility, sparkdl.udf.keras_image_model.registerKerasImageUDF, for deploying a deep learning model as a UDF callable from Spark SQL. registerKerasImageUDF was removed in Databricks Runtime 7.0 ML (Unsupported).

Instead, use MLflow to export the model as a UDF, following the example in scikit-learn model deployment on Azure ML.
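A minimal sketch, assuming the model is already logged to MLflow: mlflow.pyfunc.spark_udf wraps the model as a Spark UDF, and spark.udf.register exposes that UDF to Spark SQL under a chosen name. The helper name register_model_as_sql_udf, the default SQL name "predict", and the table and column names in the usage comment are hypothetical. MLflow is imported inside the function, so the module loads even where MLflow is not installed.

```python
def register_model_as_sql_udf(spark, model_uri, udf_name="predict", result_type="double"):
    """Hypothetical helper: wrap an MLflow model as a UDF callable from Spark SQL.

    model_uri is an MLflow model URI such as "runs:/<run_id>/model".
    """
    import mlflow.pyfunc  # lazy import: only needed when the helper is called

    model_udf = mlflow.pyfunc.spark_udf(spark, model_uri, result_type=result_type)
    spark.udf.register(udf_name, model_udf)  # make it visible to spark.sql(...)
    return model_udf


# Usage on a cluster with an active SparkSession `spark`:
#   register_model_as_sql_udf(spark, "runs:/<run_id>/model")
#   spark.sql("SELECT *, predict(feature_1, feature_2) AS prediction FROM my_table")
```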