Model training examples

This section includes examples showing how to train machine learning and deep learning models on Azure Databricks using many popular open-source libraries.

You can also use Databricks AutoML, which automatically prepares a dataset for model training, performs a set of trials using open-source libraries such as scikit-learn and XGBoost, and creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.

Machine learning examples

| Package | Notebook(s) | Features |
| --- | --- | --- |
| scikit-learn | Machine learning quickstart | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
| scikit-learn | Machine learning with Model Registry | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, Model Registry |
| scikit-learn | End-to-end example | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry, Model Serving |
| MLlib | MLlib examples | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer |
| xgboost | XGBoost examples | Python, PySpark, and Scala, single node workloads and distributed training |

Deep learning examples

Also see Best practices for deep learning on Azure Databricks.

| Package | Notebook | Features |
| --- | --- | --- |
| TensorFlow Keras | Deep learning quickstart | TensorFlow Keras, TensorBoard, Hyperopt, MLflow |
| TensorFlow (single node) | TensorFlow tutorial with MNIST dataset | TensorFlow, TensorBoard |
| PyTorch (single node) | PyTorch tutorial with MNIST dataset | PyTorch |

For distributed deep learning training, see:

| Package | Notebook | Features |
| --- | --- | --- |
| HorovodRunner (TensorFlow Keras) | TensorFlow Keras MNIST example | TensorFlow Keras single node to distributed training |
| HorovodRunner (PyTorch) | PyTorch MNIST example | PyTorch single node to distributed training |
| HorovodRunner | Horovod timeline | Horovod timeline |
| horovod.spark (PyTorch and Keras) | horovod.spark package | horovod.spark estimator API for use in ML pipelines with Keras and PyTorch |
| spark-tensorflow-distributor | Distributed Training with TensorFlow | Distributed training with TensorFlow on Apache Spark clusters |
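
To give a feel for what moving from single-node to distributed training involves, the following is a minimal sketch of data-parallel training in plain Python. It does not use Horovod or spark-tensorflow-distributor. The "workers" are simulated in one process, and `local_gradient`, `allreduce_mean`, `train`, and the toy data are hypothetical names chosen for illustration. Each simulated worker computes a gradient on its own shard of the data, and the gradients are averaged before every update, which is the general idea behind allreduce-based training.

```python
# Simulated data-parallel training. Each "worker" holds one shard of the data,
# computes a local gradient, and the gradients are averaged before the update.
# This is plain Python, not Horovod: no real inter-process communication occurs.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce that averages one value per worker."""
    return sum(values) / len(values)

def train(data, num_workers=2, lr=0.05, steps=200):
    # Split the data into equally sized shards, one per simulated worker.
    # Any remainder rows are dropped in this sketch.
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    w = 0.0
    for _ in range(steps):
        grads = [local_gradient(w, shard) for shard in shards]
        w -= lr * allreduce_mean(grads)
    return w

# Hypothetical toy data generated from y = 3 * x.
data = [(float(x), 3.0 * x) for x in range(1, 5)]
w_distributed = train(data, num_workers=2)
w_single = train(data, num_workers=1)
print(w_distributed, w_single)
```

Because the shards are the same size, the averaged gradient equals the full-batch gradient, so the two-worker and single-worker runs follow the same trajectory. In a real cluster the averaging step is a network operation, and the notebooks listed above show how the libraries handle it.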

Hyperparameter tuning examples

For general information about hyperparameter tuning in Azure Databricks, see Hyperparameter tuning.

| Package | Notebook | Features |
| --- | --- | --- |
| Hyperopt | Distributed hyperopt | Distributed hyperopt, scikit-learn, MLflow |
| Hyperopt | Compare models | Use distributed hyperopt to search hyperparameter space for different model types simultaneously |
| Hyperopt | Distributed training algorithms and hyperopt | Hyperopt, MLlib |
| Hyperopt | Hyperopt best practices | Best practices for datasets of different sizes |
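
As a rough illustration of what a hyperparameter search does, here is a minimal random-search sketch that uses only the Python standard library. It is not Hyperopt and does not reproduce its search algorithms or its API. The objective function, the search space, and the names `objective`, `sample_params`, and `random_search` are hypothetical stand-ins. A real objective would train a model and return a validation loss.

```python
import random

def objective(params):
    """Hypothetical validation loss; a real objective would train and score a model."""
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 5) ** 2

def sample_params(rng):
    """Draw one candidate from a hypothetical search space."""
    return {
        "learning_rate": rng.uniform(0.001, 0.5),
        "max_depth": rng.randint(2, 10),
    }

def random_search(objective, num_trials, seed=0):
    """Evaluate num_trials random candidates and return the best one plus all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        params = sample_params(rng)
        trials.append({"params": params, "loss": objective(params)})
    best = min(trials, key=lambda t: t["loss"])
    return best, trials

best, trials = random_search(objective, num_trials=50)
print(best["params"], best["loss"])
```

Each trial is independent of the others, which is why this kind of search can be spread across workers. The notebooks listed above show how Hyperopt does this on a cluster and how results can be tracked with MLflow.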