Model training examples
This section includes examples showing how to train machine learning and deep learning models on Azure Databricks using many popular open-source libraries.
You can also use Databricks AutoML, which automatically prepares a dataset for model training, performs a set of trials using open-source libraries such as scikit-learn and XGBoost, and creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.
Machine learning examples
Package | Notebook(s) | Features |
---|---|---|
scikit-learn | Machine learning quickstart | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
scikit-learn | Machine learning with Model Registry | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, Model Registry |
scikit-learn | End-to-end example | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry, Model Serving |
MLlib | MLlib examples | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer |
xgboost | XGBoost examples | Python, PySpark, and Scala; single-node workloads and distributed training |
Deep learning examples
Also see Best practices for deep learning on Azure Databricks.
Package | Notebook | Features |
---|---|---|
TensorFlow Keras | Deep learning quickstart | TensorFlow Keras, TensorBoard, Hyperopt, MLflow |
TensorFlow (single node) | TensorFlow tutorial with MNIST dataset | TensorFlow, TensorBoard |
PyTorch (single node) | PyTorch tutorial with MNIST dataset | PyTorch |
For distributed deep learning training, see:
Package | Notebook | Features |
---|---|---|
HorovodRunner (TensorFlow Keras) | TensorFlow Keras MNIST example | Adapting single-node TensorFlow Keras training for distributed training |
HorovodRunner (PyTorch) | PyTorch MNIST example | Adapting single-node PyTorch training for distributed training |
HorovodRunner | Horovod timeline | Horovod timeline |
horovod.spark (PyTorch and Keras) | horovod.spark package | horovod.spark estimator API for use in ML pipelines with Keras and PyTorch |
spark-tensorflow-distributor | Distributed Training with TensorFlow | Distributed training with TensorFlow on Apache Spark clusters |
Hyperparameter tuning examples
For general information about hyperparameter tuning in Azure Databricks, see Hyperparameter tuning.
Package | Notebook | Features |
---|---|---|
Hyperopt | Distributed hyperopt | Distributed hyperopt, scikit-learn, MLflow |
Hyperopt | Compare models | Use distributed Hyperopt to search the hyperparameter spaces of different model types simultaneously |
Hyperopt | Distributed training algorithms and hyperopt | Hyperopt, MLlib |
Hyperopt | Hyperopt best practices | Best practices for datasets of different sizes |
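The hyperparameter tuning notebooks above share one pattern: define a search space, run a series of trials that each train a model with sampled hyperparameters, and keep the configuration with the lowest loss. The sketch below illustrates that loop with plain random search using only the Python standard library. It is not Hyperopt's API: `random_search`, `toy_loss`, and the search space are hypothetical names, and `toy_loss` stands in for training a model and returning a validation loss. Hyperopt offers random search as `hyperopt.rand.suggest`, and also TPE (`hyperopt.tpe.suggest`), which uses the results of earlier trials to choose the next hyperparameters.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical search space: each hyperparameter name maps to a sampler function.
SearchSpace = Dict[str, Callable[[random.Random], float]]


def random_search(
    objective: Callable[[Dict[str, float]], float],
    space: SearchSpace,
    max_evals: int,
    seed: int = 0,
) -> Tuple[Dict[str, float], List[Tuple[Dict[str, float], float]]]:
    """Run max_evals independent trials and return the lowest-loss params.

    Each trial samples every hyperparameter independently (random search);
    unlike TPE, it does not use the results of earlier trials.
    """
    rng = random.Random(seed)  # seeded so the trial sequence is reproducible
    trials: List[Tuple[Dict[str, float], float]] = []
    for _ in range(max_evals):
        params = {name: sample(rng) for name, sample in space.items()}
        loss = objective(params)
        trials.append((params, loss))
    best_params, _ = min(trials, key=lambda t: t[1])
    return best_params, trials


# Hypothetical objective standing in for "train a model, return validation loss".
# Its minimum is at learning_rate = 1e-2 and max_depth = 5.
def toy_loss(params: Dict[str, float]) -> float:
    return (math.log10(params["learning_rate"]) + 2) ** 2 + (params["max_depth"] - 5) ** 2


space: SearchSpace = {
    # log-uniform learning rate between 1e-4 and 1
    "learning_rate": lambda r: 10 ** r.uniform(-4, 0),
    # integer tree depth between 2 and 10 inclusive
    "max_depth": lambda r: float(r.randint(2, 10)),
}

best, history = random_search(toy_loss, space, max_evals=50)
print(best, len(history))
```

Because random-search trials do not depend on one another, they can also run in parallel; the Distributed hyperopt notebooks apply the same idea by spreading trials across a cluster.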