Use Apache Spark MLlib on Azure Databricks
This page provides example notebooks showing how to use MLlib on Azure Databricks.
Apache Spark MLlib is the Apache Spark machine learning library. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as the underlying optimization primitives. For reference information about MLlib features, Azure Databricks recommends the Apache Spark API references.
For information about using Apache Spark MLlib from R, see the R machine learning documentation.
Binary classification example notebook
This notebook shows you how to build a binary classification application using the Apache Spark MLlib Pipelines API.
Binary classification notebook
Decision trees example notebooks
These examples demonstrate various applications of decision trees using the Apache Spark MLlib Pipelines API.
Decision trees
These notebooks show you how to perform classification with decision trees.
Decision trees for digit recognition notebook
Decision trees for SFO survey notebook
GBT regression using MLlib pipelines
This notebook shows you how to use MLlib pipelines to train a gradient boosted tree (GBT) regression model that predicts hourly bike rental counts from features such as day of the week, weather, and season.
Bike sharing regression notebook
Apache Spark MLlib pipelines and Structured Streaming example
This notebook shows you how to train an Apache Spark MLlib pipeline on historical data and apply it to streaming data.
MLlib pipeline Structured Streaming notebook
Advanced Apache Spark MLlib notebook example
This notebook illustrates how to create a custom transformer.
Custom transformer notebook