Feature engineering with MLlib
Apache Spark MLlib contains many utility functions for performing feature engineering at scale, including methods for encoding and transforming features. These methods can also be used to process features for other machine learning libraries.
Azure Databricks recommends the following Apache Spark MLlib guides:
- Extracting, transforming and selecting features with MLlib
- MLlib Programming Guide
- Python API Reference
- Scala API Reference
This PySpark-based notebook includes preprocessing steps that convert categorical data to numeric variables using category indexing and one-hot encoding.
Binary classification example