Machine learning in the real world is messy. Data sources contain missing values, include redundant rows, or may not fit in memory. Feature engineering often requires domain expertise and can be tedious. Modeling too often mixes data science and systems engineering, requiring knowledge not only of algorithms but also of machine architecture and distributed systems.
Azure Databricks simplifies this process. The following 10-minute tutorial notebook shows an end-to-end example of training machine learning models on tabular data. You can import this notebook and run it yourself, or copy code snippets and ideas for your own use.
This notebook requires Databricks Runtime 6.5 ML or above.