What is automated machine learning?

Automated machine learning, also referred to as automated ML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.

Traditional machine learning model development is resource-intensive, requiring significant domain knowledge and time to produce and compare dozens of models. Apply automated ML when you want Azure Machine Learning to train and tune a model for you using the target metric you specify. The service then iterates through ML algorithms paired with feature selections, where each iteration produces a model with a training score. The higher the score, the better the model is considered to "fit" your data.
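The iteration described above can be pictured with a minimal sketch. It is not how the service is implemented: the toy data, the two stand-in "algorithms", and every name below are hypothetical, and the score is a plain R² computed on the training data.

```python
from itertools import product

# Toy labeled data: each row is (feature_a, feature_b); y is the target.
X = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (4.0, 1.0), (5.0, 6.0)]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_mean(xs, ys):
    """Baseline 'algorithm': always predict the mean of the targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares line through a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def r2_score(ys, preds):
    """Coefficient of determination: 1.0 is a perfect fit."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((t - p) ** 2 for t, p in zip(ys, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in ys)
    return 1.0 - ss_res / ss_tot


algorithms = {"mean_baseline": fit_mean, "linear": fit_line}
feature_columns = {"feature_a": 0, "feature_b": 1}

# One iteration per (algorithm, feature selection) pair; each yields a score.
results = []
for (algo_name, fit), (feat_name, col) in product(
    algorithms.items(), feature_columns.items()
):
    xs = [row[col] for row in X]
    model = fit(xs, y)
    score = r2_score(y, [model(x) for x in xs])
    results.append((score, algo_name, feat_name))

# The highest-scoring iteration is treated as the best model.
best_score, best_algo, best_feature = max(results)
```

In this toy run the linear model on feature_a wins because the target grows almost linearly with that feature. A real experiment would score models on validation data against the target metric you specify.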

With automated machine learning, you shorten the time it takes to get production-ready ML models.

When to use automated ML

Automated ML democratizes the machine learning model development process, and empowers its users, no matter their data science expertise, to identify an end-to-end machine learning pipeline for any problem.

Data scientists, analysts and developers across industries can use automated ML to:

  • Implement machine learning solutions without extensive programming knowledge
  • Save time and resources
  • Leverage data science best practices
  • Provide agile problem-solving

How automated ML works

Using the Azure Machine Learning service, you can design and run your automated ML training experiments with these steps:

  1. Identify the ML problem to be solved: classification, forecasting, or regression

  2. Specify the source and format of the labeled training data: NumPy arrays or a pandas DataFrame

  3. Configure the compute target for model training, such as your local computer, Azure Machine Learning Computes, remote VMs, or Azure Databricks. Learn about automated training on a remote resource.

  4. Configure the automated machine learning parameters that determine how many iterations to run over different models, the hyperparameter settings, whether to apply advanced preprocessing/featurization, and which metrics to use when determining the best model. You can configure the settings for an automated training experiment in the Azure portal or with the SDK.

  5. Submit the training run.
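As a rough illustration of steps 1 and 4, the settings for an experiment can be collected in a plain dictionary before they are handed to the SDK. The key names below follow what the SDK's AutoMLConfig class is assumed to accept. Treat them, the example values, and the validate_settings helper as hypothetical, and check them against the SDK version you have installed.

```python
# Hypothetical settings for an automated ML experiment. The key names are
# assumed to match AutoMLConfig parameters; verify them against your SDK.
SUPPORTED_TASKS = {"classification", "regression", "forecasting"}

automl_settings = {
    "task": "classification",          # step 1: the ML problem to solve
    "primary_metric": "AUC_weighted",  # step 4: metric used to rank models
    "iterations": 20,                  # step 4: how many models to try
    "iteration_timeout_minutes": 10,   # step 4: time budget per iteration
    "n_cross_validations": 5,          # step 4: validation strategy
    "preprocess": True,                # step 4: advanced featurization
}


def validate_settings(settings):
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    if settings.get("task") not in SUPPORTED_TASKS:
        problems.append(
            "task must be one of: " + ", ".join(sorted(SUPPORTED_TASKS))
        )
    iterations = settings.get("iterations")
    if not isinstance(iterations, int) or iterations < 1:
        problems.append("iterations must be a positive integer")
    if not settings.get("primary_metric"):
        problems.append("primary_metric is required")
    return problems


problems = validate_settings(automl_settings)
```

The training data (step 2), the compute target (step 3), and the submission itself (step 5) need live SDK objects such as a workspace and an experiment, so they are left out of this sketch.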


During training, the Azure Machine Learning service creates a number of pipelines in parallel that try different algorithms and parameters. Training stops once it hits the exit criteria defined in the experiment.
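The idea of parallel pipelines bounded by exit criteria can be sketched with a thread pool. This is an illustration only: the candidate list, the made-up scoring function, and the MAX_ITERATIONS and EXIT_SCORE names are assumptions, not service settings.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidate pipelines: (algorithm name, parameter value).
candidates = [("tree", 2), ("tree", 4), ("linear", 1), ("linear", 3), ("tree", 5)]

# Hypothetical exit criteria for the experiment.
MAX_ITERATIONS = 5
EXIT_SCORE = 0.90


def run_pipeline(candidate):
    """Stand-in for training one pipeline; returns a made-up score."""
    name, param = candidate
    base = 0.70 if name == "tree" else 0.80
    return candidate, round(base + 0.04 * param, 2)


completed = []
with ThreadPoolExecutor(max_workers=2) as pool:
    # map() runs candidates concurrently but yields results in submission order.
    for candidate, score in pool.map(run_pipeline, candidates[:MAX_ITERATIONS]):
        completed.append((candidate, score))
        if score >= EXIT_SCORE:
            break  # exit score reached: stop collecting further results

best_candidate, best_score = max(completed, key=lambda item: item[1])
```

Here the fourth pipeline reaches the exit score, so the fifth result is never collected. In this simple sketch, work that was already submitted to the pool may still finish in the background.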

You can also inspect the logged run information, which contains metrics gathered during the run. The training run produces a Python serialized object (.pkl file) that contains the model and data preprocessing.
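To make the idea of a single serialized artifact concrete, here is a minimal sketch that pickles a toy model together with its preprocessing parameters. The artifact layout and the predict helper are hypothetical and do not reflect the structure of the file the service produces.

```python
import pickle

# Hypothetical fitted artifact: preprocessing and model parameters together.
artifact = {
    "preprocessing": {"mean": 10.0, "scale": 2.0},
    "model": {"weight": 3.0, "bias": 1.0},
}


def predict(fitted, x):
    """Apply the stored preprocessing, then the stored linear model."""
    prep, model = fitted["preprocessing"], fitted["model"]
    standardized = (x - prep["mean"]) / prep["scale"]
    return model["weight"] * standardized + model["bias"]


# Serialize to bytes (the content of a .pkl file) and load it back.
payload = pickle.dumps(artifact)
restored = pickle.loads(payload)

prediction = predict(restored, 14.0)
```

Because the preprocessing travels with the model, the restored object can score raw inputs directly. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.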

While model building is automated, you can also learn how important or relevant features are to the generated models.

Preprocessing

In every automated machine learning experiment, your data is preprocessed using the default methods and optionally through advanced preprocessing.

Automatic preprocessing (standard)

In every automated machine learning experiment, your data is automatically scaled or normalized to help algorithms perform well. During model training, one of the following scaling or normalization techniques will be applied to each model.

Scaling & normalization | Description
--- | ---
StandardScaleWrapper | Standardizes features by removing the mean and scaling to unit variance
MinMaxScaler | Transforms features by scaling each feature by that column's minimum and maximum
MaxAbsScaler | Scales each feature by its maximum absolute value
RobustScaler | Scales features by their quantile range
PCA | Linear dimensionality reduction using singular value decomposition (SVD) of the data to project it to a lower-dimensional space
TruncatedSVDWrapper | Performs linear dimensionality reduction by means of truncated SVD. Contrary to PCA, this estimator does not center the data before computing the SVD, which means it can work with scipy.sparse matrices efficiently
SparseNormalizer | Each sample (that is, each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one
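The arithmetic behind several of these techniques is short enough to sketch in plain Python. The functions below are simplified stand-ins written for illustration, not the wrappers the service uses. The robust scaler assumes centering on the median and scaling by the interquartile range.

```python
import math
import statistics

column = [1.0, 2.0, 3.0, 4.0, 10.0]


def standard_scale(values):
    """Remove the mean and scale to unit variance (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Map the column's minimum to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value in the column."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]


def robust_scale(values):
    """Center on the median and scale by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


def l2_normalize(row):
    """Rescale one sample (row) so that its l2 norm equals one."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm else list(row)


robust = robust_scale(column)
unit_row = l2_normalize([3.0, 4.0])
```

Note how the outlier 10.0 stretches the min-max and max-abs results, while the robust version keeps the bulk of the values within one unit of zero.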

Advanced preprocessing: optional featurization

Additional advanced preprocessing and featurization are also available, such as missing values imputation, encoding, and transforms. Learn more about what featurization is included. Enable this setting with:
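As an illustration of two of the featurization steps named above, the sketch below performs mean imputation and one-hot encoding in plain Python. The helper names and sample data are hypothetical and the service's own featurizers may behave differently. In SDK releases from around the time of this article, the switch is believed to be the preprocess argument of AutoMLConfig, so treat that name as an assumption.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one slot per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


ages = [30.0, None, 50.0]
colors = ["red", "blue", "red"]

imputed_ages = impute_mean(ages)
encoded_colors, color_categories = one_hot_encode(colors)
```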

Ensemble models

You can train ensemble models using automated machine learning with the Caruana ensemble selection algorithm with sorted ensemble initialization. Ensemble learning improves machine learning results and predictive performance by combining many models as opposed to using single models. The ensemble iteration appears as the last iteration of your run.
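The following is a simplified sketch of greedy ensemble selection in the style of the Caruana algorithm, with sorted initialization and selection with replacement. It is not the service's implementation: the three models, their validation predictions, and the accuracy metric are made up for illustration.

```python
def accuracy(probabilities, labels):
    """Fraction of examples where thresholding at 0.5 matches the label."""
    hits = sum((p >= 0.5) == bool(t) for p, t in zip(probabilities, labels))
    return hits / len(labels)


def average(prediction_sets):
    """Average several models' predictions example by example."""
    return [sum(ps) / len(ps) for ps in zip(*prediction_sets)]


def ensemble_selection(model_preds, labels, init_size=1, max_rounds=10):
    """Greedy forward selection; a model may be added more than once."""
    # Sorted initialization: start from the best individual models.
    ranked = sorted(
        model_preds, key=lambda m: accuracy(model_preds[m], labels), reverse=True
    )
    ensemble = ranked[:init_size]
    best = accuracy(average([model_preds[m] for m in ensemble]), labels)

    for _ in range(max_rounds):
        # Score the ensemble with each model added in turn (repeats allowed).
        scored = [
            (accuracy(average([model_preds[m] for m in ensemble + [name]]), labels), name)
            for name in sorted(model_preds)
        ]
        top_score, top_name = max(scored)
        if top_score <= best:
            break  # no candidate improves the ensemble
        ensemble.append(top_name)
        best = top_score
    return ensemble, best


# Hypothetical validation labels and per-model predicted probabilities.
labels = [1, 0, 1, 0]
model_preds = {
    "A": [0.9, 0.1, 0.4, 0.2],   # wrong on the third example
    "B": [0.6, 0.6, 0.9, 0.4],   # wrong on the second example
    "C": [0.4, 0.3, 0.55, 0.9],  # wrong on the first and fourth examples
}

ensemble, score = ensemble_selection(model_preds, labels)
```

Models A and B each misclassify one example, but their averaged predictions classify all four correctly, so the selection stops with the two of them.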

Use with ONNX in C# apps

With Azure Machine Learning, you can use automated ML to build a Python model and have it converted to the ONNX format. ONNX Runtime supports C#, so you can use the automatically built model in your C# apps without recoding and without the network latency that REST endpoints introduce. Try an example of this flow in this Jupyter notebook.

Automated ML across Microsoft

Automated ML is also available in other Microsoft solutions such as:

  • In .NET apps using Visual Studio and Visual Studio Code with ML.NET
  • On HDInsight, where you scale out your automated ML training jobs in parallel on Spark in HDInsight clusters
  • In Power BI

Next steps

See examples and learn how to build models using Automated Machine Learning: