Machine Learning - Train

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

This article describes the modules provided in Machine Learning Studio (classic) for training a machine learning model. Training analyzes input data by using the parameters of a predefined model. From this analysis, the model learns patterns in the data and stores them as a trained model.

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

This article also describes the overall process in Machine Learning Studio (classic) for model creation, training, evaluation, and scoring.

Create and use machine learning models

The typical workflow for machine learning includes these phases:

  • Choosing a suitable algorithm, and setting initial options.
  • Training the model on compatible data.
  • Creating predictions by using new data, based on the patterns in the model.
  • Evaluating the model to determine whether the predictions are accurate, how much error there is, and whether there is any overfitting.
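
The phases above can be sketched outside Studio (classic) in plain Python. This is only an illustration under stated assumptions, not how the Studio modules are implemented: the one-variable least-squares model, the function names (`train`, `score`, `evaluate`), and the toy data are all hypothetical and chosen for brevity.

```python
# Illustrative sketch only: a one-variable least-squares model in plain Python.
# The names train/score/evaluate are hypothetical, not Studio (classic) APIs.

def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def score(model, xs):
    """Create predictions for new inputs from the learned parameters."""
    return [model["slope"] * x + model["intercept"] for x in xs]

def evaluate(predictions, actuals):
    """Return the mean absolute error between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

# Toy data (made up for this example): a training set and a held-out test set.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5.0, 6.0], [10.1, 11.9]

model = train(train_x, train_y)                          # train on compatible data
train_error = evaluate(score(model, train_x), train_y)   # error on data the model has seen
test_error = evaluate(score(model, test_x), test_y)      # error on new data

print(f"train MAE={train_error:.3f}, test MAE={test_error:.3f}")
```

A test error that is much larger than the training error is one common sign of overfitting, which is why the sketch evaluates on both sets.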

Machine Learning Studio (classic) supports a flexible, customizable framework for machine learning. Each task in this process is performed by a specific type of module, which can be modified, added, or removed, without breaking the rest of your experiment.

The modules in this category support training for different types of models. During training, the machine learning algorithm analyzes the distribution and type of the data, compiles statistics, and creates patterns that can be used later for prediction.

More about model training

When Machine Learning trains a model, it skips rows with missing values. If you want to fix the values manually, impute them, or handle missing values in some other way, use the Clean Missing Data module before you train on the dataset.
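
The difference between skipping incomplete rows and imputing values can be illustrated in plain Python. This is a hedged sketch, not the behavior of the Clean Missing Data module itself: the helper names `drop_incomplete` and `impute_mean`, the column names, and the sample rows are assumptions made for this example, and mean imputation is only one of several possible strategies.

```python
# Illustrative sketch only. None marks a missing value.
# drop_incomplete and impute_mean are hypothetical helpers, not Studio modules.

rows = [
    {"age": 34.0, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 45.0, "income": None},
    {"age": 29.0, "income": 48000.0},
]

def drop_incomplete(rows):
    """Keep only rows in which every column has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows):
    """Replace each missing value with the mean of the values present in its column."""
    columns = list(rows[0].keys())
    means = {}
    for c in columns:
        present = [r[c] for r in rows if r[c] is not None]
        means[c] = sum(present) / len(present)
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in columns}
        for r in rows
    ]

skipped = drop_incomplete(rows)   # fewer rows reach training
imputed = impute_mean(rows)       # all rows are kept, with gaps filled in

print(len(skipped), "rows after skipping;", len(imputed), "rows after imputation")
```

Skipping rows shrinks the training set, while imputation keeps every row at the cost of introducing estimated values.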

We recommend that you use the Edit Metadata module to fix any other issues with the data. You might need to mark the label column, change data types, or correct column names.

For other common data cleanup tasks, such as normalization, sampling, binning, and scaling, see the Data Transformation category.

Choose the right trainer

The method that you use to train a model depends on the type of model you are creating, and the type of data that the model requires. For example, Machine Learning provides modules specifically for training anomaly detection models, recommendation models, and more.

Check the list of training modules to determine which one is correct for your scenario.

If you are not sure of the best parameters to use when training a model, use one of the modules provided for parameter sweeping and validation:

  • Tune Model Hyperparameters can perform a parameter sweep on almost all classification and regression models. It trains multiple models, and then returns the best model.

  • The Sweep Clustering module supports model tuning during the training process, and is intended for use only with clustering models. You can specify a range for the number of centroids, and train on data while automatically detecting the best parameters.

  • The Cross-Validate Model module is also useful for model optimization, but does not return a trained model. Instead, it provides metrics that you can use to determine the best model.
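
The ideas behind a parameter sweep and cross-validation can be sketched in plain Python. This is a simplified illustration and does not reproduce the Studio modules: the nearest-neighbor regressor, the function names (`knn_predict`, `cross_validate`, `sweep`), the candidate values, and the synthetic data are all assumptions. Note how `cross_validate` returns only metrics, while `sweep` uses those metrics to select a setting.

```python
# Illustrative sketch only: sweep one hyperparameter (k) using k-fold cross-validation.
# All names here are hypothetical and are not Studio (classic) APIs.

def knn_predict(training, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(training, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cross_validate(data, k, folds=3):
    """Return the mean absolute error of each fold. No trained model is returned."""
    errors = []
    for f in range(folds):
        held_out = data[f::folds]  # every folds-th point, starting at index f
        training = [p for i, p in enumerate(data) if i % folds != f]
        fold_error = sum(
            abs(knn_predict(training, x, k) - y) for x, y in held_out
        ) / len(held_out)
        errors.append(fold_error)
    return errors

def sweep(data, candidate_ks, folds=3):
    """Evaluate each candidate k and return the one with the lowest mean error."""
    results = {
        k: sum(cross_validate(data, k, folds)) / folds for k in candidate_ks
    }
    best_k = min(results, key=results.get)
    return best_k, results

# Synthetic data (made up for this example): y = 2x for x = 0..11.
data = [(float(x), 2.0 * x) for x in range(12)]

best_k, results = sweep(data, candidate_ks=[1, 3, 5])

# For a nearest-neighbor model, "training the best model" amounts to
# keeping all of the data together with the selected setting.
final_model = {"k": best_k, "training_data": data}

print("mean error per candidate:", results)
print("selected k:", best_k)
```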

Retrain models

If you need to retrain a production model, you can re-run the experiment at any time.

You can also automate the retraining process by using web services. For a walkthrough, see Retraining and updating Machine Learning models with Azure Data Factory.

Use pretrained models

Machine Learning includes some models that are pretrained, such as the Pretrained Cascade Image Classification module. You can use these models for scoring without additional data input.

Also, some modules (such as Time Series Anomaly Detection) do not generate a trained model in the iLearner format. However, they do take training data and create a model internally, which can then be used to make predictions. To use these modules, configure the parameters and provide data.

Save a snapshot of a trained model

If you want to save or export the model, right-click the training module, and select Save as Trained Model. The model is exported to the iLearner format and saved in your workspace, under Trained Models. Trained models can be reused in other experiments, or connected to other modules for scoring.

You can also use the Load Trained Model module in an experiment to retrieve a stored model.
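
As a rough analogy for saving a snapshot and loading it later, the following Python sketch serializes a model's learned parameters and restores them for scoring. It does not read or write the iLearner format: it uses Python's standard `pickle` module instead, and the model dictionary, the file name, and the helpers `save_model` and `load_model` are assumptions made for this example.

```python
# Illustrative analogy only: this uses pickle, not the iLearner format.
# save_model and load_model are hypothetical helpers, not Studio (classic) APIs.
import os
import pickle
import tempfile

# Stand-in for a trained model: the learned parameters of a linear model.
trained_model = {"slope": 1.94, "intercept": 0.15}

def save_model(model, path):
    """Write a snapshot of the trained model to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Read a previously saved snapshot back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)

# A temporary directory stands in for the workspace.
with tempfile.TemporaryDirectory() as workspace:
    path = os.path.join(workspace, "trained_model.pkl")
    save_model(trained_model, path)
    restored = load_model(path)

# The restored model can be used for scoring without retraining.
prediction = restored["slope"] * 5.0 + restored["intercept"]
print("prediction for x=5.0:", round(prediction, 2))
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.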

List of modules

The Train category includes these modules:

  • Sweep Clustering: Performs a parameter sweep on a clustering model to determine the optimum parameter settings, and trains the best model.
  • Train Anomaly Detection Model: Trains an anomaly detector model and labels data from a training set.
  • Train Clustering Model: Trains a clustering model and assigns data from the training set to clusters.
  • Train Matchbox Recommender: Trains a Bayesian recommender by using the Matchbox algorithm.
  • Train Model: Trains a classification or regression model from a training set.
  • Tune Model Hyperparameters: Performs a parameter sweep on a regression or classification model to determine the optimum parameter settings, and trains the best model.

Some modules are not in this category, because they require a special format or are customized for a specific task.
