Machine learning modules in Azure Machine Learning Studio
The typical workflow for machine learning includes many phases:
Identifying a problem to solve and a metric for measuring results.
Finding, cleaning, and preparing appropriate data.
Identifying the best features and engineering new features.
Building, evaluating, and tuning models.
Using models to generate predictions, recommendations, and other results.
The modules in this section provide tools for the final phases of machine learning, in which you apply an algorithm to data to train a model. In these final phases, you also generate scores, and then evaluate the accuracy and usefulness of the model.
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag-and-drop modules have been added to the visual interface in Azure Machine Learning service. Learn more in this article comparing the two versions.
List of machine learning tasks by category
Train: Provide your data to the configured model so that it can learn from patterns and create statistics that can be used for predictions.
Score: Create predictions by using the trained models.
Evaluate: Measure the accuracy of a trained model, or compare multiple models.
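The train, score, and evaluate tasks listed above can be illustrated outside Studio with a minimal Python sketch. It uses a nearest-centroid classifier as a stand-in for whichever algorithm you would configure; the function names `train`, `score`, and `evaluate` and the toy data are hypothetical and do not correspond to Studio module APIs.

```python
from statistics import mean

def train(rows, labels):
    """Learn one centroid (the per-feature mean) for each class label."""
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in by_label.items()}

def score(model, rows):
    """Predict, for each row, the label whose centroid is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(model, key=lambda label: sq_dist(model[label], row))
            for row in rows]

def evaluate(predicted, actual):
    """Accuracy: the fraction of predictions that match the true labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical toy data: two numeric features, two classes.
train_rows = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
train_labels = ["low", "low", "high", "high"]
test_rows = [[2.0, 1.0], [8.0, 9.0]]
test_labels = ["low", "high"]

model = train(train_rows, train_labels)        # Train
predictions = score(model, test_rows)          # Score
accuracy = evaluate(predictions, test_labels)  # Evaluate
print(predictions, accuracy)
```

Keeping the three steps as separate functions mirrors the way the tasks are split into separate categories: a trained model can be scored against new data, or evaluated against a different test set, without retraining.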
For a detailed description of this experimental workflow, see the credit risk solution walkthrough.
Before you can get to the fun part of building a model, typically a lot of preparation is required. This section provides links to tools in Machine Learning Studio that can help you clean up your data, improve the quality of input, and prevent run-time errors.
Data exploration and data quality
Ensure that your data is the right kind of data, the right quantity, and the right quality for the algorithm you’ve chosen. Understand how much data you have, and how it is distributed. Are there outliers? How were those generated, and what do they mean? Are there any duplicate records?
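The questions above (how much data, how it is distributed, whether there are outliers or duplicates) can be answered with a quick profile before training. The following is a minimal sketch in plain Python, assuming numeric columns and a simple z-score rule for outliers; the `profile` function, its `z_threshold` parameter, and the sample records are all hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

def profile(rows, column, z_threshold=2.0):
    """Summarize one numeric column and count exact duplicate records."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), stdev(values)
    # Flag values more than z_threshold sample standard deviations from the mean.
    outliers = [v for v in values
                if sigma > 0 and abs(v - mu) / sigma > z_threshold]
    # Two records are duplicates only if every field matches.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {"rows": len(rows), "mean": mu, "stdev": sigma,
            "outliers": outliers, "duplicate_rows": duplicates}

# Hypothetical records: one repeated row and one implausible age.
records = [
    {"age": 30, "income": 40000}, {"age": 31, "income": 42000},
    {"age": 29, "income": 39000}, {"age": 30, "income": 41000},
    {"age": 32, "income": 45000}, {"age": 30, "income": 40000},
    {"age": 31, "income": 43000}, {"age": 29, "income": 38000},
    {"age": 30, "income": 44000}, {"age": 95, "income": 41000},
]

report = profile(records, "age")
print(report)
```

A z-score rule is only one possible heuristic, and a flagged value is a prompt to investigate how it was generated rather than a reason to delete it automatically.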
Handle missing values
Missing values can affect your results in many ways. For example, almost all statistical methods discard cases with missing values. By default, Machine Learning follows these rules when it encounters rows with missing values:
If data used to train a model has missing values, any rows with missing values are skipped.
If data used as input when scoring against a model has missing values, the missing values are used as inputs, but nulls are propagated. This usually means that a null is inserted in the results instead of a valid prediction.
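The two default rules above can be approximated in a few lines of Python. This is a sketch of the described behavior, not Studio's implementation: missing values are represented as `None`, and `drop_incomplete`, `score_with_null_propagation`, and the stand-in `predict` function are hypothetical names.

```python
def drop_incomplete(rows):
    """Training rule: skip any row that contains a missing value."""
    return [row for row in rows if None not in row]

def score_with_null_propagation(predict, rows):
    """Scoring rule: a row with a missing input yields a null result."""
    return [None if None in row else predict(row) for row in rows]

def predict(row):
    """Hypothetical stand-in for a trained model: sums the features."""
    return sum(row)

training_data = [[1.0, 2.0], [None, 5.0], [3.0, 4.0]]
usable = drop_incomplete(training_data)

scoring_data = [[1.0, 2.0], [2.0, None], [3.0, 4.0]]
scores = score_with_null_propagation(predict, scoring_data)
print(usable, scores)
```

Note that the training set silently shrinks from three rows to two, and the scored output contains a null where a prediction was expected. Both effects are easy to miss unless you check for missing values first.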
Be sure to check your data before training your model. To impute the missing values or correct your data, use the Clean Missing Data module.
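As a rough illustration of what imputation does (this is not the Studio module's implementation), the sketch below replaces each missing value with the mean of the observed values in the same column. The `impute_column_means` function and the sample rows are hypothetical, and mean substitution is only one of several possible strategies.

```python
from statistics import mean

def impute_column_means(rows):
    """Replace each None with the mean of the non-missing values in its column.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    columns = list(zip(*rows))
    fills = [mean(v for v in col if v is not None) for col in columns]
    return [[fill if v is None else v for v, fill in zip(row, fills)]
            for row in rows]

raw = [[1.0, 10.0], [None, 20.0], [3.0, None]]
cleaned = impute_column_means(raw)
print(cleaned)
```

Unlike skipping incomplete rows, imputation keeps every record, at the cost of introducing values that were never actually observed.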
Select features and reduce dimensionality
Machine Learning Studio can help you sift through your data to find the most useful attributes.
Use tools such as Fisher Linear Discriminant Analysis or Filter Based Feature Selection to determine which columns of data have the most predictive power. These tools can also identify columns that should be removed because of data leakage.
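The idea behind filter-based selection can be sketched by scoring each column against the target independently and ranking the results. The example below uses absolute Pearson correlation as the scoring metric, which is an assumption made for illustration; the `pearson` and `rank_features` helpers and the sample columns are hypothetical and are not the Studio modules' implementation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_features(columns, target):
    """Score each column by |correlation with the target|, highest first."""
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical columns and target.
target = [1, 2, 3, 4, 5]
columns = {
    "weak": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
    "strong": [2, 4, 6, 8, 10],
}

ranked = rank_features(columns, target)
for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

A column with a score suspiciously close to 1 deserves a second look: it may be a genuinely strong predictor, or it may encode the target itself, which is one way data leakage shows up.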
Choose an appropriate algorithm
The problem you are trying to solve determines both the data to use in the analysis and the choice of algorithm.
For more information, see How to choose an algorithm in Azure Machine Learning.
For examples of machine learning in action, see the Azure AI Gallery.
For tips and a walkthrough of some typical data preparation tasks, see Advanced data processing in Azure.