This article describes the modules in Azure Machine Learning Studio that support mathematical and statistical operations critical for machine learning. Look in the Statistical Functions category when your experiment needs to do any of the following:
- Perform ad hoc computations on column values, such as rounding or using an absolute value.
- Compute means, logarithms, and other quantities commonly used in machine learning.
- Calculate correlation and probability scores.
- Compute z-scores.
- Compute widely used statistical distributions, such as Weibull, gamma, and beta.
- Generate statistical reports over a set of columns or a dataset.
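To make a few of the tasks above concrete, here is a minimal sketch in plain Python of per-value math operations and z-scores. It runs outside Studio and uses only the standard library; the `column` values are hypothetical, and the code illustrates the arithmetic rather than how any Studio module is implemented.

```python
import math
import statistics

# Hypothetical column of values (illustrative only, not from a real dataset).
column = [-2.4, 1.0, 4.2, 7.6]

# Element-wise operations: absolute value, rounding, natural logarithm.
absolute = [abs(x) for x in column]
rounded = [round(x) for x in column]
logs = [math.log(x) for x in absolute]  # log is taken of the absolute values

# z-score: distance of each value from the mean, in sample standard deviations.
mean = statistics.mean(column)
stdev = statistics.stdev(column)
z_scores = [(x - mean) / stdev for x in column]
```

By construction, the z-scores have a mean of 0 and a sample standard deviation of 1.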
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag-and-drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.
For example, if you have a new dataset, you might use the Summarize Data module first. It generates a report for an entire dataset that includes standard statistical measures, such as mean and standard deviation.
If you need more advanced statistics, such as sample skewness or interquartile range, use the Compute Elementary Statistics module to generate additional descriptive statistics.
Because the modules generate the results each time you run the experiment, the results are updated if your data changes.
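As a rough illustration of the kind of measures such a report contains, the sketch below computes a handful of descriptive statistics with Python's standard library. The function name `summarize` and the sample values are assumptions made for this example, and the output does not reproduce the format of either module. Because the statistics are recomputed from whatever values are passed in, calling the function again on changed data yields updated results, much as rerunning an experiment does.

```python
import statistics

def summarize(values):
    """Return a few descriptive statistics for a list of numbers."""
    # statistics.quantiles defaults to the "exclusive" method; other tools
    # may use a different quartile convention and report a different IQR.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "sample_std_dev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "interquartile_range": q3 - q1,
    }

# Hypothetical sample data.
report = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```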
List of modules
The Statistical Functions category includes the following modules:
- Apply Math Operation: Applies a mathematical operation to column values.
- Compute Elementary Statistics: Calculates specified summary statistics for selected dataset columns.
- Compute Linear Correlation: Calculates the linear correlation between column values in a dataset.
- Evaluate Probability Function: Fits a specified probability distribution function to a dataset.
- Replace Discrete Values: Replaces discrete values from one column with numeric values based on another column.
- Summarize Data: Generates a basic descriptive statistics report for the columns in a dataset.
- Test Hypothesis Using t-Test: Compares means from two datasets by using a t-test.
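For readers who want to see the arithmetic behind two of these entries, the following sketch computes a Pearson linear correlation coefficient and a two-sample t statistic in plain Python. The function names and sample values are hypothetical. The t statistic uses Welch's unequal-variance form, chosen here as one common variant; this is an assumption, not a description of the module's options or output.

```python
import math
import statistics

def pearson_correlation(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

def welch_t_statistic(sample_a, sample_b):
    """t statistic for the difference in means, allowing unequal variances."""
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 divisor)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error

# Hypothetical data: ys is an exact linear function of xs.
r = pearson_correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Hypothetical samples whose means differ by 0.5.
t = welch_t_statistic([5.1, 4.9, 5.0, 5.2, 4.8], [5.6, 5.4, 5.5, 5.7, 5.3])
```

A correlation near 1 or -1 indicates a strong linear relationship, and a t statistic far from 0 suggests the two sample means differ by more than their combined standard error would explain. Turning the t statistic into a p-value additionally requires the t distribution, which this sketch leaves out.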