This article lists the modules that Azure Machine Learning Studio provides for data transformation. For machine learning, data transformation includes general tasks, such as joining datasets or changing column names. It also includes tasks that are specific to machine learning, such as normalization, binning and grouping, and inferring (imputing) missing values.
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag-and-drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.
Data that you use in Machine Learning Studio is generally expected to be "tidy" before you import it. Data preparation might include, for example, ensuring that the data uses the correct encoding and checking that the data has a consistent schema.
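As an illustration of this kind of pre-import check, the following is a minimal sketch in Python that uses only the standard library. It is not part of Machine Learning Studio. The function name `check_consistent_schema` and its return shape are hypothetical, and the sketch assumes that the data is a CSV file with a header row and that a "consistent schema" means every row has the same number of fields as the header.

```python
import csv
import io


def check_consistent_schema(raw_bytes, encoding="utf-8"):
    """Decode CSV bytes and compare each row's field count to the header.

    Hypothetical helper for illustration only.
    Returns (header, problems), where problems is a list of
    (row_number, field_count) pairs for rows that do not match the header.
    """
    # Strict decoding raises UnicodeDecodeError if the encoding is wrong.
    text = raw_bytes.decode(encoding, errors="strict")
    reader = csv.reader(io.StringIO(text))

    header = next(reader, None)
    if header is None:
        # Empty input: nothing to check.
        return [], []

    # Row numbers start at 2 because the header is row 1.
    problems = [
        (row_number, len(row))
        for row_number, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]
    return header, problems


sample = b"id,name\n1,Ann\n2\n3,Bo\n"
header, problems = check_consistent_schema(sample)
print(header)    # ['id', 'name']
print(problems)  # [(3, 1)]
```

A check like this only verifies encoding and column counts. It does not validate data types or column names, which you might also want to confirm before import.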
You can use Azure Machine Learning Workbench to transform and prepare all kinds of data. For examples, see Data transformations “by example” in Machine Learning Workbench.
Modules for data transformation are grouped into the following task-based categories:
- Creating filters for digital signal processing: Digital signal filters can be applied to numeric data to support machine learning tasks such as image recognition, voice recognition, and waveform analysis.
- Generating and using count-based features: Count-based featurization modules help you develop compact features to use in machine learning.
- General data manipulation and preparation: Merging datasets, cleaning missing values, grouping and summarizing data, changing column names and data types, or indicating which column is a label or a feature.
- Sampling and splitting datasets: Divide your data into training and test sets, split datasets by percentage or by a filter condition, or perform sampling.
- Scaling and reducing data: Prepare numerical data for analysis by applying normalization or by scaling. Bin data into groups, remove or replace outliers, or perform principal component analysis (PCA).
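To make a few of these task categories concrete, the following sketch shows simplified versions of min-max normalization, equal-width binning, a percentage-based split, and a count-based feature in plain Python. These functions are hypothetical illustrations of the general techniques. They are not the implementation of the Studio modules, and the count feature in particular is a simplified stand-in that counts value frequencies only.

```python
import random
from collections import Counter


def min_max_normalize(values):
    """Rescale numeric values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a 0-based index into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def split_rows(rows, train_fraction, seed=0):
    """Shuffle rows reproducibly, then split them by fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def count_feature(categories):
    """Replace each categorical value with the number of times it occurs."""
    counts = Counter(categories)
    return [counts[c] for c in categories]


ages = [20, 30, 40, 60]
print(min_max_normalize(ages))         # [0.0, 0.25, 0.5, 1.0]
print(equal_width_bins(ages, 2))       # [0, 0, 1, 1]
print(count_feature(["a", "b", "a"]))  # [2, 1, 2]

train, test = split_rows(range(10), 0.7)
print(len(train), len(test))           # 7 3
```

The fixed `seed` makes the split repeatable, which is useful when you want to compare models on the same training and test rows.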
List of modules
The Data Transformation category includes the following module categories:
- Data Transformation - Filter
- Learning with Counts
- Data Transformation - Manipulation
- Data Transformation - Sample and Split
- Data Transformation - Scale and Reduce