Machine Learning module descriptions
This topic provides an overview of all the modules included in Azure Machine Learning Studio, which is an interactive, visual workspace to easily build and test predictive models.
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag-and-drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.
What is a module?
In Machine Learning Studio, a module is a building block for creating experiments. Each module encapsulates a specific machine learning algorithm, function, or code library that can act on data in your workspace. Modules are designed to connect to other modules so that they can share and modify data.
The code that runs in each module comes from many sources. These include open source libraries and languages, algorithms developed by Microsoft Research, and tools for working with Azure and other cloud services.
Looking for machine learning algorithms? See the Machine Learning category, which contains modules for decision trees, clustering, and neural networks, among others. The Train and Evaluate categories include modules that help you train and test your models.
By connecting and configuring modules, you can create a workflow that reads data from external sources, prepares it for analysis, applies machine learning algorithms, and generates results.
When an experiment is open in Machine Learning Studio, you can see the complete list of current modules in the navigation pane on the left. You drag these building blocks onto the canvas and then connect them to create a complete machine learning workflow, which is called an experiment.
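In Studio you connect modules visually, without writing code. As a rough mental model only, the following sketch treats each module as a function that takes a dataset and returns a new one, and treats an experiment as a chain of such functions. The names `connect`, `drop_missing`, and `add_ratio`, and the list-of-rows data model, are hypothetical illustrations, not Studio APIs.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical data model: a dataset is a list of rows keyed by column name.
Row = Dict[str, Optional[float]]
Dataset = List[Row]
# Hypothetical module model: a function that takes a dataset and returns a new one.
Module = Callable[[Dataset], Dataset]


def connect(*modules: Module) -> Module:
    """Chain modules so that each module's output becomes the next module's input."""
    def experiment(data: Dataset) -> Dataset:
        for module in modules:
            data = module(data)
        return data
    return experiment


def drop_missing(data: Dataset) -> Dataset:
    """Preparation step: keep only rows with no missing (None) values."""
    return [row for row in data if all(v is not None for v in row.values())]


def add_ratio(data: Dataset) -> Dataset:
    """Transformation step: add a column derived from columns "a" and "b"."""
    return [{**row, "ratio": row["a"] / row["b"]} for row in data]


experiment = connect(drop_missing, add_ratio)
result = experiment([{"a": 1.0, "b": 2.0}, {"a": None, "b": 4.0}, {"a": 3.0, "b": 4.0}])
print(result)
```

Because every step shares the same input and output shape, steps can be reordered or swapped, which is the property that lets Studio modules be wired together freely.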
Sometimes modules are updated to add new functionality, or to remove older code. When this happens, any experiments that you created that use the module continue to run. But the next time you open the experiment, you are prompted to upgrade the module, or to use a different module.
For an example of how to build a complete machine learning experiment, see these tutorials:
To make related modules easier to find, the machine learning tools in Machine Learning Studio are grouped into the following categories.
Data Format Conversions: Use these modules to convert data into formats that other machine learning tools can read.
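One format that many machine learning tools read is the sparse SVMLight text format, where each line holds a label followed by `index:value` pairs for the nonzero features. The sketch below is a minimal, hypothetical converter that illustrates what such a conversion produces; it assumes 1-based feature indices and that zero values are omitted, and `to_svmlight` is not a Studio function.

```python
from typing import List, Sequence


def to_svmlight_line(label: float, features: Sequence[float]) -> str:
    """Format one row as '<label> <index>:<value> ...' in SVMLight style.

    Feature indices start at 1, and zero-valued features are omitted,
    which keeps sparse data compact.
    """
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:g}"] + pairs)


def to_svmlight(labels: Sequence[float], rows: Sequence[Sequence[float]]) -> List[str]:
    """Convert a labeled dense dataset into a list of SVMLight-formatted lines."""
    return [to_svmlight_line(y, x) for y, x in zip(labels, rows)]


lines = to_svmlight([1, -1], [[0.5, 0.0, 2.0], [0.0, 1.5, 0.0]])
print("\n".join(lines))
```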
Data Input and Output: Use these modules to read data and models from cloud data sources, including Hadoop clusters, Azure Table storage, and web URLs. You can also use these modules to write results to storage or to a database.
Data Transformation: Use these modules to prepare data for analysis. You can change data types, flag columns as features or labels, generate features, and scale or normalize data.
Data Transformation / Filter: Transform numeric data by applying digital signal processing filters.
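A moving average is one of the simplest digital signal processing filters: it smooths a numeric sequence by averaging each sample with its neighbors. The sketch below is a minimal illustration of that kind of filter, assuming a trailing window that only emits output once the window is full; `moving_average` is a hypothetical helper, not the Studio module.

```python
from typing import List, Sequence


def moving_average(signal: Sequence[float], window: int) -> List[float]:
    """Simple trailing moving-average (FIR) filter.

    Each output value is the mean of the current sample and the previous
    window - 1 samples; output starts once a full window is available.
    """
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    total = sum(signal[:window])
    out = [total / window]
    for i in range(window, len(signal)):
        # Slide the window: add the newest sample and drop the oldest one.
        total += signal[i] - signal[i - window]
        out.append(total / window)
    return out


smoothed = moving_average([1.0, 2.0, 6.0, 2.0, 1.0], window=3)
print(smoothed)
```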
Data Transformation / Learning with Counts: Use joint probability distributions to build features that compactly describe large datasets.
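The idea behind count-based features is to replace a high-cardinality categorical value with statistics of how often it co-occurs with each label, which estimates the joint distribution of category and label. The following is a simplified sketch for binary labels (0 and 1) only; the helper names and the choice of an additive prior for the log-odds are assumptions for illustration, not the exact features Studio generates.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_count_table(values: Sequence[str], labels: Sequence[int]) -> Dict[str, List[int]]:
    """Count how often each categorical value co-occurs with label 0 and label 1."""
    table: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for value, label in zip(values, labels):
        table[value][label] += 1  # assumes labels are exactly 0 or 1
    return dict(table)


def count_features(value: str, table: Dict[str, List[int]], prior: float = 1.0) -> Tuple[int, int, float]:
    """Replace a category with (count of label 0, count of label 1, smoothed log-odds).

    Unseen values get zero counts; the additive prior keeps the log-odds finite.
    """
    n0, n1 = table.get(value, [0, 0])
    log_odds = math.log((n1 + prior) / (n0 + prior))
    return n0, n1, log_odds


table = build_count_table(["ad", "ad", "news", "ad", "news"], [1, 1, 0, 0, 0])
print(count_features("ad", table))      # counts for a frequent value
print(count_features("sports", table))  # an unseen value falls back to zero counts
```

The table is the compact part: however many rows the training data has, it stores only two counts per distinct category.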
Data Transformation / Manipulation: This group provides a variety of general-purpose data tools. For example, you can remove or replace missing values, choose a subset of columns, add a column, or concatenate two datasets.
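Replacing missing values is one of the most common manipulation steps. As a minimal sketch, assuming missing entries are represented as `None` and that the column mean is an acceptable substitute, mean imputation for a single numeric column looks like this (`replace_missing_with_mean` is a hypothetical name):

```python
from statistics import mean
from typing import List, Optional


def replace_missing_with_mean(column: List[Optional[float]]) -> List[float]:
    """Fill None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


filled = replace_missing_with_mean([4.0, None, 8.0, None])
print(filled)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so other strategies (median, a constant, or dropping rows) may suit some data better.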
Data Transformation / Sample and Split: Divide a dataset by criteria or by size to create training and test sets, or to isolate certain rows.
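Splitting by size usually means shuffling rows and cutting them at a chosen fraction. The sketch below assumes a random split controlled by a seed so that repeated runs produce the same training and test sets; `split_rows` and its parameters are illustrative, not the Studio module's options.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], train_fraction: float, seed: int) -> Tuple[List[T], List[T]]:
    """Randomly split rows into training and test sets.

    A fixed seed makes the split reproducible, so repeated runs of an
    experiment see the same training and test rows.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # private RNG; global state is untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(list(range(10)), train_fraction=0.7, seed=42)
print(len(train), len(test))
```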
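Data Transformation / Scale and Reduce: Transform numerical data, for example by normalizing values, clipping outliers, grouping values into bins, or reducing dimensionality.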
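Two widely used ways to normalize a numeric column are min-max scaling, which maps values into the range [0, 1], and z-score standardization, which centers values on zero with unit standard deviation. The sketch below shows both on a single column; the function names are hypothetical, and constant columns are mapped to zeros as an assumed convention.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Center values on their mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


data = [2.0, 4.0, 6.0, 8.0]
scaled = min_max_scale(data)
standardized = z_score(data)
print(scaled)
print(standardized)
```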
Feature Selection: Use these modules to identify the best features in your data by using widely researched statistical methods.
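Pearson correlation is one such statistical method: features whose values move most strongly (up or down) with the label are ranked highest. The sketch below assumes numeric features and a numeric label and ranks features by absolute correlation; `pearson` and `rank_features` are illustrative helpers, and real feature selection may use other scores such as mutual information or chi-squared.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def rank_features(features: Dict[str, Sequence[float]], label: Sequence[float]) -> List[str]:
    """Order feature names by the absolute value of their correlation with the label."""
    return sorted(features, key=lambda name: abs(pearson(features[name], label)), reverse=True)


label = [1.0, 2.0, 3.0, 4.0]
features = {
    "noise": [3.0, 1.0, 4.0, 1.0],
    "inverse": [8.0, 6.0, 4.0, 2.0],
    "partial": [1.0, 2.0, 2.0, 3.0],
}
ranking = rank_features(features, label)
print(ranking)
```

Using the absolute value matters: the "inverse" feature is perfectly (negatively) correlated with the label and is just as informative as a positively correlated one.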
Machine Learning: This group contains most of the machine learning algorithms supported by Machine Learning Studio. It also contains modules that support the algorithms by training models, generating scores, and evaluating model performance.
Machine Learning / Evaluate: After you have trained a model, use these tools to measure the model's accuracy.
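For a binary classifier, evaluation typically compares predicted labels with actual labels and summarizes the result as accuracy, precision, and recall. The sketch below computes those three metrics, assuming labels are 0 and 1 with 1 as the positive class; `binary_metrics` is a hypothetical helper, and Studio's evaluation output includes other metrics as well.

```python
from typing import Dict, Sequence


def binary_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = binary_metrics(actual=[1, 0, 1, 1, 0, 0], predicted=[1, 0, 0, 1, 1, 0])
print(metrics)
```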
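Machine Learning / Initialize Model: These modules provide the machine learning algorithms, which you can customize by setting parameters. The algorithms in this section are grouped by type: anomaly detection, classification, clustering, and regression.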
Machine Learning / Score: Use these modules to pass new data through a trained model and generate a set of results for evaluation. You can also use the results of scoring as part of a predictive service.
Machine Learning / Train: These modules train an initialized machine learning model on data that you provide.
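The separation between initializing, training, and scoring a model can be illustrated with the simplest possible learner. The sketch below assumes a one-feature linear regression fit by ordinary least squares: `train_linear` learns parameters from labeled data, and `score` applies them to new rows. Both names are hypothetical; they only mirror the roles that the Train and Score modules play.

```python
from typing import List, Sequence, Tuple


def train_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def score(model: Tuple[float, float], new_x: Sequence[float]) -> List[float]:
    """Apply a trained model to new rows and return one prediction per row."""
    slope, intercept = model
    return [slope * v + intercept for v in new_x]


model = train_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
predictions = score(model, [5.0, 10.0])
print(model, predictions)
```

Keeping training and scoring as separate steps means a trained model can be reused to score many new datasets, which is what makes it usable in a predictive service.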
OpenCV Library Modules: These modules give you easy access to OpenCV, a popular open source library for image processing and image classification.
R Language Modules: Use these modules to add custom R code to your experiment, or to implement a machine learning model based on an R package.
Python Language Modules: Use these modules to add custom Python code to your experiment.
Statistical Functions: Use these modules to calculate probability distributions, create custom calculations, and perform a wide variety of other tasks related to numerical variables.
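As an example of evaluating a probability distribution, the cumulative distribution function (CDF) of a normal distribution can be computed from the error function. The sketch below uses Python's standard `math.erf`; `normal_cdf` and its default parameters are illustrative choices, not a Studio interface.

```python
import math


def normal_cdf(x: float, mean: float = 0.0, std: float = 1.0) -> float:
    """P(X <= x) for a normal distribution, computed from the error function."""
    if std <= 0:
        raise ValueError("std must be positive")
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


print(round(normal_cdf(0.0), 4))   # half of the probability mass lies below the mean
print(round(normal_cdf(1.96), 4))  # roughly 97.5% lies below 1.96 standard deviations
```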
Text Analytics: Use these modules to perform feature hashing and named entity recognition, or to preprocess text by using natural language processing tools.
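Feature hashing turns free text into a fixed-length numeric vector by hashing each token into one of a fixed number of buckets, so no vocabulary has to be stored. The sketch below is a minimal version of this hashing trick, assuming lowercase whitespace tokenization and MD5 as a stable hash; it is not necessarily the tokenizer or hash function that Studio uses, and `hash_features` is a hypothetical name.

```python
import hashlib
from typing import List


def hash_features(text: str, n_features: int = 16) -> List[int]:
    """Map the tokens of a text into a fixed-length count vector (the hashing trick).

    Each lowercase whitespace token is hashed to one of n_features buckets;
    different tokens may collide in the same bucket, which trades a little
    accuracy for a fixed, small memory footprint.
    """
    vector = [0] * n_features
    for token in text.lower().split():
        # MD5 is stable across runs, unlike Python's randomized built-in hash() for str.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "little") % n_features] += 1
    return vector


vec = hash_features("the cat sat on the mat")
print(vec)
```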
Time Series: Use these modules to detect anomalies in trends by using algorithms designed specifically for time series data.
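A simple way to flag anomalies in a time series is to compare each point with the statistics of the points just before it. The sketch below uses a rolling z-score, a deliberately basic technique chosen for illustration and not the algorithm behind Studio's time series modules; the function name, window, and threshold are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def rolling_zscore_anomalies(series: Sequence[float], window: int, threshold: float = 3.0) -> List[int]:
    """Return indices whose value deviates strongly from the preceding window.

    For each point after the first `window` points, compute the mean and
    standard deviation of the previous `window` values and flag the point if
    it lies more than `threshold` standard deviations from that mean.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


readings = [10.0, 11.0, 10.0, 12.0, 11.0, 10.0, 30.0, 11.0, 10.0, 12.0]
flagged = rolling_zscore_anomalies(readings, window=5)
print(flagged)
```

One weakness of this approach is that a spike inflates the standard deviation of the windows that follow it, which can hide a second anomaly nearby; dedicated time series algorithms handle such cases more robustly.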
Machine Learning Studio modules don't attempt to duplicate the data integration features of other tools, such as Azure Data Factory. Instead, the modules provide functionality that is specific to machine learning:
- Normalization, grouping, and scaling of data
- Computing statistical distribution of data
- Conversion to other machine learning formats
- Import of data used for machine learning experiments and export of results
- Text analytics, feature selection, and dimensionality reduction
If you need more sophisticated facilities for data manipulation and storage, see the following:
- Azure Data Factory: Enterprise-ready, cloud data processing pipelines.
- Azure SQL Database: Scalable storage, with integrated access to machine learning.
- Azure Cosmos DB: NoSQL data store; you can import data from it into Machine Learning Studio.
- Azure Data Lake Analytics: Distributed analytics on big data.
- Stream Analytics: Event processing for the Internet of Things.
- Azure Text Analytics: Multiple options for text processing, and related cognitive services for speech, image, and facial recognition.
- Azure Databricks: Spark-based analytics platform.