Modify Count Table Parameters

05/06/2019

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
Learn more about Azure Machine Learning.

ML Studio (classic) documentation is being retired and may not be updated in the future.

Modifies the parameters used to create features from counts

Category: Learning with Counts

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Module overview

This article describes how to use the Modify Count Table Parameters module in Machine Learning Studio (classic) to change the way that features are generated from a count table.

In general, to create count-based features, you use the Build Counting Transform module to process a dataset and create a count table, and then generate a new set of features from that count table.

However, if you have already created a count table, you can use the Modify Count Table Parameters module to edit the definition of how the count data is processed. This lets you create a different set of count-based statistics based on the existing data, without having to re-analyze the dataset.
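The split between building a count table once and generating features from it can be illustrated with a short Python sketch. It assumes a single categorical column and integer class labels. The function names build_count_table and count_features are hypothetical and do not reflect how Studio (classic) stores count tables internally.

```python
from collections import Counter, defaultdict

def build_count_table(values, labels):
    """Count how often each (value, label) pair occurs in one categorical column."""
    table = defaultdict(Counter)
    for value, label in zip(values, labels):
        table[value][label] += 1
    return table

def count_features(table, value, classes):
    """Look up per-class counts for one value; the original dataset is not rescanned."""
    if value not in table:
        return [0] * len(classes)
    return [table[value][c] for c in classes]

# Build the table once...
table = build_count_table(["a", "b", "a", "c", "a"], [1, 0, 0, 1, 1])
# ...then generate features from it as often as needed.
features_a = count_features(table, "a", classes=[0, 1])  # [1, 2]
```

Because count_features reads only the table, changing how features are derived does not require another pass over the data, which is the situation this module is designed for.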

How to configure Modify Count Table Parameters

You must have previously run an experiment that created a count transformation.
- To modify a saved transform: Locate the transformation in the Transforms group and add it to your experiment.
- To modify a count transformation created within the same experiment: If the transformation has not been saved but is available as an output in the current experiment (for example, as the output of the Build Counting Transform module), you can use it directly by connecting the modules.
Add the Modify Count Table Parameters module and connect the transformation as its input.
In the Properties pane of the Modify Count Table Parameters module, type a value to use as the Garbage bin threshold.

This value specifies the minimum number of times each feature value must occur for its counts to be used. If a value occurs fewer times than the garbage bin threshold, its value-label pairs are not counted as discrete items; instead, all values with counts below the threshold are placed together in a single "garbage bin".

If you are using a small dataset and you are counting and training on the same data, a good starting value is 1.
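The following Python sketch shows one way the garbage bin threshold could work, assuming a count table that maps each feature value to its per-label counts. The GARBAGE_BIN key and the apply_garbage_bin function are hypothetical; the module's internal representation is not documented here.

```python
from collections import Counter, defaultdict

GARBAGE_BIN = "__garbage_bin__"  # hypothetical key for the pooled low-frequency bucket

def apply_garbage_bin(table, threshold):
    """Pool every value seen fewer than `threshold` times into one garbage bin entry."""
    pooled = defaultdict(Counter)
    for value, label_counts in table.items():
        total = sum(label_counts.values())
        key = value if total >= threshold else GARBAGE_BIN
        pooled[key].update(label_counts)  # adds this value's counts to the chosen bucket
    return pooled

# Per-label counts for three values of one categorical column.
table = {"a": Counter({1: 5, 0: 3}), "b": Counter({1: 1}), "c": Counter({0: 2})}
pooled = apply_garbage_bin(table, threshold=3)
# "a" (8 occurrences) keeps its own counts; "b" and "c" share the garbage bin.
```

With a threshold of 1, every observed value keeps its own counts, which is why 1 is a reasonable starting point for small datasets.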
For Additional prior pseudo examples, type the number of additional pseudo examples to include. You do not need to provide these examples; they are generated from the prior distribution.
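Pseudo examples are commonly used to pull the estimates for rarely seen values toward the prior. This Python sketch assumes a binary label and assumes that the pseudo examples are split between the classes in proportion to the prior probability of the positive class. smoothed_log_odds is a hypothetical function, and the exact formula used by the module may differ.

```python
import math

def smoothed_log_odds(count_pos, count_neg, prior_pos, pseudo_examples):
    """Log-odds of the positive class for one feature value, smoothed by pseudo examples.

    Assumption: `pseudo_examples` imaginary rows are split between the classes in
    proportion to the prior, so rare values are pulled toward the prior log-odds.
    """
    if not 0.0 < prior_pos < 1.0:
        raise ValueError("prior_pos must be strictly between 0 and 1")
    pos = count_pos + pseudo_examples * prior_pos
    neg = count_neg + pseudo_examples * (1.0 - prior_pos)
    if pos <= 0.0 or neg <= 0.0:
        raise ValueError("log-odds undefined; add pseudo examples or more data")
    return math.log(pos / neg)

prior_log_odds = math.log(0.25 / 0.75)                        # about -1.10
raw = smoothed_log_odds(2, 1, 0.25, pseudo_examples=0)        # log(2), about 0.69
smoothed = smoothed_log_odds(2, 1, 0.25, pseudo_examples=42)  # about -0.96
```

A value seen only three times is dominated by 42 pseudo examples, so its smoothed log-odds lands close to the prior; a frequently seen value would be affected much less.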
For Laplacian noise scale, type a non-negative floating-point value that sets the scale of the Laplacian distribution from which noise is sampled. A value of 0 (the default) adds no noise. When you set a nonzero scale, noise is incorporated into the count-based features, so the model is less sensitive to the particular values observed in the data.
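As a rough illustration, this Python sketch adds Laplacian noise to a list of per-class counts. It assumes the noise is added directly to the counts, which this article does not state. Because the Python standard library has no Laplace sampler, the sketch relies on the fact that the difference of two independent Exp(1) draws, multiplied by the scale, follows a Laplace distribution with that scale. laplace_noise and noisy_counts are hypothetical names.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as scale * (Exp(1) - Exp(1))."""
    if scale < 0:
        raise ValueError("scale must be >= 0")
    if scale == 0:
        return 0.0  # a zero scale means no noise
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_counts(counts, scale, seed=None):
    """Add independent Laplace noise to each per-class count (assumed behavior)."""
    rng = random.Random(seed)
    return [c + laplace_noise(scale, rng) for c in counts]

unchanged = noisy_counts([3, 5], scale=0.0)          # [3.0, 5.0]
perturbed = noisy_counts([3, 5], scale=1.0, seed=7)  # reproducible for a fixed seed
```

A scale of 0 returns the counts unchanged, which corresponds to the default of 0.0 listed for this parameter.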
In Output features include, choose the method to use when creating count-based features for inclusion in the transformation.
- CountsOnly: Create features using counts.
- LogOddsOnly: Create features using the log of the odds ratio.
- BothCountsAndLogOdds: Create features using both counts and log odds.
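The three options differ only in which columns are emitted for each value. In this Python sketch, the OutputFeatureType enum mirrors the option names, and the per-class log-odds formula with an additive smoothing constant of 0.5 is an assumption chosen to avoid taking the log of zero; the module's own formula is not documented here.

```python
import math
from enum import Enum

class OutputFeatureType(Enum):
    """Mirrors the three options of Output features include."""
    CountsOnly = "CountsOnly"
    LogOddsOnly = "LogOddsOnly"
    BothCountsAndLogOdds = "BothCountsAndLogOdds"

def make_features(class_counts, output, smoothing=0.5):
    """Build the features for one value from its per-class counts.

    Log-odds of class c (assumed formula): log((n_c + s) / (n_total - n_c + s)).
    """
    total = sum(class_counts)
    counts = [float(n) for n in class_counts]
    log_odds = [math.log((n + smoothing) / (total - n + smoothing)) for n in class_counts]
    if output is OutputFeatureType.CountsOnly:
        return counts
    if output is OutputFeatureType.LogOddsOnly:
        return log_odds
    return counts + log_odds  # BothCountsAndLogOdds

both = make_features([3, 1], OutputFeatureType.BothCountsAndLogOdds)  # 2 counts + 2 log-odds
```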
Select the Ignore back off column option if you want to override the IsBackOff flag in the output when creating features. When you select this option, count-based features are created even if the column doesn’t have significant count values.
Run the experiment. You can then save the output of Modify Count Table Parameters as a new transformation, if desired.

Examples

For examples of how this module is used, see the Azure AI Gallery:

Learning with Counts: Binary Classification: Demonstrates how to use the learning with counts modules to generate features from columns of categorical values for a binary classification model.
Learning with Counts: Multiclass classification with NYC taxi data: Demonstrates how to use the learning with counts modules for performing multiclass classification on the publicly available NYC taxi dataset. The sample uses a multiclass logistic regression learner to model this problem.
Learning with Counts: Binary classification with NYC taxi data: Demonstrates how to use the learning with counts modules for performing binary classification on the publicly available NYC taxi dataset. The sample uses a two-class logistic regression learner to model this problem.

Technical notes

This section contains implementation details, tips, and answers to frequently asked questions.

It is statistically safe to count and train on the same data set if you set the Laplacian noise scale parameter.

Expected inputs

Name	Type	Description
Counting transform	ITransform interface	The counting transform to apply

Module parameters

Name	Type	Range	Optional	Default	Description
Garbage bin threshold	Float	>=0.0f	Required	10.0f	The threshold under which a column value will be featurized against the garbage bin
Additional prior pseudo examples	Float	>=0.0f	Required	42.0f	The additional pseudo examples following prior distributions to be included
Laplacian noise scale	Float	>=0.0f	Required	0.0f	The scale of the Laplacian distribution from which noise is sampled
Output features include	OutputFeatureType		Required	BothCountsAndLogOdds	The features to output
Ignore back off column	Boolean		Required	false	Whether to ignore the IsBackOff column in the output
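To make the defaults and ranges in the table concrete, this Python sketch models the parameters as an immutable dataclass that checks the documented ranges. The class and field names are hypothetical; they are not part of any Studio (classic) API.

```python
from dataclasses import dataclass, replace

OUTPUT_FEATURE_TYPES = ("CountsOnly", "LogOddsOnly", "BothCountsAndLogOdds")

@dataclass(frozen=True)
class CountTableParameters:
    """Hypothetical mirror of the module parameters, using the documented defaults."""
    garbage_bin_threshold: float = 10.0
    additional_prior_pseudo_examples: float = 42.0
    laplacian_noise_scale: float = 0.0
    output_features_include: str = "BothCountsAndLogOdds"
    ignore_back_off_column: bool = False

    def __post_init__(self):
        # The three float parameters share the documented range >= 0.0.
        for name in ("garbage_bin_threshold", "additional_prior_pseudo_examples",
                     "laplacian_noise_scale"):
            if getattr(self, name) < 0.0:
                raise ValueError(f"{name} must be >= 0.0")
        if self.output_features_include not in OUTPUT_FEATURE_TYPES:
            raise ValueError(f"unknown output type: {self.output_features_include}")

defaults = CountTableParameters()
# Derive a modified copy; the defaults object itself is left unchanged.
tuned = replace(defaults, garbage_bin_threshold=1.0, laplacian_noise_scale=0.5)
```

Deriving a new parameter set with dataclasses.replace, rather than mutating the existing one, loosely parallels the idea of this module: the original definition stays intact and a modified one is produced.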

Outputs

Name	Type	Description
Modified transform	ITransform interface	The modified transform

Exceptions

Exception	Description
Error 0003	Exception occurs if one or more of the inputs are null or empty.
Error 0086	Exception occurs when a counting transform is invalid.

For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.

For a list of API exceptions, see Machine Learning REST API Error Codes.