CountFeatureSelectingEstimator Class

Definition

Selects the slots for which the count of non-default values is greater than or equal to a threshold.

public sealed class CountFeatureSelectingEstimator : Microsoft.ML.IEstimator<Microsoft.ML.ITransformer>
type CountFeatureSelectingEstimator = class
    interface IEstimator<ITransformer>
Public NotInheritable Class CountFeatureSelectingEstimator
Implements IEstimator(Of ITransformer)
Inheritance
CountFeatureSelectingEstimator
Implements
IEstimator<ITransformer>

Remarks

Estimator Characteristics

Does this estimator need to look at the data to train its parameters? Yes
Input column data type: Vector or scalar of Single, Double or text data types
Output column data type: Same as the input column
Exportable to ONNX: Yes

This transform uses a set of aggregators to count the number of values for each slot (vector element) that are non-default and non-missing (for the definitions of default and missing, refer to the remarks section in DataKind). If the count value is less than the provided count parameter, that slot is dropped. This transform is useful when applied together with a OneHotHashEncodingTransformer. It can remove the features generated by the hash transform that have no data in the examples.

For example, suppose we set the count parameter to 3, fit the estimator, and apply the resulting transformer to the following Features column. The second slot, which contains the values NaN (missing value), 5, 5, and 0 (default value), is dropped because it has only two non-default and non-missing values, i.e. the two 5 values. The third slot is kept because it contains the values 6, 6, 6, and NaN, so it has three non-default and non-missing values, which meets the threshold. The first slot, with four such values, is also kept.

Features
4,NaN,6
4,5,6
4,5,6
4,0,NaN

This is how the dataset above would look after the transformation.

Features
4,6
4,6
4,6
4,NaN
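
The counting rule in this example can be reproduced with a short standalone sketch. It is written in plain Python rather than against the ML.NET API, and the function name select_slots_by_count is hypothetical; it only illustrates the rule described above (keep a slot when its count of non-default, non-missing values is at least the count parameter), assuming 0 is the default value and NaN is the missing value for floating-point slots.

```python
import math

def select_slots_by_count(rows, count):
    """Illustrative only: count non-default (non-zero), non-missing (non-NaN)
    values per slot and keep the slots whose count is >= `count`."""
    n_slots = len(rows[0])
    counts = [0] * n_slots
    for row in rows:
        for i, value in enumerate(row):
            # Assumption: 0.0 is the default value and NaN is the missing value.
            if value != 0.0 and not math.isnan(value):
                counts[i] += 1
    kept = [i for i, c in enumerate(counts) if c >= count]
    filtered = [[row[i] for i in kept] for row in rows]
    return counts, kept, filtered

nan = float("nan")
features = [
    [4.0, nan, 6.0],
    [4.0, 5.0, 6.0],
    [4.0, 5.0, 6.0],
    [4.0, 0.0, nan],
]

counts, kept, filtered = select_slots_by_count(features, count=3)
print(counts)  # [4, 2, 3]
print(kept)    # [0, 2]
```

The per-slot counts are 4, 2, and 3, so with a threshold of 3 only the second slot (index 1) is dropped, matching the table above.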

Check the See Also section for links to usage examples.

Methods

Fit(IDataView)

Trains and returns an ITransformer.

GetOutputSchema(SchemaShape)

Returns the SchemaShape of the schema which will be produced by the transformer. Used for schema propagation and verification in a pipeline.

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. At the same time, IEstimator<TTransformer> instances are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer> in which the estimator whose transformer we want is buried somewhere in the chain. For that scenario, this method lets us attach a delegate that is called once Fit is called.
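
The wrapping pattern described here can be sketched in a language-neutral way. The following Python snippet is not the ML.NET implementation; every class and function in it (MaxScaleEstimator, ScaleTransformer, OnFitWrapper, fit_chain) is hypothetical and exists only to show how a delegate can capture a fitted transformer that would otherwise be hidden inside a chain.

```python
class ScaleTransformer:
    """Hypothetical fitted object: divides every value by a learned scale."""
    def __init__(self, scale):
        self.scale = scale

    def transform(self, data):
        return [x / self.scale for x in data]

class MaxScaleEstimator:
    """Hypothetical estimator: learns the maximum of the training data."""
    def fit(self, data):
        return ScaleTransformer(max(data))

class OnFitWrapper:
    """Wraps an estimator and calls `on_fit` with the fitted transformer."""
    def __init__(self, estimator, on_fit):
        self.estimator = estimator
        self.on_fit = on_fit

    def fit(self, data):
        transformer = self.estimator.fit(data)
        self.on_fit(transformer)  # hand the typed result to the caller's delegate
        return transformer

def fit_chain(estimators, data):
    """Fit each estimator in order, feeding it the previous stage's output."""
    transformers = []
    for estimator in estimators:
        transformer = estimator.fit(data)
        data = transformer.transform(data)
        transformers.append(transformer)
    return transformers

captured = []
chain = [OnFitWrapper(MaxScaleEstimator(), captured.append), MaxScaleEstimator()]
fit_chain(chain, [2.0, 4.0, 8.0])
print(captured[0].scale)  # 8.0
```

Only the wrapped first stage invokes the delegate, so captured holds exactly one transformer even though the chain fits two.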

Applies to

See also