OneHotHashEncodingEstimator Class

Definition

Converts one or more input columns of categorical values into as many output columns of hash-based one-hot encoded vectors.

public sealed class OneHotHashEncodingEstimator : Microsoft.ML.IEstimator<Microsoft.ML.Transforms.OneHotHashEncodingTransformer>
type OneHotHashEncodingEstimator = class
    interface IEstimator<OneHotHashEncodingTransformer>
Public NotInheritable Class OneHotHashEncodingEstimator
Implements IEstimator(Of OneHotHashEncodingTransformer)
Inheritance
Object → OneHotHashEncodingEstimator
Implements
IEstimator<OneHotHashEncodingTransformer>

Remarks

Estimator Characteristics

Does this estimator need to look at the data to train its parameters? Yes
Input column data type: Scalar or vector of numeric, boolean, text, or key type.
Output column data type: Scalar or vector of key, or vector of Single type.
Exportable to ONNX: No

The resulting OneHotHashEncodingTransformer converts one or more input columns into as many output columns of one-hot encoded vectors, where each value is hashed and the hash is used as the index into the vector.

The OneHotHashEncodingEstimator is often used to convert categorical data into a form that can be provided to a machine learning algorithm.
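As a minimal sketch of the hashing behavior described above, the following example creates the estimator through the `OneHotHashEncoding` catalog method, fits it, and reads back the encoded vectors. The `Row` type, the `Education` column, and the `EducationEncoded` output name are hypothetical and exist only for this illustration; the example assumes the Microsoft.ML NuGet package is referenced.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class HashEncodingSketch
{
    // Hypothetical input row type, used only for this illustration.
    private class Row
    {
        public string Education { get; set; }
    }

    // Returns one encoded vector per input row.
    public static float[][] Example()
    {
        var mlContext = new MLContext();

        var rows = new[]
        {
            new Row { Education = "0-5yrs" },
            new Row { Education = "6-11yrs" },
            new Row { Education = "0-5yrs" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // numberOfBits: 3 hashes every value into one of 2^3 = 8 slots,
        // so each output vector has 8 entries.
        var estimator = mlContext.Transforms.Categorical.OneHotHashEncoding(
            outputColumnName: "EducationEncoded",
            inputColumnName: "Education",
            numberOfBits: 3);

        // Fit returns a OneHotHashEncodingTransformer; Transform applies it.
        IDataView encoded = estimator.Fit(data).Transform(data);

        return encoded.GetColumn<float[]>("EducationEncoded").ToArray();
    }
}
```

Because the slot is derived from a hash, identical input values always land in the same slot, while distinct values can collide when `numberOfBits` is small.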

The output of this transform is specified by OneHotEncodingEstimator.OutputKind:

  • Indicator produces an indicator vector. Each slot in this vector corresponds to a category in the dictionary, so its length is the size of the built dictionary. If a value is not found in the dictionary, the output is the zero vector.

  • Bag produces one vector such that each slot stores the number of occurrences of the corresponding value in the input vector. Each slot in this vector corresponds to a value in the dictionary, so its length is the size of the built dictionary. Indicator and Bag differ simply in how the bit-vectors generated from individual slots in the input column are aggregated: for Indicator they are concatenated and for Bag they are added. When the source column is a Scalar, the Indicator and Bag options are identical.

  • Key produces keys in a KeyDataViewType column. If the input column is a vector, the output contains a vector key type, where each slot of the vector corresponds to the respective slot of the input vector. If a category is not found in the built dictionary, it is assigned the value zero.

  • Binary produces a binary encoded vector to represent the values found in the dictionary that are present in the input column. If a value in the input column is not found in the dictionary, the output is the zero vector.
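To make the output kinds above more concrete, the sketch below encodes the same scalar column twice, once as `Key` and once as `Indicator`. The `Row` type and the column names are hypothetical, and `numberOfBits: 3` is chosen only to keep the vectors short; this is an illustration under those assumptions, not a reference implementation.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

public static class OutputKindSketch
{
    // Hypothetical input row type, used only for this illustration.
    private class Row
    {
        public string Color { get; set; }
    }

    // Fits and applies a hash encoder with the requested output kind.
    private static IDataView Encode(OneHotEncodingEstimator.OutputKind kind)
    {
        var mlContext = new MLContext();
        var rows = new[]
        {
            new Row { Color = "red" },
            new Row { Color = "green" },
            new Row { Color = "red" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        var estimator = mlContext.Transforms.Categorical.OneHotHashEncoding(
            outputColumnName: "ColorEncoded",
            inputColumnName: "Color",
            outputKind: kind,
            numberOfBits: 3);

        return estimator.Fit(data).Transform(data);
    }

    // Key: one key value per row, read back through its raw uint representation.
    public static uint[] KeyExample() =>
        Encode(OneHotEncodingEstimator.OutputKind.Key)
            .GetColumn<uint>("ColorEncoded").ToArray();

    // Indicator: one vector of Single per row.
    public static float[][] IndicatorExample() =>
        Encode(OneHotEncodingEstimator.OutputKind.Indicator)
            .GetColumn<float[]>("ColorEncoded").ToArray();
}
```

Calling `KeyExample()` and `IndicatorExample()` on the same rows shows the two representations side by side; rows with the same input value receive the same key and the same indicator vector.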

The OneHotHashEncodingTransformer can be applied to one or more columns, in which case it builds and uses a separate dictionary for each column that it is applied to.
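Applying the encoder to several columns at once can be sketched with the `InputOutputColumnPair` overload of `OneHotHashEncoding`. The `Row` type and the `Education` and `ZipCode` columns are hypothetical names chosen for illustration.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class MultiColumnSketch
{
    // Hypothetical input row type, used only for this illustration.
    private class Row
    {
        public string Education { get; set; }
        public string ZipCode { get; set; }
    }

    // Returns the encoded vectors of both columns, one vector per row.
    public static (float[][] Education, float[][] ZipCode) Example()
    {
        var mlContext = new MLContext();
        var rows = new[]
        {
            new Row { Education = "0-5yrs", ZipCode = "98005" },
            new Row { Education = "6-11yrs", ZipCode = "98052" },
            new Row { Education = "0-5yrs", ZipCode = "98005" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Each pair names an output column and the input column it encodes.
        var pairs = new[]
        {
            new InputOutputColumnPair("EducationEncoded", "Education"),
            new InputOutputColumnPair("ZipCodeEncoded", "ZipCode"),
        };

        // One estimator handles both columns; each column is encoded independently.
        var estimator = mlContext.Transforms.Categorical.OneHotHashEncoding(
            pairs, numberOfBits: 3);

        IDataView encoded = estimator.Fit(data).Transform(data);

        return (
            encoded.GetColumn<float[]>("EducationEncoded").ToArray(),
            encoded.GetColumn<float[]>("ZipCodeEncoded").ToArray());
    }
}
```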

Check the See Also section for links to usage examples.

Methods

Fit(IDataView)

Trains and returns a OneHotHashEncodingTransformer.

GetOutputSchema(SchemaShape)

Returns the SchemaShape of the schema which will be produced by the transformer. Used for schema propagation and verification in a pipeline.

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Appends a 'caching checkpoint' to the estimator chain, which ensures that the downstream estimators are trained against cached data. A caching checkpoint is helpful before trainers that take multiple data passes.
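One way this might look in a pipeline is sketched below: the hash encoder is followed by a caching checkpoint and then by a trainer that makes several passes over the data. The `Row` type, the column names, and the choice of `SdcaLogisticRegression` as the downstream trainer are assumptions made for illustration.

```csharp
using Microsoft.ML;

public static class CachedPipelineSketch
{
    // Hypothetical input row type, used only for this illustration.
    private class Row
    {
        public bool Label { get; set; }
        public string Category { get; set; }
    }

    // Fits the whole chain and returns the resulting model.
    public static ITransformer Example()
    {
        var mlContext = new MLContext(seed: 0);
        var rows = new[]
        {
            new Row { Label = true, Category = "a" },
            new Row { Label = false, Category = "b" },
            new Row { Label = true, Category = "a" },
            new Row { Label = false, Category = "b" },
            new Row { Label = true, Category = "a" },
            new Row { Label = false, Category = "c" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        var pipeline = mlContext.Transforms.Categorical
            .OneHotHashEncoding("Features", "Category", numberOfBits: 3)
            // Everything appended after this point trains on cached data.
            .AppendCacheCheckpoint(mlContext)
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Label", featureColumnName: "Features"));

        return pipeline.Fit(data);
    }
}
```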

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than a general ITransformer. At the same time, IEstimator<TTransformer> instances are often combined into pipelines with many objects, so a chain of estimators may be built via EstimatorChain<TLastTransformer> in which the estimator whose transformer is needed is buried somewhere in the chain. For that scenario, this method attaches a delegate that is called once Fit is called.
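The scenario described above can be sketched as follows: the hash encoder sits in the middle of a chain, and a delegate captures its typed transformer when the chain is fit. The `Row` type, the column names, and the trailing `NormalizeMinMax` step are hypothetical and serve only to give the chain more than one stage.

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms;

public static class OnFitDelegateSketch
{
    // Hypothetical input row type, used only for this illustration.
    private class Row
    {
        public string Category { get; set; }
    }

    // Returns the encoder's transformer captured while fitting the chain.
    public static OneHotHashEncodingTransformer Example()
    {
        var mlContext = new MLContext();
        var rows = new[]
        {
            new Row { Category = "a" },
            new Row { Category = "b" },
            new Row { Category = "a" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        OneHotHashEncodingTransformer captured = null;

        var pipeline = mlContext.Transforms.Categorical
            .OneHotHashEncoding("Features", "Category", numberOfBits: 3)
            // The delegate runs when this stage is fit as part of the chain.
            .WithOnFitDelegate(transformer => captured = transformer)
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));

        // Fitting the chain returns a chain-level transformer, but the
        // delegate has stored the typed transformer of the encoder stage.
        pipeline.Fit(data);

        return captured;
    }
}
```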

See also