FastForestBinaryFeaturizationEstimator Class

Definition

An IEstimator<TTransformer> that transforms an input feature vector into tree-based features.

public sealed class FastForestBinaryFeaturizationEstimator : Microsoft.ML.Trainers.FastTree.TreeEnsembleFeaturizationEstimatorBase
type FastForestBinaryFeaturizationEstimator = class
    inherit TreeEnsembleFeaturizationEstimatorBase
Public NotInheritable Class FastForestBinaryFeaturizationEstimator
    Inherits TreeEnsembleFeaturizationEstimatorBase
Inheritance
Object
TreeEnsembleFeaturizationEstimatorBase
FastForestBinaryFeaturizationEstimator

Remarks

Input and Output Columns

The input label column data must be Boolean. The input features column data must be a known-sized vector of Single.

This estimator outputs the following columns:

| Output Column Name | Column Type | Description |
| --- | --- | --- |
| Trees | Known-sized vector of Single | The output values of all trees. Its size equals the total number of trees in the tree ensemble model. |
| Leaves | Known-sized vector of Single | A 0-1 vector encoding the IDs of the leaves that the input feature vector falls into. Its size is the total number of leaves in the tree ensemble model. |
| Paths | Known-sized vector of Single | A 0-1 vector encoding the paths the input feature vector takes to reach those leaves. Its size is the total number of non-leaf nodes in the tree ensemble model. |

All of these output columns are optional, and their names can be changed. To skip a column, set its name to null so that it is not produced.
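As a quick check on the column sizes listed above, the following minimal C# sketch computes the lengths of the three output vectors from the number of leaves in each tree. FeaturizationOutputSizes and Compute are hypothetical names, not part of ML.NET. The sketch assumes every non-leaf node has exactly two children (as in the tree used under Prediction Details), so a tree with L leaves has L - 1 non-leaf nodes.

```csharp
using System.Linq;

// Hypothetical helper (not an ML.NET API): derives the lengths of the
// Trees, Leaves, and Paths output vectors from per-tree leaf counts.
public static class FeaturizationOutputSizes
{
    public static (int Trees, int Leaves, int Paths) Compute(int[] leavesPerTree)
    {
        // One Trees entry per tree.
        int trees = leavesPerTree.Length;

        // One Leaves entry per leaf, summed over all trees.
        int leaves = leavesPerTree.Sum();

        // Assumption: binary trees, so a tree with L leaves has L - 1
        // non-leaf nodes, and Paths has one entry per non-leaf node.
        int paths = leavesPerTree.Sum(l => l - 1);

        return (trees, leaves, paths);
    }
}
```

For a single tree with 5 leaves, Compute(new[] { 5 }) returns (1, 5, 4); for three trees with 4 leaves each, it returns (3, 12, 9).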

Prediction Details

This estimator produces several output columns from a tree ensemble model. Assume that the model contains only one decision tree:

               Node 0
               /    \
             /        \
           /            \
         /                \
       Node 1            Node 2
       /    \            /    \
     /        \        /        \
   /            \     Leaf -3  Node 3
  Leaf -1      Leaf -2         /    \
                             /        \
                            Leaf -4  Leaf -5

Assume that the input feature vector falls into Leaf -1. The output Trees is a 1-element vector whose only value is the decision value carried by Leaf -1. The output Leaves is a 0-1 vector: if the reached leaf is the $i$-th leaf in the tree (counting from 0, where the $i$-th leaf is labeled Leaf $-(i+1)$, so the first leaf is Leaf -1), the $i$-th value in Leaves is 1 and all other values are 0. Reaching Leaf -1 therefore gives $[1, 0, 0, 0, 0]$ as Leaves. The output Paths is a 0-1 representation of the nodes passed through before reaching the leaf: the $i$-th element in Paths indicates whether Node $i$ was visited. For example, reaching Leaf -1 leads to $[1, 1, 0, 0]$ as Paths, because the path goes through Node 0 and Node 1. If there are multiple trees, this estimator concatenates the Trees, Leaves, and Paths vectors of all trees, with the first tree's values coming first in each concatenated vector.
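The encoding described above can be sketched in a few lines of C#. This is a hypothetical re-implementation for illustration only, not ML.NET's internal code: SketchTree, TreeFeaturizationSketch, the routing rule (go left when x[feature] <= threshold), and the features, thresholds, and leaf values in ExampleTree are all assumptions. Only the tree shape and the node/leaf numbering follow the diagram: a negative child value c refers to Leaf c, whose 0-based leaf index is -c - 1.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory regression tree (not an ML.NET type).
// Non-leaf nodes are numbered 0, 1, 2, ...; a negative child value c
// means "Leaf c", whose 0-based leaf index is -c - 1.
public sealed class SketchTree
{
    public int[] Feature;      // feature index tested at each non-leaf node
    public float[] Threshold;  // assumed rule: go left when x[Feature[n]] <= Threshold[n]
    public int[] Left;         // child taken when the test is true
    public int[] Right;        // child taken when the test is false
    public float[] LeafValue;  // decision value carried by each leaf

    public int NodeCount => Left.Length;
    public int LeafCount => LeafValue.Length;

    // Walks from Node 0 and returns the reached leaf index plus visited nodes.
    public (int Leaf, List<int> Visited) Route(float[] x)
    {
        var visited = new List<int>();
        int node = 0;
        while (node >= 0)
        {
            visited.Add(node);
            node = x[Feature[node]] <= Threshold[node] ? Left[node] : Right[node];
        }
        return (-node - 1, visited);
    }
}

public static class TreeFeaturizationSketch
{
    // Builds Trees, Leaves, and Paths, concatenating tree by tree.
    public static (float[] Trees, float[] Leaves, float[] Paths) Featurize(
        IReadOnlyList<SketchTree> ensemble, float[] x)
    {
        var trees = new List<float>();
        var leaves = new List<float>();
        var paths = new List<float>();
        foreach (var tree in ensemble)
        {
            var (leaf, visited) = tree.Route(x);
            trees.Add(tree.LeafValue[leaf]);

            // One-hot encoding of the reached leaf within this tree.
            var leafBits = new float[tree.LeafCount];
            leafBits[leaf] = 1f;
            leaves.AddRange(leafBits);

            // 1 for every non-leaf node visited on the way to the leaf.
            var pathBits = new float[tree.NodeCount];
            foreach (int n in visited) pathBits[n] = 1f;
            paths.AddRange(pathBits);
        }
        return (trees.ToArray(), leaves.ToArray(), paths.ToArray());
    }

    // The tree from the diagram, with hypothetical tests and leaf values.
    public static SketchTree ExampleTree() => new SketchTree
    {
        Feature   = new[] { 0, 1, 1, 2 },
        Threshold = new[] { 0.5f, 0.5f, 0.5f, 0.5f },
        Left      = new[] { 1, -1, -3, -4 },
        Right     = new[] { 2, -2, 3, -5 },
        LeafValue = new[] { -1.0f, -0.5f, 0.2f, 0.4f, 0.9f },
    };
}
```

With these assumed tests, the input { 0, 0, 0 } reaches Leaf -1 and yields Trees = [-1], Leaves = [1, 0, 0, 0, 0], and Paths = [1, 1, 0, 0]. Passing the same tree twice in the ensemble doubles each vector, with the first tree's values first.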

Check the See Also section for links to usage examples.

Methods

Fit(IDataView)

Produces a TreeEnsembleModelParameters that maps the column named InputColumnName in input to three output columns.

(Inherited from TreeEnsembleFeaturizationEstimatorBase)

GetOutputSchema(SchemaShape)

Adds three float-vector columns to inputSchema. Given a feature vector column, the added columns hold the prediction values of all trees, the IDs of the leaves the feature vector falls into, and the paths to those leaves.

(Inherited from TreeEnsembleFeaturizationEstimatorBase)

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Appends a 'caching checkpoint' to the estimator chain, which ensures that downstream estimators are trained against cached data. A caching checkpoint is helpful before trainers that make multiple passes over the data.
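Why caching matters for multi-pass trainers can be illustrated with plain C#. This sketch does not use ML.NET: CachingSketch, Upstream, MultiPassTrain, and the ExpensiveCalls counter are hypothetical stand-ins for an upstream transform, a trainer that iterates over its input several times, and the cost of recomputing upstream work. A lazily evaluated source re-runs the upstream step on every pass, while a materialized copy runs it once.

```csharp
using System.Collections.Generic;

// Hypothetical illustration (not ML.NET): counts how often an
// "expensive" upstream step runs when a trainer makes several passes.
public static class CachingSketch
{
    public static int ExpensiveCalls;

    // Lazily transforms each row; every enumeration re-runs this work.
    public static IEnumerable<float> Upstream(IEnumerable<float> raw)
    {
        foreach (var v in raw)
        {
            ExpensiveCalls++;
            yield return v * 2f;
        }
    }

    // A stand-in trainer that iterates over its input `passes` times.
    public static float MultiPassTrain(IEnumerable<float> data, int passes)
    {
        float total = 0f;
        for (int p = 0; p < passes; p++)
            foreach (var v in data) total += v;
        return total;
    }
}
```

With three input rows and three passes, feeding Upstream(raw) directly runs the upstream step 9 times, whereas materializing it first (for example with ToList()) runs it only 3 times; the trainer's result is the same either way.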

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. However, IEstimator<TTransformer> objects are often composed into pipelines with many other objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want is buried somewhere in the chain. For that scenario, this method attaches a delegate that is called once Fit is called.
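The wrapping pattern described above can be sketched without ML.NET. In this minimal C# sketch, ISketchEstimator, OnFitWrapper, MeanEstimator, and MeanTransformer are hypothetical stand-ins (not the real IEstimator<TTransformer> or ITransformer types); it only shows the idea of a decorator that forwards Fit to the inner estimator and hands the typed result to a caller-supplied delegate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal stand-in for an estimator (not an ML.NET type).
public interface ISketchEstimator<out TTransformer>
{
    TTransformer Fit(IReadOnlyList<float> data);
}

// Wraps an estimator and invokes a delegate with the typed transformer
// produced by Fit, then returns that transformer unchanged.
public sealed class OnFitWrapper<TTransformer> : ISketchEstimator<TTransformer>
{
    private readonly ISketchEstimator<TTransformer> _inner;
    private readonly Action<TTransformer> _onFit;

    public OnFitWrapper(ISketchEstimator<TTransformer> inner, Action<TTransformer> onFit)
    {
        _inner = inner;
        _onFit = onFit;
    }

    public TTransformer Fit(IReadOnlyList<float> data)
    {
        TTransformer fitted = _inner.Fit(data);
        _onFit(fitted); // give the caller access to the specifically typed result
        return fitted;
    }
}

// Hypothetical estimator that "learns" the mean of its input.
public sealed class MeanTransformer { public float Mean; }

public sealed class MeanEstimator : ISketchEstimator<MeanTransformer>
{
    public MeanTransformer Fit(IReadOnlyList<float> data)
    {
        float sum = 0f;
        foreach (var v in data) sum += v;
        return new MeanTransformer { Mean = sum / data.Count };
    }
}
```

Wrapping a MeanEstimator with a delegate such as t => captured = t lets the caller keep a reference to the fitted MeanTransformer even when the wrapper sits inside a longer chain and only the chain's final transformer is returned.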

Applies to

See also