FastTreeBinaryClassifier Class


Uses a logit-boost boosted tree learner to perform binary classification.

public sealed class FastTreeBinaryClassifier : Microsoft.ML.ILearningPipelineItem, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithGroupId
type FastTreeBinaryClassifier = class
    interface CommonInputs.ITrainerInputWithGroupId
    interface CommonInputs.ITrainerInputWithWeight
    interface CommonInputs.ITrainerInputWithLabel
    interface CommonInputs.ITrainerInput
    interface ILearningPipelineItem
Public NotInheritable Class FastTreeBinaryClassifier
Implements CommonInputs.ITrainerInputWithGroupId, ILearningPipelineItem


FastTree is an efficient implementation of the MART gradient boosting algorithm. Gradient boosting is a machine learning technique for regression problems. It builds each regression tree in a step-wise fashion, using a predefined loss function to measure the error at each step and correct for it in the next. The resulting prediction model is thus an ensemble of weaker prediction models. In regression problems, boosting builds a series of such trees in a step-wise fashion and then selects the optimal tree using an arbitrary differentiable loss function.

MART learns an ensemble of regression trees, which is a decision tree with scalar values in its leaves. A decision (or regression) tree is a binary tree-like flow chart, where at each interior node one decides which of the two child nodes to continue to based on one of the feature values from the input. At each leaf node, a value is returned. In the interior nodes, the decision is based on the test 'x <= v' where x is the value of the feature in the input sample and v is one of the possible values of this feature. The functions that can be produced by a regression tree are all the piece-wise constant functions.

The ensemble of trees is produced by computing, in each step, a regression tree that approximates the gradient of the loss function, and adding it to the previous tree with coefficients that minimize the loss of the new tree. The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.

  • In case of a binary classification problem, the output is converted to a probability by using some form of calibration.
  • In case of a regression problem, the output is the predicted value of the function.
  • In case of a ranking problem, the instances are ordered by the output value of the ensemble.

For more information see:
Wikipedia: Gradient boosting (Gradient tree boosting)
Greedy function approximation: A gradient boosting machine.
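The boosting procedure described above is a C# component in ML.NET, but the core loop is easy to illustrate. The following is a minimal, language-agnostic Python sketch (not ML.NET's implementation) of boosted depth-1 regression trees: each tree fits the negative gradient of the log-loss, the ensemble output is the sum of tree outputs scaled by a learning rate, and a sigmoid converts the raw score to a probability. All function names and the toy dataset are invented for illustration.

```python
import math

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (stump) on a single feature: choose the
    threshold v minimizing squared error of the two leaf means."""
    best = None
    for v in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= v]
        right = [r for x, r in zip(xs, residuals) if x > v]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, v, lm, rm)
    if best is None:  # no split possible: fall back to a single-leaf tree
        m = sum(residuals) / len(residuals)
        return (min(xs), m, m)
    _, v, lm, rm = best
    return (v, lm, rm)

def predict_stump(stump, x):
    """A regression tree computes a piece-wise constant function:
    test 'x <= v' and return the corresponding leaf value."""
    v, lm, rm = stump
    return lm if x <= v else rm

def train_boosted(xs, ys, num_trees=20, learning_rate=0.5):
    """Gradient boosting for binary classification with log-loss:
    each new tree approximates the negative gradient (y - p)."""
    ensemble = []
    for _ in range(num_trees):
        scores = [sum(learning_rate * predict_stump(t, x) for t in ensemble)
                  for x in xs]
        probs = [1 / (1 + math.exp(-s)) for s in scores]
        residuals = [y - p for y, p in zip(ys, probs)]  # negative log-loss gradient
        ensemble.append(fit_stump(xs, residuals))
    return ensemble

def predict_proba(ensemble, x, learning_rate=0.5):
    """Sum of tree outputs, then a sigmoid as a simple form of calibration."""
    s = sum(learning_rate * predict_stump(t, x) for t in ensemble)
    return 1 / (1 + math.exp(-s))

# Tiny 1-D dataset: the label is 1 exactly when x > 3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
model = train_boosted(xs, ys)
```

The `num_trees` and `learning_rate` arguments play the same roles as the `NumTrees` and `LearningRates` properties below; real FastTree additionally limits leaves per tree (`NumLeaves`), bins feature values (`MaxBins`), and calibrates probabilities more carefully.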


FastTreeBinaryClassifier()


AllowEmptyTrees

When a root split is impossible, allow training to proceed

BaggingSize

Number of trees in each bag (0 for disabling bagging)

BaggingTrainFraction

Percentage of training examples used in each bag

BaselineAlphaRisk

Baseline alpha for tradeoffs of risk (0 is normal training)

BaselineScoresFormula

Freeform defining the scores that should be used as the baseline ranker

BestStepRankingRegressionTrees

Use best regression step trees?

Bias

Bias for calculating the gradient for each feature bin for a categorical feature.

Bundling

Bundle low population bins. Bundle.None(0): no bundling; Bundle.AggregateLowPopulation(1): bundle low population bins; Bundle.Adjacent(2): bundle neighboring low population bins.

Caching

Whether the learner should cache the input training data

CategoricalSplit

Whether to split based on multiple categorical feature values.

CompressEnsemble

Compress the tree ensemble

DiskTranspose

Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose

DropoutRate

Dropout rate for tree regularization

EarlyStoppingMetrics

Early stopping metrics. (For regression: 1 = L1, 2 = L2; for ranking: 1 = NDCG@1, 3 = NDCG@3)

EarlyStoppingRule

Early stopping rule. (A validation set (/valid) is required.)

EnablePruning

Enable post-training pruning to avoid overfitting. (A validation set is required.)

EntropyCoefficient

The entropy (regularization) coefficient, between 0 and 1

ExecutionTimes

Print the execution time breakdown to stdout

FeatureColumn

Column to use for features

FeatureCompressionLevel

The level of feature compression to use

FeatureFirstUsePenalty

The feature first-use penalty coefficient

FeatureFlocks

Whether to collectivize features during dataset preparation to speed up training

FeatureFraction

The fraction of features (chosen randomly) to use in each iteration

FeatureReusePenalty

The feature re-use penalty (regularization) coefficient

FeatureSelectSeed

The seed for active feature selection

FilterZeroLambdas

Filter zero lambdas during training

GainConfidenceLevel

Tree-fitting gain confidence requirement (should be in the range [0,1))

GetDerivativesSampleRate

Sample each query 1 in k times in the GetDerivatives function

GroupIdColumn

Column to use for the example group ID

HistogramPoolSize

The number of histograms in the pool (between 2 and numLeaves)

LabelColumn

Column to use for labels

LearningRates

The learning rate

MaxBins

Maximum number of distinct values (bins) per feature

MaxCategoricalGroupsPerNode

Maximum number of categorical split groups to consider when splitting on a categorical feature. Split groups are collections of split points. This is used to reduce overfitting when there are many categorical features.

MaxCategoricalSplitPoints

Maximum number of categorical split points to consider when splitting on a categorical feature.

MaxTreeOutput

Upper bound on the absolute value of a single tree's output

MaxTreesAfterCompression

Maximum number of trees after compression

MinDocsForCategoricalSplit

Minimum categorical document count in a bin to consider it for a split.

MinDocsPercentageForCategoricalSplit

Minimum percentage of categorical documents in a bin to consider it for a split.

MinDocumentsInLeafs

The minimal number of documents allowed in a leaf of a regression tree, out of the subsampled data

MinStepSize

Minimum line search step size

NormalizeFeatures

Normalization option for the feature column

NumLeaves

The maximum number of leaves in each regression tree

NumPostBracketSteps

Number of post-bracket line search steps

NumThreads

The number of threads to use

NumTrees

Total number of decision trees to create in the ensemble

OptimizationAlgorithm

Optimization algorithm to use (GradientDescent, AcceleratedGradientDescent)

ParallelTrainer

Allows choosing the parallel FastTree learning algorithm

PositionDiscountFreeform

The discount freeform specifying the per-position discounts of documents in a query (uses a single variable P for position, where P=0 is the first position)

PrintTestGraph

Print the metrics graph for the first test set

PrintTrainValidGraph

Print train and validation metrics in a graph

PruningThreshold

The tolerance threshold for pruning

PruningWindowSize

The moving window size for pruning

RandomStart

Training starts from a random ordering (determined by /r1)

RngSeed

The seed of the random number generator

Shrinkage


Smoothing

Smoothing parameter for tree regularization

SoftmaxTemperature

The temperature of the randomized softmax distribution for choosing the feature

SparsifyThreshold

Sparsity level needed to use the sparse feature representation

SplitFraction

The fraction of features (chosen randomly) to use for each split

TestFrequency

Calculate metric values for train/valid/test every k rounds

TrainingData

The data to be used for training

UnbalancedSets

Whether to use derivatives optimized for unbalanced sets

UseLineSearch

Whether to use line search for the step size

UseTolerantPruning

Use window and tolerance for pruning

WeightColumn

Column to use for example weight

WriteLastEnsemble

Write the last ensemble instead of the one determined by early stopping


ApplyStep(ILearningPipelineStep, Experiment)
GetInputData()

Applies to