LightGbmClassifier Class

Gradient Boosted Decision Trees

Inheritance
nimbusml.internal.core.ensemble._lightgbmclassifier.LightGbmClassifier
nimbusml.base_predictor.BasePredictor
sklearn.base.ClassifierMixin

Constructor

LightGbmClassifier(number_of_iterations=100, learning_rate=None, number_of_leaves=None, minimum_example_count_per_leaf=None, booster=None, normalize='Auto', caching='Auto', unbalanced_sets=False, use_softmax=None, sigmoid=0.5, evaluation_metric='Error', maximum_bin_count_per_feature=255, verbose=False, silent=True, number_of_threads=None, early_stopping_round=0, batch_size=1048576, use_categorical_split=None, handle_missing_value=True, minimum_example_count_per_group=100, maximum_categorical_split_point_count=32, categorical_smoothing=10.0, l2_categorical_regularization=10.0, random_state=None, parallel_trainer=None, feature=None, group_id=None, label=None, weight=None, **params)

Parameters

feature

See Columns.

group_id

See Columns.

label

See Columns.

weight

See Columns.

number_of_iterations

Number of boosting iterations.

learning_rate

Determines the size of the step taken in the direction of the gradient at each step of the learning process, and therefore how quickly the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.

number_of_leaves

The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and improve precision, but also risk overfitting and longer training times.

minimum_example_count_per_leaf

Minimum number of training instances required to form a leaf; that is, the minimum number of documents allowed in a leaf of the tree, out of the sub-sampled data.

booster

Which booster to use. Available options are:

  1. Dart

  2. Gbdt

  3. Goss

A construction sketch using one of these boosters appears after this parameter list.

normalize

If Auto, the choice to normalize depends on the preference declared by the algorithm. This is the default choice. If No, no normalization is performed. If Yes, normalization is always performed. If Warn, a warning message is displayed when the algorithm requires normalization, but normalization is not performed. When normalization is performed, a MaxMin normalizer is used. This normalizer preserves sparsity by mapping zero to zero.

caching

Whether the trainer should cache the input training data.

unbalanced_sets

Use for multi-class classification when training data is not balanced.

use_softmax

Whether to use softmax loss for multi-class classification.

sigmoid

Parameter for the sigmoid function.

evaluation_metric

Evaluation metric to use.

maximum_bin_count_per_feature

Maximum number of bins used to bucket feature values.

verbose

Enable verbose output.

silent

Whether to suppress printing of running messages.

number_of_threads

Number of parallel threads used to run LightGBM.

early_stopping_round

Number of rounds with no improvement after which training stops; 0 disables early stopping.

batch_size

Number of entries in a batch when loading data.

use_categorical_split

Whether to enable categorical splits.

handle_missing_value

Whether to enable special handling of missing values.

minimum_example_count_per_group

Minimum number of instances per categorical group.

maximum_categorical_split_point_count

Maximum number of categorical thresholds.

categorical_smoothing

Laplace smoothing term for categorical feature splits; reduces the bias toward small categories.

l2_categorical_regularization

L2 regularization for categorical splits.

random_state

Sets the random seed for LightGBM to use.

parallel_trainer

Parallel LightGBM learning algorithm.

params

Additional arguments sent to the compute engine.
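
The sketch below shows how several of these parameters combine in practice. The specific values (a smaller learning_rate paired with more iterations, a Goss booster, early stopping) are illustrative choices, not recommended defaults.


   from nimbusml.ensemble import LightGbmClassifier
   from nimbusml.ensemble.booster import Goss

   # A smaller step size usually needs more boosting iterations to
   # converge, so learning_rate and number_of_iterations are tuned together.
   clf = LightGbmClassifier(
       learning_rate=0.05,        # smaller steps, steadier convergence
       number_of_iterations=200,  # compensate for the smaller learning rate
       number_of_leaves=31,       # cap tree size to limit overfitting
       booster=Goss(),            # one of Dart, Gbdt, Goss
       early_stopping_round=20,   # stop after 20 rounds without improvement
       random_state=42)           # fixed seed for reproducibility

   print(clf.get_params())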

Examples


   ###############################################################################
   # LightGbmClassifier
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.ensemble import LightGbmClassifier
   from nimbusml.ensemble.booster import Dart
   from nimbusml.feature_extraction.categorical import OneHotVectorizer

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path)
   print(data.head())
   #    age  case education  induced  parity ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...

   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       LightGbmClassifier(feature=['parity', 'edu'], label='induced',
                          booster=Dart(reg_lambda=0.1))
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #   PredictedLabel   Score.0   Score.1   Score.2
   # 0               2  0.070722  0.145439  0.783839
   # 1               0  0.737733  0.260116  0.002150
   # 2               2  0.070722  0.145439  0.783839
   # 3               0  0.490715  0.091749  0.417537
   # 4               0  0.562419  0.197818  0.239763
   # print evaluation metrics
   print(metrics)
   #   Accuracy(micro-avg)  Accuracy(macro-avg)  Log-loss  Log-loss reduction  ...
   # 0             0.641129             0.462618  0.772996          19.151269  ...

Remarks

LightGBM is an open-source implementation of gradient boosted decision trees. It is available in nimbusml as a binary classification trainer, a multi-class trainer, a regression trainer, and a ranking trainer.
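
For orientation, these four trainers are all exposed under nimbusml.ensemble; a minimal import sketch:


   # One LightGBM trainer per task type.
   from nimbusml.ensemble import (
       LightGbmBinaryClassifier,  # binary classification
       LightGbmClassifier,        # multi-class classification (this class)
       LightGbmRegressor,         # regression
       LightGbmRanker)            # ranking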

Reference

GitHub: LightGBM (https://github.com/Microsoft/LightGBM)

Methods

decision_function

Returns score values.

get_params

Get the parameters for this operator.

predict_proba

Returns probabilities.

decision_function

Returns score values.

decision_function(X, **params)

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False

predict_proba

Returns probabilities.

predict_proba(X, **params)
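
A minimal usage sketch for these methods, reusing the infert dataset from the example above; the two numeric feature columns and the direct sklearn-style fit(X, y) call are illustrative assumptions, not the only way to train.


   from nimbusml.datasets import get_dataset
   from nimbusml.ensemble import LightGbmClassifier

   # Load the example data as a pandas DataFrame.
   df = get_dataset('infert').as_df()
   X, y = df[['parity', 'age']], df['induced']

   clf = LightGbmClassifier().fit(X, y)

   print(clf.decision_function(X)[:3])  # raw per-class score values
   print(clf.predict_proba(X)[:3])      # per-class probabilities (rows sum to ~1)
   print(clf.get_params())              # the operator's parameters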