LightGbmClassifier Class
Gradient Boosted Decision Trees
- Inheritance
- nimbusml.internal.core.ensemble._lightgbmclassifier.LightGbmClassifier
- nimbusml.base_predictor.BasePredictor
- sklearn.base.ClassifierMixin
Constructor
LightGbmClassifier(number_of_iterations=100, learning_rate=None, number_of_leaves=None, minimum_example_count_per_leaf=None, booster=None, normalize='Auto', caching='Auto', unbalanced_sets=False, use_softmax=None, sigmoid=0.5, evaluation_metric='Error', maximum_bin_count_per_feature=255, verbose=False, silent=True, number_of_threads=None, early_stopping_round=0, batch_size=1048576, use_categorical_split=None, handle_missing_value=True, minimum_example_count_per_group=100, maximum_categorical_split_point_count=32, categorical_smoothing=10.0, l2_categorical_regularization=10.0, random_state=None, parallel_trainer=None, feature=None, group_id=None, label=None, weight=None, **params)
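Because the class derives from sklearn.base.ClassifierMixin (see the inheritance list above), it can also be used standalone in scikit-learn style. A minimal sketch, using a hypothetical toy DataFrame made up purely for illustration:
import pandas as pd
from nimbusml.ensemble import LightGbmClassifier
# Hypothetical toy data, for illustration only.
X = pd.DataFrame({'x1': [0.1, 0.9, 0.2, 0.8],
                  'x2': [1.0, 0.0, 0.9, 0.1]})
y = pd.Series([0, 1, 0, 1], name='y')
# Few iterations and a small leaf minimum, since this toy data has only four rows.
clf = LightGbmClassifier(number_of_iterations=10,
                         minimum_example_count_per_leaf=1)
clf.fit(X, y)
print(clf.predict(X))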
Parameters
- feature
see Columns.
- group_id
see Columns.
- label
see Columns.
- weight
see Columns.
- number_of_iterations
Number of iterations.
- learning_rate
Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.
- number_of_leaves
The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and get better precision, but risk overfitting and requiring longer training times.
- minimum_example_count_per_leaf
Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data.
- booster
Which booster to use; the example below passes a Dart booster (see the sketch after this parameter list).
- normalize
If Auto, the choice to normalize depends on the preference declared by the algorithm. This is the default choice. If No, no normalization is performed. If Yes, normalization is always performed. If Warn and normalization is needed by the algorithm, a warning message is displayed but normalization is not performed. If normalization is performed, a MaxMin normalizer is used. This normalizer preserves sparsity by mapping zero to zero.
- caching
Whether the trainer should cache the input training data.
- unbalanced_sets
Use for multi-class classification when training data is not balanced.
- use_softmax
Use softmax loss for multi-class classification.
- sigmoid
Parameter for the sigmoid function.
- evaluation_metric
Evaluation metrics.
- maximum_bin_count_per_feature
Maximum number of bucket bins per feature.
- verbose
Verbose.
- silent
Whether to suppress printing of running messages.
- number_of_threads
Number of parallel threads used to run LightGBM.
- early_stopping_round
Rounds of early stopping; 0 disables it.
- batch_size
Number of entries in a batch when loading data.
- use_categorical_split
Whether to enable categorical splits.
- handle_missing_value
Whether to enable special handling of missing values.
- minimum_example_count_per_group
Minimum number of instances per categorical group.
- maximum_categorical_split_point_count
Maximum number of categorical thresholds.
- categorical_smoothing
Laplace smoothing term for categorical feature splits. Avoids the bias of small categories.
- l2_categorical_regularization
L2 Regularization for categorical split.
- random_state
Sets the random seed for LightGBM to use.
- parallel_trainer
Parallel LightGBM Learning Algorithm.
- params
Additional arguments sent to compute engine.
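As referenced in the booster entry above, the booster argument accepts a booster object from nimbusml.ensemble.booster; the example below passes Dart, and Gbdt and Goss are the other available boosters. A minimal sketch (passing reg_lambda to Gbdt mirrors the Dart usage in the example and is an assumption here):
from nimbusml.ensemble import LightGbmClassifier
from nimbusml.ensemble.booster import Gbdt
# Sketch: configure the traditional gradient boosting booster explicitly.
clf = LightGbmClassifier(number_of_iterations=100,
                         booster=Gbdt(reg_lambda=0.1))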
Examples
###############################################################################
# LightGbmClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import LightGbmClassifier
from nimbusml.ensemble.booster import Dart
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
LightGbmClassifier(feature=['parity', 'edu'], label='induced',
booster=Dart(reg_lambda=0.1))
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Score.0 Score.1 Score.2
# 0 2 0.070722 0.145439 0.783839
# 1 0 0.737733 0.260116 0.002150
# 2 2 0.070722 0.145439 0.783839
# 3 0 0.490715 0.091749 0.417537
# 4 0 0.562419 0.197818 0.239763
# print evaluation metrics
print(metrics)
# Accuracy(micro-avg) Accuracy(macro-avg) Log-loss Log-loss reduction ...
# 0 0.641129 0.462618 0.772996 19.151269 ...
Remarks
LightGBM is an open-source implementation of gradient boosted decision trees. It is available in nimbusml as a binary classification trainer, a multi-class trainer, a regression trainer, and a ranking trainer; the import sketch below lists the corresponding classes.
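A quick import sketch of the sibling trainers, all of which live in nimbusml.ensemble alongside this class:
from nimbusml.ensemble import (
    LightGbmBinaryClassifier,  # binary classification
    LightGbmClassifier,        # multi-class classification (this class)
    LightGbmRegressor,         # regression
    LightGbmRanker)            # ranking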
Reference
GitHub: LightGBM (https://github.com/microsoft/LightGBM)
Methods
- decision_function
Returns score values.
- get_params
Get the parameters for this operator.
- predict_proba
Returns probabilities.
decision_function
Returns score values
decision_function(X, **params)
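A usage sketch, assuming the pipeline fit in the example above; nimbusml's Pipeline forwards decision_function to the final predictor, yielding one score column per class:
# Sketch: per-class raw scores for each row of `data`,
# assuming `pipeline` and `data` were fit as in the example above.
scores = pipeline.decision_function(data)
print(scores[:5])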
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep
If True, also return the parameters of contained sub-objects (scikit-learn convention).
predict_proba
Returns probabilities
predict_proba(X, **params)
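A matching sketch for class probabilities, again assuming the fitted pipeline from the example above:
# Sketch: class membership probabilities, one column per class; each
# row sums to 1 (assumes the fitted `pipeline` and `data` above).
probabilities = pipeline.predict_proba(data)
print(probabilities[:5])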