GamRegressor Class

Generalized Additive Models

Inheritance
nimbusml.internal.core.ensemble._gamregressor.GamRegressor
nimbusml.base_predictor.BasePredictor
sklearn.base.RegressorMixin

Constructor

GamRegressor(number_of_iterations=9500, minimum_example_count_per_leaf=10, learning_rate=0.002, normalize='Auto', caching='Auto', pruning_metrics=2, entropy_coefficient=0.0, gain_conf_level=0, number_of_threads=None, disk_transpose=None, maximum_bin_count_per_feature=255, maximum_tree_output=inf, get_derivatives_sample_rate=1, random_state=123, feature_flocks=True, enable_pruning=True, feature=None, label=None, weight=None, **params)

Parameters

feature

see Columns.

label

see Columns.

weight

see Columns.

number_of_iterations

Total number of iterations over all features.

minimum_example_count_per_leaf

Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' divides the examples at a tree node into two groups, so a split is only considered if both resulting leaves contain at least this many examples.
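To make the constraint concrete, here is a small sketch. It assumes that a split on a discretized feature separates bins `<= t` from bins `> t`; the helper `allowed_thresholds` is hypothetical and not part of nimbusml. It lists the thresholds that leave at least `minimum_example_count_per_leaf` examples on each side.

```python
def allowed_thresholds(bin_counts, min_leaf):
    """Hypothetical helper: thresholds t for which the split 'bin <= t' vs
    'bin > t' leaves at least min_leaf examples on each side."""
    total = sum(bin_counts)
    allowed, left = [], 0
    for t, count in enumerate(bin_counts[:-1]):
        left += count  # examples falling in bins 0..t
        if left >= min_leaf and total - left >= min_leaf:
            allowed.append(t)
    return allowed


# Five bins holding 5, 3, 8, 2 and 1 examples (19 in total).
print(allowed_thresholds([5, 3, 8, 2, 1], min_leaf=4))   # [0, 1]
print(allowed_thresholds([5, 3, 8, 2, 1], min_leaf=10))  # []
```

Raising the minimum shrinks the set of candidate splits, which is why larger values act as a regularizer.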

learning_rate

Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.
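The trade-off can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w² with three step sizes. It is a generic illustration of what a step size does, not GamRegressor's training loop.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimise f(w) = w**2, whose gradient is 2*w, with a fixed step size."""
    for _ in range(steps):
        w -= learning_rate * 2 * w  # step against the gradient
    return w


too_small = gradient_descent(0.01)  # w shrinks by a factor 0.98 per step: still far from 0
moderate = gradient_descent(0.45)   # w shrinks by a factor 0.1 per step: essentially 0
too_big = gradient_descent(1.1)     # w is multiplied by -1.2 per step: overshoots and diverges
print(too_small, moderate, too_big)
```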

normalize

Specifies the type of automatic normalization used:

  • "Auto": if normalization is needed, it is performed automatically. This is the default choice.

  • "No": no normalization is performed.

  • "Yes": normalization is performed.

  • "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.

Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures that the distances between data points are proportional and enables optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values into an interval [a, b] where -1 <= a <= 0, 0 <= b <= 1, and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
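One way to satisfy all of the stated properties is to widen the observed range so that it contains zero and then divide by its width. This is a minimal sketch under that assumption. The function name is hypothetical, and how nimbusml computes the minimum and maximum internally is not described here.

```python
def zero_preserving_min_max(values):
    """Scale values into [a, b] with -1 <= a <= 0 <= b <= 1 and b - a = 1,
    mapping 0 to 0 (assumed scheme: divide by the zero-inclusive range)."""
    lo = min(min(values), 0.0)  # include zero so that a <= 0
    hi = max(max(values), 0.0)  # include zero so that b >= 0
    width = hi - lo
    if width == 0:
        return [0.0 for _ in values]  # all-zero column: nothing to rescale
    return [v / width for v in values]


print(zero_preserving_min_max([0.0, 2.0, -2.0, 6.0]))  # [0.0, 0.25, -0.25, 0.75]
print(zero_preserving_min_max([0.0, 0.0, 4.0]))        # [0.0, 0.0, 1.0]
```

Because only a division is applied, zeros stay zero and a sparse column stays sparse.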

caching

Whether the trainer should cache the input training data.

pruning_metrics

Metric for pruning. For regression, 1 selects L1 and 2 selects L2; the default is L2.
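The two choices measure prediction error differently. The sketch below computes the average absolute (L1) and average squared (L2) error for a set of predictions, matching the `L1(avg)` and `L2(avg)` columns printed in the example further down. The helper name is hypothetical, and it does not show how the pruning step itself uses the metric.

```python
def l1_l2(y_true, y_pred):
    """Average absolute error (L1) and average squared error (L2)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    l1 = sum(abs(e) for e in errors) / len(errors)
    l2 = sum(e * e for e in errors) / len(errors)
    return l1, l2


print(l1_l2([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # (1.0, 1.6666666666666667)
```

L2 penalizes the single error of 2 four times as much as the error of 1, so it is more sensitive to large mistakes than L1.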

entropy_coefficient

The entropy (regularization) coefficient between 0 and 1.

gain_conf_level

Tree fitting gain confidence requirement; should be in the range [0, 1).

number_of_threads

The number of threads to use.

disk_transpose

Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose.

maximum_bin_count_per_feature

Maximum number of distinct values (bins) per feature.

maximum_tree_output

Upper bound on the absolute value of a single output.

get_derivatives_sample_rate

Sample each query 1 in k times in the GetDerivatives function.

random_state

The seed of the random number generator.

feature_flocks

Whether to group features together (into "flocks") during dataset preparation to speed up training.

enable_pruning

Enable post-training pruning to avoid overfitting (a validation set is required).

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # GamRegressor
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.ensemble import GamRegressor
   from nimbusml.feature_extraction.categorical import OneHotVectorizer

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()
   data = FileDataStream.read_csv(path)
   print(data.head())
   #   age  case education  induced  parity  ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...

   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       GamRegressor(feature=['induced', 'edu'], label='age')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #       Score
   # 0  35.390697
   # 1  35.390697
   # 2  34.603725
   # 3  34.603725
   # 4  32.455437

   # print evaluation metrics
   print(metrics)
   #    L1(avg)    L2(avg)  RMS(avg)  Loss-fn(avg)  R Squared
   # 0  4.082632  24.122006  4.911416     24.122006   0.121805

Remarks

Generalized additive models (referred to throughout as GAMs) are a class of models expressible as a sum of individual functions, each depending on a single feature. nimbusml's GAM learner comes in both binary classification (using logit-boosting) and regression (using least squares) flavors.
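For regression, the additive form means a prediction is an intercept plus one term per feature, ŷ = b + f₁(x₁) + f₂(x₂) + … The sketch below shows that structure with made-up shape functions for two columns of the `infert` data. The values and names are illustrative only and do not come from a trained model.

```python
# Hypothetical shape functions, one per feature (values are made up).
def f_induced(x):
    return 0.5 * x


def f_parity(x):
    return -1.0 if x > 3 else 0.25


shape_functions = {"induced": f_induced, "parity": f_parity}
intercept = 30.0


def gam_predict(row):
    """Regression GAM: intercept plus the sum of independent per-feature terms."""
    return intercept + sum(f(row[name]) for name, f in shape_functions.items())


print(gam_predict({"induced": 2, "parity": 6}))  # 30.0  (30 + 1.0 - 1.0)
print(gam_predict({"induced": 0, "parity": 1}))  # 30.25 (30 + 0.0 + 0.25)
```

Because the terms never interact, the effect of changing `induced` is the same whatever the value of `parity`. That is what makes each shape function readable on its own.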

In contrast to many formal definitions of GAM, this implementation learns stepwise functions, which departs from the usual intention that a GAM's components be smooth functions. In particular, the learner first discretizes features, and the learned "step" functions change value only at the discretization boundaries.
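A shape function of this kind is a lookup from bin to value. The sketch below builds one from sorted bin boundaries with Python's `bisect` module. It assumes that a value equal to a boundary falls into the upper bin, which is an illustrative choice rather than documented nimbusml behavior.

```python
from bisect import bisect_right


def step_function(boundaries, values):
    """Piecewise-constant function: sorted boundaries split the real line into
    len(boundaries) + 1 bins, and bin k outputs values[k]."""
    assert len(values) == len(boundaries) + 1

    def f(x):
        # bisect_right sends x == boundary to the upper bin (assumption)
        return values[bisect_right(boundaries, x)]

    return f


age_shape = step_function([20.0, 30.0, 40.0], [-1.5, 0.0, 0.8, 2.0])
print([age_shape(x) for x in (18, 20, 25, 35, 41)])  # [-1.5, 0.0, 0.0, 0.8, 2.0]
```

Between boundaries the output is flat, so the function can only "step" where the discretization put a boundary.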

This implementation is based on the paper Intelligible Models for Classification and Regression (see Reference), but diverges from it in several important respects. Most significantly, in each round of boosting, rather than updating one feature at a time, it updates all features simultaneously, and in each round it chooses only one split point of each feature to change.
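The round structure can be sketched for least-squares regression on pre-binned features. In the sketch, every feature looks at the same residuals, picks its single best split point, and shifts its step function on either side of that split. This is a hypothetical toy under those assumptions. The names `predict`, `best_split` and `boosting_round` are illustrative, and it is not nimbusml's or ML.NET's internal code.

```python
def predict(X_bins, shapes, i):
    """Prediction for example i: the sum of every feature's step-function output."""
    return sum(shape[X_bins[j][i]] for j, shape in enumerate(shapes))


def best_split(bins, residuals, n_bins, min_leaf):
    """Best least-squares split 'bin <= t' vs 'bin > t'.
    Returns (t, left_mean, right_mean), or None if no split is allowed."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for b, r in zip(bins, residuals):
        sums[b] += r
        counts[b] += 1
    total_sum, total_count = sum(sums), sum(counts)
    best, best_gain = None, float("-inf")
    left_sum, left_count = 0.0, 0
    for t in range(n_bins - 1):
        left_sum += sums[t]
        left_count += counts[t]
        right_sum, right_count = total_sum - left_sum, total_count - left_count
        if left_count < min_leaf or right_count < min_leaf:
            continue
        # Gain of fitting one constant (the mean residual) on each side.
        gain = left_sum ** 2 / left_count + right_sum ** 2 / right_count
        if gain > best_gain:
            best_gain = gain
            best = (t, left_sum / left_count, right_sum / right_count)
    return best


def boosting_round(X_bins, y, shapes, learning_rate, min_leaf):
    """One round: all features are updated from the same residuals,
    each changing its step function at a single split point."""
    residuals = [y[i] - predict(X_bins, shapes, i) for i in range(len(y))]
    for j, shape in enumerate(shapes):
        split = best_split(X_bins[j], residuals, len(shape), min_leaf)
        if split is None:
            continue
        t, left_mean, right_mean = split
        for b in range(len(shape)):
            shape[b] += learning_rate * (left_mean if b <= t else right_mean)


# Two pre-binned features for six examples; y depends only on feature 0.
X_bins = [[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]]
y = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
shapes = [[0.0] * 3, [0.0] * 2]  # one step function per feature
for _ in range(200):
    boosting_round(X_bins, y, shapes, learning_rate=0.1, min_leaf=1)
print([round(predict(X_bins, shapes, i), 3) for i in range(len(y))])
```

Because every feature moves at once on the same residuals, the combined update can overshoot. A small learning rate keeps this in check. It is the simplification that lets a round touch all features instead of one.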

In its current form, the GAM learner has the following advantages and disadvantages: on the one hand, it offers ready interpretability combined with expressive power; on the other, it is currently slow. We would recommend using it when the key criterion is interpretability.

Let's talk a bit more about interpretability. The next most interpretable model, we might say, is a linear model. But really, let's say that you have a feature with a coefficient of 3.9293, or something. What do you know? You know that generally, perhaps, larger values for that feature are "better." Great. But is 4 better than 3? Is 5 better than 4? To what degree? Are there "shapes" in the distribution hidden by the reduction of a complex quantity to a single value? These are questions a linear model fundamentally cannot answer, but a GAM might.
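A toy dataset makes the point. Below, the target is U-shaped in the feature. A least-squares slope comes out as exactly zero, suggesting "no effect," while per-value mean targets reveal the shape. The per-value means are only a crude stand-in for a GAM shape function, not how nimbusml fits one.

```python
xs = [1, 2, 3, 4, 5]
ys = [4.0, 1.0, 0.0, 1.0, 4.0]  # U-shaped: y = (x - 3) ** 2


def ols_slope(xs, ys):
    """Least-squares slope of a one-feature linear model."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def per_value_shape(xs, ys):
    """Mean target per distinct feature value, centred on the overall mean."""
    my = sum(ys) / len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return {x: sum(v) / len(v) - my for x, v in sorted(groups.items())}


print(ols_slope(xs, ys))        # 0.0
print(per_value_shape(xs, ys))  # {1: 2.0, 2: -1.0, 3: -2.0, 4: -1.0, 5: 2.0}
```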

Reference

Generalized additive models
Intelligible Models for Classification and Regression

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False