GamRegressor Class
Generalized Additive Models
- Inheritance
  - nimbusml.internal.core.ensemble._gamregressor.GamRegressor
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.RegressorMixin
Constructor
GamRegressor(number_of_iterations=9500, minimum_example_count_per_leaf=10, learning_rate=0.002, normalize='Auto', caching='Auto', pruning_metrics=2, entropy_coefficient=0.0, gain_conf_level=0, number_of_threads=None, disk_transpose=None, maximum_bin_count_per_feature=255, maximum_tree_output=inf, get_derivatives_sample_rate=1, random_state=123, feature_flocks=True, enable_pruning=True, feature=None, label=None, weight=None, **params)
Parameters
- feature
see Columns.
- label
see Columns.
- weight
see Columns.
- number_of_iterations
Total number of iterations over all features.
- minimum_example_count_per_leaf
Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of the regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided.
- learning_rate
Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.
- normalize
Specifies the type of automatic normalization used:
  - "Auto": if normalization is needed, it is performed automatically. This is the default choice.
  - "No": no normalization is performed.
  - "Yes": normalization is performed.
  - "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.
Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0, 0 <= b <= 1, and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
- caching
Whether trainer should cache input training data.
- pruning_metrics
Metric for pruning (for regression, 1: L1, 2: L2; default: L2).
- entropy_coefficient
The entropy (regularization) coefficient between 0 and 1.
- gain_conf_level
Tree fitting gain confidence requirement (should be in the range [0,1) ).
- number_of_threads
The number of threads to use.
- disk_transpose
Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose.
- maximum_bin_count_per_feature
Maximum number of distinct values (bins) per feature.
- maximum_tree_output
Upper bound on absolute value of single output.
- get_derivatives_sample_rate
Sample each query 1 in k times in the GetDerivatives function.
- random_state
The seed of the random number generator.
- feature_flocks
Whether to collectivize features during dataset preparation to speed up training.
- enable_pruning
Enable post-training pruning to avoid overfitting (a validation set is required).
- params
Additional arguments sent to compute engine.
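As a rough illustration of the MaxMin-style normalization described under `normalize` above, here is a minimal sketch (not nimbusml's implementation): dividing each feature column by its maximum absolute value keeps results in [-1, 1] and preserves sparsity, because zero maps to zero.

```python
import numpy as np

# Sketch only: scale a feature column by its maximum absolute value so
# results land in [-1, 1]; zeros map to zero, preserving sparsity.
def maxmin_normalize(column):
    scale = np.max(np.abs(column))
    return column / scale if scale > 0 else column

x = np.array([0.0, 2.0, -4.0, 8.0])
print(maxmin_normalize(x))  # [ 0.    0.25 -0.5   1.  ]
```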
Examples
###############################################################################
# GamRegressor
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import GamRegressor
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
GamRegressor(feature=['induced', 'edu'], label='age')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# Score
# 0 35.390697
# 1 35.390697
# 2 34.603725
# 3 34.603725
# 4 32.455437
# print evaluation metrics
print(metrics)
# L1(avg) L2(avg) RMS(avg) Loss-fn(avg) R Squared
# 0 4.082632 24.122006 4.911416 24.122006 0.121805
Remarks
Generalized additive models (referred to throughout as GAM) are a class of models expressible as an independent sum of individual functions. nimbusml's GAM learner comes in both binary classification (using logit-boosting) and regression (using least squares) flavors.
In contrast to many formal definitions of GAM, this implementation finds it convenient to represent learning over stepwise functions, which departs from the intention that GAM's components be smooth functions. In particular, the learner first discretizes features, and the learned "step" functions step between the discretization boundaries.
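To make that concrete, here is a minimal sketch of evaluating one per-feature step function over discretization boundaries. The boundary and bin values below are made up for illustration; they are not nimbusml internals.

```python
import numpy as np

# Illustrative only: a shape function that is constant within each bin.
bin_upper_bounds = np.array([10.0, 20.0, 30.0])  # discretization boundaries
bin_values = np.array([-1.5, 0.2, 0.8, 1.1])     # one output per bin

def step_function(x):
    # searchsorted finds which bin each value falls into
    return bin_values[np.searchsorted(bin_upper_bounds, x)]

print(step_function(np.array([5.0, 25.0, 100.0])))  # [-1.5  0.8  1.1]
```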
This implementation is based on the paper cited in the Reference section, but diverges from it in several important respects: most significantly, rather than processing one feature at a time in each round of boosting, it makes a pass over all features simultaneously. In each round, it chooses only one split point per feature to change.
In its current form, the GAM learner has the following trade-offs: it offers ready interpretability combined with expressive power, but it is currently slow. We recommend it when interpretability is the key criterion.
Let's talk a bit more about interpretability. The next most interpretable model, we might say, is a linear model. But suppose a feature has a coefficient of 3.9293. What do you know? You know that, generally, larger values for that feature are "better." But is 4 better than 3? Is 5 better than 4? To what degree? Are there "shapes" in the distributions hidden by the reduction of a complex quantity to a single value? These are questions a linear model fundamentally cannot answer, but a GAM model might.
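The contrast can be sketched in plain Python. A GAM score is an intercept plus a sum of per-feature shape functions, so each feature's contribution to a prediction can be read off directly. The shape functions below are hand-written stand-ins, not learned values:

```python
# Hypothetical, hand-written shape functions; a real GAM learns these.
shape_functions = {
    'age':     lambda v: 0.5 if v > 30 else -0.2,
    'induced': lambda v: 0.1 * v,
}
intercept = 31.0

def gam_predict(row):
    # Each feature contributes independently; the score is their sum.
    contributions = {f: fn(row[f]) for f, fn in shape_functions.items()}
    return intercept + sum(contributions.values()), contributions

score, parts = gam_predict({'age': 35, 'induced': 2})
print(score, parts)  # 31.7 {'age': 0.5, 'induced': 0.2}
```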
Reference
Generalized additive models, Intelligible Models for Classification and Regression
Methods
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep