PoissonRegressionRegressor Class
Train a Poisson regression model.
- Inheritance
  - nimbusml.internal.core.linear_model._poissonregressionregressor.PoissonRegressionRegressor
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.RegressorMixin
Constructor
PoissonRegressionRegressor(normalize='Auto', caching='Auto', l2_regularization=1.0, l1_regularization=1.0, optimization_tolerance=1e-07, history_size=20, enforce_non_negativity=False, initial_weights_diameter=0.0, maximum_number_of_iterations=2147483647, stochastic_gradient_descent_initilaization_tolerance=0.0, quiet=False, use_threads=True, number_of_threads=None, dense_optimizer=False, feature=None, label=None, weight=None, **params)
Parameters
- feature
see Columns.
- label
see Columns.
- weight
see Columns.
- normalize
Specifies the type of automatic normalization used:
- "Auto": if normalization is needed, it is performed automatically. This is the default choice.
- "No": no normalization is performed.
- "Yes": normalization is performed.
- "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.
Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0, 0 <= b <= 1, and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
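The sparsity-preserving property can be illustrated with a minimal sketch of range scaling (this is not nimbusml's internal implementation; the function name and data are made up):

```python
import numpy as np

def maxmin_normalize(x):
    # Dividing by the data range (max - min) yields values in an
    # interval [a, b] with b - a = 1 and, when min <= 0 <= max,
    # -1 <= a <= 0 and 0 <= b <= 1. Zero maps to zero, so sparse
    # entries stay sparse.
    return x / (x.max() - x.min())

x = np.array([-4.0, 0.0, 0.0, 6.0])
y = maxmin_normalize(x)  # zeros remain zero; the range has width 1
```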
- caching
Whether trainer should cache input training data.
- l2_regularization
L2 regularization weight.
- l1_regularization
L1 regularization weight.
- optimization_tolerance
Tolerance parameter for optimization convergence. Lower values are slower but more accurate.
- history_size
Memory size for L-BFGS. Lower=faster, less accurate.
The technique used for optimization here is L-BFGS, which uses only a
limited amount of memory to compute the next step direction. This
parameter indicates the number of past positions and gradients to store
for the computation of the next step. Must be greater than or equal to
1
.
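The bounded history can be pictured as a fixed-size buffer (an assumed sketch of the idea, not the trainer's internal code):

```python
from collections import deque

# L-BFGS keeps only the last `history_size` pairs of position
# differences s_k and gradient differences y_k; older pairs are
# discarded automatically.
history_size = 20  # matches the default above
history = deque(maxlen=history_size)
for k in range(100):
    s_k, y_k = k, k  # placeholders for the real update vectors
    history.append((s_k, y_k))
# Only the 20 most recent pairs remain, bounding memory use.
```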
- enforce_non_negativity
Enforce non-negative weights. This flag, however, does not put any constraint on the bias term; that is, the bias term can still be a negative number.
- initial_weights_diameter
Sets the initial weights diameter that
specifies the range from which values are drawn for the initial
weights. These weights are initialized randomly from within this range.
For example, if the diameter is specified to be d
, then the weights
are uniformly distributed between -d/2
and d/2
. The default
value is 0
, which specifies that all the weights are set to zero.
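A minimal sketch of the draw described above (the values and RNG here are hypothetical; the trainer's own seeding and draw order are internal details):

```python
import numpy as np

d = 0.5  # a nonzero initial_weights_diameter
rng = np.random.default_rng(0)
# Each weight is drawn uniformly from [-d/2, d/2].
weights = rng.uniform(-d / 2, d / 2, size=4)
# With d = 0 the interval collapses and every weight starts at zero.
zero_weights = rng.uniform(0.0, 0.0, size=4)
```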
- maximum_number_of_iterations
Maximum iterations.
- stochastic_gradient_descent_initilaization_tolerance
Run SGD to initialize LR weights, converging to this tolerance.
- quiet
If set to true, produce no output during training.
- use_threads
Whether or not to use threads. Default is true.
- number_of_threads
Number of threads.
- dense_optimizer
If True, forces densification of the internal optimization vectors. If False, allows the optimizer to use sparse or dense internal states as it finds appropriate. Setting dense_optimizer to True requires the internal optimizer to use a dense internal state, which may help alleviate load on the garbage collector for some varieties of larger problems.
- params
Additional arguments sent to compute engine.
Examples
###############################################################################
# PoissonRegressionRegressor
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import PoissonRegressionRegressor
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
PoissonRegressionRegressor(feature=['parity', 'edu'], label='age')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# Score
# 0 35.158913
# 1 35.191872
# 2 35.158913
# 3 35.172092
# 4 32.845158
# print evaluation metrics
print(metrics)
# L1(avg) L2(avg) RMS(avg) Loss-fn(avg) R Squared
# 0 4.154053 24.429028 4.942573 24.429028 0.110628
Remarks
Poisson regression is a parameterized regression method. It assumes that the log of the conditional mean of the dependent variable follows a linear function of the independent variables. Assuming that the dependent variable follows a Poisson distribution, the parameters of the regressor can be estimated by maximizing the likelihood of the obtained observations.
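The model form described above can be sketched numerically (the parameter values here are made up for illustration):

```python
import numpy as np

# Poisson regression assumes log E[y | x] = w . x + b, so the
# predicted conditional mean is exp(w . x + b).
w = np.array([0.2, -0.1])
b = 1.0
x = np.array([3.0, 2.0])
mean = np.exp(w @ x + b)  # always positive, as a Poisson mean must be

# Per-observation Poisson negative log-likelihood (dropping the
# constant log(y!) term); training minimizes its sum over the data.
y = 4.0
nll = mean - y * np.log(mean)
```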
Methods
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep