PoissonRegressionRegressor Class

Train a Poisson regression model.

Inheritance

PoissonRegressionRegressor inherits from:

  • nimbusml.internal.core.linear_model._poissonregressionregressor.PoissonRegressionRegressor

  • nimbusml.base_predictor.BasePredictor

  • sklearn.base.RegressorMixin

Constructor

PoissonRegressionRegressor(normalize='Auto', caching='Auto', l2_regularization=1.0, l1_regularization=1.0, optimization_tolerance=1e-07, history_size=20, enforce_non_negativity=False, initial_weights_diameter=0.0, maximum_number_of_iterations=2147483647, stochastic_gradient_descent_initilaization_tolerance=0.0, quiet=False, use_threads=True, number_of_threads=None, dense_optimizer=False, feature=None, label=None, weight=None, **params)

Parameters

feature

see Columns.

label

see Columns.

weight

see Columns.

normalize

Specifies the type of automatic normalization used:

  • "Auto": if normalization is needed, it is performed automatically. This is the default choice.

  • "No": no normalization is performed.

  • "Yes": normalization is performed.

  • "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.

Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0 and 0 <= b <= 1 and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
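The constraints above (interval width b - a = 1 and zero mapped to zero) are satisfied by a pure scaling with no shift. A minimal sketch of that behavior (illustrative only, not the nimbusml implementation):

```python
import numpy as np

def minmax_preserve_sparsity(x):
    """Scale values so the data range has width 1 and zero maps to zero.

    Sketch of the behavior described above: x' = x / (max - min) is a
    pure scaling with no shift, so sparse zeros remain zero, and the
    scaled values span an interval [a, b] with b - a = 1.
    """
    spread = x.max() - x.min()
    return x / spread if spread != 0 else x

x = np.array([0.0, 2.0, -2.0, 4.0])
scaled = minmax_preserve_sparsity(x)
# scaled spans an interval of width 1, and the zero entry stays zero
```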

caching

Whether the trainer should cache the input training data.

l2_regularization

L2 regularization weight.

l1_regularization

L1 regularization weight.

optimization_tolerance

Tolerance parameter for optimization convergence. Lower values yield slower but more accurate training.

history_size

Memory size for L-BFGS. Lower=faster, less accurate. The technique used for optimization here is L-BFGS, which uses only a limited amount of memory to compute the next step direction. This parameter indicates the number of past positions and gradients to store for the computation of the next step. Must be greater than or equal to 1.
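The same memory/accuracy trade-off can be seen in other L-BFGS implementations. As an illustrative analogy (SciPy, not nimbusml internals), SciPy's L-BFGS-B exposes the history size as the `maxcor` option:

```python
import numpy as np
from scipy.optimize import minimize

# Simple smooth objective with known minimum at w = 3.
def f(w):
    return np.sum((w - 3.0) ** 2)

# `maxcor` is SciPy's count of stored past corrections, analogous
# to history_size here: more history can improve step quality at
# the cost of memory.
res = minimize(f, x0=np.zeros(4), method='L-BFGS-B',
               options={'maxcor': 20})
```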

enforce_non_negativity

Enforce non-negative weights. This flag, however, does not put any constraint on the bias term; that is, the bias term can still be a negative number.

initial_weights_diameter

Sets the initial weights diameter that specifies the range from which values are drawn for the initial weights. These weights are initialized randomly from within this range. For example, if the diameter is specified to be d, then the weights are uniformly distributed between -d/2 and d/2. The default value is 0, which specifies that all the weights are set to zero.

maximum_number_of_iterations

Maximum iterations.

stochastic_gradient_descent_initilaization_tolerance

Run stochastic gradient descent (SGD) to initialize the regression weights, converging to this tolerance.

quiet

If set to true, produce no output during training.

use_threads

Whether or not to use threads. Default is true.

number_of_threads

Number of threads.

dense_optimizer

If True, forces densification of the internal optimization vectors. If False, allows the optimizer to use sparse or dense internal states as it finds appropriate. Setting dense_optimizer to True requires the internal optimizer to use a dense internal state, which may help alleviate load on the garbage collector for some varieties of larger problems.

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # PoissonRegressionRegressor
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.feature_extraction.categorical import OneHotVectorizer
   from nimbusml.linear_model import PoissonRegressionRegressor

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path)
   print(data.head())
   #    age  case education  induced  parity ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...

   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       PoissonRegressionRegressor(feature=['parity', 'edu'], label='age')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #       Score
   # 0  35.158913
   # 1  35.191872
   # 2  35.158913
   # 3  35.172092
   # 4  32.845158

   # print evaluation metrics
   print(metrics)
   #    L1(avg)    L2(avg)  RMS(avg)  Loss-fn(avg)  R Squared
   # 0  4.154053  24.429028  4.942573     24.429028   0.110628

Remarks

Poisson regression is a parameterized regression method. It assumes that the log of the conditional mean of the dependent variable follows a linear function of the independent variables. Assuming that the dependent variable follows a Poisson distribution, the parameters of the regressor can be estimated by maximizing the likelihood of the obtained observations.
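In other words, the model predicts the conditional mean through an exponential (log-link) of a linear score. A minimal sketch of the model form (the weights, bias, and feature values below are hypothetical, not learned by nimbusml):

```python
import numpy as np

# Poisson regression model form: E[y | x] = exp(w . x + b),
# so log E[y | x] is linear in the features x.
w = np.array([0.3, -0.1])   # hypothetical learned weights
b = 0.5                     # hypothetical learned bias
x = np.array([1.0, 2.0])    # a single feature vector

predicted_mean = np.exp(w @ x + b)  # always positive, as a count mean must be
```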

Reference

Poisson regression

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False