SsaSpikeDetector Class
This transform detects spikes in a seasonal time series using Singular Spectrum Analysis (SSA).
Inheritance
- nimbusml.internal.core.timeseries._ssaspikedetector.SsaSpikeDetector
- nimbusml.base_transform.BaseTransform
- sklearn.base.TransformerMixin
Constructor
SsaSpikeDetector(training_window_size=100, confidence=99.0, seasonal_window_size=10, side='TwoSided', pvalue_history_length=100, error_function='SignedDifference', columns=None, **params)
Parameters
- columns
see Columns.
- training_window_size
The number of points, N, from the beginning of the sequence used to train the SSA model.
- confidence
The confidence for spike detection in the range [0, 100].
- seasonal_window_size
An upper bound, L, on the largest relevant seasonality in the input time-series, which also determines the order of the autoregression of SSA. It must satisfy 2 < L < N/2.
- side
The argument that determines whether to detect positive or negative anomalies, or both. Available options are {Positive, Negative, TwoSided}.
- pvalue_history_length
The size of the sliding window for computing the p-value.
- error_function
The function used to compute the error between the expected and the observed value. Possible values are {SignedDifference, AbsoluteDifference, SignedProportion, AbsoluteProportion, SquaredDifference}.
- params
Additional arguments sent to the compute engine.
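The constraints stated in the parameter descriptions above can be checked before constructing the transform. The following is a minimal sketch, not part of nimbusml: the helper name `check_spike_detector_args` and its return format are hypothetical, and it encodes only the rules listed on this page.

```python
# Hypothetical helper (not a nimbusml API): checks the documented constraints.
VALID_SIDES = {"Positive", "Negative", "TwoSided"}
VALID_ERROR_FUNCTIONS = {
    "SignedDifference", "AbsoluteDifference",
    "SignedProportion", "AbsoluteProportion",
    "SquaredDifference",
}


def check_spike_detector_args(training_window_size=100, confidence=99.0,
                              seasonal_window_size=10, side="TwoSided",
                              error_function="SignedDifference"):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    n, l = training_window_size, seasonal_window_size
    # Documented rule: 2 < L < N/2
    if not (2 < l < n / 2):
        problems.append(
            "seasonal_window_size must satisfy 2 < L < N/2 "
            "(got L=%s, N=%s)" % (l, n))
    # Documented rule: confidence lies in [0, 100]
    if not (0 <= confidence <= 100):
        problems.append("confidence must be in [0, 100]")
    if side not in VALID_SIDES:
        problems.append("side must be one of %s" % sorted(VALID_SIDES))
    if error_function not in VALID_ERROR_FUNCTIONS:
        problems.append(
            "error_function must be one of %s" % sorted(VALID_ERROR_FUNCTIONS))
    return problems


# N=15 and L=6 pass because 2 < 6 < 7.5; L=8 would not.
print(check_spike_detector_args(training_window_size=15,
                                seasonal_window_size=6,
                                confidence=95))
```

Returning a list rather than raising on the first failure lets a caller report every violated constraint at once.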
Examples
###############################################################################
# SsaSpikeDetector
import numpy as np
import pandas as pd
from nimbusml.timeseries import SsaSpikeDetector
# This example creates a time series (list of data with the
# i-th element corresponding to the i-th time slot).
# The estimator is applied to identify spiking points in the series.
# This estimator can account for temporal seasonality in the data.
# Generate sample series data with a recurring
# pattern and a spike within the pattern
seasonality_size = 5
seasonal_data = np.arange(seasonality_size)
data = np.tile(seasonal_data, 3)
data = np.append(data, [100]) # add a spike
data = np.append(data, seasonal_data)
X_train = pd.Series(data, name="ts")
# X_train looks like this
# 0 0
# 1 1
# 2 2
# 3 3
# 4 4
# 5 0
# 6 1
# 7 2
# 8 3
# 9 4
# 10 0
# 11 1
# 12 2
# 13 3
# 14 4
# 15 100
# 16 0
# 17 1
# 18 2
# 19 3
# 20 4
training_seasons = 3
training_size = seasonality_size * training_seasons
ssd = SsaSpikeDetector(confidence=95,
                       pvalue_history_length=8,
                       training_window_size=training_size,
                       seasonal_window_size=seasonality_size + 1) << {'result': 'ts'}
ssd.fit(X_train, verbose=1)
data = ssd.transform(X_train)
print(data)
# ts result.Alert result.Raw Score result.P-Value Score
# 0 0 0.0 -2.531824 5.000000e-01
# 1 1 0.0 -0.008832 5.818072e-03
# 2 2 0.0 0.763040 1.374071e-01
# 3 3 0.0 0.693811 2.797713e-01
# 4 4 0.0 1.442079 1.838294e-01
# 5 0 0.0 -1.844414 1.707238e-01
# 6 1 0.0 0.219578 4.364025e-01
# 7 2 0.0 0.201708 4.505472e-01
# 8 3 0.0 0.157089 4.684456e-01
# 9 4 0.0 1.329494 1.773046e-01
# 10 0 0.0 -1.792391 7.353794e-02
# 11 1 0.0 0.161634 4.999295e-01
# 12 2 0.0 0.092626 4.953789e-01
# 13 3 0.0 0.084648 4.514174e-01
# 14 4 0.0 1.305554 1.202619e-01
# 15 100 1.0 98.207609 1.000000e-08 <-- alert is on, predicted spike
# 16 0 0.0 -13.831450 2.912225e-01
# 17 1 0.0 -1.741884 4.379857e-01
# 18 2 0.0 -0.465426 4.557261e-01
# 19 3 0.0 -16.497133 2.926521e-01
# 20 4 0.0 -29.817375 2.060473e-01
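Once the transform has run, the flagged points can be pulled out of the result with ordinary pandas filtering. The sketch below does not call nimbusml; it rebuilds three rows of the printed output by hand and assumes the column names are exactly those shown in the printed header (`result.Alert`, `result.Raw Score`, `result.P-Value Score`).

```python
import pandas as pd

# Three rows copied from the printed output above (indices 14 to 16).
scored = pd.DataFrame(
    {
        "ts": [4, 100, 0],
        "result.Alert": [0.0, 1.0, 0.0],
        "result.Raw Score": [1.305554, 98.207609, -13.831450],
        "result.P-Value Score": [1.202619e-01, 1.000000e-08, 2.912225e-01],
    },
    index=[14, 15, 16],
)

# Keep only the rows where the detector raised an alert.
spikes = scored[scored["result.Alert"] == 1.0]
spike_positions = spikes.index.tolist()
print(spike_positions)  # [15]
```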
Remarks
Singular Spectrum Analysis (SSA) is a powerful framework for decomposing a time series into trend, seasonality, and noise components, as well as for forecasting its future values. To remove the effect of these components on anomaly detection, this transform adds SSA as a time-series modeling component in the detection pipeline.
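To give a feel for the decomposition step, here is a minimal sketch of textbook basic SSA in NumPy: embed the series in a trajectory matrix, truncate its SVD, and average the anti-diagonals back into a series. It is an illustration only and is not the implementation used by this transform; the function name `ssa_reconstruct` and its `rank` argument are hypothetical.

```python
import numpy as np


def ssa_reconstruct(series, window, rank):
    """Basic SSA: rank-limited reconstruction of a 1-D series (illustrative)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: entry (i, j) holds x[i + j].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # Keep only the leading `rank` singular components.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Diagonal averaging: every entry (i, j) contributes to position i + j.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i]
        counts[i:i + k] += 1
    return recon / counts


# Same seasonal pattern as the example above; a low rank keeps the dominant structure.
seasonal = np.tile(np.arange(5), 3)
smooth = ssa_reconstruct(seasonal, window=6, rank=2)
```

Keeping all singular components reproduces the input exactly, so the choice of `rank` controls how much of the series is treated as structure rather than noise.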
The SSA component is trained to predict the next expected value of the time series under normal conditions; this expected value is then used to calculate how far the observed value deviates from the normal (predicted) behavior at that timestamp. The distribution of this deviation is then modeled using adaptive kernel density estimation.
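The deviation between the observed and the expected value depends on the chosen `error_function`. The sketch below shows one plausible reading of the five option names; the exact formulas, and in particular the handling of a zero expected value in the proportional variants, are assumptions rather than documented behavior. The function name `deviation` is hypothetical.

```python
def deviation(observed, expected, error_function="SignedDifference"):
    """Illustrative error functions; formulas are assumed from the option names."""
    diff = observed - expected
    if error_function == "SignedDifference":
        return diff
    if error_function == "AbsoluteDifference":
        return abs(diff)
    if error_function == "SquaredDifference":
        return diff * diff
    if error_function in ("SignedProportion", "AbsoluteProportion"):
        # Assumption: the error is relative to the expected value,
        # and a zero expected value yields 0.0 instead of dividing by zero.
        if expected == 0:
            return 0.0
        ratio = diff / expected
        return ratio if error_function == "SignedProportion" else abs(ratio)
    raise ValueError("unknown error_function: %r" % (error_function,))


print(deviation(100, 2))                      # 98
print(deviation(3, 2, "SignedProportion"))    # 0.5
```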
The p-value score for the current deviation is calculated based on the estimated distribution. The lower its value, the more likely the current point is an outlier.
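As a rough illustration of how a p-value can be derived from a window of recent deviations, the sketch below uses a plain Gaussian kernel density estimate with a fixed bandwidth, which is a simplification of the adaptive estimator mentioned above. The way `side` selects a tail and the alert rule `p < 1 - confidence / 100` are assumptions, and the names `kde_cdf`, `p_value`, and `is_spike` are hypothetical.

```python
import math


def kde_cdf(x, history, bandwidth):
    """CDF at x of a Gaussian KDE built on `history` (fixed bandwidth)."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - h) / scale)) for h in history) / len(history)


def p_value(score, history, bandwidth=1.0, side="TwoSided"):
    cdf = kde_cdf(score, history, bandwidth)
    if side == "Positive":
        return 1.0 - cdf        # only unusually high scores are surprising
    if side == "Negative":
        return cdf              # only unusually low scores are surprising
    return min(cdf, 1.0 - cdf)  # assumed two-sided rule: the smaller tail


def is_spike(score, history, confidence=95.0, bandwidth=1.0, side="TwoSided"):
    # Assumed alert rule: flag when the p-value drops below 1 - confidence/100.
    return p_value(score, history, bandwidth, side) < 1.0 - confidence / 100.0


# A window of eight recent deviations, loosely modeled on the raw scores above.
recent = [-1.8, 0.2, 0.2, 0.16, 1.3, -1.8, 0.16, 0.09]
print(is_spike(98.2, recent))  # True
print(is_spike(0.1, recent))   # False
```

The window of eight values mirrors the role of `pvalue_history_length`: a longer history gives a smoother density estimate but reacts more slowly to changes in the deviation distribution.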
Methods
get_params | Get the parameters for this operator.
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep