Finds the most significant periods in a time series.
A metric that measures an application’s traffic is very often characterized by two significant periods: a weekly one and a daily one. Given such a time series,
series_periods_detect() detects these two dominant periods.
The function takes as input a column containing a dynamic array of time series (typically the output of the make-series operator), two
real numbers defining the minimal and maximal period size to search for (measured in number of bins; for example, with a 1h bin the size of a daily period is 24), and a
long number defining the total number of periods for the function to search for. The function outputs 2 columns:
- periods: a dynamic array containing the periods that have been found (in units of the bin size), ordered by their scores
- scores: a dynamic array containing values between 0 and 1, each measuring the significance of the period at the corresponding position in the periods array
The function takes the following arguments:
- x: Dynamic array scalar expression that is an array of numeric values, typically the output of the make-series or make_list operators.
- min_period: A real number specifying the minimal period to search for.
- max_period: A real number specifying the maximal period to search for.
- num_periods: A long number specifying the maximum required number of periods. This will be the length of the output dynamic arrays.
The algorithm can detect periods containing at least 4 points and at most half of the series length.
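The two limits above can be restated as a small Python sketch. The helper names `detectable_period_range` and `is_detectable` are hypothetical and are not part of Kusto; the code only encodes the stated bounds (at least 4 points, at most half of the series length).

```python
from typing import Tuple

MIN_POINTS_PER_PERIOD = 4  # lower bound stated in the text


def detectable_period_range(series_length: int) -> Tuple[int, float]:
    """Return the (shortest, longest) detectable period, in bins."""
    longest = series_length / 2  # at most half of the series length
    if longest < MIN_POINTS_PER_PERIOD:
        raise ValueError("series is too short to contain a detectable period")
    return MIN_POINTS_PER_PERIOD, longest


def is_detectable(period: float, series_length: int) -> bool:
    """Check whether a period (in bins) falls inside the detectable range."""
    shortest, longest = detectable_period_range(series_length)
    return shortest <= period <= longest


print(detectable_period_range(60))  # (4, 30.0)
print(is_detectable(14, 60))        # True
print(is_detectable(2, 60))         # False
```

For a series of 60 points, this gives a detectable range of 4 to 30 bins.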
You should set min_period a little below and max_period a little above the periods you expect to find in the time series. For example, if you have an hourly-aggregated signal and you look for both daily and weekly periods (24 and 168 bins respectively), you can set min_period=0.8*24 and max_period=1.2*168, leaving 20% margins around these periods.
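The margin arithmetic from this recommendation can be sketched in Python as follows. The function name `search_bounds` and its `margin` parameter are assumptions made for illustration; they are not part of Kusto.

```python
def search_bounds(expected_periods, margin=0.2):
    """Return (min_period, max_period) in bins, widened by a relative margin.

    expected_periods: period lengths (in bins) you expect to find.
    margin: relative slack on each side, 0.2 meaning 20%.
    """
    if not expected_periods:
        raise ValueError("at least one expected period is required")
    if not 0 <= margin < 1:
        raise ValueError("margin must be in the range [0, 1)")
    min_period = (1 - margin) * min(expected_periods)
    max_period = (1 + margin) * max(expected_periods)
    return min_period, max_period


# Hourly bins: a daily period is 24 bins, a weekly period is 168 bins.
low, high = search_bounds([24, 168])
print(round(low, 1), round(high, 1))  # 19.2 201.6
```

The resulting values would then be passed as min_period and max_period to series_periods_detect().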
The input time series must be regular, i.e. aggregated in constant bins (which is always the case if it has been created using make-series). Otherwise, the output is meaningless.
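If the series was not produced by make-series, its regularity can be checked before it is sent to the function. The following Python sketch shows one possible check, assuming the bin start times are available as a list of timestamps; the helper name `is_regular` is hypothetical.

```python
from datetime import datetime, timedelta


def is_regular(timestamps):
    """Return True if consecutive timestamps are separated by one constant, positive step."""
    if len(timestamps) < 2:
        return True
    step = timestamps[1] - timestamps[0]
    if step <= timedelta(0):
        return False
    return all(b - a == step for a, b in zip(timestamps, timestamps[1:]))


start = datetime(2024, 1, 1)
regular = [start + i * timedelta(hours=12) for i in range(6)]
with_gap = regular[:3] + regular[4:]  # one bin is missing

print(is_regular(regular))   # True
print(is_regular(with_gap))  # False
```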
The following query embeds a snapshot of a month of an application’s traffic, aggregated twice a day (i.e. the bin size is 12 hours).
print y=dynamic([80,139,87,110,68,54,50,51,53,133,86,141,97,156,94,149,95,140,77,61,50,54,47,133,72,152,94,148,105,162,101,160,87,63,53,55,54,151,103,189,108,183,113,175,113,178,90,71,62,62,65,165,109,181,115,182,121,178,114,170]) | project x=range(1, array_length(y), 1), y | render linechart
Running series_periods_detect() on this series detects the weekly period (14 points long):
print y=dynamic([80,139,87,110,68,54,50,51,53,133,86,141,97,156,94,149,95,140,77,61,50,54,47,133,72,152,94,148,105,162,101,160,87,63,53,55,54,151,103,189,108,183,113,175,113,178,90,71,62,62,65,165,109,181,115,182,121,178,114,170]) | project x=range(1, array_length(y), 1), y | project series_periods_detect(y, 0.0, 50.0, 2)
| periods | scores |
|---|---|
| [14.0, 0.0] | [0.84, 0.0] |
Note that the daily period, which can also be seen in the chart, was not found. The sampling is too coarse (12h bin size), so a daily period spans only 2 bins, which is below the minimum period size of 4 points required by the algorithm.
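The bin arithmetic behind this note can be sketched in Python. The helper names `period_in_bins` and `max_bin_size` are hypothetical; the only rule taken from the text is the 4-point minimum.

```python
from datetime import timedelta

MIN_POINTS_PER_PERIOD = 4  # minimum period size required by the algorithm


def period_in_bins(period, bin_size):
    """Length of a period expressed in bins."""
    return period / bin_size  # timedelta / timedelta gives a float


def max_bin_size(period):
    """Coarsest bin size that still leaves 4 points per period."""
    return period / MIN_POINTS_PER_PERIOD  # timedelta / int gives a timedelta


day = timedelta(days=1)
week = timedelta(weeks=1)
bin_12h = timedelta(hours=12)

print(period_in_bins(day, bin_12h))   # 2.0
print(period_in_bins(week, bin_12h))  # 14.0
print(max_bin_size(day))              # 6:00:00
```

Under these assumptions, a daily period needs a bin size of 6 hours or less to reach the 4-point minimum.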