make-series operator
Create series of specified aggregated values along a specified axis.
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by fruit, supplier
Syntax
T | make-series [MakeSeriesParameters]
[Column =] Aggregation [default = DefaultValue] [, ...]
on AxisColumn [from start] [to end] step step
[by
[Column =] GroupExpression [, ...]]
Arguments
Column: Optional name for a result column. Defaults to a name derived from the expression.
DefaultValue: Default value that will be used instead of absent values. If there is no row with specific values of AxisColumn and GroupExpression, then in the results the corresponding element of the array will be assigned a DefaultValue. If DefaultValue is omitted, then 0 is assumed.
Aggregation: A call to an aggregation function such as
count()oravg(), with column names as arguments. See the list of aggregation functions. Only aggregation functions that return numeric results can be used with themake-seriesoperator.AxisColumn: A column on which the series will be ordered, usually of type
datetimeortimespan, but all numeric types are also accepted.start: (optional) The low bound value of the AxisColumn for each of the series to be built. start, end, and step are used to build an array of AxisColumn values within a given range and using specified step. All Aggregation values are ordered respectively to this array. This AxisColumn array is also the last output column in the output that has the same name as AxisColumn. If a start value is not specified, the start is the first bin (step) which has data in each series.
end: (optional) The high bound (non-inclusive) value of the AxisColumn. The last index of the time series is smaller than this value (and will be start plus integer multiple of step that is smaller than end). If end value is not provided, it will be the upper bound of the last bin (step) which has data per each series.
step: The difference between two consecutive elements of the AxisColumn array (that is, the bin size). For a list of possible time intervals, see timespan.
GroupExpression: An expression over the columns that provides a set of distinct values. Typically it's a column name that already provides a restricted set of values.
MakeSeriesParameters: Zero or more (space-separated) parameters in the form of Name
=Value that control the behavior. The following parameters are supported:Name Description kindProduces default result when the input of make-series operator is empty. Value: nonemptyhint.shufflekey=<key>The shufflekeyquery shares the query load on cluster nodes, using a key to partition data. See shuffle queryhint.strategy=shuffleThe shufflestrategy query shares the query load on cluster nodes, where each node will process one partition of the data. See shuffle query
Note
The arrays generated by make-series are limited to 1048576 values (2^20). Trying to generate a larger array with make-series would result in either an error or a truncated array.
Alternate Syntax
T | make-series
[Column =] Aggregation [default = DefaultValue] [, ...]
on AxisColumn in range(start, stop, step)
[by
[Column =] GroupExpression [, ...]]
The generated series from the alternate syntax differs from the main syntax in two aspects:
- The stop value is inclusive.
- Binning the index axis is generated with bin() and not bin_at(), which means that start may not be included in the generated series.
It is recommended to use the main syntax of make-series and not the alternate syntax.
Returns
The input rows are arranged into groups having the same values of the by expressions and the bin_at(AxisColumn, step, start) expression. Then the specified aggregation functions are computed over each group, producing a row for each group. The result contains the by columns, AxisColumn column and also at least one column for each computed aggregate. (Aggregations over multiple columns or non-numeric results are not supported.)
This intermediate result has as many rows as there are distinct combinations of by and bin_at(AxisColumn, step, start) values.
Finally the rows from the intermediate result arranged into groups having the same values of the by expressions and all aggregated values are arranged into arrays (values of dynamic type). For each aggregation, there is one column containing its array with the same name. The last column is an array containing the values of AxisColumn binned according to the specified step.
Note
Although you can provide arbitrary expressions for both the aggregation and grouping expressions, it's more efficient to use simple column names.
List of aggregation functions
| Function | Description |
|---|---|
| avg() | Returns an average value across the group |
| avgif() | Returns an average with the predicate of the group |
| count() | Returns a count of the group |
| countif() | Returns a count with the predicate of the group |
| dcount() | Returns an approximate distinct count of the group elements |
| dcountif() | Returns an approximate distinct count with the predicate of the group |
| max() | Returns the maximum value across the group |
| maxif() | Returns the maximum value with the predicate of the group |
| min() | Returns the minimum value across the group |
| minif() | Returns the minimum value with the predicate of the group |
| percentile() | Returns the percentile value across the group |
| take_any() | Returns a random non-empty value for the group |
| stdev() | Returns the standard deviation across the group |
| sum() | Returns the sum of the elements within the group |
| sumif() | Returns the sum of the elements with the predicate of the group |
| variance() | Returns the variance across the group |
List of series analysis functions
| Function | Description |
|---|---|
| series_fir() | Applies Finite Impulse Response filter |
| series_iir() | Applies Infinite Impulse Response filter |
| series_fit_line() | Finds a straight line that is the best approximation of the input |
| series_fit_line_dynamic() | Finds a line that is the best approximation of the input, returning dynamic object |
| series_fit_2lines() | Finds two lines that are the best approximation of the input |
| series_fit_2lines_dynamic() | Finds two lines that are the best approximation of the input, returning dynamic object |
| series_outliers() | Scores anomaly points in a series |
| series_periods_detect() | Finds the most significant periods that exist in a time series |
| series_periods_validate() | Checks whether a time series contains periodic patterns of given lengths |
| series_stats_dynamic() | Return multiple columns with the common statistics (min/max/variance/stdev/average) |
| series_stats() | Generates a dynamic value with the common statistics (min/max/variance/stdev/average) |
For a complete list of series analysis functions see: Series processing functions
List of series interpolation functions
| Function | Description |
|---|---|
| series_fill_backward() | Performs backward fill interpolation of missing values in a series |
| series_fill_const() | Replaces missing values in a series with a specified constant value |
| series_fill_forward() | Performs forward fill interpolation of missing values in a series |
| series_fill_linear() | Performs linear interpolation of missing values in a series |
- Note: Interpolation functions by default assume
nullas a missing value. Therefore specifydefault=double(null) inmake-seriesif you intend to use interpolation functions for the series.
Examples
A table that shows arrays of the numbers and average prices of each fruit from each supplier ordered by the timestamp with specified range. There's a row in the output for each distinct combination of fruit and supplier. The output columns show the fruit, supplier, and arrays of: count, average, and the whole timeline (from 2016-01-01 until 2016-01-10). All arrays are sorted by the respective timestamp and all gaps are filled with default values (0 in this example). All other input columns are ignored.
T | make-series PriceAvg=avg(Price) default=0
on Purchase from datetime(2016-09-10) to datetime(2016-09-13) step 1d by Supplier, Fruit
let data=datatable(timestamp:datetime, metric: real)
[
datetime(2016-12-31T06:00), 50,
datetime(2017-01-01), 4,
datetime(2017-01-02), 3,
datetime(2017-01-03), 4,
datetime(2017-01-03T03:00), 6,
datetime(2017-01-05), 8,
datetime(2017-01-05T13:40), 13,
datetime(2017-01-06), 4,
datetime(2017-01-07), 3,
datetime(2017-01-08), 8,
datetime(2017-01-08T21:00), 8,
datetime(2017-01-09), 2,
datetime(2017-01-09T12:00), 11,
datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| make-series avg(metric) on timestamp from stime to etime step interval
| avg_metric | timestamp |
|---|---|
| [ 4.0, 3.0, 5.0, 0.0, 10.5, 4.0, 3.0, 8.0, 6.5 ] | [ "2017-01-01T00:00:00.0000000Z", "2017-01-02T00:00:00.0000000Z", "2017-01-03T00:00:00.0000000Z", "2017-01-04T00:00:00.0000000Z", "2017-01-05T00:00:00.0000000Z", "2017-01-06T00:00:00.0000000Z", "2017-01-07T00:00:00.0000000Z", "2017-01-08T00:00:00.0000000Z", "2017-01-09T00:00:00.0000000Z" ] |
When the input to make-series is empty, the default behavior of make-series produces an empty result as well.
let data=datatable(timestamp:datetime, metric: real)
[
datetime(2016-12-31T06:00), 50,
datetime(2017-01-01), 4,
datetime(2017-01-02), 3,
datetime(2017-01-03), 4,
datetime(2017-01-03T03:00), 6,
datetime(2017-01-05), 8,
datetime(2017-01-05T13:40), 13,
datetime(2017-01-06), 4,
datetime(2017-01-07), 3,
datetime(2017-01-08), 8,
datetime(2017-01-08T21:00), 8,
datetime(2017-01-09), 2,
datetime(2017-01-09T12:00), 11,
datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| limit 0
| make-series avg(metric) default=1.0 on timestamp from stime to etime step interval
| count
| Count |
|---|
| 0 |
Using kind=nonempty in make-series will produce a non-empty result of the default values:
let data=datatable(timestamp:datetime, metric: real)
[
datetime(2016-12-31T06:00), 50,
datetime(2017-01-01), 4,
datetime(2017-01-02), 3,
datetime(2017-01-03), 4,
datetime(2017-01-03T03:00), 6,
datetime(2017-01-05), 8,
datetime(2017-01-05T13:40), 13,
datetime(2017-01-06), 4,
datetime(2017-01-07), 3,
datetime(2017-01-08), 8,
datetime(2017-01-08T21:00), 8,
datetime(2017-01-09), 2,
datetime(2017-01-09T12:00), 11,
datetime(2017-01-10T05:00), 5,
];
let interval = 1d;
let stime = datetime(2017-01-01);
let etime = datetime(2017-01-10);
data
| limit 0
| make-series kind=nonempty avg(metric) default=1.0 on timestamp from stime to etime step interval
| avg_metric | timestamp |
|---|---|
| [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ] |
[ "2017-01-01T00:00:00.0000000Z", "2017-01-02T00:00:00.0000000Z", "2017-01-03T00:00:00.0000000Z", "2017-01-04T00:00:00.0000000Z", "2017-01-05T00:00:00.0000000Z", "2017-01-06T00:00:00.0000000Z", "2017-01-07T00:00:00.0000000Z", "2017-01-08T00:00:00.0000000Z", "2017-01-09T00:00:00.0000000Z" ] |
Feedback
Submit and view feedback for