series_fit_poly_fl()
The function series_fit_poly_fl() applies polynomial regression to a series. It takes a table containing multiple series (dynamic numerical arrays) and generates the best fit polynomial of the requested degree for each series. It returns both the polynomial coefficients and the fitted polynomial evaluated over the range of the series.
Note
Use the native function series_fit_poly(). The function below is for reference only.
Note
- series_fit_poly_fl() is a UDF (user-defined function). For more information, see usage.
- This function contains inline Python and requires enabling the python() plugin on the cluster.
- For linear regression of an evenly spaced series, as created by the make-series operator, use the native function series_fit_line().
Syntax
T | invoke series_fit_poly_fl(y_series, y_fit_series, fit_coeff, degree, [x_series, x_istime])
Arguments
- y_series: The name of the input table column containing the dependent variable. That is, the series to fit.
- y_fit_series: The name of the column to store the best fit series.
- fit_coeff: The name of the column to store the best fit polynomial coefficients.
- degree: The required order of the polynomial to fit. For example, 1 for linear regression, 2 for quadratic regression, and so on.
- x_series: The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional, and is needed only for unevenly spaced series. The default value is an empty string, as x is redundant for the regression of an evenly spaced series.
- x_istime: A boolean that indicates whether x_series is a vector of datetime values. This parameter is optional, and is needed only if x_series is specified and contains datetime values. The default value is False.
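To make the meaning of degree concrete, the following is a minimal pure-Python sketch of least-squares polynomial regression. It is not the code the function runs: series_fit_poly_fl() delegates to np.polyfit, while this sketch solves the normal equations directly, which is simpler to read but less stable numerically for high degrees. The name polyfit_normal_equations is hypothetical and used only for illustration.

```python
def polyfit_normal_equations(xs, ys, degree):
    """Least-squares polynomial fit (illustrative sketch, not np.polyfit).

    Returns the coefficients ordered from the highest power to the lowest.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    m = degree + 1
    # Build the normal equations A c = b for powers 0..degree (ascending order).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs[::-1]  # highest power first


# Evenly spaced series: when no x_series is given, x defaults to 1..len(y).
ys = [0, 3, 10, 21, 36]                # samples of 2x^2 - 3x + 1
xs = list(range(1, len(ys) + 1))
print(polyfit_normal_equations(xs, ys, 2))
```

With degree set to 1 the same routine reduces to ordinary linear regression, and with 2 to quadratic regression, matching the description of the degree argument.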
Usage
series_fit_poly_fl() is a user-defined tabular function, to be applied using the invoke operator. You can either embed its code in your query (ad hoc usage) or install it in your database (persistent usage).
For ad hoc usage, embed the code using a let statement. No permission is required.
let series_fit_poly_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_coeff:string, degree:int, x_series:string='', x_istime:bool=False)
{
let kwargs = pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_coeff', fit_coeff, 'degree', degree, 'x_series', x_series, 'x_istime', x_istime);
let code=
'\n'
'y_series = kargs["y_series"]\n'
'y_fit_series = kargs["y_fit_series"]\n'
'fit_coeff = kargs["fit_coeff"]\n'
'degree = kargs["degree"]\n'
'x_series = kargs["x_series"]\n'
'x_istime = kargs["x_istime"]\n'
'\n'
'def fit(ts_row, x_col, y_col, deg):\n'
'    y = ts_row[y_col]\n'
'    if x_col == "": # If there is no x column, create the sequential range [1, len(y)]\n'
'        x = np.arange(len(y)) + 1\n'
'    else: # If an x column exists, check whether it is a time column. If so, normalize it to the [1, len(y)] range, else take it as is.\n'
'        if x_istime:\n'
'            x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))\n'
'            x = x - x.min()\n'
'            x = x / x.max()\n'
'            x = x * (len(x) - 1) + 1\n'
'        else:\n'
'            x = ts_row[x_col]\n'
'    coeff = np.polyfit(x, y, deg)\n'
'    p = np.poly1d(coeff)\n'
'    z = p(x)\n'
'    return z, coeff\n'
'\n'
'result = df\n'
'if len(df):\n'
'    result[[y_fit_series, fit_coeff]] = df.apply(fit, axis=1, args=(x_series, y_series, degree,), result_type="expand")\n'
;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
// Fit fifth order polynomial to a regular (evenly spaced) time series, created with make-series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 5)
| render timechart with(ycolumns=num, fnum)
Additional examples
The following examples assume the function is already installed:
Test irregular (unevenly spaced) time series
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-2d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0   // delete every 6th hour to create unevenly spaced time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('num', 'fnum', 'coeff', 8, 'TimeStamp', True)
| render timechart with(ycolumns=num, fnum)
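When x_istime is True, as in the query above, the function rescales the timestamps linearly onto the range [1, len(x)] before fitting, so gaps between samples are preserved rather than flattened. The sketch below reproduces that arithmetic with the Python standard library only. It is an illustration under assumptions: the function itself uses pandas (pd.to_datetime and pd.to_numeric) and works in nanoseconds, whereas this sketch works in seconds, which gives the same result after normalization. The name normalize_time_axis is hypothetical.

```python
from datetime import datetime


def normalize_time_axis(timestamps):
    """Map datetimes linearly onto [1, len(timestamps)], keeping relative spacing."""
    t0 = min(timestamps)
    # Offsets from the earliest timestamp, in seconds.
    offsets = [(t - t0).total_seconds() for t in timestamps]
    span = max(offsets)
    if span == 0:
        raise ValueError("need at least two distinct timestamps")
    n = len(timestamps)
    # Same steps as the inline Python: x / x.max(), then * (len(x) - 1) + 1.
    return [o / span * (n - 1) + 1 for o in offsets]


# Unevenly spaced samples: a one-hour gap followed by a two-hour gap.
ts = [
    datetime(2016, 9, 2, 0, 0),
    datetime(2016, 9, 2, 1, 0),
    datetime(2016, 9, 2, 3, 0),
]
print(normalize_time_axis(ts))
```

The first and last samples land on 1 and len(x), and the middle sample sits a third of the way along, reflecting the uneven gaps in the input.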
Fifth order polynomial with noise on x & y axes
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_fit = dynamic(null), coeff=dynamic(null)
| invoke series_fit_poly_fl('y', 'y_fit', 'coeff', 5, 'x')
| fork (project-away coeff) (project coeff | mv-expand coeff)
| render linechart
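The coeff column holds whatever np.polyfit returns, and NumPy orders those coefficients from the highest power to the lowest. The sketch below shows one way to evaluate such a coefficient list at a new x value using Horner's rule. The coefficients used are those of the noise-free polynomial from the query above (x^5 - 8x^3 + 10x + 6), not actual fitted output, which will differ because of the added noise. The name eval_poly is hypothetical.

```python
def eval_poly(coeffs, x):
    """Evaluate a polynomial whose coefficients are ordered highest power first."""
    result = 0.0
    for c in coeffs:
        # Horner's rule: fold in one coefficient per step.
        result = result * x + c
    return result


# x^5 + 0*x^4 - 8*x^3 + 0*x^2 + 10*x + 6
coeffs = [1, 0, -8, 0, 10, 6]
print(eval_poly(coeffs, 2.0))  # 32 - 64 + 20 + 6 = -6.0
```

Comparing fitted coefficients against this reference list is a quick way to judge how closely the regression recovered the underlying polynomial.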