series_fit_lowess_fl()
The function series_fit_lowess_fl() applies a LOWESS regression on a series. This function takes a table with multiple series (dynamic numerical arrays) and generates a LOWESS Curve, which is a smoothed version of the original series.
Note
series_fit_lowess_fl()is a UDF (user-defined function). For more information, see usage.- This function contains inline Python and requires enabling the python() plugin on the cluster.
Syntax
T | invoke series_fit_lowess_fl(y_series, y_fit_series, [fit_size, x_series, x_istime])
Arguments
- y_series: The name of the input table column containing the dependent variable. This column is the series to fit.
- y_fit_series: The name of the column to store the fitted series.
- fit_size: For each point, the local regression is applied on its respective fit_size closest points. This parameter is optional, default to 5.
- x_series: The name of the column containing the independent variable, that is, the x or time axis. This parameter is optional, and is needed only for unevenly spaced series. The default value is an empty string, as x is redundant for the regression of an evenly spaced series.
- x_istime: This boolean parameter is needed only if x_series is specified and it's a vector of datetime. This parameter is optional, default to False.
Usage
series_fit_lowess_fl() is a user-defined function tabular function, to be applied using the invoke operator. You can either embed its code in your query, or install it in your database. There are two usage options: ad hoc and persistent usage. See the below tabs for examples.
For ad hoc usage, embed its code using let statement. No permission is required.
let series_fit_lowess_fl=(tbl:(*), y_series:string, y_fit_series:string, fit_size:int=5, x_series:string='', x_istime:bool=False)
{
let kwargs = pack('y_series', y_series, 'y_fit_series', y_fit_series, 'fit_size', fit_size, 'x_series', x_series, 'x_istime', x_istime);
let code=
'\n'
'y_series = kargs["y_series"]\n'
'y_fit_series = kargs["y_fit_series"]\n'
'fit_size = kargs["fit_size"]\n'
'x_series = kargs["x_series"]\n'
'x_istime = kargs["x_istime"]\n'
'\n'
'import statsmodels.api as sm\n'
'def lowess_fit(ts_row, x_col, y_col, fsize):\n'
' y = ts_row[y_col]\n'
' fraction = fsize/len(y)\n'
' if x_col == "": # If there is no x column creates sequential range [1, len(y)]\n'
' x = np.arange(len(y)) + 1\n'
' else: # if x column exists check whether its a time column. If so, normalize it to the [1, len(y)] range, else take it as is.\n'
' if x_istime: \n'
' x = pd.to_numeric(pd.to_datetime(ts_row[x_col]))\n'
' x = x - x.min()\n'
' x = x / x.max()\n'
' x = x * (len(x) - 1) + 1\n'
' else:\n'
' x = ts_row[x_col]\n'
' lowess = sm.nonparametric.lowess\n'
' z = lowess(y, x, return_sorted=False, frac=fraction)\n'
' return list(z)\n'
'\n'
'result = df\n'
'result[y_fit_series] = df.apply(lowess_fit, axis=1, args=(x_series, y_series, fit_size))\n'
;
tbl
| evaluate python(typeof(*), code, kwargs)
};
//
// Apply 9 points LOWESS regression on regular time series
//
let max_t = datetime(2016-09-03);
demo_make_series1
| make-series num=count() on TimeStamp from max_t-1d to max_t step 5m by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9)
| render timechart
Examples
The following examples assume the function is already installed:
Test irregular time series
The following example tests irregular (unevenly spaced) time series
let max_t = datetime(2016-09-03);
demo_make_series1
| where TimeStamp between ((max_t-1d)..max_t)
| summarize num=count() by bin(TimeStamp, 5m), OsVer
| order by TimeStamp asc
| where hourofday(TimeStamp) % 6 != 0 // delete every 6th hour to create irregular time series
| summarize TimeStamp=make_list(TimeStamp), num=make_list(num) by OsVer
| extend fnum = dynamic(null)
| invoke series_fit_lowess_fl('num', 'fnum', 9, 'TimeStamp', True)
| render timechart
Compare LOWESS versus polynomial fit
The following example contains fifth order polynomial with noise on x and y axes. See comparison of LOWESS versus polynomial fit.
range x from 1 to 200 step 1
| project x = rand()*5 - 2.3
| extend y = pow(x, 5)-8*pow(x, 3)+10*x+6
| extend y = y + (rand() - 0.5)*0.5*y
| summarize x=make_list(x), y=make_list(y)
| extend y_lowess = dynamic(null)
| invoke series_fit_lowess_fl('y', 'y_lowess', 15, 'x')
| extend series_fit_poly(y, x, 5)
| project x, y, y_lowess, y_polynomial=series_fit_poly_y_poly_fit
| render linechart
Feedback
Submit and view feedback for