quantize_fl()
The quantize_fl() function bins metric columns: it quantizes their numeric values into categorical labels, using bins computed by the K-Means algorithm.
Note
- quantize_fl() is a UDF (user-defined function). For more information, see usage.
- This function contains inline Python and requires enabling the python() plugin on the cluster.
Syntax
T | invoke quantize_fl(num_bins, in_cols, out_cols, labels)
Arguments
- num_bins: Required number of bins.
- in_cols: Dynamic array containing the names of the columns to quantize.
- out_cols: Dynamic array containing the names of the respective output columns for the binned values.
- labels: Dynamic array containing the label names, one per bin. This parameter is optional. If labels isn't supplied, the bin ranges are used as labels.
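The two labeling modes described for the labels argument can be sketched in plain Python. This is only an illustration, assuming that each value has already been assigned a zero-based bin index; the helper names range_labels and apply_labels are hypothetical and are not part of quantize_fl().

```python
# Illustrative sketch only: helper names are hypothetical, not part of quantize_fl().

def range_labels(edges, ndigits=3):
    """Build 'low-high' labels from consecutive bin edges, rounded to ndigits."""
    rounded = [round(float(e), ndigits) for e in edges]
    return [f"{low}-{high}" for low, high in zip(rounded, rounded[1:])]

def apply_labels(bin_indexes, labels):
    """Map each zero-based bin index to its label."""
    return [labels[i] for i in bin_indexes]

# Bin edges taken from the sample output on this page (3 bins need 4 edges).
edges = [1.0, 7.75, 17.5, 25.0]
bin_indexes = [0, 2, 1]  # assumed bin index for three sample values

# labels supplied: the given names are used.
named = apply_labels(bin_indexes, ["Low", "Med", "High"])

# labels omitted: the bin ranges become the labels.
ranged = apply_labels(bin_indexes, range_labels(edges))

print(named)   # ['Low', 'High', 'Med']
print(ranged)  # ['1.0-7.75', '17.5-25.0', '7.75-17.5']
```

In both modes the number of labels must equal num_bins, because each bin index selects exactly one label.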
Usage
quantize_fl() is a user-defined tabular function, applied using the invoke operator. There are two usage options: embed its code in your query (ad hoc usage), or install it in your database (persistent usage).
For ad hoc usage, embed its code using the let statement. No permission is required.
let quantize_fl=(tbl:(*), num_bins:int, in_cols:dynamic, out_cols:dynamic, labels:dynamic=dynamic(null))
{
    let kwargs = pack('num_bins', num_bins, 'in_cols', in_cols, 'out_cols', out_cols, 'labels', labels);
    let code = ```if 1:

        from sklearn.preprocessing import KBinsDiscretizer

        num_bins = kargs["num_bins"]
        in_cols = kargs["in_cols"]
        out_cols = kargs["out_cols"]
        labels = kargs["labels"]

        result = df
        binner = KBinsDiscretizer(n_bins=num_bins, encode="ordinal", strategy="kmeans")
        df_in = df[in_cols]
        bdata = binner.fit_transform(df_in)
        if labels is None:
            for i in range(len(out_cols)):    # loop on each column and convert it to binned labels
                ii = np.round(binner.bin_edges_[i], 3)
                labels = [str(ii[j-1]) + '-' + str(ii[j]) for j in range(1, num_bins+1)]
                result.loc[:,out_cols[i]] = np.take(labels, bdata[:, i].astype(int))
        else:
            result[out_cols] = np.take(labels, bdata.astype(int))
    ```;
    tbl
    | evaluate python(typeof(*), code, kwargs)
};
//
union
(range x from 1 to 5 step 1),
(range x from 10 to 15 step 1),
(range x from 20 to 25 step 1)
| extend x_label='', x_bin=''
| invoke quantize_fl(3, pack_array('x'), pack_array('x_label'), pack_array('Low', 'Med', 'High'))
| invoke quantize_fl(3, pack_array('x'), pack_array('x_bin'), dynamic(null))
x     x_label   x_bin
1     Low       1.0-7.75
2     Low       1.0-7.75
3     Low       1.0-7.75
4     Low       1.0-7.75
5     Low       1.0-7.75
20    High      17.5-25.0
21    High      17.5-25.0
22    High      17.5-25.0
23    High      17.5-25.0
24    High      17.5-25.0
25    High      17.5-25.0
10    Med       7.75-17.5
11    Med       7.75-17.5
12    Med       7.75-17.5
13    Med       7.75-17.5
14    Med       7.75-17.5
15    Med       7.75-17.5
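To see where the x_bin ranges in this output come from, the following dependency-free sketch approximates one-dimensional K-Means binning. It is a simplified stand-in, not scikit-learn's KBinsDiscretizer implementation: it assumes the cluster centers start at the midpoints of equal-width bins and that the inner bin edges are the midpoints between adjacent converged centers. The function name kmeans_bin_edges is hypothetical.

```python
# Simplified 1-D K-Means binning sketch. Assumption: this approximates, but is not,
# the KBinsDiscretizer(strategy="kmeans") logic used by quantize_fl().

def kmeans_bin_edges(values, num_bins, max_iter=100):
    """Return num_bins + 1 bin edges derived from 1-D K-Means cluster centers."""
    lo, hi = float(min(values)), float(max(values))
    # Start from the centers of equal-width bins.
    width = (hi - lo) / num_bins
    centers = [lo + width * (k + 0.5) for k in range(num_bins)]
    for _ in range(max_iter):
        # Assign every value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(num_bins), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
        if updated == centers:
            break  # converged
        centers = updated
    centers.sort()
    # Inner edges lie halfway between neighboring centers.
    inner = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return [lo] + inner + [hi]

# Same sample data as the query above: 1..5, 10..15, 20..25.
values = list(range(1, 6)) + list(range(10, 16)) + list(range(20, 26))
edges = kmeans_bin_edges(values, 3)
print(edges)  # [1.0, 7.75, 17.5, 25.0]
```

For this data the centers converge to 3, 12.5, and 22.5, so the inner edges fall at 7.75 and 17.5, which matches the ranges shown in the x_bin column.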