Data transformation expressions in mapping data flow

APPLIES TO: Azure Data Factory Azure Synapse Analytics

Data flows are available both in Azure Data Factory and Azure Synapse Pipelines. This article applies to mapping data flows. If you are new to transformations, please refer to the introductory article Transform data using a mapping data flow.

This article provides details about expressions and functions supported by Azure Data Factory and Azure Synapse Analytics in mapping data flows.

Expression functions

In Data Factory and Synapse pipelines, use the expression language of the mapping data flow feature to configure data transformations.

Expression function Task
abs Absolute value of a number.
acos Calculates a cosine inverse value.
add Adds a pair of strings or numbers. Adds a date to a number of days. Adds a duration to a timestamp. Appends one array of similar type to another. Same as the + operator.
addDays Add days to a date or timestamp. Same as the + operator for date.
addMonths Add months to a date or timestamp. You can optionally pass a timezone.
and Logical AND operator. Same as &&.
asin Calculates an inverse sine value.
atan Calculates a inverse tangent value.
atan2 Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates.
between Checks if the first value is in between two other values inclusively. Numeric, string and datetime values can be compared
bitwiseAnd Bitwise And operator across integral types. Same as & operator
bitwiseOr Bitwise Or operator across integral types. Same as | operator
bitwiseXor Bitwise Or operator across integral types. Same as | operator
blake2b Calculates the Blake2 digest of set of column of varying primitive datatypes given a bit length which can only be multiples of 8 between 8 & 512. It can be used to calculate a fingerprint for a row
blake2bBinary Calculates the Blake2 digest of set of column of varying primitive datatypes given a bit length which can only be multiples of 8 between 8 & 512. It can be used to calculate a fingerprint for a row
case Based on alternating conditions applies one value or the other. If the number of inputs are even, the other is defaulted to NULL for last condition.
cbrt Calculates the cube root of a number.
ceil Returns the smallest integer not smaller than the number.
coalesce Returns the first not null value from a set of inputs. All inputs should be of the same type.
columnNames Gets the names of all output columns for a stream. You can pass an optional stream name as the second argument.
columns Gets the values of all output columns for a stream. You can pass an optional stream name as the second argument.
compare Compares two values of the same type. Returns negative integer if value1 < value2, 0 if value1 == value2, positive value if value1 > value2.
concat Concatenates a variable number of strings together. Same as the + operator with strings.
concatWS Concatenates a variable number of strings together with a separator. The first parameter is the separator.
cos Calculates a cosine value.
cosh Calculates a hyperbolic cosine of a value.
crc32 Calculates the CRC32 hash of set of column of varying primitive datatypes given a bit length which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.
currentDate Gets the current date when this job starts to run. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
currentTimestamp Gets the current timestamp when the job starts to run with local time zone.
currentUTC Gets the current timestamp as UTC. If you want your current time to be interpreted in a different timezone than your cluster time zone, you can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. To convert the UTC time to a different timezone use fromUTC().
dayOfMonth Gets the day of the month given a date.
dayOfWeek Gets the day of the week given a date. 1 - Sunday, 2 - Monday ..., 7 - Saturday.
dayOfYear Gets the day of the year given a date.
days Duration in milliseconds for number of days.
degrees Converts radians to degrees.
divide Divides pair of numbers. Same as the / operator.
dropLeft Removes as many characters from the left of the string. If the drop requested exceeds the length of the string, an empty string is returned.
dropRight Removes as many characters from the right of the string. If the drop requested exceeds the length of the string, an empty string is returned.
endsWith Checks if the string ends with the supplied string.
equals Comparison equals operator. Same as == operator.
equalsIgnoreCase Comparison equals operator ignoring case. Same as <=> operator.
escape Escapes a string according to a format. Literal values for acceptable format are 'json', 'xml', 'ecmascript', 'html', 'java'.
expr Results in a expression from a string. This is the same as writing this expression in a non-literal form. This can be used to pass parameters as string representations.
factorial Calculates the factorial of a number.
false Always returns a false value. Use the function syntax(false()) if there is a column named 'false'.
floor Returns the largest integer not greater than the number.
fromBase64 Decodes the given base64-encoded string.
fromUTC Converts to the timestamp from UTC. You can optionally pass the timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
greater Comparison greater operator. Same as > operator.
greaterOrEqual Comparison greater than or equal operator. Same as >= operator.
greatest Returns the greatest value among the list of values as input skipping null values. Returns null if all inputs are null.
hasColumn Checks for a column value by name in the stream. You can pass a optional stream name as the second argument. Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
hour Gets the hour value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
hours Duration in milliseconds for number of hours.
iif Based on a condition applies one value or the other. If other is unspecified it is considered NULL. Both the values must be compatible(numeric, string...).
iifNull Checks if the first parameter is null. If not null, the first parameter is returned. If null, the second parameter is returned. If three parameters are specified, the behavior is the same as iif(isNull(value1), value2, value3) and the third parameter is returned if the first value is not null.
initCap Converts the first letter of every word to uppercase. Words are identified as separated by whitespace.
instr Finds the position(1 based) of the substring within a string. 0 is returned if not found.
isDelete Checks if the row is marked for delete. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isError Checks if the row is marked as error. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isIgnore Checks if the row is marked to be ignored. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isInsert Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isMatch Checks if the row is matched at lookup. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isNull Checks if the value is NULL.
isUpdate Checks if the row is marked for update. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isUpsert Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
jaroWinkler Gets the JaroWinkler distance between two strings.
lastDayOfMonth Gets the last date of the month given a date.
least Comparison lesser than or equal operator. Same as <= operator.
left Extracts a substring start at index 1 with number of characters. Same as SUBSTRING(str, 1, n).
length Returns the length of the string.
lesser Comparison less operator. Same as < operator.
lesserOrEqual Comparison lesser than or equal operator. Same as <= operator.
levenshtein Gets the levenshtein distance between two strings.
like The pattern is a string that is matched literally. The exceptions are the following special symbols: _ matches any one character in the input (similar to . in posix regular expressions)
locate Finds the position(1 based) of the substring within a string starting a certain position. If the position is omitted it is considered from the beginning of the string. 0 is returned if not found.
log Calculates log value. An optional base can be supplied else a Euler number if used.
log10 Calculates log value based on 10 base.
lower Lowercases a string.
lpad Left pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it is trimmed to the length.
ltrim Left trims a string of leading characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.
md5 Calculates the MD5 digest of set of column of varying primitive datatypes and returns a 32 character hex string. It can be used to calculate a fingerprint for a row.
millisecond Gets the millisecond value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
milliseconds Duration in milliseconds for number of milliseconds.
minus Subtracts numbers. Subtract number of days from a date. Subtract duration from a timestamp. Subtract two timestamps to get difference in milliseconds. Same as the - operator.
minute Gets the minute value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
minutes Duration in milliseconds for number of minutes.
mod Modulus of pair of numbers. Same as the % operator.
month Gets the month value of a date or timestamp.
monthsBetween Gets the number of months between two dates. You can round off the calculation.You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
multiply Multiplies pair of numbers. Same as the * operator.
negate Negates a number. Turns positive numbers to negative and vice versa.
nextSequence Returns the next unique sequence. The number is consecutive only within a partition and is prefixed by the partitionId.
normalize Normalizes the string value to separate accented unicode characters.
not Logical negation operator.
notEquals Comparison not equals operator. Same as != operator.
notNull Checks if the value is not NULL.
null Returns a NULL value. Use the function syntax(null()) if there is a column named 'null'. Any operation that uses will result in a NULL.
or Logical OR operator. Same as ||.
pMod Positive Modulus of pair of numbers.
partitionId Returns the current partition ID the input row is in.
power Raises one number to the power of another.
radians Converts degrees to radians
random Returns a random number given an optional seed within a partition. The seed should be a fixed value and is used in conjunction with the partitionId to produce random values
regexExtract Extract a matching substring for a given regex pattern. The last parameter identifies the match group and is defaulted to 1 if omitted. Use <regex>(back quote) to match a string without escaping.
regexMatch Checks if the string matches the given regex pattern. Use <regex>(back quote) to match a string without escaping.
regexReplace Replace all occurrences of a regex pattern with another substring in the given string Use <regex>(back quote) to match a string without escaping.
regexSplit Splits a string based on a delimiter based on regex and returns an array of strings.
replace Replace all occurrences of a substring with another substring in the given string. If the last parameter is omitted, it is default to empty string.
reverse Reverses a string.
right Extracts a substring with number of characters from the right. Same as SUBSTRING(str, LENGTH(str) - n, n).
rlike Checks if the string matches the given regex pattern.
round Rounds a number given an optional scale and an optional rounding mode. If the scale is omitted, it is defaulted to 0. If the mode is omitted, it is defaulted to ROUND_HALF_UP(5). The values for rounding include
rpad Right pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it is trimmed to the length.
rtrim Right trims a string of trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.
second Gets the second value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
seconds Duration in milliseconds for number of seconds.
sha1 Calculates the SHA-1 digest of set of column of varying primitive datatypes and returns a 40 character hex string. It can be used to calculate a fingerprint for a row.
sha2 Calculates the SHA-2 digest of set of column of varying primitive datatypes given a bit length which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.
sin Calculates a sine value.
sinh Calculates a hyperbolic sine value.
soundex Gets the soundex code for the string.
split Splits a string based on a delimiter and returns an array of strings.
sqrt Calculates the square root of a number.
startsWith Checks if the string starts with the supplied string.
subDays Subtract days from a date or timestamp. Same as the - operator for date.
subMonths Subtract months from a date or timestamp.
substring Extracts a substring of a certain length from a position. Position is 1 based. If the length is omitted, it is defaulted to end of the string.
tan Calculates a tangent value.
tanh Calculates a hyperbolic tangent value.
translate Replace one set of characters by another set of characters in the string. Characters have 1 to 1 replacement.
trim Trims a string of leading and trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.
true Always returns a true value. Use the function syntax(true()) if there is a column named 'true'.
typeMatch Matches the type of the column. Can only be used in pattern expressions.number matches short, integer, long, double, float or decimal, integral matches short, integer, long, fractional matches double, float, decimal and datetime matches date or timestamp type.
unescape Unescapes a string according to a format. Literal values for acceptable format are 'json', 'xml', 'ecmascript', 'html', 'java'.
upper Uppercases a string.
uuid Returns the generated UUID.
weekOfYear Gets the week of the year given a date.
weeks Duration in milliseconds for number of weeks.
xor Logical XOR operator. Same as ^ operator.
year Gets the year value of a date.

Aggregate functions

The following functions are only available in aggregate, pivot, unpivot, and window transformations.

Aggregate function Task
approxDistinctCount Gets the approximate aggregate count of distinct values for a column. The optional second parameter is to control the estimation error.
avg Gets the average of values of a column.
avgIf Based on a criteria gets the average of values of a column.
collect Collects all values of the expression in the aggregated group into an array. Structures can be collected and transformed to alternate structures during this process. The number of items will be equal to the number of rows in that group and can contain null values. The number of collected items should be small.
count Gets the aggregate count of values. If the optional column(s) is specified, it ignores NULL values in the count.
countDistinct Gets the aggregate count of distinct values of a set of columns.
countIf Based on a criteria gets the aggregate count of values. If the optional column is specified, it ignores NULL values in the count.
covariancePopulation Gets the population covariance between two columns.
covariancePopulationIf Based on a criteria, gets the population covariance of two columns.
covarianceSample Gets the sample covariance of two columns.
covarianceSampleIf Based on a criteria, gets the sample covariance of two columns.
first Gets the first value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.
isDistinct Finds if a column or set of columns is distinct. It does not count null as a distinct value
kurtosis Gets the kurtosis of a column.
kurtosisIf Based on a criteria, gets the kurtosis of a column.
last Gets the last value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.
max Gets the maximum value of a column.
maxIf Based on a criteria, gets the maximum value of a column.
mean Gets the mean of values of a column. Same as AVG.
meanIf Based on a criteria gets the mean of values of a column. Same as avgIf.
min Gets the minimum value of a column.
minIf Based on a criteria, gets the minimum value of a column.
skewness Gets the skewness of a column.
skewnessIf Based on a criteria, gets the skewness of a column.
stddev Gets the standard deviation of a column.
stddevIf Based on a criteria, gets the standard deviation of a column.
stddevPopulation Gets the population standard deviation of a column.
stddevPopulationIf Based on a criteria, gets the population standard deviation of a column.
stddevSample Gets the sample standard deviation of a column.
stddevSampleIf Based on a criteria, gets the sample standard deviation of a column.
sum Gets the aggregate sum of a numeric column.
sumDistinct Gets the aggregate sum of distinct values of a numeric column.
sumDistinctIf Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.
sumIf Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.
variance Gets the variance of a column.
varianceIf Based on a criteria, gets the variance of a column.
variancePopulation Gets the population variance of a column.
variancePopulationIf Based on a criteria, gets the population variance of a column.
varianceSample Gets the unbiased variance of a column.
varianceSampleIf Based on a criteria, gets the unbiased variance of a column.

Array functions

Array functions perform transformations on data structures that are arrays. These include special keywords to address array elements and indexes:

  • #acc represents a value that you wish to include in your single output when reducing an array
  • #index represents the current array index, along with array index numbers #index2, #index3 ...
  • #item represents the current element value in the array
Array function Task
array Creates an array of items. All items should be of the same type. If no items are specified, an empty string array is the default. Same as a [] creation operator.
at Finds the element at an array index. The index is 1-based. Out of bounds index results in a null value. Finds a value in a map given a key. If the key is not found it returns null.
contains Returns true if any element in the provided array evaluates as true in the provided predicate. Contains expects a reference to one element in the predicate function as #item.
distinct Returns a distinct set of items from an array.
except Returns a difference set of one array from another dropping duplicates.
filter Filters elements out of the array that do not meet the provided predicate. Filter expects a reference to one element in the predicate function as #item.
find Find the first item from an array that match the condition. It takes a filter function where you can address the item in the array as #item. For deeply nested maps you can refer to the parent maps using the #item_n(#item_1, #item_2...) notation.
flatten Flattens array or arrays into a single array. Arrays of atomic items are returned unaltered. The last argument is optional and is defaulted to false to flatten recursively more than one level deep.
in Checks if an item is in the array.
intersect Returns an intersection set of distinct items from 2 arrays.
map Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item.
mapIf Conditionally maps an array to another array of same or smaller length. The values can be of any datatype including structTypes. It takes a mapping function where you can address the item in the array as #item and current index as #index. For deeply nested maps you can refer to the parent maps using the #item_[n](#item_1, #index_1...) notation.
mapIndex Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item and a reference to the element index as #index.
mapLoop Loops through from 1 to length to create an array of that length. It takes a mapping function where you can address the index in the array as #index. For deeply nested maps you can refer to the parent maps using the #index_n(#index_1, #index_2...) notation.
reduce Accumulates elements in an array. Reduce expects a reference to an accumulator and one element in the first expression function as #acc and #item and it expects the resulting value as #result to be used in the second expression function.
size Finds the size of an array or map type
slice Extracts a subset of an array from a position. Position is 1 based. If the length is omitted, it is defaulted to end of the string.
sort Sorts the array using the provided predicate function. Sort expects a reference to two consecutive elements in the expression function as #item1 and #item2.
unfold Unfolds an array into a set of rows and repeats the values for the remaining columns in every row.
union Returns a union set of distinct items from 2 arrays.

Cached lookup functions

The following functions are only available when using a cached lookup when you've included a cached sink.

Cached lookup function Task
lookup Looks up the first row from the cached sink using the specified keys that match the keys from the cached sink.
mlookup Looks up the all matching rows from the cached sink using the specified keys that match the keys from the cached sink.
output Returns the first row of the results of the cache sink
outputs Returns the entire output row set of the results of the cache sink

Conversion functions

Conversion functions are used to convert data and test for data types

Conversion function Task
isBitSet Checks if a bit position is set in this bitset
setBitSet Sets bit positions in this bitset
isBoolean Checks if the string value is a boolean value according to the rules of toBoolean()
isByte Checks if the string value is a byte value given an optional format according to the rules of toByte()
isDate Checks if the input date string is a date using an optional input date format. Refer Java's SimpleDateFormat for available formats. If the input date format is omitted, default format is yyyy-[M]M-[d]d. Accepted formats are [ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ]
isShort Checks of the string value is a short value given an optional format according to the rules of toShort()
isInteger Checks of the string value is a integer value given an optional format according to the rules of toInteger()
isLong Checks of the string value is a long value given an optional format according to the rules of toLong()
isNan Check if this is not a number.
isFloat Checks of the string value is a float value given an optional format according to the rules of toFloat()
isDouble Checks of the string value is a double value given an optional format according to the rules of toDouble()
isDecimal Checks of the string value is a decimal value given an optional format according to the rules of toDecimal()
isTimestamp Checks if the input date string is a timestamp using an optional input timestamp format. Refer to Java's SimpleDateFormat for available formats. If the timestamp is omitted the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. Timestamp supports up to millisecond accuracy with value of 999 Refer to Java's SimpleDateFormat for available formats.
toBase64 Encodes the given string in base64.
toBinary Converts any numeric/date/timestamp/string to binary representation.
toBoolean Converts a value of ('t', 'true', 'y', 'yes', '1') to true and ('f', 'false', 'n', 'no', '0') to false and NULL for any other value.
toByte Converts any numeric or string to a byte value. An optional Java decimal format can be used for the conversion.
toDate Converts input date string to date using an optional input date format. Refer Java's SimpleDateFormat class for available formats. If the input date format is omitted, default format is yyyy-[M]M-[d]d. Accepted formats are :[ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ].
toDecimal Converts any numeric or string to a decimal value. If precision and scale are not specified, it is defaulted to (10,2).An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.
toDouble Converts any numeric or string to a double value. An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.
toFloat Converts any numeric or string to a float value. An optional Java decimal format can be used for the conversion. Truncates any double.
toInteger Converts any numeric or string to an integer value. An optional Java decimal format can be used for the conversion. Truncates any long, float, double.
toLong Converts any numeric or string to a long value. An optional Java decimal format can be used for the conversion. Truncates any float, double.
toShort Converts any numeric or string to a short value. An optional Java decimal format can be used for the conversion. Truncates any integer, long, float, double.
toString Converts a primitive datatype to a string. For numbers and date a format can be specified. If unspecified the system default is picked.Java decimal format is used for numbers. Refer to Java SimpleDateFormat for all possible date formats; the default format is yyyy-MM-dd.
toTimestamp Converts a string to a timestamp given an optional timestamp format. If the timestamp is omitted the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. Timestamp supports up to millisecond accuracy with value of 999. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
toUTC Converts the timestamp to UTC. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

Map functions

Map functions perform operations on map data types

Map function Task
associate Creates a map of key/values. All the keys & values should be of the same type. If no items are specified, it is defaulted to a map of string to string type.Same as a [ -> ] creation operator. Keys and values should alternate with each other.
keyValues Creates a map of key/values. The first parameter is an array of keys and second is the array of values. Both arrays should have equal length.
mapAssociation Transforms a map by associating the keys to new values. Returns an array. It takes a mapping function where you can address the item as #key and current value as #value.
reassociate Transforms a map by associating the keys to new values. It takes a mapping function where you can address the item as #key and current value as #value.

Metafunctions

Metafunctions primarily function on metadata in your data flow

Metafunction Task
byItem Find a sub item within a structure or array of structure If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion actions(? date, ? string ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions
byOrigin Selects a column value by name in the origin stream. The second argument is the origin stream name. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
byOrigins Selects an array of columns by name in the stream. The second argument is the stream where it originated from. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
byName Selects a column value by name in the stream. You can pass a optional stream name as the second argument. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
byNames Select an array of columns by name in the stream. You can pass a optional stream name as the second argument. If there are multiple matches, the first match is returned. If there are no matches for a column, the entire output is a NULL value. The returned value requires a type conversion functions (toDate, toString, ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
byPath Finds a hierarchical path by name in the stream. You can pass an optional stream name as the second argument. If no such path is found it returns null. Column names/paths known at design time should be addressed just by their name or dot notation path. Computed inputs are not supported but you can use parameter substitutions.
byPosition Selects a column value by its relative position(1 based) in the stream. If the position is out of bounds it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Computed inputs are not supported but you can use parameter substitutions.
hasPath Checks if a certain hierarchical path exists by name in the stream. You can pass an optional stream name as the second argument. Column names/paths known at design time should be addressed just by their name or dot notation path. Computed inputs are not supported but you can use parameter substitutions.
originColumns Gets all output columns for a origin stream where columns were created. Must be enclosed in another function.
hex Returns a hex string representation of a binary value
unhex Unhexes a binary value from its string representation. This can be used in conjunction with sha2, md5 to convert from string to binary representation

Window functions

The following functions are only available in window transformations.

Windows function Task
cumeDist The CumeDist function computes the position of a value relative to all values in the partition. The result is the number of rows preceding or equal to the current row in the ordering of the partition divided by the total number of rows in the window partition. Any tie values in the ordering will evaluate to the same position.
denseRank Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will not produce gaps in the sequence. Dense Rank works even when data is not sorted and looks for change in values.
lag Gets the value of the first parameter evaluated n rows before the current row. The second parameter is the number of rows to look back and the default value is 1. If there are not as many rows a value of null is returned unless a default value is specified.
lead Gets the value of the first parameter evaluated n rows after the current row. The second parameter is the number of rows to look forward and the default value is 1. If there are not as many rows a value of null is returned unless a default value is specified.
nTile The NTile function divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values will differ by at most 1. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket. The NTile function is useful for the calculation of tertiles, quartiles, deciles, and other common summary statistics. The function calculates two variables during initialization: The size of a regular bucket will have one extra row added to it. Both variables are based on the size of the current partition. During the calculation process the function keeps track of the current row number, the current bucket number, and the row number at which the bucket will change (bucketThreshold). When the current row number reaches bucket threshold, the bucket value is increased by one and the threshold is increased by the bucket size (plus one extra if the current bucket is padded).
rank Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence. Rank works even when data is not sorted and looks for change in values.
rowNumber Assigns a sequential row numbering for rows in a window starting with 1.

Alphabetical listing of all functions

Following is an alphabetical listing of all functions available in mapping data flows.

abs

abs(<value1> : number) => number

Absolute value of a number.

  • abs(-20) -> 20
  • abs(10) -> 10

acos

acos(<value1> : number) => double

Calculates a cosine inverse value.

  • acos(1) -> 0.0

add

add(<value1> : any, <value2> : any) => any

Adds a pair of strings or numbers. Adds a date to a number of days. Adds a duration to a timestamp. Appends one array of similar type to another. Same as the + operator.

  • add(10, 20) -> 30
  • 10 + 20 -> 30
  • add('ice', 'cream') -> 'icecream'
  • 'ice' + 'cream' + ' cone' -> 'icecream cone'
  • add(toDate('2012-12-12'), 3) -> toDate('2012-12-15')
  • toDate('2012-12-12') + 3 -> toDate('2012-12-15')
  • [10, 20] + [30, 40] -> [10, 20, 30, 40]
  • toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')

addDays

addDays(<date/timestamp> : datetime, <days to add> : integral) => datetime

Add days to a date or timestamp. Same as the + operator for date.

  • addDays(toDate('2016-08-08'), 1) -> toDate('2016-08-09')

addMonths

addMonths(<date/timestamp> : datetime, <months to add> : integral, [<value3> : string]) => datetime

Add months to a date or timestamp. You can optionally pass a timezone.

  • addMonths(toDate('2016-08-31'), 1) -> toDate('2016-09-30')
  • addMonths(toTimestamp('2016-09-30 10:10:10'), -1) -> toTimestamp('2016-08-31 10:10:10')

and

and(<value1> : boolean, <value2> : boolean) => boolean

Logical AND operator. Same as &&.

  • and(true, false) -> false
  • true && false -> false

approxDistinctCount

approxDistinctCount(<value1> : any, [ <value2> : double ]) => long

Gets the approximate aggregate count of distinct values for a column. The optional second parameter is to control the estimation error.

  • approxDistinctCount(ProductID, .05) => long

array

array([<value1> : any], ...) => array

Creates an array of items. All items should be of the same type. If no items are specified, an empty string array is the default. Same as a [] creation operator.

  • array('Seattle', 'Washington')
  • ['Seattle', 'Washington']
  • ['Seattle', 'Washington'][1]
  • 'Washington'

asin

asin(<value1> : number) => double

Calculates an inverse sine value.

  • asin(0) -> 0.0

associate

reassociate(<value1> : map, <value2> : binaryFunction) => map

Creates a map of key/values. All the keys & values should be of the same type. If no items are specified, it is defaulted to a map of string to string type.Same as a [ -> ] creation operator. Keys and values should alternate with each other.

  • associate('fruit', 'apple', 'vegetable', 'carrot' )=> ['fruit' -> 'apple', 'vegetable' -> 'carrot']

at

at(<value1> : array/map, <value2> : integer/key type) => array

Finds the element at an array index. The index is 1-based. Out of bounds index results in a null value. Finds a value in a map given a key. If the key is not found it returns null.

  • at(['apples', 'pears'], 1) => 'apples'
  • at(['fruit' -> 'apples', 'vegetable' -> 'carrot'], 'fruit') => 'apples'

atan

atan(<value1> : number) => double

Calculates a inverse tangent value.

  • atan(0) -> 0.0

atan2

atan2(<value1> : number, <value2> : number) => double

Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates.

  • atan2(0, 0) -> 0.0

avg

avg(<value1> : number) => number

Gets the average of values of a column.

  • avg(sales)

avgIf

avgIf(<value1> : boolean, <value2> : number) => number

Based on a criteria gets the average of values of a column.

  • avgIf(region == 'West', sales)

between

between(<value1> : any, <value2> : any, <value3> : any) => boolean

Checks if the first value is in between two other values inclusively. Numeric, string and datetime values can be compared

  • between(10, 5, 24)
  • true
  • between(currentDate(), currentDate() + 10, currentDate() + 20)
  • false

bitwiseAnd

bitwiseAnd(<value1> : integral, <value2> : integral) => integral

Bitwise And operator across integral types. Same as & operator

  • bitwiseAnd(0xf4, 0xef)
  • 0xe4
  • (0xf4 & 0xef)
  • 0xe4

bitwiseOr

bitwiseOr(<value1> : integral, <value2> : integral) => integral

Bitwise Or operator across integral types. Same as | operator

  • bitwiseOr(0xf4, 0xef)
  • 0xff
  • (0xf4 | 0xef)
  • 0xff

bitwiseXor

bitwiseXor(<value1> : any, <value2> : any) => any

Bitwise Or operator across integral types. Same as | operator

  • bitwiseXor(0xf4, 0xef)
  • 0x1b
  • (0xf4 ^ 0xef)
  • 0x1b
  • (true ^ false)
  • true
  • (true ^ true)
  • false

blake2b

blake2b(<value1> : integer, <value2> : any, ...) => string

Calculates the Blake2 digest of set of column of varying primitive datatypes given a bit length which can only be multiples of 8 between 8 & 512. It can be used to calculate a fingerprint for a row

  • blake2b(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))
  • 'c9521a5080d8da30dffb430c50ce253c345cc4c4effc315dab2162dac974711d'

blake2bBinary

blake2bBinary(<value1> : integer, <value2> : any, ...) => binary

Calculates the Blake2 digest of set of column of varying primitive datatypes given a bit length which can only be multiples of 8 between 8 & 512. It can be used to calculate a fingerprint for a row

  • blake2bBinary(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))
  • unHex('c9521a5080d8da30dffb430c50ce253c345cc4c4effc315dab2162dac974711d')

byItem

byItem(<parent column> : any, <column name> : string) => any

Find a sub item within a structure or array of structure If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion actions(? date, ? string ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions

  • byItem( byName('customer'), 'orderItems') ? (itemName as string, itemQty as integer)
  • byItem( byItem( byName('customer'), 'orderItems'), 'itemName') ? string

byName

byName(<column name> : string, [<stream name> : string]) => any

Selects a column value by name in the stream. You can pass a optional stream name as the second argument. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.

  • toString(byName('parent'))
  • toLong(byName('income'))
  • toBoolean(byName('foster'))
  • toLong(byName($debtCol))
  • toString(byName('Bogus Column'))
  • toString(byName('Bogus Column', 'DeriveStream'))

byNames

byNames(<column names> : array, [<stream name> : string]) => any

Select an array of columns by name in the stream. You can pass a optional stream name as the second argument. If there are multiple matches, the first match is returned. If there are no matches for a column, the entire output is a NULL value. The returned value requires a type conversion functions (toDate, toString, ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.

  • toString(byNames(['parent', 'child']))
  • byNames(['parent']) ? string
  • toLong(byNames(['income']))
  • byNames(['income']) ? long
  • toBoolean(byNames(['foster']))
  • toLong(byNames($debtCols))
  • toString(byNames(['a Column']))
  • toString(byNames(['a Column'], 'DeriveStream'))
  • byNames(['orderItem']) ? (itemName as string, itemQty as integer)

byOrigin

byOrigin(<column name> : string, [<origin stream name> : string]) => any

Selects a column value by name in the origin stream. The second argument is the origin stream name. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.

  • toString(byOrigin('ancestor', 'ancestorStream'))

byOrigins

byOrigins(<column names> : array, [<origin stream name> : string]) => any

Selects an array of columns by name in the stream. The second argument is the stream where it originated from. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.

  • toString(byOrigins(['ancestor1', 'ancestor2'], 'ancestorStream'))

byPath

byPath(<value1> : string, [<streamName> : string]) => any

Finds a hierarchical path by name in the stream. You can pass an optional stream name as the second argument. If no such path is found it returns null. Column names/paths known at design time should be addressed just by their name or dot notation path. Computed inputs are not supported but you can use parameter substitutions.

  • byPath('grandpa.parent.child') => column

byPosition

byPosition(<position> : integer) => any

Selects a column value by its relative position(1 based) in the stream. If the position is out of bounds it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Computed inputs are not supported but you can use parameter substitutions.

  • toString(byPosition(1))
  • toDecimal(byPosition(2), 10, 2)
  • toBoolean(byName(4))
  • toString(byName($colName))
  • toString(byPosition(1234))

case

case(<condition> : boolean, <true_expression> : any, <false_expression> : any, ...) => any

Based on alternating conditions applies one value or the other. If the number of inputs are even, the other is defaulted to NULL for last condition.

  • case(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'
  • case(10 + 20 == 25, 'bojjus', 'do' < 'go', 'gunchus') -> 'gunchus'
  • isNull(case(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus')) -> true
  • case(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus', 'dumbo') -> 'dumbo'

cbrt

cbrt(<value1> : number) => double

Calculates the cube root of a number.

  • cbrt(8) -> 2.0

ceil

ceil(<value1> : number) => number

Returns the smallest integer not smaller than the number.

  • ceil(-0.1) -> 0

coalesce

coalesce(<value1> : any, ...) => any

Returns the first not null value from a set of inputs. All inputs should be of the same type.

  • coalesce(10, 20) -> 10
  • coalesce(toString(null), toString(null), 'dumbo', 'bo', 'go') -> 'dumbo'

collect

collect(<value1> : any) => array

Collects all values of the expression in the aggregated group into an array. Structures can be collected and transformed to alternate structures during this process. The number of items will be equal to the number of rows in that group and can contain null values. The number of collected items should be small.

  • collect(salesPerson)
  • collect(firstName + lastName))
  • collect(@(name = salesPerson, sales = salesAmount) )

columnNames

columnNames(<value1> : string) => array

Gets the names of all output columns for a stream. You can pass an optional stream name as the second argument.

  • columnNames()
  • columnNames('DeriveStream')

columns

columns([<stream name> : string]) => any

Gets the values of all output columns for a stream. You can pass an optional stream name as the second argument.

  • columns()
  • columns('DeriveStream')

compare

compare(<value1> : any, <value2> : any) => integer

Compares two values of the same type. Returns negative integer if value1 < value2, 0 if value1 == value2, positive value if value1 > value2.

  • (compare(12, 24) < 1) -> true
  • (compare('dumbo', 'dum') > 0) -> true

concat

concat(<this> : string, <that> : string, ...) => string

Concatenates a variable number of strings together. Same as the + operator with strings.

  • concat('dataflow', 'is', 'awesome') -> 'dataflowisawesome'
  • 'dataflow' + 'is' + 'awesome' -> 'dataflowisawesome'
  • isNull('sql' + null) -> true

concatWS

concatWS(<separator> : string, <this> : string, <that> : string, ...) => string

Concatenates a variable number of strings together with a separator. The first parameter is the separator.

  • concatWS(' ', 'dataflow', 'is', 'awesome') -> 'dataflow is awesome'
  • isNull(concatWS(null, 'dataflow', 'is', 'awesome')) -> true
  • concatWS(' is ', 'dataflow', 'awesome') -> 'dataflow is awesome'

contains

contains(<value1> : array, <value2> : unaryfunction) => boolean

Returns true if any element in the provided array evaluates as true in the provided predicate. Contains expects a reference to one element in the predicate function as #item.

  • contains([1, 2, 3, 4], #item == 3) -> true
  • contains([1, 2, 3, 4], #item > 5) -> false

cos

cos(<value1> : number) => double

Calculates a cosine value.

  • cos(10) -> -0.8390715290764524

cosh

cosh(<value1> : number) => double

Calculates a hyperbolic cosine of a value.

  • cosh(0) -> 1.0

count

count([<value1> : any]) => long

Gets the aggregate count of values. If the optional column(s) is specified, it ignores NULL values in the count.

  • count(custId)
  • count(custId, custName)
  • count()
  • count(iif(isNull(custId), 1, NULL))

countDistinct

countDistinct(<value1> : any, [<value2> : any], ...) => long

Gets the aggregate count of distinct values of a set of columns.

  • countDistinct(custId, custName)

countIf

countIf(<value1> : boolean, [<value2> : any]) => long

Based on a criteria gets the aggregate count of values. If the optional column is specified, it ignores NULL values in the count.

  • countIf(state == 'CA' && commission < 10000, name)

covariancePopulation

covariancePopulation(<value1> : number, <value2> : number) => double

Gets the population covariance between two columns.

  • covariancePopulation(sales, profit)

covariancePopulationIf

covariancePopulationIf(<value1> : boolean, <value2> : number, <value3> : number) => double

Based on a criteria, gets the population covariance of two columns.

  • covariancePopulationIf(region == 'West', sales)

covarianceSample

covarianceSample(<value1> : number, <value2> : number) => double

Gets the sample covariance of two columns.

  • covarianceSample(sales, profit)

covarianceSampleIf

covarianceSampleIf(<value1> : boolean, <value2> : number, <value3> : number) => double

Based on a criteria, gets the sample covariance of two columns.

  • covarianceSampleIf(region == 'West', sales, profit)

crc32

crc32(<value1> : any, ...) => long

Calculates the CRC32 hash of set of column of varying primitive datatypes given a bit length which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.

  • crc32(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 3630253689L

cumeDist

cumeDist() => integer

The CumeDist function computes the position of a value relative to all values in the partition. The result is the number of rows preceding or equal to the current row in the ordering of the partition divided by the total number of rows in the window partition. Any tie values in the ordering will evaluate to the same position.

  • cumeDist()

currentDate

currentDate([<value1> : string]) => date

Gets the current date when this job starts to run. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • currentDate() == toDate('2250-12-31') -> false
  • currentDate('PST') == toDate('2250-12-31') -> false
  • currentDate('America/New_York') == toDate('2250-12-31') -> false

currentTimestamp

currentTimestamp() => timestamp

Gets the current timestamp when the job starts to run with local time zone.

  • currentTimestamp() == toTimestamp('2250-12-31 12:12:12') -> false

currentUTC

currentUTC([<value1> : string]) => timestamp

Gets the current timestamp as UTC. If you want your current time to be interpreted in a different timezone than your cluster time zone, you can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. To convert the UTC time to a different timezone use fromUTC().

  • currentUTC() == toTimestamp('2050-12-12 19:18:12') -> false
  • currentUTC() != toTimestamp('2050-12-12 19:18:12') -> true
  • fromUTC(currentUTC(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true

dayOfMonth

dayOfMonth(<value1> : datetime) => integer

Gets the day of the month given a date.

  • dayOfMonth(toDate('2018-06-08')) -> 8

dayOfWeek

dayOfWeek(<value1> : datetime) => integer

Gets the day of the week given a date. 1 - Sunday, 2 - Monday ..., 7 - Saturday.

  • dayOfWeek(toDate('2018-06-08')) -> 6

dayOfYear

dayOfYear(<value1> : datetime) => integer

Gets the day of the year given a date.

  • dayOfYear(toDate('2016-04-09')) -> 100

days

days(<value1> : integer) => long

Duration in milliseconds for number of days.

  • days(2) -> 172800000L

degrees

degrees(<value1> : number) => double

Converts radians to degrees.

  • degrees(3.141592653589793) -> 180

denseRank

denseRank() => integer

Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will not produce gaps in the sequence. Dense Rank works even when data is not sorted and looks for change in values.

  • denseRank()

distinct

distinct(<value1> : array) => array

Returns a distinct set of items from an array.

  • distinct([10, 20, 30, 10]) => [10, 20, 30]

divide

divide(<value1> : any, <value2> : any) => any

Divides pair of numbers. Same as the / operator.

  • divide(20, 10) -> 2
  • 20 / 10 -> 2

dropLeft

dropLeft(<value1> : string, <value2> : integer) => string

Removes as many characters from the left of the string. If the drop requested exceeds the length of the string, an empty string is returned.

  • dropLeft('bojjus', 2) => 'jjus'
  • dropLeft('cake', 10) => ''

dropRight

dropRight(<value1> : string, <value2> : integer) => string

Removes as many characters from the right of the string. If the drop requested exceeds the length of the string, an empty string is returned.

  • dropRight('bojjus', 2) => 'bojj'
  • dropRight('cake', 10) => ''

endsWith

endsWith(<string> : string, <substring to check> : string) => boolean

Checks if the string ends with the supplied string.

  • endsWith('dumbo', 'mbo') -> true

equals

equals(<value1> : any, <value2> : any) => boolean

Comparison equals operator. Same as == operator.

  • equals(12, 24) -> false
  • 12 == 24 -> false
  • 'bad' == 'bad' -> true
  • isNull('good' == toString(null)) -> true
  • isNull(null == null) -> true

equalsIgnoreCase

equalsIgnoreCase(<value1> : string, <value2> : string) => boolean

Comparison equals operator ignoring case. Same as <=> operator.

  • 'abc'<=>'Abc' -> true
  • equalsIgnoreCase('abc', 'Abc') -> true

escape

escape(<string_to_escape> : string, <format> : string) => string

Escapes a string according to a format. Literal values for acceptable format are 'json', 'xml', 'ecmascript', 'html', 'java'.


except

except(<value1> : array, <value2> : array) => array

Returns a difference set of one array from another dropping duplicates.

  • except([10, 20, 30], [20, 40]) => [10, 30]

expr

expr(<expr> : string) => any

Results in a expression from a string. This is the same as writing this expression in a non-literal form. This can be used to pass parameters as string representations.

  • expr('price * discount') => any

factorial

factorial(<value1> : number) => long

Calculates the factorial of a number.

  • factorial(5) -> 120

false

false() => boolean

Always returns a false value. Use the function syntax(false()) if there is a column named 'false'.

  • (10 + 20 > 30) -> false
  • (10 + 20 > 30) -> false()

filter

filter(<value1> : array, <value2> : unaryfunction) => array

Filters elements out of the array that do not meet the provided predicate. Filter expects a reference to one element in the predicate function as #item.

  • filter([1, 2, 3, 4], #item > 2) -> [3, 4]
  • filter(['a', 'b', 'c', 'd'], #item == 'a' || #item == 'b') -> ['a', 'b']

find

find(<value1> : array, <value2> : unaryfunction) => any

Find the first item from an array that match the condition. It takes a filter function where you can address the item in the array as #item. For deeply nested maps you can refer to the parent maps using the #item_n(#item_1, #item_2...) notation.

  • find([10, 20, 30], #item > 10) -> 20
  • find(['azure', 'data', 'factory'], length(#item) > 4) -> 'azure'
  • find([ @( name = 'Daniel', types = [ @(mood = 'jovial', behavior = 'terrific'), @(mood = 'grumpy', behavior = 'bad') ] ), @( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] ) ], contains(#item.types, #item.mood=='happy') /*Filter out the happy kid*/ )
  • @( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] )

first

first(<value1> : any, [<value2> : boolean]) => any

Gets the first value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.

  • first(sales)
  • first(sales, false)

flatten

flatten(<array> : array, <value2> : array ..., <value2> : boolean) => array

Flattens array or arrays into a single array. Arrays of atomic items are returned unaltered. The last argument is optional and is defaulted to false to flatten recursively more than one level deep.

  • flatten([['bojjus', 'girl'], ['gunchus', 'boy']]) => ['bojjus', 'girl', 'gunchus', 'boy']
  • flatten([[['bojjus', 'gunchus']]] , true) => ['bojjus', 'gunchus']

floor

floor(<value1> : number) => number

Returns the largest integer not greater than the number.

  • floor(-0.1) -> -1

fromBase64

fromBase64(<value1> : string) => string

Decodes the given base64-encoded string.

  • fromBase64('Z3VuY2h1cw==') -> 'gunchus'

fromUTC

fromUTC(<value1> : timestamp, [<value2> : string]) => timestamp

Converts to the timestamp from UTC. You can optionally pass the timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • fromUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> false
  • fromUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true

greater

greater(<value1> : any, <value2> : any) => boolean

Comparison greater operator. Same as > operator.

  • greater(12, 24) -> false
  • ('dumbo' > 'dum') -> true
  • (toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS') > toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true

greaterOrEqual

greaterOrEqual(<value1> : any, <value2> : any) => boolean

Comparison greater than or equal operator. Same as >= operator.

  • greaterOrEqual(12, 12) -> true
  • ('dumbo' >= 'dum') -> true

greatest

greatest(<value1> : any, ...) => any

Returns the greatest value among the list of values as input skipping null values. Returns null if all inputs are null.

  • greatest(10, 30, 15, 20) -> 30
  • greatest(10, toInteger(null), 20) -> 20
  • greatest(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2011-12-12')
  • greatest(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS'), toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')

hasColumn

hasColumn(<column name> : string, [<stream name> : string]) => boolean

Checks for a column value by name in the stream. You can pass a optional stream name as the second argument. Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.

  • hasColumn('parent')

hasPath

hasPath(<value1> : string, [<streamName> : string]) => boolean

Checks if a certain hierarchical path exists by name in the stream. You can pass an optional stream name as the second argument. Column names/paths known at design time should be addressed just by their name or dot notation path. Computed inputs are not supported but you can use parameter substitutions.

  • hasPath('grandpa.parent.child') => boolean

hex

hex(<value1>: binary) => string

Returns a hex string representation of a binary value

  • hex(toBinary([toByte(0x1f), toByte(0xad), toByte(0xbe)])) -> '1fadbe'

hour

hour(<value1> : timestamp, [<value2> : string]) => integer

Gets the hour value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • hour(toTimestamp('2009-07-30 12:58:59')) -> 12
  • hour(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 12

hours

hours(<value1> : integer) => long

Duration in milliseconds for number of hours.

  • hours(2) -> 7200000L

iif

iif(<condition> : boolean, <true_expression> : any, [<false_expression> : any]) => any

Based on a condition applies one value or the other. If other is unspecified it is considered NULL. Both the values must be compatible(numeric, string...).

  • iif(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'
  • iif(10 > 30, 'dumbo', 'gumbo') -> 'gumbo'
  • iif(month(toDate('2018-12-01')) == 12, 345.12, 102.67) -> 345.12

iifNull

iifNull(<value1> : any, [<value2> : any], ...) => any

Checks if the first parameter is null. If not null, the first parameter is returned. If null, the second parameter is returned. If three parameters are specified, the behavior is the same as iif(isNull(value1), value2, value3) and the third parameter is returned if the first value is not null.

  • iifNull(10, 20) -> 10
  • iifNull(null, 20, 40) -> 20
  • iifNull('azure', 'data', 'factory') -> 'factory'
  • iifNull(null, 'data', 'factory') -> 'data'

in

in(<array of items> : array, <item to find> : any) => boolean

Checks if an item is in the array.

  • in([10, 20, 30], 10) -> true
  • in(['good', 'kid'], 'bad') -> false

initCap

initCap(<value1> : string) => string

Converts the first letter of every word to uppercase. Words are identified as separated by whitespace.

  • initCap('cool iceCREAM') -> 'Cool Icecream'

instr

instr(<string> : string, <substring to find> : string) => integer

Finds the position(1 based) of the substring within a string. 0 is returned if not found.

  • instr('dumbo', 'mbo') -> 3
  • instr('microsoft', 'o') -> 5
  • instr('good', 'bad') -> 0

intersect

intersect(<value1> : array, <value2> : array) => array

Returns an intersection set of distinct items from 2 arrays.

  • intersect([10, 20, 30], [20, 40]) => [20]

isBitSet

isBitSet (<value1> : array, <value2>:integer ) => boolean

Checks if a bit position is set in this bitset

  • isBitSet(toBitSet([10, 32, 98]), 10) => true

isBoolean

isBoolean(<value1>: string) => boolean

Checks if the string value is a boolean value according to the rules of toBoolean()

  • isBoolean('true') -> true
  • isBoolean('no') -> true
  • isBoolean('microsoft') -> false

isByte

isByte(<value1> : string) => boolean

Checks if the string value is a byte value given an optional format according to the rules of toByte()

  • isByte('123') -> true
  • isByte('chocolate') -> false

isDate

isDate (<value1> : string, [<format>: string]) => boolean

Checks if the input date string is a date using an optional input date format. Refer Java's SimpleDateFormat for available formats. If the input date format is omitted, default format is yyyy-[M]M-[d]d. Accepted formats are [ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ]

  • isDate('2012-8-18') -> true
  • isDate('12/18--234234' -> 'MM/dd/yyyy') -> false

isDecimal

isDecimal (<value1> : string) => boolean

Checks of the string value is a decimal value given an optional format according to the rules of toDecimal()

  • isDecimal('123.45') -> true
  • isDecimal('12/12/2000') -> false

isDelete

isDelete([<value1> : integer]) => boolean

Checks if the row is marked for delete. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

  • isDelete()
  • isDelete(1)

isDistinct

isDistinct(<value1> : any , <value1> : any) => boolean

Finds if a column or set of columns is distinct. It does not count null as a distinct value

  • isDistinct(custId, custName) => boolean

isDouble

isDouble (<value1> : string, [<format>: string]) => boolean

Checks of the string value is a double value given an optional format according to the rules of toDouble()

  • isDouble('123') -> true
  • isDouble('$123.45' -> '$###.00') -> true
  • isDouble('icecream') -> false

isError

isError([<value1> : integer]) => boolean

Checks if the row is marked as error. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

  • isError()
  • isError(1)

isFloat

isFloat (<value1> : string, [<format>: string]) => boolean

Checks of the string value is a float value given an optional format according to the rules of toFloat()

  • isFloat('123') -> true
  • isFloat('$123.45' -> '$###.00') -> true
  • isFloat('icecream') -> false

isIgnore

isIgnore([<value1> : integer]) => boolean

Checks if the row is marked to be ignored. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

  • isIgnore()
  • isIgnore(1)

isInsert

isInsert([<value1> : integer]) => boolean

Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

  • isInsert()
  • isInsert(1)

isInteger

isInteger (<value1> : string, [<format>: string]) => boolean

Checks of the string value is a integer value given an optional format according to the rules of toInteger()

  • isInteger('123') -> true
  • isInteger('$123' -> '$###') -> true
  • isInteger('microsoft') -> false

isLong

isLong (<value1> : string, [<format>: string]) => boolean

Checks of the string value is a long value given an optional format according to the rules of toLong()

  • isLong('123') -> true
  • isLong('$123' -> '$###') -> true
  • isLong('gunchus') -> false

isMatch

isMatch([<value1> : integer]) => boolean

Checks if the row is matched at lookup. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

  • isMatch()
  • isMatch(1)

isNan

isNan (<value1> : integral) => boolean

Check if this is not a number.

  • isNan(10.2) => false

isNull

isNull(<value1> : any) => boolean

Checks if the value is NULL.

  • isNull(NULL()) -> true
  • isNull('') -> false

isShort

isShort (<value1> : string, [<format>: string]) => boolean

Checks of the string value is a short value given an optional format according to the rules of toShort()

  • isShort('123') -> true
  • isShort('$123' -> '$###') -> true
  • isShort('microsoft') -> false

isTimestamp

isTimestamp (<value1> : string, [<format>: string]) => boolean

Checks if the input date string is a timestamp using an optional input timestamp format. Refer to Java's SimpleDateFormat for available formats. If the timestamp is omitted the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. Timestamp supports up to millisecond accuracy with value of 999 Refer to Java's SimpleDateFormat for available formats.

  • isTimestamp('2016-12-31 00:12:00') -> true
  • isTimestamp('2016-12-31T00:12:00' -> 'yyyy-MM-dd\\'T\\'HH:mm:ss' -> 'PST') -> true
  • isTimestamp('2012-8222.18') -> false

isUpdate

isUpdate([<value1> : integer]) => boolean

Checks if the row is marked for update. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

  • isUpdate()
  • isUpdate(1)

isUpsert

isUpsert([<value1> : integer]) => boolean

Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.

  • isUpsert()
  • isUpsert(1)

jaroWinkler

jaroWinkler(<value1> : string, <value2> : string) => double

Gets the JaroWinkler distance between two strings.

  • jaroWinkler('frog', 'frog') => 1.0

keyValues

keyValues(<value1> : array, <value2> : array) => map

Creates a map of key/values. The first parameter is an array of keys and second is the array of values. Both arrays should have equal length.

  • keyValues(['bojjus', 'appa'], ['gunchus', 'ammi']) => ['bojjus' -> 'gunchus', 'appa' -> 'ammi']

kurtosis

kurtosis(<value1> : number) => double

Gets the kurtosis of a column.

  • kurtosis(sales)

kurtosisIf

kurtosisIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the kurtosis of a column.

  • kurtosisIf(region == 'West', sales)

lag

lag(<value> : any, [<number of rows to look before> : number], [<default value> : any]) => any

Gets the value of the first parameter evaluated n rows before the current row. The second parameter is the number of rows to look back and the default value is 1. If there are not as many rows a value of null is returned unless a default value is specified.

  • lag(amount, 2)
  • lag(amount, 2000, 100)

last

last(<value1> : any, [<value2> : boolean]) => any

Gets the last value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.

  • last(sales)
  • last(sales, false)

lastDayOfMonth

lastDayOfMonth(<value1> : datetime) => date

Gets the last date of the month given a date.

  • lastDayOfMonth(toDate('2009-01-12')) -> toDate('2009-01-31')

lead

lead(<value> : any, [<number of rows to look after> : number], [<default value> : any]) => any

Gets the value of the first parameter evaluated n rows after the current row. The second parameter is the number of rows to look forward and the default value is 1. If there are not as many rows a value of null is returned unless a default value is specified.

  • lead(amount, 2)
  • lead(amount, 2000, 100)

least

least(<value1> : any, ...) => any

Comparison lesser than or equal operator. Same as <= operator.

  • least(10, 30, 15, 20) -> 10
  • least(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2000-12-12')

left

left(<string to subset> : string, <number of characters> : integral) => string

Extracts a substring start at index 1 with number of characters. Same as SUBSTRING(str, 1, n).

  • left('bojjus', 2) -> 'bo'
  • left('bojjus', 20) -> 'bojjus'

length

length(<value1> : string) => integer

Returns the length of the string.

  • length('dumbo') -> 5

lesser

lesser(<value1> : any, <value2> : any) => boolean

Comparison less operator. Same as < operator.

  • lesser(12, 24) -> true
  • ('abcd' < 'abc') -> false
  • (toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') < toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true

lesserOrEqual

lesserOrEqual(<value1> : any, <value2> : any) => boolean

Comparison lesser than or equal operator. Same as <= operator.

  • lesserOrEqual(12, 12) -> true
  • ('dumbo' <= 'dum') -> false

levenshtein

levenshtein(<from string> : string, <to string> : string) => integer

Gets the levenshtein distance between two strings.

  • levenshtein('boys', 'girls') -> 4

like

like(<string> : string, <pattern match> : string) => boolean

The pattern is a string that is matched literally. The exceptions are the following special symbols: _ matches any one character in the input (similar to . in posix regular expressions) % matches zero or more characters in the input (similar to .* in posix regular expressions). The escape character is ''. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character.

  • like('icecream', 'ice%') -> true

locate

locate(<substring to find> : string, <string> : string, [<from index - 1-based> : integral]) => integer

Finds the position(1 based) of the substring within a string starting a certain position. If the position is omitted it is considered from the beginning of the string. 0 is returned if not found.

  • locate('mbo', 'dumbo') -> 3
  • locate('o', 'microsoft', 6) -> 7
  • locate('bad', 'good') -> 0

log

log(<value1> : number, [<value2> : number]) => double

Calculates log value. An optional base can be supplied else a Euler number if used.

  • log(100, 10) -> 2

log10

log10(<value1> : number) => double

Calculates log value based on 10 base.

  • log10(100) -> 2

lookup

lookup(key, key2, ...) => complex[]

Looks up the first row from the cached sink using the specified keys that match the keys from the cached sink.

  • cacheSink#lookup(movieId)

lower

lower(<value1> : string) => string

Lowercases a string.

  • lower('GunChus') -> 'gunchus'

lpad

lpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string

Left pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it is trimmed to the length.

  • lpad('dumbo', 10, '-') -> '-----dumbo'
  • lpad('dumbo', 4, '-') -> 'dumb'
  • lpad('dumbo', 8, '<>') -> '<><dumbo'

ltrim

ltrim(<string to trim> : string, [<trim characters> : string]) => string

Left trims a string of leading characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.

  • ltrim(' dumbo ') -> 'dumbo '
  • ltrim('!--!du!mbo!', '-!') -> 'du!mbo!'

map

map(<value1> : array, <value2> : unaryfunction) => any

Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item.

  • map([1, 2, 3, 4], #item + 2) -> [3, 4, 5, 6]
  • map(['a', 'b', 'c', 'd'], #item + '_processed') -> ['a_processed', 'b_processed', 'c_processed', 'd_processed']

mapAssociation

mapAssociation(<value1> : map, <value2> : binaryFunction) => array

Transforms a map by associating the keys to new values. Returns an array. It takes a mapping function where you can address the item as #key and current value as #value.

  • mapAssociation(['bojjus' -> 'gunchus', 'appa' -> 'ammi'], @(key = #key, value = #value)) => [@(key = 'bojjus', value = 'gunchus'), @(key = 'appa', value = 'ammi')]

mapIf

mapIf (<value1> : array, <value2> : binaryfunction, <value3>: binaryFunction) => any

Conditionally maps an array to another array of same or smaller length. The values can be of any datatype including structTypes. It takes a mapping function where you can address the item in the array as #item and current index as #index. For deeply nested maps you can refer to the parent maps using the #item_[n](#item_1, #index_1...) notation.

  • mapIf([10, 20, 30], #item > 10, #item + 5) -> [25, 35]
  • mapIf(['icecream', 'cake', 'soda'], length(#item) > 4, upper(#item)) -> ['ICECREAM', 'CAKE']

mapIndex

mapIndex(<value1> : array, <value2> : binaryfunction) => any

Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item and a reference to the element index as #index.

  • mapIndex([1, 2, 3, 4], #item + 2 + #index) -> [4, 6, 8, 10]

mapLoop

mapLoop(<value1> : integer, <value2> : unaryfunction) => any

Loops through from 1 to length to create an array of that length. It takes a mapping function where you can address the index in the array as #index. For deeply nested maps you can refer to the parent maps using the #index_n(#index_1, #index_2...) notation.

  • mapLoop(3, #index * 10) -> [10, 20, 30]

max

max(<value1> : any) => any

Gets the maximum value of a column.

  • max(sales)

maxIf

maxIf(<value1> : boolean, <value2> : any) => any

Based on a criteria, gets the maximum value of a column.

  • maxIf(region == 'West', sales)

md5

md5(<value1> : any, ...) => string

Calculates the MD5 digest of set of column of varying primitive datatypes and returns a 32 character hex string. It can be used to calculate a fingerprint for a row.

  • md5(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '4ce8a880bd621a1ffad0bca905e1bc5a'

mean

mean(<value1> : number) => number

Gets the mean of values of a column. Same as AVG.

  • mean(sales)

meanIf

meanIf(<value1> : boolean, <value2> : number) => number

Based on a criteria gets the mean of values of a column. Same as avgIf.

  • meanIf(region == 'West', sales)

millisecond

millisecond(<value1> : timestamp, [<value2> : string]) => integer

Gets the millisecond value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • millisecond(toTimestamp('2009-07-30 12:58:59.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871

milliseconds

milliseconds(<value1> : integer) => long

Duration in milliseconds for number of milliseconds.

  • milliseconds(2) -> 2L

min

min(<value1> : any) => any

Gets the minimum value of a column.

  • min(sales)

minIf

minIf(<value1> : boolean, <value2> : any) => any

Based on a criteria, gets the minimum value of a column.

  • minIf(region == 'West', sales)

minus

minus(<value1> : any, <value2> : any) => any

Subtracts numbers. Subtract number of days from a date. Subtract duration from a timestamp. Subtract two timestamps to get difference in milliseconds. Same as the - operator.

  • minus(20, 10) -> 10
  • 20 - 10 -> 10
  • minus(toDate('2012-12-15'), 3) -> toDate('2012-12-12')
  • toDate('2012-12-15') - 3 -> toDate('2012-12-12')
  • toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')
  • toTimestamp('2019-02-03 05:21:34.851', 'yyyy-MM-dd HH:mm:ss.SSS') - toTimestamp('2019-02-03 05:21:36.923', 'yyyy-MM-dd HH:mm:ss.SSS') -> -2072

minute

minute(<value1> : timestamp, [<value2> : string]) => integer

Gets the minute value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • minute(toTimestamp('2009-07-30 12:58:59')) -> 58
  • minute(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 58

minutes

minutes(<value1> : integer) => long

Duration in milliseconds for number of minutes.

  • minutes(2) -> 120000L

mlookup

mlookup(key, key2, ...) => complex[]

Looks up the all matching rows from the cached sink using the specified keys that match the keys from the cached sink.

  • cacheSink#mlookup(movieId)

mod

mod(<value1> : any, <value2> : any) => any

Modulus of pair of numbers. Same as the % operator.

  • mod(20, 8) -> 4
  • 20 % 8 -> 4

month

month(<value1> : datetime) => integer

Gets the month value of a date or timestamp.

  • month(toDate('2012-8-8')) -> 8

monthsBetween

monthsBetween(<from date/timestamp> : datetime, <to date/timestamp> : datetime, [<roundoff> : boolean], [<time zone> : string]) => double

Gets the number of months between two dates. You can round off the calculation.You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • monthsBetween(toTimestamp('1997-02-28 10:30:00'), toDate('1996-10-30')) -> 3.94959677

multiply

multiply(<value1> : any, <value2> : any) => any

Multiplies pair of numbers. Same as the * operator.

  • multiply(20, 10) -> 200
  • 20 * 10 -> 200

negate

negate(<value1> : number) => number

Negates a number. Turns positive numbers to negative and vice versa.

  • negate(13) -> -13

nextSequence

nextSequence() => long

Returns the next unique sequence. The number is consecutive only within a partition and is prefixed by the partitionId.

  • nextSequence() == 12313112 -> false

normalize

normalize(<String to normalize> : string) => string

Normalizes the string value to separate accented unicode characters.

  • regexReplace(normalize('bo²s'), `\p{M}`, '') -> 'boys'

not

not(<value1> : boolean) => boolean

Logical negation operator.

  • not(true) -> false
  • not(10 == 20) -> true

notEquals

notEquals(<value1> : any, <value2> : any) => boolean

Comparison not equals operator. Same as != operator.

  • 12 != 24 -> true
  • 'bojjus' != 'bo' + 'jjus' -> false

notNull

notNull(<value1> : any) => boolean

Checks if the value is not NULL.

  • notNull(NULL()) -> false
  • notNull('') -> true

nTile

nTile([<value1> : integer]) => integer

The NTile function divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values will differ by at most 1. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket. The NTile function is useful for the calculation of tertiles, quartiles, deciles, and other common summary statistics. The function calculates two variables during initialization: The size of a regular bucket will have one extra row added to it. Both variables are based on the size of the current partition. During the calculation process the function keeps track of the current row number, the current bucket number, and the row number at which the bucket will change (bucketThreshold). When the current row number reaches bucket threshold, the bucket value is increased by one and the threshold is increased by the bucket size (plus one extra if the current bucket is padded).

  • nTile()
  • nTile(numOfBuckets)

null

null() => null

Returns a NULL value. Use the function syntax(null()) if there is a column named 'null'. Any operation that uses will result in a NULL.

  • isNull('dumbo' + null) -> true
  • isNull(10 * null) -> true
  • isNull('') -> false
  • isNull(10 + 20) -> false
  • isNull(10/0) -> true

or

or(<value1> : boolean, <value2> : boolean) => boolean

Logical OR operator. Same as ||.

  • or(true, false) -> true
  • true || false -> true

originColumns

originColumns(<streamName> : string) => any

Gets all output columns for a origin stream where columns were created. Must be enclosed in another function.

  • array(toString(originColumns('source1')))

output

output() => any

Returns the first row of the results of the cache sink

  • cacheSink#output()

outputs

output() => any

Returns the entire output row set of the results of the cache sink

  • cacheSink#outputs()

partitionId

partitionId() => integer

Returns the current partition ID the input row is in.

  • partitionId()

pMod

pMod(<value1> : any, <value2> : any) => any

Positive Modulus of pair of numbers.

  • pmod(-20, 8) -> 4

power

power(<value1> : number, <value2> : number) => double

Raises one number to the power of another.

  • power(10, 2) -> 100

radians

radians(<value1> : number) => double

Converts degrees to radians

  • radians(180) => 3.141592653589793

random

random(<value1> : integral) => long

Returns a random number given an optional seed within a partition. The seed should be a fixed value and is used in conjunction with the partitionId to produce random values

  • random(1) == 1 -> false

rank

rank() => integer

Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence. Rank works even when data is not sorted and looks for change in values.

  • rank()

reassociate

reassociate(<value1> : map, <value2> : binaryFunction) => map

Transforms a map by associating the keys to new values. It takes a mapping function where you can address the item as #key and current value as #value.

  • reassociate(['fruit' -> 'apple', 'vegetable' -> 'tomato'], substring(#key, 1, 1) + substring(#value, 1, 1)) => ['fruit' -> 'fa', 'vegetable' -> 'vt']

reduce

reduce(<value1> : array, <value2> : any, <value3> : binaryfunction, <value4> : unaryfunction) => any

Accumulates elements in an array. Reduce expects a reference to an accumulator and one element in the first expression function as #acc and #item and it expects the resulting value as #result to be used in the second expression function.

  • toString(reduce(['1', '2', '3', '4'], '0', #acc + #item, #result)) -> '01234'

regexExtract

regexExtract(<string> : string, <regex to find> : string, [<match group 1-based index> : integral]) => string

Extract a matching substring for a given regex pattern. The last parameter identifies the match group and is defaulted to 1 if omitted. Use <regex>(back quote) to match a string without escaping.

  • regexExtract('Cost is between 600 and 800 dollars', '(\\d+) and (\\d+)', 2) -> '800'
  • regexExtract('Cost is between 600 and 800 dollars', `(\d+) and (\d+)`, 2) -> '800'

regexMatch

regexMatch(<string> : string, <regex to match> : string) => boolean

Checks if the string matches the given regex pattern. Use <regex>(back quote) to match a string without escaping.

  • regexMatch('200.50', '(\\d+).(\\d+)') -> true
  • regexMatch('200.50', `(\d+).(\d+)`) -> true

regexReplace

regexReplace(<string> : string, <regex to find> : string, <substring to replace> : string) => string

Replace all occurrences of a regex pattern with another substring in the given string Use <regex>(back quote) to match a string without escaping.

  • regexReplace('100 and 200', '(\\d+)', 'bojjus') -> 'bojjus and bojjus'
  • regexReplace('100 and 200', `(\d+)`, 'gunchus') -> 'gunchus and gunchus'

regexSplit

regexSplit(<string to split> : string, <regex expression> : string) => array

Splits a string based on a delimiter based on regex and returns an array of strings.

  • regexSplit('bojjusAgunchusBdumbo', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo']
  • regexSplit('bojjusAgunchusBdumboC', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo', '']
  • (regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[1]) -> 'bojjus'
  • isNull(regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[20]) -> true

replace

replace(<string> : string, <substring to find> : string, [<substring to replace> : string]) => string

Replace all occurrences of a substring with another substring in the given string. If the last parameter is omitted, it is default to empty string.

  • replace('doggie dog', 'dog', 'cat') -> 'catgie cat'
  • replace('doggie dog', 'dog', '') -> 'gie '
  • replace('doggie dog', 'dog') -> 'gie '

reverse

reverse(<value1> : string) => string

Reverses a string.

  • reverse('gunchus') -> 'suhcnug'

right(<string to subset> : string, <number of characters> : integral) => string

Extracts a substring with number of characters from the right. Same as SUBSTRING(str, LENGTH(str) - n, n).

  • right('bojjus', 2) -> 'us'
  • right('bojjus', 20) -> 'bojjus'

rlike

rlike(<string> : string, <pattern match> : string) => boolean

Checks if the string matches the given regex pattern.

  • rlike('200.50', `(\d+).(\d+)`) -> true
  • rlike('bogus', `M[0-9]+.*`) -> false

round

round(<number> : number, [<scale to round> : number], [<rounding option> : integral]) => double

Rounds a number given an optional scale and an optional rounding mode. If the scale is omitted, it is defaulted to 0. If the mode is omitted, it is defaulted to ROUND_HALF_UP(5). The values for rounding include 1 - ROUND_UP 2 - ROUND_DOWN 3 - ROUND_CEILING 4 - ROUND_FLOOR 5 - ROUND_HALF_UP 6 - ROUND_HALF_DOWN 7 - ROUND_HALF_EVEN 8 - ROUND_UNNECESSARY.

  • round(100.123) -> 100.0
  • round(2.5, 0) -> 3.0
  • round(5.3999999999999995, 2, 7) -> 5.40

rowNumber

rowNumber() => integer

Assigns a sequential row numbering for rows in a window starting with 1.

  • rowNumber()

rpad

rpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string

Right pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it is trimmed to the length.

  • rpad('dumbo', 10, '-') -> 'dumbo-----'
  • rpad('dumbo', 4, '-') -> 'dumb'
  • rpad('dumbo', 8, '<>') -> 'dumbo<><'

rtrim

rtrim(<string to trim> : string, [<trim characters> : string]) => string

Right trims a string of trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.

  • rtrim(' dumbo ') -> ' dumbo'
  • rtrim('!--!du!mbo!', '-!') -> '!--!du!mbo'

second

second(<value1> : timestamp, [<value2> : string]) => integer

Gets the second value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • second(toTimestamp('2009-07-30 12:58:59')) -> 59

seconds

seconds(<value1> : integer) => long

Duration in milliseconds for number of seconds.

  • seconds(2) -> 2000L

setBitSet

setBitSet (<value1>: array, <value2>:array) => array

Sets bit positions in this bitset

  • setBitSet(toBitSet([10, 32]), [98]) => [4294968320L, 17179869184L]

sha1

sha1(<value1> : any, ...) => string

Calculates the SHA-1 digest of set of column of varying primitive datatypes and returns a 40 character hex string. It can be used to calculate a fingerprint for a row.

  • sha1(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '46d3b478e8ec4e1f3b453ac3d8e59d5854e282bb'

sha2

sha2(<value1> : integer, <value2> : any, ...) => string

Calculates the SHA-2 digest of set of column of varying primitive datatypes given a bit length which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.

  • sha2(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 'afe8a553b1761c67d76f8c31ceef7f71b66a1ee6f4e6d3b5478bf68b47d06bd3'

sin

sin(<value1> : number) => double

Calculates a sine value.

  • sin(2) -> 0.9092974268256817

sinh

sinh(<value1> : number) => double

Calculates a hyperbolic sine value.

  • sinh(0) -> 0.0

size

size(<value1> : any) => integer

Finds the size of an array or map type

  • size(['element1', 'element2']) -> 2
  • size([1,2,3]) -> 3

skewness

skewness(<value1> : number) => double

Gets the skewness of a column.

  • skewness(sales)

skewnessIf

skewnessIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the skewness of a column.

  • skewnessIf(region == 'West', sales)

slice

slice(<array to slice> : array, <from 1-based index> : integral, [<number of items> : integral]) => array

Extracts a subset of an array from a position. Position is 1 based. If the length is omitted, it is defaulted to end of the string.

  • slice([10, 20, 30, 40], 1, 2) -> [10, 20]
  • slice([10, 20, 30, 40], 2) -> [20, 30, 40]
  • slice([10, 20, 30, 40], 2)[1] -> 20
  • isNull(slice([10, 20, 30, 40], 2)[0]) -> true
  • isNull(slice([10, 20, 30, 40], 2)[20]) -> true
  • slice(['a', 'b', 'c', 'd'], 8) -> []

sort

sort(<value1> : array, <value2> : binaryfunction) => array

Sorts the array using the provided predicate function. Sort expects a reference to two consecutive elements in the expression function as #item1 and #item2.

  • sort([4, 8, 2, 3], compare(#item1, #item2)) -> [2, 3, 4, 8]
  • sort(['a3', 'b2', 'c1'], iif(right(#item1, 1) >= right(#item2, 1), 1, -1)) -> ['c1', 'b2', 'a3']

soundex

soundex(<value1> : string) => string

Gets the soundex code for the string.

  • soundex('genius') -> 'G520'

split

split(<string to split> : string, <split characters> : string) => array

Splits a string based on a delimiter and returns an array of strings.

  • split('bojjus,guchus,dumbo', ',') -> ['bojjus', 'guchus', 'dumbo']
  • split('bojjus,guchus,dumbo', '|') -> ['bojjus,guchus,dumbo']
  • split('bojjus, guchus, dumbo', ', ') -> ['bojjus', 'guchus', 'dumbo']
  • split('bojjus, guchus, dumbo', ', ')[1] -> 'bojjus'
  • isNull(split('bojjus, guchus, dumbo', ', ')[0]) -> true
  • isNull(split('bojjus, guchus, dumbo', ', ')[20]) -> true
  • split('bojjusguchusdumbo', ',') -> ['bojjusguchusdumbo']

sqrt

sqrt(<value1> : number) => double

Calculates the square root of a number.

  • sqrt(9) -> 3

startsWith

startsWith(<string> : string, <substring to check> : string) => boolean

Checks if the string starts with the supplied string.

  • startsWith('dumbo', 'du') -> true

stddev

stddev(<value1> : number) => double

Gets the standard deviation of a column.

  • stdDev(sales)

stddevIf

stddevIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the standard deviation of a column.

  • stddevIf(region == 'West', sales)

stddevPopulation

stddevPopulation(<value1> : number) => double

Gets the population standard deviation of a column.

  • stddevPopulation(sales)

stddevPopulationIf

stddevPopulationIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the population standard deviation of a column.

  • stddevPopulationIf(region == 'West', sales)

stddevSample

stddevSample(<value1> : number) => double

Gets the sample standard deviation of a column.

  • stddevSample(sales)

stddevSampleIf

stddevSampleIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the sample standard deviation of a column.

  • stddevSampleIf(region == 'West', sales)

subDays

subDays(<date/timestamp> : datetime, <days to subtract> : integral) => datetime

Subtract days from a date or timestamp. Same as the - operator for date.

  • subDays(toDate('2016-08-08'), 1) -> toDate('2016-08-07')

subMonths

subMonths(<date/timestamp> : datetime, <months to subtract> : integral) => datetime

Subtract months from a date or timestamp.

  • subMonths(toDate('2016-09-30'), 1) -> toDate('2016-08-31')

substring

substring(<string to subset> : string, <from 1-based index> : integral, [<number of characters> : integral]) => string

Extracts a substring of a certain length from a position. Position is 1 based. If the length is omitted, it is defaulted to end of the string.

  • substring('Cat in the hat', 5, 2) -> 'in'
  • substring('Cat in the hat', 5, 100) -> 'in the hat'
  • substring('Cat in the hat', 5) -> 'in the hat'
  • substring('Cat in the hat', 100, 100) -> ''

sum

sum(<value1> : number) => number

Gets the aggregate sum of a numeric column.

  • sum(col)

sumDistinct

sumDistinct(<value1> : number) => number

Gets the aggregate sum of distinct values of a numeric column.

  • sumDistinct(col)

sumDistinctIf

sumDistinctIf(<value1> : boolean, <value2> : number) => number

Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.

  • sumDistinctIf(state == 'CA' && commission < 10000, sales)
  • sumDistinctIf(true, sales)

sumIf

sumIf(<value1> : boolean, <value2> : number) => number

Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.

  • sumIf(state == 'CA' && commission < 10000, sales)
  • sumIf(true, sales)

tan

tan(<value1> : number) => double

Calculates a tangent value.

  • tan(0) -> 0.0

tanh

tanh(<value1> : number) => double

Calculates a hyperbolic tangent value.

  • tanh(0) -> 0.0

toBase64

toBase64(<value1> : string) => string

Encodes the given string in base64.

  • toBase64('bojjus') -> 'Ym9qanVz'

toBinary

toBinary(<value1> : any) => binary

Converts any numeric/date/timestamp/string to binary representation.

  • toBinary(3) -> [0x11]

toBoolean

toBoolean(<value1> : string) => boolean

Converts a value of ('t', 'true', 'y', 'yes', '1') to true and ('f', 'false', 'n', 'no', '0') to false and NULL for any other value.

  • toBoolean('true') -> true
  • toBoolean('n') -> false
  • isNull(toBoolean('truthy')) -> true

toByte

toByte(<value> : any, [<format> : string], [<locale> : string]) => byte

Converts any numeric or string to a byte value. An optional Java decimal format can be used for the conversion.

  • toByte(123)
  • 123
  • toByte(0xFF)
  • -1
  • toByte('123')
  • 123

toDate

toDate(<string> : any, [<date format> : string]) => date

Converts input date string to date using an optional input date format. Refer Java's SimpleDateFormat class for available formats. If the input date format is omitted, default format is yyyy-[M]M-[d]d. Accepted formats are :[ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ].

  • toDate('2012-8-18') -> toDate('2012-08-18')
  • toDate('12/18/2012', 'MM/dd/yyyy') -> toDate('2012-12-18')

toDecimal

toDecimal(<value> : any, [<precision> : integral], [<scale> : integral], [<format> : string], [<locale> : string]) => decimal(10,0)

Converts any numeric or string to a decimal value. If precision and scale are not specified, it is defaulted to (10,2).An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.

  • toDecimal(123.45) -> 123.45
  • toDecimal('123.45', 8, 4) -> 123.4500
  • toDecimal('$123.45', 8, 4,'$###.00') -> 123.4500
  • toDecimal('Ç123,45', 10, 2, 'Ç###,##', 'de') -> 123.45

toDouble

toDouble(<value> : any, [<format> : string], [<locale> : string]) => double

Converts any numeric or string to a double value. An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.

  • toDouble(123.45) -> 123.45
  • toDouble('123.45') -> 123.45
  • toDouble('$123.45', '$###.00') -> 123.45
  • toDouble('Ç123,45', 'Ç###,##', 'de') -> 123.45

toFloat

toFloat(<value> : any, [<format> : string], [<locale> : string]) => float

Converts any numeric or string to a float value. An optional Java decimal format can be used for the conversion. Truncates any double.

  • toFloat(123.45) -> 123.45f
  • toFloat('123.45') -> 123.45f
  • toFloat('$123.45', '$###.00') -> 123.45f

toInteger

toInteger(<value> : any, [<format> : string], [<locale> : string]) => integer

Converts any numeric or string to an integer value. An optional Java decimal format can be used for the conversion. Truncates any long, float, double.

  • toInteger(123) -> 123
  • toInteger('123') -> 123
  • toInteger('$123', '$###') -> 123

toLong

toLong(<value> : any, [<format> : string], [<locale> : string]) => long

Converts any numeric or string to a long value. An optional Java decimal format can be used for the conversion. Truncates any float, double.

  • toLong(123) -> 123
  • toLong('123') -> 123
  • toLong('$123', '$###') -> 123

toShort

toShort(<value> : any, [<format> : string], [<locale> : string]) => short

Converts any numeric or string to a short value. An optional Java decimal format can be used for the conversion. Truncates any integer, long, float, double.

  • toShort(123) -> 123
  • toShort('123') -> 123
  • toShort('$123', '$###') -> 123

toString

toString(<value> : any, [<number format/date format> : string]) => string

Converts a primitive datatype to a string. For numbers and date a format can be specified. If unspecified the system default is picked.Java decimal format is used for numbers. Refer to Java SimpleDateFormat for all possible date formats; the default format is yyyy-MM-dd.

  • toString(10) -> '10'
  • toString('engineer') -> 'engineer'
  • toString(123456.789, '##,###.##') -> '123,456.79'
  • toString(123.78, '000000.000') -> '000123.780'
  • toString(12345, '##0.#####E0') -> '12.345E3'
  • toString(toDate('2018-12-31')) -> '2018-12-31'
  • isNull(toString(toDate('2018-12-31', 'MM/dd/yy'))) -> true
  • toString(4 == 20) -> 'false'

toTimestamp

toTimestamp(<string> : any, [<timestamp format> : string], [<time zone> : string]) => timestamp

Converts a string to a timestamp given an optional timestamp format. If the timestamp is omitted the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. Timestamp supports up to millisecond accuracy with value of 999. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • toTimestamp('2016-12-31 00:12:00') -> toTimestamp('2016-12-31 00:12:00')
  • toTimestamp('2016-12-31T00:12:00', 'yyyy-MM-dd\'T\'HH:mm:ss', 'PST') -> toTimestamp('2016-12-31 00:12:00')
  • toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss') -> toTimestamp('2016-12-31 00:12:00')
  • millisecond(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871

toUTC

toUTC(<value1> : timestamp, [<value2> : string]) => timestamp

Converts the timestamp to UTC. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

  • toUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> false
  • toUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true

translate

translate(<string to translate> : string, <lookup characters> : string, <replace characters> : string) => string

Replace one set of characters by another set of characters in the string. Characters have 1 to 1 replacement.

  • translate('(bojjus)', '()', '[]') -> '[bojjus]'
  • translate('(gunchus)', '()', '[') -> '[gunchus'

trim

trim(<string to trim> : string, [<trim characters> : string]) => string

Trims a string of leading and trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.

  • trim(' dumbo ') -> 'dumbo'
  • trim('!--!du!mbo!', '-!') -> 'du!mbo'

true

true() => boolean

Always returns a true value. Use the function syntax(true()) if there is a column named 'true'.

  • (10 + 20 == 30) -> true
  • (10 + 20 == 30) -> true()

typeMatch

typeMatch(<type> : string, <base type> : string) => boolean

Matches the type of the column. Can only be used in pattern expressions.number matches short, integer, long, double, float or decimal, integral matches short, integer, long, fractional matches double, float, decimal and datetime matches date or timestamp type.

  • typeMatch(type, 'number')
  • typeMatch('date', 'datetime')

unescape

unescape(<string_to_escape> : string, <format> : string) => string

Unescapes a string according to a format. Literal values for acceptable format are 'json', 'xml', 'ecmascript', 'html', 'java'.

  • unescape('{\\\\\"value\\\\\": 10}', 'json')
  • '{\\\"value\\\": 10}'

unfold

unfold (<value1>: array) => any

Unfolds an array into a set of rows and repeats the values for the remaining columns in every row.

  • unfold(addresses) => any
  • unfold( @(name = salesPerson, sales = salesAmount) ) => any

unhex

unhex(<value1>: string) => binary

Unhexes a binary value from its string representation. This can be used in conjunction with sha2, md5 to convert from string to binary representation

  • unhex('1fadbe') -> toBinary([toByte(0x1f), toByte(0xad), toByte(0xbe)])
  • unhex(md5(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4'))) -> toBinary([toByte(0x4c),toByte(0xe8),toByte(0xa8),toByte(0x80),toByte(0xbd),toByte(0x62),toByte(0x1a),toByte(0x1f),toByte(0xfa),toByte(0xd0),toByte(0xbc),toByte(0xa9),toByte(0x05),toByte(0xe1),toByte(0xbc),toByte(0x5a)])

union

union(<value1>: array, <value2> : array) => array

Returns a union set of distinct items from 2 arrays.

  • union([10, 20, 30], [20, 40]) => [10, 20, 30, 40]

upper

upper(<value1> : string) => string

Uppercases a string.

  • upper('bojjus') -> 'BOJJUS'

uuid

uuid() => string

Returns the generated UUID.

  • uuid()

variance

variance(<value1> : number) => double

Gets the variance of a column.

  • variance(sales)

varianceIf

varianceIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the variance of a column.

  • varianceIf(region == 'West', sales)

variancePopulation

variancePopulation(<value1> : number) => double

Gets the population variance of a column.

  • variancePopulation(sales)

variancePopulationIf

variancePopulationIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the population variance of a column.

  • variancePopulationIf(region == 'West', sales)

varianceSample

varianceSample(<value1> : number) => double

Gets the unbiased variance of a column.

  • varianceSample(sales)

varianceSampleIf

varianceSampleIf(<value1> : boolean, <value2> : number) => double

Based on a criteria, gets the unbiased variance of a column.

  • varianceSampleIf(region == 'West', sales)

weekOfYear

weekOfYear(<value1> : datetime) => integer

Gets the week of the year given a date.

  • weekOfYear(toDate('2008-02-20')) -> 8

weeks

weeks(<value1> : integer) => long

Duration in milliseconds for number of weeks.

  • weeks(2) -> 1209600000L

xor

xor(<value1> : boolean, <value2> : boolean) => boolean

Logical XOR operator. Same as ^ operator.

  • xor(true, false) -> true
  • xor(true, true) -> false
  • true ^ false -> true

year

year(<value1> : datetime) => integer

Gets the year value of a date.

  • year(toDate('2012-8-8')) -> 2012

Next steps

Learn how to use Expression Builder.