Data transformation expressions in mapping data flow
Azure Data Factory
Azure Synapse Analytics
Expression functions
In Data Factory, use the expression language of the mapping data flow feature to configure data transformations.
abs
abs(<value1> : number) => number
Absolute value of a number.
abs(-20) -> 20
abs(10) -> 10
acos
acos(<value1> : number) => double
Calculates a cosine inverse value.
acos(1) -> 0.0
add
add(<value1> : any, <value2> : any) => any
Adds a pair of strings or numbers. Adds a date to a number of days. Adds a duration to a timestamp. Appends one array of similar type to another. Same as the + operator.
add(10, 20) -> 30
10 + 20 -> 30
add('ice', 'cream') -> 'icecream'
'ice' + 'cream' + ' cone' -> 'icecream cone'
add(toDate('2012-12-12'), 3) -> toDate('2012-12-15')
toDate('2012-12-12') + 3 -> toDate('2012-12-15')
[10, 20] + [30, 40] -> [10, 20, 30, 40]
toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')
addDays
addDays(<date/timestamp> : datetime, <days to add> : integral) => datetime
Add days to a date or timestamp. Same as the + operator for date.
addDays(toDate('2016-08-08'), 1) -> toDate('2016-08-09')
addMonths
addMonths(<date/timestamp> : datetime, <months to add> : integral, [<value3> : string]) => datetime
Add months to a date or timestamp. You can optionally pass a timezone.
addMonths(toDate('2016-08-31'), 1) -> toDate('2016-09-30')
addMonths(toTimestamp('2016-09-30 10:10:10'), -1) -> toTimestamp('2016-08-31 10:10:10')
and
and(<value1> : boolean, <value2> : boolean) => boolean
Logical AND operator. Same as &&.
and(true, false) -> false
true && false -> false
asin
asin(<value1> : number) => double
Calculates an inverse sine value.
asin(0) -> 0.0
atan
atan(<value1> : number) => double
Calculates a inverse tangent value.
atan(0) -> 0.0
atan2
atan2(<value1> : number, <value2> : number) => double
Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates.
atan2(0, 0) -> 0.0
case
case(<condition> : boolean, <true_expression> : any, <false_expression> : any, ...) => any
Based on alternating conditions applies one value or the other. If the number of inputs are even, the other is defaulted to NULL for last condition.
case(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'
case(10 + 20 == 25, 'bojjus', 'do' < 'go', 'gunchus') -> 'gunchus'
isNull(case(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus')) -> true
case(10 + 20 == 25, 'bojjus', 'do' > 'go', 'gunchus', 'dumbo') -> 'dumbo'
cbrt
cbrt(<value1> : number) => double
Calculates the cube root of a number.
cbrt(8) -> 2.0
ceil
ceil(<value1> : number) => number
Returns the smallest integer not smaller than the number.
ceil(-0.1) -> 0
coalesce
coalesce(<value1> : any, ...) => any
Returns the first not null value from a set of inputs. All inputs should be of the same type.
coalesce(10, 20) -> 10
coalesce(toString(null), toString(null), 'dumbo', 'bo', 'go') -> 'dumbo'
collect
collect(<value1> : any) => array
Collects all values of the expression in the aggregated group into an array. Structures can be collected and transformed to alternate structures during this process. The number of items will be equal to the number of rows in that group and can contain null values. The number of collected items should be small.
collect(salesPerson)
collect(firstName + lastName))
collect(@(name = salesPerson, sales = salesAmount) )
columnNames
columnNames(<value1> : string) => array
Gets all output columns for a stream. You can pass an optional stream name as the second argument.
columnNames()
columnNames('DeriveStream')
columns
columns([<stream name> : string]) => any
Gets all output columns for a stream. You can pass an optional stream name as the second argument.
columns()
columns('DeriveStream')
compare
compare(<value1> : any, <value2> : any) => integer
Compares two values of the same type. Returns negative integer if value1 < value2, 0 if value1 == value2, positive value if value1 > value2.
(compare(12, 24) < 1) -> true
(compare('dumbo', 'dum') > 0) -> true
concat
concat(<this> : string, <that> : string, ...) => string
Concatenates a variable number of strings together. Same as the + operator with strings.
concat('dataflow', 'is', 'awesome') -> 'dataflowisawesome'
'dataflow' + 'is' + 'awesome' -> 'dataflowisawesome'
isNull('sql' + null) -> true
concatWS
concatWS(<separator> : string, <this> : string, <that> : string, ...) => string
Concatenates a variable number of strings together with a separator. The first parameter is the separator.
concatWS(' ', 'dataflow', 'is', 'awesome') -> 'dataflow is awesome'
isNull(concatWS(null, 'dataflow', 'is', 'awesome')) -> true
concatWS(' is ', 'dataflow', 'awesome') -> 'dataflow is awesome'
contains
contains(<value1> : array, <value2> : unaryfunction) => boolean
Returns true if any element in the provided array evaluates as true in the provided predicate. Contains expects a reference to one element in the predicate function as #item.
contains([1, 2, 3, 4], #item == 3) -> true
contains([1, 2, 3, 4], #item > 5) -> false
cos
cos(<value1> : number) => double
Calculates a cosine value.
cos(10) -> -0.8390715290764524
cosh
cosh(<value1> : number) => double
Calculates a hyperbolic cosine of a value.
cosh(0) -> 1.0
crc32
crc32(<value1> : any, ...) => long
Calculates the CRC32 hash of set of column of varying primitive datatypes given a bit length which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.
crc32(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 3630253689L
currentDate
currentDate([<value1> : string]) => date
Gets the current date when this job starts to run. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
currentDate() == toDate('2250-12-31') -> false
currentDate('PST') == toDate('2250-12-31') -> false
currentDate('America/New_York') == toDate('2250-12-31') -> false
currentTimestamp
currentTimestamp() => timestamp
Gets the current timestamp when the job starts to run with local time zone.
currentTimestamp() == toTimestamp('2250-12-31 12:12:12') -> false
currentUTC
currentUTC([<value1> : string]) => timestamp
Gets the current timestamp as UTC. If you want your current time to be interpreted in a different timezone than your cluster time zone, you can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html. To convert the UTC time to a different timezone use fromUTC()
.
currentUTC() == toTimestamp('2050-12-12 19:18:12') -> false
currentUTC() != toTimestamp('2050-12-12 19:18:12') -> true
fromUTC(currentUTC(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true
dayOfMonth
dayOfMonth(<value1> : datetime) => integer
Gets the day of the month given a date.
dayOfMonth(toDate('2018-06-08')) -> 8
dayOfWeek
dayOfWeek(<value1> : datetime) => integer
Gets the day of the week given a date. 1 - Sunday, 2 - Monday ..., 7 - Saturday.
dayOfWeek(toDate('2018-06-08')) -> 6
dayOfYear
dayOfYear(<value1> : datetime) => integer
Gets the day of the year given a date.
dayOfYear(toDate('2016-04-09')) -> 100
days
days(<value1> : integer) => long
Duration in milliseconds for number of days.
days(2) -> 172800000L
degrees
degrees(<value1> : number) => double
Converts radians to degrees.
degrees(3.141592653589793) -> 180
divide
divide(<value1> : any, <value2> : any) => any
Divides pair of numbers. Same as the /
operator.
divide(20, 10) -> 2
20 / 10 -> 2
endsWith
endsWith(<string> : string, <substring to check> : string) => boolean
Checks if the string ends with the supplied string.
endsWith('dumbo', 'mbo') -> true
equals
equals(<value1> : any, <value2> : any) => boolean
Comparison equals operator. Same as == operator.
equals(12, 24) -> false
12 == 24 -> false
'bad' == 'bad' -> true
isNull('good' == toString(null)) -> true
isNull(null == null) -> true
equalsIgnoreCase
equalsIgnoreCase(<value1> : string, <value2> : string) => boolean
Comparison equals operator ignoring case. Same as <=> operator.
'abc'<=>'Abc' -> true
equalsIgnoreCase('abc', 'Abc') -> true
factorial
factorial(<value1> : number) => long
Calculates the factorial of a number.
factorial(5) -> 120
false
false() => boolean
Always returns a false value. Use the function syntax(false())
if there is a column named 'false'.
(10 + 20 > 30) -> false
(10 + 20 > 30) -> false()
floor
floor(<value1> : number) => number
Returns the largest integer not greater than the number.
floor(-0.1) -> -1
fromBase64
fromBase64(<value1> : string) => string
Encodes the given string in base64.
fromBase64('Z3VuY2h1cw==') -> 'gunchus'
fromUTC
fromUTC(<value1> : timestamp, [<value2> : string]) => timestamp
Converts to the timestamp from UTC. You can optionally pass the timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
fromUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> false
fromUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true
greater
greater(<value1> : any, <value2> : any) => boolean
Comparison greater operator. Same as > operator.
greater(12, 24) -> false
('dumbo' > 'dum') -> true
(toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS') > toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true
greaterOrEqual
greaterOrEqual(<value1> : any, <value2> : any) => boolean
Comparison greater than or equal operator. Same as >= operator.
greaterOrEqual(12, 12) -> true
('dumbo' >= 'dum') -> true
greatest
greatest(<value1> : any, ...) => any
Returns the greatest value among the list of values as input skipping null values. Returns null if all inputs are null.
greatest(10, 30, 15, 20) -> 30
greatest(10, toInteger(null), 20) -> 20
greatest(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2011-12-12')
greatest(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS'), toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')
hasColumn
hasColumn(<column name> : string, [<stream name> : string]) => boolean
Checks for a column value by name in the stream. You can pass a optional stream name as the second argument. Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
hasColumn('parent')
hour
hour(<value1> : timestamp, [<value2> : string]) => integer
Gets the hour value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
hour(toTimestamp('2009-07-30 12:58:59')) -> 12
hour(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 12
hours
hours(<value1> : integer) => long
Duration in milliseconds for number of hours.
hours(2) -> 7200000L
iif
iif(<condition> : boolean, <true_expression> : any, [<false_expression> : any]) => any
Based on a condition applies one value or the other. If other is unspecified it is considered NULL. Both the values must be compatible(numeric, string...).
iif(10 + 20 == 30, 'dumbo', 'gumbo') -> 'dumbo'
iif(10 > 30, 'dumbo', 'gumbo') -> 'gumbo'
iif(month(toDate('2018-12-01')) == 12, 345.12, 102.67) -> 345.12
iifNull
iifNull(<value1> : any, [<value2> : any], ...) => any
Checks if the first parameter is null. If not null, the first parameter is returned. If null, the second parameter is returned. If three parameters are specified, the behavior is the same as iif(isNull(value1), value2, value3) and the third parameter is returned if the first value is not null.
iifNull(10, 20) -> 10
iifNull(null, 20, 40) -> 20
iifNull('azure', 'data', 'factory') -> 'factory'
iifNull(null, 'data', 'factory') -> 'data'
in
in(<array of items> : array, <item to find> : any) => boolean
Checks if an item is in the array.
in([10, 20, 30], 10) -> true
in(['good', 'kid'], 'bad') -> false
initCap
initCap(<value1> : string) => string
Converts the first letter of every word to uppercase. Words are identified as separated by whitespace.
initCap('cool iceCREAM') -> 'Cool Icecream'
instr
instr(<string> : string, <substring to find> : string) => integer
Finds the position(1 based) of the substring within a string. 0 is returned if not found.
instr('dumbo', 'mbo') -> 3
instr('microsoft', 'o') -> 5
instr('good', 'bad') -> 0
isDelete
isDelete([<value1> : integer]) => boolean
Checks if the row is marked for delete. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isDelete()
isDelete(1)
isError
isError([<value1> : integer]) => boolean
Checks if the row is marked as error. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isError()
isError(1)
isIgnore
isIgnore([<value1> : integer]) => boolean
Checks if the row is marked to be ignored. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isIgnore()
isIgnore(1)
isInsert
isInsert([<value1> : integer]) => boolean
Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isInsert()
isInsert(1)
isMatch
isMatch([<value1> : integer]) => boolean
Checks if the row is matched at lookup. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isMatch()
isMatch(1)
isNull
isNull(<value1> : any) => boolean
Checks if the value is NULL.
isNull(NULL()) -> true
isNull('') -> false
isUpdate
isUpdate([<value1> : integer]) => boolean
Checks if the row is marked for update. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isUpdate()
isUpdate(1)
isUpsert
isUpsert([<value1> : integer]) => boolean
Checks if the row is marked for insert. For transformations taking more than one input stream you can pass the (1-based) index of the stream. The stream index should be either 1 or 2 and the default value is 1.
isUpsert()
isUpsert(1)
lastDayOfMonth
lastDayOfMonth(<value1> : datetime) => date
Gets the last date of the month given a date.
lastDayOfMonth(toDate('2009-01-12')) -> toDate('2009-01-31')
least
least(<value1> : any, ...) => any
Comparison lesser than or equal operator. Same as <= operator.
least(10, 30, 15, 20) -> 10
least(toDate('2010-12-12'), toDate('2011-12-12'), toDate('2000-12-12')) -> toDate('2000-12-12')
left
left(<string to subset> : string, <number of characters> : integral) => string
Extracts a substring start at index 1 with number of characters. Same as SUBSTRING(str, 1, n).
left('bojjus', 2) -> 'bo'
left('bojjus', 20) -> 'bojjus'
length
length(<value1> : string) => integer
Returns the length of the string.
length('dumbo') -> 5
lesser
lesser(<value1> : any, <value2> : any) => boolean
Comparison less operator. Same as < operator.
lesser(12, 24) -> true
('abcd' < 'abc') -> false
(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') < toTimestamp('2019-02-05 08:21:34.890', 'yyyy-MM-dd HH:mm:ss.SSS')) -> true
lesserOrEqual
lesserOrEqual(<value1> : any, <value2> : any) => boolean
Comparison lesser than or equal operator. Same as <= operator.
lesserOrEqual(12, 12) -> true
('dumbo' <= 'dum') -> false
levenshtein
levenshtein(<from string> : string, <to string> : string) => integer
Gets the levenshtein distance between two strings.
levenshtein('boys', 'girls') -> 4
like
like(<string> : string, <pattern match> : string) => boolean
The pattern is a string that is matched literally. The exceptions are the following special symbols: _ matches any one character in the input (similar to . in posix
regular expressions)
% matches zero or more characters in the input (similar to .* in posix
regular expressions).
The escape character is ''. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character.
like('icecream', 'ice%') -> true
locate
locate(<substring to find> : string, <string> : string, [<from index - 1-based> : integral]) => integer
Finds the position(1 based) of the substring within a string starting a certain position. If the position is omitted it is considered from the beginning of the string. 0 is returned if not found.
locate('mbo', 'dumbo') -> 3
locate('o', 'microsoft', 6) -> 7
locate('bad', 'good') -> 0
log
log(<value1> : number, [<value2> : number]) => double
Calculates log value. An optional base can be supplied else a Euler number if used.
log(100, 10) -> 2
log10
log10(<value1> : number) => double
Calculates log value based on 10 base.
log10(100) -> 2
lower
lower(<value1> : string) => string
Lowercases a string.
lower('GunChus') -> 'gunchus'
lpad
lpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string
Left pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it is trimmed to the length.
lpad('dumbo', 10, '-') -> '-----dumbo'
lpad('dumbo', 4, '-') -> 'dumb'
lpad('dumbo', 8, '<>') -> '<><dumbo'
ltrim
ltrim(<string to trim> : string, [<trim characters> : string]) => string
Left trims a string of leading characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.
ltrim(' dumbo ') -> 'dumbo '
ltrim('!--!du!mbo!', '-!') -> 'du!mbo!'
md5
md5(<value1> : any, ...) => string
Calculates the MD5 digest of set of column of varying primitive datatypes and returns a 32 character hex string. It can be used to calculate a fingerprint for a row.
md5(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '4ce8a880bd621a1ffad0bca905e1bc5a'
millisecond
millisecond(<value1> : timestamp, [<value2> : string]) => integer
Gets the millisecond value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
millisecond(toTimestamp('2009-07-30 12:58:59.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871
milliseconds
milliseconds(<value1> : integer) => long
Duration in milliseconds for number of milliseconds.
milliseconds(2) -> 2L
minus
minus(<value1> : any, <value2> : any) => any
Subtracts numbers. Subtract number of days from a date. Subtract duration from a timestamp. Subtract two timestamps to get difference in milliseconds. Same as the - operator.
minus(20, 10) -> 10
20 - 10 -> 10
minus(toDate('2012-12-15'), 3) -> toDate('2012-12-12')
toDate('2012-12-15') - 3 -> toDate('2012-12-12')
toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS') + (days(1) + hours(2) - seconds(10)) -> toTimestamp('2019-02-04 07:19:18.871', 'yyyy-MM-dd HH:mm:ss.SSS')
toTimestamp('2019-02-03 05:21:34.851', 'yyyy-MM-dd HH:mm:ss.SSS') - toTimestamp('2019-02-03 05:21:36.923', 'yyyy-MM-dd HH:mm:ss.SSS') -> -2072
minute
minute(<value1> : timestamp, [<value2> : string]) => integer
Gets the minute value of a timestamp. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
minute(toTimestamp('2009-07-30 12:58:59')) -> 58
minute(toTimestamp('2009-07-30 12:58:59'), 'PST') -> 58
minutes
minutes(<value1> : integer) => long
Duration in milliseconds for number of minutes.
minutes(2) -> 120000L
mod
mod(<value1> : any, <value2> : any) => any
Modulus of pair of numbers. Same as the % operator.
mod(20, 8) -> 4
20 % 8 -> 4
month
month(<value1> : datetime) => integer
Gets the month value of a date or timestamp.
month(toDate('2012-8-8')) -> 8
monthsBetween
monthsBetween(<from date/timestamp> : datetime, <to date/timestamp> : datetime, [<roundoff> : boolean], [<time zone> : string]) => double
Gets the number of months between two dates. You can round off the calculation.You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
monthsBetween(toTimestamp('1997-02-28 10:30:00'), toDate('1996-10-30')) -> 3.94959677
multiply
multiply(<value1> : any, <value2> : any) => any
Multiplies pair of numbers. Same as the * operator.
multiply(20, 10) -> 200
20 * 10 -> 200
negate
negate(<value1> : number) => number
Negates a number. Turns positive numbers to negative and vice versa.
negate(13) -> -13
nextSequence
nextSequence() => long
Returns the next unique sequence. The number is consecutive only within a partition and is prefixed by the partitionId.
nextSequence() == 12313112 -> false
normalize
normalize(<String to normalize> : string) => string
Normalizes the string value to separate accented unicode characters.
regexReplace(normalize('bo²s'), `\p{M}`, '') -> 'boys'
not
not(<value1> : boolean) => boolean
Logical negation operator.
not(true) -> false
not(10 == 20) -> true
notEquals
notEquals(<value1> : any, <value2> : any) => boolean
Comparison not equals operator. Same as != operator.
12 != 24 -> true
'bojjus' != 'bo' + 'jjus' -> false
notNull
notNull(<value1> : any) => boolean
Checks if the value is not NULL.
notNull(NULL()) -> false
notNull('') -> true
null
null() => null
Returns a NULL value. Use the function syntax(null())
if there is a column named 'null'. Any operation that uses will result in a NULL.
isNull('dumbo' + null) -> true
isNull(10 * null) -> true
isNull('') -> false
isNull(10 + 20) -> false
isNull(10/0) -> true
or
or(<value1> : boolean, <value2> : boolean) => boolean
Logical OR operator. Same as ||.
or(true, false) -> true
true || false -> true
pMod
pMod(<value1> : any, <value2> : any) => any
Positive Modulus of pair of numbers.
pmod(-20, 8) -> 4
partitionId
partitionId() => integer
Returns the current partition id the input row is in.
partitionId()
power
power(<value1> : number, <value2> : number) => double
Raises one number to the power of another.
power(10, 2) -> 100
random
random(<value1> : integral) => long
Returns a random number given an optional seed within a partition. The seed should be a fixed value and is used in conjunction with the partitionId to produce random values
random(1) == 1 -> false
regexExtract
regexExtract(<string> : string, <regex to find> : string, [<match group 1-based index> : integral]) => string
Extract a matching substring for a given regex pattern. The last parameter identifies the match group and is defaulted to 1 if omitted. Use <regex>
(back quote) to match a string without escaping.
regexExtract('Cost is between 600 and 800 dollars', '(\\d+) and (\\d+)', 2) -> '800'
regexExtract('Cost is between 600 and 800 dollars', `(\d+) and (\d+)`, 2) -> '800'
regexMatch
regexMatch(<string> : string, <regex to match> : string) => boolean
Checks if the string matches the given regex pattern. Use <regex>
(back quote) to match a string without escaping.
regexMatch('200.50', '(\\d+).(\\d+)') -> true
regexMatch('200.50', `(\d+).(\d+)`) -> true
regexReplace
regexReplace(<string> : string, <regex to find> : string, <substring to replace> : string) => string
Replace all occurrences of a regex pattern with another substring in the given string Use <regex>
(back quote) to match a string without escaping.
regexReplace('100 and 200', '(\\d+)', 'bojjus') -> 'bojjus and bojjus'
regexReplace('100 and 200', `(\d+)`, 'gunchus') -> 'gunchus and gunchus'
regexSplit
regexSplit(<string to split> : string, <regex expression> : string) => array
Splits a string based on a delimiter based on regex and returns an array of strings.
regexSplit('bojjusAgunchusBdumbo', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo']
regexSplit('bojjusAgunchusBdumboC', `[CAB]`) -> ['bojjus', 'gunchus', 'dumbo', '']
(regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[1]) -> 'bojjus'
isNull(regexSplit('bojjusAgunchusBdumboC', `[CAB]`)[20]) -> true
replace
replace(<string> : string, <substring to find> : string, [<substring to replace> : string]) => string
Replace all occurrences of a substring with another substring in the given string. If the last parameter is omitted, it is default to empty string.
replace('doggie dog', 'dog', 'cat') -> 'catgie cat'
replace('doggie dog', 'dog', '') -> 'gie '
replace('doggie dog', 'dog') -> 'gie '
reverse
reverse(<value1> : string) => string
Reverses a string.
reverse('gunchus') -> 'suhcnug'
right
right(<string to subset> : string, <number of characters> : integral) => string
Extracts a substring with number of characters from the right. Same as SUBSTRING(str, LENGTH(str) - n, n).
right('bojjus', 2) -> 'us'
right('bojjus', 20) -> 'bojjus'
rlike
rlike(<string> : string, <pattern match> : string) => boolean
Checks if the string matches the given regex pattern.
rlike('200.50', `(\d+).(\d+)`) -> true
rlike('bogus', `M[0-9]+.*`) -> false
round
round(<number> : number, [<scale to round> : number], [<rounding option> : integral]) => double
Rounds a number given an optional scale and an optional rounding mode. If the scale is omitted, it is defaulted to 0. If the mode is omitted, it is defaulted to ROUND_HALF_UP(5). The values for rounding include
1 - ROUND_UP
2 - ROUND_DOWN
3 - ROUND_CEILING
4 - ROUND_FLOOR
5 - ROUND_HALF_UP
6 - ROUND_HALF_DOWN
7 - ROUND_HALF_EVEN
8 - ROUND_UNNECESSARY.
round(100.123) -> 100.0
round(2.5, 0) -> 3.0
round(5.3999999999999995, 2, 7) -> 5.40
rpad
rpad(<string to pad> : string, <final padded length> : integral, <padding> : string) => string
Right pads the string by the supplied padding until it is of a certain length. If the string is equal to or greater than the length, then it is trimmed to the length.
rpad('dumbo', 10, '-') -> 'dumbo-----'
rpad('dumbo', 4, '-') -> 'dumb'
rpad('dumbo', 8, '<>') -> 'dumbo<><'
rtrim
rtrim(<string to trim> : string, [<trim characters> : string]) => string
Right trims a string of trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.
rtrim(' dumbo ') -> ' dumbo'
rtrim('!--!du!mbo!', '-!') -> '!--!du!mbo'
second
second(<value1> : timestamp, [<value2> : string]) => integer
Gets the second value of a date. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. The local timezone is used as the default. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
second(toTimestamp('2009-07-30 12:58:59')) -> 59
seconds
seconds(<value1> : integer) => long
Duration in milliseconds for number of seconds.
seconds(2) -> 2000L
sha1
sha1(<value1> : any, ...) => string
Calculates the SHA-1 digest of set of column of varying primitive datatypes and returns a 40 character hex string. It can be used to calculate a fingerprint for a row.
sha1(5, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> '46d3b478e8ec4e1f3b453ac3d8e59d5854e282bb'
sha2
sha2(<value1> : integer, <value2> : any, ...) => string
Calculates the SHA-2 digest of set of column of varying primitive datatypes given a bit length which can only be of values 0(256), 224, 256, 384, 512. It can be used to calculate a fingerprint for a row.
sha2(256, 'gunchus', 8.2, 'bojjus', true, toDate('2010-4-4')) -> 'afe8a553b1761c67d76f8c31ceef7f71b66a1ee6f4e6d3b5478bf68b47d06bd3'
sin
sin(<value1> : number) => double
Calculates a sine value.
sin(2) -> 0.9092974268256817
sinh
sinh(<value1> : number) => double
Calculates a hyperbolic sine value.
sinh(0) -> 0.0
soundex
soundex(<value1> : string) => string
Gets the soundex
code for the string.
soundex('genius') -> 'G520'
split
split(<string to split> : string, <split characters> : string) => array
Splits a string based on a delimiter and returns an array of strings.
split('bojjus,guchus,dumbo', ',') -> ['bojjus', 'guchus', 'dumbo']
split('bojjus,guchus,dumbo', '|') -> ['bojjus,guchus,dumbo']
split('bojjus, guchus, dumbo', ', ') -> ['bojjus', 'guchus', 'dumbo']
split('bojjus, guchus, dumbo', ', ')[1] -> 'bojjus'
isNull(split('bojjus, guchus, dumbo', ', ')[0]) -> true
isNull(split('bojjus, guchus, dumbo', ', ')[20]) -> true
split('bojjusguchusdumbo', ',') -> ['bojjusguchusdumbo']
sqrt
sqrt(<value1> : number) => double
Calculates the square root of a number.
sqrt(9) -> 3
startsWith
startsWith(<string> : string, <substring to check> : string) => boolean
Checks if the string starts with the supplied string.
startsWith('dumbo', 'du') -> true
subDays
subDays(<date/timestamp> : datetime, <days to subtract> : integral) => datetime
Subtract days from a date or timestamp. Same as the - operator for date.
subDays(toDate('2016-08-08'), 1) -> toDate('2016-08-07')
subMonths
subMonths(<date/timestamp> : datetime, <months to subtract> : integral) => datetime
Subtract months from a date or timestamp.
subMonths(toDate('2016-09-30'), 1) -> toDate('2016-08-31')
substring
substring(<string to subset> : string, <from 1-based index> : integral, [<number of characters> : integral]) => string
Extracts a substring of a certain length from a position. Position is 1 based. If the length is omitted, it is defaulted to end of the string.
substring('Cat in the hat', 5, 2) -> 'in'
substring('Cat in the hat', 5, 100) -> 'in the hat'
substring('Cat in the hat', 5) -> 'in the hat'
substring('Cat in the hat', 100, 100) -> ''
tan
tan(<value1> : number) => double
Calculates a tangent value.
tan(0) -> 0.0
tanh
tanh(<value1> : number) => double
Calculates a hyperbolic tangent value.
tanh(0) -> 0.0
translate
translate(<string to translate> : string, <lookup characters> : string, <replace characters> : string) => string
Replace one set of characters by another set of characters in the string. Characters have 1 to 1 replacement.
translate('(bojjus)', '()', '[]') -> '[bojjus]'
translate('(gunchus)', '()', '[') -> '[gunchus'
trim
trim(<string to trim> : string, [<trim characters> : string]) => string
Trims a string of leading and trailing characters. If second parameter is unspecified, it trims whitespace. Else it trims any character specified in the second parameter.
trim(' dumbo ') -> 'dumbo'
trim('!--!du!mbo!', '-!') -> 'du!mbo'
true
true() => boolean
Always returns a true value. Use the function syntax(true())
if there is a column named 'true'.
(10 + 20 == 30) -> true
(10 + 20 == 30) -> true()
typeMatch
typeMatch(<type> : string, <base type> : string) => boolean
Matches the type of the column. Can only be used in pattern expressions.number matches short, integer, long, double, float or decimal, integral matches short, integer, long, fractional matches double, float, decimal and datetime matches date or timestamp type.
typeMatch(type, 'number')
typeMatch('date', 'datetime')
upper
upper(<value1> : string) => string
Uppercases a string.
upper('bojjus') -> 'BOJJUS'
uuid
uuid() => string
Returns the generated UUID.
uuid()
weekOfYear
weekOfYear(<value1> : datetime) => integer
Gets the week of the year given a date.
weekOfYear(toDate('2008-02-20')) -> 8
weeks
weeks(<value1> : integer) => long
Duration in milliseconds for number of weeks.
weeks(2) -> 1209600000L
xor
xor(<value1> : boolean, <value2> : boolean) => boolean
Logical XOR operator. Same as ^ operator.
xor(true, false) -> true
xor(true, true) -> false
true ^ false -> true
year
year(<value1> : datetime) => integer
Gets the year value of a date.
year(toDate('2012-8-8')) -> 2012
Aggregate functions
The following functions are only available in aggregate, pivot, unpivot, and window transformations.
avg
avg(<value1> : number) => number
Gets the average of values of a column.
avg(sales)
avgIf
avgIf(<value1> : boolean, <value2> : number) => number
Based on a criteria gets the average of values of a column.
avgIf(region == 'West', sales)
count
count([<value1> : any]) => long
Gets the aggregate count of values. If the optional column(s) is specified, it ignores NULL values in the count.
count(custId)
count(custId, custName)
count()
count(iif(isNull(custId), 1, NULL))
countDistinct
countDistinct(<value1> : any, [<value2> : any], ...) => long
Gets the aggregate count of distinct values of a set of columns.
countDistinct(custId, custName)
countIf
countIf(<value1> : boolean, [<value2> : any]) => long
Based on a criteria gets the aggregate count of values. If the optional column is specified, it ignores NULL values in the count.
countIf(state == 'CA' && commission < 10000, name)
covariancePopulation
covariancePopulation(<value1> : number, <value2> : number) => double
Gets the population covariance between two columns.
covariancePopulation(sales, profit)
covariancePopulationIf
covariancePopulationIf(<value1> : boolean, <value2> : number, <value3> : number) => double
Based on a criteria, gets the population covariance of two columns.
covariancePopulationIf(region == 'West', sales)
covarianceSample
covarianceSample(<value1> : number, <value2> : number) => double
Gets the sample covariance of two columns.
covarianceSample(sales, profit)
covarianceSampleIf
covarianceSampleIf(<value1> : boolean, <value2> : number, <value3> : number) => double
Based on a criteria, gets the sample covariance of two columns.
covarianceSampleIf(region == 'West', sales, profit)
first
first(<value1> : any, [<value2> : boolean]) => any
Gets the first value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.
first(sales)
first(sales, false)
kurtosis
kurtosis(<value1> : number) => double
Gets the kurtosis of a column.
kurtosis(sales)
kurtosisIf
kurtosisIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the kurtosis of a column.
kurtosisIf(region == 'West', sales)
last
last(<value1> : any, [<value2> : boolean]) => any
Gets the last value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.
last(sales)
last(sales, false)
max
max(<value1> : any) => any
Gets the maximum value of a column.
max(sales)
maxIf
maxIf(<value1> : boolean, <value2> : any) => any
Based on a criteria, gets the maximum value of a column.
maxIf(region == 'West', sales)
mean
mean(<value1> : number) => number
Gets the mean of values of a column. Same as AVG.
mean(sales)
meanIf
meanIf(<value1> : boolean, <value2> : number) => number
Based on a criteria gets the mean of values of a column. Same as avgIf.
meanIf(region == 'West', sales)
min
min(<value1> : any) => any
Gets the minimum value of a column.
min(sales)
minIf
minIf(<value1> : boolean, <value2> : any) => any
Based on a criteria, gets the minimum value of a column.
minIf(region == 'West', sales)
skewness
skewness(<value1> : number) => double
Gets the skewness of a column.
skewness(sales)
skewnessIf
skewnessIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the skewness of a column.
skewnessIf(region == 'West', sales)
stddev
stddev(<value1> : number) => double
Gets the standard deviation of a column.
stdDev(sales)
stddevIf
stddevIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the standard deviation of a column.
stddevIf(region == 'West', sales)
stddevPopulation
stddevPopulation(<value1> : number) => double
Gets the population standard deviation of a column.
stddevPopulation(sales)
stddevPopulationIf
stddevPopulationIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the population standard deviation of a column.
stddevPopulationIf(region == 'West', sales)
stddevSample
stddevSample(<value1> : number) => double
Gets the sample standard deviation of a column.
stddevSample(sales)
stddevSampleIf
stddevSampleIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the sample standard deviation of a column.
stddevSampleIf(region == 'West', sales)
sum
sum(<value1> : number) => number
Gets the aggregate sum of a numeric column.
sum(col)
sumDistinct
sumDistinct(<value1> : number) => number
Gets the aggregate sum of distinct values of a numeric column.
sumDistinct(col)
sumDistinctIf
sumDistinctIf(<value1> : boolean, <value2> : number) => number
Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.
sumDistinctIf(state == 'CA' && commission < 10000, sales)
sumDistinctIf(true, sales)
sumIf
sumIf(<value1> : boolean, <value2> : number) => number
Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.
sumIf(state == 'CA' && commission < 10000, sales)
sumIf(true, sales)
variance
variance(<value1> : number) => double
Gets the variance of a column.
variance(sales)
varianceIf
varianceIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the variance of a column.
varianceIf(region == 'West', sales)
variancePopulation
variancePopulation(<value1> : number) => double
Gets the population variance of a column.
variancePopulation(sales)
variancePopulationIf
variancePopulationIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the population variance of a column.
variancePopulationIf(region == 'West', sales)
varianceSample
varianceSample(<value1> : number) => double
Gets the unbiased variance of a column.
varianceSample(sales)
varianceSampleIf
varianceSampleIf(<value1> : boolean, <value2> : number) => double
Based on a criteria, gets the unbiased variance of a column.
varianceSampleIf(region == 'West', sales)
Array functions
Array functions perform transformations on data structures that are arrays. These include special keywords to address array elements and indexes:
#acc
represents a value that you wish to include in your single output when reducing an array#index
represents the current array index, along with array index numbers#index2, #index3 ...
#item
represents the current element value in the array
array
array([<value1> : any], ...) => array
Creates an array of items. All items should be of the same type. If no items are specified, an empty string array is the default. Same as a [] creation operator.
array('Seattle', 'Washington')
['Seattle', 'Washington']
['Seattle', 'Washington'][1]
'Washington'
filter
filter(<value1> : array, <value2> : unaryfunction) => array
Filters elements out of the array that do not meet the provided predicate. Filter expects a reference to one element in the predicate function as #item.
filter([1, 2, 3, 4], #item > 2) -> [3, 4]
filter(['a', 'b', 'c', 'd'], #item == 'a' || #item == 'b') -> ['a', 'b']
find
find(<value1> : array, <value2> : unaryfunction) => any
Find the first item from an array that match the condition. It takes a filter function where you can address the item in the array as #item. For deeply nested maps you can refer to the parent maps using the #item_n(#item_1, #item_2...) notation.
find([10, 20, 30], #item > 10) -> 20
find(['azure', 'data', 'factory'], length(#item) > 4) -> 'azure'
find([ @( name = 'Daniel', types = [ @(mood = 'jovial', behavior = 'terrific'), @(mood = 'grumpy', behavior = 'bad') ] ), @( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] ) ], contains(#item.types, #item.mood=='happy') /*Filter out the happy kid*/ )
@( name = 'Mark', types = [ @(mood = 'happy', behavior = 'awesome'), @(mood = 'calm', behavior = 'reclusive') ] )
map
map(<value1> : array, <value2> : unaryfunction) => any
Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item.
map([1, 2, 3, 4], #item + 2) -> [3, 4, 5, 6]
map(['a', 'b', 'c', 'd'], #item + '_processed') -> ['a_processed', 'b_processed', 'c_processed', 'd_processed']
mapIndex
mapIndex(<value1> : array, <value2> : binaryfunction) => any
Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item and a reference to the element index as #index.
mapIndex([1, 2, 3, 4], #item + 2 + #index) -> [4, 6, 8, 10]
reduce
reduce(<value1> : array, <value2> : any, <value3> : binaryfunction, <value4> : unaryfunction) => any
Accumulates elements in an array. Reduce expects a reference to an accumulator and one element in the first expression function as #acc and #item and it expects the resulting value as #result to be used in the second expression function.
toString(reduce(['1', '2', '3', '4'], '0', #acc + #item, #result)) -> '01234'
size
size(<value1> : any) => integer
Finds the size of an array or map type
size(['element1', 'element2']) -> 2
size([1,2,3]) -> 3
slice
slice(<array to slice> : array, <from 1-based index> : integral, [<number of items> : integral]) => array
Extracts a subset of an array from a position. Position is 1 based. If the length is omitted, it is defaulted to end of the string.
slice([10, 20, 30, 40], 1, 2) -> [10, 20]
slice([10, 20, 30, 40], 2) -> [20, 30, 40]
slice([10, 20, 30, 40], 2)[1] -> 20
isNull(slice([10, 20, 30, 40], 2)[0]) -> true
isNull(slice([10, 20, 30, 40], 2)[20]) -> true
slice(['a', 'b', 'c', 'd'], 8) -> []
sort
sort(<value1> : array, <value2> : binaryfunction) => array
Sorts the array using the provided predicate function. Sort expects a reference to two consecutive elements in the expression function as #item1 and #item2.
sort([4, 8, 2, 3], compare(#item1, #item2)) -> [2, 3, 4, 8]
sort(['a3', 'b2', 'c1'], iif(right(#item1, 1) >= right(#item2, 1), 1, -1)) -> ['c1', 'b2', 'a3']
Conversion functions
Conversion functions are used to convert data and data types
toBase64
toBase64(<value1> : string) => string
Encodes the given string in base64.
toBase64('bojjus') -> 'Ym9qanVz'
toBinary
toBinary(<value1> : any) => binary
Converts any numeric/date/timestamp/string to binary representation.
toBinary(3) -> [0x11]
toBoolean
toBoolean(<value1> : string) => boolean
Converts a value of ('t', 'true', 'y', 'yes', '1') to true and ('f', 'false', 'n', 'no', '0') to false and NULL for any other value.
toBoolean('true') -> true
toBoolean('n') -> false
isNull(toBoolean('truthy')) -> true
toByte
toByte(<value> : any, [<format> : string], [<locale> : string]) => byte
Converts any numeric or string to a byte value. An optional Java decimal format can be used for the conversion.
toByte(123)
123
toByte(0xFF)
-1
toByte('123')
123
toDate
toDate(<string> : any, [<date format> : string]) => date
Converts input date string to date using an optional input date format. Refer Java's SimpleDateFormat
class for available formats. If the input date format is omitted, default format is yyyy-[M]M-[d]d. Accepted formats are :[ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ].
toDate('2012-8-18') -> toDate('2012-08-18')
toDate('12/18/2012', 'MM/dd/yyyy') -> toDate('2012-12-18')
toDecimal
toDecimal(<value> : any, [<precision> : integral], [<scale> : integral], [<format> : string], [<locale> : string]) => decimal(10,0)
Converts any numeric or string to a decimal value. If precision and scale are not specified, it is defaulted to (10,2).An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.
toDecimal(123.45) -> 123.45
toDecimal('123.45', 8, 4) -> 123.4500
toDecimal('$123.45', 8, 4,'$###.00') -> 123.4500
toDecimal('Ç123,45', 10, 2, 'Ç###,##', 'de') -> 123.45
toDouble
toDouble(<value> : any, [<format> : string], [<locale> : string]) => double
Converts any numeric or string to a double value. An optional Java decimal format can be used for the conversion. An optional locale format in the form of BCP47 language like en-US, de, zh-CN.
toDouble(123.45) -> 123.45
toDouble('123.45') -> 123.45
toDouble('$123.45', '$###.00') -> 123.45
toDouble('Ç123,45', 'Ç###,##', 'de') -> 123.45
toFloat
toFloat(<value> : any, [<format> : string], [<locale> : string]) => float
Converts any numeric or string to a float value. An optional Java decimal format can be used for the conversion. Truncates any double.
toFloat(123.45) -> 123.45f
toFloat('123.45') -> 123.45f
toFloat('$123.45', '$###.00') -> 123.45f
toInteger
toInteger(<value> : any, [<format> : string], [<locale> : string]) => integer
Converts any numeric or string to an integer value. An optional Java decimal format can be used for the conversion. Truncates any long, float, double.
toInteger(123) -> 123
toInteger('123') -> 123
toInteger('$123', '$###') -> 123
toLong
toLong(<value> : any, [<format> : string], [<locale> : string]) => long
Converts any numeric or string to a long value. An optional Java decimal format can be used for the conversion. Truncates any float, double.
toLong(123) -> 123
toLong('123') -> 123
toLong('$123', '$###') -> 123
toShort
toShort(<value> : any, [<format> : string], [<locale> : string]) => short
Converts any numeric or string to a short value. An optional Java decimal format can be used for the conversion. Truncates any integer, long, float, double.
toShort(123) -> 123
toShort('123') -> 123
toShort('$123', '$###') -> 123
toString
toString(<value> : any, [<number format/date format> : string]) => string
Converts a primitive datatype to a string. For numbers and date a format can be specified. If unspecified the system default is picked.Java decimal format is used for numbers. Refer to Java SimpleDateFormat for all possible date formats; the default format is yyyy-MM-dd.
toString(10) -> '10'
toString('engineer') -> 'engineer'
toString(123456.789, '##,###.##') -> '123,456.79'
toString(123.78, '000000.000') -> '000123.780'
toString(12345, '##0.#####E0') -> '12.345E3'
toString(toDate('2018-12-31')) -> '2018-12-31'
isNull(toString(toDate('2018-12-31', 'MM/dd/yy'))) -> true
toString(4 == 20) -> 'false'
toTimestamp
toTimestamp(<string> : any, [<timestamp format> : string], [<time zone> : string]) => timestamp
Converts a string to a timestamp given an optional timestamp format. If the timestamp is omitted the default pattern yyyy-[M]M-[d]d hh:mm:ss[.f...] is used. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. Timestamp supports up to millisecond accuracy with value of 999. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
toTimestamp('2016-12-31 00:12:00') -> toTimestamp('2016-12-31 00:12:00')
toTimestamp('2016-12-31T00:12:00', 'yyyy-MM-dd\'T\'HH:mm:ss', 'PST') -> toTimestamp('2016-12-31 00:12:00')
toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss') -> toTimestamp('2016-12-31 00:12:00')
millisecond(toTimestamp('2019-02-03 05:19:28.871', 'yyyy-MM-dd HH:mm:ss.SSS')) -> 871
toUTC
toUTC(<value1> : timestamp, [<value2> : string]) => timestamp
Converts the timestamp to UTC. You can pass an optional timezone in the form of 'GMT', 'PST', 'UTC', 'America/Cayman'. It is defaulted to the current timezone. Refer Java's SimpleDateFormat
class for available formats. https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.
toUTC(currentTimestamp()) == toTimestamp('2050-12-12 19:18:12') -> false
toUTC(currentTimestamp(), 'Asia/Seoul') != toTimestamp('2050-12-12 19:18:12') -> true
Metafunctions
Metafunctions primarily function on metadata in your data flow
byItem
byItem(<parent column> : any, <column name> : string) => any
Find a sub item within a structure or array of structure If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion actions(? date, ? string ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions
byItem( byName('customer'), 'orderItems') ? (itemName as string, itemQty as integer)
-
byItem( byItem( byName('customer'), 'orderItems'), 'itemName') ? string
-
byOrigin
byOrigin(<column name> : string, [<origin stream name> : string]) => any
Selects a column value by name in the origin stream. The second argument is the origin stream name. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
toString(byOrigin('ancestor', 'ancestorStream'))
byOrigins
byOrigins(<column names> : array, [<origin stream name> : string]) => any
Selects an array of columns by name in the stream. The second argument is the stream where it originated from. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
toString(byOrigins(['ancestor1', 'ancestor2'], 'ancestorStream'))
byName
byName(<column name> : string, [<stream name> : string]) => any
Selects a column value by name in the stream. You can pass a optional stream name as the second argument. If there are multiple matches, the first match is returned. If no match it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
toString(byName('parent'))
toLong(byName('income'))
toBoolean(byName('foster'))
toLong(byName($debtCol))
toString(byName('Bogus Column'))
toString(byName('Bogus Column', 'DeriveStream'))
byNames
byNames(<column names> : array, [<stream name> : string]) => any
Select an array of columns by name in the stream. You can pass a optional stream name as the second argument. If there are multiple matches, the first match is returned. If there are no matches for a column, the entire output is a NULL value. The returned value requires a type conversion functions (toDate, toString, ...). Column names known at design time should be addressed just by their name. Computed inputs are not supported but you can use parameter substitutions.
toString(byNames(['parent', 'child']))
byNames(['parent']) ? string
toLong(byNames(['income']))
byNames(['income']) ? long
toBoolean(byNames(['foster']))
toLong(byNames($debtCols))
toString(byNames(['a Column']))
toString(byNames(['a Column'], 'DeriveStream'))
byNames(['orderItem']) ? (itemName as string, itemQty as integer)
byPosition
byPosition(<position> : integer) => any
Selects a column value by its relative position(1 based) in the stream. If the position is out of bounds it returns a NULL value. The returned value has to be type converted by one of the type conversion functions(TO_DATE, TO_STRING ...) Computed inputs are not supported but you can use parameter substitutions.
toString(byPosition(1))
toDecimal(byPosition(2), 10, 2)
toBoolean(byName(4))
toString(byName($colName))
toString(byPosition(1234))
Cached lookup functions
The following functions are only available when using a cached lookup when you've included a cached sink.
lookup
lookup(key, key2, ...) => complex[]
Looks up the first row from the cached sink using the specified keys that match the keys from the cached sink.
cacheSink#lookup(movieId)
mlookup
mlookup(key, key2, ...) => complex[]
Looks up the all matching rows from the cached sink using the specified keys that match the keys from the cached sink.
cacheSink#mlookup(movieId)
output
output() => any
Returns the first row of the results of the cache sink
cacheSink#output()
outputs
output() => any
Returns the entire output row set of the results of the cache sink
cacheSink#outputs()
Window functions
The following functions are only available in window transformations.
cumeDist
cumeDist() => integer
The CumeDist function computes the position of a value relative to all values in the partition. The result is the number of rows preceding or equal to the current row in the ordering of the partition divided by the total number of rows in the window partition. Any tie values in the ordering will evaluate to the same position.
cumeDist()
denseRank
denseRank() => integer
Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will not produce gaps in the sequence. Dense Rank works even when data is not sorted and looks for change in values.
denseRank()
lag
lag(<value> : any, [<number of rows to look before> : number], [<default value> : any]) => any
Gets the value of the first parameter evaluated n rows before the current row. The second parameter is the number of rows to look back and the default value is 1. If there are not as many rows a value of null is returned unless a default value is specified.
lag(amount, 2)
lag(amount, 2000, 100)
lead
lead(<value> : any, [<number of rows to look after> : number], [<default value> : any]) => any
Gets the value of the first parameter evaluated n rows after the current row. The second parameter is the number of rows to look forward and the default value is 1. If there are not as many rows a value of null is returned unless a default value is specified.
lead(amount, 2)
lead(amount, 2000, 100)
nTile
nTile([<value1> : integer]) => integer
The NTile
function divides the rows for each window partition into n
buckets ranging from 1 to at most n
. Bucket values will differ by at most 1. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket. The NTile
function is useful for the calculation of tertiles
, quartiles, deciles, and other common summary statistics. The function calculates two variables during initialization: The size of a regular bucket will have one extra row added to it. Both variables are based on the size of the current partition. During the calculation process the function keeps track of the current row number, the current bucket number, and the row number at which the bucket will change (bucketThreshold). When the current row number reaches bucket threshold, the bucket value is increased by one and the threshold is increased by the bucket size (plus one extra if the current bucket is padded).
nTile()
nTile(numOfBuckets)
rank
rank() => integer
Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence. Rank works even when data is not sorted and looks for change in values.
rank()
rowNumber
rowNumber() => integer
Assigns a sequential row numbering for rows in a window starting with 1.
rowNumber()