parse operator

Evaluates a string expression and parses its value into one or more calculated columns.

T | parse Text with "ActivityName=" name ", ActivityType=" type

Syntax

T | parse [kind=regex|simple|relaxed] Expression with * (StringConstant ColumnName [: ColumnType]) *...

Arguments

  • T: The input table.

  • kind:

    • simple (the default) : StringConstant is a regular string value and the match is strict which means that all string delimiters should appear in the parsed string and all extended columns must match the required types.

    • regex : StringConstant may be a regular expression and the match is strict which means that all string delimiters (which can be a regex for this mode) should appear in the parsed string and all extended columns must match the required types.

    • relaxed : StringConstant is a regular string value and the match is relaxed which means that all string delimiters should appear in the parsed string but extended columns may match the required types partially (extended columns that didn't match the required types will get the value null).

  • Expression: An expression that evaluates to a string.

  • ColumnName: The name of a column to assign a value (taken out of the string expression) to.

  • ColumnType: should be Optional scalar type that indicates the type to convert the value to (by default it is string type).

Returns

The input table, extended according to the list of columns that are provided to the operator.

Tips

  • Use project instead, if you also want to drop or rename some columns.

  • Use * in the pattern in order to skip junk values (can't be used after string column)

  • The parse pattern may start with ColumnName and not only with StringConstant.

  • If the parsed Expression is not of type string , it will be converted to type string.

  • If regex mode is used, there is an option to add regex flags in order to control the whole regex used in parse.

  • In regex mode, parse will translate the pattern to a regex and use RE2 syntax in order to do the matching using numbered captured groups which are handled internally. So for example, this parse statement :

      parse kind=regex Col with * <regex1> var1:string <regex2> var2:long
    

    The regex that will be generated by the parse internally is .*?<regex1>(.*?)<regex2>(\-\d+).

    • * was translated to .*?.

    • string was translated to .*?.

    • long was translated to \-\d+.

Examples

The parse operator provides a streamlined way to extend a table by using multiple extract applications on the same string expression. This is most useful when the table has a string column that contains several values that you want to break into individual columns, such as a column that was produced by a developer trace ("printf"/"Console.WriteLine") statement.

In the example below, assume that the column EventText of table Traces contains strings of the form Event: NotifySliceRelease (resourceName={0}, totalSlices= {1}, sliceNumber={2}, lockTime={3}, releaseTime={4}, previousLockTime={5}). The operation below will extend the table with 6 columns: resourceName , totalSlices, sliceNumber, lockTime, releaseTime, previouLockTime, Month and Day.

eventText
Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)
Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00, previousLockTime=02/17/2016 08:39:00)
Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01, previousLockTime=02/17/2016 08:39:01)
Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:01)
Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00, previousLockTime=02/17/2016 08:40:00)
Traces  
| parse eventText with * "resourceName=" resourceName ", totalSlices=" totalSlices:long * "sliceNumber=" sliceNumber:long * "lockTime=" lockTime ", releaseTime=" releaseTime:date "," * "previousLockTime=" previouLockTime:date ")" *  
| project resourceName ,totalSlices , sliceNumber , lockTime , releaseTime , previouLockTime
resourceName totalSlices sliceNumber lockTime releaseTime previouLockTime
PipelineScheduler 27 15 02/17/2016 08:40:00 2016-02-17 08:40:00.0000000 2016-02-17 08:39:00.0000000
PipelineScheduler 27 23 02/17/2016 08:40:01 2016-02-17 08:40:01.0000000 2016-02-17 08:39:01.0000000
PipelineScheduler 27 20 02/17/2016 08:40:01 2016-02-17 08:40:01.0000000 2016-02-17 08:39:01.0000000
PipelineScheduler 27 16 02/17/2016 08:41:00 2016-02-17 08:41:00.0000000 2016-02-17 08:40:00.0000000
PipelineScheduler 27 22 02/17/2016 08:41:01 2016-02-17 08:41:00.0000000 2016-02-17 08:40:01.0000000

for regex mode :

Traces  
| parse kind = regex eventText with "(.*?)[a-zA-Z]*=" resourceName @", totalSlices=\s*\d+\s*.*?sliceNumber=" sliceNumber:long  ".*?(previous)?lockTime=" lockTime ".*?releaseTime=" releaseTime ".*?previousLockTime=" previouLockTime:date "\\)"  
| project resourceName , sliceNumber , lockTime , releaseTime , previouLockTime
resourceName sliceNumber lockTime releaseTime previouLockTime
PipelineScheduler 15 02/17/2016 08:40:00, 02/17/2016 08:40:00, 2016-02-17 08:39:00.0000000
PipelineScheduler 23 02/17/2016 08:40:01, 02/17/2016 08:40:01, 2016-02-17 08:39:01.0000000
PipelineScheduler 20 02/17/2016 08:40:01, 02/17/2016 08:40:01, 2016-02-17 08:39:01.0000000
PipelineScheduler 16 02/17/2016 08:41:00, 02/17/2016 08:41:00, 2016-02-17 08:40:00.0000000
PipelineScheduler 22 02/17/2016 08:41:01, 02/17/2016 08:41:00, 2016-02-17 08:40:01.0000000

for regex mode using regex flags:

if we are interested in getting the resourceName only and we use this query:

Traces
| parse kind = regex  EventText with * "resourceName=" resourceName ',' *
| project resourceName
resourceName
PipelineScheduler, totalSlices=27, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01
PipelineScheduler, totalSlices=27, sliceNumber=15, lockTime=02/17/2016 08:40:00, releaseTime=02/17/2016 08:40:00
PipelineScheduler, totalSlices=27, sliceNumber=22, lockTime=02/17/2016 08:41:01, releaseTime=02/17/2016 08:41:00
PipelineScheduler, totalSlices=27, sliceNumber=20, lockTime=02/17/2016 08:40:01, releaseTime=02/17/2016 08:40:01
PipelineScheduler, totalSlices=27, sliceNumber=16, lockTime=02/17/2016 08:41:00, releaseTime=02/17/2016 08:41:00

we don't get the expected results since the default mode is greedy. or even if we had few records where the resourceName appears sometimes lower-case sometimes upper-case so we may get nulls for some values. in order to get the wanted result, we may run this one with the non-greedy (U) and disable case-sensitive (i) regex flags:

Traces
| parse kind = regex flags = Ui EventText with * "RESOURCENAME=" resourceName ',' *
| project resourceName
resourceName
PipelineScheduler
PipelineScheduler
PipelineScheduler
PipelineScheduler
PipelineScheduler

in this example for relaxed mode: totalSlices extended column is required to be of type long, but in the parsed string it has the value nonValidLongValue. releaseTime extended column has the same issue, the value nonValidDateTime can't be parsed as datetime. in this case, these two extended columns will get the value null while the other ones (like sliceNumber) still gets the correct values.

using kind = simple for the same query below gives null for all extended columns because it is strict on extended columns (that's the difference between relaxed and simple mode, in relaxed mode, extended columns can be matched partially).

print x=1  
| parse kind=relaxed "Event: NotifySliceRelease (resourceName=PipelineScheduler, totalSlices=nonValidLongValue, sliceNumber=23, lockTime=02/17/2016 08:40:01, releaseTime=nonValidDateTime, previousLockTime=02/17/2016 08:39:01)"       with * "resourceName=" resourceName ", totalSlices=" totalSlices:long * "sliceNumber=" sliceNumber:long * "lockTime=" lockTime ", releaseTime=" releaseTime:date "," * "previousLockTime=" previouLockTime:date ")" *
| project resourceName ,totalSlices , sliceNumber , lockTime , releaseTime , previouLockTime
resourceName totalSlices sliceNumber lockTime releaseTime previouLockTime
PipelineScheduler 23 02/17/2016 08:40:01 2016-02-17 08:39:01.0000000