Transformation functions in Power Query for data wrangling
APPLIES TO: Azure Data Factory Azure Synapse Analytics
Data Wrangling in Azure Data Factory allows you to do code-free agile data preparation and wrangling at cloud scale by translating Power Query
M scripts into Data Flow script. ADF integrates with Power Query Online and makes Power Query
M functions available for data wrangling via Spark execution using the data flow Spark infrastructure.
Power Query in ADF is currently available in public preview
Currently not all Power Query M functions are supported for data wrangling despite being available during authoring. While building your mash-ups, you'll be prompted with the following error message if a function isn't supported:
UserQuery : Expression.Error: The transformation logic is not supported as it requires dynamic access to rows of data, which cannot be scaled out.
Below is a list of supported Power Query M functions.
- Selection: Table.SelectColumns
- Removal: Table.RemoveColumns
- Renaming: Table.RenameColumns, Table.PrefixColumns, Table.TransformColumnNames
- Reordering: Table.ReorderColumns
Use M function Table.SelectRows to filter on the following conditions:
- Equality and inequality
- Numeric, text, and date comparisons (but not DateTime)
- Numeric information such as Number.IsEven/Odd
- Text containment using Text.Contains, Text.StartsWith, or Text.EndsWith
- Date ranges including all the 'IsIn' Date functions)
- Combinations of these using and, or, or not conditions
Adding and Transforming Columns
- Numeric arithmetic
- Text concatenation
- Date and Time Arithmetic (Arithmetic operators, Date.AddDays, Date.AddMonths, Date.AddQuarters, Date.AddWeeks, Date.AddYears)
- Durations can be used for date and time arithmetic, but must be transformed into another type before written to a sink (Arithmetic operators, #duration, Duration.Days, Duration.Hours, Duration.Minutes, Duration.Seconds, Duration.TotalDays, Duration.TotalHours, Duration.TotalMinutes, Duration.TotalSeconds)
- Most standard, scientific, and trigonometric numeric functions (All functions under Operations, Rounding, and Trigonometry except Number.Factorial, Number.Permutations, and Number.Combinations)
- Replacement (Replacer.ReplaceText, Replacer.ReplaceValue, Text.Replace, Text.Remove)
- Positional text extraction (Text.PositionOf, Text.Length, Text.Start, Text.End, Text.Middle, Text.ReplaceRange, Text.RemoveRange)
- Basic text formatting (Text.Lower, Text.Upper, Text.Trim/Start/End, Text.PadStart/End, Text.Reverse)
- Date/Time Functions (Date.Day, Date.Month, Date.Year Time.Hour, Time.Minute, Time.Second, Date.DayOfWeek, Date.DayOfYear, Date.DaysInMonth)
- If expressions (but branches must have matching types)
- Row filters as a logical column
- Number, text, logical, date, and datetime constants
- Power Query will generate a nested join (Table.NestedJoin; users can also manually write Table.AddJoinColumn). Users must then expand the nested join column into a non-nested join (Table.ExpandTableColumn, not supported in any other context).
- The M function Table.Join can be written directly to avoid the need for an additional expansion step, but the user must ensure that there are no duplicate column names among the joined tables
- Supported Join Kinds: Inner, LeftOuter, RightOuter, FullOuter
- Both Value.Equals and Value.NullableEquals are supported as key equality comparers
Use Table.Group to aggregate values.
- Must be used with an aggregation function
- Supported aggregation functions: List.Sum, List.Count, List.Average, List.Min, List.Max, List.StandardDeviation, List.First, List.Last
Use Table.Sort to sort values.
Known unsupported functions
|Table.PromoteHeaders||Not supported. The same result can be achieved by setting "First row as header" in the dataset.|
|Table.CombineColumns||This is a common scenario that isn't directly supported but can be achieved by adding a new column that concatenates two given columns. For example, Table.AddColumn(RemoveEmailColumn, "Name", each [FirstName] & " " & [LastName])|
|Table.TransformColumnTypes||This is supported in most cases. The following scenarios are unsupported: transforming string to currency type, transforming string to time type, transforming string to Percentage type.|
|Table.NestedJoin||Just doing a join will result in a validation error. The columns must be expanded for it to work.|
|Table.Distinct||Remove duplicate rows isn't supported.|
|Table.RemoveLastN||Remove bottom rows isn't supported.|
|Table.RowCount||Not supported, but can be achieved by adding a custom column containing the value 1, then aggregating that column with List.Sum. Table.Group is supported.|
|Row level error handling||Row level error handling is currently not supported. For example, to filter out non-numeric values from a column, one approach would be to transform the text column to a number. Every cell which fails to transform will be in an error state and need to be filtered. This scenario isn't possible in scaled-out M.|
M script workarounds
SplitColumn there is an alternate for split by length and by position
- Table.AddColumn(Source, "First characters", each Text.Start([Email], 7), type text)
- Table.AddColumn(#"Inserted first characters", "Text range", each Text.Middle([Email], 4, 9), type text)
This option is accessible from the Extract option in the ribbon
- Table.AddColumn(RemoveEmailColumn, "Name", each [FirstName] & " " & [LastName])
Learn how to create a data wrangling Power Query in ADF.