question

SimonBernick-1368 avatar image
1 Vote"
SimonBernick-1368 asked Lz-3068 commented

Covid-19 Vaccination Effort How to use table.buffer PLEASE HELP

I am working on the Covid-19 vaccination effort and I could really use some help as I do not normally write code. I am using power query to load data from a set of CSV files, but the data refreshes are taking too long. I have turned off privacy settings, stopped background refresh, and combined all my queries into one query (below) to speed things up, but its still taking 4 minutes to load 600k lines (500MB of CSV data).

I want to use table.buffer to speed this up, but I can not figure out where to put it. If I try putting it in front of the source like this table.buffer(Source = Folder.Files("T:\Restricted\2_Vaccination_Project\Reporting\Simon\Power Pivot Import")) then I get an error message "Token Equal Expected".

Please tell me where I am supposed to put table.buffer (please be explicit, I have not used power query before today)!


 let
 WithAddedKey =
 Table.AddKey(
 #"Names",
 {"User Name"},
 true),
 Source = Folder.Files("T:\Restricted\2_Vaccination_Project\Reporting\Simon\Power Pivot Import"),
 #"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Extension], ".csv")),
 #"Filtered Hidden Files2" = Table.SelectRows(#"Filtered Rows", each [Attributes]?[Hidden]? <> true),
 #"Invoke Custom Function2" = Table.AddColumn(#"Filtered Hidden Files2", "Transform File from Grab&Clean Data", each #"Transform File from Grab&Clean Data"([Content])),
 #"Renamed Columns2" = Table.RenameColumns(#"Invoke Custom Function2", {"Name", "Source.Name"}),
 #"Removed Other Columns2" = Table.SelectColumns(#"Renamed Columns2", {"Source.Name", "Transform File from Grab&Clean Data"}),
 #"Expanded Table Column2" = Table.ExpandTableColumn(#"Removed Other Columns2", "Transform File from Grab&Clean Data", Table.ColumnNames(#"Transform File from Grab&Clean Data"(#"Sample File (2)"))),
 #"Removed Columns" = Table.RemoveColumns(#"Expanded Table Column2",{"Team", "Source.Name", "Invitee Name", "Invitee Time Zone", "Invitee accepted marketing emails", "Event Type Name", "Location", "Canceled By", "Cancellation reason", "Question 1", "Question 2", "Question 6", "Response 6", "Question 7", "Response 7", "UTM Campaign", "UTM Source", "UTM Medium", "UTM Term", "UTM Content", "Salesforce UUID", "Event Price", "Payment Currency", "Guest Email(s)", "Response 2", "Question 3", "Response 3", "Question 4", "Response 4", "Question 5", "Response 5"}),
 #"Split Start Date/Time" = Table.SplitColumn(Table.TransformColumnTypes(#"Removed Columns", {<!-- -->{"Start Date & Time", type text}}, "en-US"), "Start Date & Time", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Start Date & Time.1", "Start Date & Time.2"}),
 #"Changed Type1" = Table.TransformColumnTypes(#"Split Start Date/Time",{<!-- -->{"Start Date & Time.1", type date}, {"Start Date & Time.2", type time}}),
 #"Split Created Date Time" = Table.SplitColumn(Table.TransformColumnTypes(#"Changed Type1", {<!-- -->{"Event Created Date & Time", type text}}, "en-US"), "Event Created Date & Time", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Event Created Date & Time.1", "Event Created Date & Time.2"}),
 #"Changed Type2" = Table.TransformColumnTypes(#"Split Created Date Time",{<!-- -->{"Event Created Date & Time.1", type date}, {"Event Created Date & Time.2", type time}}),
 #"Renamed Columns" = Table.RenameColumns(#"Changed Type2",{<!-- -->{"Response 1", "Phone 2"}, {"Start Date & Time.1", "Start Date"}, {"Start Date & Time.2", "Start Time"}, {"Event Created Date & Time.1", "Created Date"}, {"Event Created Date & Time.2", "Created Time"}}),
 #"Removed Duplicates" = Table.Distinct(#"Renamed Columns"),
 #"Merged Queries" = Table.NestedJoin(#"Removed Duplicates", {"User Name"}, Names, {"User Name"}, "Names", JoinKind.LeftOuter),
 #"Expanded Names" = Table.ExpandTableColumn(#"Merged Queries", "Names", {"Clean Name"}, {"Clean Name"}),
 #"Remove Unclean Name" = Table.RemoveColumns(#"Expanded Names",{"User Name"}),
 #"Reordered Columns" = Table.ReorderColumns(#"Remove Unclean Name",{"Clean Name", "Invitee First Name", "Invitee Last Name", "Invitee Email", "Text Reminder Number", "Phone 2", "Start Date", "Start Time", "End Date & Time", "Created Date", "Created Time", "Canceled"}),
 #"Uppercased Text" = Table.TransformColumns(#"Reordered Columns",{<!-- -->{"Canceled", Text.Upper, type text}}),
 #"Filtered Rows1" = Table.SelectRows(#"Uppercased Text", each [Canceled] = "FALSE"),
 #"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each [Clean Name] <> "Test"),
 #"Grouped Rows" = Table.Group(#"Filtered Rows2", {"Clean Name", "Start Date", "Created Date"}, {<!-- -->{"Count", each Table.RowCount(_), type number}})
 in
 #"Grouped Rows"
power-query-not-supported
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Lz-3068 avatar image
0 Votes"
Lz-3068 answered

Hi @SimonBernick-1368

To fix error "Token Equal Expected" you get with: table.buffer(Source = Folder.Files("T:\Restricted\2_Vaccination_Project\Reporting\Simon\Power Pivot Import")), do:

 Source = Table.Buffer(Folder.Files("T:\Restricted\2_Vaccination_Project\Reporting\Simon\Power Pivot Import")),

Couple of points:
- Not sure it makes sense to buffer Folder.Files(...) as you later filter on csv [Extension]...
- You can remove the 1st step (WithAddedKey=...) as it's not ref. anywhere in your query => it's not computed
- No idea what "disk" T: really is, but if it's a mapped network drive this impacts the time to read your CSVs
- If sub-folder 'Power Pivot Import' would only contain .csv files you would not need to filter for csv [Extension]

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

SimonBernick-1368 avatar image
0 Votes"
SimonBernick-1368 answered

It looks like buffering the source won't help me because its a folder not a file. How do I get it to buffer each file in the folder, not the folder itself? Also I tried as you suggested, buffering a few different steps in the query (see below for an example) but when I did that the query would never finish refreshing (it loaded all the data, then just sat there doing nothing).


thebufferedTable = table.buffer(#"Invoke Custom Function2")

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Lz-3068 avatar image
0 Votes"
Lz-3068 answered Lz-3068 edited

@SimonBernick-1368

Buffering can be counterproductive (might be what you experienced). I would recommend you read Power Query Memory Usage, Dataflow Container Size And Refresh Performance

I did not examine your query in details, however noticed a few things. You obvioulsy use the From Folder wizard. Although this is a very good "tool" it's generic and tries to cover most scenario. Consequently some of the auto. generated steps might perfectly fit in your scenario, just an example with your query:
Step #"Expanded Table Column2" expand ALL the columns in each table from [Transform File from Grab&Clean Data] and just after step #"Removed Columns" removes a bunch of columns that were unecessarily expanded

FYI, an example of the code I would use to combine .csv from a folder - without using the From Folder wizard - assuming I only need columns "UID","Order_No","Order_DateTime","Acc_ID","Product","Qty" and have to split "Order_DateTime" as "OrderDate" and "OrderTime"

 let
     Source = Folder.Files("D:\FolderForQuery"),
     RequiredColumns = Table.SelectColumns(Source,{"Content", "Name", "Extension"}),
     CsvExtension = Table.SelectRows(RequiredColumns, each ([Extension] = ".csv")),
     NoHiddenFiles = Table.SelectRows(CsvExtension, each [Attributes]?[Hidden]? <> true),
     RemovedExtension = Table.SelectColumns(NoHiddenFiles,{"Content", "Name"}),
     RenamedAsSourceName = Table.RenameColumns(RemovedExtension,{<!-- -->{"Name", "SourceName"}}),
     TableFromCsv = Table.AddColumn(RenamedAsSourceName, "TableFromCsv", each
         Table.PromoteHeaders(
             // Delimiter, Columns & Encoding might be different
             Csv.Document([Content],[Delimiter=";", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None]),
             [PromoteAllScalars=true]
         ), type table
     ),
     RemovedContent = Table.SelectColumns(TableFromCsv,{"SourceName", "TableFromCsv"}),
     ExpandedRequiredColumns = Table.ExpandTableColumn(RemovedContent, "TableFromCsv",
         {"UID", "Order_No", "Order_DateTime", "Acc_ID", "Product", "Qty"}
     ),
     SplitOrderDateTime = Table.SplitColumn(ExpandedRequiredColumns, "Order_DateTime",
         Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Order_Date", "Order_Time"}
     ),
     ChangedTypes = Table.TransformColumnTypes(SplitOrderDateTime,
         {
             {"UID", Int64.Type}, {"Order_No", Int64.Type}, {"Order_Date", type date},
             {"Order_Time", type time}, {"Acc_ID", Int64.Type}, {"Product", type text}, {"Qty", Int64.Type}
         }
     )
 in
     ChangedTypes


EDIT: Re. How do I get it to buffer each file in the folder? In the above query code (my understanding...):

 TableFromCsv = Table.AddColumn(RenamedAsSourceName, "TableFromCsv", each
     Table.PromoteHeaders(
         // Delimiter, Columns & Encoding might be different
         Csv.Document(
             Binary.Buffer([Content]),
             [Delimiter=";", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None]),
         [PromoteAllScalars=true]
     ), type table
 )


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

SimonBernick-1368 avatar image
0 Votes"
SimonBernick-1368 answered Lz-3068 commented

@Lz-3068 thanks, i simplified the expanding columns step like you suggested. However when I tried:

 #"Invoke Custom Function2" = Table.AddColumn(#"Filtered Hidden Files2", "Transform File from Grab&Clean Data", each #"Transform File from Grab&Clean Data"(binary.buffer([Content])))

the query would no longer refresh. It just stayed spinning after it loaded the data.



· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@SimonBernick-1368

I don't think this is the issue otherwise this should raise an error. PQ (M) language is case-sensitive. I wrote Binary.Buffer and above I can read binary.buffer

As mentioned in the article I shared it is possible that buffering (Tables or Binaries) causes more problem than anything else in your scenario

0 Votes 0 ·