question

RobS-3576 avatar image
0 Votes"
RobS-3576 asked RobS-3576 commented

Power Query super slow after sorting and filtering?

I am really having trouble with the slow speed of Power Query after I make some necessary transformations. First, I am using

Microsoft 365 MSO (16.0.14326.20384) 64-bit
Edition Windows 10 Pro
Version 21H1
Installed on ‎7/‎27/‎2021
OS build 19043.1237

Computer has 64GB of RAM.

I am working with a large .csv dataset that has 36 million rows. What I have noticed is the following - if I leave it totally unfiltered or unsorted, I can make transformations quickly, often very quickly. For example, if I add an index column, it takes about 1 second. But as soon as I filter the data or sort it, the same index column takes many minutes, so many that I have never let it finish. A more complex custom column using M Code took over and hour before I stopped it These transformations are necessary, particularly the sorting, so I can't leave them out. Btw, I have similar problems with smaller datasets, though not as dramatic, as every step after sorting and filtering is much slower.

Is there a reason why the filtering and sorting slows everything down so much? And can anything be done to improve the speed?

Thank you!

power-query-not-supported
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Going forward, please post questions about Power Query here:
https://aka.ms/PQCommunity

0 Votes 0 ·

1 Answer

Ehren avatar image
2 Votes"
Ehren answered RobS-3576 commented

When you're previewing the raw file, PQ only needs to read the first few hundred rows in order to render the preview. However, as soon as you do an operation like Sort, PQ now needs to read all 36 million rows in order to display the first few hundred rows, since it's possible that the first few hundred sorted rows are waaaay down at the end of the list.

The same is true for certain other operations. They require reading the full data source, and thus will be slow.

However, some operations (such as Filter) are different. They can operate over the data in what's called a streaming manner, reading only as many rows as are needed to populate the first few hundred rows of the preview. If I filter for rows where the customer is "Contoso", PQ only needs to iterate over the rows until it's found a few hundred rows that match the filter; the rest of the file can be ignored.

Suggestions:

  • Always do streaming operations up front, when possible, leaving non-streaming operations until later.

  • When creating your queries, consider first doing a "Keep First Rows" operation and limiting the number of rows you're working against. Then, once you've added all the steps you need, remove the "Keep First Rows" step.


· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

That is very helpful - thank you!

0 Votes 0 ·