When you're previewing the raw file, Power Query (PQ) only needs to read the first few hundred rows to render the preview. However, as soon as you apply an operation like Sort, PQ has to read all 36 million rows to display the first few hundred, since the rows that sort to the top might be way down at the end of the file.
The same is true of certain other operations: they require reading the full data source before they can return anything, so they will be slow.
However, some operations (such as Filter) are different. They can operate over the data in what's called a streaming manner, reading only as many rows as are needed to populate the first few hundred rows of the preview. If I filter for rows where the customer is "Contoso", PQ only needs to iterate over the rows until it's found a few hundred rows that match the filter; the rest of the file can be ignored.
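To make the difference concrete, here is a rough Python sketch. It is only an analogy, not how PQ is implemented internally: a lazy generator stands in for the file, and the row layout, the `make_source` helper, and the preview size of 200 are all made up for illustration. It counts how many rows get read when you filter versus when you sort before taking the preview.

```python
from itertools import islice

PREVIEW = 200   # hypothetical preview size
N = 100_000     # hypothetical file size (much smaller than 36 million)

def make_source(n_rows, counter):
    """Yield rows one at a time, counting how many have been read."""
    for i in range(n_rows):
        counter["read"] += 1
        yield {"id": i, "customer": "Contoso" if i % 2 == 0 else "Fabrikam"}

# Streaming: filter, then stop as soon as the preview is full.
filter_counter = {"read": 0}
rows = make_source(N, filter_counter)
filtered = (r for r in rows if r["customer"] == "Contoso")
filter_preview = list(islice(filtered, PREVIEW))

# Non-streaming: sorting has to see every row before it can return the first one.
sort_counter = {"read": 0}
rows = make_source(N, sort_counter)
sort_preview = sorted(rows, key=lambda r: r["id"], reverse=True)[:PREVIEW]

print("rows read for filter preview:", filter_counter["read"])
print("rows read for sort preview:  ", sort_counter["read"])
```

In this toy setup the filter stops after a few hundred rows, while the sort reads the whole source even though it displays the same number of rows.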
Suggestions:
- Always do streaming operations up front, when possible, leaving non-streaming operations until later.
- When building your queries, consider starting with a "Keep First Rows" step to limit the number of rows you're working with. Then, once you've added all the steps you need, remove the "Keep First Rows" step.
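
The "Keep First Rows" suggestion can be sketched the same way. Again, this is a Python analogy under made-up assumptions (the `counted_source` helper, the `amount` column, and the row counts are hypothetical), not actual PQ behavior: capping the input first means a later sort only ever sees the capped rows.

```python
from itertools import islice

N = 100_000         # hypothetical file size
KEEP_FIRST = 1_000  # hypothetical "Keep First Rows" limit
PREVIEW = 200       # hypothetical preview size

def counted_source(n_rows, counter):
    """Yield rows one at a time, counting how many have been read."""
    for i in range(n_rows):
        counter["read"] += 1
        yield {"id": i, "amount": (i * 37) % 1000}

counter = {"read": 0}

# "Keep First Rows": cap the input before any non-streaming step.
limited = islice(counted_source(N, counter), KEEP_FIRST)

# The sort still reads everything it is given, but that is now only KEEP_FIRST rows.
preview = sorted(limited, key=lambda r: r["amount"], reverse=True)[:PREVIEW]

print("rows read:", counter["read"])
```

Keep in mind that the preview you see while the limit is in place reflects only the first rows of the file, not the full data, which is why the step should come out once the query is finished.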