Folder source processing

DWN 6 Reputation points
2021-03-29T11:05:34.08+00:00

Greetings!

I'm new to Power BI and am regretting not diving in earlier, but loving it.

This question is based on an amazingly informative article I found at https://learn.microsoft.com/en-us/power-query/custom-function?source=docs

However, there seems to be an "elephant in the room" regarding the methodology of processing files using the folder source in power Query. I'm hoping someone can show me it's not really there. How would you pass context specific attributes to a function? File name is one very important example, not to mention size, date, etc.! When processing files, file name is most often the only attribute you have to identify the source of the data and adjust downflow logic accordingly. The methodology presented results in the file content being presented to a function in a parameter, but all of the file specific metadata seems to be unavailable down-stream.

I've run into the same problem trying to use linked queries. When creating a linked query from the converted table column in the folder source query, the key being used by the referencing query is the row ID! This doesn't work because if you are modeling the the downstream query based on the expected filename (or some part of it), that row ID will change later if another file comes in at a later time, changing the row IDs!

But maybe I'm approaching this wrong. I'm trying to normalize the data in files from different sources that all have varying schema. Business rules based on parts of the the file name control what column names to expect and how to use them.

I hope I'm just missing something basic due to my current knowledge level and would appreciate any information you can provide.

Thanks and stay safe!

Not Monitored
Not Monitored
Tag not monitored by Microsoft.
35,965 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Ehren (MSFT) 1,781 Reputation points Microsoft Employee
    2021-04-09T21:30:13.837+00:00

    Hi there. Let's start with a simplified example. Let's say we have the following M, which defines a custom function and also invokes it.

    let
        f = (x) => x + 1
    in
        f(2)
    

    If you wanted to add a parameter to the function, you could modify the code like so:

    let
        f = (x, y) => x + y
    in
        f(1, 2)
    

    In terms of the queries that get generated as part of combining files from a folder, doing this is a bit more tricky. Here are the steps:

    1. Create a parameter that represents the additional info you want to pass downstream (for example, FileName)
    2. Set the parameter's value to the name of the sample file
    3. Use the new parameter in your "Transform Sample File" query (for example, add a custom column that contains the FileName)
    4. The "Transform File" function will now contain an additional parameter. This will cause downstream errors because the invocations of "Transform File" will not be passing this new parameter.
    5. Update the invocations of "Transform File" to provide this parameter. For example, when invoking the function against a folder of files, simply pass [Name] in addition to [Content]. When invoking the function against the "Sample File", pass FileName.

    Hopefully this helps. Let us know if you have further questions.