question

ClearScreen-0216 avatar image
0 Votes"
ClearScreen-0216 asked Lz-3068 commented

Delete middle initial from name

I have a Power Query/M question. I have a list of full names, some of which have middle initials (without a period) and some of which do not. Names (and the possible initial) are delimited by spaces. Some of the names have multiple first or last names which I would want to keep. So you might have a mix of names like

John Smith
John Q Smith
Maggie Q Smith
Anne Margaret Smith
Ann Margaret Q Smith
Anne Margaret Nixon Smith

I would like to return all the names but not the initials. In other languages I would use regex but I cannot do so in PQ, and I'm just beginning to learn M. Has anyone come across something similar?

power-query-not-supported
· 4
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @ClearScreen-0216. Clarification required. Does this reflect what you expect?

110267-demo.png


0 Votes 0 ·
demo.png (36.5 KiB)

Yes, though I would merge the columns in the end.

0 Votes 0 ·

@ClearScreen-0216

IMHO only your 1st example is no problem, all others are "ambiguous": John Q could be FirstName, Q Simth could be LastName...

I really don't see which logic could make it (consistently). If you do let me know and I'll look at implementing it in PQ

0 Votes 0 ·
Show more comments
Lz-3068 avatar image
1 Vote"
Lz-3068 answered ClearScreen-0216 commented

@ClearScreen-0216. You're welcome & Thanks for posting back
If that did it, the following does the same with List.Accumulate

 let
     Source = Table.FromList(
         {"John Smith","John Q Smith","Maggie Q Smith","Anne Margaret Smith",
          "Anne Margaret Q Smith","Anne Margaret Y Nixon Smith"}, null,
         type table [Full Name = text]
     ),
     RemovedInitial = Table.AddColumn(Source, "Clean Full Name", each
         List.Accumulate(Text.Split([Full Name], " "), null,
             (state,current)=> if Text.Length(current) > 1
                 then Text.Combine({state, current}, " ") else state
         ),
         type text
     ),
     RemovedColumn = Table.SelectColumns(RemovedInitial,{"Clean Full Name"}),
     RenamedColumn = Table.RenameColumns(RemovedColumn,{<!-- -->{"Clean Full Name", "Full Name"}})
 in
     RenamedColumn


A couple of articles re. List.Accumulate:
- Power Query List.Accumulate – Unleashed
- BIN2DEC in Power Query: Implementations of List.Accumulate. Part 1


· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

I like that version, it seems more elegant. Here is my custom function based on it.

 (FullName as text) as text =>
 let
    SplittedByDelim = Text.Split(FullName, " "),
     RemovedInitial = List.Accumulate(SplittedByDelim, null,
         (state, current) => if Text.Length(current) > 1 then Text.Combine({state, current}, " ") else state)
 in
     RemovedInitial
0 Votes 0 ·

@ClearScreen-0216
(As with Excel) There's not necessarily a correlation between the conciseness or elegance of a formula/function and its efficiency...

Thanks for posting your function (if you want to shorten it let me know) => Can help others :)

0 Votes 0 ·

Of course, and I haven't done any speed testing. I updated the function slightly so that it will not delete the initial if it is the first character of the string (eg, "A John Smith"). If you see any problems with the function, let me know.

 (FullName as text) as text =>
 let
     SplittedByDelim = Text.Split(FullName, " "),
     RemovedInitial = List.Accumulate(List.Skip(SplittedByDelim,1), SplittedByDelim{0},
         (state, current) => if Text.Length(current) > 1 then Text.Combine({state, current}, " ") else state)
 in
     RemovedInitial
0 Votes 0 ·
Lz-3068 avatar image
1 Vote"
Lz-3068 answered ClearScreen-0216 commented

@ClearScreen-0216

OK, so the following does it assuming you don't have a Full Name like "A John Smith" where A isn't in the middle but will be remove anyway. If you need to keep it as "A John Smith" in this case let me know.

As you begin with PQ I decomposed the steps with (hopefully) clear step names inside Table.AddColumn


 let
 // Table for demo.
     Source = Table.FromList(
         {"John Smith","John Q Smith","Maggie Q Smith","Anne Margaret Smith",
          "Anne Margaret Q Smith","Anne Margaret Y Nixon Smith"}, null,
         type table [Full Name = text]
     ),
 //
     RemovedInitial = Table.AddColumn(Source, "Clean Full Name", each
         let
             SplittedByDelim = Text.Split([Full Name], " "),
             SelectedByLength = List.Select(SplittedByDelim, each Text.Length(_) > 1),
             CombinedItems = Text.Combine(SelectedByLength, " ")
         in
             CombinedItems,
             type text
     ),
     RemovedColumn = Table.SelectColumns(RemovedInitial,{"Clean Full Name"}),
     RenamedColumns = Table.RenameColumns(RemovedColumn,{
                 {"Clean Full Name", "Full Name"}})
 in
     RenamedColumns

Any question let me know

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thanks, that did the trick! Being new to M I wasn't quite sure how to (essentially) nest "each loops" so this is a good way to learn. I altered it somewhat to generate a new column so I could compare the two and with 248 names (56 of them unique) it seems to work just fine. I'll convert it to a custom function at some point.

Thanks again.

0 Votes 0 ·
Lz-3068 avatar image
0 Votes"
Lz-3068 answered Lz-3068 commented

@ClearScreen-0216. To shorten your 1st function

 (FullName as text) as text =>
 let
     SplittedByDelim = Text.Split(FullName, " ")
 in
     List.Accumulate(SplittedByDelim, null,
         (state, current) => if Text.Length(current) > 1
             then Text.Combine({state, current}, " ")
             else state
     )

OR, even shorter

 (FullName as text) as text =>
     List.Accumulate(Text.Split(FullName, " "), null,
         (state, current) => if Text.Length(current) > 1
             then Text.Combine({state, current}, " ")
             else state
     )

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@ClearScreen-0216
No comment on your updated function that won't delete the initial if it is the first character of the string (eg, "A John Smith"). Just be aware that List.Skip(xyz, 1) and List.Skip(xyz) does the same :)

FYI, the former PQ forum is a massive source of information. You can also search the PowerBI PQ community keeping in mind that PowerBI is usually ahead of Excel PQ in term of functions/functionnalities
And if you want a couple of links to valuable Blogs, let me know

0 Votes 0 ·

@ClearScreen-0216. On reflection a comment

SplittedByDelim{0} does the same as List.First(SplittedByDelim), so far so good

If you've read the "doc" re. the Evaluation Model you probably know that Tables, Records and Lists are lazily evaluated. In your function the first item of SplittedByDelim has to be evaluated in any case - if not immediatly, at least during Text.Combine - so any of the above syntaxes makes no difference AFAIK

In other scenarios, List.First(SplittedByDelim) vs SplittedByDelim{0} might makes a difference in term of evaluation, although I'm not sure at all: What I don't know is if the PQ engine is good enough to "recognize" SplittedByDelim{0} as actually meaning List.First(SplittedByDelim) and consequently evaluating the first lazily OR if in any case SplittedByDelim{0} is eagerly evaluated. For that reason, instead of xyz{0} I always do List.First(xyz)

0 Votes 0 ·