While trying to get sub-captures with the following string and regular expression, I'm surprised by the result I get. Can someone explain the result to me? Is the NFA engine doing a bad job?
I'm working in PowerShell, so I assume the regex engine is the one from the Windows .NET Framework.
My string:
"p1 p2 p3,' '' ', p4"
I put it between double quotes because it contains several pairs of apostrophes: one pair with 3 spaces between them, immediately followed by a second pair with 2 spaces inside.
This string wholly matches the following regular expression:
"p1 p2 p3,' '' ', p4" -match "^(\w*)\s+(\w+)(([^' ]*)|('[^']*')|\s?)*$"
which results in:
Name                           Value
----                           -----
5                              ' '
4
3
2                              p2
1                              p1
0                              p1 p2 p3,' '' ', p4
I would have expected many more captures (leaving aside the first-level group that wraps the alternative second-level groups).
Why are "p3,", "' '", "' '" and "," not captured?
"p1 p2 p3,' '' ', p4" -match "^(\w*)\s+(\w+)(?:([^' ]*)|('[^']*')|\s?)*$"
So making the first-level group non-capturing does not give a better result.
It simply gives one fewer (empty) submatch: as expected, I guess, the one for the first-level group is gone.
4 ' '
0 p1 p2 p3,' '' ', p4
Am I missing the equivalent of the "global" ECMAScript flag of RegExp objects?
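To show what I mean by the "global" flag, here is a minimal JavaScript sketch. Note that `tokenPattern` is a simplified, hypothetical pattern written only for this illustration (a run of characters that are neither apostrophes nor spaces, or one quoted run); it is not the pattern from my PowerShell test, and the variable names are made up.

```javascript
// Same test string as in the PowerShell example above.
const text = "p1 p2 p3,' '' ', p4";

// Hypothetical token pattern (an assumption, not my original regex):
// either a run of non-apostrophe, non-space characters, or a quoted run.
// The "g" flag is what lets matchAll walk through every successive match.
const tokenPattern = /[^' ]+|'[^']*'/g;

// Collect the text of each successive match.
const tokens = Array.from(text.matchAll(tokenPattern), (m) => m[0]);

console.log(tokens);
```

With the string exactly as typed here, this lists seven pieces, including "p3,", both quoted runs and the lone ",", which is the kind of result I was hoping to get from the groups of a single match.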
However, I get the same kind of result using JavaScript.
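For reference, this is roughly how the JavaScript check can be reproduced. It is a minimal sketch using the same pattern as my first PowerShell attempt; the variable names are only illustrative, and the exact values held by the repeated groups may differ from what .NET reports.

```javascript
// Same string and pattern as in the first PowerShell -match above.
const subject = "p1 p2 p3,' '' ', p4";
const fullPattern = /^(\w*)\s+(\w+)(([^' ]*)|('[^']*')|\s?)*$/;

// A single, non-global match: index 0 is the whole match,
// indices 1 to 5 are the five capture groups.
const result = subject.match(fullPattern);

console.log(result.length); // 6 entries: whole match plus five groups
for (let i = 0; i < result.length; i++) {
  // Each group shows at most one value, even though the outer
  // group is repeated several times while matching.
  console.log(i, JSON.stringify(result[i]));
}
```

Here too, each group ends up with a single value rather than one value per repetition of the outer group.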
So what's wrong?
Thanks for the help.