
SuzanaEree-2102 asked ·

How to run PowerShell as a batch process on txt/html files with regex (search and replace)?

Hello. I have several regexes to run on multiple HTML files in Folder1. Each one is a search-and-replace operation, for example:

SEARCH: (?-s)(".+?") REPLACE BY: $0
SEARCH: (^.*?)=(.*$) Replace by: \1\r\n\2
SEARCH: ^.(.*)$ REPLACE BY: \1

I made a PowerShell script and added those 3 regex search-and-replace formulas, but it is not working. Can anyone help me?


 $sourceFiles = Get-ChildItem 'c:\Folder1'  
 $destinationFolder = 'c:\Folder1'
 foreach ($file in $sourceFiles) {
 $sourceContent = Get-Content $file.FullName -Raw

 $contentToInsert = [regex]::match($sourceContent,"(?-s)(".+?")").value
 $destinationContent = Get-Content $destinationFolder\$($file.Name) -Raw
 $destinationContent = $destinationContent -replace '$0',$contentToInsert

 $contentToInsert = [regex]::match($sourceContent,"(^.*?)=(.*$)").value
 $destinationContent = Get-Content $destinationFolder\$($file.Name) -Raw
 $destinationContent = $destinationContent -replace '\1\r\n\2',$contentToInsert

 $contentToInsert = [regex]::match($sourceContent,"^.(.*)$").value
 $destinationContent = Get-Content $destinationFolder\$($file.Name) -Raw
 $destinationContent = $destinationContent -replace '\1',$contentToInsert

 Set-Content -Path $destinationFolder\$($file.Name) -Value $destinationContent -Encoding UTF8
 } #end foreach file

windows-server-powershell
SuzanaEree-2102 answered ·

Got it. This is the solution:

 $path = 'c:\Folder1\file1.html'
 $result = 'c:\Folder1\result.html'
 Get-Content -Path $path | ForEach-Object{ 
     $one = $_ -replace '(?<=<li>)\s+','CARPET' #replace First Regex with the word CARPET
     $two = $one -replace 'CARPET','DOOR' #replace the word CARPET with DOOR
     ($three = $two -replace 'DOOR','BEAUTIFUL') | Out-File -FilePath $result -Append #replace the word DOOR with BEAUTIFUL
     "Final = $three"
 }


RichMatheisen-8856 answered ·

Where is $destinationcontent? I don't see it defined anywhere -- or assigned any value!

Your regex on line #6 doesn't work. It won't even run! You have a double-quoted string that contains double-quotes. Change the opening and closing double-quotes to apostrophes (single-quotes).
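A minimal sketch of that fix, assuming a made-up sample string (`$sample` is not from the original post). With single quotes around the pattern, the double quotes inside it are ordinary characters:

```powershell
# Hypothetical sample text; only the pattern comes from the question.
$sample  = 'title="Page 1" href="page-1.html"'

# Single-quoted pattern: the inner double quotes no longer end the string.
$pattern = '(?-s)(".+?")'

$first = [regex]::Match($sample, $pattern).Value
$first   # "Page 1"  (the quotes are part of the match)
```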

The regex on line #9 looks suspect. It will match "shortest possible nothing" at the beginning of the line, followed by an equal sign, followed (possibly) by nothing. IOW it will happily match "=abc" or even "=". If you want "something" to be there, move the "^" outside the 1st group and change the pattern in the 1st group to ".+?". If you expect to always find "something" after the "=", change the 2nd pattern to ".+?$".

The regex on line #12 looks like you intend to remember the 2nd thru last characters on the line. But the pattern in the group will also match nothing. Is that what you intended?
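To see what these two patterns accept, here is a small sketch with made-up inputs. The variable names (`$loose`, `$strict`, `$third`) are only for illustration and do not appear in the original script:

```powershell
$loose  = '(^.*?)=(.*$)'      # pattern from the question
$strict = '^(.+?)=(.+?)$'     # requires text on both sides of "="
$third  = '^.(.*)$'           # pattern from the question

$looseBare   = '=' -match $loose            # True: both groups may be empty
$strictBare  = '=' -match $strict           # False: nothing before or after "="
$strictFull  = 'name=value' -match $strict  # True
$key, $value = $Matches[1], $Matches[2]     # name, value

$thirdSingle = 'x' -match $third            # True
$rest        = $Matches[1]                  # empty string: the group matched nothing
```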


Good day, sir @RichMatheisen-8856. I modified the code and added the missing line with $destinationcontent.

Anyway, the regexes are good, but they seem to work only in Notepad++. I believe those particular regex expressions do not work in Windows PowerShell.

The problem is not the regexes themselves, but the PowerShell code. I need to run several regexes in one go, for search and replace. So you can put any regex there; the important thing is that the PowerShell code works.

RichMatheisen-8856 answered ·

The only regex that won't actually work is the one on line #6. It's an easy fix to make it work, though. The other regexes may produce what you want, but they also have the possibility of producing results you don't want. The choice of fixing them is, however, up to you.

Why are you constantly reloading the $destinationcontent variable? You're never saving the result of modifying it, except for the very last modification. Shouldn't you get the contents of each file (which you do on line #4), run each regex, and then save the accumulated modifications???
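A sketch of that read-once, modify, write-once flow. The temp file, its content, and the second rule ('Page' to 'Item') are made up for the demo; `-NoNewline` assumes PowerShell 5.0 or later:

```powershell
# Hypothetical temp file standing in for one of the HTML files.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'chain-demo.html'
Set-Content -Path $file -Value '<li> <a href="p.html">Page</a></li>' -Encoding UTF8

$content = Get-Content -Path $file -Raw            # read the file once
$content = $content -replace '(?<=<li>)\s+', ''    # step 1 works on the original text
$content = $content -replace 'Page', 'Item'        # step 2 works on the result of step 1

# Write the accumulated result once; -Raw kept the trailing newline, so none is added.
Set-Content -Path $file -Value $content -Encoding UTF8 -NoNewline
```

Each step assigns back to the same variable, so nothing is lost between regexes and the file is only touched at the start and at the end.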

I see that you're loading the contents of $sourceContent using Get-Content with the "-Raw" switch. In your 1st regex you explicitly turn off "single line mode". Is it your intent to match only the data on the 1st line? Turning off single-line mode means that the "." matches any character except "\n".
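A small sketch of why this matters with "-Raw", using a made-up two-line string. Without the multiline option "(?m)", "^" and "$" refer to the whole string rather than to each line:

```powershell
$text = "a=1`nb=2"   # two lines in one string, similar to what Get-Content -Raw returns

# "." stops at the newline and "$" only matches at the end of the string: no match at all.
$withoutM = [regex]::Matches($text, '^.*$').Count       # 0

# With (?m), "^" and "$" match at the start and end of every line.
$withM    = [regex]::Matches($text, '(?m)^.*$').Count   # 2
```

With Windows line endings ("`r`n") the "`r" would end up inside the match, because "$" only looks at "`n"; a pattern ending in `\r?$` is one common way around that.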

You also use "[regex]::match" in all your code. Are you only interested in finding the 1st match? What if there are more?
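For comparison, a sketch with a made-up string that shows the difference between the first match, all matches, and the `-replace` operator (which replaces every match):

```powershell
$text = 'a="1" b="2" c="3"'   # hypothetical input

$firstOnly = [regex]::Match($text, '".+?"').Value      # "1"  (first match only)
$howMany   = [regex]::Matches($text, '".+?"').Count    # 3    (all matches)
$replaced  = $text -replace '".+?"', 'X'               # a=X b=X c=X
```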

SuzanaEree-2102 answered ·

@RichMatheisen-8856 I don't know PowerShell too well; I am a beginner. So I adapted an old script to my needs.

Basically, I have several regexes for search and replace, to run on several HTML files in the same folder. I want to run them all at once, in order. For example (but it can be any regex):

  1. Search: (?-s)(".+?") Replace by: Anything

  2. Search: (^.*?)=(.*$) Replace by: \1

  3. Search: ^.(.*)$ Replace by: \1

So, this is what I aimed to do with PowerShell. I want to combine several regex search-and-replace operations to modify several files.

Maybe my PowerShell code is not very good, but at least I tried a variant.

Can you, or anybody else, write better PowerShell code, or update mine?




Can you provide a small sample of a file that contains the data to operate on, and another file containing what the data should look like at the end of the script you posted?

I know you say that the regexes all work, but I don't think the backreferences (e.g. "\1") and special variables (e.g., "$0") work the way you expect them to in the context in which you're using them.
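A short sketch of how replacement strings behave with the `-replace` operator, using a made-up input line. In the replacement, groups are written `$1`, `$2` (and `$0` for the whole match); `\1` is not a backreference there and comes out as literal text:

```powershell
$line = 'key=value'   # hypothetical input

$dollar  = $line -replace '^(.*?)=(.*)$', '$1: $2'   # key: value
$literal = $line -replace '^(.*?)=(.*)$', '\1'       # \1   (literal backslash and digit)
$whole   = 'say "hi"' -replace '".+?"', '[$0]'       # say ["hi"]

# For a real line break, use a double-quoted replacement and escape the "$" with a backtick.
$split = $line -replace '^(.*?)=(.*)$', "`$1`r`n`$2"  # key, CR LF, value
```

The single quotes around `'$1: $2'` matter: in double quotes PowerShell would try to expand `$1` as a variable before the regex engine ever sees it.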

SuzanaEree-2102 answered ·

The text from file.html:

<ul id="sidebarNavigation">
<li><a href="https://mywebsite.com/page-1.html" title="Page 1">Page 1 (34)</a></li>
<li> <a href="https://mywebsite.com/page-2.html" title="Page 2">Page 2 (29)</a></li>
<li><a href="https://mywebsite.com/page-3.html" title="Page-3">Page 3 (11)</a></li>
</ul>

Next, I have to run 2 regexes on this example (but there can be many more). They must be run in this order:

First regex: (this will delete all the space after <li> )

SEARCH: (?<=<li>)\K\h*

REPLACE: (leave empty)

Second regex: (this will add a space at the beginning of every line and one space at the end of each line)

SEARCH: (^\h*$)|(^)|((?<!")$)

REPLACE: \x20


IanXue-MSFT answered ·

Hi @SuzanaEree-2102 ,

The two regexes don't work as neither '\K' nor '\h' is valid in PowerShell. To match the space you can use '\s' or the space character ' '.

 $path = 'D:\temp\file.html'
 $result = 'D:\temp\result.html'
 Get-Content -Path $path | ForEach-Object{ 
     ' ' + ($_ -replace '(?<=<li>)\s+','') + ' ' | Out-File -FilePath $result -Append     
     }
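As a variation, here is a sketch that does both steps with `-replace` on a made-up line. It uses `[ \t]` as a rough stand-in for `\h` (space or tab only), and the lookbehind alone does the job of `\K`:

```powershell
# Hypothetical line with a space and a tab after <li>.
$line = "<li> `t<a href='page-2.html'>Page 2</a></li>"

# Remove horizontal whitespace right after <li>.
$clean = $line -replace '(?<=<li>)[ \t]+', ''     # <li><a href='page-2.html'>Page 2</a></li>

# Insert one space at the start and one at the end of the line.
$padded = $clean -replace '^|$', ' '
```

Unlike `[ \t]`, `\s` also matches line breaks. That makes no difference when the file is read line by line as above, but it does if the file is read with `-Raw`.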

Best Regards,
Ian Xue
SuzanaEree-2102 answered ·

Thanks, @IanXue-MSFT, it works. But to better understand the batch process (multiple actions, one by one, in order), I will give another example, with 4 simple regexes that must run in the order given:

SEARCH: (?<=<li>)\s+ REPLACE BY: BEBE
SEARCH: BEBE REPLACE BY: OANA
SEARCH: OANA REPLACE BY: \s
SEARCH: <li>\s REPLACE BY: '' (by nothing, as to remove the space)

 $path = 'c:\Folder1\file1.html'
  $result = 'c:\Folder1\result.html'
  Get-Content -Path $path | ForEach-Object{ 
     ($_ -replace '(?<=<li>)\s+','BEBE') | Out-File -FilePath $result -Append  
     ($_ -replace 'BEBE','OANA')  | Out-File -FilePath $result -Append   
     ($_ -replace 'OANA','\s')  | Out-File -FilePath $result -Append 
     ($_ -replace '<li>\s','')  | Out-File -FilePath $result -Append 
          }

It is not working. What did I do wrong?

RichMatheisen-8856 answered ·

The output of each replace operation is the string resulting from the replace. If we assume that the 1st regex put BEBE into the string, you'd see that in the output file. But for the 2nd, 3rd, and 4th regex you're operating on the original string (i.e. the one from your input file), not the string resulting from the 1st regex. Because of that you won't find BEBE or OANA, so the unchanged string from the file is sent to your output file. The 4th regex should work (if the 1st one did), so it should have erased the "<li> " and written the result to the output file.

If you want to see the progression of changes you'd have to operate on the result of the previous replace operation, not the original string.
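Scaled up to a folder of HTML files, the same idea might be sketched as below. The folder, the sample file, and the three rules are hypothetical placeholders (the rules borrow the CARPET/DOOR words used elsewhere in this thread); this is one possible layout, not the only one:

```powershell
# Hypothetical demo folder with one sample file; point $folder at your own data instead.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'regex-batch-demo'
New-Item -ItemType Directory -Path $folder -Force | Out-Null
Set-Content -Path (Join-Path $folder 'file1.html') -Value '<li> <a href="p.html">Page 1</a></li>'

# Ordered list of rules: each one runs on the output of the previous one.
$rules = @(
    @{ Pattern = '(?<=<li>)[ \t]+'; Replacement = 'CARPET' }   # whitespace after <li> becomes CARPET
    @{ Pattern = 'CARPET';          Replacement = 'DOOR'   }   # CARPET becomes DOOR
    @{ Pattern = 'DOOR';            Replacement = ''       }   # DOOR is removed
)

foreach ($file in Get-ChildItem -Path $folder -Filter '*.html' -File) {
    $content = Get-Content -Path $file.FullName -Raw          # read each file once
    foreach ($rule in $rules) {
        # Assign back to $content so the next rule sees this rule's result.
        $content = $content -replace $rule.Pattern, $rule.Replacement
    }
    Set-Content -Path $file.FullName -Value $content -Encoding UTF8 -NoNewline
}
```

This overwrites the files in place, so it is worth trying on a copy of the folder first. The key point is the same as in the small demo: each `-replace` has to work on the previous result, not on the original string.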

Like this:

 $one = "XXXX"
 ($two = $one -replace "XX$","xx") | Out-String
 ($three = $two -replace "^XX","aa") | Out-String
 ($four = $three -replace "ax","BB") | Out-String
 "Final = $four"
SuzanaEree-2102 answered ·

Thanks, @RichMatheisen-8856. Can you please write the entire code, including $path, $result, Get-Content, etc., so that I can accept it as a complete answer?

Also, could you add some comments for other people who will view this post? For example: *** XX will be replaced by aa ***, *** ax will be replaced by BB ***

SuzanaEree-2102 answered ·

OK, so I have these lines in File1.html:

<ul id="sidebarNavigation">
<li><a href="https://mywebsite.com/page-1.html" title="Page 1">Page 1 (34)</a></li>
<li> <a href="https://mywebsite.com/page-2.html" title="Page 2">Page 2 (29)</a></li>
<li><a href="https://mywebsite.com/page-3.html" title="Page-3">Page 3 (11)</a></li>
</ul>

@RichMatheisen-8856 I updated your last code to use just 2 regex formulas:

 $path = 'c:\Folder1\file1.html'
 $result = 'c:\Folder1\result.html'
 Get-Content -Path $path | ForEach-Object{ 
        $one = ($_ -replace '(?<=<li>)\s+','CARPET')   | Out-File -FilePath $result -Append    
        ($two = $one -replace 'CARPET','DOOR')   | Out-File -FilePath $result -Append 
 'Final = $two'
      }

The PowerShell code seems to be good, but the second replacement, the one assigned to $two, doesn't change CARPET to DOOR. I don't know why.


Probably the misplaced left parenthesis on line #4. The way you coded it, it's the value returned from Out-File that's placed into the variable $one instead of the intended result of the replacement.

Also, on line #6 you'll have to use double-quotes if you want interpolation to happen within the string.

 $path = "C:\junk\eree-2.txt"
 $result = "C:\junk\eree-rep.txt"
 Get-Content -Path $path | ForEach-Object{
     ($one = $_ -replace '(?<=<li>)\s+','CARPET')   | Out-File -FilePath $result -Append    
     ($two = $one -replace 'CARPET','DOOR') | Out-File -FilePath $result -Append
     # just a separator to make the output easier to read
     " "| Out-File -FilePath $result -Append
 }

The 3rd line of your input is the only one the regexes have any effect on.
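Both points can be checked in isolation with a made-up line and a throwaway output file (`$line` and `$out` are hypothetical names for this sketch):

```powershell
$line = '<li> <a>Page 2</a></li>'
$out  = Join-Path ([System.IO.Path]::GetTempPath()) 'paren-demo.txt'   # hypothetical output file

# Misplaced parenthesis: $bad receives the pipeline's output, and Out-File emits nothing.
$bad = ($line -replace '(?<=<li>)\s+', 'CARPET') | Out-File -FilePath $out
$badIsNull = $null -eq $bad    # True

# Parentheses around the assignment: the value is stored AND passed down the pipeline.
($one = $line -replace '(?<=<li>)\s+', 'CARPET') | Out-File -FilePath $out -Append
($two = $one -replace 'CARPET', 'DOOR')          | Out-File -FilePath $out -Append

$single = 'Final = $two'    # single quotes: the literal text $two
$double = "Final = $two"    # double quotes: Final = <li>DOOR<a>Page 2</a></li>
```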
