SuzanaEree-2102 asked SuzanaEree-2102 commented

PowerShell: Select and delete everything between two HTML comment tags, with some exceptions

I have this HTML code, which appears in multiple pages located in c:\Folder2

I want to select/delete everything that falls between the comments <!-- ARTICOL START --> and <!-- ARTICOL FINAL -->, except for the <p class=..</p> lines. Can this be done with PowerShell?

   <!-- ARTICOL START -->
    
 <div align="justify">
         <table width="682" border="0">
           <tr>
             <td><h1 class="den_articol" itemprop="sfe">My text here</h1></td>
           </tr>
           <tr>
             <td class="text_dreapta">On Ianuarie 14, 2014, in <a href="https://neculaifantanaru.com/en/qualities-of-a-leader.html" title="See al articles from  Qualities of a leader" class="external" rel="category tag">Qualities of a leader</a>, by Author</td>
           </tr>
         </table>
       <h2 class="text_obisnuit2"><img src="index_files/sfa.jpg" width="718" height="605" id="sfs" usemap="#m_dgrnt" alt="hip" /><map name="tfAbonament" id="m_34">
 <area shape="rect" coords="259,545,457,582" href="#plata" alt="" />
 </map></h2>
         <p class="den_articol">Why this text text?</p>
 <p class="text_obisnuit">test text text</p>
         <p class="text_obisnuit">test text text</p>
   <p class="text_obisnuit2">test text text</p>
     </div>
     <p align="justify" class="text_obisnuit style3">&nbsp;</p>
       
        <!-- ARTICOL FINAL -->

The output should be:

        <!-- ARTICOL START -->
    
         <p class="den_articol">Why this text text?</p>
 <p class="text_obisnuit">test text text</p>
         <p class="text_obisnuit">test text text</p>
   <p class="text_obisnuit2">test text text</p>
       
        <!-- ARTICOL FINAL -->

IanXue-MSFT answered SuzanaEree-2102 commented

Hi,

Please see if this works.

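 $sourcedir = "C:\Folder2"
 $resultsdir = "C:\Output"   # this folder must already exist
 Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
     $file = $_
     # @() keeps $content an array even when the file has a single line
     $content = @(Get-Content -Path $file.FullName)
     # Find the line numbers of the two markers once (-1 means "not found")
     $start = -1
     $final = -1
     for ($i = 0; ($i -lt $content.Count) -and ($final -lt 0); $i++) {
         if (($start -lt 0) -and ($content[$i] -match '<!-- ARTICOL START -->')) { $start = $i }
         if (($start -ge 0) -and ($content[$i] -match '<!-- ARTICOL FINAL -->')) { $final = $i }
     }
     $output = @()
     for ($i = 0; $i -lt $content.Count; $i++) {
         # Between the markers, skip every line that does not contain '<p class='
         if (($i -gt $start) -and ($i -lt $final) -and ($content[$i] -notmatch '<p class=')) {
             continue
         }
         $output += $content[$i]
     }
     # Write the result to a file with the same name in the output folder
     $output | Out-File -FilePath (Join-Path $resultsdir $file.Name)
 }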

Best Regards,
Ian Xue
============================================
If the Answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
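
The filtering step in the script above can also be pulled out into a small function that works on an array of lines, which makes it easy to try on sample text before touching any files. This is only a sketch and not part of the original answer: the function name `Remove-ArticleBlockLine` and its parameters are hypothetical. It assumes each marker sits on its own line, and it leaves the lines untouched when a START marker has no matching FINAL marker.

```powershell
# Hypothetical helper (not from the original answer).
# Emits every line outside the marker block, and inside the block only
# the lines that contain $KeepPattern.
function Remove-ArticleBlockLine {
    param(
        [string[]]$Lines = @(),
        [string]$StartMarker = '<!-- ARTICOL START -->',
        [string]$EndMarker   = '<!-- ARTICOL FINAL -->',
        [string]$KeepPattern = '<p class='
    )
    # Escape the markers so they are matched literally (case-insensitive)
    $startRx = [regex]::Escape($StartMarker)
    $endRx   = [regex]::Escape($EndMarker)
    $keepRx  = [regex]::Escape($KeepPattern)

    $inside  = $false
    $pending = New-Object 'System.Collections.Generic.List[string]'
    foreach ($line in $Lines) {
        if (-not $inside) {
            $line                                   # outside a block: keep the line
            if ($line -match $startRx) { $inside = $true; $pending.Clear() }
        }
        elseif ($line -match $endRx) {
            # Block closed: emit only the buffered lines that match the keep pattern
            $pending | Where-Object { $_ -match $keepRx }
            $line
            $inside = $false
        }
        else {
            $pending.Add($line)                     # inside a block: buffer the line
        }
    }
    # START without FINAL: emit the buffered lines unchanged
    if ($inside) { $pending }
}

# Usage on a small in-memory sample
$sample = @(
    '<body>',
    '<!-- ARTICOL START -->',
    '<div align="justify">',
    '<p class="text_obisnuit">test text text</p>',
    '</div>',
    '<!-- ARTICOL FINAL -->',
    '</body>'
)
Remove-ArticleBlockLine -Lines $sample
# Output:
# <body>
# <!-- ARTICOL START -->
# <p class="text_obisnuit">test text text</p>
# <!-- ARTICOL FINAL -->
# </body>
```

Buffering the block until the FINAL marker is seen is a deliberate choice here: a page with a missing closing comment passes through unchanged instead of losing everything after the START marker.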


super answer, thanks a lot @IanXue-MSFT

IanXue-MSFT answered SuzanaEree-2102 edited

Hi,

You can try something like the following.

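 $file = 'c:\Folder2\file.html'
 # @() keeps $content an array even when the file has a single line
 $content = @(Get-Content -Path $file)
 # Find the line numbers of the two markers once (-1 means "not found")
 $start = -1
 $final = -1
 for ($i = 0; ($i -lt $content.Count) -and ($final -lt 0); $i++) {
     if (($start -lt 0) -and ($content[$i] -match '<!-- ARTICOL START -->')) { $start = $i }
     if (($start -ge 0) -and ($content[$i] -match '<!-- ARTICOL FINAL -->')) { $final = $i }
 }
 $output = @()
 for ($i = 0; $i -lt $content.Count; $i++) {
     # Between the markers, skip every line that does not contain '<p class='
     if (($i -gt $start) -and ($i -lt $final) -and ($content[$i] -notmatch '<p class=')) {
         continue
     }
     $output += $content[$i]
 }
 # Print the result to the console; the file itself is not modified
 $output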

Best Regards,
Ian Xue


Hello @IanXue-MSFT, thanks. I tested your PowerShell code, but nothing happens when I run it.


It gives the output as expected on my side.

96269-image.png



Yes sir, the code runs very well, but nothing changes in the HTML file.

I need to change all the HTML files in that folder, not just print the result in the PowerShell console :)
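
For this follow-up request (changing the files themselves instead of printing the result), one possible approach is to write the filtered lines back to the same path. The sketch below is not from the thread: the function name `Update-ArticleFile`, its parameters, and the UTF-8 encoding are assumptions, and files that lack a complete START/FINAL pair are skipped.

```powershell
# Hypothetical helper, not part of the original answers.
# Rewrites every *.html file in $Folder in place, keeping only the
# '<p class=' lines between the two marker comments.
function Update-ArticleFile {
    param(
        [Parameter(Mandatory = $true)][string]$Folder,
        [string]$Encoding = 'utf8'   # assumption: the pages are UTF-8
    )
    Get-ChildItem -Path $Folder -Filter *.html | ForEach-Object {
        $path = $_.FullName
        # Read the whole file into memory before writing to the same path
        $content = @(Get-Content -Path $path -Encoding $Encoding)

        # Locate the first START marker and the first FINAL marker after it
        $start = -1
        $final = -1
        for ($i = 0; ($i -lt $content.Count) -and ($final -lt 0); $i++) {
            if (($start -lt 0) -and ($content[$i] -match '<!-- ARTICOL START -->')) { $start = $i }
            if (($start -ge 0) -and ($content[$i] -match '<!-- ARTICOL FINAL -->')) { $final = $i }
        }
        if ($final -lt 0) { return }   # no complete block: skip this file

        $output = for ($i = 0; $i -lt $content.Count; $i++) {
            if (($i -gt $start) -and ($i -lt $final) -and ($content[$i] -notmatch '<p class=')) {
                continue   # drop this line
            }
            $content[$i]
        }
        # Overwrite the original file with the filtered lines
        Set-Content -Path $path -Value $output -Encoding $Encoding
    }
}

# Example call, using the folder from the question:
# Update-ArticleFile -Folder 'C:\Folder2'
```

Because this overwrites the originals, it is worth trying it on a copy of the folder first.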
