JoeMontgomery-6930 asked YijingSun-MSFT commented

HTML Visualizer

I am trying to access content on a website page using:

 Uri uri = new Uri("http://www.xyz.com/abc");            // just a placeholder website
 HtmlWeb web = new HtmlWeb();
 HtmlAgilityPack.HtmlDocument doc = web.Load(uri);

Using the debugger, I look at doc.Text with the HTML visualizer and get a window saying:

'your web browser has restricted this file from showing active content .... Click here for options'

I click there & get a drop down menu saying 'allow blocked content'. I click & get a message saying:

'Allowing script or ActiveX controls can be useful ....... Are you sure you want to let this file run active content'

I click Yes & then see the website page content in human readable form - all well & good.

But, I want to programmatically access the page content - specifically searching each line for a certain word.

Is this doable? I don't have a clue whether it is, nor do I have the foggiest idea of where to start.

Any help will be much appreciated -- TIA Joe


dotnet-aspnet-general

YijingSun-MSFT answered

Hi @JoeMontgomery-6930 ,
Your code loads the document. If you want to search it, you could use "Contains()". You could refer to the articles below:
https://stackoverflow.com/questions/846994/how-to-use-html-agility-pack
https://stackoverflow.com/questions/33834908/html-agility-pack-search-through-site-for-a-specified-string-of-words
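
As a rough illustration of searching the loaded page line by line, here is a minimal sketch. The class `WordSearch` and the method `FindLines` are hypothetical names made up for this example, and it assumes the text you want is actually present in the downloaded HTML. HtmlAgilityPack does not run JavaScript, so content that a script generates in the browser will not appear in doc.Text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class WordSearch
{
    // Hypothetical helper: returns each line of page text that contains 'word'
    // (case-insensitive). Text inside <script> and <style> elements is skipped.
    public static List<string> FindLines(string html, string word)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants()
            // Keep only text nodes, i.e. the text between the tags.
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style")
            // Decode entities such as &amp; and split multi-line text into lines.
            .SelectMany(n => HtmlEntity.DeEntitize(n.InnerText).Split('\n'))
            .Select(line => line.Trim())
            .Where(line => line.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

With the document from the question, this could be called as WordSearch.FindLines(doc.Text, "someword"), where "someword" stands in for the word being searched for.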
Best regards,
Yijing Sun


JoeMontgomery-6930 answered YijingSun-MSFT commented

I read the links, but no help. I believe doc.Text is a script that needs to run to reveal the page contents.

I tried to upload doc.Text, but the site will not allow it for some strange reason. Neither could I attach a file named Prime.txt.



Hi @JoeMontgomery-6930 ,
You could load the HTML snippet from the txt file into memory and then turn that HTML text into a queryable document.
For more details, you could refer to the article below:
https://articles.runtings.co.uk/2009/11/easily-extracting-links-from-snippet-of.html
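
The idea of turning saved HTML text into a queryable document might look roughly like the sketch below. The names `SnippetLinks`, `ExtractLinks` and `ExtractLinksFromFile` are hypothetical, and the link query is only one example of what can be asked of the parsed document.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class SnippetLinks
{
    // Hypothetical helper: parses an HTML string and returns the href of every link.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors.Select(a => a.GetAttributeValue("href", string.Empty)).ToList();
    }

    // Reads the saved HTML text from disk first, then runs the same query.
    public static List<string> ExtractLinksFromFile(string path)
    {
        return ExtractLinks(File.ReadAllText(path));
    }
}
```

For a saved file such as Prime.txt, this could be called as SnippetLinks.ExtractLinksFromFile("Prime.txt"), assuming the file sits in the program's working directory.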
Best regards,
Yijing Sun
