Download an HTML page, follow its hyperlinks to other HTML pages, and download those recursively

Question

Hi there,
How would you do this?
Start from a main HTML page and download it, then extract all of its links and download the pages they point to. Repeat the same steps for every downloaded subpage: extract its links and download those pages as well.
It is a recursive procedure that fetches all the pages, regardless of how deep the links go.
Is there a way to do this in C#?

Thanks.

Accepted Answer

Hi @nellie,

Yes, what you describe can be implemented in C#.

First, you can use WebClient to download the HTML resources.

using System.Net;

using (WebClient client = new WebClient()) // WebClient implements IDisposable
{
    client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");

    // Or you can get the page content as a string without saving it
    string htmlCode = client.DownloadString("http://yoursite.com/page.html");
}

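Then use Html Agility Pack to parse each downloaded page, select every <a> tag that has an href attribute, and follow the links that point to downloadable pages. Things can go wrong along the way (broken links, pages without any links, pages that link back to each other), so you need to add some exception handling.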

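// The usings go at the top of the file; the members below belong inside your class
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public static int i = 1;
// URLs already visited, so pages that link to each other are not downloaded forever
public static HashSet<string> visited = new HashSet<string>();
public static void downloadRes(string url)
{
    if (!visited.Add(url)) return; // already downloaded
    string path = @"D:\localfile" + i++ + ".html"; // verbatim string: "\l" is not a valid escape
    try
    {
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(url, path);
        }
    }
    catch (WebException)
    {
        return; // broken or unreachable link: skip it
    }
    HtmlDocument doc = new HtmlDocument();
    doc.Load(path); // parse the saved copy instead of fetching the page a second time
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links == null) return; // SelectNodes returns null when nothing matches
    foreach (HtmlNode link in links)
    {
        string href = link.Attributes["href"].Value;
        if (href.StartsWith("http")) // follows absolute links only
        {
            downloadRes(href);
        }
    }
}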
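
A prefix check on href follows absolute links only, so relative links such as "page2.html" are skipped, and links to any other website are followed as well. If that is not what you want, you could resolve each href against the URL of the page it was found on and keep only links on the same host. The following is a minimal sketch under that assumption; LinkHelper and TryResolveLink are hypothetical names that are not part of the code above, and it uses only System.Uri.

```csharp
using System;

public static class LinkHelper
{
    // Hypothetical helper: turns an href found on pageUrl into an absolute
    // http(s) URL on the same host. Returns false when the link should be skipped.
    public static bool TryResolveLink(Uri pageUrl, string href, out Uri result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(href)) return false;

        // Resolves relative hrefs ("page2.html", "../about.html") against the page URL;
        // hrefs that are already absolute are kept as they are.
        if (!Uri.TryCreate(pageUrl, href.Trim(), out Uri absolute)) return false;

        // Skip non-HTTP schemes such as mailto:
        if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
            return false;

        // Assumption: the crawl should stay on the host of the page being parsed
        if (!string.Equals(absolute.Host, pageUrl.Host, StringComparison.OrdinalIgnoreCase))
            return false;

        // Drop the #fragment so "page.html#top" and "page.html" count as the same page
        result = new Uri(absolute.GetLeftPart(UriPartial.Query));
        return true;
    }
}
```

Inside the foreach loop you could then call LinkHelper.TryResolveLink(new Uri(url), href, out Uri next) and recurse with next.AbsoluteUri whenever it returns true.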

Hope this can help you.

Best regards,
Xudong Peng
