MSDN Code Gallery Download Report

As a break from SharePoint development this weekend, I wrote a simple utility that generates a consolidated report of download counts for any resource hosted at MSDN Code Gallery, including projects, releases, and files.  Here is a screenshot:

image

The user interface takes a resource (or project) name and adds it to the list of projects for which the download count report is generated.  The tool then writes a simple report of the downloads for each element into a rich text control.

Here is a class diagram which illustrates the basics:

image

The tool relies on scraping the MSDN Code Gallery pages to generate its reports, and I used the Html Agility Pack for that.  Here is a simple example of how to use the tool's own library:

    1: GalleryResources galleryResources = new GalleryResources(true);
    2:  
    3: if (galleryResources.IsValidResource("notarealresource"))
    4: {
    5:     galleryResources.RegisterResource("notarealresource");
    6: }
    7:  
    8: if (galleryResources.IsValidResource("mschart"))
    9: {
   10:     galleryResources.RegisterResource("mschart");    
   11: }
   12:  
   13: if (galleryResources.IsValidResource("mpFx"))
   14: {
   15:     galleryResources.RegisterResource("mpFx");
   16: }
   17:  
   18: foreach (GalleryResource galleryResource in galleryResources)
   19: {
   20:     Debug.WriteLine(galleryResource.Name);
   21:     foreach (Release release in galleryResource)
   22:     {
   23:         Debug.WriteLine(release.Name);
   24:         foreach (ReleaseFile file in release)
   25:         {
   26:             Debug.WriteLine(file.Name);
   27:             Debug.WriteLine(file.DownloadCount);
   28:         }
   29:     }
   30: }
   31:  
   32: galleryResources.Save();

The resources are persisted to a local store.  The Boolean passed to the constructor tells it to try to load the currently registered resources from a file.  The call to Save on line 32 writes the registered resources back out to that file.
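
The post does not show how the store is written, so the following is only a minimal sketch of how the load-on-construct and Save round trip could work, assuming the store is a plain text file with one resource name per line. The class name ResourceStore, the file format, and the case-insensitive duplicate check are all hypothetical and are not taken from the tool's source.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the persistence part of GalleryResources.
public class ResourceStore
{
    private readonly string _path;
    private readonly List<string> _names = new List<string>();

    // When load is true, try to read previously registered names from the file.
    public ResourceStore(string path, bool load)
    {
        _path = path;

        if (load && File.Exists(_path))
        {
            foreach (string line in File.ReadAllLines(_path))
            {
                string name = line.Trim();
                if (name.Length > 0)
                {
                    Register(name);
                }
            }
        }
    }

    public IList<string> Names
    {
        get { return _names.AsReadOnly(); }
    }

    // Adds a name once; the case-insensitive comparison is an assumption.
    public void Register(string name)
    {
        foreach (string existing in _names)
        {
            if (string.Equals(existing, name, StringComparison.OrdinalIgnoreCase))
            {
                return;
            }
        }

        _names.Add(name);
    }

    // Writes the registered names back to the same file, one per line.
    public void Save()
    {
        File.WriteAllLines(_path, _names.ToArray());
    }
}
```

Constructing a second ResourceStore on the same path with load set to true should then see the names that the first instance saved.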

IsValidResource attempts to load the resource’s page from Code Gallery.  Code Gallery routes a request for a nonexistent resource to its home page, so the method looks at the page’s h1 headings: if one of them matches the Code Gallery home page heading, the resource doesn’t exist:

    public bool IsValidResource(string name)
    {
        try
        {
            HtmlWeb web = new HtmlWeb();

            string url = string.Format(Settings.Default.CodeGalleryResource, name);

            HtmlDocument document = web.Load(url);

            HtmlNodeCollection headers = document.DocumentNode.SelectNodes("//h1");

            // SelectNodes returns null when nothing matches; treat a page
            // without any h1 headings as not being a valid resource.
            if (headers == null)
            {
                return false;
            }

            foreach (HtmlNode node in headers)
            {
                // The home page heading means the request was redirected.
                if (node.InnerText.Equals(Settings.Default.CodeGallery))
                {
                    return false;
                }
            }

            return true;
        }
        catch (Exception exception)
        {
            Trace.WriteLine(exception.Message);
        }

        return false;
    }

RegisterResource simply adds the resource to the list of resources for which download count reports will be generated, unless it is already registered:

    public void RegisterResource(string name)
    {
        // Ignore names that are already registered.
        if (ContainsResource(name))
        {
            return;
        }

        GalleryResource galleryResource = new GalleryResource(name);

        _GalleryResources.Add(galleryResource);
    }

The foreach loop is where the action happens.  Using Code Gallery’s RSS feeds:

image

I consume the Releases feed:

    1: private void LoadReleases()
    2: {
    3:     try
    4:     {
    5:         Releases = new List<Release>();
    6:  
    7:         using (XmlReader reader = XmlReader.Create(string.Format(Settings.Default.CodeGalleryRssFeed, Name)))
    8:         {
    9:             SyndicationFeed feed = SyndicationFeed.Load(reader);
   10:  
   11:             if (feed != null)
   12:             {
   13:                 foreach (SyndicationItem item in feed.Items)
   14:                 {
   15:                     string itemTitle = item.Title.Text;
   16:                     itemTitle = itemTitle.Replace("RELEASED:", string.Empty);
   17:                     itemTitle = itemTitle.Replace("CREATED RELEASE:", string.Empty);
   18:                     itemTitle = itemTitle.Replace("UPDATED RELEASE:", string.Empty);
   19:  
   20:                     bool found = false;
   21:  
   22:                     foreach (Release existingRelease in Releases)
   23:                     {
   24:                         if (existingRelease.Name.Equals(itemTitle))
   25:                         {
   26:                             found = true;
   27:                             break;
   28:                         }
   29:                     }
   30:  
   31:                     if (!found)
   32:                     {
   33:                         Debug.Assert(item.Links.Count == 1);
   34:                         Release release = new Release(itemTitle, item.Links[0].Uri.ToString());
   35:                         Releases.Add(release);
   36:                     }
   37:                 }
   38:             }
   39:         }
   40:     }
   41:     catch (Exception exception)
   42:     {
   43:         Trace.WriteLine(exception.Message);
   44:     }
   45:  
   46: }
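
The prefix stripping and duplicate check in the listing above can also be pulled out into a small helper, which makes the logic easy to exercise without touching the feed. This is a sketch, not the tool's code: ReleaseTitles, Normalize, and Distinct are hypothetical names, and trimming the leftover whitespace and using a HashSet in place of the inner loop are my own choices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the tool's source.
public static class ReleaseTitles
{
    // The prefixes that the listing strips from feed item titles.
    private static readonly string[] Prefixes =
    {
        "RELEASED:", "CREATED RELEASE:", "UPDATED RELEASE:"
    };

    // Removes the feed prefixes and surrounding whitespace from a title.
    public static string Normalize(string title)
    {
        foreach (string prefix in Prefixes)
        {
            title = title.Replace(prefix, string.Empty);
        }

        return title.Trim();
    }

    // Returns each normalized title once, in the order first seen.
    public static List<string> Distinct(IEnumerable<string> titles)
    {
        HashSet<string> seen = new HashSet<string>();
        List<string> result = new List<string>();

        foreach (string title in titles)
        {
            string name = Normalize(title);

            // HashSet.Add returns false when the name is already present.
            if (seen.Add(name))
            {
                result.Add(name);
            }
        }

        return result;
    }
}
```

With this shape, several feed entries that describe the same release under different prefixes collapse to a single name.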

I strip the different prefixes that the feed puts on a title and then check for duplicates.  When I find a release that isn’t already in the collection, I add it to the list of releases.  You see that on line 34.  When you access the Release object’s files, a lazy load does the work of scraping the release page:

    private void LoadReleaseFiles()
    {
        Files = new List<ReleaseFile>();

        HtmlWeb htmlWeb = new HtmlWeb();
        HtmlDocument htmlDocument = htmlWeb.Load(Link);

        // Each file on the release page is rendered in its own div.
        HtmlNodeCollection nodes = htmlDocument.DocumentNode.SelectNodes("//div[@class='FileListItemDiv']");

        // SelectNodes returns null when nothing matches, so a page
        // without file entries leaves the list empty.
        if (nodes == null)
        {
            return;
        }

        foreach (HtmlNode node in nodes)
        {
            string innerText = node.InnerText;

            innerText = innerText.Replace("\r", "");

            // One component per line of the div's text.
            string[] components = innerText.Split('\n');

            Debug.Assert(components.Length == NUM_FILE_INFO_COMPONENTS);

            string releaseName = components[FILE_INFO_NAME].Trim();

            releaseName = HttpUtility.HtmlDecode(releaseName);

            string releaseTemp = components[FILE_INFO_DOWNLOADS].Replace("downloads", "").Trim();

            int releaseDownloadCount;

            // -1 marks a download count that could not be parsed.
            if (!int.TryParse(releaseTemp, out releaseDownloadCount))
            {
                releaseDownloadCount = -1;
            }

            ReleaseFile releaseFile = new ReleaseFile(releaseName, releaseDownloadCount);

            Files.Add(releaseFile);
        }
    }

I have to do some guesswork here because I don’t own these pages, but basically I determined which div contains the release’s file information.  I split its text into lines, work out which components hold the file name and the download count, and add a new ReleaseFile to the collection.
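
Because the parsing depends on markup I don’t control, it helps to isolate it so it can be tried against sample text. The sketch below is a variation on the approach above rather than the tool's code: instead of fixed component indexes, it takes the first non-empty line as the file name and looks for a line ending in "downloads". FileInfoParts, FileInfoParser, and the sample layout in the usage note are hypothetical, and it uses WebUtility.HtmlDecode from System.Net in place of HttpUtility so that it needs no reference to System.Web.

```csharp
using System;
using System.Net;

// Hypothetical result type; the tool itself uses ReleaseFile.
public class FileInfoParts
{
    public string Name;
    public int DownloadCount;
}

// Hypothetical helper; not part of the tool's source.
public static class FileInfoParser
{
    private const string DownloadsSuffix = "downloads";

    // Parses the inner text of one file entry into a name and a count.
    // DownloadCount stays -1 when no parsable count is found.
    public static FileInfoParts Parse(string innerText)
    {
        FileInfoParts parts = new FileInfoParts();
        parts.DownloadCount = -1;

        string[] lines = innerText.Replace("\r", "").Split('\n');

        foreach (string raw in lines)
        {
            string line = raw.Trim();

            if (line.Length == 0)
            {
                continue;
            }

            if (parts.Name == null)
            {
                // Assumption: the first non-empty line is the file name.
                parts.Name = WebUtility.HtmlDecode(line);
            }
            else if (line.EndsWith(DownloadsSuffix, StringComparison.OrdinalIgnoreCase))
            {
                string number = line.Substring(0, line.Length - DownloadsSuffix.Length).Trim();

                int count;

                if (int.TryParse(number, out count))
                {
                    parts.DownloadCount = count;
                }
            }
        }

        return parts;
    }
}
```

For example, given made-up text with a file name on the first line and "42 downloads" on a later line, Parse returns that name with a DownloadCount of 42; if no such line exists, the count stays at -1, the same sentinel the tool uses.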

That’s pretty much it.  I will get the source posted to Code Gallery tomorrow evening.