Understanding the Feed Download Engine
The Feed Download Engine is used for downloading Really Simple Syndication (RSS) feeds from Web sites. This overview explains how the Feed Download Engine works and how to control it through the Windows RSS Platform API.
This topic contains the following sections.
- Feed Synchronization
- Automatic Updates
- Interval and TTL
- Download Thread Count
- Switching to Manual
- HTTP 410 Gone
- Feed Enclosures
- Removing Archived Items
- Synchronization Events
- Related Topics
Feed Synchronization
When synchronizing a feed, the Feed Download Engine performs the following tasks:
- Connect to the Web site and download the XML source of the feed. The Feed Download Engine downloads feeds and enclosures over HTTP or Secure Hypertext Transfer Protocol (HTTPS) only.
- Transform the feed source into the Windows RSS Platform native format, which is based on RSS 2.0 with additional namespace extensions. (The native format is essentially a superset of all supported formats.) To do this, the Windows RSS Platform requires Microsoft XML (MSXML) 3.0 SP5 or later.
- Merge new feed items with existing feed items in the feed store.
- Purge older items from the feed store when the predetermined maximum number of items has been reached.
- Optionally, schedule downloads of enclosures with Background Intelligent Transfer Service (BITS).
To limit its impact on servers, the Feed Download Engine implements HTTP conditional GET combined with delta encoding in HTTP (RFC 3229). This allows the server to transfer a minimal description of changes instead of an entirely new instance of a resource that is already cached on the client. The engine also supports compression through the HTTP gzip support of Microsoft Win32 Internet (WinInet).
A successful synchronization means that the feed was downloaded, verified, transformed into the native format, and merged into the store. An HTTP 304 Not Modified response to a conditional GET (using If-Modified-Since, or If-None-Match with a cached ETag, and so on) also constitutes success.
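The request and success rules above can be sketched in JavaScript. This is a hypothetical model, not the platform's implementation: the `cache` object and its `etag` and `lastModified` fields are assumptions, and the `A-IM: feed` value comes from the informal "RFC 3229 + feed" convention rather than from this topic. Status 226 (IM Used) is the RFC 3229 response that carries a delta instead of the full feed.

```javascript
// Hypothetical sketch: build conditional GET headers from cached validators.
// The cache shape ({ etag, lastModified }) is an assumption for illustration.
function buildConditionalHeaders(cache = {}) {
  const headers = { "Accept-Encoding": "gzip" }; // request compressed transfer
  if (cache.lastModified) {
    headers["If-Modified-Since"] = cache.lastModified;
  }
  if (cache.etag) {
    headers["If-None-Match"] = cache.etag;
    // RFC 3229 delta request; "feed" is an assumed instance-manipulation value.
    headers["A-IM"] = "feed";
  }
  return headers;
}

// Map an HTTP status to the next step of synchronization.
function classifyResponse(status) {
  switch (status) {
    case 200: // full feed body: verify, transform, merge
      return "merge-full";
    case 226: // IM Used (RFC 3229): only the changes were sent
      return "merge-delta";
    case 304: // Not Modified: counts as a successful synchronization
      return "success-unchanged";
    default:
      return "failure";
  }
}
```

In this sketch only 200 and 226 lead to a merge; 304 succeeds without touching the store.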
When Windows Internet Explorer navigates to a feed, it checks for a subscription in the Common Feed List. If a subscription is found, Internet Explorer renders the page from the cached contents. To synchronize a feed manually, click the Refresh button (or press F5). The new content is displayed when the operation finishes.
Automatic Updates
Although feeds can be synchronized as they are viewed, the Windows RSS Platform is most useful when the Feed Download Engine is set to update subscriptions automatically.
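To enable automatic updates, call BackgroundSync with FBSA_ENABLE:

```javascript
// Turn on automatic (background) synchronization by the Feed Download Engine.
var FBSA_ENABLE = 1;
var fm = new ActiveXObject("Microsoft.FeedsManager");
fm.BackgroundSync(FBSA_ENABLE);
```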
When automatic updates are enabled, the Windows RSS Platform creates a Scheduled Task that recurs every 5 minutes after it starts. This interval is for stability purposes only; the task is always recreated after it has completed successfully. For example, if the Feed Download Engine determines that the computer is not connected to the network, the synchronization task stops, and the recurrence ensures that the engine tries again 5 minutes later.
Interval and TTL
When the Feed Download Engine launches, it queues up the "pending" feeds for which LastDownloadTime is zero, or for which the synchronization interval has lapsed. The engine processes up to four of these feeds in parallel. At the end of a successful download, LastDownloadTime is set to the current time. Feeds that have missed a synchronization interval are given a higher priority in the pending feed queue.
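The queueing rule above can be modeled as a filter followed by a priority sort. This is a hypothetical sketch: the `nextSyncTime` field (when a feed's interval lapses) is an illustrative assumption, and "missed an interval" is interpreted here as "most overdue first"; the engine's actual ordering is not documented in this topic.

```javascript
// Hypothetical model of the pending-feed queue. Times are milliseconds;
// lastDownloadTime === 0 means "never downloaded". nextSyncTime is an
// illustrative field, not a Windows RSS Platform property.
const MAX_PARALLEL_FEEDS = 4;

// How far past its due time a feed is; never-downloaded feeds count as 0.
function overdueBy(feed, now) {
  return feed.lastDownloadTime === 0 ? 0 : now - feed.nextSyncTime;
}

function pendingFeeds(feeds, now) {
  return feeds
    .filter(f => f.lastDownloadTime === 0 || now >= f.nextSyncTime)
    // Assumption: "missed an interval" is modeled as "most overdue first".
    .sort((a, b) => overdueBy(b, now) - overdueBy(a, now));
}

// The feeds that would be processed in parallel right now.
function nextBatch(feeds, now) {
  return pendingFeeds(feeds, now).slice(0, MAX_PARALLEL_FEEDS);
}
```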
The default synchronization interval is 24 hours (1440 minutes); however, individual feeds may be updated more often, based on user preferences. Additionally, the time to live (TTL) value of the feed directly affects how often a feed may be updated, even if the default download interval is decreased.
The following properties, listed in order of priority, directly affect how often a feed is checked for updates. These settings apply to automatic updates only. (An application can initiate a forced update of individual feeds at any time via the Download method.)
- LastDownloadTime—a read-only property that indicates when the feed was last downloaded.
- Ttl—an absolute minimum TTL value that is determined by the author of the feed. The Feed Download Engine will not attempt to automatically update the feed more often than this.
- Interval—a per-feed setting that overrides the default interval for the platform.
- DefaultInterval—the default interval for the platform. The minimum acceptable value for DefaultInterval is 15 (minutes).
Note  The Feed Download Engine does not implement support for the skipHours channel elements.
The SyncSetting property of the feed determines which of the two possible update interval settings is used, or whether the feed is updated automatically at all.
- To use the global DefaultInterval setting, set SyncSetting to FSS_DEFAULT.
- To use the Interval property value of the feed, set SyncSetting to FSS_INTERVAL.
- To never sync the feed automatically, set SyncSetting to FSS_MANUAL.
Based on the properties above, the next synchronization time (Ts) for a feed is given by the following formula, where Interval is the effective interval selected by SyncSetting (DefaultInterval or the feed's own Interval). A random percentage (up to 10%) of the calculated interval is added to decrease the likelihood that feeds will continue to be updated in the same order, or at the same time.
Ts = LastDownloadTime + max(Ttl,Interval) * (1.0 + random(0.1))
The Feed Download Engine synchronizes the feed when the current time is greater than or equal to Ts.
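The formula and the SyncSetting choice can be combined into one function. This is a sketch under stated assumptions: times are in minutes, FSS_DEFAULT (0) and FSS_MANUAL (2) match the PowerShell output later in this topic, and FSS_INTERVAL = 1 is assumed. The random source is injectable so the result can be made deterministic.

```javascript
// Sketch of the next-synchronization-time formula; all times in minutes.
// FSS_DEFAULT (0) and FSS_MANUAL (2) match the PowerShell output in this
// topic; FSS_INTERVAL = 1 is an assumption.
const FSS_DEFAULT = 0;
const FSS_INTERVAL = 1;
const FSS_MANUAL = 2;

// Ts = LastDownloadTime + max(Ttl, Interval) * (1.0 + random(0.1))
function nextSyncTime(feed, defaultInterval, random = Math.random) {
  if (feed.syncSetting === FSS_MANUAL) {
    return Infinity; // never synchronized automatically
  }
  const interval =
    feed.syncSetting === FSS_INTERVAL ? feed.interval : defaultInterval;
  const base = Math.max(feed.ttl || 0, interval);
  // random() is in [0, 1), so the jitter adds up to 10% of the interval.
  return feed.lastDownloadTime + base * (1.0 + 0.1 * random());
}

function isDue(feed, defaultInterval, now, random = Math.random) {
  return now >= nextSyncTime(feed, defaultInterval, random);
}
```

Passing `() => 0` as `random` yields the earliest possible Ts, which is convenient for testing.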
To force a synchronization of all feeds, regardless of LastDownloadTime and Ttl, use AsyncSyncAll.
Download Thread Count
The Windows RSS Platform is not actively aware of available network bandwidth as, for example, BITS is. However, the Windows RSS Platform attempts to be polite when downloading feed updates in the background by using a "token-bucket" algorithm. By default, 4 threads are used to make requests, each requiring a token. The platform starts with 2 tokens in its bucket and adds 1 token every second. The bucket holds a maximum of 4 tokens. If no token is available, the thread waits for one to become available. Using this method, the platform issues about 1 request per second, up to four at a time.
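The pacing described above can be simulated with a small discrete-time token bucket. This is a sketch under stated assumptions, not the platform's code: time advances in whole seconds, a token is spent when a request starts, and each request holds a thread for a hypothetical `durationSec`.

```javascript
// Discrete-time token-bucket simulation; returns the second at which each
// request starts. Defaults follow the numbers in the text above.
function simulateTokenBucket(requestCount, {
  initialTokens = 2,   // tokens in the bucket at start
  capacity = 4,        // maximum tokens the bucket holds
  refillPerSecond = 1, // tokens added each second
  threads = 4,         // concurrent request threads
  durationSec = 0,     // assumed time each request occupies a thread
} = {}) {
  if (threads < 1 || refillPerSecond <= 0) {
    throw new RangeError("threads and refillPerSecond must be positive");
  }
  const startTimes = [];
  const busyUntil = []; // finish times of in-flight requests
  let tokens = initialTokens;
  for (let t = 0; startTimes.length < requestCount; t++) {
    if (t > 0) tokens = Math.min(capacity, tokens + refillPerSecond);
    // Free threads whose requests have finished by time t.
    for (let i = busyUntil.length - 1; i >= 0; i--) {
      if (busyUntil[i] <= t) busyUntil.splice(i, 1);
    }
    // Start requests while both a token and a free thread are available.
    while (tokens > 0 && busyUntil.length < threads && startTimes.length < requestCount) {
      tokens -= 1;
      startTimes.push(t);
      busyUntil.push(t + durationSec);
    }
  }
  return startTimes;
}
```

With fast requests, `simulateTokenBucket(5)` shows a burst of two followed by one request per second. With slow requests, the thread count caps concurrency at four while the bucket refills to its capacity of four tokens.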
To further reduce the impact of concurrent requests on a slow network, you can limit the number of concurrent download threads to 1. To do so, create a new DWORD registry value named DownloadThreadCount and set it to 1.
Switching to Manual
What happens when a user subscribes to a feed (such as a webcast or conference) that has a limited lifespan? How will the subscriber know when the publisher stops updating a feed? More importantly, will the browser continue to synchronize a feed that is no longer actively being updated?
There are two ways for a publisher to indicate to the browser that a feed is no longer active: with an HTTP response from the server, or with a custom element in the XML source of the feed itself.
HTTP 410 Gone
If the Feed Download Engine receives an HTTP 410 Gone response to a feed download request, it immediately stops automatic updates of that feed by setting SyncSetting to FSS_MANUAL. The HTTP 410 status code indicates that the requested resource is no longer available and will not be available again. Returning this response is not as simple as removing the file from the server; that would produce an HTTP 404 Not Found error instead. Because this method requires specific configuration of the Web server, it is not recommended for hobbyist publishers.
The Windows RSS Platform also supports the <cf:noMoreUpdates /> channel element. This element is part of the http://www.microsoft.com/schemas/rss/core/2005 namespace and is a child of the RSS 2.0 channel element.
The feed may or may not contain feed items, as depicted in the following example:
```xml
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005">
  <channel>
    <cf:noMoreUpdates />
    ...
  </channel>
</rss>
```
The following PowerShell session demonstrates this behavior:
```powershell
PS C:\> $fm = new-object -comobject "Microsoft.FeedsManager"
PS C:\> $feed = $fm.rootfolder.CreateFeed("gone","http://example.com/feeds/noMoreUpdates.xml")
PS C:\> $feed.SyncSetting
0
PS C:\> $feed.Download()
PS C:\> $feed.SyncSetting
2
PS C:\>
```
In the preceding example, the SyncSetting property is changed from 0 to 2 after the call to Download. Note that 0 is FSS_DEFAULT and 2 is FSS_MANUAL.
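Both end-of-life signals can be summarized as a single state transition on SyncSetting. This is a hypothetical sketch: the boolean parameter stands in for "the downloaded channel contains <cf:noMoreUpdates />", and leaving the setting unchanged for other statuses (such as 404 or 304) is an assumption consistent with the description above.

```javascript
// Hypothetical sketch of how the two end-of-life signals change SyncSetting.
// FSS_MANUAL = 2 matches the PowerShell output above.
const FSS_MANUAL = 2;

function syncSettingAfterDownload(currentSetting, httpStatus, hasNoMoreUpdates = false) {
  if (httpStatus === 410) {
    return FSS_MANUAL; // 410 Gone: the feed will never be available again
  }
  const gotBody = httpStatus >= 200 && httpStatus < 300;
  if (gotBody && hasNoMoreUpdates) {
    return FSS_MANUAL; // publisher marked the channel as finished
  }
  return currentSetting; // assumed: e.g. 404 or 304 leaves the setting alone
}
```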
Feed Enclosures
Feed enclosures can also be downloaded automatically when the feed is synchronized. (See DownloadEnclosuresAutomatically.) The Feed Download Engine schedules these downloads with BITS, which must be running as a service for the downloads to complete. When the enclosure has been successfully downloaded, its DownloadStatus property is set to FDS_DOWNLOADED.
Note If the user changes the DownloadEnclosuresAutomatically property from false to true after the feed has already been downloaded, only enclosures of new and updated items are downloaded.
Client applications can discover items with enclosures by enumerating the Items collection of the Feed object, and looking for those with a valid Enclosure property. Refer to the ScreenSaver Sample for an example of this.
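The discovery pattern can be sketched over plain objects that stand in for the COM Items collection. This is an illustration only: with the real API you would enumerate feed.Items (for example, with a JScript Enumerator), and treating a null or missing Enclosure as "no enclosure" is an assumption; the real property may instead fail for items without one.

```javascript
// Sketch over plain arrays standing in for the COM Items collection.
// Returns the enclosure URLs of items that have an Enclosure object.
function enclosureUrls(feed) {
  return feed.Items
    .filter(item => item.Enclosure != null) // assumed "valid Enclosure" check
    .map(item => item.Enclosure.Url);
}
```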
Removing Archived Items
By default, the Common Feed List maintains a maximum of 200 items per feed. When the number of items in the feed store exceeds the specified maximum, the oldest items are deleted. Automatically downloaded enclosures are also deleted at this time, except those whose read-only attribute is no longer set. (The read-only flag is set upon successful download.)
The following properties directly affect the number of items that remain after a synchronization operation.
- PubDate—used to determine the "age" of items. If PubDate is not set, LastDownloadTime is used. If the feed is a list, the order of items is predetermined and PubDate (if present) is ignored.
- MaxItemCount—a per-feed setting that limits the number of archived items. The feed's ItemCount will never exceed the maximum, even if there are more items that could be downloaded from the feed.
- ItemCountLimit—the upper limit of items for any one feed, normally defined as 2500. The value of MaxItemCount may not exceed this limit. Set MaxItemCount to ItemCountLimit to retain the highest possible number of items.
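The trimming rules above can be sketched as an ordering step followed by a cut. This is a hypothetical model: the `pubDate` and `lastDownloadTime` field names, clamping MaxItemCount to ItemCountLimit, and dropping the trailing items of a list feed are illustrative assumptions rather than documented behavior.

```javascript
// Hypothetical sketch of archive trimming after a synchronization.
// Items carry optional pubDate and lastDownloadTime (ms since epoch).
const DEFAULT_MAX_ITEM_COUNT = 200;
const ITEM_COUNT_LIMIT = 2500;

// An item's "age" comes from PubDate, falling back to LastDownloadTime.
function itemTime(item) {
  return item.pubDate ?? item.lastDownloadTime;
}

function trimItems(items, maxItemCount = DEFAULT_MAX_ITEM_COUNT, isList = false) {
  const max = Math.min(maxItemCount, ITEM_COUNT_LIMIT);
  // List feeds keep the publisher's order; others are ordered newest first.
  const ordered = isList
    ? items.slice()
    : items.slice().sort((a, b) => itemTime(b) - itemTime(a));
  return ordered.slice(0, max); // the oldest (or last-listed) items are dropped
}
```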
Setting the feed's MaxItemCount property does not take effect until the next synchronization operation. To "apply" a change, you must manually synchronize the feed or call Download. Additionally, raising the value of MaxItemCount may not immediately cause older items to be downloaded, due to the conditional GET request. For example:
- Subscribe to a feed with 50 items.
- Change MaxItemCount to 10.
- Download the feed. (The feed now has only 10 items.)
- Change MaxItemCount to 20 and download the feed again.
- The feed *still* has only 10 items.
Because the content of the feed has not changed, the conditional GET does not return any new items. Additional items will be added when the feed is updated again.
Synchronization Events
The following table lists the FeedFolder events that are sent when a feed is synchronized, in the order that they are received.
| Order | Event | Description |
|---|---|---|
| 1 | FeedDownloading(path) | The download of the feed specified by path has started. |
| 2 | FeedDownloadCompleted(path,error) | The download of the feed is finished. The error parameter indicates whether it was successful. Enclosures are downloaded separately, without events. |
| 3 | FeedItemCountChanged(path,count,unread) | If the download was successful, the item count for the feed is updated. |
| 4 | FolderItemCountChanged(path,count,unread) | Finally, the aggregate count is updated for this folder and all of its parent folders. |
There are no synchronization events for enclosures. To determine whether an enclosure has been downloaded successfully, check that DownloadStatus equals FDS_DOWNLOADED and that LastDownloadError equals FDE_NONE.
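Because no events are raised for enclosures, a client could poll with a check like the following. This is a sketch over plain objects; the numeric values of FDS_DOWNLOADED and FDE_NONE are assumptions and should be verified against the FEEDS_DOWNLOAD_STATUS and FEEDS_DOWNLOAD_ERROR enumerations in msfeeds.h.

```javascript
// Assumed numeric values; verify against msfeeds.h before relying on them.
const FDS_DOWNLOADED = 3;
const FDE_NONE = 0;

// Both conditions must hold: the status says downloaded and no error is recorded.
function enclosureDownloaded(enclosure) {
  return enclosure.DownloadStatus === FDS_DOWNLOADED &&
    enclosure.LastDownloadError === FDE_NONE;
}

// Split enclosures into finished and not-yet-finished (pending or failed).
function partitionEnclosures(enclosures) {
  const done = [];
  const notDone = [];
  for (const e of enclosures) {
    (enclosureDownloaded(e) ? done : notDone).push(e);
  }
  return { done, notDone };
}
```

A client might run this after FeedDownloadCompleted and again later, since enclosure downloads continue in BITS after the feed itself has been merged.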