3.1.4 Message Processing Events and Sequencing Rules

This protocol defines the operations described in the following table.

Operation          Description
-----------------  ----------------------------------------------------------------
EnumerateFolder    Returns the items and subfolders of a given folder.
GetAttachments     Returns the attachments of a given list item.
GetList            Returns general information, the field schema, and the access permissions for a given list.
GetListCollection  Returns general information about all lists in a given site.
GetListItems       Returns all fields, or only the requested fields, of the list items in a given list that satisfy specified criteria.
GetSite            Returns information about the site collection of the context site.
GetSiteAndWeb      Extracts from a URL the portions that identify the site collection as a whole and the specific site (Web) to which the URL belongs.
GetURLSegments     Given the URL of a structural element, extracts the identifiers, the URLs, or both, of the structural elements that contain it.
GetWeb             Returns information about the context site.

Each method is a SOAP operation, as specified in [SOAP1.1]: the protocol client sends its parameters organized into an input XML message, and the protocol server returns a set of values combined into a response (output) XML message. The protocol server never initiates communication with the protocol client. All communication is transported over HTTP or HTTPS, as specified in [RFC2616], section 9.1.
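To illustrate how an operation's parameters are organized into an input message, the following Python sketch wraps them in a SOAP 1.1 envelope. The service namespace URI, and the parameter name strUrl used in the usage example, are assumptions for illustration rather than values given in this section; the authoritative message definitions are those in the individual operation sections.

```python
from xml.sax.saxutils import escape

# SOAP 1.1 envelope namespace, as defined in [SOAP1.1].
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumption: target namespace of the SiteData Web service.
SITEDATA_NS = "http://schemas.microsoft.com/sharepoint/soap/"


def build_envelope(operation: str, params: dict[str, str]) -> str:
    """Wrap an operation's input parameters in a SOAP 1.1 envelope."""
    # Each parameter becomes a child element of the operation element;
    # escape() protects characters such as & and < in parameter values.
    children = "".join(
        f"<{name}>{escape(value)}</{name}>" for name, value in params.items()
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<soap:Envelope xmlns:soap="{SOAP_ENV_NS}">'
        "<soap:Body>"
        f'<{operation} xmlns="{SITEDATA_NS}">{children}</{operation}>'
        "</soap:Body>"
        "</soap:Envelope>"
    )
```

For example, build_envelope("GetSiteAndWeb", {"strUrl": "http://www.fabricam.com/Subsite"}) yields an envelope whose body holds a single GetSiteAndWeb element with one strUrl child (the parameter name being hypothetical here).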

Method calls are sent as HTTP POST requests: the SOAPAction header identifies the method, and the request body contains the input message wrapped in a SOAP envelope. Each response is returned as an output message, also wrapped in a SOAP envelope, in the body of an HTTP response with the content type text/xml. All POST requests are sent to a well-known URL on the protocol server, for example http://root/_vti_bin/SiteData.asmx, where root denotes the root URL of a site or subsite.
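The request shape described above can be sketched with Python's standard urllib.request module. The sketch only prepares the request and does not send it. It assumes, as a hypothetical convention, that the SOAPAction value is the service namespace followed by the operation name, and it takes the SOAP envelope as an already-built string.

```python
import urllib.request

# Assumption: SOAPAction = service namespace + operation name.
SITEDATA_NS = "http://schemas.microsoft.com/sharepoint/soap/"


def build_request(root_url: str, operation: str, envelope: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST for one SiteData operation."""
    # All posts go to the well-known endpoint under the site or subsite root.
    endpoint = root_url.rstrip("/") + "/_vti_bin/SiteData.asmx"
    return urllib.request.Request(
        endpoint,
        data=envelope.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # The SOAPAction header names the method being called.
            "SOAPAction": f'"{SITEDATA_NS}{operation}"',
        },
    )
```

Actually sending the request, for example with urllib.request.urlopen(req), would additionally require whatever authentication the protocol server demands, which this sketch omits.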

Crawling begins by establishing context. Assume that the crawling agent knows only the URL of a page within the site content, for example:

 http://www.fabricam.com/Subsite/Shared%20Documents/Forms/AllItems.aspx

The following sequence diagram shows the operations required for a full traversal; a brief explanation of each message follows. The figure shows only one branch of a depth-first [KNUTH] traversal of a site. For example, having found one list among those returned in the GetListCollection response, the protocol client explores that list's items. A complete depth-first traversal enumerates all the lists first and then explores the list items of each list.

Figure 2: Outline of site traversal scenario

The sequence is as follows:

  1. The protocol client sends a GetSiteAndWeb input message, passing any URL that refers to the site as a parameter, to obtain the URLs of the site collection and of the site that the URL refers to.

  2. The protocol server sends a response message containing those URLs.

  3. The protocol client sends a GetWeb request message targeting a site of interest in the site collection, for example http://www.fabricam.com.

  4. The protocol server sends a response message that contains all the subsites (child objects) and lists of the site targeted by the request. In this example, the subsites include http://www.fabricam.com/subsite.

  5. To explore the first list in the depth-first traversal, the protocol client sends a GetList message, passing the subsite URL and the list identifier as parameters.

  6. The protocol server sends the GetList response message, which contains information about the list, including when it was created, when it was last modified, its permission settings, and its schema.

  7. To obtain information about all the list items, the protocol client sends a GetListItems input message with an empty sQuery parameter (meaning "all items").

  8. The protocol server sends a GetListItems response message, enumerating all list items.

  9. To obtain the properties of each list item, the protocol client sends another GetListItems input message.

  10. The protocol server sends a GetListItems response message containing the list item properties.
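The numbered steps can be sketched in Python against a hypothetical client interface. The method names resolve_urls, get_web, get_list, and get_list_items, and the FakeClient stand-in, are illustrative inventions rather than part of the protocol; each method represents one request and response pair. Passing an item identifier as the query in steps 9 and 10 is likewise an assumption, since this section does not show the sQuery format.

```python
from typing import Protocol


class SiteDataClient(Protocol):
    """Hypothetical client: each method wraps one request/response pair."""

    def resolve_urls(self, url: str) -> tuple[str, str]: ...            # steps 1-2
    def get_web(self, web_url: str) -> tuple[list[str], list[str]]: ...  # steps 3-4
    def get_list(self, web_url: str, list_id: str) -> dict: ...         # steps 5-6
    def get_list_items(self, web_url: str, list_id: str, query: str) -> list[str]: ...


def crawl_first_branch(client: SiteDataClient, page_url: str) -> dict[str, list[str]]:
    """Follow steps 1-10 for the first list of the site that contains page_url."""
    # Steps 1-2: learn the site collection URL and the site (Web) URL.
    _site_url, web_url = client.resolve_urls(page_url)
    # Steps 3-4: enumerate the site's subsites and lists.
    _subsites, list_ids = client.get_web(web_url)
    list_id = list_ids[0]  # first list in the depth-first order
    # Steps 5-6: fetch list metadata (dates, permissions, schema).
    client.get_list(web_url, list_id)
    # Steps 7-8: an empty query means "all items".
    item_ids = client.get_list_items(web_url, list_id, "")
    # Steps 9-10: ask again for each item's properties.
    # Assumption: the item id is passed as the query string.
    return {item: client.get_list_items(web_url, list_id, item) for item in item_ids}


class FakeClient:
    """In-memory stand-in that records the order of calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def resolve_urls(self, url):
        self.calls.append("resolve")
        return "http://www.fabricam.com", "http://www.fabricam.com/Subsite"

    def get_web(self, web_url):
        self.calls.append("GetWeb")
        return [], ["docs"]

    def get_list(self, web_url, list_id):
        self.calls.append("GetList")
        return {"id": list_id}

    def get_list_items(self, web_url, list_id, query):
        self.calls.append(f"GetListItems({query})")
        return ["1", "2"] if query == "" else [f"props of {query}"]
```

Running crawl_first_branch with a FakeClient and the example page URL records one call per request message, in the order of the numbered steps.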

The protocol client can now inspect all fields of those list items to build the index. The details of building the index are outside of the scope of this protocol.
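A complete traversal, rather than the single branch walked through in the numbered steps, can be sketched as a recursive generator over a hypothetical in-memory model of the site. Web, traverse, and the node labels are assumptions; in a real crawler, the lists and items would come from protocol responses such as GetListCollection and GetListItems. Visiting child sites after the current site's lists is also a choice made by this sketch, not something this section specifies.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Web:
    """Hypothetical in-memory model of a site: its lists and child sites."""
    url: str
    lists: dict[str, list[str]] = field(default_factory=dict)  # list id -> item ids
    subwebs: list["Web"] = field(default_factory=list)


def traverse(web: Web) -> Iterator[str]:
    """Yield node labels in the order a full depth-first crawl visits them."""
    yield f"web:{web.url}"
    # Enumerate all the lists of this site first ...
    for list_id in web.lists:
        yield f"list:{list_id}"
    # ... then explore the items of each list.
    for list_id, item_ids in web.lists.items():
        for item_id in item_ids:
            yield f"item:{list_id}/{item_id}"
    # Finally, descend into each child site and repeat.
    for sub in web.subwebs:
        yield from traverse(sub)
```

Because traverse is a generator, a crawler can consume nodes one at a time and stop early without materializing the whole site.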

Full details of individual Site Data Protocol operations are specified in sections 3.1.4.1 through 3.1.4.9.