1.1 Glossary

This document uses the following terms:

access URL: An internal Uniform Resource Locator (URL) that is used by a crawler to identify and gain access to an item.

Active Directory: The Windows implementation of a general-purpose directory service, which uses LDAP as its primary access protocol. Active Directory stores information about a variety of objects in the network such as user accounts, computer accounts, groups, and all related credential information used by Kerberos [MS-KILE]. Active Directory is either deployed as Active Directory Domain Services (AD DS) or Active Directory Lightweight Directory Services (AD LDS), which are both described in [MS-ADOD]: Active Directory Protocols Overview.

anchor content source: A content source that is used to import the anchor text from links between items into the full-text index catalog.

anchor crawl: A process in which anchor text from links between items is added to a full-text index catalog.

anonymous authentication: An authentication mode in which neither party verifies the identity of the other party.

authority hops: The number of site levels to be navigated from a start address to a specific item.

binary large object (BLOB): A discrete packet of data that is stored in a database and is treated as a sequence of uninterpreted bytes.

Business Data Connectivity (BDC): A shared service that stores information about business application data that exists outside a server farm. It can be used to display business data in lists, Web Parts, search results, user profiles, and custom applications. Previously referred to as Business Data Catalog.

certificate: A collection of attributes and extensions that can be stored persistently. The set of attributes in a certificate can vary depending on the intended usage of the certificate. A certificate securely binds a public key to the entity that holds the corresponding private key. A certificate is commonly used for authentication and secure exchange of information on open networks, such as the Internet, extranets, and intranets. Certificates are digitally signed by the issuing certification authority (CA) and can be issued for a user, a computer, or a service. The most widely accepted format for certificates is defined by the ITU-T X.509 version 3 international standard. For more information about attributes and extensions, see [RFC3280] and [X509] sections 7 and 8.

configuration database: A database that is stored on a back-end database server and contains both persisted objects and site collection metadata for lookup purposes.

content source: A set of options for specifying the type of content to be crawled and the start addresses for the content to be indexed. A content source is defined by the protocol handler that is used to access specific systems, such as SharePoint sites, file systems, and external websites. A content source can contain up to 500 start addresses.

cookie: A small data file that is stored on a user's computer and carries state information between participating protocol servers and protocol clients.

crawl: The process of traversing a URL space to acquire items to record in a search catalog.

crawl account: A user account that has access to all of the content that is traversed by a crawl component.

crawl mapping: A mapping that associates an access URL, which is used to obtain an item from a content source, and a display URL, which is the address of the item.
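
As an illustration of how a crawl mapping relates the two URLs, the following minimal Python sketch rewrites an access URL into a display URL by prefix substitution. The mapping table, the server names, and the function name to_display_url are hypothetical; prefix substitution with longest-prefix matching is an assumption made for the example, not behavior defined by this document.

```python
from typing import Dict

# Hypothetical crawl mappings: access URL prefix -> display URL prefix.
CRAWL_MAPPINGS: Dict[str, str] = {
    "file://fileserver01/docs/": "https://intranet.example.com/docs/",
}


def to_display_url(access_url: str, mappings: Dict[str, str] = CRAWL_MAPPINGS) -> str:
    """Return the display URL for an access URL.

    If no mapping applies, the access URL is returned unchanged.
    """
    # Try longer prefixes first so that more specific mappings win (assumption).
    for prefix in sorted(mappings, key=len, reverse=True):
        if access_url.startswith(prefix):
            return mappings[prefix] + access_url[len(prefix):]
    return access_url
```

For example, to_display_url("file://fileserver01/docs/a/b.txt") yields "https://intranet.example.com/docs/a/b.txt" under the mapping shown.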

crawl queue: A data structure that stores the list of items to crawl next.

crawl rule: A set of preferences that applies to a specific URL or range of URLs. A crawl rule can be used to include or exclude items in a crawl and to specify the content access account to use when crawling that URL or range of URLs.
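
The following Python sketch shows one way a list of crawl rules could be evaluated for a URL. The CrawlRule type, the wildcard pattern syntax, the first-match-wins ordering, the include-by-default fallback, and the account names are all assumptions made for illustration; this document does not define them.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional, Tuple


class CrawlRule(NamedTuple):
    pattern: str                   # URL or wildcard pattern covering a range of URLs
    include: bool                  # True to include matching items, False to exclude
    account: Optional[str] = None  # content access account; None means the default


def evaluate(url: str, rules: List[CrawlRule],
             default_account: str = "DOMAIN\\crawl") -> Tuple[bool, str]:
    """Return (include, account) from the first rule whose pattern matches url."""
    for rule in rules:
        # Compare case-insensitively; "*" in the pattern matches any run of characters.
        if fnmatchcase(url.lower(), rule.pattern.lower()):
            return rule.include, rule.account or default_account
    # Assumption: a URL that matches no rule is included with the default account.
    return True, default_account
```

With an exclude rule for "http://intranet.example.com/private/*" listed before an include rule for "http://intranet.example.com/*", items under /private/ are excluded and all other items on that host are included.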

crawler: A process that browses and indexes content from a content source.

delete crawl: A process that is started automatically after a content source or start address deletion occurs and removes associated items from a search catalog.

directory service (DS): A service that stores and organizes information about a computer network's users and network shares, and that allows network administrators to manage users' access to the shares. See also Active Directory.

display URL: The URL that is displayed on a search results page for each search result. This can be different from an access URL. See also access URL.

domain: A set of users and computers sharing a common namespace and management infrastructure. At least one computer member of the set must act as a domain controller (DC) and host a member list that identifies all members of the domain, as well as optionally hosting the Active Directory service. The domain controller provides authentication of members, creating a unit of trust for its members. Each domain has an identifier that is shared among its members. For more information, see [MS-AUTHSOD] section 1.1.1.5 and [MS-ADTS].

domain name: The name given by an administrator to a collection of networked computers that share a common directory. As part of the Domain Name System (DNS) naming structure, domain names consist of a sequence of name labels separated by periods.

drive letter: One of the 26 alphabetical characters A-Z, in uppercase or lowercase, that is assigned to a volume. Drive letters serve as a namespace through which data on the volume can be accessed. A volume with a drive letter can be referred to with the drive letter followed by a colon (for example, C:).
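
The drive letter form described above can be recognized with a short pattern. This is a minimal Python sketch; the function name drive_letter and the choice to normalize the result to uppercase are assumptions made for the example.

```python
import re
from typing import Optional

# One letter A-Z in either case, followed by a colon, at the start of the path.
_DRIVE = re.compile(r"^([A-Za-z]):")


def drive_letter(path: str) -> Optional[str]:
    """Return the drive letter of a path in uppercase, or None if there is none."""
    match = _DRIVE.match(path)
    return match.group(1).upper() if match else None
```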

exclusion list: A list of items to exclude from query results and to remove from a search index the next time that a crawl occurs.

file: A single, discrete unit of content.

folder: A file system construct. File systems organize a volume's data by providing a hierarchy of objects, which are referred to as folders or directories, that contain files and can also contain other folders.

forms authentication: An authentication method in which protocol clients redirect unauthenticated requests to an HTML form by using HTTP. If the protocol client authenticates the request, the system issues a cookie that stores the credentials or a key for reacquiring the identity. In subsequent requests, the cookie is submitted in request headers and the requests are authenticated and authorized by an ASP.NET event handler that uses the validation method that is specified by the protocol client.

full crawl: A crawl process that indexes all of the items in a specified content source, regardless of whether the item was modified.

full-text index catalog: A collection of full-text index components and other files that are organized in a specific directory structure and contain the data that is needed to perform queries.

globally unique identifier (GUID): A term used interchangeably with universally unique identifier (UUID) in Microsoft protocol technical documents (TDs). Interchanging the usage of these terms does not imply or require a specific algorithm or mechanism to generate the value. Specifically, the use of this term does not imply or require that the algorithms described in [RFC4122] or [C706] must be used for generating the GUID. See also universally unique identifier (UUID).

host hop: The process of traversing to a server with a different host name during a crawl.
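
A crawler can detect a host hop by comparing the host names of the source and target URLs. The sketch below uses Python's standard urllib.parse module; the function name is_host_hop is hypothetical, and treating host names as case-insensitive is an assumption of the example.

```python
from urllib.parse import urlsplit


def is_host_hop(from_url: str, to_url: str) -> bool:
    """Return True if following the link moves to a different host name."""
    # urlsplit(...).hostname is lowercased, so the comparison ignores case.
    source = urlsplit(from_url).hostname
    target = urlsplit(to_url).hostname
    return source is not None and target is not None and source != target
```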

HRESULT: An integer value that indicates the result or status of an operation. A particular HRESULT can have different meanings depending on the protocol using it. See [MS-ERREF] section 2.1 and specific protocol documents for further details.
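
As a hedged illustration, the sketch below splits a 32-bit HRESULT into its severity bit, facility, and code, following the bit layout as commonly described for [MS-ERREF] section 2.1 (severity in the high bit, an 11-bit facility, and a 16-bit code). The function name decode_hresult and the dictionary keys are hypothetical; consult [MS-ERREF] for the authoritative layout.

```python
from typing import Dict, Union


def decode_hresult(value: int) -> Dict[str, Union[bool, int]]:
    """Split an HRESULT into failure flag, facility, and code."""
    value &= 0xFFFFFFFF  # accept signed or unsigned input; work on 32 unsigned bits
    return {
        "failure": bool(value >> 31),       # severity bit: set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility field
        "code": value & 0xFFFF,             # low 16 bits
    }
```

For example, the value 0x80070005 decodes to a failure with facility 7 and code 5.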

HTTP GET: An HTTP method for retrieving a resource, as described in [RFC2616].

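HTTP POST: An HTTP method for sending an entity, enclosed in the request, to the resource that is identified by the request URI, as described in [RFC2616].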

Hypertext Transfer Protocol (HTTP): An application-level protocol for distributed, collaborative, hypermedia information systems (text, graphic images, sound, video, and other multimedia files) on the World Wide Web.

Hypertext Transfer Protocol Secure (HTTPS): An extension of HTTP that securely encrypts and decrypts web page requests. In some older protocols, "Hypertext Transfer Protocol over Secure Sockets Layer" is still used (Secure Sockets Layer has been deprecated). For more information, see [SSL3] and [RFC5246].

inclusion list: A list of items to include in query results and to add to a search index the next time that a crawl occurs.

incremental crawl: A crawl process that indexes only a subset of the items in a content source, selected based on item modifications.
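
To contrast an incremental crawl with a full crawl, the following Python sketch selects only the items whose last-modified time is later than the previous crawl. Comparing timestamps is an assumption made for the example; this document does not specify how modifications are detected, and the function and parameter names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def items_to_index(items: Dict[str, datetime], last_crawl: datetime) -> List[str]:
    """Return the URLs of items modified after the previous crawl, sorted by URL.

    A full crawl would instead return every key of `items`.
    """
    return sorted(url for url, modified in items.items() if modified > last_crawl)
```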

index server: A server that is assigned the task of crawling.

item: A unit of content that can be indexed and searched by a search application.

metadata index: A data structure that is stored on a back-end database server. It stores properties that are associated with each item, and the attributes of those properties.

page hop: The process of traversing from one item to another during a crawl. See also site hop.

portal content project: A primary search catalog that contains all of the content sources and settings for an administrator-defined crawl.

query component: A portion of a URL that follows a question mark (?), as described in [RFC3986].
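
The query component can be extracted with Python's standard urllib.parse module, as in this minimal sketch. The function names and the example URL are illustrative only.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit


def query_component(url: str) -> str:
    """Return the part of the URL after "?" and before any "#" fragment."""
    return urlsplit(url).query


def query_parameters(url: str) -> Dict[str, List[str]]:
    """Parse the query component into a mapping of names to lists of values."""
    return parse_qs(query_component(url))
```

For the URL "http://example.com/search?q=crawl&start=10#results", the query component is "q=crawl&start=10".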

query server: A server that has been assigned the task of fulfilling search queries.

search catalog: All of the crawl data that is associated with a specific search application. A search catalog provides information that is used to generate query results.

search database: A database that stores search-related information, including stored procedures and tables that are used for crawl data, document metadata, and administration information.

search query: A complete set of conditions that are used to generate search results, including query text, sort order, and ranking parameters.

search service account: A user account under which a search service runs.

security trimmer: A filter that is used to limit search results to only those resources that a user can view, based on the user's permission level and the access control list (ACL) for a resource. A security trimmer helps to ensure that search results display only those resources that a user has permission to view.
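
The filtering role of a security trimmer can be sketched as follows. The data shapes are assumptions made for illustration: each ACL is modeled as a plain set of principal names allowed to view a resource, and a resource with no ACL entry is treated as not viewable. Real ACLs and permission levels are richer than this.

```python
from typing import Dict, List, Set


def trim_results(results: List[str], user_principals: Set[str],
                 acls: Dict[str, Set[str]]) -> List[str]:
    """Keep only the results that the user is permitted to view."""
    visible = []
    for url in results:
        allowed = acls.get(url, set())  # assumption: no ACL entry means no access
        if allowed & user_principals:   # user or one of the user's groups is listed
            visible.append(url)
    return visible
```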

share: A resource offered by a Common Internet File System (CIFS) server for access by CIFS clients over the network. A share typically represents a directory tree and its included files (referred to commonly as a "disk share" or "file share") or a printer (a "print share"). If the information about the share is saved in a persistent store (for example, the Windows registry) and reloaded when a file server is restarted, then the share is referred to as a "sticky share". Some share names are reserved for specific functions and are referred to as special shares: IPC$, reserved for interprocess communication; ADMIN$, reserved for remote administration; and A$, B$, C$ (and other local disk names followed by a dollar sign), assigned to local disk devices.

Shared Services Provider (SSP): A logical grouping of shared service applications, and their supporting resources, that can be configured and managed from a single server and can be used by multiple server farms.

site: A group of related pages and data within a SharePoint site collection. The structure and content of a site is based on a site definition. Also referred to as SharePoint site and web site.

SOAP: A lightweight protocol for exchanging structured information in a decentralized, distributed environment. SOAP uses XML technologies to define an extensible messaging framework, which provides a message construct that can be exchanged over a variety of underlying protocols. The framework has been designed to be independent of any particular programming model and other implementation-specific semantics. SOAP 1.2 supersedes SOAP 1.1. See [SOAP1.2-1/2003].

SOAP action: The HTTP request header field used to indicate the intent of the SOAP request, using a URI value. See [SOAP1.1] section 6.1.1 for more information.

SOAP body: A container for the payload data being delivered by a SOAP message to its recipient. See [SOAP1.2-1/2007] section 5.3 for more information.

SOAP fault: A container for error and status information within a SOAP message. See [SOAP1.2-1/2007] section 5.4 for more information.

start address: A URL that identifies a point at which to start a crawl. Administrators specify start addresses when they create or edit a content source.

Uniform Resource Identifier (URI): A string that identifies a resource. The URI is an addressing mechanism defined in Internet Engineering Task Force (IETF) Uniform Resource Identifier (URI): Generic Syntax [RFC3986].

Uniform Resource Locator (URL): A string of characters in a standardized format that identifies a document or resource on the World Wide Web. The format is as specified in [RFC1738].

Universal Naming Convention (UNC): A string format that specifies the location of a resource. For more information, see [MS-DTYP] section 2.2.57.

URL encode: The process of encoding characters that have reserved meanings for a Uniform Resource Locator (URL), as described in [RFC1738].
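
A minimal Python sketch of URL encoding follows. It uses urllib.parse.quote from the standard library, which applies the percent-encoding rules of the later [RFC3986] rather than [RFC1738]; for the characters shown here the results are the same. The function name url_encode and the sample string are illustrative.

```python
from urllib.parse import quote, unquote


def url_encode(text: str) -> str:
    """Percent-encode every character that is not an unreserved URL character."""
    # safe="" means that "/" is encoded too; by default quote() leaves "/" as-is.
    return quote(text, safe="")


encoded = url_encode("Q1 report & notes/final.docx")
# The space, "&", and "/" are replaced with %20, %26, and %2F respectively.
decoded = unquote(encoded)
```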

URL space: A list of Uniform Resource Locators (URLs) that contains information about the links from each URL to other URLs.

user profile import: The process of importing records from a directory service (DS) to a user profile store.

web application: A container in a configuration database that stores administrative settings and entry-point URLs for site collections.

web server: A server computer that hosts websites and responds to requests from applications.

web service: A unit of application logic that provides data and services to other applications and can be called by using standard Internet transport protocols such as HTTP, Simple Mail Transfer Protocol (SMTP), or File Transfer Protocol (FTP). Web services can perform functions that range from simple requests to complicated business processes.

Web Services Description Language (WSDL): An XML format for describing network services as a set of endpoints that operate on messages that contain either document-oriented or procedure-oriented information. The operations and messages are described abstractly and are bound to a concrete network protocol and message format in order to define an endpoint. Related concrete endpoints are combined into abstract endpoints, which describe a network service. WSDL is extensible, which allows the description of endpoints and their messages regardless of the message formats or network protocols that are used.

website: (1) A group of related webpages that is hosted by a server on the World Wide Web or an intranet. Each website has its own entry points, metadata, administration settings, and workflows. Also referred to as site.

(2) A group of related pages and data within a SharePoint site collection. The structure and content of a site is based on a site definition. Also referred to as SharePoint site and site.

WSDL message: An abstract, typed definition of the data that is communicated during a WSDL operation [WSDL]. Also, an element that describes the data being exchanged between web service providers and clients.

X.509: An ITU-T standard for public key infrastructure subsequently adapted by the IETF, as specified in [RFC3280].

XML namespace: A collection of names that is used to identify elements, types, and attributes in XML documents identified in a URI reference [RFC3986]. A combination of XML namespace and local name allows XML documents to use elements, types, and attributes that have the same names but come from different sources. For more information, see [XMLNS-2ED].
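
To show how an XML namespace keeps same-named elements from different sources apart, the following Python sketch parses a small document with xml.etree.ElementTree. The namespace URIs, prefixes, and element names are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two <item> elements with the same local name but different namespaces.
DOC = """<catalog xmlns:hr="urn:example:hr" xmlns:it="urn:example:it">
  <hr:item>Onboarding guide</hr:item>
  <it:item>Laptop inventory</it:item>
</catalog>"""


def item_text(xml_text: str, namespace_uri: str) -> Optional[str]:
    """Return the text of the first <item> element in the given namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree identifies an element by "{namespace URI}local-name".
    element = root.find("{%s}item" % namespace_uri)
    return element.text if element is not None else None
```

Here item_text(DOC, "urn:example:hr") returns "Onboarding guide" and item_text(DOC, "urn:example:it") returns "Laptop inventory", even though both elements have the local name "item".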

XML namespace prefix: An abbreviated form of an XML namespace, as described in [XML].

MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as defined in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.