Architectural Overview of Windows SharePoint Services

 

Microsoft Windows SharePoint Services Product Team
Les W. Smith
Microsoft Corporation

October 2004

Applies to:
   Microsoft Windows SharePoint Services
   Microsoft Office SharePoint Portal Server 2003

Summary: Examine the architecture implemented in Microsoft Windows SharePoint Services. Learn what happens on the server when users issue page requests, and how Windows SharePoint Services responds. Understand the role of managed code in relation to unmanaged code in Windows SharePoint Services, and the Windows SharePoint Services database schema. (13 printed pages)

Contents

High-Level System View
Web Server Topology
How Requests are Handled in IIS and Through the ISAPI Filter
Web Part Infrastructure
Unmanaged Code in Windows SharePoint Services
Contents of the Configuration Database
Conclusion

High-Level System View

Within a deployment of Microsoft Windows SharePoint Services, three types of server components are active:

  • One or more front-end Web servers.
  • One configuration database.
  • One or more content database servers.

You can install these three components on a single computer, or you can distribute them among multiple computers within a server farm. All state information is maintained through the content and configuration databases in Microsoft SQL Server.

Figure 1. High-level view of a Windows SharePoint Services deployment

Web Servers

In a server farm running Windows SharePoint Services, the Web servers are stateless clones. A request can be routed to any Web server through load balancing, and any site can be served by any Web server. The Web server connects to a back-end database to retrieve data so that it can construct and return the Web page to the client. When a Web server fails within a server farm, requests are routed to other Web servers. You can add capacity to the deployment by adding more Web servers. Documents and other end-user data are not stored on the Web servers. All Web site content and configuration settings are kept in the database servers.

Content Database Servers

The back-end content database stores all site content, including site documents or files in document libraries, list data, and Web Part properties, as well as user names and rights. Unlike Web servers, content database servers are not identical: all the data for a specific site resides in one content database on only one computer. SQL Server provides backup and failover protection to help prevent the service from being interrupted when a database server fails.

Configuration Database

The configuration database handles all administration of the deployment: it directs requests to the appropriate content database and manages load balancing for the back-end databases. When a front-end Web server receives a request for a page in a particular site, it checks the configuration database to determine which content database holds the site's data. You can run the configuration database on the same computer as a Web server or on a remote computer running Microsoft SQL Server.

Web Server Topology

The topology, or logical layout, of Web servers within a deployment of Windows SharePoint Services varies depending on the context. When you deploy Windows SharePoint Services, by default you create two virtual servers (Web sites in Microsoft Internet Information Services, or IIS): you create a Web site for the administrative virtual server, and you extend the existing IIS Web site on port 80 to create an end-user, or run-time, virtual server.

Figure 2. Default administrative and end-user virtual servers within IIS

You can have only one administrative virtual server on a single computer, which you use to configure all the front-end Web servers and to extend new virtual servers.

In any deployment of Windows SharePoint Services, you can implement more than one virtual server, configured in one of two ways. The default configuration resolves domain names to virtual servers in IIS, so multiple virtual servers can be created, one domain name per server. The second configuration, called Scalable Host Header mode, extends the IIS host header mode so that a single virtual server can host multiple domain names, using the host header (domain name) of each request to resolve the site.

The scale entity for an end-user virtual server is the site collection, which consists of a top-level site or root Web (for example, http://VServer/[sites/]SiteCollection) and any number of subsites (for example, http://VServer/[sites/]SiteCollection/Subsite). The top-level site includes, for example, the Web Part, list template, and site template galleries and provides administration for all sites within the site collection. A virtual server is partitioned by site collections, which allow a given URL namespace to be separated into different segments, each site collection with its own namespace (for example, http://VServer/[sites/]YourSiteCollection, http://VServer/[sites/]MySiteCollection, and so on). Windows SharePoint Services can load-balance site collections across different content databases, but individual sites always live within the same database as their parent site collection.

IIS does not recognize site collections, which are not equivalent to either IIS Web sites or IIS virtual directories.

**Note for SharePoint Portal Server 2003:** When you deploy Microsoft Office SharePoint Portal Server 2003, the portal site is a Windows SharePoint Services site collection with additional functionality. There is one portal site per virtual server, but additional standard Windows SharePoint Services site collections can be created on the virtual server.

How Requests are Handled in IIS and Through the ISAPI Filter

IIS handles base HTTP requests, but Windows SharePoint Services implements an Internet Server API (ISAPI) filter (STSFLTR.DLL) that modifies IIS behavior to handle managed paths, or inclusions and exclusions. The ISAPI filter either redirects requests to the Windows SharePoint Services ISAPI extension, or it allows ASPX pages (.aspx) and Web service URLs (.asmx) to fall through the filter to the SharePoint ASP.NET handler.

Figure 3. HTTP requests routed to ASP.NET or to the ISAPI extension

**Note for SharePoint Portal Server 2003:** When you deploy SharePoint Portal Server, service and profile databases are also created in SQL Server, and an ADO.NET layer is implemented beneath ASP.NET.

The Role of IIS

IIS routes HTTP requests to the appropriate application, using the HTTP.SYS driver to listen at the port designated for an IIS Web site, or SharePoint virtual server, and to handle incoming IP packets. By default port 80 is used for HTTP requests and port 443 is used for HTTPS requests. IIS resolves requests using the domain name, the port, and the IP address of the virtual server to which the request is directed. IIS Web sites provide administrative, security, and resource boundaries in a deployment, but Windows SharePoint Services code runs from the virtual server level downward to handle site activity.
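To make the resolution step concrete, the following Python sketch models how the IP address, port, and host header of a request could be matched against Web site bindings. It is a simplified illustration, not IIS code: the `Binding` type, the `resolve_site` function, and the site names are hypothetical, and the precedence rule (a host header match outranks an IP address match, and both outrank a wildcard binding) is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Binding:
    """One Web site binding. An empty ip or host acts as a wildcard."""
    ip: str    # "" stands for "All Unassigned"
    port: int
    host: str  # "" means the binding does not require a host header
    site: str  # the IIS Web site, that is, the SharePoint virtual server


def resolve_site(bindings: List[Binding], ip: str, port: int, host: str) -> Optional[str]:
    """Return the site whose binding most specifically matches the request."""
    best, best_score = None, -1
    for b in bindings:
        if b.port != port:
            continue
        if b.ip and b.ip != ip:
            continue
        if b.host and b.host.lower() != host.lower():
            continue
        # Assumed precedence: host header match > IP match > wildcard binding.
        score = (2 if b.host else 0) + (1 if b.ip else 0)
        if score > best_score:
            best, best_score = b.site, score
    return best


# Hypothetical bindings for two virtual servers on port 80.
bindings = [
    Binding("", 80, "", "Default Web Site"),
    Binding("", 80, "extranet.example.com", "Extranet"),
]
print(resolve_site(bindings, "10.0.0.5", 80, "extranet.example.com"))  # Extranet
print(resolve_site(bindings, "10.0.0.5", 80, "intranet"))  # Default Web Site
```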

IIS handles all user authentication (Anonymous, Integrated Windows Authentication, or Basic) on a per-virtual-server basis, and controls whether anonymous requests are enabled. If anonymous access is disabled in IIS, it is turned off for the entire virtual server, and anonymous requests never reach Windows SharePoint Services. Enabling anonymous access in IIS is not sufficient, however: Windows SharePoint Services must also be configured to accept requests that are not authenticated. If IIS is configured to accept anonymous requests but Windows SharePoint Services is not, the requests are still rejected.
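The two independent checks can be summarized as a small decision function. This Python sketch is illustrative only; `anonymous_request_outcome` and the `Outcome` values are hypothetical names, not part of any SharePoint or IIS API.

```python
from enum import Enum


class Outcome(Enum):
    REJECTED_BY_IIS = "rejected by IIS"
    REJECTED_BY_WSS = "rejected by Windows SharePoint Services"
    ACCEPTED = "accepted"


def anonymous_request_outcome(iis_allows_anonymous: bool, wss_allows_anonymous: bool) -> Outcome:
    """Model the two gates an unauthenticated request must pass."""
    # IIS authenticates first, per virtual server; if it refuses anonymous
    # access, the request never reaches Windows SharePoint Services.
    if not iis_allows_anonymous:
        return Outcome.REJECTED_BY_IIS
    # IIS let the request through, but SharePoint must also accept it.
    if not wss_allows_anonymous:
        return Outcome.REJECTED_BY_WSS
    return Outcome.ACCEPTED


# Print the full truth table for the two settings.
for iis in (False, True):
    for wss in (False, True):
        print(f"IIS={iis!s:5} WSS={wss!s:5} -> {anonymous_request_outcome(iis, wss).value}")
```

Only one of the four combinations lets an anonymous request through, which is why both settings must be changed together.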

In addition to using IIS to handle user authentication, Windows SharePoint Services uses the application pool feature introduced in IIS 6.0 to allow virtual servers to run in different application pools. Each application pool provides an isolated set of worker processes, with their own processor and memory resources, in which Web applications run. Windows SharePoint Services uses application pools to handle resource allocation, which offers the following advantages:

  • **Process identity:** When Windows SharePoint Services connects to SQL Server, it does so by means of the application pool identity unless it is configured to use SQL Server authentication.
  • **Process isolation:** Different virtual servers can be on the same computer, yet their databases can remain entirely separate if the servers run under different accounts, although they share a common configuration database. Application pools provide a security boundary, because each application pool requires its own set of credentials on the server.
  • **Application recycling:** In earlier versions of IIS, processes might run for a long time, and resource leaks posed a significant threat to the server. IIS 6.0 recycles worker processes so that leaks do not disrupt the server.

Modifications Windows SharePoint Services Makes to IIS

While Windows SharePoint Services uses IIS application pools and authentication without modifying them, it modifies other IIS services it uses, such as virtual servers or request handling, for integration into the Windows SharePoint Services architecture. When installed on a single computer, Windows SharePoint Services replaces the default IIS application pool, DefaultAppPool, with its own application pool, STSAppPool1, for use with its default end-user virtual server, and creates an additional application pool, STSAdminAppPool, for the administrative virtual server. In the context of a server farm, administrators create and name both application pools.

**Note for SharePoint Portal Server 2003:** The name of the administrative application pool that is created by default in SharePoint Portal Server is CentralAdminAppPool.

Unlike SharePoint Team Services from Microsoft, Windows SharePoint Services does not keep configuration data in the IIS metabase beneath the virtual server level. All site or site collection information is stored in the SharePoint configuration database independently of IIS, which makes server farm configuration much easier to maintain.

Handling Requests for Paths through the ISAPI Filter

Before authentication, the ISAPI filter (STSFLTR.DLL) examines the requested URL to determine whether the request is for a Windows SharePoint Services managed path, based on an administrator-controlled list of included and excluded paths. The ISAPI filter routes requests according to the following logic:

  • If not a managed path, allow pass-through.
  • If a managed path and located in a virtual directory such as _layouts or _vti_bin, rewrite the URL to _layouts or _vti_bin in the server root.
  • If a managed path, but not in _layouts or _vti_bin, and the request is for an .aspx or .asmx file, allow pass-through. In this case the request is handled by the ASP.NET ISAPI extension and then by the managed handler of Windows SharePoint Services.
  • If a managed path, but not in _layouts or _vti_bin, and not an .aspx or .asmx file, rewrite to the ISAPI extension of Windows SharePoint Services.
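The four rules above can be sketched as a routing function. This is a hypothetical Python model of the logic, not the code in STSFLTR.DLL. The `is_managed` predicate stands in for the administrator-controlled list of included and excluded paths, and the sample predicate `is_managed_example` is an assumption made for illustration.

```python
from enum import Enum
from typing import Callable, Tuple


class Route(Enum):
    PASS_THROUGH = "not managed: IIS serves the request normally"
    ROOT_VDIR = "rewritten to _layouts or _vti_bin in the server root"
    ASPNET = "falls through to ASP.NET and the SharePoint managed handler"
    ISAPI_EXTENSION = "rewritten to the SharePoint ISAPI extension"


ROOT_VDIRS = {"_layouts", "_vti_bin"}


def route_request(path: str, is_managed: Callable[[str], bool]) -> Tuple[Route, str]:
    """Apply the four routing rules to a URL path (query string already removed)."""
    if not is_managed(path):
        return Route.PASS_THROUGH, path
    segments = [s for s in path.split("/") if s]
    for i, segment in enumerate(segments):
        if segment.lower() in ROOT_VDIRS:
            # For example, /sites/team/_layouts/x.aspx becomes /_layouts/x.aspx.
            return Route.ROOT_VDIR, "/" + "/".join(segments[i:])
    if path.lower().endswith((".aspx", ".asmx")):
        return Route.ASPNET, path
    return Route.ISAPI_EXTENSION, path


def is_managed_example(path: str) -> bool:
    """Hypothetical inclusion list: only paths under /sites/ are managed."""
    return path.lower().startswith("/sites/")


print(route_request("/sites/team/_layouts/settings.aspx", is_managed_example))
print(route_request("/sites/team/Shared Documents/spec.doc", is_managed_example))
```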

Requests reach the SharePoint ISAPI extension for static HTML pages, for legacy protocols such as the FrontPage Server Extensions Remote Procedure Call (RPC) protocol, for the DAV protocol, and for static page GETs (for example, to get an .htm or .doc file).

The main function of managed paths is to define which URL namespaces are managed by Windows SharePoint Services, but they also define the paths that are used during self-service site creation, allowing administrators to control where sites can be created (for example, restricting site creation to a specified path such as http://Server/UserSites/).

Paths are either included or excluded. Exclusions specify that a given URL namespace does not pertain to an extended site, which prevents Windows SharePoint Services from intercepting the request. Windows SharePoint Services ignores requests directed to an excluded path. Inclusions, on the other hand, specify how to partition a URL namespace into different site collections within a SharePoint deployment, and there are two types of inclusion.

  • Explicit inclusion specifies a single URL path, such as the root of the server, that is itself a Web site managed by Windows SharePoint Services.
  • Wildcard inclusion specifies a directory beneath which Windows SharePoint Services manages site collections; each path directly below that directory (for example, /sites/mysite) can be a separate site collection.
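As a rough illustration of how inclusions and exclusions partition a URL namespace, the following Python sketch maps a virtual server-relative path to the site collection that owns it. The `MANAGED_PATHS` table and its entries are hypothetical, and the longest-prefix-first matching order is an assumption of the sketch rather than documented behavior.

```python
from typing import Dict, Optional

# Hypothetical managed-path table for one virtual server.
# "" is the server root; each value is "explicit", "wildcard", or "excluded".
MANAGED_PATHS: Dict[str, str] = {
    "": "explicit",         # the root of the server is itself a site
    "sites": "wildcard",    # /sites/<name> is a site collection
    "personal": "wildcard",
    "myapp": "excluded",    # left alone for a separate ASP.NET application
}


def site_collection_for(path: str, managed_paths: Dict[str, str]) -> Optional[str]:
    """Return the site collection URL that owns path, or None if not managed."""
    segments = [s for s in path.strip("/").split("/") if s]
    # Walk prefixes from longest to shortest so the most specific rule wins.
    for n in range(len(segments), -1, -1):
        prefix = "/".join(segments[:n])
        kind = managed_paths.get(prefix)
        if kind is None:
            continue
        if kind == "excluded":
            return None
        if kind == "explicit":
            return "/" + prefix
        if kind == "wildcard" and n < len(segments):
            # Each directory directly below a wildcard path is its own site collection.
            return "/" + "/".join(segments[: n + 1])
    return None


print(site_collection_for("/sites/mysite/Lists/AllItems.aspx", MANAGED_PATHS))
```

Paths that match no wildcard or exclusion fall back to the explicit root inclusion, so they are treated as subsites of the root site collection.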

**Note:** When you upgrade from SharePoint Portal Server 2001 and use Web store document management, SharePoint Portal Server uses an exclusion to expose the URL to the Web store system, for example, under a virtual directory path of the portal root site. SharePoint Portal Server implements two wildcard inclusions by default, one for team sites and one for personal sites, and uses partitioning schemes (for example, "/teams" or "/sites") to prevent confusion between folders of the top-level site and separate site collections.

ASP.NET Handler and Page Rendering

The ASP.NET handler in Windows SharePoint Services acts as a filter that determines the ASP.NET mode to use when running a page, which can be either direct or safe mode.

Pages located within the virtual /_layouts directory, called application pages, run in direct mode, meaning that Windows SharePoint Services does not intercept these pages but allows the pages to execute normally in ASP.NET. Application pages include, for example, the native SharePoint pages used to create new lists, edit views, and so on. The contents of the /_layouts directory are considered outside the Web site, and its pages are supplied directly by IIS as requested. Arbitrary code can be used within custom ASP.NET pages that are placed in this directory.

ASP.NET pages that are located within a Web site, such as the Home page, pages for viewing lists and items, or Web Part Pages, are called user pages and run in safe mode, meaning that Windows SharePoint Services allows Web Form controls to run on them only if the administrator has designated the controls as safe. You can customize these pages, for example, through the UI or by using Microsoft Office FrontPage 2003, but you cannot add arbitrary code to a user page; such code is not executed and is only rendered as text in the browser at run time. Unlike in direct mode, in safe mode the ASP.NET page is not compiled into a DLL. The list of safe controls allowed to run in the Web sites of a specific virtual server can be modified by editing the SafeControls section of that virtual server's web.config file.
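The following Python sketch approximates the safe-mode decision for a single control. `SafeControl` mirrors the Assembly, Namespace, and TypeName attributes of a SafeControl entry in web.config, but the functions `page_mode` and `control_allowed` and the Contoso and Fabrikam assemblies used in the examples are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SafeControl:
    """Mirrors the Assembly, Namespace, and TypeName attributes of a SafeControl entry."""
    assembly: str
    namespace: str
    type_name: str  # "*" allows every type in the namespace


def page_mode(path: str) -> str:
    """Application pages under /_layouts run in direct mode; user pages run in safe mode."""
    return "direct" if "_layouts" in path.lower().split("/") else "safe"


def control_allowed(path: str, assembly: str, namespace: str, type_name: str,
                    safe_controls: Iterable[SafeControl]) -> bool:
    if page_mode(path) == "direct":
        # Direct mode: the page runs as ordinary ASP.NET, with no safe-control check.
        return True
    # Safe mode: the control must match an entry in the safe controls list.
    return any(
        sc.assembly == assembly and sc.namespace == namespace and sc.type_name in ("*", type_name)
        for sc in safe_controls
    )


safe = [SafeControl("Contoso.WebParts", "Contoso.WebParts", "*")]  # hypothetical entry
print(control_allowed("/sites/team/default.aspx", "Contoso.WebParts",
                      "Contoso.WebParts", "WeatherPart", safe))
```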

**Note:** For an ASP.NET application to coexist with Windows SharePoint Services and run on the same server without interference from the SharePoint ISAPI filter, the application's URL must be excluded. In addition, the web.config file of the application must be modified to clear out the SharePoint ASP.NET handler. Because the code in the application must run at an excluded URL, it cannot reference the Windows SharePoint Services assembly. For more information, see Working with web.config Files in the Microsoft SharePoint Products and Technologies 2003 Software Development Kit (SDK).

Web Part Infrastructure

The Web Part infrastructure provides safe mode handling to Windows SharePoint Services, allowing Web Parts to be added to an ASP.NET page based on the page URL, the current user ID, and other information stored in the database.

Figure 4. Populating the zones in a Web Part Page

When a Web Part Page is opened in the browser, Windows SharePoint Services uses the page URL and user ID to retrieve from SQL Server the list of Web Parts specified for the page, and then builds an ASP.NET page object, populating the Web Part zones on the page with those Web Parts. For example, the Home page of a SharePoint site by default includes two Web Part zones containing Web Parts that display summary views of the Announcements, Events, and Links lists, as well as an Image Web Part that displays a logo. If the administrator allows personalization of the page and a user personalizes it, however, Windows SharePoint Services can display a different set of Web Parts to that user.
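A simplified Python model of zone population follows. The `WebPartRow` records stand in for the rows SQL Server returns for a page URL and user ID; the type, the `build_zones` function, and the zone names are hypothetical. Treating a personal view as a wholesale replacement of the shared view is also a simplification, since the Personalization table records per-user changes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WebPartRow:
    page_url: str
    zone_id: str
    part_order: int
    title: str
    user_id: Optional[str] = None  # None: shared view; otherwise one user's personal view


def build_zones(rows: List[WebPartRow], page_url: str, user_id: Optional[str],
                personalization_allowed: bool) -> Dict[str, List[str]]:
    """Return zone ID -> ordered Web Part titles for one page request."""
    shared = [r for r in rows if r.page_url == page_url and r.user_id is None]
    personal: List[WebPartRow] = []
    if personalization_allowed and user_id is not None:
        personal = [r for r in rows if r.page_url == page_url and r.user_id == user_id]
    zones: Dict[str, List[str]] = defaultdict(list)
    # Simplification: a personal view, if present, replaces the shared one.
    for row in sorted(personal or shared, key=lambda row: (row.zone_id, row.part_order)):
        zones[row.zone_id].append(row.title)
    return dict(zones)
```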

Unmanaged Code in Windows SharePoint Services

Most of the logic used in Windows SharePoint Services to work with sites and lists lives in unmanaged code, largely provided through dynamic-link libraries (DLLs) used in the earlier SharePoint Team Services. Web Parts and other ASP.NET objects in Windows SharePoint Services, as well as the ISAPI extension, are actually thin layers over the unmanaged code. Web Parts and Web services are built on top of the new SharePoint object model, and in turn the object model serves as a wrapper that calls into the unmanaged code.

The unmanaged code supports Microsoft Office FrontPage 2003 Server Extensions, the DAV protocol, view rendering, static document GETs, and all database input/output. The principal DLL, owssvr.dll, provides much of the logic for working with lists, including the logic that interprets the proprietary XML language, Collaborative Application Markup Language (CAML), used to define data and emit text.

Each front-end Web server contains site and list definitions within the setup directory (Local_Drive:\Program Files\Common Files\Microsoft Shared\web server extensions\60\) that include CAML schema files. The schema files determine, for example, how instances of new sites and lists are created, as well as how list data is viewed. For more information about site and list definitions, see Introduction to Templates and Definitions.

**Note for SharePoint Portal Server 2003:** SharePoint Portal Server additionally implements managed-code access to its databases through ADO.NET.

Rendering Views

CAML defines how list views are rendered within a Web Part. Each type of Windows SharePoint Services list has its own SCHEMA.XML file, located in the \web server extensions\60\TEMPLATE\Language_ID\Site_Definition\LISTS directory, that defines how the list is viewed on the HTML page when it is displayed in the browser. In the earlier version, SharePoint Team Services, list views were conveyed through the expansion of CAML islands in the browser (indicated by the ows prefix), but in Windows SharePoint Services, the CAML view is instead conveyed through a Web Part, as shown in the following figure.

Figure 5. Rendering a list view through a Web Part in Windows SharePoint Services

A CAML SCHEMA.XML file contains definitions for the default list views, and for the forms used to work with individual items. CAML is used to construct the HTML and script required for the client browser, including, for example, the script, toolbars, or column headers used in the view header, field names or field values used in the view body, and page navigation or list properties used in the view footer. CAML can be used in the context of Windows SharePoint Services to emit in the browser whatever markup, script, or text may be required in the Web Part view, for example, HTML, XML, WML, ECMAScript (Microsoft JScript, or JavaScript), and so on. CAML is used to construct complicated regions, such as the script that is used in the list view of a calendar control.

**Note:** The Data View Web Part constructs XML from a SharePoint list or list view and uses Extensible Stylesheet Language Transformation (XSLT) to render the UI of the list view, offering a standards-based approach to doing high-end customizations. Windows SharePoint Services provides the data in XML format, and FrontPage implements an alternative Web Part to render a view of the data.

In addition to SCHEMA.XML, the following CAML schema files are used in Windows SharePoint Services:

  • **WEBTEMP.XML:** Allows multiple site definitions to be used in a deployment, such as the default definitions included in Windows SharePoint Services for the Team Site or Document Workspace template, or for one of the Meetings Workspace templates.
  • **ONET.XML:** Defines the lists and pages to include in new sites.
  • **FLDTYPES.XML:** Defines the SQL implementation of each field type used in Windows SharePoint Services as well as its HTML rendering.
  • **BASE.XML:** Defines the schemas for global lists, for example, the Lists, Docs, and UserInfo tables located in the database.
  • **DOCICONS.XML:** Specifies the icon to display for each file type and associates the file type with an application.

For more information about site and list definitions, see Introduction to Templates and Definitions and Customizing SharePoint Sites and Portals: Part 1.

Customized Pages

Safe mode allows users to customize the site without allowing them to put code on the server. It also improves scalability by reducing the number of objects that must be created for the site and the amount of data that must be stored in the database.

In the earlier SharePoint Team Services, if you wanted to create a thousand different sites on the server, you needed to create a thousand different copies of each user page for each list, such as the AllItems.htm file or the item forms. In Windows SharePoint Services, as long as a user page has not been customized, the CAML schema files in the setup directory contain the page definition completely, and the definition is cached on the front-end Web server. Caching the definition eliminates the need for Windows SharePoint Services to create copies of user pages each time you create a site or list. Such pages are virtual pages whose content derives from CAML schema files, although they appear to be actual pages in the browser. Caching maximizes scalability by allowing pages that are not customized to be reused across sites and by reducing unnecessary data storage and the otherwise large memory footprint that is required to serve the pages. Windows SharePoint Services queries the database to determine whether a requested page is customized. If it is not, the database does not contain the entire contents of the file, and the query returns only the path to a folder in the setup directory that contains the uncustomized page. For Web Part Pages that have not been customized, SQL Server is hit only to return the list of Web Parts to display within the Web Part zones.

When the default.aspx page for a site is requested, Windows SharePoint Services first checks whether the page source is customized. The Content column in the Docs table is null for pages that are not customized; for a customized page, it instead contains the page content in binary format, and the definition is drawn from the database rather than from the setup directory. After a page is customized, reverting back to the original page definition is not supported. Similarly, when the list view on an AllItems.aspx page is customized, the modified view is stored in the tp_View column of the WebParts table in the database, and the list view definition is no longer drawn from the setup directory.
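The lookup described above can be sketched in Python as follows. Only the Content column comes from the description of the Docs table; `leaf_name`, `setup_folder`, the `read_setup_file` callback, and the in-memory cache are hypothetical stand-ins for the real storage and for the caching on the front-end Web server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class DocsRow:
    leaf_name: str            # hypothetical field, for example "default.aspx"
    content: Optional[bytes]  # the Content column: None (NULL) until customized
    setup_folder: str         # hypothetical: folder in the setup directory


PageCache = Dict[Tuple[str, str], bytes]


def load_page_source(row: DocsRow,
                     read_setup_file: Callable[[str, str], bytes],
                     cache: PageCache) -> bytes:
    if row.content is not None:
        # Customized page: the copy stored in the database is used.
        return row.content
    key = (row.setup_folder, row.leaf_name)
    if key not in cache:
        # Uncustomized page: read the definition once from the setup directory;
        # later requests for the same definition reuse the cached copy.
        cache[key] = read_setup_file(row.setup_folder, row.leaf_name)
    return cache[key]
```

Because the cache key is the setup-directory location rather than the site, every uncustomized copy of a page across all sites shares one cached definition.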

Contents of the Configuration Database

An installation of Windows SharePoint Services has one configuration database that manages the deployment. You can only modify the configuration settings in this database through the administrative virtual server; the settings are read-only from end-user virtual servers.

The configuration database stores the following general types of data:

  • **Global settings:** Information about the server farm, such as which Web servers or database servers are in the farm.
  • **Virtual server:** Information about each virtual server in the deployment, such as which SMTP server to use for a particular virtual server, or default settings for sites.
  • **Site map:** Information about which content database contains data for a given site. When Windows SharePoint Services receives the URL of a request, settings in this database determine which content database contains data for the site.

Figure 6 shows how the site map works in a content database lookup.

Figure 6. Content database lookup based on virtual server-relative URL

A request is sent to the server for http://MyServer/sites/mysite/Lists/AllItems.ASPX. The relevant portion of the URL, sites/mysite, is a virtual server-relative fragment that specifies the site collection, because sites is by default a wildcard inclusion for site collections created on the server. Because this relevant part of the URL excludes the computer name, two virtual servers, whether they have the same address or different addresses, can serve the same content; for example, the same site data can be exposed through both an extranet-facing site and an intranet-facing site.

In the previous example, the configuration database specifies that data for the site lives on a database server named ITG_STS_1 in a database named STS_01. After this information is gathered from the configuration database, the database ID (101 in Figure 6) is cached on the Web server in the session cookie and used to connect to the correct database in subsequent requests. Windows SharePoint Services uses an optimistic caching scheme, meaning that it assumes a site has not been moved since it was last visited by the user. If the cached location is wrong and the site is not found there, Windows SharePoint Services checks the configuration database to see whether the site has moved, and a self-correcting algorithm allows the system to adjust.
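The optimistic, self-correcting lookup can be approximated with the following Python sketch. The database ID 101, server ITG_STS_1, and database STS_01 come from the example above; the `SiteLocator` class, the `holdings` map (a stand-in for checking whether a content database actually holds the site), and the cache layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ContentDb:
    db_id: int
    server: str
    name: str


class SiteLocator:
    """Front-end view of the site map with an optimistic, self-correcting cache."""

    def __init__(self, site_map: Dict[str, int], databases: Dict[int, ContentDb],
                 holdings: Dict[int, Set[str]]):
        self.site_map = site_map    # configuration DB: site collection -> database ID
        self.databases = databases  # configuration DB: database ID -> server and name
        self.holdings = holdings    # stand-in for what each content DB really contains
        self.cache: Dict[str, int] = {}
        self.config_lookups = 0

    def _refresh(self, site: str) -> Optional[int]:
        """Ask the configuration database and update the cache."""
        self.config_lookups += 1
        db_id = self.site_map.get(site)
        if db_id is None:
            self.cache.pop(site, None)
        else:
            self.cache[site] = db_id
        return db_id

    def locate(self, site: str) -> Optional[ContentDb]:
        db_id = self.cache.get(site)
        # Optimistic: trust the cached ID unless the site is missing from that DB.
        if db_id is None or site not in self.holdings.get(db_id, set()):
            db_id = self._refresh(site)
        if db_id is None or site not in self.holdings.get(db_id, set()):
            return None
        return self.databases[db_id]


locator = SiteLocator(
    {"sites/mysite": 101},
    {101: ContentDb(101, "ITG_STS_1", "STS_01")},
    {101: {"sites/mysite"}},
)
print(locator.locate("sites/mysite"))
```

Only a cache miss or a stale entry costs a configuration database query; a correct cached ID is used directly.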

**Note for SharePoint Portal Server 2003:** When SQL Server is installed, SharePoint Portal Server adds proprietary tables to the configuration database and uses this database to track the activity of servers within the farm, such as which servers perform search, indexing, or single sign-on. SharePoint Portal Server also adds mappings for alternate access and extranet mapping to perform aggregation across multiple alternate stores.

Content Database Schema

Windows SharePoint Services stores all end-user data in the SQL Server database, which offers the following advantages:

  • Storage of list data, documents, and metadata in normalized tables.
  • Transactional updates of documents and document metadata.
  • Consistent backups of documents and document metadata.
  • A programmable storage layer.
  • Deadlock detection and resolution.

Whereas the earlier SharePoint Team Services implemented one database per site and one table per list, Windows SharePoint Services uses a fixed database schema and a fixed number of databases per server to improve scalability. A single sparse table stores all list data, and each list schema is mapped logically onto that physical table. In addition, stored procedures minimize the number of round trips that must be made to SQL Server and bring input/output logic closer to data storage.
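To illustrate what a logical mapping onto one sparse table means, the following Python sketch assigns each field in a list schema to the next free generic column of a matching type, so that lists with different schemas can share a single physical table. The column prefixes, the pool size, the `ListId` key, and both functions are hypothetical and are not taken from the actual UserData schema.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from a field type to a family of generic columns.
COLUMN_POOLS = {"Text": "nvarchar", "Number": "float", "DateTime": "datetime", "Integer": "int"}


def map_list_schema(fields: List[Tuple[str, str]], pool_size: int = 64) -> Dict[str, str]:
    """Map each (field name, field type) to a generic column such as nvarchar1."""
    used: Dict[str, int] = {}
    mapping: Dict[str, str] = {}
    for name, field_type in fields:
        prefix = COLUMN_POOLS[field_type]
        used[prefix] = used.get(prefix, 0) + 1
        if used[prefix] > pool_size:
            raise ValueError(f"list needs more than {pool_size} {prefix} columns")
        mapping[name] = f"{prefix}{used[prefix]}"
    return mapping


def to_physical_row(list_id: int, item: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Store one list item as a sparse row; unused columns are simply absent (NULL)."""
    row: Dict[str, Any] = {"ListId": list_id}
    for field, value in item.items():
        row[mapping[field]] = value
    return row


tasks = map_list_schema([("Title", "Text"), ("DueDate", "DateTime"), ("Owner", "Text")])
contacts = map_list_schema([("LastName", "Text"), ("Phone", "Text")])
print(to_physical_row(7, {"Title": "Ship it"}, tasks))
print(to_physical_row(8, {"LastName": "Smith"}, contacts))
```

Both lists reuse nvarchar1 for different fields; the list ID on each row tells the storage layer which mapping applies.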

Figure 7. Content database core schema

The Sites table contains settings that apply to each site collection represented within the database; these are default settings that apply to all subsites created within each site collection. The table represents each top-level site of each site collection, as well as the root site and My Site in the context of a portal site. The Webs table contains settings that apply to each site within a site collection.

The Docs table stores all the documents of all sites in the site collections represented by the database, including, for example, documents in document libraries, attachments, and nodes for each list, as well as default.aspx and the user pages for each list if they are customized. The Content column of the Docs table contains null if a page has not been customized. When a site is first created, the Content column contains null for all site pages in the table, because none of them is customized and their page definitions are drawn from schema files physically located on the front-end Web server.

The Lists table (or the list of lists) contains a row for each list of all the sites in the database. This table contains settings for each list, specifying which lists or document libraries are included in the sites, and which schema is instantiated by each list. The UserData table contains all the list data for items created by users in the sites; each row contains the data for one item.

The Links table stores the links that link fix-up uses to recalculate links. This greatly simplifies link management compared with the previous version, in which shadow files had to be created in the file system behind each document, containing all the links coming from and pointing to the document.

The WebParts table contains information about all the Web Parts and list views used in the sites, replacing the Views table of the previous version, just as Web Parts replace the direct use of CAML views on user pages. The tp_View column contains CAML for customized views but null for views that remain uncustomized. Web Part personalization information is maintained in the Personalization table.

**Note for SharePoint Portal Server 2003:** The SharePoint Portal Server content database is a superset of the Windows SharePoint Services database, adding tables and stored procedures. SharePoint Portal Server uses foreign key relationships into tables and does not alter the database schema of Windows SharePoint Services, but it adds two databases. The Profile database stores personal profiles and audience definitions for targeting of Web Parts and content, and the Services database supports search and indexing as well as subscriptions and subscription results.

Conclusion

The Windows SharePoint Services architecture introduces changes that directly improve scalability and performance over the architecture of the earlier version, SharePoint Team Services. Windows SharePoint Services also becomes a broader Web development platform by integrating the .NET Framework into its own functionality. In addition, significant changes in its database schema allow it to take better advantage of SQL Server features. As a result of these database changes, IIS plays a less prominent role in the SharePoint architecture than it did in SharePoint Team Services. Understanding this architecture can help you determine how to develop custom applications on the Windows SharePoint Services platform.