What Every SharePoint Admin Needs to Know About Host Named Site Collections
This post intends to tell you everything you need to know about host named site collections so that you can decide if they are appropriate for your environment. This post is NOT telling you to run out and create everything as host named site collections in SharePoint; instead, it is meant to help educate you about a feature that exists as part of the product. Simply put, I heard from some of my customers that there was some misinformation in the community saying to never use host named site collections. This is plain wrong, just as saying to always use them is also wrong. This post explains what host named site collections are, how to use them, and some limitations to consider if you intend to use them.
What Are Host Named Site Collections
A host-named site collection allows you to address a site collection with a unique DNS name, such as “http://fabrikam.com”.
Typically you will create a SharePoint web application, and it contains many path-based site collections that share the same host name (DNS name). For example, Team A has a site collection at http://contoso.com/sites/teamA, and Team B has a site collection at http://contoso.com/sites/teamB. These are referred to as path-based site collections, and they are the recommendation for most corporate scenarios. Host named site collections enable you to assign a unique DNS name to each site collection. For example, you can address them as http://TeamA.contoso.com and http://TeamB.contoso.com, which enables hosters to scale to many customers.
SharePoint makes the decision on how to map the host name to the site collection when the SPSite object is constructed. It internally uses the SPWebApplication object to find the web application in the configuration database and determine if there is a host header associated with the site collection. If no host header information is returned, this is a typical site collection. Turn on Verbose logging, and for path-based sites you will see a ULS entry similar to “Looking up the additional information about the typical site http://contoso.com/sites/teamB”. If host header information is returned, then the host named site collection information is retrieved. You can see this in the ULS log with the entry “Site lookup found the host header site http://hosta.sharepoint.com/Pages/default.aspx”.
Host named site collections have been in the product since SharePoint 2003, where the feature was referred to as “scalable hosting mode”. The point is that host named site collections are not a new feature in SharePoint 2010. They have been around a long time, and customers have been successfully using them in their environments for many scenarios.
Admittedly, there were several known issues with the implementation early on in SharePoint 2007, such as the inability to use blobcache with host name site collections (see Stefan Gossner on MOSS 2007 blob caching and its limitations), which has contributed to the misinformation in the community and lingering perception issues about the appropriate use of host named site collections. This issue as well as several others were addressed early on with SharePoint 2007. You can see in the screen shot below that images are in fact cached using blobcache using SharePoint 2007 just fine.
To be clear, the blobcache issue was never an issue in SharePoint 2010. In fact, SharePoint 2010 made two significant improvements for host named site collections: the ability to use managed paths with host-named site collections, and the ability to use off-box SSL termination with host-named site collections. Host-named site collections can also be used to implement multi tenancy solutions (for a complete discussion on multi tenancy in SharePoint 2010, see Spence Harbar’s excellent Rational Guide to Multi Tenancy in SharePoint 2010). Office 365 is implemented using host named site collections and multi tenancy, as this is how the product scales to support many, many customers for that specific hosting scenario.
Creating Host Named Site Collections
To better understand host named site collections, let’s see how you create them. You cannot use the self-service site creation web UI to create a host-named site collection; instead, you use PowerShell. The following code creates a new web application listening on port 80, and two host named site collections that use the Publishing Portal site template.
$w = New-SPWebApplication -DatabaseName "WSS_Content_HostNameTest" -ApplicationPool "SharePoint - Content App Pool" -Name "HostNameTest" -Port 80
# Remember to do IISRESET /noforce on each server before using the new web application
$w = Get-SPWebApplication "HostNameTest"
New-SPSite http://HostA.SharePoint.com -OwnerAlias "SHAREPOINT\kirkevans" -HostHeaderWebApplication $w -Name "HostA" -Template "BLANKINTERNETCONTAINER#0"
New-SPSite http://HostB.SharePoint.com -OwnerAlias "SHAREPOINT\kirkevans" -HostHeaderWebApplication $w -Name "HostB" -Template "BLANKINTERNETCONTAINER#0"
The result is a web application that contains my two host-named site collections. There is a third site collection for the Office Viewing Cache because I am using Office Web Apps. I did this to show that you can have a mix of path-based and host named site collections within the same web application.
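One way to confirm which site collections in a web application are host named is to list them along with the HostHeaderIsSiteName property of each SPSite object. This is only a minimal sketch: it assumes the "HostNameTest" web application from the example above and that it is run from the SharePoint 2010 Management Shell on a farm server.

```powershell
# Assumes the "HostNameTest" web application created earlier in this post.
$w = Get-SPWebApplication "HostNameTest"

# HostHeaderIsSiteName is True for host named site collections
# and False for path-based site collections.
Get-SPSite -WebApplication $w -Limit All |
    Select-Object Url, HostHeaderIsSiteName
```

In a mixed web application like the one described here, the path-based site collection should show False while the two host named site collections show True.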
We created two host named site collections, but I didn’t say anything about adding the host headers into the IIS bindings. That is because the web application that contains them listens to *:80, so any port 80 traffic that is not otherwise directed to another web application on the same machine is directed to this web application. You can verify this by looking at the bindings for the containing web application in IIS.
This is an important point to consider. If you create the web application with a non-default port, something like 3759, your host named site collections should also be on that port. Otherwise, you will have to manually add the host header entries in the bindings in IIS, and do this on every machine. Just say no to non-default ports, use port 80 and make everyone’s life easier. You really do not want to start editing the bindings for the web applications manually, especially on every machine in your farm. Typically you will just use a web application that listens on *:80.
Reiterating this point, if you are going to implement host named site collections, you should not use a host header for the containing web application that houses the host named site collections or it will not work properly. Doing so will create a host name binding for the web application in IIS, and the other host names will not be routed to the IIS site. In the example I used, all host names are processed by port 80, which alleviates the need to manually add host bindings in IIS. This has one unfortunate downside: if other web applications that use host headers, say contoso.com:80, are stopped due to application pool failures or other issues, then the application that listens on *:80 will process the traffic. This can lead to unexpected results if the contoso.com:80 site uses a different authentication mechanism or is processed by a different application pool identity. It’s just a function of how IIS works, but something to consider when designing your applications and something you should test in your environment (you do test things before implementing them in production, don’t you?)
Managed Paths and Host Named Site Collections
As stated previously, an improvement to host named site collections in SharePoint 2010 is the ability to use managed paths. As we saw when creating host named site collections, there is no UI to create a host header managed path; instead, you use the -HostHeader switch on the New-SPManagedPath PowerShell command.
New-SPManagedPath "cthub" -HostHeader -Explicit
$w = Get-SPWebApplication "HostNameTest"
New-SPSite http://HostA.SharePoint.com/cthub -OwnerAlias "SHAREPOINT\kirkevans" -HostHeaderWebApplication $w -Name "HostA Content Type Hub" -Template "sts#0"
In this example, we create a managed path “cthub” and create a site collection using that managed path. The results are as you would expect, a new site collection is created using the HostA.SharePoint.com DNS name using the managed path “cthub”.
This can be hugely beneficial when defining consistent provisioning solutions.
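Because host header managed paths do not appear in the web UI, it can help to list them from PowerShell before relying on them in a provisioning script. The following is a small sketch, assuming the "cthub" managed path from the example above has already been created and that the commands run in the SharePoint 2010 Management Shell.

```powershell
# List the managed paths that apply to host named site collections.
# The -HostHeader switch scopes the query to host header managed paths
# rather than to the managed paths of a specific web application.
Get-SPManagedPath -HostHeader

# Look up a single host header managed path by name ("cthub" is the
# path created in the example above).
Get-SPManagedPath -Identity "cthub" -HostHeader
```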
Can I Use Alternate Access Mappings with Host Named Site Collections?
SharePoint provides a capability to create a web application using a host name such as http://publishing.contoso.com, and you extend the web application using a new alternate access mapping for the web application to a new host header such as http://www.fabrikam.com. This is particularly useful when you have different users accessing the same content through different authentication means. Internal users authenticated with Windows claims can access the web application using http://publishing.contoso.com, and external users access the same content using http://www.fabrikam.com. The difference is the URL that you use to access the web application.
With host named site collections in SharePoint 2010, you do not have the ability to provide different authentication mechanisms based on the host header. Remember that this happens at the web application level. Using our previous example of a web application “HostNameTest” that contains two host named site collections, “HostA.SharePoint.com” and “HostB.SharePoint.com”, it is not possible in SharePoint 2007 or 2010 to provide different authentication to each of the site collections. This means that you cannot use alternate access mappings with host named site collections. This should be a consideration when designing your information architecture. If you need to support site collections responding to multiple host-name URLs, consider using path-based site collections with alternate access mappings instead of host-named site collections.
Host Named Site Collections Only Use One Host Name
Continuing the discussion of AAMs and host named site collections, you cannot use multiple host names to address a site collection in SharePoint 2010. Because host-named site collections have a single URL, they do not support alternate access mappings and are always considered to be in the Default zone. This is important if you are using a reverse proxy to provide access to external users. Products like Unified Access Gateway 2010 allow external users to authenticate to your gateway and access a site as http://uag.sharepoint.com and forward the call to http://portal.sharepoint.com. Remember that URL rewriting is not permitted. Further, a site collection can only respond to one host name. This means if you are using a reverse proxy, it must forward the calls to the same URL. If your networking team has a policy against exposing internal URLs externally, you must instead use web applications and extend the web application using an alternate access mapping.
Host Named Site Collections and SSL
We’ve pointed out several times that we can only use one host name. Can we use both HTTP and SSL based URLs simultaneously? The answer is no. By default, host-named site collections in SharePoint 2010 use the same protocol scheme as the public URL of the Default zone of the web application. If you wish to provide host-named site collections in your Web application over SSL, ensure that the public URL in the Default zone of your Web application is an HTTPS-based URL.
[Update: thanks, Spence Harbar!] The August 2010 Cumulative Update adds additional support for both HTTP-based host-named site collections and HTTPS-based host-named site collections to coexist in the same web application. By default this support is disabled. To enable this support, set the Microsoft.SharePoint.Administration.SPWebService.ContentService.EnableHostHeaderSiteBasedSchemeSelection property to true. As soon as it is enabled, SharePoint will no longer use the web application's default zone public URL protocol scheme for all host-named site collections in that web application. Instead, SharePoint will use the protocol scheme provided during host-named site collection creation, restoration, or rename. Host-named site collections that are created before this update is installed will default to use the HTTP protocol scheme if this property is set to true. These site collections can be switched to use the HTTPS protocol scheme by renaming the host-named site collection and providing an HTTPS-based URL as the new site collection URL.
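As a rough sketch of how that property might be set from PowerShell, the snippet below reads the content service object, flips the property, and persists the change. It assumes the August 2010 Cumulative Update (or later) is installed, since the property does not exist before that update, and that it is run from the SharePoint 2010 Management Shell by a farm administrator. Test it in a non-production farm first.

```powershell
# Get the farm's content web service object.
$contentService = [Microsoft.SharePoint.Administration.SPWebService]::ContentService

# Enable per-site-collection protocol scheme selection for host named
# site collections (requires the August 2010 CU or later).
$contentService.EnableHostHeaderSiteBasedSchemeSelection = $true

# Persist the change to the configuration database.
$contentService.Update()
```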
To configure SSL for host-named site collections, enable SSL when creating the Web application. This will create an IIS Web site with an SSL binding instead of an HTTP binding. After the Web application is created, open IIS Manager and assign a certificate to that SSL binding. You can then create site collections in that Web application.
A server certificate has to be installed and assigned to the IIS Web site. Each host-named site collection in a Web application will share the single server certificate assigned to the IIS Web site. You need to acquire a wildcard certificate or subject alternative name (SAN) certificate and then use a host-named site collection URL policy that matches that certificate. For example, you will need a *.sharepoint.com wildcard certificate to generate host-named site collection URLs such as https://hosta.sharepoint.com, https://hostb.sharepoint.com, and so on, to enable these sites to pass browser SSL validation. If you require unique second-level domain names for your sites, you have to create multiple Web applications rather than multiple host-named site collections.
Because SharePoint Server 2010 uses the public URL in the Default zone of the Web application to determine whether host-named site collections will be rendered as HTTP or SSL, you can also use host named site collections with off-box SSL termination. As discussed in the previous section, “Host Named Site Collections Only Use One Host Name”, the SSL terminator must preserve the original HTTP host header from the client. As discussed in the TechNet paper, Plan for host-named site collections (SharePoint Server 2010), there are 3 requirements to use SSL termination with host-named site collections:
- The public URL in the Default zone of the Web application must be an HTTPS-based URL.
- The SSL terminator or reverse proxy must preserve the original HTTP host header from the client.
- If the client SSL request is sent to the default SSL port (443), then the SSL terminator or reverse proxy must forward the decrypted HTTP request to the front-end Web server on the default HTTP port (80). If the client SSL request is sent to a non-default SSL port, then the SSL terminator or reverse proxy must forward the decrypted HTTP request to the front-end Web server on the same non-default port.
The ability to use SSL termination and managed paths makes host header site collections a very powerful tool for hosting scenarios.
When you need to create vanity URLs, the first question you must ask yourself is “how many URLs will I need?” The unit of scale for SharePoint is at the site collection level. There is a tested limit of around 100 web applications for SharePoint, while each web application can have up to 250,000 site collections (see notes in the section “Know Your Limits” for discussion on number of web applications). When talking about options for vanity URLs, size matters. I have heard several people flatly denounce host named site collections without providing context around the size of the environment or the solution being architected. It is important to remember the guidance in the SharePoint Server 2010 capacity management: Software boundaries and limits paper when designing your information architecture. If you require a handful of vanity URLs, you are likely going to be better served by creating web applications and leveraging alternate access mappings. If you are creating hundreds or thousands, then obviously you can’t create that many web applications, and you are left with host-named site collections. See the section below “Know Your Limits” and understand the software boundaries and limits when designing your information architecture.
When you create a new web application using the SharePoint web UI, you are reminded to use “IISRESET /noforce” on each of the servers before using the web application. Simply put, this affects your uptime, even if only for a very short duration. A benefit of host named site collections is that you do not need to issue IISRESET when you create a site collection, host named or path-based. Maybe this isn’t a big deal in your environment, but in some environments an IISRESET is a big deal. You can see in my example above that I include a note reminding you to use IISRESET on each machine; this is only because I created a new web application as part of the demo.
Can’t I Just Rewrite the Path?
ASP.NET 2.0 introduced a very cool feature that lets you rewrite a URL programmatically, such that a request for http://contoso.com/employees/kirk could be rewritten to http://contoso.com/sites/HR/pages/Employees.aspx?Emp=Kirk. This allows for URLs that are easier for end users to understand and use to navigate your web site. SharePoint does not support asymmetrical URL rewriting (http://technet.microsoft.com/en-us/library/cc288609(office.12).aspx). Modifications to the content path or the host name are not supported. This means that URL rewrites (different than redirects, see the next section) cannot be used with SharePoint 2010. I have seen several developers try to do this to remove “Pages” from the URL; this is not supported either. The inability to rewrite the path has implications for other scenarios, such as reverse proxies. See the section “Host Named Site Collections Only Use One Host Name” for a discussion on reverse proxies.
What About Redirects?
If your goal is to provide vanity URLs, then there is another approach to consider: using HTTP redirects. You can configure IIS to redirect all requests to a particular path to another resource. This is different than URL rewriting because the client makes a request to a resource, receives a 302 redirect response, and then requests the new resource based on the information in the 302 redirect. In fact, Jie Li mentions using redirects in an attempt to address SEO. For many cases, this is absolutely a great approach. However, consider that SOAP (web services) does not handle 302 redirects. This means anything that calls SharePoint with the vanity URL will not work. If your end user tries to use one of the Office client applications and work with SharePoint using their vanity URL, they will receive unexpected behavior and error messages because those applications use SOAP to communicate with the server.
Kerberos and Host Named Site Collections
You are probably wondering if you can use Kerberos with a host named site collection, and the answer is yes. To prove this, I configured the web application “HostNameTest” to use Negotiate for authentication. Its application pool is running as the account SHAREPOINT\sp_app. The DNS entry for “hosta.sharepoint.com” was set up as an A record. I added an SPN “HTTP/hosta.sharepoint.com” to that account and made a request. Running KLIST.exe afterwards shows a Kerberos ticket for that SPN.
Yes, Kerberos works with host-named site collections.
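The post does not show the exact commands used, so the following is only a sketch of one common way to perform the steps described above with the built-in setspn and klist tools. The account name SHAREPOINT\sp_app and the host name come from the example above; substitute your own, and note that registering SPNs requires appropriate rights in Active Directory.

```powershell
# Register the SPN for the host named site collection on the
# application pool account. -S checks for duplicate SPNs before adding.
setspn -S HTTP/hosta.sharepoint.com SHAREPOINT\sp_app

# List the SPNs registered on the account to confirm the change.
setspn -L SHAREPOINT\sp_app

# After browsing to http://hosta.sharepoint.com from a client machine,
# list the cached Kerberos tickets and look for HTTP/hosta.sharepoint.com.
klist
```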
Migrating Path-Based to Host Named Site Collections
If you have a number of web applications that provide vanity URLs and you are approaching some of the software boundaries for too many web applications (see the next section!), you are probably wondering if you can somehow collapse the information architecture to use host named site collections instead of path-based site collections in web applications. The answer is yes!
In order to convert from path-based to host named site collections, you have to use backup and restore using PowerShell. Back up the site collection to a file using Backup-SPSite, and then restore to a host named site collection using Restore-SPSite.
Backup-SPSite http://server_name/sites/site_name -Path C:\Backup\site_name.bak
Remove-SPSite -Identity http://server_name/sites/site_name -Confirm:$False
Restore-SPSite http://www.example.com -Path C:\Backup\site_name.bak -HostHeaderWebApplication http://server_name
It goes without saying that when you backup and restore, make sure that you are not creating multiple site collections that are identical in your farm to avoid duplicate GUIDs. It is important that you remember to use Remove-SPSite before Restore-SPSite in this example.
In SharePoint 2007, site collection restore was performed with “stsadm -o restore”, which did not provide a mechanism to specify which content database would contain the site collection. A nice change in SharePoint 2010 is that the Restore-SPSite command allows you to specify the content database that will house the site collection. This lets you kill two birds with one stone: if your old path-based site collection was contained in a content database that was approaching the 200 GB limit, you can move the content to a new content database.
By default, the site collection will be set to read-only for the duration of the backup to reduce the potential for user activity during the backup operation to corrupt the backup. Another nice feature of the improved capabilities in SharePoint 2010 (compared to SharePoint 2007) is the ability to use SQL snapshots with SQL Server Enterprise Edition. If you have SQL Server Enterprise Edition, we recommend using the UseSqlSnapshot parameter because it ensures a valid backup while allowing users to continue reading and writing to the site collection during the backup.
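Putting the last two points together, a migration that uses a SQL snapshot for the backup and targets a specific content database on restore might look roughly like the sketch below. The URLs and backup path are the same placeholders used in the earlier example, and the content database name "WSS_Content_HostNamed" is a hypothetical name for a database that is assumed to already exist in the target web application. The -UseSqlSnapshot switch assumes SQL Server Enterprise Edition.

```powershell
# Back up the path-based site collection using a SQL snapshot so users
# can keep working during the backup (SQL Server Enterprise Edition only).
Backup-SPSite http://server_name/sites/site_name -Path C:\Backup\site_name.bak -UseSqlSnapshot

# Remove the original before restoring to avoid duplicate site collection IDs.
Remove-SPSite -Identity http://server_name/sites/site_name -Confirm:$False

# Restore as a host named site collection into a chosen content database.
# "WSS_Content_HostNamed" is a hypothetical, pre-existing content database.
Restore-SPSite http://www.example.com -Path C:\Backup\site_name.bak -HostHeaderWebApplication http://server_name -ContentDatabase "WSS_Content_HostNamed"
```

Verify the backup file before running Remove-SPSite, since the removal is not confirmed interactively in this sketch.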
See the next section, “Know Your Limits”, for discussion on limitations of the Backup-SPSite and Restore-SPSite commands.
Know Your Limits
There are many factors that you should consider when designing any information architecture, host-named site collections or not. As a SharePoint administrator, you should commit the SharePoint Server 2010 capacity management: Software boundaries and limits paper to memory. I am pointing out these limits because while most readers will have small farms with a handful of cases where host named site collections are used, there are a few administrators of huge farms that could potentially have tens of thousands of host named site collections. Understand the limits of the product when designing your provisioning solution. Also understand that some of these limits are extreme boundaries that may not be at all applicable to your environment. In fact, your environment may not be capable of reaching some of these numbers.
- The number of application pools is limited to 10 per web server.
- There is no published limit for the number of web applications for SharePoint 2010. The limit in SharePoint 2007 was 100 web applications total. Most of the Premier Field Engineers that I work with advise that this limit generally still applies to most SharePoint 2010 environments. This is due to the number of timer jobs, resource contention, and other factors that vary by environment. In fact, most PFEs recommend a practical limit of around 20 web applications to improve manageability of the environment.
- Each web application can have up to 300 content databases.
- Each content database can have up to 200 GB of content.
- Maximum recommended 2000 site collections per content database. If you intend to create many site collections, keep this and the maximum content database size in mind.
Another consideration to keep in mind is how your information architecture may impact search. Search will not automatically pick up the host named site collections; you may need to manually add the addresses as start addresses or as content sources. Given the limit of 100 start addresses per content source, you may need to define multiple content sources and start addresses. This means you need to keep the following software boundaries and limits in mind:
- Limit to 100 scope rules per scope; 600 total per search service application.
- Maximum 200 site scopes and 200 shared scopes per search service application.
- Maximum 50 content sources per search service application.
- Maximum 100 start addresses per content source.
If you are using host named site collections for vanity URLs (not using multi-tenancy), you should also consider the behavior of the PeoplePicker. You can adjust the PeoplePicker settings to only resolve names within the current site collection (http://technet.microsoft.com/en-us/library/gg602070.aspx). This isn’t really anything to do with host named site collections, but worth pointing out.
As mentioned in the Migrating Path-Based to Host Named Site Collections section above, content is migrated from path-based to host-named site collections using Backup-SPSite and Restore-SPSite (replacing the stsadm backup and restore). This command has a tested limit of 15 GB. I have one customer who is able to use this command with much larger content databases, and another customer who times out with 18 GB databases. It varies depending on disk latency for the local disk as well as disk latency / throughput in SQL. Know that the supported limit is 15 GB if you intend to migrate from path-based to host named site collections.
Host named site collections are a great tool to keep in mind when designing your information architecture. However, you must also consider their limitations, decide if host named site collections are appropriate for your scenario, and understand the practical limits of the product. And the next time that you hear someone flatly say “never use host named site collections”, please have them visit this blog post so we can bring them up to date.
Many thanks to Spence Harbar, Sean Livingston, and Keith Bendure for reviewing this article! As always, if you have any feedback to this post, please post comments to this blog post so that others can see.