URLs Matter! (?)

As a technical consultant I'm always trying to put myself into the shoes of those who have better things to do in life than wonder at 1s and 0s. Often while in "end user" mode I find myself wondering how we have let those techies get away with URLs for so long. They aren't really all that intuitive, they are susceptible to attacks like "phishing" (increasingly aimed at banking customers), they are quite painful to type (":", "/", ".") and they just generally don't look very nice. Sure, we have tried to make things easier with concepts like favourites, but this is something that even I, as a seriously tidy computer user, have never been able to master effectively. (Beats me, it's just like files and folders, right?)

So, with that short background spiel out of the way: when designing a solution I try to ensure URLs are not something an end user has to worry too much about. Sure, they may need to type one in once, but from then on they should be able to forget about the address bar. With this in mind, I set about working on a large-scale SharePoint deployment.

Time passed....

Some weeks into the project, one of our business sponsors noticed something weird about SharePoint: after a certain number of Portal areas had been created, it suddenly started putting ugly "/C1/", "/C2/" segments in the URLs. Now, I knew this is what SharePoint did (it has often been raised on the internal discussion lists we have at Microsoft), but it hadn't occurred to me that this might be something an "end user" would complain about. Apart from the whole "well, you should have!" or "well, you should have mentioned it anyway!", both of which I will gladly wear ("hey, we had some serious deadlines, and I knew it was something we had to live with"), I was now faced with a "critical" issue that had come out of nowhere, completely ruining my day and my beautiful RAID status.

So, before I talk about what we did to overcome this little "issue", here is a brief description of the technical reasons we have this "C1", "C2" feature:

The first thing to note is that SharePoint "looks" like a hierarchy of Areas. These areas can be dragged and dropped through a simple-to-use interface (Portal Site Map) to do two key things:
1. Organise your portal content effectively
2. Drive the site navigation

The second thing to note is that a SharePoint Area is (simplistically) just a Windows SharePoint Services (WSS) site, but with a whole lot of fancy features added to it. Following on from this, a SharePoint Portal Server (SPS) portal site is, behind the scenes, a hierarchical WSS Site Collection (this is a useful way to think about the product sometimes). Why is this important? Because while the WSS site structure is fixed, in that you cannot move those sites around, the Portal Area hierarchy must be flexible. WSS is the physical structure, while the SPS Area hierarchy is a virtual structure.

The third thing to note is that the URLs you are seeing relate not to the SPS Area tree but to the underlying WSS site tree. You may argue about whether this is good or not, but one thing it does is ensure that your URL never changes, therefore minimising broken links. Yes, there could have been other ways the product group could have done it, such as some fancy redirect gadget, but that is not always ideal either.

Ok, so take a breather, this can mess with your head a little.

For scalability reasons it is best to limit the number of child sites under any one site, so the product team came up with a concept called a "bucket web". These "bucket webs" are what you are seeing as the "C1" and "C2" in the URL. SharePoint has an algorithm that says every bucket web can hold 20 other buckets and 20 ordinary webs. In effect, the first 20 areas you create go directly under the root; after that the buckets kick in, and those areas get a "CX" in the URL, where X is a number. This means the root holds 20 areas, the first bucket level 400 (20 buckets of 20) and the second bucket level 8000 (400 buckets of 20), etc. When the first level of buckets is full, it moves on to the second level: C1, C2, C3....C19, C1/C1, C1/C2, etc.
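The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of the 20-webs-and-20-buckets rule as described above, not SharePoint's actual code; the function and constant names are made up for the example.

```python
# Illustrative only: models the "20 ordinary webs + 20 buckets per web" rule
# described in the text. Names here are hypothetical, not SharePoint APIs.
WEBS_PER_CONTAINER = 20     # ordinary webs (areas) each root/bucket can hold
BUCKETS_PER_CONTAINER = 20  # child bucket webs each root/bucket can hold


def areas_at_level(level: int) -> int:
    """Areas that fit at a given bucket depth (0 = directly under the root)."""
    containers = BUCKETS_PER_CONTAINER ** level  # 1 root, 20 buckets, 400 buckets...
    return containers * WEBS_PER_CONTAINER


def total_capacity(max_level: int) -> int:
    """Areas that fit in the root plus every bucket level up to max_level."""
    return sum(areas_at_level(level) for level in range(max_level + 1))


for level in range(3):
    print(level, areas_at_level(level), total_capacity(level))
```

Level 0 is the root with its 20 "clean" URLs, and each extra level of buckets multiplies the room by 20.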

The final thing you should know is that when you delete an area its "spot" becomes available, so if you delete one of the first 20 areas, you can reclaim it the next time you create an area. As you will see this is important.

Now, going back to my customer "issue". They had decided on an organisational hierarchy (at least initially), and this had 6 key business units. When we drilled into the requirements, not all URLs mattered, just those of these 6 key business units. They wanted something like http://<server>/<department>. These needed to be relatively "predictable" because people would often go directly to them rather than the top level. So, with this in mind, we were able to ensure these 6 were the first areas created, and therefore had the URLs we wanted.

Taking it a step further, we realised that this number may well increase in the future. To cater for this we created a hidden area (hidden from site navigation, and secured for Admins only), and in it we created the remaining 15 or so areas with "predictable" URLs. This effectively locked those slots away: in the future, if we need one, we will simply delete one of these holding areas, then immediately create the new area. In our testing, this seemed to work quite nicely.
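To make the "reserve now, reclaim later" idea concrete, here is a toy Python model of the bucket-free slots under the root. It is a sketch under simplifying assumptions: the class, the area names and the 6 + 14 split are all hypothetical, and it ignores the slot the hidden area itself would use. It is not how SharePoint is implemented.

```python
# Toy model only: "RootSlots" and all area names are hypothetical.
class RootSlots:
    """Tracks which areas hold one of the bucket-free root-level slots."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self.areas = set()

    def create(self, name):
        """Return True if the new area gets a clean root-level URL."""
        if len(self.areas) >= self.capacity:
            return False  # would land in a bucket web ("CX" in the URL)
        self.areas.add(name)
        return True

    def delete(self, name):
        """Free the slot held by an area so the next create can reuse it."""
        self.areas.discard(name)


slots = RootSlots()

# Create the business units first so they get the clean URLs.
for unit in ["unit1", "unit2", "unit3", "unit4", "unit5", "unit6"]:
    slots.create(unit)

# Fill the remaining slots with holding areas to lock them away.
for i in range(1, 15):
    slots.create(f"holding{i}")

# With every slot taken, a new area would fall into a bucket...
overflow_is_clean = slots.create("latecomer")

# ...unless a holding area is deleted immediately beforehand.
slots.delete("holding1")
reclaimed_is_clean = slots.create("unit7")

print(overflow_is_clean, reclaimed_is_clean)
```

The point of the model is the ordering: the delete and the create have to happen back to back, otherwise some other new area could take the freed slot first.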

The problem here is that you really only get one chance to do this, and that is when you first install, so pack it away in your kit bag. Oh...and URLs....yeah, they matter, a bit.