The MSDN Table of Contents in C#


Chris Lovett
Microsoft Corporation

January 15, 2001

Download Msdntocc2.exe.



Back in July, Bryn Waibel and John Boylan wrote an article titled Online Carpentry: Crafting a New MSDN Table of Contents, about the new XML-based TOC system that MSDN uses. Last month, the article was updated in Online Carpentry: Refinishing Your Table of Contents.

Now that Beta 1 of the .NET Framework is publicly available, I figured it would be a good time to try to port the code to C#, then write about the porting experience and the pros and cons of the result. I decided not to make any substantial changes to the design of the application so that I could focus on pure porting issues.

All the client-side JScript, .gif and .css files remain the same, of course. The XML files that describe the TOC also remain the same—because they were well designed and contained only pure TOC data, nothing specific to ASP itself.

The only files that needed to be ported were tocDown.xsl and the .asp files: default.asp, toc.asp, treeload.asp. The toc.js file needed some work also. The overall porting process took about four hours.


The default.asp file was straightforward, since it was just a wrapper page anyway. The top of the page changes to the following:

<%@ Page language="c#" AutoEventWireup="false" CodePage="65001"%>

I also changed the JScript ASP code to C#, which was extremely simple. Basically, the var declarations became string declarations, and QueryString is now indexed with square brackets instead of being called as a method. The string.match() calls became calls to System.Text.RegularExpressions.Regex.IsMatch(). This class name is abbreviated to Regex by importing the namespace up front with the following pragma:

<%@ Import Namespace="System.Text.RegularExpressions"%>
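As a minimal sketch of the string.match() to Regex.IsMatch() change, the class below validates a query-string value before it is used. The class name QueryCheck, the method IsValidPath, and the rule that a TOC path is a comma-separated list of child indexes are assumptions made for illustration; they are not taken from the ported code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QueryCheck
{
    // Hypothetical rule: a TOC path looks like "0,3,12" (digits separated by commas).
    // JScript equivalent of the check: value.match(/^[0-9]+(,[0-9]+)*$/) != null
    public static bool IsValidPath(string value)
    {
        return value != null && Regex.IsMatch(value, @"^[0-9]+(,[0-9]+)*$");
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPath("0,3,12"));    // True
        Console.WriteLine(IsValidPath("0,../etc"));  // False
    }
}
```

In a page, the value would come from something like Request.QueryString["path"], where the parameter name is again only an example.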


This page contained a bunch of code—about 200 lines. So first I moved the bulk of the code out of this page and into a clean C# class called MsdnToc in the file called Toc.cs. This code is then imported into default.aspx with the Src attribute on the Page pragma:

<%@Page language="c#" Src="toc.cs" AutoEventWireup="false" %>

To instantiate the MsdnToc class, I do the following:

MsdnToc toc = new MsdnToc(Context, RootDir, DefaultTopic,
    Server.MapPath( SubMapPath ), RootTocToken);

That passes all the necessary variables to the external class. Then this class has one main method that returns the TOC content in an XmlDocument object:

XmlDocument masterTOC = toc.GetContent(curPath, RootDir);

What I decided to leave in this file are the URL parameter manipulations and the final XSL transformation. The URL parameter code was straightforward. The XSLT work is a bit more involved. The original code was:

var styleFile = Server.MapPath( "tocDown.xsl" );
var style = Server.CreateObject( "MSXML2.DOMDocument" );
style.load( styleFile );
style.async = false;
var content = masterTOC.transformNode( style );

This is replaced with a call to a Render method on the MsdnToc object:

toc.Render(masterTOC, Server.MapPath("tocDown.xsl"), Response.Output);

This method is implemented as follows:

    public void Render(XmlDocument doc, string styleFile,
                       TextWriter output)
    {
        // Load the stylesheet from disk.
        XslTransform style = new XslTransform();
        style.Load( styleFile );
        // Wrap the document in a navigator and write the result to output.
        XmlNavigator xmlNav = new DocumentNavigator(doc);
        style.Transform(xmlNav, null, output);
    }

Notice that the MSXML2.DOMDocument becomes a System.Xml.XmlDocument object, and the XSL stylesheet is loaded into a System.Xml.Xsl.XslTransform object. The result of the transform is then written directly to the TextWriter passed in, which here is Response.Output.

There are a couple of important things to note here. Unlike MSXML2.DOMDocument objects, the new Managed XmlDocument.Load() method is synchronous by default. In fact, the Managed XmlDocument is always synchronous. This is because asynchronous HTTP connections can be established using lower-level classes in the System.Net namespace before even calling Load, so Load itself doesn't really need to be asynchronous. We found that few customers were actually taking advantage of the fact that they could access the MSXML document while it was being loaded. If you really need to do this, then you would spin up a thread in the Managed world.

The XmlNavigator is new to the Managed world. This provides a cursor-based XML model over any data. The DocumentNavigator provides an XmlNavigator implementation over XmlDocument. The XmlNavigator also provides XPath support, which is used by the XslTransform since XSLT is built on XPath.

The Transform method is able to write directly to the ASP.NET page output stream, and the encoding of the output will be handled correctly. This was not as well integrated in the MSXML/ASP world and typically just doesn't work because ASP is usually outputting ISO-8859-1 and MSXML transformNodeToObject is outputting Unicode.
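The Render method above is written against the Beta 1 classes (XslTransform, XmlNavigator, DocumentNavigator). As a rough sketch of the same load-then-transform-to-a-TextWriter pattern, the code below substitutes XslCompiledTransform, which is the transform class in later releases of the framework and not the class used in this port. The class name RenderSketch, the inline stylesheet, and the sample TOC markup are all made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class RenderSketch
{
    // Hypothetical stylesheet: emits each node title followed by a semicolon.
    public const string SampleXslt =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/toc'>" +
        "<xsl:for-each select='node'>" +
        "<xsl:value-of select='@title'/><xsl:text>;</xsl:text>" +
        "</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Same shape as Render in the article: document in, stylesheet in, TextWriter out.
    public static void Render(XmlDocument doc, XmlReader stylesheet, TextWriter output)
    {
        XslCompiledTransform style = new XslCompiledTransform();
        style.Load(stylesheet);
        style.Transform(doc, null, output);   // XmlDocument implements IXPathNavigable
    }

    // Convenience wrapper so the sketch can run without files or a web page.
    public static string RenderToString(string xml, string xslt)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        using (XmlReader reader = XmlReader.Create(new StringReader(xslt)))
        using (StringWriter writer = new StringWriter())
        {
            Render(doc, reader, writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string xml = "<toc><node title='A'/><node title='B'/></toc>";
        Console.WriteLine(RenderToString(xml, SampleXslt));   // A;B;
    }
}
```

In a page, the TextWriter argument would be Response.Output, as in the call shown earlier.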


I also moved some code out of treeload.asp into a method called GetSubToc in Toc.cs, namely the loading of the XmlDocument. The final XSLT transform can now reuse the Render method we used in Toc.aspx, so this file is now quite clean and simple.


Toc.cs is where the bulk of the TOC code lives, so it represents the bulk of the porting effort. The new types introduced to the code were char[], string, int, XmlDocument, XmlElement, XmlNavigator, File, Regex, and Exception.

One annoying thing that made the port a little harder is the re-use of variables in the original version where the type changed. So an XmlDocument object was being assigned to an XmlElement then back to XmlDocument. I had to introduce a couple of new local variables to work around this.

The JScript string.concat and array.slice were replaced with a simple + operator and String.Join respectively. Also string.replace and string.match were replaced with Regex.Replace and Regex.IsMatch.
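The following is a small sketch of those string replacements. The class StringPort and its two helpers are hypothetical names, and the inputs are invented; only the framework calls (String.Join and Regex.Replace) correspond to what the port uses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringPort
{
    // JScript: parts.slice(0, count).join(",")
    // C#: the String.Join overload with a start index and count covers the slice.
    public static string JoinFirst(string[] parts, int count)
    {
        return String.Join(",", parts, 0, count);
    }

    // JScript: path.replace(/\\/g, "/")
    // C#: Regex.Replace replaces every match, so no "g" flag is needed.
    public static string ToForwardSlashes(string path)
    {
        return Regex.Replace(path, @"\\", "/");
    }

    public static void Main()
    {
        string[] curPath = { "0", "3", "12" };
        Console.WriteLine(JoinFirst(curPath, 2));             // 0,3
        Console.WriteLine(ToForwardSlashes(@"library\toc"));  // library/toc
        // JScript: a.concat(b) becomes a plain + in C#.
        Console.WriteLine("toc" + ".aspx");                   // toc.aspx
    }
}
```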

There were also the following changes in the XML code:

MSXML2                         System.Xml
nodeFromId                     GetElementById
selectSingleNode               XmlNavigator.SelectSingle
childNodes.item(curPath[i])    ChildNodes[Int32.Parse(curPath[i])]
childNodes.length              ChildNodes.Count
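To illustrate the last two rows, here is a sketch of walking down a tree by child index using ChildNodes, ChildNodes.Count, and Int32.Parse. The TocWalk class, the Walk method, and the sample XML are assumptions for the example; the real TOC files and the real traversal code may look different.

```csharp
using System;
using System.Xml;

public static class TocWalk
{
    // Follows a list of child indexes (kept as strings, as in curPath) from a start node.
    // Returns null if any index is out of range.
    public static XmlNode Walk(XmlNode start, string[] curPath)
    {
        XmlNode node = start;
        for (int i = 0; i < curPath.Length && node != null; i++)
        {
            int index = Int32.Parse(curPath[i]);
            bool inRange = index >= 0 && index < node.ChildNodes.Count;
            node = inRange ? node.ChildNodes[index] : null;
        }
        return node;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<toc><node id='a'/><node id='b'><node id='b0'/><node id='b1'/></node></toc>");

        XmlNode hit = Walk(doc.DocumentElement, new string[] { "1", "0" });
        Console.WriteLine(((XmlElement)hit).GetAttribute("id"));   // b0
    }
}
```

The explicit Int32.Parse is the main difference from the JScript version, where the string index was converted implicitly.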

I also simplified the following code:

    node.attributes.getNamedItem( "pth" ).nodeValue;

to just use GetAttribute. I'm not sure why the original code was written this way. This made the old GetAttribValue function unnecessary, so I removed it. I also removed RemoveChildren because it wasn't used.
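A short sketch of the two forms side by side, using a made-up element and attribute value (the class name AttrDemo is also hypothetical). One practical difference is that XmlElement.GetAttribute returns an empty string when the attribute is missing, while the longer form fails with a null reference because GetNamedItem returns null.

```csharp
using System;
using System.Xml;

public static class AttrDemo
{
    // Long form, a direct port of node.attributes.getNamedItem("pth").nodeValue.
    public static string GetPathLong(XmlElement node)
    {
        return node.Attributes.GetNamedItem("pth").Value;
    }

    // Short form used after the cleanup.
    public static string GetPath(XmlElement node)
    {
        return node.GetAttribute("pth");
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<node pth='0,1' />");
        XmlElement node = doc.DocumentElement;

        Console.WriteLine(GetPathLong(node));              // 0,1
        Console.WriteLine(GetPath(node));                  // 0,1
        Console.WriteLine(node.GetAttribute("missing") == "");   // True
    }
}
```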


The XSL file used to render the XML into DHTML also needed to be upgraded to the W3C XSLT 1.0 namespace, which is the only version of XSLT the .NET Framework supports.

This was very simple indeed, given that the original file did not use any proprietary features of MSXSL that were not supported in the new XSLT.

The only annoying thing is that the Beta 1 version of XslTransform does not support the HTML output option yet, so it writes a bogus XML declaration. It also writes out a bogus </IMG> on image tags, which broke the toc.js client-side code; the changes needed there as a result are described below. Tracking down this problem and coming up with a suitable fix was one of the most time-consuming aspects of the porting process.


The client-side toc.js code broke because it was rather fragile to start with. It regularly dereferences an element's children[0], expecting an anchor tag, and children[1], expecting an IMG tag. The output of the new XSLT was just different enough to break this. So I replaced this code with calls to two new functions, GetLink() and GetImage(). These functions in turn use the DHTML getElementsByTagName method to make sure they find an anchor tag and IMG tag, respectively.

I also had to modify this file because .asp changed to .aspx.


I found that the new version was much more debuggable. The strong typing resulted in good compile errors from ASP.NET that helped tremendously in the porting effort. Also, the separate code files containing C# classes made the code more manageable and reusable. The new Managed XML classes seem to fit more naturally into the ASP.NET model and are relatively easy to port from MSXML code.

The performance of toc.aspx seems to be on par with toc.asp. But Loadtree.asp gets about 65 requests per second on ltoc0.xml, and loadTree.aspx gets only 30 requests per second. This seems to be because the XSLT stylesheet is doing a lot of attribute value matching rather than simple element/attribute name matching. When I comment out the final XSLT rendering, toc.aspx gets about 120 requests per second and toc.asp gets about 65. So clearly the compiled C# code and basic XmlDocument processing are a lot faster, but the XslTransform could use some more performance tuning. The Web Data team is doing a ton of performance work right now, so this should be greatly improved in the next release.

Chris Lovett is a program manager for Microsoft's XML team.