XmlNameTable Revisited

After my post, XmlNameTable: The Shiftstick of System.Xml, I thought I would follow up with a second post that discusses it in greater detail, prompted by the feedback comments. The XmlNameTable is an internal piece of implementation exposed on the System.Xml APIs, which is why I compared it to the stick shift on a car. Most people in the USA drive automatic cars, and the ability to tune the car's performance via a stick shift is unnecessary for most of them. In fact it is an inconvenience, given that people typically eat, drink, do their hair and talk on the cell phone simultaneously.

Internals - Internally the XmlNameTable is vital to the XmlReader. As characters are read from the stream, names are added to the XmlNameTable via the Add(Char[], Int32, Int32) method (note: no string is created when the name is already in the table). A hashing algorithm is applied to the character array to produce a hash value, which is stored for later lookups. All element and attribute names, namespaces and prefixes are stored in the internal hash table. The hashing algorithm was OK for v1.1, and in the V2 release we did much more research to make it significantly better, which helps to get that 2x performance improvement. This process is called string atomization and is performed by the XmlReader classes such as the XmlTextReader. All “strings” must be added to the XmlNameTable; you cannot choose to skip any, otherwise the XmlReader implementation breaks, because object reference comparison is performed everywhere internally. The benefit of the XmlNameTable is not only object comparison: it also greatly reduces the number of strings that need to be allocated during parsing. Without an XmlNameTable parsing would be very slow, since a new string would be allocated for every occurrence of every name.
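
To make atomization concrete, here is a minimal sketch that calls the public NameTable directly instead of going through a reader. The buffer contents and the class name AtomizationSketch are made up for illustration; only NameTable.Add, NameTable.Get and Object.ReferenceEquals are real APIs.

```csharp
using System;
using System.Xml;

class AtomizationSketch
{
    static void Main()
    {
        NameTable nt = new NameTable();

        // Pretend this is the parser's input buffer; "book" appears twice.
        char[] buffer = "<book><book>".ToCharArray();

        // Add(char[], int, int) atomizes a slice of the buffer, so the
        // caller never has to build an intermediate string for the lookup.
        string first = nt.Add(buffer, 1, 4);
        string second = nt.Add(buffer, 7, 4);

        Console.WriteLine(first);                                  // book
        // Both calls hand back the one instance held by the table.
        Console.WriteLine(Object.ReferenceEquals(first, second));  // True

        // A string built at run time has the same characters but is a
        // different object until it is looked up in the table.
        string fresh = new string(buffer, 1, 4);
        Console.WriteLine(Object.ReferenceEquals(first, fresh));          // False
        Console.WriteLine(Object.ReferenceEquals(first, nt.Get(fresh)));  // True
    }
}
```

This is why a name comparison can be a pointer comparison once both sides have come out of the same table.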

User created - You could choose to implement your own XmlNameTable, since it is an abstract class (the default implementation is the NameTable class). However, this is hard to do for a fast general-purpose algorithm, although it is achievable for a specific scenario. Using the CLR Hashtable class instead is not a good idea, for example, since you have to create strings in order to add them to the Hashtable, which adds a significant performance overhead. However, you could imagine a scenario where you optimize for the names you expect to be repeated most often in your document, i.e. give them a faster lookup with a B-tree or similar.
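
As a rough sketch of what a scenario-specific XmlNameTable could look like, the class below checks a short list of expected names before falling back to the default NameTable. The names HotNameTable and FindHot are hypothetical, and a plain linear scan stands in for the B-tree mentioned above. Nothing here has been measured, so treat it as an illustration of the four methods you must override rather than as a proven optimization.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical: a name table with a fast path for a few expected names.
class HotNameTable : XmlNameTable
{
    private readonly NameTable inner = new NameTable();
    private readonly string[] hot;

    public HotNameTable(params string[] expectedNames)
    {
        // Atomize the expected names once in the inner table, so both
        // lookup paths always return the same instance for a given name.
        hot = new string[expectedNames.Length];
        for (int i = 0; i < expectedNames.Length; i++)
            hot[i] = inner.Add(expectedNames[i]);
    }

    // Returns the atom if the buffer slice matches an expected name, else null.
    private string FindHot(char[] array, int offset, int length)
    {
        foreach (string name in hot)
        {
            if (name.Length != length) continue;
            int i = 0;
            while (i < length && name[i] == array[offset + i]) i++;
            if (i == length) return name;
        }
        return null;
    }

    public override string Add(char[] array, int offset, int length)
    {
        return FindHot(array, offset, length) ?? inner.Add(array, offset, length);
    }

    public override string Add(string array) { return inner.Add(array); }

    public override string Get(char[] array, int offset, int length)
    {
        return FindHot(array, offset, length) ?? inner.Get(array, offset, length);
    }

    public override string Get(string array) { return inner.Get(array); }
}

class HotNameTableDemo
{
    static void Main()
    {
        HotNameTable nt = new HotNameTable("Invoice", "LineItem");
        object lineItem = nt.Get("LineItem");
        string xml = "<Invoice><LineItem/><LineItem/></Invoice>";
        XmlTextReader reader = new XmlTextReader(new StringReader(xml), nt);
        int count = 0;
        while (reader.Read())
        {
            // Reference comparison works because the reader atomizes
            // every name through the table it was given.
            if (reader.NodeType == XmlNodeType.Element && (object)reader.LocalName == lineItem)
                count++;
        }
        reader.Close();
        Console.WriteLine(count);   // 2
    }
}
```

The important design constraint is that every path through the table returns the same instance for the same name. Here that holds because the expected names are atomized in the inner NameTable first.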

Usability - Although the XmlNameTable is useful, you should not feel that you have to use it everywhere in your code. That is why most examples in the .NET documentation do not show it being used (the ones that do show the XmlNameTable do not show the best approach, and a V2 doc fix has been made for this). One performance tenet is “don't optimize your code unless it gives measurable and needed benefits”, so if parsing without an explicit XmlNameTable is good enough for your scenario then you're done. However, I would always recommend sharing the XmlNameTable across components, noting that in V1.1 the NameTable implementation is not thread safe. A thread-safe version is planned for V2. Hence, do not create separate threads with XmlTextReaders that use the same NameTable and expect this to work in V1.1.
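
One way to follow the threading advice is to give each thread its own NameTable, as in the sketch below. It assumes a later framework than V1.1, because it relies on ThreadLocal<T> and lambdas; the class and method names are made up for illustration. Atoms from different tables are different objects, so never compare names by reference across threads.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Xml;

class PerThreadNameTables
{
    // Each thread lazily gets its own NameTable, so no table is ever
    // touched by two threads at once.
    static readonly ThreadLocal<NameTable> Tables =
        new ThreadLocal<NameTable>(() => new NameTable());

    public static int CountElements(string xml, string name)
    {
        NameTable nt = Tables.Value;
        object atom = nt.Add(name);
        int count = 0;
        XmlTextReader reader = new XmlTextReader(new StringReader(xml), nt);
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && (object)reader.LocalName == atom)
                    count++;
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }

    static void Main()
    {
        string xml = "<Invoice><LineItem/><LineItem/></Invoice>";
        int lineItems = 0, invoices = 0;
        Thread t1 = new Thread(() => lineItems = CountElements(xml, "LineItem"));
        Thread t2 = new Thread(() => invoices = CountElements(xml, "Invoice"));
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        Console.WriteLine(lineItems + " " + invoices);   // 2 1
    }
}
```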

Here is an example of reading several XML documents from a directory using the same NameTable for each XmlTextReader, and then independently using the same NameTable when loading a separate XML document.

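using System;
using System.IO;
using System.Xml;

class NameTableSample
{
      // One NameTable shared by every reader and document below.
      static XmlNameTable nt = new NameTable();

      static void GlobalNameTable()
      {
            int invoicecount = 0, lineitemcount = 0;

            // Atomize the expected names up front. Add returns the single
            // string instance that the NameTable holds for each name.
            object book = nt.Add("book");
            object price = nt.Add("price");
            object invoice = nt.Add("Invoice");
            object lineitem = nt.Add("LineItem");
            object lineitems = nt.Add("LineItems");
            object description = nt.Add("Description");
            object cust = nt.Add("CustomerName");

            // Create a reader over each file, passing in the shared NameTable.
            string[] files = Directory.GetFiles("input");
            foreach (string file in files)
            {
                  XmlTextReader reader = new XmlTextReader(file, nt);
                  try
                  {
                        while (reader.Read())
                        {
                              // Count start tags only, so end tags are not counted twice.
                              if (reader.NodeType != XmlNodeType.Element)
                                    continue;
                              // Typed as object so that == compares references, not characters.
                              object localname = reader.LocalName;
                              if (invoice == localname)
                                    invoicecount++;
                              if (lineitem == localname)
                                    lineitemcount++;
                        }
                  }
                  finally
                  {
                        reader.Close();
                  }
            }
            Console.WriteLine("Invoices: {0}, line items: {1}", invoicecount, lineitemcount);

            // Independently reuse the same NameTable when loading another document.
            XmlDocument doc = new XmlDocument(nt);
            doc.Load("anotherfile.xml");
      }
}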