Inside MSXML Performance


Chris Lovett
Microsoft Corporation

February 21, 2000

Download Xmlperf.exe.



I definitely got the message from your online comments that we need more "novice-level" material and some real XML applications. However, this article was already in the pipeline, and it is intended for the advanced XML developer. (After all, this column is called "Extreme XML"!) That said, this article assumes you are familiar with XML and the Microsoft XML Parser (MSXML) in particular. See the MSDN XML Developer's Center for more information.

So, you're designing your XML-based Web application and you need to know what kind of performance to expect from your XML server. Obviously, this depends a lot on what processing you plan to do. It is hard to generalize, because there are so many variables—such as the size of the XML documents, the amount of script code required to process the documents, the amount of output generated, and so on.

For example, major variables that can affect the performance of MSXML include:

  • The kind of XML data
  • The ratio of tags to text
  • The ratio of attributes to elements
  • The amount of discarded white space

To illustrate some of these variables, I'll use four sample data files. Below is a snippet from each so you can see what it looks like:


Ado.xml

This sample file is a persistently saved ADO Recordset object, and it is extremely attribute heavy. Each attribute value is short, with little wasted white space, making it a data-dense document.

<rsSchema:row au_id='267-41-2394' au_lname='O&apos;Leary' au_fname='Michael'
    phone='408 286-2428' address='22 Cleveland Av. #14' city='San Jose' state='CA'
    zip='95128' contract='True' name='systypes' id='4' uid='1' type='S ' userstat='0'
    sysstat='113' indexdel='0' schema_ver='1' refdate='1900-01-01T00:00:00'
    crdate='1996-04-03T03:38:57.387000000' version='0' deltrig='0' instrig='0'
    updtrig='0' seltrig='0' category='0' cache='0'/>


Hamlet.xml

This sample file consists of Shakespeare's play "Hamlet." The file is a well-balanced combination of text and element markup, with no attributes.

<SCENE><TITLE>SCENE I.  Elsinore. A platform before the castle.</TITLE>
<LINE>Who's there?</LINE>


Ot.xml

This sample file consists of the entire Old Testament. Each tag is only one or two characters long, which reduces the tag-to-text ratio.

<bktlong>The First Book of Moses, Called GENESIS.</bktlong>
<chapter><chtitle>Chapter 1</chtitle>
<v><vn>1</vn><p>In the beginning God created the heaven and the earth.</p></v>


Northwind.xml

This sample file contains a portion of the Northwind database that ships with Microsoft Access. It uses elements instead of attributes, has a high tag-to-text ratio, and contains a lot of extra white space.

        <OrderID> 10326</OrderID>
        <OrderDate> 11/10/94</OrderDate>
        <ShipAddress> C/ Araquil, 67</ShipAddress>

Another major factor is whether the original file is stored as UCS-2. For most XML documents in English, UTF-8 is half the size of UCS-2 because the Latin characters compress down to a single byte in UTF-8. But this is not true for all languages. For some Asian languages, UTF-8 is actually larger than UCS-2, because it can expand to three bytes per character in the worst case. To be fair, the best format to use for measuring performance is UCS-2 on disk so that the numbers are more globally meaningful.
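The size difference is easy to check for yourself. Here is a small sketch (modern JavaScript rather than JScript, using the standard TextEncoder API) comparing UTF-8 byte counts against UCS-2, which stores every character in the Basic Multilingual Plane as two bytes:

```javascript
// Compare UTF-8 byte counts against UCS-2 for the same string.
const encoder = new TextEncoder();

function utf8Bytes(s) { return encoder.encode(s).length; }
function ucs2Bytes(s) { return s.length * 2; } // two bytes per UTF-16 code unit

const english = "In the beginning God created the heaven and the earth.";
console.log(utf8Bytes(english), ucs2Bytes(english)); // ASCII text: UTF-8 is half the size

const japanese = "日本語";
console.log(utf8Bytes(japanese), ucs2Bytes(japanese)); // 9 6: UTF-8 is larger for CJK text
```

For pure ASCII, UTF-8 is exactly half the UCS-2 size; for CJK text each character costs three UTF-8 bytes against two UCS-2 bytes.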

The following table shows the UCS-2 file sizes, number of unique names, number of elements and attributes, number of text nodes, and amount of text content (in Unicode characters) for each of our sample files. It also shows a "tagginess factor," which is the ratio of element and attribute name characters to the rest of the file.

| Sample | File size | Unique names | Elements and attributes | Text nodes | Text content (characters) | Tagginess (percentage) |
|---|---|---|---|---|---|---|
| Ado.xml | 2,171,812 | 53 | 63,722 | 61,462 | 3,890 | 18.7 |
| Hamlet.xml | 559,260 | 17 | 6,637 | 5,472 | 170,545 | 5.9 |
| Ot.xml | 7,663,624 | 12 | 71,417 | 47,302 | 3,236,900 | 1.4 |
| Northwind.xml | 488,140 | 12 | 3,680 | 2,761 | 31,155 | 6.0 |

The number of unique names is interesting because MSXML "atomizes" element and attribute names, meaning it creates only one string object for each unique name and points to that object from every element or attribute that shares the name. This matters because the names of elements and attributes are typically highly repetitive. For example, the Ado.xml sample actually contains 63,722 element and attribute names, which consume a total of 407,148 bytes of the overall file size, a tag-to-file-size ratio of over 18 percent. Yet among all these names there are only 53 unique ones, so instead of taking 407 KB of memory to store, they fit in just a few kilobytes.
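The atomization trick is easy to sketch outside MSXML. A minimal atomizer (plain JavaScript, names hypothetical) just canonicalizes every occurrence of a name through one table:

```javascript
// Sketch of name atomization: every occurrence of an element or attribute
// name maps to one canonical entry, so repeated names cost almost nothing.
class NameTable {
  constructor() { this.names = new Map(); }
  atomize(name) {
    if (!this.names.has(name)) this.names.set(name, name);
    return this.names.get(name); // the one shared copy
  }
}

const table = new NameTable();
// Six name occurrences, as a parser would encounter them...
const seen = ["row", "au_id", "au_lname", "row", "au_id", "row"];
seen.forEach((n) => table.atomize(n));
// ...but only three unique strings are actually stored.
console.log(table.names.size); // 3
```

A real atomizer returns shared string objects so that name comparisons become cheap pointer comparisons; the table above just shows the bookkeeping.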


Metrics

There are four key performance metrics that most of you are interested in as you develop your XML-based Web application:

  • Working set: The peak amount of memory used by MSXML to process requests. Once the working set exceeds available RAM, performance usually declines sharply as the operating system starts paging memory out to disk.
  • Megabytes per second: Simply a measure of raw speed for a given operation, such as the document load method. By itself it is interesting, but to get the real picture for a production application, you really need to consider the next two metrics as well.
  • Requests per second: A measure of how many requests the XML parser can handle per second. An XML parser might have a high megabytes-per-second rate, but if it is expensive to set up and tear down that parser, it will still have a low throughput in requests per second. This metric can help you calculate how many clients your server can handle under a peak load. Obviously, this depends on how heavily the clients are going to load up your server. For example, if the clients hit the server at a peak rate of one request per second, and if the server can do 150 requests per second, the server can probably handle up to 150 clients.
  • Scaling: A measure of how well your server can process requests in parallel. If your server is processing 150 client requests in parallel, it is doing a lot of multithreading. Running 150 threads in parallel is a lot for one processor, which will spend much of its time just switching between those threads. In this scenario, you might add more processors to the computer to share the load.

For example, a quad-processor server would need to process only 37 threads per processor—a more reasonable amount. (Scaling beyond this can be done with Web farms.) The goal is to scale linearly with the number of processors you add (for example, a quad-processor server should be four times faster than a single-processor computer). However, this is rarely achieved because there is usually some sort of contention for shared resources, such as memory, file system, registry, and network. Most components also contend for their own internal shared resources (for example, a global state that is protected by locks). Typically, when you add processors, scaling problems become a lot more obvious.

Disclaimer: I want you to understand that the numbers published here are not official in any way, but are intended to paint the overall picture so that you can get a feel for what kinds of things to expect and be able to make the right design choices while building your XML applications.

MSXML Features

Next, let's examine some important scenarios associated with the Document Object Model (DOM)—including loading, saving, walking a DOM tree, and creating a new DOM tree in memory.


Load

The MSXML Document Object Model ("Microsoft.XMLDOM," CLSID_DOMDocument, IID_IXMLDOMDocument) is the starting point for all XML processing within the MSXML parser. The fastest way to load an XML document is to use the default "rental" threading model (meaning the DOM document can be used by only one thread at a time, but it doesn't matter which thread) with validateOnParse, resolveExternals, and preserveWhiteSpace all disabled:

    var doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.validateOnParse = false;
    doc.resolveExternals = false;
    doc.preserveWhiteSpace = false;

Working Set

When using the DOM, the first metric to consider is the working set. Memory is used to load Msxml.dll and the other .dll files on which it depends. Some of these other .dll files are "delay loaded," which means the working set won't be affected until that .dll is used. MSXML is a COM DLL, so you typically use the standard COM APIs (CoInitialize and CoCreateInstance) to create a new XML document object. The minimum working set for a simple Visual C++ 6.0 command line application that uses COM is about one megabyte. (This includes the following .dll files: Ntdll.dll, Kernel32.dll, Ole32.dll, Rpcrt4.dll, Advapi32.dll, Gdi32.dll, User32.dll, and Oleaut32.dll.) The first call to CoCreateInstance of an IXMLDOMDocument object loads Msxml.dll and Shlwapi.dll, which adds another 745 KB on top of this. Once all the .dll files are loaded, a new IXMLDOMDocument object is only about 8 KB.

The memory used by the XML data loaded into an XML document is anywhere from one to four times the size of the XML file on disk, depending on the "tagginess" of the data being loaded and whether the file was already in a Unicode format on disk. The following is a very rough formula for estimating the memory required for a given XML document:

ws = 32(n+t) + 12t + 50u + 2w;

The following table describes the parts of the formula:

| Part | Description |
|---|---|
| ws | The working set in bytes. |
| n | The number of element and attribute nodes in the tree. Each element, attribute, attribute value, and text content has one node (for example, <element attribute="value">text</element> = four nodes). |
| t | The number of text nodes. |
| u | The number of unique element and attribute names. |
| w | The number of Unicode characters in text content (including attribute values). Note that loading single-byte ANSI text into memory results in twice the number, because all text is stored as Unicode characters, which are two bytes each. |

This assumes you do not set the preserveWhiteSpace flag; when you do, more nodes are created to preserve the white space between elements, using more memory.
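Plugging the Ado.xml numbers from the tables above into the formula shows how rough (but useful) the estimate is; a quick sanity check in JavaScript:

```javascript
// ws = 32(n + t) + 12t + 50u + 2w, from the formula above.
function estimateWorkingSet(n, t, u, w) {
  return 32 * (n + t) + 12 * t + 50 * u + 2 * w;
}

// Ado.xml: 63,722 element/attribute nodes, 61,462 text nodes,
// 53 unique names, 3,890 characters of text content.
console.log(estimateWorkingSet(63722, 61462, 53, 3890)); // 4753862
```

That estimate lands within about 1.5 percent of the measured 4,689,920-byte working set for Ado.xml shown below.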

For the sample data above, we see the following working set numbers (not including the initial startup working set):

| Sample | Working set | Ratio to file size |
|---|---|---|
| Ado.xml | 4,689,920 | 2.16 |
| Hamlet.xml | 704,512 | 1.25 |
| Ot.xml | 10,720,000 | 1.39 |
| Northwind.xml | 249,856 | 0.51 |

An element-heavy XML document that contains a lot of white space between elements and is stored in Unicode can actually be smaller in memory than on disk. Files with a more balanced ratio of elements to text content, such as Hamlet.xml and Ot.xml, end up at about 1.25 to 1.5 times the UCS-2 file size in memory. Files that are very data-dense, such as Ado.xml, end up at more than twice the disk-file size when loaded into memory.

Megabytes Per Second

For the megabytes-per-second metric, I loaded each sample file 10 times in a loop on a Pentium II 450-MHz dual-processor computer running Windows 2000, measured the load times, and averaged the results.

| Sample | Load time (milliseconds) | MB/second | Nodes/second |
|---|---|---|---|
| Ado.xml | 677 | 3.2 | 184,909 |
| Hamlet.xml | 104 | 5.3 | 116,432 |
| Ot.xml | 1,063 | 7.2 | 111,682 |
| Northwind.xml | 62 | 7.8 | 103,887 |

Also shown in this table is a measure of nodes per second. Notice how this correlates with megabytes per second. The more nodes processed per buffer of input data, the slower the absolute throughput. Conversely, the more compact the nodes are (as in Ado.xml), the higher the nodes per second.

Attributes vs. Elements

You could conclude from this that attribute-heavy formats (such as that of Ado.xml) deliver more data per second than element-heavy formats. But this should not be the reason for you to switch everything to attributes. There are many other factors to consider in the decision to use attributes versus elements.

First DOM Walk Working Set Delta

Walking the DOM tree for the first time has an impact on the working set metric because some nodes in the tree are created on demand. To illustrate this, I'll show the working set deltas resulting from the first walk over the freshly loaded sample data:

| Sample | Working set delta (percentage) |
|---|---|
| Ado.xml | 0 |
| Hamlet.xml | 25 |
| Ot.xml | 14 |
| Northwind.xml | 36 |

According to these results, it seems that the ADO attribute case doesn't have any nodes created on demand.

createNode Overhead

Creating a DOM tree from scratch results in a higher peak working set than loading the same document from disk. To illustrate this, I created a DOM document with 10,000 elements looking like this:

    <item>this is a test</item>

Then I compared the load time, load plus walk time (to force on-demand construction), and create time, showing associated working sets.

| Function | Time (milliseconds) | Working set (bytes) |
|---|---|---|
| Load | 146 | 842,137 |
| Load+Walk | 148 | 1,173,914 |
| Create | 740 | 2,503,066 |

These results show that loading a document is roughly five times faster than creating the same document from scratch in memory. The reason is that the process of creating a document requires a lot of DOM calls, which slows things down. Merely loading the document bypasses all this and goes directly to the internal data structures.

Walk vs. selectSingleNode

The fastest way to walk the tree is to avoid the children collection and any kind of array access. Instead, use firstChild and nextSibling:

function WalkNodes(node)
{
    var child = node.firstChild;
    while (child != null)
    {
        WalkNodes(child); // recurse to visit the whole subtree
        child = child.nextSibling;
    }
}

The following table shows the results for the sample test files—walking all elements, attributes, and text nodes:

| Sample | Walk time (milliseconds) | Number of nodes | Nodes/second |
|---|---|---|---|
| Ado.xml | 243 | 63,723 | 262,234 |
| Hamlet.xml | 63 | 12,100 | 192,063 |
| Ot.xml | 660 | 118,720 | 179,878 |
| Northwind.xml | 33 | 6,438 | 195,090 |

However, if you are looking for something in the tree, a much faster way to find it is to use XPath via the selectSingleNode or selectNodes methods. For example, I ran the XPath expression "//willnotfindanything", which walks the entire tree. The following results show the percentage improvement over a brute-force tree walk:

| Sample | selectSingleNode (milliseconds) | Improvement (percentage) | Nodes/second |
|---|---|---|---|
| Ado.xml | 173 | 29 | 368,341 |
| Hamlet.xml | 11 | 82 | 1,100,000 |
| Ot.xml | 113 | 82 | 1,050,619 |
| Northwind.xml | 5 | 84 | 1,287,600 |

selectSingleNode is faster because it avoids the overhead of calling through the COM layer. Instead of thousands of calls to firstChild and nextSibling, the entire tree walk happens inside a single call.

Now, this comparison is really not equal: On one hand, selectSingleNode was doing less walking, because it didn't need to look at the text nodes—it could just skip right over them. On the other hand, selectSingleNode was doing more work, because it was comparing each nodeName with the name specified in the query. But you get the idea. In general, selectSingleNode reduces the amount of DOM code you need to write and gives you a noticeable performance improvement.


Save

Saving a document is generally slower than loading one. The following table summarizes the differences:

| Sample | Load (milliseconds) | Save (milliseconds) | Difference (percentage) |
|---|---|---|---|
| Ado.xml | 677 | 1,441 | 113 |
| Hamlet.xml | 104 | 184 | 77 |
| Ot.xml | 1,063 | 1,971 | 85 |
| Northwind.xml | 62 | 103 | 66 |

When saving each sample, the worst case is the attribute-heavy ADO Recordset, which is more than twice as slow as loading it into memory. Given that it seems perfectly reasonable to expect a write operation to be slower than a read operation, these numbers are rather good. But be careful. You can easily strain the file system if you are loading and saving lots of small XML documents, as shown in the section "Free-Threaded Documents."


Namespaces

XML Namespaces also add some overhead in XML parsing time, but not as much as you might think. I have two versions of the Ado.xml example. One uses a namespace prefix "recordset:" on all 2,347 rows, and one does not. The following table shows the difference in load time:

| Measurement | Ado.xml | Ado-ns.xml | Difference (percentage) |
|---|---|---|---|
| File size | 1,007,214 | 1,030,651 | 2.3 |
| Load time (milliseconds) | 662 | 680 | 2.7 |

Most of the difference in load time can be explained by the increase in file size.

Free-Threaded Documents

A "free-threaded" DOM document (CLSID_DOMFreeThreadedDocument, "Microsoft.FreeThreadedXMLDOM") exposes the same interface as the "rental" threaded document (IID_IXMLDOMDocument). This object can be safely shared across any thread in the same process.

Free-threaded documents are generally slower than rental documents because of the extra thread safety work they do. You use them when you want to share a document among multiple threads at the same time, avoiding the need for each of those threads to load up their own copy. In some scenarios, this can result in a big win that outweighs the cost of the extra thread safety work.

For example, suppose you have a 2-KB XML file on your Web server, and you have a simple ASP page that loads that file, increments an attribute inside the file, and saves the file again.

Response.Expires = -1;
var filename = Server.MapPath("simple.xml");
var doc = Server.CreateObject("Microsoft.XMLDOM");
doc.async = false;
doc.load(filename);
var c = parseInt(doc.documentElement.getAttribute("count"))+1;
doc.documentElement.setAttribute("count", c);
doc.save(filename);

This ASP code will be completely disk I/O bound. On my Pentium II 450-MHz dual-processor computer, I was not able to get any more than 50 percent CPU utilization. The disk was making a lot of noise.

However, we could bring the file into shared-application state using a free-threaded DOM document, as follows:

Response.Expires = -1;
var doc = Application("shared");
if (doc == null)
{
    doc = Server.CreateObject("Microsoft.FreeThreadedXMLDOM");
    doc.async = false;
    doc.load(Server.MapPath("simple.xml"));
    Application("shared") = doc;
}
var c = parseInt(doc.documentElement.getAttribute("count"))+1;
doc.documentElement.setAttribute("count", c);


Then we would see the throughput jump dramatically:

| Method | Requests/second |
|---|---|
| Load/Save | 34 |
| Shared | 250 |

In other words, the second approach, using the shared free-threaded DOM document, is more than seven times faster than loading and saving the file on every request.
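The pattern generalizes beyond ASP: parse once, then serve every subsequent request from memory. A sketch in plain JavaScript, where parseXml is a hypothetical stand-in for a real parser:

```javascript
// Cache parsed documents by name so only the first request pays
// the parse (and disk) cost.
const cache = new Map();
let parses = 0;

function parseXml(text) {
  parses++; // stand-in for real parsing work
  return { source: text };
}

function getDocument(name, readFile) {
  if (!cache.has(name)) cache.set(name, parseXml(readFile(name)));
  return cache.get(name);
}

// Simulate 100 requests for the same document.
for (let i = 0; i < 100; i++) {
  getDocument("simple.xml", () => "<count/>");
}
console.log(parses); // 1
```

Ninety-nine of the one hundred requests never touch the disk, which is exactly why the shared approach above jumps from 34 to 250 requests per second.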

Delayed Memory Cleanup

The one downside to the free-threaded model is that it exhibits more latency in how it cleans up unused memory, affecting the performance of subsequent operations. (Some people report this as a memory leak when in fact it is just delayed cleanup.) The larger the XML document, the more pronounced the latency. The following table shows the increase in load time and working set when using the free-threaded document as opposed to the rental document:

| Sample | Load time (percentage increase) | Working set (percentage increase) |
|---|---|---|
| Ado.xml | 4 | 137 |
| Hamlet.xml | 23 | 83 |
| Ot.xml | 1 | 53 |
| Northwind.xml | 2 | 42 |

These results show that the worst case is when you have many nodes (as in Ado.xml), which makes sense because this generates more memory for the memory manager to clean up. Note that the benefits of being able to share the same document object across threads, as shown above, still outweigh the downside of slower load time and a larger working set.

Virtual Memory

In the free-threaded model, the working set can spike if you generate enormous amounts of memory for the memory manager to clean up—which can happen when performing XSL transformations on large documents. In this case, you will use up all available memory and strain the virtual memory manager, causing the performance to dramatically decrease. Watch the peak working set of your application under a heavy load to make sure this doesn't happen. If it does, redesign your application by breaking the XML down into smaller chunks. This situation has been improved somewhat in the MSXML January 2000 Web Release. In fact, we got the following reader comment on the January 2000 Web Release page:


I've done a couple of informal speed trials on server-side transforms (XML to HTML via XSL), and noticed that there's a big improvement.

- Anonymous 9-Feb-2000


IDispatch

Late-bound scripting languages, such as JScript and VBScript, add a lot of overhead to each method call and property access in the DOM interface. The script engines actually invoke the methods and properties indirectly through the IDispatch interface. First, the script engine calls GetIDsOfNames or GetDispID, passing in the string name of the method or property and getting back a DISPID. Then it packages all the arguments into an array and calls Invoke with that DISPID.

Need I say more? Clearly, this is going to be slower than calling a virtual function in C++ or compiled Visual Basic. Visual Basic is a bit tricky, though, because you can actually do both styles of programming in one application. For example:

    Dim doc As Object
    Set doc = CreateObject("Microsoft.XMLDOM")

This is late-bound and will be as slow as VBScript or JScript. To speed this up, from the Project menu, select References and add a reference to the latest version of the "Microsoft XML" library. Then you can write the following early-bound code:

    Dim doc As New MSXML.DOMDocument

The other advantage of programming this way is that you'll get all sorts of helpful drop-down lists of available methods and their arguments from Visual Basic.


Validation

Validation compares the types of elements in an XML document against a Document Type Definition (DTD) or XML Schema. For example, the DTD may say that all "Customer" elements must contain a child "Name" element. Take a look at the DTD for Hamlet.xml (hamletdtd.htm) and the XML Schema for Hamlet.xml (hamletschema.htm).

Validation is another huge area for performance analysis, but I only have time for a brief mention today. Validation is expensive for several reasons. First, it involves loading a separate file (the DTD or XML Schema) and compiling it. Second, it requires state machinery for performing the validation itself. Third, when the schema also includes information about data types, any data types also have to be validated. For example, if an XML element or attribute is typed as an integer, that text has to be parsed to see if it is a valid integer.
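The data-type part of that cost is easy to picture: every typed value has to be parsed and checked on every node it appears on. A minimal sketch of an integer check (JavaScript; a real schema validator is of course far more involved):

```javascript
// Validating a typed attribute means parsing its text content each time.
function isValidInteger(text) {
  return /^[+-]?\d+$/.test(text.trim());
}

console.log(isValidInteger("42"));    // true
console.log(isValidInteger(" -7 "));  // true
console.log(isValidInteger("4.2"));   // false
console.log(isValidInteger("forty")); // false
```

Multiply work like this across tens of thousands of attributes and the "schema plus datatypes" column in the next table starts to make sense.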

The following table shows the difference between loading without validation, with DTD validation, and with XML Schema validation.

| Sample | Load (milliseconds) | DTD (milliseconds) | Schema (milliseconds) | Schema plus datatypes (milliseconds) |
|---|---|---|---|---|
| Ado.xml | 662 | 2,230 | 2,167 | 3,064 |
| Hamlet.xml | 106 | 215 | 220 | N/A |
| Ot.xml | 1,069 | 2,168 | 2,193 | N/A |
| Northwind.xml | 64 | 123 | 127 | N/A |

The bottom line is to expect validation to double or triple the time it takes to load your documents. New to MSXML January 2000 Web Release is a SchemaCollection object, which allows you to load the XML Schema once and then share it across your documents for validation. This will be discussed in a future article.


XSL

XSL can be a big performance win over DOM code for generating "transformed" reports from an XML document. For example, suppose you want to print out all the speeches by Hamlet in the sample Hamlet.xml. You might use selectNodes to find all the speeches by Hamlet, then use another selectNodes call to iterate through the lines of each of those speeches, as follows:

function Method1(doc)
{
    var speeches = doc.selectNodes("/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']");
    var s = speeches.nextNode();
    var out = "";
    while (s)
    {
        var lines = s.selectNodes("LINE");
        var line = lines.nextNode();
        while (line)
        {
            out += line.text;
            line = lines.nextNode();
        }
        out += "<hr>";
        s = speeches.nextNode();
    }
    return out;
}

This works, but it takes about 1,500 milliseconds. A better way to tackle this problem is to use XSL. The following XSL style sheet (or template) does exactly the same thing:

<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:for-each select="/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']">
    <xsl:for-each select="LINE">
      <xsl:value-of/>
    </xsl:for-each>
    <hr/>
  </xsl:for-each>
</xsl:template>

You can then write the following simpler script code that uses this template:

function Method2(doc)
{
    var xsl = new ActiveXObject("Microsoft.XMLDOM");
    xsl.async = false;
    xsl.load("hamlet.xsl"); // the style sheet shown above (file name is illustrative)
    return doc.transformNode(xsl);
}

This takes only 203 milliseconds—it is more than seven times faster. This is a rather compelling reason to use XSL. In addition, it is easier to update the XSL template than it is to rewrite your code every time you want to get a different report.

The problem is that XSL is very powerful. You have a lot of rope with which to hang yourself, so to speak. XSL has a rich expression language that can be used to walk all over the document in any order. It is highly recursive, and the MSXML parser includes script support for added extensibility. Using all these features with reckless abandon will result in slow XSL style sheets. The following sections describe a few specific traps to watch out for.


Scripting

It is convenient to call script from within an XSL style sheet, and it is a great extensibility mechanism. But as always, there is a catch: Script code is slow. For purposes of illustration, imagine that we wrote the following style sheet instead of the one shown previously:

<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:for-each select="/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']">
    <xsl:eval>this.text</xsl:eval>
    <hr/>
  </xsl:for-each>
</xsl:template>
This produces the same result, but it takes 266 milliseconds instead of 203 milliseconds, about 31 percent slower. The more frequently your xsl:eval statements are executed, the slower the performance becomes. For purposes of illustration only, let's move the xsl:eval inside the inner for-each loop:

    <xsl:for-each select="LINE">
      <xsl:eval>this.text</xsl:eval>
    </xsl:for-each>

This one takes 516 milliseconds, more than twice as slow. The bottom line is to be careful with script code in XSL.

The Dreaded "//" Operator

Watch out for the "//" operator. This little operator walks the entire subtree looking for matches. Developers use it more than they should just because they are too lazy to type in the full path. (I catch myself using it all the time, too.) For example, try switching the select statement in the previous example to the following:

  <xsl:for-each select="//SPEECH[SPEAKER='HAMLET']">

The time it takes to perform the selection jumps from 203 milliseconds to 234 milliseconds. My laziness just cost me a 15 percent tax.

Prune the Search Tree

If there's anything you can do to "prune" the search tree, by all means do it. For example, suppose you were reporting all speeches by Bernardo from Hamlet.xml. All Bernardo's speeches happen to be in Act I. If you already knew this, you could skip the entire search of Act II through Act V. The new select statement would look like this:

  <xsl:for-each select="/PLAY/ACT[0]/SCENE/SPEECH[SPEAKER='BERNARDO']">


This chops the time down from 141 milliseconds to 125 milliseconds, a healthy 11 percent improvement.
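Both of the last two points come down to how many nodes a query visits. A toy tree walker in JavaScript makes the difference concrete:

```javascript
// Toy illustration: a "//" query must visit every node in the subtree,
// while a pruned, explicit path visits only the branch it names.
function makeNode(name, children) { return { name: name, children: children || [] }; }

const play = makeNode("PLAY", [
  makeNode("ACT", [makeNode("SCENE", [makeNode("SPEECH"), makeNode("SPEECH")])]),
  makeNode("ACT", [makeNode("SCENE", [makeNode("SPEECH")])]),
]);

let visited = 0;
function findAll(node, name, results) {
  results = results || [];
  visited++;
  if (node.name === name) results.push(node);
  for (const child of node.children) findAll(child, name, results);
  return results;
}

// Like "//SPEECH": walk the whole tree.
visited = 0;
console.log(findAll(play, "SPEECH").length, visited); // 3 8

// Like "/PLAY/ACT[0]/...": prune to the first ACT's subtree only.
visited = 0;
console.log(findAll(play.children[0], "SPEECH").length, visited); // 2 4
```

On this eight-node toy tree the pruned query visits half the nodes; on a real document with five acts, the savings scale with everything you skip.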

Cross-Threading Models

Previously, the transformNode and transformNodeToObject methods required that the style sheet and the document being transformed use the same threading model. In the MSXML January 2000 Web Release, you can use free-threaded style sheets on rental documents and vice versa. This means you can get the performance benefit of rental documents together with the win of sharing free-threaded style sheets across threads.


Conclusion

The XML team at Microsoft has done some great work to improve the performance of the MSXML parser over the last three releases. They made a major improvement to the scalability of MSXML for the Windows 2000 release, as described at the end of Charlie Heinemann's article "What's New in XML for Microsoft Windows 2000." They added new interfaces in the MSXML January 2000 Web Release that allow caching of schemas and XSL templates. The Web Release also includes some important work to improve the memory management of XSL transformations, so that they scale better on the server.

I ran out of time in this article to explore the new features. If there's enough interest, perhaps I'll do a follow-up article on them. The C++ source code used to get the performance numbers in this article, along with the sample data, is included in the Xmlperf.exe file that accompanies this article.

Chris Lovett is a program manager for Microsoft's XML team.