What's New in XML for Microsoft Windows 2000

 

Charlie Heinemann
Microsoft Corporation

Updated January 10, 2000

The Microsoft XML parser (MSXML.DLL) that ships with Microsoft® Windows® 2000 is essentially a service pack release. It contains a small number of new features and many bug fixes. This version of MSXML.DLL is also available in the Web release of Internet Explorer 5.01, so it also works on Windows NT 4.0. The most significant difference between this version of the parser and its predecessor, however, is the substantial improvement in reliability and scaling. The new version of the parser is better equipped for the server, with improved performance under stress and improved scaling in multiprocessor scenarios.

This article outlines the new features of the parser and describes changes in behavior that affect its functionality. For more detailed information about the new features, see the XML reference documentation at the XML Developer Center.

New Features of the Microsoft XML Parser

The following new features of the Microsoft XML parser are described in detail:

  • Data type support on attributes
  • Elements as type "id"
  • Data types as element names
  • Instantiation of COM objects in XSL script blocks

Data Type Support on Attributes

The data type support provided with the Internet Explorer 5 version of the Microsoft XML parser has been expanded to include simple data type support (such as "int", "float", and "date") on attributes. For the full list of supported simple data types, see the XML-Data Schemas reference documentation.

Expanded data type support within the parser makes it possible to have attribute-centric data that can still be assigned a data type. The data type can be declared on an attribute node through a schema, just as it is declared on an element node. Use the dt:type attribute on the AttributeType element:

<AttributeType name="y" dt:type="int"/>
<ElementType name="x">
    <attribute type="y"/>
</ElementType>
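
To make the effect of the declaration concrete, here is a minimal sketch in Python. It does not use MSXML. It reads the dt:type declared on an AttributeType and uses it to convert an attribute value in an instance document. The enclosing Schema element, the sample instance, and the helper names (attribute_types, typed_attributes, CONVERTERS) are assumptions made for illustration, and only "int" and "float" are handled.

```python
import xml.etree.ElementTree as ET

XDR_NS = "urn:schemas-microsoft-com:xml-data"
DT_NS = "urn:schemas-microsoft-com:datatypes"

# The schema fragment from the text, wrapped in a Schema element (assumed wrapper).
SCHEMA = """
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="y" dt:type="int"/>
  <ElementType name="x">
    <attribute type="y"/>
  </ElementType>
</Schema>
"""

# Hypothetical instance document using the typed attribute.
INSTANCE = '<x y="42"/>'

# Only two of the simple types are covered in this sketch.
CONVERTERS = {"int": int, "float": float}


def attribute_types(schema_xml):
    """Map each AttributeType name to its declared dt:type."""
    root = ET.fromstring(schema_xml)
    types = {}
    for attr_type in root.iter(f"{{{XDR_NS}}}AttributeType"):
        declared = attr_type.get(f"{{{DT_NS}}}type")
        if declared is not None:
            types[attr_type.get("name")] = declared
    return types


def typed_attributes(instance_xml, types):
    """Return the element's attributes converted to their declared types."""
    elem = ET.fromstring(instance_xml)
    result = {}
    for name, raw in elem.attrib.items():
        # Attributes without a known declared type stay as strings.
        convert = CONVERTERS.get(types.get(name), str)
        result[name] = convert(raw)
    return result


print(typed_attributes(INSTANCE, attribute_types(SCHEMA)))
```

In the parser itself, the typed value would be read from the attribute node rather than computed by hand. The sketch only shows how the declaration and the instance value relate.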

Elements as Type "id"

An element can now be of type "id" as shown in the following code:

<foo xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <bar dt:dt="id">fooid</bar>
</foo>

The bar element has a type of "id", meaning that the text value of the bar element can be used to reference the foo element. In addition to assigning the "id" type to the element on the instance, a schema can also be used to assign the type "id" to the element.

Once an element has a type of "id", the parent of that element can be referenced by the value of the id element. For example, if you were to pass the string "fooid" to the nodeFromID method, that method would return the foo element.

xmldoc.nodeFromID("fooid")
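
The lookup rule described above (the id value lives in the child, but the parent is what gets returned) can be sketched outside the parser. The following Python example emulates that rule with the standard library. It is not the MSXML implementation, and the node_from_id helper is a hypothetical name used only here.

```python
import xml.etree.ElementTree as ET

DT_NS = "urn:schemas-microsoft-com:datatypes"

DOC = """
<foo xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <bar dt:dt="id">fooid</bar>
</foo>
"""


def node_from_id(root, id_value):
    """Return the parent of the element typed "id" whose text equals id_value."""
    for parent in root.iter():
        for child in parent:
            is_id = child.get(f"{{{DT_NS}}}dt") == "id"
            if is_id and (child.text or "").strip() == id_value:
                return parent
    return None


root = ET.fromstring(DOC)
match = node_from_id(root, "fooid")
print(match.tag if match is not None else None)
```

The sketch scans the whole tree on every call. A real implementation would more likely keep an index of id values, but the result is the same: the foo element, not the bar element.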

Data Types as Element Names

Data types can already be declared on the instance using the dt:dt attribute, or in the schema using the dt:type attribute or the datatype element. In the Windows 2000 release of the Microsoft XML parser, elements can also be assigned types through their names.

The following XML element is of type integer because its name is in the "urn:schemas-microsoft-com:datatypes" namespace and its local name, "int", is a valid data type:

<dt:int>8</dt:int>

This new way of assigning types to elements gives you a shorthand method of creating typed elements.
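
As an illustration of the rule, the Python sketch below derives a type from an element's qualified name: if the namespace is the datatypes namespace and the local name is a known type, the text is converted. This is an emulation under stated assumptions, not MSXML code. The typed_value helper and the two-entry CONVERTERS table are hypothetical.

```python
import xml.etree.ElementTree as ET

DT_NS = "urn:schemas-microsoft-com:datatypes"

# Only two of the simple types are covered in this sketch.
CONVERTERS = {"int": int, "float": float}

DOC = '<dt:int xmlns:dt="urn:schemas-microsoft-com:datatypes">8</dt:int>'


def typed_value(elem):
    """Convert the element text when the element name itself names a data type."""
    # ElementTree stores qualified names as "{namespace}local".
    if elem.tag.startswith("{"):
        namespace, local = elem.tag[1:].split("}", 1)
    else:
        namespace, local = "", elem.tag
    if namespace == DT_NS and local in CONVERTERS:
        return CONVERTERS[local](elem.text)
    return elem.text  # untyped elements keep their string value


print(typed_value(ET.fromstring(DOC)))
```

An element named int in any other namespace is left untyped, which mirrors the condition that both the namespace and the local name must match.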

Instantiation of COM Objects in XSL Script Blocks

In the version of the Microsoft XML parser that shipped with Internet Explorer 5, the instantiation of COM objects within xsl:script blocks was not allowed for security reasons. In the Windows 2000 release of the Microsoft XML parser, this restriction has been removed, and COM objects can now be instantiated safely within xsl:script blocks.

Changes in Behavior of the Microsoft XML Parser

A few changes in behavior were introduced with the Windows 2000 version of the Microsoft XML parser. These changes are the result of both bug fixes and customer feedback. The following table lists bug fixes that could result in unanticipated changes in behavior.

Bug Fixes

| Functional area | Bug description | Comments |
|---|---|---|
| XSL Patterns | A query with // returns duplicates when elements are defined by an entity. | XSL pattern navigation into the content of a DocumentType node is now disallowed. |
| Data Types | Element retains data type even after data type declaration (i.e., "dt:dt='int'") is removed. | An element no longer retains its data type when the data type declaration is removed. |
| Namespaces | Reserved namespaces are allowed to qualify elements. | The prefix "xml" in any combination of uppercase or lowercase letters cannot be set by the user. It is a reserved namespace prefix. |
| Namespaces | Namespace declarations on empty elements are not being persisted. | Namespace declarations on empty elements are now persisted. |
| Data Types | Data type validation is occurring prior to XSL transformations. This causes XSL elements such as the following to error: <price dt:dt="number"><xsl:value-of select="price"/></price> | The Microsoft XML parser no longer validates the data type before XSL is able to do the transformation. |
| XML Data Source Object | The oncellchange event fires twice. | The oncellchange event now fires only once per instance of change in the data. |
| Object Model | Getting the previousSibling property of the first child of an element returns the last attribute in the attributes collection on the parent element (if the attribute collection is not empty). | The previousSibling property of the first child of an element now returns null (regardless of whether the attributes collection on the parent element is empty). |
| Object Model | A cloned document retained the ID mappings of the original document (i.e., ID navigation returned you to the original document). | The IDs in a cloned document are mapped to nodes within that cloned document, not to the nodes in the original document. |
| Object Model | Cloned documents do not contain a clone of the original document's inner subset. | Cloned documents now contain a clone of the original document's inner subset (provided one exists). |
| XML DSO | Replacing the root element by setting the documentElement property does not cause the shape of the recordset to be rebuilt. | Replacing the root element by setting the documentElement property now causes the shape of the recordset to be rebuilt. |
| Validation | Invalid XML is allowed to exist in a DTD (as value of an entity). | Invalid XML is no longer allowed to exist in a DTD (once the entity is resolved). |
| Object Model | No error messages in Visual Basic®. | Error messages are now available in Visual Basic. |
| Object Model | Setting the nodeTypedValue property of an element of type boolean results in invalid XML after persisting. | When the boolean value is set to true, the value "1" is now persisted rather than "-1". The value "-1" is not a valid value for nodes of type boolean. |
| IE4 Object Model | Attribute value incompatible with Internet Explorer 4.0. | \t and new lines are now converted into ' ' (space) in attribute values. This is compatible with the Internet Explorer 4.0 implementation. |
| Parser | The Microsoft XML 2.0 parser does not support the encoding "us-ascii". | The Microsoft XML parser now supports "us-ascii". |
| Object Model | Not normalizing new lines is a big problem for HTML-based XML applications. | A carriage return followed by a new line is now converted to a single new line. |
| Object Model | Loading an XML document using the loadXML method forces resolveExternals to False. | If there is no security context (such as in C, C++, or an HTML behavior), the resolveExternals property is False. If there is a security context (as in an HTML page), it should be set to True. |
| Object Model | The loadXML method returns S_FALSE if the isSuccessful parameter is NULL. | The Microsoft XML parser now returns S_FALSE only for parse failures. |
| Parser | The Microsoft XML 2.0 parser is unable to handle parameter entities used in element declarations such as the following: <!ELEMENT document (%inline;)> | The parser now parses parameter entities correctly in element declarations: <!ELEMENT document (%inline;)> |
| Data Types | The bin.base64 implementation is incorrect. The Microsoft XML 2.0 parser recognizes "*" instead of "/" in the character set. | The base64 character set is: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'. The Microsoft XML parser now recognizes this correctly. |
| XSL | Script in XSL can alter the DOM in Internet Explorer but not from other containers. | Script in XSL is now restricted from altering the DOM regardless of the container. |
| XML DSO | The $Text field and the node's text value that is mapped to that field have different values. | The $Text field and the node value fields are now both normalized. |
| XSL | XSL exposes "dt" attributes for data types defined in a schema. | The dt:dt attribute is no longer exposed in the DOM tree when data types are defined through a schema. |
| Parser | IDs under entity declarations are registered. | The Microsoft XML parser no longer registers IDs under entity declarations. |
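
Three of the fixes listed above describe simple text rules: new-line normalization, white space in attribute values, and the base64 character set. The Python sketch below restates those rules using only the standard library. It is an illustration, not parser code. The function names are hypothetical, and treating a carriage return plus new line as a single space in attribute values is an assumption that follows from applying new-line normalization first.

```python
import base64
import string


def normalize_line_endings(text):
    """Convert each carriage return + new line pair to a single new line."""
    return text.replace("\r\n", "\n")


def normalize_attribute_value(value):
    """Convert tabs and new lines in an attribute value to spaces."""
    # Assumption: line endings are normalized before the space conversion.
    value = normalize_line_endings(value)
    return value.replace("\t", " ").replace("\n", " ")


# The base64 character set quoted in the table: A-Z, a-z, 0-9, "+", "/".
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

# Three 0xFF bytes encode to four copies of the last character in the set,
# which is "/" (not "*") in a correct implementation.
encoded = base64.b64encode(bytes([0xFF, 0xFF, 0xFF])).decode("ascii")

print(repr(normalize_line_endings("a\r\nb")))
print(repr(normalize_attribute_value("a\tb\r\nc")))
print(encoded, all(ch in BASE64_ALPHABET for ch in encoded))
```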

Other Changes in Behavior

The following changes are based on customer feedback and could cause your applications to change behavior:

  • Microsoft URNs are no longer case-sensitive.
  • If you call id() twice, referring to the same node, the second reference will be honored.
  • When you access the namespaceURI property of an xmlns:foo attribute and an xml:space attribute, you get "http://www.w3.org/XML/1998/namespace" and "" respectively. This change is a bug: it breaks from the correct behavior of the Microsoft XML parser that shipped with Internet Explorer 5. It will be corrected in a future release.

Stress and Performance Gains

A contention problem at the automation level has been fixed, so the parser now scales positively when used in an Active Server Pages (ASP) file. This means that on multiprocessor computers, the throughput of your .asp files that use XML will increase dramatically. The following graph illustrates these gains, where the x-axis is the number of processors and the y-axis is the number of ASP requests per second:

These results were measured using a simple XML.ASP page that performed some DOM tree operations.

In addition, a number of stress bugs have been fixed, which makes the parser substantially more stable under stress. Microsoft.com has deployed the parser on some of its most heavily used sites.

Charlie Heinemann is a program manager for Microsoft's XML team. Coming from Texas, he knows how to think big.