Whitespace Handling in XSLT Transformation

 

Using Managed XslCompiledTransform class, native MSXML DOMDocument, and Tools (Internet Explorer and XSLT Debugger)

Mohan Vanmane
Microsoft Corporation

May 2006

Applies to:
   Visual Studio® 2005
   Internet Explorer

Summary: This article explains whitespace-handling issues in several scenarios for XSL transformation. (9 printed pages)

Contents

Introduction
Key Points for Whitespace Handling in XSL Transformation
Transformation Results
Best Practices for Handling Whitespaces
Whitespace Handling in XSLT Applications
Sample Applications
Acknowledgements

Introduction

Last week I developed an XSLT stylesheet using the new XSLT Debugger in Visual Studio 2005. Because I was happy with the transformation output in the XSLT Debugger, I deployed my solution on my Web site. However, I was surprised to see that the output of the same transformation looked very different in Internet Explorer. After some investigation, I realized that the difference came from the way the XSLT Debugger and Internet Explorer handle whitespace. This experience prompted me to write this paper, which explores whitespace handling in XSLT transformation applications across various APIs and tools.

In XML, the whitespace characters are space (#x20), tab (#x9), carriage return (#xD), and line feed (#xA). In an XSL transformation, whether whitespace in the input XML is preserved in the output depends on whitespace handling settings (default or explicit). These settings can be specified at several places in the application:

  • In the input XML document.
  • In the APIs used in the transformation process.
  • In XSLT.

Each of these components has default whitespace handling settings that can optionally be changed. Because whitespace handling can be specified at several levels, it is a frequent source of confusion. Developers writing XSLT applications often spend time investigating the unexpected presence (or absence) of whitespace in the output.

This paper explains whitespace-handling issues in the following scenarios for XSL transformation:

  • Writing managed applications using the XslCompiledTransform class.
  • Writing native applications using the MSXML DOMDocument object.
  • Using the Internet Explorer and XSLT Debugger tools.

Key Points for Whitespace Handling in XSL Transformation

The key points in understanding whitespace handling in XSL transformation are:

  • You can explicitly require that whitespace in the input document be preserved by adding the xml:space="preserve" attribute (for example, <someElement xml:space="preserve" ...>). If you do this, whitespace within the scope of that element is preserved in the output, no matter what whitespace settings the parser, API, or XSLT specify.

  • If the input document does not request that whitespace be preserved, whitespace may be stripped at different stages of processing the input document (parsing, caching, or transformation, as shown in Figure 1). This stripping is permanent: once whitespace has been removed at one stage, no later stage can restore it.

Figure 1. Whitespace handling in XSL transformation

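Whitespace in an input document falls into one of two categories:

  • Whitespace that is allowed to be stripped: text nodes that consist entirely of whitespace characters, such as the line breaks and indentation between elements. Depending on your application settings, this whitespace may or may not be preserved in the output.
  • Whitespace that is not allowed to be stripped: whitespace inside a text node that also contains other characters, such as the spaces in " My Document ". This whitespace is always preserved in the output, whatever your application's whitespace handling settings are.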
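The two categories can be expressed as a small predicate. The following is a minimal sketch in JavaScript (the language of the JScript sample in Example F) that runs under Node.js. It does not use MSXML or .NET. Plain objects stand in for a DOM, and the names el, text, xmlSpaceInScope, and isStrippable are hypothetical. It assumes that a text node can actually be stripped only when two conditions hold: it belongs to the first category, and the nearest ancestor that declares xml:space does not set it to "preserve".

```javascript
// Hypothetical plain-object model of XML nodes (not an MSXML or .NET API).
const XML_WHITESPACE_ONLY = /^[ \t\r\n]*$/; // #x20, #x9, #xD, #xA only

function el(name, attrs, children) {
  const node = { type: "element", name, attrs, children, parent: null };
  for (const child of children) child.parent = node; // link children to parent
  return node;
}

function text(value) {
  return { type: "text", value, parent: null };
}

// xml:space value in effect for a text node: the nearest ancestor that declares it wins.
function xmlSpaceInScope(textNode) {
  for (let n = textNode.parent; n !== null; n = n.parent) {
    if ("xml:space" in n.attrs) return n.attrs["xml:space"];
  }
  return "default";
}

// First category (whitespace-only) AND not protected by xml:space="preserve".
function isStrippable(textNode) {
  return XML_WHITESPACE_ONLY.test(textNode.value) &&
         xmlSpaceInScope(textNode) !== "preserve";
}

// A cut-down version of Book.xml.
const title = text(" My Document ");
const preservedWs = text(" ");
const plainWs = text(" ");
el("document", {}, [
  el("title", {}, [title]),
  el("sentence", { "xml:space": "preserve" }, [preservedWs]),
  el("sentence", {}, [plainWs]),
]);

console.log(isStrippable(title));       // false: second category (contains "My Document")
console.log(isStrippable(preservedWs)); // false: xml:space="preserve" is in scope
console.log(isStrippable(plainWs));     // true
```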

Before going into further details, let us review the following example.

Example

Consider the following XML and XSLT files. The transformation output discussed later shows the impact of whitespace handling. The same files are also used in the sample applications later in this paper.

Book.xml

The following is a sample XML instance:

<?xml version="1.0" encoding="utf-8"?>
<document>
   <title> My Document </title>
   <paragraph>
      <sentence xml:space="preserve">
         <emphasis>Hello</emphasis> <emphasis>World</emphasis>
      </sentence>
      <sentence>
         <emphasis>Hello</emphasis> <emphasis>World</emphasis>
      </sentence>
   </paragraph>
</document>

BookXslt.xslt

The following XSLT transforms Book.xml, producing the results shown later in the example.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="html"/>

   <xsl:template match="document">       
      <html>
         <h1>[<xsl:value-of select="title"/>]</h1>
         <xsl:for-each select="paragraph/sentence">
            <p>
               <xsl:for-each select="node()">
                  <xsl:choose>
                     <xsl:when test="normalize-space(.)=''">-</xsl:when>
                     <xsl:when test="name()='emphasis'"><b><xsl:value-of select="."/></b></xsl:when>
                     <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise>
                  </xsl:choose>
               </xsl:for-each>
            </p>
         </xsl:for-each>
      </html>    
   </xsl:template>

</xsl:stylesheet>

In the transformation, this XSLT does the following:

  • Adds brackets around the <title> value. The brackets are there only to make the whitespace in the text node visible: the leading space, the trailing space, and the space between the words "My" and "Document".
  • Processes each child node of each <sentence> element, as follows:
    • If the normalize-space function returns an empty string for the node (that is, the node consists entirely of whitespace), the node is represented by a hyphen ('-') in the output.
    • Otherwise, if the node is an <emphasis> element, its value is output in bold (<b>).
    • Otherwise, the node value is output as is.
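The per-node logic of the xsl:choose can be mirrored in a few lines. The following JavaScript sketch (Node.js, not MSXML) assumes that each child of a <sentence> is given as a plain object. The object has a name, which is what name() would return (empty for text nodes), and a value, which is the node's string-value. normalizeSpace and renderSentence are hypothetical helpers. normalizeSpace follows the XPath 1.0 definition of normalize-space().

```javascript
// XPath 1.0 normalize-space(): trim leading/trailing XML whitespace and
// collapse each internal run of whitespace to a single space.
function normalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

// Mirrors the xsl:choose in BookXslt.xslt for the children of one <sentence>.
// Each node is { name, value }: name is what name() returns ("" for text nodes)
// and value is the node's string-value.
function renderSentence(nodes) {
  return nodes
    .map((n) => {
      if (normalizeSpace(n.value) === "") return "-";             // whitespace-only node
      if (n.name === "emphasis") return "<b>" + n.value + "</b>"; // emphasis in bold
      return n.value;                                             // anything else as is
    })
    .join("");
}

const hello = { name: "emphasis", value: "Hello" };
const world = { name: "emphasis", value: "World" };
const ws = (value) => ({ name: "", value });

// All whitespace nodes present (first <sentence>, or any sentence in Result B):
console.log(renderSentence([ws("\n         "), hello, ws(" "), world, ws("\n      ")]));
// -<b>Hello</b>-<b>World</b>-

// Whitespace nodes stripped (second <sentence> in Result A):
console.log(renderSentence([hello, world]));
// <b>Hello</b><b>World</b>
```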

Transformation Results

As described earlier, whitespace in the input XML that must be preserved is always preserved in the output. However, depending on your application's whitespace handling settings, whitespace that is allowed to be stripped may be either stripped or preserved. This leads to two possible outcomes of this XSL transformation.

Result A: Whitespaces That Are Allowed to Be Stripped Are Stripped

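If your application strips the whitespace that is allowed to be stripped, the XSL transformation renders the title as [ My Document ], the first sentence as -Hello-World-, and the second sentence as HelloWorld. In both sentences, Hello and World appear in bold.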

Note that:

  • The XSLT transformation adds brackets around the content of the <title> element.

  • The <title> element value in the input XML contains the text My Document, with a leading space, a trailing space, and a space between the words "My" and "Document". These spaces are part of a text node that also contains non-whitespace characters, so they are not allowed to be stripped. Therefore, regardless of your application's whitespace handling settings, they are preserved in the output.

  • The first <sentence> element in the input XML contains the following whitespace-only text nodes:

    • A line break (and indentation) between the <sentence> start tag and the first <emphasis> start tag.
    • A space between the two <emphasis> elements.
    • A line break (and indentation) between the second </emphasis> end tag and the </sentence> end tag.

    These whitespace nodes are all allowed to be stripped. However, the <sentence> element specifies xml:space="preserve", so they are preserved in the output regardless of your application's whitespace handling settings.

    In BookXslt.xslt, each of these whitespace nodes becomes a hyphen ('-') in the output, producing -Hello-World-.

  • The second <sentence> element is similar to the first, except that it does not specify xml:space="preserve". Therefore, if your application's settings strip these whitespace nodes, the transformation returns Hello and World concatenated: HelloWorld.

Result B: All Whitespaces Are Preserved

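If all whitespace in the input XML is preserved, the transformation renders the title as [ My Document ] and both sentences as -Hello-World-, again with Hello and World in bold. The only difference from Result A is the second <sentence> element, whose whitespace nodes are no longer stripped.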
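Both results can be reproduced with a small model. The JavaScript sketch below runs under Node.js and does not use MSXML or .NET. It encodes Book.xml as plain objects. When stripAllowed is true, it strips whitespace-only text nodes that are outside any xml:space="preserve" scope. It then applies the same logic as BookXslt.xslt and prints the visible text of the title and the two sentences, without the <b> markup. The tree encoding and the names strip, stringValue, and render are assumptions for illustration.

```javascript
// Hypothetical plain-object encoding of Book.xml; "#text" marks text nodes.
const t = (value) => ({ name: "#text", value });
const e = (name, attrs, children) => ({ name, attrs, children });
const sentenceBody = () => [
  t("\n         "), e("emphasis", {}, [t("Hello")]), t(" "),
  e("emphasis", {}, [t("World")]), t("\n      "),
];
const book = e("document", {}, [
  t("\n   "),
  e("title", {}, [t(" My Document ")]),
  t("\n   "),
  e("paragraph", {}, [
    t("\n      "),
    e("sentence", { "xml:space": "preserve" }, sentenceBody()),
    t("\n      "),
    e("sentence", {}, sentenceBody()),
    t("\n   "),
  ]),
  t("\n"),
]);

const isWs = (s) => /^[ \t\r\n]*$/.test(s); // whitespace-only: normalize-space gives ''

// Copies the tree. When stripAllowed is true, whitespace-only text nodes are
// dropped unless the nearest xml:space declaration in scope is "preserve".
function strip(node, stripAllowed, inheritedPreserve = false) {
  if (node.name === "#text") return node;
  const declared = node.attrs["xml:space"];
  const preserve = declared === undefined ? inheritedPreserve : declared === "preserve";
  const children = node.children
    .filter((c) => !(stripAllowed && !preserve && c.name === "#text" && isWs(c.value)))
    .map((c) => strip(c, stripAllowed, preserve));
  return { ...node, children };
}

// XPath string-value: concatenation of all descendant text.
const stringValue = (n) =>
  n.name === "#text" ? n.value : n.children.map(stringValue).join("");

// Same logic as BookXslt.xslt, returning visible text only.
function render(doc) {
  const title = doc.children.find((c) => c.name === "title");
  const paragraph = doc.children.find((c) => c.name === "paragraph");
  const sentences = paragraph.children.filter((c) => c.name === "sentence");
  return ["[" + stringValue(title) + "]"].concat(
    sentences.map((s) =>
      s.children.map((n) => (isWs(stringValue(n)) ? "-" : stringValue(n))).join("")
    )
  );
}

console.log("Result A:", render(strip(book, true)).join(" | "));
// Result A: [ My Document ] | -Hello-World- | HelloWorld
console.log("Result B:", render(strip(book, false)).join(" | "));
// Result B: [ My Document ] | -Hello-World- | -Hello-World-
```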

Best Practices for Handling Whitespaces

To preserve some or all of the whitespace in the input document, follow these recommended best practices:

  1. If you are the author of the input XML, add xml:space="preserve" to the elements whose whitespace you want to preserve.
  2. If you don't control the content of the input XML and are only processing it, the next best option is to have your application explicitly request whitespace preservation from the API that loads it (XPathDocument, XmlDocument, or DOMDocument). These APIs strip whitespace by default.
  3. Finally, if whitespace that is allowed to be stripped is still present in the input document at the time of transformation, you can strip it by adding <xsl:strip-space .../> to the XSLT file. By default, XSLT preserves whitespace.

Specifying xml:space="preserve" on an element in the XSLT stylesheet itself keeps the whitespace-only text of the stylesheet within the scope of that element, so that, inside a template, it is written to the output. It does not affect whitespace in the input document.
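The three levels in the best practices combine into a simple decision for each whitespace-only text node. The JavaScript sketch below restates that order of precedence as an assumption; it is not an API. The parameter names are hypothetical:

  • xmlSpacePreserve means that xml:space="preserve" is in scope in the input XML.
  • loaderPreserves means that the loading API was told to keep whitespace (for example, XmlSpace.Preserve, PreserveWhitespace, or preserveWhiteSpace).
  • xslStripSpace means that the stylesheet lists the node's parent element in <xsl:strip-space>.

The sketch ignores <xsl:preserve-space>.

```javascript
// Returns true if the whitespace-only text node is still in the source tree
// that the stylesheet sees (hypothetical model of the three levels).
function keptForTransform({ xmlSpacePreserve, loaderPreserves, xslStripSpace }) {
  if (xmlSpacePreserve) return true;  // 1. the input document's request always wins
  if (!loaderPreserves) return false; // 2. stripped at load time; XSLT cannot restore it
  return !xslStripSpace;              // 3. XSLT keeps whitespace unless told to strip it
}

// Print every combination of the three settings.
for (const xmlSpacePreserve of [false, true]) {
  for (const loaderPreserves of [false, true]) {
    for (const xslStripSpace of [false, true]) {
      const kept = keptForTransform({ xmlSpacePreserve, loaderPreserves, xslStripSpace });
      console.log(
        `xml:space=preserve ${xmlSpacePreserve}, loader preserves ${loaderPreserves}, ` +
        `xsl:strip-space ${xslStripSpace} -> ${kept ? "kept" : "stripped"}`
      );
    }
  }
}
```

The order of the checks encodes the permanence rule: a node removed by the loader never reaches the XSLT level, so the stylesheet's setting is consulted only when the loader has kept the node.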

With this overview, we can now explore how you can handle whitespaces when developing XSLT transformation applications.

Whitespace Handling in XSLT Applications

This section discusses default whitespace handling (and ways to change the behavior) in:

  • XSLT applications (native and managed).
  • Tools (Internet Explorer and XSLT Debugger).

Default Whitespace Handling in Managed Applications

In managed applications, the XslCompiledTransform.Transform method applies the XSL transformation to the input XML. The input XML can be provided as a file name or through an API such as XPathDocument, XmlDocument, or XmlReader. The following table lists these input modes, their default whitespace handling behavior, and how you can change it.

  • The first column shows the ways you can provide the input document to the XslCompiledTransform.Transform method.
  • The second column shows the default whitespace handling behavior.
  • The third column shows how you can change the default behavior. This column shows only code fragments; fully working samples are provided later in the paper.
Input mode Default Altering the Default Behavior
XML file name Preserve (Result B) When the file name is passed as a parameter, the Transform method loads the document using XmlTextReader, which preserves whitespace by default. In this case, you cannot change the default. The alternative is to create an XmlReader yourself, configure its whitespace handling, and pass the reader instead of the file name.

For a working sample, see Example D and Example E.

XmlReader Preserve (Result B) The following code fragment shows how to strip whitespace:

XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
XmlReader reader = XmlReader.Create(xmlFile, settings);

In the code:

You first create an XmlReaderSettings object and set its IgnoreWhitespace Boolean property to true.

Then you pass this object to the XmlReader.Create method. The resulting reader strips the whitespace that is allowed to be stripped.

For a working sample, see Example C and Example E.

XmlDocument Strip (Result A) The following code fragment shows how you can preserve whitespace when using XmlDocument:

XmlDocument xd = new XmlDocument();
xd.PreserveWhitespace = true;
xd.Load(xmlFile);

Setting the PreserveWhitespace property to true before calling Load preserves whitespace.

For a working sample, see Example B and Example E.

XPathDocument Strip (Result A) The following code fragment shows how you can preserve whitespace when using XPathDocument:

XPathDocument document = new XPathDocument(xmlFile, XmlSpace.Preserve);

Passing XmlSpace.Preserve (a value of the XmlSpace enumeration) to the XPathDocument constructor preserves whitespace.

For a working sample, see Example A and Example E.

Default Whitespace Handling in Native Applications

In native applications, the DOMDocument.transformNode method applies the XSL transformation to the input XML. When using the DOMDocument object, you can preserve whitespace by setting the preserveWhiteSpace property. The following table shows the default whitespace handling behavior of the DOMDocument object and how you can change it.

Input Default Altering the Default Behavior
DOMDocument Strip (Result A) To preserve whitespaces, you can set the preserveWhiteSpace property of the DOMDocument to true as shown in the Sample Applications section.

For a working sample, see Example F.
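The defaults in the managed and native tables can be restated as data. This makes it easy to predict which result a given input mode produces. The JavaScript sketch below only encodes those tables. The object and function names are hypothetical. It assumes that the input XML has no xml:space attribute and that the stylesheet has no <xsl:strip-space> element.

```javascript
// Default whitespace behavior of each input mode, as listed in the tables above.
const DEFAULTS = {
  "XML file name": "preserve", // XslCompiledTransform.Transform(fileName, ...)
  "XmlReader": "preserve",     // change with XmlReaderSettings.IgnoreWhitespace
  "XmlDocument": "strip",      // change with XmlDocument.PreserveWhitespace
  "XPathDocument": "strip",    // change with XmlSpace.Preserve in the constructor
  "DOMDocument": "strip",      // change with DOMDocument.preserveWhiteSpace (MSXML)
};

// Returns "A" (strippable whitespace removed) or "B" (all whitespace kept).
// override is "strip", "preserve", or undefined to keep the default.
function expectedResult(inputMode, override) {
  if (!(inputMode in DEFAULTS)) throw new Error("Unknown input mode: " + inputMode);
  if (override !== undefined && inputMode === "XML file name") {
    throw new Error("The default cannot be changed when passing a file name");
  }
  const setting = override === undefined ? DEFAULTS[inputMode] : override;
  return setting === "strip" ? "A" : "B";
}

console.log(expectedResult("XPathDocument"));             // A
console.log(expectedResult("XPathDocument", "preserve")); // B
console.log(expectedResult("XmlReader", "strip"));        // A
console.log(expectedResult("XML file name"));             // B
```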

Default Whitespace Handling in Tools

The following table describes the default whitespace handling behavior of Internet Explorer and the XSLT Debugger, and how you can change it.

Tool Default Altering the Default Behavior
Internet Explorer Strip (Result A) To preserve whitespaces, add the xml:space="preserve" attribute in the input XML document.

If the XML document does not request that whitespace be preserved, Internet Explorer loads it into a DOM, which strips whitespace by default. The whitespace handling specified in the XSLT therefore does not matter when the transformation is applied, because the whitespace has already been removed.

If the XML document specifies xml:space="preserve", whitespace is preserved no matter what the XSLT specifies.

In other words, in Internet Explorer, specifying whitespace handling in the XSLT (by adding <xsl:preserve-space .../> or <xsl:strip-space .../>) has no impact on the result of the transformation.

XSLT Debugger Preserve (Result B) To strip whitespace, add an <xsl:strip-space .../> element to the XSLT. For example, to strip all whitespace that is allowed to be stripped, add the following <xsl:strip-space> element:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:strip-space elements="*" />

...

The tool itself has no setting for changing the whitespace handling behavior.
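The difference between the two tools comes down to whether the loader strips whitespace before the stylesheet runs. The JavaScript sketch below models that two-stage pipeline for the children of the second <sentence> element in Book.xml, which has no xml:space="preserve". It is a simplified assumption, not how either tool is implemented. The TOOLS table, the transform function, and the setting names are hypothetical. Each whitespace node is rendered as a hyphen, as in BookXslt.xslt.

```javascript
// Children of the second <sentence> in Book.xml, as string-values.
const sentence = ["\n         ", "Hello", " ", "World", "\n      "];
const isWs = (s) => /^[ \t\r\n]+$/.test(s);

// Stage 1 behavior of each tool when loading the input XML.
const TOOLS = {
  "Internet Explorer": { loaderStrips: true }, // loads into a DOM that strips by default
  "XSLT Debugger": { loaderStrips: false },    // keeps whitespace when loading
};

// xslSetting: "none", "strip-space" for <xsl:strip-space elements="*"/>,
// or "preserve-space" for <xsl:preserve-space elements="*"/>.
function transform(tool, xslSetting) {
  let nodes = sentence;
  if (TOOLS[tool].loaderStrips) nodes = nodes.filter((s) => !isWs(s));     // stage 1: load
  if (xslSetting === "strip-space") nodes = nodes.filter((s) => !isWs(s)); // stage 2: XSLT
  // "preserve-space" keeps only what stage 1 left; it cannot restore removed nodes.
  return nodes.map((s) => (isWs(s) ? "-" : s)).join("");
}

for (const tool of Object.keys(TOOLS)) {
  for (const setting of ["none", "strip-space", "preserve-space"]) {
    console.log(`${tool}, ${setting}: ${transform(tool, setting)}`);
  }
}
```

In this model, Internet Explorer produces HelloWorld for every stylesheet setting. The XSLT Debugger produces HelloWorld only when <xsl:strip-space> is present.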

Sample Applications

This section provides samples that illustrate API whitespace handling behavior and how you can change it. Both managed applications using XslCompiledTransform and a native MSXML application using DOMDocument are provided.

All of the samples use the Book.xml and BookXslt.xslt files provided earlier in this document.

Example A: Managed application using XslCompiledTransform with XPathDocument

In this application, the input XML document (Book.xml) is provided to the Transform method using an XPathDocument. By default, XPathDocument strips whitespace. To preserve whitespace, this example passes XmlSpace.Preserve to the XPathDocument constructor.

    using System;
    using System.Xml;
    using System.Xml.Xsl;
    using System.Xml.XPath;
    using System.IO;
    using System.Diagnostics;
    namespace XmlCore.Test
    {
        class Invoker
        {
            [STAThread]
            static void Main(string[] args)
            {
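                string xmlFile = "Book.xml";
                string xsltFile = "BookXslt.xslt";
                // XmlSpace.Preserve keeps whitespace that XPathDocument would otherwise strip.
                XPathDocument xd = new XPathDocument(xmlFile, XmlSpace.Preserve);
                // true enables debugging of the stylesheet.
                XslCompiledTransform xslt = new XslCompiledTransform(true);
                xslt.Load(xsltFile, XsltSettings.TrustedXslt,
                         new XmlUrlResolver());
                // Writing to a TextWriter honors <xsl:output method="html"/>;
                // the using block flushes and closes the file before it is opened.
                using (StreamWriter writer = new StreamWriter("Result.htm"))
                {
                    xslt.Transform(xd, null, writer);
                }
                Process.Start("Result.htm");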
            }
        }
    }

[Visual Basic]
Imports System.Xml
Imports System.Xml.Xsl
Imports System.IO
Imports System.Xml.XPath
Imports System.Diagnostics
Module Module1

    Sub Main()
        Dim xmlFile As String = "Book.xml"
        Dim xsltFile As String = "BookXslt.xslt"

        ' XmlSpace.Preserve keeps whitespace that XPathDocument would otherwise strip.
        Dim xd As XPathDocument = New XPathDocument(xmlFile, XmlSpace.Preserve)
        Dim xslt As XslCompiledTransform = New XslCompiledTransform(True)
        xslt.Load(xsltFile, XsltSettings.TrustedXslt, New XmlUrlResolver())
        ' The Using block flushes and closes Result.htm before it is opened.
        Using writer As StreamWriter = New StreamWriter("Result.htm")
            xslt.Transform(xd, Nothing, writer)
        End Using
        Process.Start("Result.htm")

    End Sub

End Module

Note that:

  • The input XML and the XSLT documents do not specify explicit whitespace handling.

  • Because the application explicitly preserves whitespace, all whitespace is preserved in the output (Result B).

Example B: Managed application using XslCompiledTransform with XmlDocument

Instead of passing in an XPathDocument as shown in Example A, you can pass an XmlDocument to the Transform method. By default, XmlDocument strips whitespace. To preserve whitespace, set the PreserveWhitespace property to true before calling Load, as shown in the following code fragment.

    XmlDocument xd = new XmlDocument();
    xd.PreserveWhitespace = true;
    xd.Load(xmlFile);
    ...
    [Visual Basic]
    Dim xd As XmlDocument = New XmlDocument()
    xd.PreserveWhitespace = True
    xd.Load(xmlFile)
    ...

Note that:

  • The input XML and the XSLT documents do not specify explicit whitespace handling.

  • Because PreserveWhitespace is set to true, all whitespace is preserved in the output (Result B).

Example C: Managed application using XslCompiledTransform with XmlReader

Instead of passing in an XPathDocument as shown in Example A, you can pass an XmlReader, which preserves whitespace by default. To strip whitespace, set the IgnoreWhitespace property of the XmlReaderSettings object to true, as shown in the following code fragment.

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    XmlReader xd = XmlReader.Create(xmlFile, settings);
    [Visual Basic]
    Dim settings As XmlReaderSettings = New XmlReaderSettings()
    settings.IgnoreWhitespace = True
    Dim xd As XmlReader = XmlReader.Create(xmlFile, settings)

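The reader strips whitespace that is allowed to be stripped, except inside the first <sentence> element, which specifies xml:space="preserve". The output is therefore Result A.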

Example D: Managed application using XslCompiledTransform and passing the input XML file name to the Transform method

You can modify any of the examples above to pass the input XML file name as the first parameter to the Transform method. As discussed in the table earlier, whitespace is then preserved in the output (Result B).

Example E: Managed application using XslCompiledTransform and explicitly preserving whitespaces by adding the xml:space attribute in the input XML

If you add the xml:space attribute to the input XML (for example, <document xml:space="preserve">) and run any of Examples A through D, you will notice that whitespace is preserved regardless of the API settings in your code.

Example F: Native MSXML application using the preserveWhitespace property of DOMDocument to preserve whitespaces

This native XSL transformation application loads the input XML and XSLT documents by using DOMDocument. The transformNode method applies XSL transformation, producing the output.

By default, DOMDocument strips whitespace. To preserve whitespace, set the preserveWhiteSpace property on the DOMDocument object to true, as shown in the following JScript code:

var xml = new ActiveXObject("MSXML2.DOMDocument.6.0");
xml.async = false;
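// Set before load(): the setting applies while the document is parsed.
xml.preserveWhiteSpace = true;
if (!xml.load("Book.xml")) {
   WScript.Echo(xml.parseError.reason);
   WScript.Quit(1);
}
var xsl = new ActiveXObject("MSXML2.DOMDocument.6.0");
xsl.async = false;
if (!xsl.load("BookXslt.xslt")) {
   WScript.Echo(xsl.parseError.reason);
   WScript.Quit(1);
}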
try{
      var output = xml.transformNode(xsl);
      WScript.Echo(output);
}
catch(err)
{
      WScript.Echo("XSL TRANSFORM ERROR : " + err.description);
}

Acknowledgements

I would like to thank Avner Aharoni, Andrew Kimball, Swapna Guddanti, Umat Alev, and Chris Lovett for their quality and timely input in helping me understand whitespace handling in XSLT transformation applications. Their understanding of the subject matter provided me the insight needed to get my work done and write this paper.