How to: Remove Hidden Text from a Word 2007 Document by Using the Open XML API

This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This page may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

The Office Open XML Package specification defines a set of XML files that contain the content and define the relationships for all of the parts stored in a single package. These packages combine the parts that make up the document files for Microsoft® Office Excel® 2007, Microsoft Office PowerPoint® 2007, and Microsoft Office Word 2007. The Open XML object model allows you to create packages and manipulate the files inside them. This topic walks through the code and steps to remove hidden text from an Office Open XML package in Office Word 2007. The general steps of opening a package, editing the XML in a part, and saving the part are the same for each of the three 2007 Microsoft Office system programs that support the Office Open XML Format.

Note

The code samples in this topic are in Microsoft Visual Basic® .NET and Microsoft Visual C#®. You can use them in an add-in created in Microsoft Visual Studio® 2008. For more information about how to create an add-in in Visual Studio 2008, see Getting Started with the Open XML Format SDK 1.0.

Remove Hidden Text from the Document

In the following code, you remove the hidden text from a document part.

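Visual Basic

Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports DocumentFormat.OpenXml.Packaging

Public Sub WDDeleteHiddenText(ByVal docName As String)
   '  Given a document name, delete all the hidden text.
   Const wordmlNamespace As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
   Using wdDoc As WordprocessingDocument = WordprocessingDocument.Open(docName, True)
      '  Manage namespaces to perform XPath queries.
      Dim nt As NameTable = New NameTable
      Dim nsManager As XmlNamespaceManager = New XmlNamespaceManager(nt)
      nsManager.AddNamespace("w", wordmlNamespace)
      '  Get the document part from the package.
      '  Load the XML in the document part into an XmlDocument instance.
      Dim xdoc As XmlDocument = New XmlDocument(nt)
      Using inStream As Stream = wdDoc.MainDocumentPart.GetStream()
         xdoc.Load(inStream)
      End Using
      '  Copy the matches into a list before editing. The XmlNodeList that
      '  SelectNodes returns may give unexpected results once the document changes.
      Dim hiddenNodes As New List(Of XmlNode)
      For Each node As XmlNode In xdoc.SelectNodes("//w:vanish", nsManager)
         hiddenNodes.Add(node)
      Next
      For Each hiddenNode As XmlNode In hiddenNodes
         '  w:vanish sits inside w:rPr, so its grandparent is normally the hidden run.
         Dim topNode As XmlNode = hiddenNode.ParentNode.ParentNode
         Dim topParentNode As XmlNode = topNode.ParentNode
         '  Skip nodes whose container was already removed.
         If topParentNode Is Nothing Then Continue For
         topParentNode.RemoveChild(topNode)
         '  If the parent is now empty, remove it as well.
         If Not topParentNode.HasChildNodes AndAlso topParentNode.ParentNode IsNot Nothing Then
            topParentNode.ParentNode.RemoveChild(topParentNode)
         End If
      Next
      '  Save the document XML back to its document part.
      Using outStream As Stream = wdDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write)
         xdoc.Save(outStream)
      End Using
   End Using
End Sub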
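
C#

using System.Collections.Generic;
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

public static void WDDeleteHiddenText(string docName)
{
   //  Given a document name, delete all the hidden text.
   const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
   using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(docName, true))
   {
      //  Manage namespaces to perform XPath queries.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("w", wordmlNamespace);
      //  Get the document part from the package.
      //  Load the XML in the document part into an XmlDocument instance.
      XmlDocument xdoc = new XmlDocument(nt);
      using (Stream inStream = wdDoc.MainDocumentPart.GetStream())
      {
         xdoc.Load(inStream);
      }
      //  Copy the matches into a list before editing. The XmlNodeList that
      //  SelectNodes returns may give unexpected results once the document changes.
      List<XmlNode> hiddenNodes = new List<XmlNode>();
      foreach (XmlNode node in xdoc.SelectNodes("//w:vanish", nsManager))
      {
         hiddenNodes.Add(node);
      }
      foreach (XmlNode hiddenNode in hiddenNodes)
      {
         //  w:vanish sits inside w:rPr, so its grandparent is normally the hidden run.
         XmlNode topNode = hiddenNode.ParentNode.ParentNode;
         XmlNode topParentNode = topNode.ParentNode;
         //  Skip nodes whose container was already removed.
         if (topParentNode == null) continue;
         topParentNode.RemoveChild(topNode);
         //  If the parent is now empty, remove it as well.
         if (!topParentNode.HasChildNodes && topParentNode.ParentNode != null)
         {
            topParentNode.ParentNode.RemoveChild(topParentNode);
         }
      }

      //  Save the document XML back to its document part.
      using (Stream outStream = wdDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
      {
         xdoc.Save(outStream);
      }
   }
}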

To remove the hidden text from a document part

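  1. First, pass in a parameter that contains the path and file name of the source Word 2007 document.

  2. Then, open the document for editing as a WordprocessingDocument object.

  3. Next, load the XML of the main document part into an XmlDocument object, and use the XPath expression //w:vanish to select each element that marks text as hidden.

  4. For each match, remove the element two levels above it (normally the run that contains the hidden text). If that leaves the run's parent with no child nodes, remove the parent as well.

  5. Finally, save the modified XML back to the main document part.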
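
The XPath-based removal can be tried in isolation, without the Open XML SDK or a document on disk. The following C# listing is a minimal sketch under stated assumptions: the class name HiddenTextSketch, the helper RemoveHiddenRuns, and the sample markup are hypothetical and are not part of the SDK or of the original sample. It applies the same //w:vanish query to a small WordprocessingML fragment held in a string.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical illustration class; not part of the Open XML SDK.
public static class HiddenTextSketch
{
    private const string WordmlNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumed helper: removes hidden runs from WordprocessingML markup in a string.
    public static string RemoveHiddenRuns(string documentXml)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", WordmlNamespace);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(documentXml);

        // Snapshot the matches before changing the tree.
        List<XmlNode> hiddenNodes = new List<XmlNode>();
        foreach (XmlNode node in xdoc.SelectNodes("//w:vanish", nsManager))
        {
            hiddenNodes.Add(node);
        }

        foreach (XmlNode hiddenNode in hiddenNodes)
        {
            // w:vanish -> w:rPr -> w:r (the hidden run)
            XmlNode run = hiddenNode.ParentNode.ParentNode;
            XmlNode container = run.ParentNode;
            if (container == null) continue;   // already detached
            container.RemoveChild(run);

            // Remove the container (here, a paragraph) if it is now empty.
            if (!container.HasChildNodes && container.ParentNode != null)
            {
                container.ParentNode.RemoveChild(container);
            }
        }
        return xdoc.OuterXml;
    }

    public static void Main()
    {
        // Two paragraphs: one mixes visible and hidden runs, one is entirely hidden.
        string sample =
            "<w:document xmlns:w=\"" + WordmlNamespace + "\"><w:body>" +
            "<w:p><w:r><w:t>Visible</w:t></w:r>" +
            "<w:r><w:rPr><w:vanish/></w:rPr><w:t>Secret</w:t></w:r></w:p>" +
            "<w:p><w:r><w:rPr><w:vanish/></w:rPr><w:t>Hidden paragraph</w:t></w:r></w:p>" +
            "</w:body></w:document>";

        Console.WriteLine(RemoveHiddenRuns(sample));
    }
}
```

In this sketch, the run that contains "Visible" should remain, while the run that contains "Secret" and the entirely hidden second paragraph should be removed. Like the main sample, it treats any w:vanish element as hidden and does not inspect a w:val attribute.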