Editing Data in an Excel 2007 Open XML File with VBA

Explore different ways to find particular XML data in an Excel 2007 file, retrieve the data, and then write it back to the file all by using Office Open XML. (14 printed pages)

Jan Karel Pieterse, Excel MVP

Frank Rice, Microsoft Corporation

June 2009

Applies to: Microsoft Office Excel 2007

Contents

  • Introducing Office Open XML Files

  • Finding Data in Worksheets

  • Adding Text to Cells

  • Adding Numbers to Cells

  • Editing Elements in Open XML Files with VBA

  • Using Class Modules

  • Uncompressing.xlsx or .xlsm Files

  • Extracting XML from Files

  • Writing XML to Worksheets

  • Writing XML Back to Files

  • Getting XML for a Specific Worksheets

  • Retrieving Relationship Ids

  • Retrieving Part Names

  • Retrieving Friendly Worksheet Names

  • Conclusion

  • Additional Resources

Introducing Office Open XML Files

The Microsoft Office Open XML file structure makes it possible to edit and update Microsoft Office 2007 files without the need to open the file in the application that created it. The components parts of the Open XML file are contained in an industry standard Zip container. Inside of the container are XML files (document parts) that: define the relationship between each part in the package, contain the data displayed in the document or workbook, define the styles and formatting used in the document, and contain information on the properties and settings of package and its document parts. Because it is a Zip file, you can see the document parts and structure of the file by adding the .zip extension to the document file name and opening the package. Figure 1 is an example of the structure and components of a Microsoft Excel 2007 workbook.

Figure 1. The structure and contents of an Excel 2007 workbook

Contents of an Excel workbook

In this article, we demonstrate how to manually explore the structure and parts of an Excel 2007 workbook and add text and numbers to a cell. After that, we provide Microsoft Visual Basic for Applications (VBA) code that programmatically opens the Zip file, extracts the XML contents of a document part, writes the XML back to the document part, and then closes the Zip file.

In the following sections, we demonstrate how to find the document part that contains the data in a worksheet, and add text and a number to a cell in the worksheet. We use the sample Excel 2007 file available for download from the JKP Application Development Service Web site.

Finding Data in Worksheets

To get started, open the workbook in the sample. The workbook contains a number of worksheets as shown Figure 2.

Figure 2. The contents of the sample Excel 2007 file

Contents of the sample Excel file

For this exercise, suppose that you want to find data that resides in the Comments worksheet without opening the workbook in Excel. Figure 3 shows the contents of the Comments worksheet. This information will be used later.

Figure 3. The contents of the Comments worksheet

Contents of the Comments worksheet

The first thing you want to do is to find which XML file in the package contains the data in the Comments worksheet. To do this, open the workbook.xml in an XML editor or text editor such as NotePad. Figure 4 shows an excerpt of the document part’s XML.

Figure 4. An excerpt of the workbook part

Excerpt of the workbook part

Looking the <sheet> element for the Comment worksheet, you see that the r:id attribute, the relationship identifier, is equal to rId7. To find which worksheet part this relationship Id points to, open the workbook.xml.rels part in the xl\_rels folder. Figure 5 shows an excerpt from the file.

Figure 5. An excerpt of the workbook.xml.rels part

Excerpt of the workbook.xml.rels part

The highlighted statement in Figure 5 shows the relationship rId7 points to sheet7.xml in the workbook. Thus the Comments worksheet is represented by the sheet7.xml part in the workbook package. See Figure 3 for the content of the Comments worksheet. Next, open worksheet sheet7.xml in the editor. Figure 6 displays the XML for this part which shows cells A1 through A3.

Figure 6. The XML for the sheet7.xml part

XML for the sheet7.xml part

Now suppose that you are interested in the content of cell A1 (<c r="A1" t="s">). How do you find that content? The t="s" attribute says that the cell contains a string. The package contains a xl\sharedStrings.xml part where the strings in a workbook are stored in a table. Because several cells may share the same string such as a postal code or country name, instead of repeating the same string in each cell, it is more efficient to store the string once in a separate part. Then each cell that uses a particular string contains a number representing the indexed location of that string in the sharedStrings part.

So looking at the next line <v>28</v> in the sheet7.xml part tells you that the value for that cell is contained in the shared strings table at index 28. Opening the sharedString.xml part shows the following information at the top of the file.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="69" uniqueCount="46">

The uniqueCount attribute says that there are 46 unique strings in the workbook. Counting down to item 29, you see that the item contains the XML shown in Figure 7.

Figure 7. The contents of the sharedString.xml part

Contents of the sharedString.xml part

Adding Text to Cells

Now suppose that you want to add a new text entry, which is not in the shared strings table, into a row in the Comments worksheet.

  1. Update the string counts at the top of the sharedString.xml part. The updated XML should look like the following:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="70" uniqueCount="47">
    
  2. Insert the new text into the sharedString.xml part at position 29 (the index is zero-based) as shown in Figure 8.

    Figure 8. Insert this text to position 29 in the sharedString.xml part

    Text to insert

  3. Add an entry to the sheet7.xml part as shown in Figure 9.

    Figure 9. Insert this data into the sheet7.xml part

    Data to insert

  4. Remove the .zip file extension from the filename and open the file in Excel 2007. The new text is in the Comments worksheet.

Adding Numbers to Cells

Adding numbers to a cell is similar to adding text to a cell with two differences:

  • The numbers are stored in the worksheet part with no reference to the shared string table.

  • You can add number formatting.

For example, Figure 10 shows the XML in sheet7.xml that refers to the number 12 in cell A2 (see Figure 3) with the General number format and no other special formatting.

Figure 10. The number 12 in cell A2

The number 12 in cell A2

Cells with formatting (styles) are referred to by an index number into the Styles.xml part. For example, Figure 11 shows the entry in sheet7.xml for the data shown in cell A3 (see Figure 3).

Figure 11. The entry for cell A3 in sheet7.xml

The entry for cell A3

Figure 12 shows the entry for the index 34 in the styles.xml part. This entry points to numFmtId = 14 which is an implicit value for the date format mm-dd-yy as shown in the table in Figure 13. For more information about implicit style formats, see the Standard ECMA-376 Office Open XML Formats: 2nd Edition Part 2.

Figure 12. The indexed entry for index 34 in the styles.xml part

Indexed entry for index 34

Figure 13. Table of implicit style formats

Table of implicit style formats

As this section demonstrates, adding formatted cells to a worksheet is not straightforward. The best way to do this is to use an existing format style as a model.

Editing Elements in Open XML Files with VBA

Now that you have had a chance to go through the process of manual reading and updating an Office Open XML Spreadsheet file, it is time to see how to edit the contents of a file programmatically with VBA. You can find a downloadable version of the file used in this section at the JKP Application Development Service Web site

The VBA project is located in the file EditOpenXML.xlsm and consists of two modules and one class module as seen in Figure 14. The modDemo module contains the code that calls the method in the clsEditOpenXML class. The modUNC module contains code to work with UNC paths.

Figure 14. The EditOpenXML project structure

EditOpenXML project structure

The project also uses the Microsoft XML, version 3.0 library to assist in working with the XML parts. The project also takes advantage of the File Scripting Object and the Windows Shell application object. The class module performs the following actions:

  1. Uncompress an .xlsx or .xlsm file (in this sample formcontrols.xlsm)

  2. Extract XML from the Office Open XML part in the file formcontrols.xlsm

  3. Write the resulting XML to Sheet2 in the EditOpenXML.xlsm file

  4. Write the XML back to the part

  5. Compress the Zip file

These actions are discussed in the sections that follow.

Using Class Modules

To call the procedures in the following section, I have created a module named modDemo.

Public Sub Demo()
    Dim cEditOpenXML As clsEditOpenXML
    Dim sXML As String

    Set cEditOpenXML = New clsEditOpenXML

    With cEditOpenXML
        'Indicate which OpenXML file to process.
        .SourceFile = ThisWorkbook.Path & "\formcontrols.xlsm"

        'Before you can access info in the file, it must be unzipped.
        .UnzipFile

        'Indicate which sheet you want to change.
        .Sheet2Change = "MySheet"

        'Get XML from the sheet's xml file.
        sXML = .GetXMLFromFile(.SheetFileName)
        'Change the xml of the sheet here.

        'To prove we really imported the sheet, write the content of
        'the cells to Sheet2 of this file.
        WriteXML2Sheet sXML

        'Write the xml back to the sheet.
        '.WriteXML2File sXML, .SheetFileName

        'Rezip the unzipped package.
        .ZipAllFilesInFolder
    End With

    'After you let the class go out of scope, then remove the zip file's .zip extension
    'in the terminate event of the class.
    'Then the OpenXML file has its original filename back.
    Set cEditOpenXML = Nothing
End Sub

Uncompressing.xlsx or .xlsm Files

To work with the files in the Office Open XML package, the file (the file named formcontrols.xlsm) must be uncompressed into its components parts. The code that follows does that. As a precaution, a backup copy of the package file is created. Next, a folder is created to contain the uncompressed files. Then the files are copied to the target folder.

Public Sub UnzipFile()
    ' This code is courtesy Ron de Bruin www.rondebruin.nl.
    Dim FSO As Object
    Dim oShellApp As Object
    Set FSO = CreateObject("scripting.filesystemobject")
        
    ' Derive the folder to unzip to from the location of the source file.
    UnzipFolder = FolderName
    
    ' In the same folder, create a dedicated unzip folder as the source file,
    ' named ..\ Unzipped Filename\
    If Right(UnzipFolder, 1) <> "\" Then
        UnzipFolder = UnzipFolder & "\UnZipped " & FileName & "\"
    Else
        UnzipFolder = UnzipFolder & "UnZipped " & FileName & "\"
    End If
    On Error Resume Next
    ' Remove all previous existing folders.
    FSO.deletefolder UnzipFolder & "*", True
    Kill UnzipFolder & "*.*"
    On Error GoTo 0

    ' Create normal folder.
    If FolderExists(UnzipFolder) = False Then
        MkDir UnzipFolder
    End If

    Set oShellApp = CreateObject("Shell.Application")
    ' Copy the files in the newly created folder.
    oShellApp.Namespace(UnzipFolder).CopyHere oShellApp.Namespace(SourceFile).items

    On Error Resume Next
    ' Clean up temp folder.
    FSO.deletefolder Environ("Temp") & "\Temporary Directory*", True
    
    ' The files are now located in unzipped folder.
    XLFolder = UnzipFolder & "xl\"
    Set oShellApp = Nothing
    Set FSO = Nothing
    Exit Sub
End Sub

Extracting XML from Files

The following code extracts the XML from the package part and then writes the XML to Sheet1 in the file EditOpenXML.xlsm to let you see what XML was extracted.

Public Function GetXMLFromFile(sFileName As String) As String
    Dim oXMLDoc As MSXML2.DOMDocument
    If Len(XLFolder) = 0 Then
        GetXMLFromFile = ""
    Else
        Set oXMLDoc = New MSXML2.DOMDocument
        oXMLDoc.Load XLFolder & sFileName
        
        Dim nodeList As IXMLDOMNodeList
        Dim node As IXMLDOMNode
        Dim nodeValue As String
        Dim concatValues As String
        Dim currRow As Integer
        'Retrieve the node list by using an XPath query.
        Set nodeList = oXMLDoc.SelectNodes("/worksheet/sheetData/row/c")
        currRow = 1
        ' Iterate through the node list and write the values to Sheet1 in this workbook. 
        ' Then, concatenate the node value to a string.
        For Each node In nodeList
            nodeValue = node.nodeTypedValue
            Range("A1").Offset(currRow - 1, 0).Value = nodeValue
            concatValues = concatValues + vbCrLf + nodeValue
            
            currRow = currRow + 1
        Next
        ' Return the concatenated string.
        GetXMLFromFile = concatValues
        ' GetXMLFromFile = oXMLDoc.XML
        
        Set oXMLDoc = Nothing
    End If
End Function

Writing XML to Worksheets

After reading the XML from the part, write the results to a worksheet.

Public Sub WriteXML2Sheet(sXML As String)
    Dim oNode As MSXML2.IXMLDOMNode
    Dim oNodeList As MSXML2.IXMLDOMNodeList
    Dim oXMLDoc As MSXML2.DOMDocument
    Dim lRow As Long
    Set oXMLDoc = New MSXML2.DOMDocument
    oXMLDoc.loadXML sXML
    Set oNodeList = oXMLDoc.SelectNodes("/worksheet/sheetData/row/c")
    For Each oNode In oNodeList
        If oNode.FirstChild.nodeTypedValue = "" Then
            ' Value:
            Sheet2.Range(oNode.Attributes(0).nodeValue).Value = oNode.nodeTypedValue
        Else
            ' Formula:
            Sheet2.Range(oNode.Attributes(0).nodeValue).Value = oNode.FirstChild.nodeTypedValue
        End If
    Next
End Sub

Writing XML Back to Files

After whatever processing that you want to do, write the updated XML back to the original file.

Public Sub WriteXML2File(sXML As String, sFileName As String)
    Dim oXMLDoc As MSXML2.DOMDocument
    Set oXMLDoc = New MSXML2.DOMDocument
    oXMLDoc.loadXML sXML
    oXMLDoc.Save XLFolder & sFileName
End Sub

Compressing Zip Files

And finally, compress the file back to the Zip package.

Public Sub ZipAllFilesInFolder()
    ' This code sample is courtesy Ron le Bruin www.rondebruin.nl
    Dim oShellApp As Object
    Dim sDate As String
    Dim sDefPath As String
    Dim vFileNameZip As Variant
    Dim FSO As Object
    Dim lFileCt As Long
    Set FSO = CreateObject("scripting.filesystemobject")
    
    ' To ensure a unique filename,
    ' append date and time to the name of the current file.
    sDate = Format(Now, " dd-mmm-yy h-mm-ss")
    vFileNameZip = SourceFile & sDate & ".zip"
    
    'Create the empty Zip file.
    NewZip vFileNameZip
 
    Set oShellApp = CreateObject("Shell.Application")
    
    ' Count how many items are in the original folder.
    lFileCt = oShellApp.Namespace(FolderName & "Unzipped " & FileName & "\").items.Count
    
    ' Copy the files to the compressed folder.
    oShellApp.Namespace(vFileNameZip).CopyHere oShellApp.Namespace(FolderName & "Unzipped " & FileName & "\").items
 
    ' Keep script waiting until there are the same number of files in the new folder.
    On Error Resume Next
    Do Until oShellApp.Namespace(vFileNameZip).items.Count = lFileCt
        Application.Wait (Now + TimeValue("0:00:01"))
    Loop
    DoEvents
    
    ' Remove the original file.
    Kill SourceFile
    
    ' Rename new Zipped file to same name as original file (with .zip appended).
    Name vFileNameZip As SourceFile
    On Error Resume Next
    
    ' Now remove the old folder.
    FSO.deletefolder FolderName & "Unzipped " & FileName, True
    On Error GoTo 0

    Set oShellApp = Nothing
End Sub

Getting XML for a Specific Worksheets

When it comes to working with Excel 2007 and its Office Open XML files, the most common operation is likely changing the XML for a particular worksheet. One procedure shown in this section extracts the rId (relationship identifier) attribute value from a list of nodes representing the worksheet in the package. Another procedure retrieves the sheet for that relationship identifier from the package. The last procedure retrieves the friendly name of the worksheet with that relationship identifier.

Retrieving Relationship Ids

The following code retrieves the rId attribute value for a particular sheet from the workbook.xml file.

Private Function GetSheetIdFromSheetName(sSheetName) As String
    Dim oXMLDoc As MSXML2.DOMDocument
    Dim oXMLNode As MSXML2.IXMLDOMNode
    Dim oXMLNodeList As MSXML2.IXMLDOMNodeList
    If mvXLFolder <> "" And Sheet2Change <> "" Then
        Set oXMLDoc = New MSXML2.DOMDocument
        oXMLDoc.Load XLFolder & "workbook.xml"
        Set oXMLNodeList = oXMLDoc.SelectNodes("/workbook/sheets/sheet")
        For Each oXMLNode In oXMLNodeList
            If oXMLNode.Attributes.getNamedItem("name").nodeValue = sSheetName Then
                GetSheetIdFromSheetName = oXMLNode.Attributes.getNamedItem("r:id").nodeValue
                Exit Function
            End If
        Next
    End If
End Function

Retrieving Part Names

The following procedure returns the package part name that matches the relationship from the previous procedure.

Public Function GetSheetFileNameFromId(sSheetId As String) As String
    Dim oXMLDoc As MSXML2.DOMDocument
    Dim oXMLNode As MSXML2.IXMLDOMNode
    Dim oXMLNodeList As MSXML2.IXMLDOMNodeList
    If mvXLFolder <> "" And Sheet2Change <> "" Then
        Set oXMLDoc = New MSXML2.DOMDocument
        oXMLDoc.Load XLFolder & "_rels\workbook.xml.rels"
        Set oXMLNodeList = oXMLDoc.SelectNodes("/Relationships/Relationship")
        For Each oXMLNode In oXMLNodeList
            If oXMLNode.Attributes.getNamedItem("Id").nodeValue = sSheetId Then
                GetSheetFileNameFromId = oXMLNode.Attributes.getNamedItem("Target").nodeValue
                Exit Function
            End If
        Next
    End If
End Function

Retrieving Friendly Worksheet Names

The following procedure retrieves the friendly name for the worksheet identified by the rId attribute.

Public Function GetSheetNameFromId(sId As String) As String
    Dim oXMLDoc As MSXML2.DOMDocument
    Dim oXMLNode As MSXML2.IXMLDOMNode
    Dim oXMLNodeList As MSXML2.IXMLDOMNodeList
    If mvXLFolder <> "" Then
        Set oXMLDoc = New MSXML2.DOMDocument
        oXMLDoc.Load XLFolder & "workbook.xml"
        Set oXMLNodeList = oXMLDoc.SelectNodes("/workbook/sheets/sheet")
        For Each oXMLNode In oXMLNodeList
            If oXMLNode.Attributes.getNamedItem("r:id").nodeValue = "rId" & Val(sId) + 1 Then
                GetSheetNameFromId = oXMLNode.Attributes.getNamedItem("name").nodeValue
                ' After you get the name, exit the function.
                Exit Function
            End If
        Next
    End If
End Function

Conclusion

The techniques demonstrated in this article show you different ways to find, extract, and write back XML data to a file. Working with Office Open XML files requires knowledge of the structure and content of the package and of the document parts. We encourage you to use the techniques explained here to further explore working with these types of files.

Additional Resources

You can find more information about the topics discussed in this article at the following locations: