Editing Data in an Excel 2007 Open XML File with VBA
Explore different ways to find particular XML data in an Excel 2007 file, retrieve the data, and then write it back to the file all by using Office Open XML. (14 printed pages)
Jan Karel Pieterse, Excel MVP
Frank Rice, Microsoft Corporation
June 2009
Applies to: Microsoft Office Excel 2007
Contents
Introducing Office Open XML Files
Finding Data in Worksheets
Adding Text to Cells
Adding Numbers to Cells
Editing Elements in Open XML Files with VBA
Using Class Modules
Uncompressing.xlsx or .xlsm Files
Extracting XML from Files
Writing XML to Worksheets
Writing XML Back to Files
Getting XML for a Specific Worksheets
Retrieving Relationship Ids
Retrieving Part Names
Retrieving Friendly Worksheet Names
Conclusion
Additional Resources
Introducing Office Open XML Files
The Microsoft Office Open XML file structure makes it possible to edit and update Microsoft Office 2007 files without the need to open the file in the application that created it. The components parts of the Open XML file are contained in an industry standard Zip container. Inside of the container are XML files (document parts) that: define the relationship between each part in the package, contain the data displayed in the document or workbook, define the styles and formatting used in the document, and contain information on the properties and settings of package and its document parts. Because it is a Zip file, you can see the document parts and structure of the file by adding the .zip extension to the document file name and opening the package. Figure 1 is an example of the structure and components of a Microsoft Excel 2007 workbook.
Figure 1. The structure and contents of an Excel 2007 workbook
In this article, we demonstrate how to manually explore the structure and parts of an Excel 2007 workbook and add text and numbers to a cell. After that, we provide Microsoft Visual Basic for Applications (VBA) code that programmatically opens the Zip file, extracts the XML contents of a document part, writes the XML back to the document part, and then closes the Zip file.
In the following sections, we demonstrate how to find the document part that contains the data in a worksheet, and add text and a number to a cell in the worksheet. We use the sample Excel 2007 file available for download from the JKP Application Development Service Web site.
Finding Data in Worksheets
To get started, open the workbook in the sample. The workbook contains a number of worksheets as shown Figure 2.
Figure 2. The contents of the sample Excel 2007 file
For this exercise, suppose that you want to find data that resides in the Comments worksheet without opening the workbook in Excel. Figure 3 shows the contents of the Comments worksheet. This information will be used later.
Figure 3. The contents of the Comments worksheet
The first thing you want to do is to find which XML file in the package contains the data in the Comments worksheet. To do this, open the workbook.xml in an XML editor or text editor such as NotePad. Figure 4 shows an excerpt of the document part’s XML.
Figure 4. An excerpt of the workbook part
Looking the <sheet> element for the Comment worksheet, you see that the r:id attribute, the relationship identifier, is equal to rId7. To find which worksheet part this relationship Id points to, open the workbook.xml.rels part in the xl\_rels folder. Figure 5 shows an excerpt from the file.
Figure 5. An excerpt of the workbook.xml.rels part
The highlighted statement in Figure 5 shows the relationship rId7 points to sheet7.xml in the workbook. Thus the Comments worksheet is represented by the sheet7.xml part in the workbook package. See Figure 3 for the content of the Comments worksheet. Next, open worksheet sheet7.xml in the editor. Figure 6 displays the XML for this part which shows cells A1 through A3.
Figure 6. The XML for the sheet7.xml part
Now suppose that you are interested in the content of cell A1 (<c r="A1" t="s">). How do you find that content? The t="s" attribute says that the cell contains a string. The package contains a xl\sharedStrings.xml part where the strings in a workbook are stored in a table. Because several cells may share the same string such as a postal code or country name, instead of repeating the same string in each cell, it is more efficient to store the string once in a separate part. Then each cell that uses a particular string contains a number representing the indexed location of that string in the sharedStrings part.
So looking at the next line <v>28</v> in the sheet7.xml part tells you that the value for that cell is contained in the shared strings table at index 28. Opening the sharedString.xml part shows the following information at the top of the file.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="69" uniqueCount="46">
The uniqueCount attribute says that there are 46 unique strings in the workbook. Counting down to item 29, you see that the item contains the XML shown in Figure 7.
Figure 7. The contents of the sharedString.xml part
Adding Text to Cells
Now suppose that you want to add a new text entry, which is not in the shared strings table, into a row in the Comments worksheet.
Update the string counts at the top of the sharedString.xml part. The updated XML should look like the following:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="70" uniqueCount="47">
Insert the new text into the sharedString.xml part at position 29 (the index is zero-based) as shown in Figure 8.
Figure 8. Insert this text to position 29 in the sharedString.xml part
Add an entry to the sheet7.xml part as shown in Figure 9.
Figure 9. Insert this data into the sheet7.xml part
Remove the .zip file extension from the filename and open the file in Excel 2007. The new text is in the Comments worksheet.
Adding Numbers to Cells
Adding numbers to a cell is similar to adding text to a cell with two differences:
The numbers are stored in the worksheet part with no reference to the shared string table.
You can add number formatting.
For example, Figure 10 shows the XML in sheet7.xml that refers to the number 12 in cell A2 (see Figure 3) with the General number format and no other special formatting.
Figure 10. The number 12 in cell A2
Cells with formatting (styles) are referred to by an index number into the Styles.xml part. For example, Figure 11 shows the entry in sheet7.xml for the data shown in cell A3 (see Figure 3).
Figure 11. The entry for cell A3 in sheet7.xml
Figure 12 shows the entry for the index 34 in the styles.xml part. This entry points to numFmtId = 14 which is an implicit value for the date format mm-dd-yy as shown in the table in Figure 13. For more information about implicit style formats, see the Standard ECMA-376 Office Open XML Formats: 2nd Edition Part 2.
Figure 12. The indexed entry for index 34 in the styles.xml part
Figure 13. Table of implicit style formats
As this section demonstrates, adding formatted cells to a worksheet is not straightforward. The best way to do this is to use an existing format style as a model.
Editing Elements in Open XML Files with VBA
Now that you have had a chance to go through the process of manual reading and updating an Office Open XML Spreadsheet file, it is time to see how to edit the contents of a file programmatically with VBA. You can find a downloadable version of the file used in this section at the JKP Application Development Service Web site
The VBA project is located in the file EditOpenXML.xlsm and consists of two modules and one class module as seen in Figure 14. The modDemo module contains the code that calls the method in the clsEditOpenXML class. The modUNC module contains code to work with UNC paths.
Figure 14. The EditOpenXML project structure
The project also uses the Microsoft XML, version 3.0 library to assist in working with the XML parts. The project also takes advantage of the File Scripting Object and the Windows Shell application object. The class module performs the following actions:
Uncompress an .xlsx or .xlsm file (in this sample formcontrols.xlsm)
Extract XML from the Office Open XML part in the file formcontrols.xlsm
Write the resulting XML to Sheet2 in the EditOpenXML.xlsm file
Write the XML back to the part
Compress the Zip file
These actions are discussed in the sections that follow.
Using Class Modules
To call the procedures in the following section, I have created a module named modDemo.
Public Sub Demo()
Dim cEditOpenXML As clsEditOpenXML
Dim sXML As String
Set cEditOpenXML = New clsEditOpenXML
With cEditOpenXML
'Indicate which OpenXML file to process.
.SourceFile = ThisWorkbook.Path & "\formcontrols.xlsm"
'Before you can access info in the file, it must be unzipped.
.UnzipFile
'Indicate which sheet you want to change.
.Sheet2Change = "MySheet"
'Get XML from the sheet's xml file.
sXML = .GetXMLFromFile(.SheetFileName)
'Change the xml of the sheet here.
'To prove we really imported the sheet, write the content of
'the cells to Sheet2 of this file.
WriteXML2Sheet sXML
'Write the xml back to the sheet.
'.WriteXML2File sXML, .SheetFileName
'Rezip the unzipped package.
.ZipAllFilesInFolder
End With
'After you let the class go out of scope, then remove the zip file's .zip extension
'in the terminate event of the class.
'Then the OpenXML file has its original filename back.
Set cEditOpenXML = Nothing
End Sub
Uncompressing.xlsx or .xlsm Files
To work with the files in the Office Open XML package, the file (the file named formcontrols.xlsm) must be uncompressed into its components parts. The code that follows does that. As a precaution, a backup copy of the package file is created. Next, a folder is created to contain the uncompressed files. Then the files are copied to the target folder.
Public Sub UnzipFile()
' This code is courtesy Ron de Bruin www.rondebruin.nl.
Dim FSO As Object
Dim oShellApp As Object
Set FSO = CreateObject("scripting.filesystemobject")
' Derive the folder to unzip to from the location of the source file.
UnzipFolder = FolderName
' In the same folder, create a dedicated unzip folder as the source file,
' named ..\ Unzipped Filename\
If Right(UnzipFolder, 1) <> "\" Then
UnzipFolder = UnzipFolder & "\UnZipped " & FileName & "\"
Else
UnzipFolder = UnzipFolder & "UnZipped " & FileName & "\"
End If
On Error Resume Next
' Remove all previous existing folders.
FSO.deletefolder UnzipFolder & "*", True
Kill UnzipFolder & "*.*"
On Error GoTo 0
' Create normal folder.
If FolderExists(UnzipFolder) = False Then
MkDir UnzipFolder
End If
Set oShellApp = CreateObject("Shell.Application")
' Copy the files in the newly created folder.
oShellApp.Namespace(UnzipFolder).CopyHere oShellApp.Namespace(SourceFile).items
On Error Resume Next
' Clean up temp folder.
FSO.deletefolder Environ("Temp") & "\Temporary Directory*", True
' The files are now located in unzipped folder.
XLFolder = UnzipFolder & "xl\"
Set oShellApp = Nothing
Set FSO = Nothing
Exit Sub
End Sub
Extracting XML from Files
The following code extracts the XML from the package part and then writes the XML to Sheet1 in the file EditOpenXML.xlsm to let you see what XML was extracted.
Public Function GetXMLFromFile(sFileName As String) As String
Dim oXMLDoc As MSXML2.DOMDocument
If Len(XLFolder) = 0 Then
GetXMLFromFile = ""
Else
Set oXMLDoc = New MSXML2.DOMDocument
oXMLDoc.Load XLFolder & sFileName
Dim nodeList As IXMLDOMNodeList
Dim node As IXMLDOMNode
Dim nodeValue As String
Dim concatValues As String
Dim currRow As Integer
'Retrieve the node list by using an XPath query.
Set nodeList = oXMLDoc.SelectNodes("/worksheet/sheetData/row/c")
currRow = 1
' Iterate through the node list and write the values to Sheet1 in this workbook.
' Then, concatenate the node value to a string.
For Each node In nodeList
nodeValue = node.nodeTypedValue
Range("A1").Offset(currRow - 1, 0).Value = nodeValue
concatValues = concatValues + vbCrLf + nodeValue
currRow = currRow + 1
Next
' Return the concatenated string.
GetXMLFromFile = concatValues
' GetXMLFromFile = oXMLDoc.XML
Set oXMLDoc = Nothing
End If
End Function
Writing XML to Worksheets
After reading the XML from the part, write the results to a worksheet.
Public Sub WriteXML2Sheet(sXML As String)
Dim oNode As MSXML2.IXMLDOMNode
Dim oNodeList As MSXML2.IXMLDOMNodeList
Dim oXMLDoc As MSXML2.DOMDocument
Dim lRow As Long
Set oXMLDoc = New MSXML2.DOMDocument
oXMLDoc.loadXML sXML
Set oNodeList = oXMLDoc.SelectNodes("/worksheet/sheetData/row/c")
For Each oNode In oNodeList
If oNode.FirstChild.nodeTypedValue = "" Then
' Value:
Sheet2.Range(oNode.Attributes(0).nodeValue).Value = oNode.nodeTypedValue
Else
' Formula:
Sheet2.Range(oNode.Attributes(0).nodeValue).Value = oNode.FirstChild.nodeTypedValue
End If
Next
End Sub
Writing XML Back to Files
After whatever processing that you want to do, write the updated XML back to the original file.
Public Sub WriteXML2File(sXML As String, sFileName As String)
Dim oXMLDoc As MSXML2.DOMDocument
Set oXMLDoc = New MSXML2.DOMDocument
oXMLDoc.loadXML sXML
oXMLDoc.Save XLFolder & sFileName
End Sub
Compressing Zip Files
And finally, compress the file back to the Zip package.
Public Sub ZipAllFilesInFolder()
' This code sample is courtesy Ron le Bruin www.rondebruin.nl
Dim oShellApp As Object
Dim sDate As String
Dim sDefPath As String
Dim vFileNameZip As Variant
Dim FSO As Object
Dim lFileCt As Long
Set FSO = CreateObject("scripting.filesystemobject")
' To ensure a unique filename,
' append date and time to the name of the current file.
sDate = Format(Now, " dd-mmm-yy h-mm-ss")
vFileNameZip = SourceFile & sDate & ".zip"
'Create the empty Zip file.
NewZip vFileNameZip
Set oShellApp = CreateObject("Shell.Application")
' Count how many items are in the original folder.
lFileCt = oShellApp.Namespace(FolderName & "Unzipped " & FileName & "\").items.Count
' Copy the files to the compressed folder.
oShellApp.Namespace(vFileNameZip).CopyHere oShellApp.Namespace(FolderName & "Unzipped " & FileName & "\").items
' Keep script waiting until there are the same number of files in the new folder.
On Error Resume Next
Do Until oShellApp.Namespace(vFileNameZip).items.Count = lFileCt
Application.Wait (Now + TimeValue("0:00:01"))
Loop
DoEvents
' Remove the original file.
Kill SourceFile
' Rename new Zipped file to same name as original file (with .zip appended).
Name vFileNameZip As SourceFile
On Error Resume Next
' Now remove the old folder.
FSO.deletefolder FolderName & "Unzipped " & FileName, True
On Error GoTo 0
Set oShellApp = Nothing
End Sub
Getting XML for a Specific Worksheets
When it comes to working with Excel 2007 and its Office Open XML files, the most common operation is likely changing the XML for a particular worksheet. One procedure shown in this section extracts the rId (relationship identifier) attribute value from a list of nodes representing the worksheet in the package. Another procedure retrieves the sheet for that relationship identifier from the package. The last procedure retrieves the friendly name of the worksheet with that relationship identifier.
Retrieving Relationship Ids
The following code retrieves the rId attribute value for a particular sheet from the workbook.xml file.
Private Function GetSheetIdFromSheetName(sSheetName) As String
Dim oXMLDoc As MSXML2.DOMDocument
Dim oXMLNode As MSXML2.IXMLDOMNode
Dim oXMLNodeList As MSXML2.IXMLDOMNodeList
If mvXLFolder <> "" And Sheet2Change <> "" Then
Set oXMLDoc = New MSXML2.DOMDocument
oXMLDoc.Load XLFolder & "workbook.xml"
Set oXMLNodeList = oXMLDoc.SelectNodes("/workbook/sheets/sheet")
For Each oXMLNode In oXMLNodeList
If oXMLNode.Attributes.getNamedItem("name").nodeValue = sSheetName Then
GetSheetIdFromSheetName = oXMLNode.Attributes.getNamedItem("r:id").nodeValue
Exit Function
End If
Next
End If
End Function
Retrieving Part Names
The following procedure returns the package part name that matches the relationship from the previous procedure.
Public Function GetSheetFileNameFromId(sSheetId As String) As String
Dim oXMLDoc As MSXML2.DOMDocument
Dim oXMLNode As MSXML2.IXMLDOMNode
Dim oXMLNodeList As MSXML2.IXMLDOMNodeList
If mvXLFolder <> "" And Sheet2Change <> "" Then
Set oXMLDoc = New MSXML2.DOMDocument
oXMLDoc.Load XLFolder & "_rels\workbook.xml.rels"
Set oXMLNodeList = oXMLDoc.SelectNodes("/Relationships/Relationship")
For Each oXMLNode In oXMLNodeList
If oXMLNode.Attributes.getNamedItem("Id").nodeValue = sSheetId Then
GetSheetFileNameFromId = oXMLNode.Attributes.getNamedItem("Target").nodeValue
Exit Function
End If
Next
End If
End Function
Retrieving Friendly Worksheet Names
The following procedure retrieves the friendly name for the worksheet identified by the rId attribute.
Public Function GetSheetNameFromId(sId As String) As String
Dim oXMLDoc As MSXML2.DOMDocument
Dim oXMLNode As MSXML2.IXMLDOMNode
Dim oXMLNodeList As MSXML2.IXMLDOMNodeList
If mvXLFolder <> "" Then
Set oXMLDoc = New MSXML2.DOMDocument
oXMLDoc.Load XLFolder & "workbook.xml"
Set oXMLNodeList = oXMLDoc.SelectNodes("/workbook/sheets/sheet")
For Each oXMLNode In oXMLNodeList
If oXMLNode.Attributes.getNamedItem("r:id").nodeValue = "rId" & Val(sId) + 1 Then
GetSheetNameFromId = oXMLNode.Attributes.getNamedItem("name").nodeValue
' After you get the name, exit the function.
Exit Function
End If
Next
End If
End Function
Conclusion
The techniques demonstrated in this article show you different ways to find, extract, and write back XML data to a file. Working with Office Open XML files requires knowledge of the structure and content of the package and of the document parts. We encourage you to use the techniques explained here to further explore working with these types of files.
Additional Resources
You can find more information about the topics discussed in this article at the following locations: