Set Word Document Properties Programmatically
Code download available at: AdvancedBasics0603.exe (131 KB)
At the beginning of another lovely day of writing courseware in mad pursuit of unrealistic deadlines, I received a frantic call from a business partner. He was at the end of a long consulting project and had several hundred Microsoft® Word documents, all of which required their document properties to be set identically, except the Title property of the document, which was to be based on the document file name, minus the .doc extension. He didn't have time to set all the properties manually. Could I write a little utility to do the work for him?
Of course, I suspected that he could probably set the properties manually in less time than it would take me to write a utility. But part of me was rejoicing: article fodder! So here we are.
In case you didn't know, Microsoft Word documents (and many other Windows® documents, including each of the Microsoft Office document types) maintain a set of properties that you can view in several ways. Within Windows Explorer, you can right-click a document and select Properties from the context menu. Figure 1 shows the resulting dialog box.
Figure 1 View Document Properties
From within Word, you can select File | Properties to view the same information. You could, of course, simply open each document in turn and change the settings manually, but that would be tedious. A better solution is to iterate through a selected folder (and, optionally, its subfolders) and update the properties programmatically for each file you find. That's the solution I chose, and the one I'll describe here. (For more information about the built-in document properties, see the BuiltInDocumentProperties Property topic in the Word documentation.)
Word (along with Microsoft Excel® and Microsoft PowerPoint®) provides a repository for common document information, including Title, Subject, Author, and more. (Actually, it's the Office core functionality that makes all this possible, but each application exposes the capability through its own object model.) In addition, each Office application allows you to add custom properties, but custom properties weren't on my agenda, so my application doesn't deal with them. You could add support for modifying custom document properties yourself; think of it as a homework assignment.
Figure 2 Users Can Iterate Through Folders
In satisfying the needs of my business partner, I used Visual Studio® 2005 to create the application shown in Figure 2. This application satisfies a number of project requirements:
- The user must be able to select a folder in which to search, with the option of searching in subfolders of the selected folder.
- The user must be able to specify text for any of the visible document properties or to leave the property blank to clear any existing value.
- The application must allow the user to leave selected properties unchanged.
- The application must allow the user to cancel the operation, preventing further documents from being modified but not reverting changes already made.
- The application must allow the user to supply a fixed title for each document or, alternatively, to have the application use the document's file name as the title.
As with any weekend project, there's a lot of room for improvement, and you may find it instructive to add the features I purposely left out. For instance, error handling is rudimentary, and you might want to report error conditions more thoroughly. It would also be helpful to report on modified files, for example by displaying a list of the files that were successfully updated.
In its current state, the application supports only Word. It would be useful, and not terribly difficult, to modify it so that it supports other Office applications as well. Plus, it supports only a subset of the built-in document properties. You might decide to add support for more built-in or user-defined properties.
Although the sample application isn't terribly complicated, it does demonstrate several interesting technologies, including COM interoperability to automate Word, My.Settings to store user selections, object data binding, and the BackgroundWorker component. In the following sections I'll discuss some of the issues involved in creating the application and walk through the interesting code. The complete sample is available in this month's download.
Creating the Settings
Because the application might be run several times, for different groups of documents, it seemed useful to me that the application maintain choices across sessions. It was a perfect occasion to take advantage of application settings! I created a setting corresponding to each selection within the form—each textbox, checkbox, and option has a corresponding setting. To create the settings, I right-clicked on the project within the Solution Explorer window, selected Properties from the context menu, and selected the Settings tab. Figure 3 shows the settings I created for the application.
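Behind the Settings tab, Visual Studio generates the My.MySettings class for you, deriving it from System.Configuration.ApplicationSettingsBase. The following is a minimal, hand-written approximation of that kind of class, included only to show how a setting maps to a property; the class name DocPropertySettings and the setting names Folder and UseTitle are hypothetical examples, not the sample's exact settings.

```vb
Imports System.Configuration

' Hypothetical, hand-written approximation of a designer-generated settings
' class. You never write My.MySettings yourself; Visual Studio generates it.
Public NotInheritable Class DocPropertySettings
    Inherits ApplicationSettingsBase

    ' A user-scoped String setting with an empty default value.
    <UserScopedSetting(), DefaultSettingValue("")> _
    Public Property Folder() As String
        Get
            Return CStr(Me("Folder"))
        End Get
        Set(ByVal value As String)
            Me("Folder") = value
        End Set
    End Property

    ' A user-scoped Boolean setting; the default value is stored as text.
    <UserScopedSetting(), DefaultSettingValue("True")> _
    Public Property UseTitle() As Boolean
        Get
            Return CBool(Me("UseTitle"))
        End Get
        Set(ByVal value As Boolean)
            Me("UseTitle") = value
        End Set
    End Property
End Class
```

Because the settings are user-scoped, they can be changed at run time; calling Save on the instance persists changed values to the user's configuration file, which is what My.Settings.Save() does in the sample.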
Figure 3 Settings for Every User Option
Next, I wanted to bind controls on the form directly to these settings so I wouldn't have to write code to move values into and out of the controls myself. Visual Studio 2005 makes this easy: I simply added a new data source to the project, using the Data | Add New Data Source menu. Choose to add an Object data source, and then select the My.MySettings class (which Visual Studio creates for you once you've added your settings) as the data source. The settings then appear in the Data Sources window (see Figure 4). For another discussion of using and binding to settings, see my December 2005 column (Advanced Basics: What's My IP Address?).
Figure 4 MySettings
Creating the Form
Once you've created the data source corresponding to the MySettings class, you can drag any setting from the Data Sources window onto a form. Doing this creates a BindingSource object named MySettingsBindingSource on the form, which provides the connection between the settings and the form. I bound each checkbox and textbox control on the form to an individual setting in this manner.
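Dragging a setting onto the form makes the designer write binding code into the form for you. A hand-written equivalent looks roughly like the following sketch; DemoSettings stands in for My.MySettings, and BindTitle and the Title setting are hypothetical names used only for illustration.

```vb
Imports System.Windows.Forms

' Hypothetical stand-in for My.MySettings, with a single Title setting.
Public Class DemoSettings
    Private titleValue As String = "Untitled"

    Public Property Title() As String
        Get
            Return titleValue
        End Get
        Set(ByVal value As String)
            titleValue = value
        End Set
    End Property
End Class

Public Module BindingSketch
    ' Roughly what the designer does when you drag a setting onto a form.
    Public Function BindTitle(ByVal settings As DemoSettings, _
                              ByVal titleTextBox As TextBox) As BindingSource
        ' The BindingSource sits between the settings object and the controls.
        Dim source As New BindingSource()
        source.DataSource = settings
        ' Bind TextBox.Text to the Title property (formatting enabled).
        titleTextBox.DataBindings.Add(New Binding("Text", source, "Title", True))
        Return source
    End Function
End Module
```

Routing every binding through one BindingSource means the form has a single place to change if the data source changes, which is why the sample sets MySettingsBindingSource.DataSource in its Load event handler.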
The form includes a FolderBrowserDialog control (fbd, in the code), and clicking the Select Input Folder button runs the following code:
    ' Select the folder containing the document files to be modified.
    fbd.ShowNewFolderButton = False
    fbd.SelectedPath = folderLabel.Text
    If fbd.ShowDialog() = Windows.Forms.DialogResult.OK Then
        folderLabel.Text = fbd.SelectedPath
        My.Settings.Folder = fbd.SelectedPath
    End If
Note that binding the folder label directly to the Folder setting didn't work, because the code modifies the label's text itself. Instead, once the user selects a path, the code sets the My.Settings.Folder value directly.
The settings for the form also include Size and Location values, so that the application remembers where the user moved the form and how the user sized it. The Form.FormClosing event handler runs the following code, which saves the current information:
    My.Settings.Size = Me.Size
    My.Settings.Location = Me.Location
    My.Settings.Save()
The Form.Load event handler runs the following code, which initializes the data binding and restores the form's size and location:
    MySettingsBindingSource.DataSource = My.Settings
    ' Binding the form's properties to these settings
    ' doesn't work as you might expect.
    If Not My.Settings.Location.IsEmpty Then
        Me.Location = My.Settings.Location
    End If
    If Not My.Settings.Size.IsEmpty Then
        Me.Size = My.Settings.Size
    End If
To allow the user to opt out of overwriting property values, the form includes checkboxes along its left edge. The code overwrites the value of each property whose checkbox is selected and simply skips the others. The Form.Load event handler binds each checkbox's Checked property to the corresponding textbox's Visible property, so that if the user clears a checkbox, the corresponding textbox disappears:
    ' Set up control property bindings:
    titleTextBox.DataBindings.Add("Visible", _
        useTitleCheckbox, "Checked")
    subjectTextBox.DataBindings.Add( _
        "Visible", useSubjectCheckBox, "Checked")
    ' and so on, for each pair of controls...
(The form could also have handled the CheckedChanged event for each checkbox, but I like making this type of interaction declarative, using the DataBindings collection of each control in the Load event handler.)
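For comparison, the event-driven alternative mentioned above might look like the following minimal sketch. It uses one shared CheckedChanged handler plus a lookup table instead of a separate handler per checkbox; the class name CheckBoxVisibilityLinker and its Link method are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

' Hypothetical alternative to the DataBindings approach: one shared
' CheckedChanged handler keeps each textbox's visibility in sync.
Public Class CheckBoxVisibilityLinker
    Private ReadOnly pairs As New Dictionary(Of CheckBox, TextBox)()

    ' Register a checkbox/textbox pair and sync the initial state.
    Public Sub Link(ByVal chk As CheckBox, ByVal txt As TextBox)
        pairs(chk) = txt
        txt.Visible = chk.Checked
        AddHandler chk.CheckedChanged, AddressOf OnCheckedChanged
    End Sub

    ' Show or hide the textbox paired with whichever checkbox changed.
    Private Sub OnCheckedChanged(ByVal sender As Object, ByVal e As EventArgs)
        Dim chk As CheckBox = DirectCast(sender, CheckBox)
        pairs(chk).Visible = chk.Checked
    End Sub
End Class
```

In the Load event handler you would call, for example, linker.Link(useTitleCheckbox, titleTextBox) once per pair. The DataBindings version states the same relationship declaratively, with no handler code to maintain.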
Figure 5 Snap Lines Help You Align Controls
The single form within the application, shown in Figure 5, includes a number of labels, textboxes, and checkboxes that needed to be carefully aligned. The Windows Forms designer in Visual Studio 2005 provides snap lines, which make it easier to create good-looking forms—even ones this simple. Figure 5 shows the snap lines in use, aligning a Label control with an edge and a text baseline.
Using the BackgroundWorker Component
I've written about the BackgroundWorker component, in a very similar application, in my March 2005 column (Advanced Basics: Doing Async the Easy Way), so I won't repeat that discussion here. The sample form includes event handlers for the component's DoWork, ProgressChanged, and RunWorkerCompleted events. The DoWork event handler starts the search and manages the document modifications. The ProgressChanged event handler simply updates a ToolStripStatusLabel control in the form's status bar, like this:
    statusLabel.Text = "Updating " & e.UserState.ToString
The RunWorkerCompleted event handler updates the Enabled properties of the two buttons at the bottom of the form and updates the status bar to indicate whether the previous operation was cancelled:
    Dim statusText As String = "Ready"
    If e.Cancelled Then
        statusText &= ". Previous operation cancelled."
    End If
    statusLabel.Text = statusText
    makeChangesButton.Enabled = True
    cancelButton.Enabled = False
The form also includes code to ensure that you can't click the Make Changes button again until the work started by the previous click has completed. A BackgroundWorker component can run only one operation at a time (calling its RunWorkerAsync method while it's busy raises an InvalidOperationException), so the form disables the Make Changes button as soon as you click it and enables the Cancel button at the same time. When the BackgroundWorker completes its work, whether or not you canceled, the code reverses the state of the buttons.
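The button logic described above can be sketched as follows. This is a self-contained stand-in, not the sample's form: the class MakeChangesSketch, its methods, and the simulated DoWork loop (which reports fake file names instead of driving Word) are hypothetical, while RunWorkerAsync, CancelAsync, CancellationPending, and ReportProgress are the real BackgroundWorker members.

```vb
Imports System.ComponentModel
Imports System.Threading
Imports System.Windows.Forms

' Hypothetical stand-in for the form: buttons and worker are created here,
' and DoWork simulates a list of files rather than opening them in Word.
Public Class MakeChangesSketch
    Public ReadOnly makeChangesButton As New Button()
    Public ReadOnly cancelButton As New Button()
    Public ReadOnly finished As New ManualResetEvent(False)
    Public wasCancelled As Boolean
    Private WithEvents bgw As New BackgroundWorker()

    Public Sub New()
        bgw.WorkerReportsProgress = True
        bgw.WorkerSupportsCancellation = True
        cancelButton.Enabled = False
    End Sub

    ' Body of the Make Changes button's Click handler.
    Public Sub MakeChangesClicked()
        ' Disable the button first, so a second click can't call
        ' RunWorkerAsync while the worker is still busy.
        makeChangesButton.Enabled = False
        cancelButton.Enabled = True
        bgw.RunWorkerAsync()
    End Sub

    ' Body of the Cancel button's Click handler: this only requests cancellation.
    Public Sub CancelClicked()
        bgw.CancelAsync()
    End Sub

    Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs) _
        Handles bgw.DoWork
        For i As Integer = 1 To 1000
            ' Check for cancellation between files, as HandleDocs does.
            If bgw.CancellationPending Then
                e.Cancel = True
                Return
            End If
            ' The second argument arrives as e.UserState in ProgressChanged.
            bgw.ReportProgress(0, "Document" & i.ToString() & ".doc")
            Thread.Sleep(10)
        Next
    End Sub

    Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, _
        ByVal e As RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
        ' Reverse the button states whether or not the work was cancelled.
        wasCancelled = e.Cancelled
        makeChangesButton.Enabled = True
        cancelButton.Enabled = False
        finished.Set()
    End Sub
End Class
```

In a real form, RunWorkerCompleted is raised on the UI thread, so updating the buttons there is safe; the ManualResetEvent exists only so the sketch can be exercised outside a form.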
Automating Microsoft Word
Finally, I'll address the heart of the matter: making Word do stuff. The application has a COM reference set to the Word 11.0 Object Library. When I installed Office 2003, I elected to install the entire package, including the Office 2003 Primary Interop Assemblies (PIAs). The PIAs are the official interop assemblies that Microsoft provides for programming Office from the Microsoft .NET Framework. If you don't have them installed, Visual Studio generates an interop assembly for you when you set the reference, but this isn't the optimal solution: the generated assembly isn't the signed, official version, so its types don't match those used by other code that references the PIAs. If you're going to program Word, make sure you install the appropriate PIAs from the Office 2003 installation media.
The form's code includes class-level variables that refer to the Word Application object and to the current Word Document object:
    Private app As Word.Application
    Private doc As Word.Document
To make it easier to program against the Word object model, I generally add Imports statements to the top of each code file, as shown in the following:
    Imports Office = Microsoft.Office.Core
    Imports Word = Microsoft.Office.Interop.Word
Without those Imports statements, the previous code snippet would look like this:
    Private app As Microsoft.Office.Interop.Word.Application
    Private doc As Microsoft.Office.Interop.Word.Document
After you've set all the options and clicked Make Changes, the code iterates through the selected folder (recursively, if you've requested that subfolders be searched) and, as it finds each file, calls the OnFileFound procedure. This procedure reports its progress to the BackgroundWorker and then opens the document:
    ' Open the document.
    doc = app.Documents.Open(fi.FullName)
The code uses the document's BuiltInDocumentProperties property to set each property. In each case, the code checks whether you want to overwrite the property and, if so, copies the value from the corresponding TextBox control into the property, using the following procedure:
    Private Sub SetProperty( _
        ByVal chk As CheckBox, _
        ByVal prop As Word.WdBuiltInProperty, _
        ByVal txt As TextBox)
        ' If the supplied CheckBox control is checked,
        ' then set the Word property to the supplied value.
        If chk.Checked Then
            doc.BuiltInDocumentProperties(prop).Value = txt.Text
        End If
    End Sub
The OnFileFound procedure calls the SetProperty procedure for each property, like this:
    SetProperty(useSubjectCheckBox, _
        Word.WdBuiltInProperty.wdPropertySubject, subjectTextBox)
    SetProperty(useAuthorCheckBox, _
        Word.WdBuiltInProperty.wdPropertyAuthor, authorTextBox)
    ' and so on...
The code handles the Title property specially, because you may have opted to use the document's file name as the title, or you may have supplied a fixed value:
    If useTitleCheckbox.Checked Then
        If useDocNameCheckBox.Checked Then
            doc.BuiltInDocumentProperties( _
                Word.WdBuiltInProperty.wdPropertyTitle).Value = _
                Path.GetFileNameWithoutExtension(doc.Name)
        Else
            doc.BuiltInDocumentProperties( _
                Word.WdBuiltInProperty.wdPropertyTitle).Value = _
                titleTextBox.Text
        End If
    End If
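The Title decision can be isolated from Word so that it's easy to check on its own. In this minimal sketch, the module TitleSketch and the function ChooseTitle are hypothetical names; the docName argument plays the role of Word's Document.Name, which is the file name without its folder.

```vb
Imports System.IO

' Hypothetical helper that isolates the Title decision from Word.
Public Module TitleSketch
    Public Function ChooseTitle(ByVal useDocName As Boolean, _
                                ByVal docName As String, _
                                ByVal fixedTitle As String) As String
        If useDocName Then
            ' Strips only the last extension: "v1.2.doc" becomes "v1.2".
            Return Path.GetFileNameWithoutExtension(docName)
        End If
        Return fixedTitle
    End Function
End Module
```

With a helper like this, the Title branch reduces to a single assignment of ChooseTitle(useDocNameCheckBox.Checked, doc.Name, titleTextBox.Text) inside the useTitleCheckbox test.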
I tested this code on several hundred of my own documents before sending it off to my partner, for whom it failed immediately. It took a few minutes to diagnose: he had a document in which someone had set the ReadOnlyRecommended property to True. (If this property is set, Word prompts you to open the document as read-only, but allows you to override the suggestion.) Unfortunately, by the time you open the document programmatically, it's too late to change the setting, and the dialog box halts the application.
A search on the Web turned up a tacky, if workable, solution: you must save the document without the ReadOnlyRecommended property set, because you cannot change the setting for the document you're attempting to work with. Therefore, I added some admittedly sub-optimal code to the OnFileFound procedure, which runs immediately after the code opens the document (see Figure 6).
Figure 6 ReadOnlyRecommended
    If doc.ReadOnlyRecommended Then
        Dim docName As String = fi.FullName & "x"
        ' Save the doc with a new name,
        ' with ReadOnlyRecommended set to False.
        ' (The seventh argument of SaveAs is ReadOnlyRecommended.)
        doc.SaveAs(docName, , , , , , False)
        doc.Close()
        ' Open the new document and save it back over the original.
        doc = app.Documents.Open(docName)
        doc.SaveAs(fi.FullName)
        doc.Close()
        ' Remove the temporary copy and reopen the original.
        File.Delete(docName)
        doc = app.Documents.Open(fi.FullName)
    End If
The code in Figure 6 does exactly what the suggestion I found indicated: it checks the property and, if it's set, saves the document under a new name with the property set to False. It then opens the copy, saves it back over the original file, deletes the temporary copy, and reopens the original document. Not pretty, but it solves the problem. If you have a better solution, please let me know.
After using this utility for a while (I've found it quite useful in my own work), I've added a few more settings that help clean up documents before I submit them. The OnFileFound procedure also accepts all revisions and deletes all comments:
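    ' Accept revisions:
    doc.Revisions.AcceptAll()
    ' Delete all comments. Work backward by index, because deleting
    ' items while enumerating the collection with For Each isn't reliable.
    For i As Integer = doc.Comments.Count To 1 Step -1
        doc.Comments.Item(i).Delete()
    Next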
All that's left for the OnFileFound procedure to do is to close the document, saving the changes (for example, by passing Word.WdSaveOptions.wdSaveChanges to the document's Close method).
If you investigate the BackgroundWorker's DoWork event handler, you'll see that this block of code handles the entire operation:
    ' Open Word, and handle the documents in the specified folder.
    Try
        app = New Word.Application
        HandleDocs(folderLabel.Text, e)
    Catch ex As Exception
        MessageBox.Show(ex.Message)
    Finally
        ' If Word never started, there's nothing to quit or release.
        If app IsNot Nothing Then
            app.Quit()
            System.Runtime.InteropServices.Marshal.ReleaseComObject(app)
            app = Nothing
        End If
    End Try
This code creates the instance of Word and calls the HandleDocs procedure to iterate through folders and files, as requested. Once the work has completed, the DoWork event handler quits Word and releases its reference using the ReleaseComObject method. (You may not agree with this design—I could have left Word running until you quit the application, but I hate to leave applications running in the background unless they're actually doing something.)
The HandleDocs procedure does the grunt work of iterating through the file system, as you see in Figure 7. The only interesting things about HandleDocs are that it checks for cancellation requests from the user and that it calls itself recursively if you ask it to search subfolders.
Figure 7 HandleDocs Procedure
    Private Sub HandleDocs(ByVal path As String, _
        ByVal e As System.ComponentModel.DoWorkEventArgs)
        Try
            If bgw.CancellationPending Then
                e.Cancel = True
                Exit Sub
            End If
            ' Handle subfolders, if required.
            Dim diLocal As New DirectoryInfo(path)
            If LookInSubfoldersCheckBox.Checked Then
                For Each di As DirectoryInfo In diLocal.GetDirectories
                    HandleDocs(di.FullName, e)
                Next
            End If
            ' Search for matching file names.
            For Each fi As FileInfo In diLocal.GetFiles("*.doc")
                ' Cancellation pending? Cancel and get out.
                If bgw.CancellationPending Then
                    e.Cancel = True
                    Exit Sub
                End If
                OnFileFound(fi)
            Next
        Catch ex As UnauthorizedAccessException
            ' Don't do anything at all, just quietly get out. This means you
            ' weren't meant to be in this folder. All other exceptions will
            ' bubble back out to the caller.
        End Try
    End Sub
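The per-folder recursion in Figure 7 is a deliberate choice. A single call such as Directory.GetFiles(path, "*.doc", SearchOption.AllDirectories) throws an UnauthorizedAccessException as soon as it reaches a folder it can't read, abandoning the whole search, whereas recursing folder by folder lets HandleDocs skip only the offending folder. The following minimal, Word-free sketch shows the same traversal collecting paths into a list; the module DocFinder and the procedure FindDocs are hypothetical names.

```vb
Imports System.Collections.Generic
Imports System.IO

' Hypothetical, Word-free version of the HandleDocs traversal.
Public Module DocFinder
    Public Sub FindDocs(ByVal folder As String, ByVal recurse As Boolean, _
                        ByVal results As List(Of String))
        Try
            Dim diLocal As New DirectoryInfo(folder)
            If recurse Then
                ' Each subfolder gets its own Try block via recursion.
                For Each subDir As DirectoryInfo In diLocal.GetDirectories()
                    FindDocs(subDir.FullName, recurse, results)
                Next
            End If
            For Each fi As FileInfo In diLocal.GetFiles("*.doc")
                results.Add(fi.FullName)
            Next
        Catch ex As UnauthorizedAccessException
            ' Skip only this folder; the caller continues with its siblings.
        End Try
    End Sub
End Module
```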
As with any project, I had other options. I considered two: creating a COM add-in for Word, and using Visual Studio 2005 Tools for the Office 2003 System. Neither seemed appropriate. COM add-ins are useful when you want to run code while you're using the Office application, but they aren't helpful outside the running application. Visual Studio Tools for Microsoft Office is great when you want to create solutions using Excel, Word, Outlook®, or InfoPath® that use managed code, but in each case, the solutions are document-centric. After a moment's thought, I realized that a simple Windows-based application using COM Automation was the best approach.
You may be wondering why I didn't use the ill-fated COM component that allowed you to modify document properties without loading the document into its host application. That component appears to have been removed from MSDN® (where I originally found it), and to be honest, it never worked completely correctly. Some properties could be changed using the component, but others wouldn't change or would cause your application to fail in violent ways (sometimes requiring Windows to restart). Enough said about that option.
I've been spending a lot of time recently writing examples using the new XML-based file format for Office 12 documents. This new file format will make tasks such as the one undertaken here much more efficient. Using the new format, there's no reason to load Word to perform the work, and you don't even need to load the entire document. Because the new format partitions the data into separate parts, you can load just the properties part of the document, make your changes, and save that tiny piece. Watch this column: when the public beta arrives, I'll revisit this application and kick the tires of the new file format.
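To give a flavor of the approach, here is a minimal sketch that sets the Title core property of a package-based document without starting Word. It assumes the Office Open XML format as it eventually shipped with Office 12 (released as Office 2007, using .docx files), plus the .NET Framework 3.0 or later with a reference to WindowsBase.dll, which contains System.IO.Packaging; neither was available when this column was written, and the beta file format may differ. The module PackagePropertiesSketch and the procedure SetTitle are hypothetical names.

```vb
Imports System.IO
Imports System.IO.Packaging

' Sketch only: assumes .NET Framework 3.0+ (WindowsBase.dll) and an
' Open XML package such as a .docx file. Word is never started.
Public Module PackagePropertiesSketch
    Public Sub SetTitle(ByVal fileName As String, ByVal title As String)
        Using pkg As Package = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite)
            ' PackageProperties maps to the package's core-properties part;
            ' the document body part is never touched.
            pkg.PackageProperties.Title = title
        End Using ' Disposing the package writes the change to disk.
    End Sub
End Module
```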
Send your questions and comments to firstname.lastname@example.org.
Ken Getz is a senior consultant with MCW Technologies and a courseware author for AppDev. He is coauthor of ASP.NET Developers Jumpstart (Addison-Wesley, 2002), Access Developer's Handbook (Sybex, 2002), and VBA Developer's Handbook, 2nd Edition (Sybex, 2001). Reach him at email@example.com. Ken would like to thank Brian Randell for providing the inspiration for the utility described in this column.