Creating a Text-to-Speech add-in for Microsoft Word 2007 with Visual Studio 2008

By Alessandro Del Sole

Introduction

Visual Studio 2008 includes tools for Office that allow developers to build custom components, such as add-ins, for Microsoft Office System applications by writing managed code. Writing custom components is really straightforward, because developers can take advantage of the powerful environment provided by Microsoft Visual Studio 2008 and the .NET Framework 3.5. Since you are not limited to a small set of libraries or user controls, you can take the best of the .NET Framework into your Office solutions and integrate various technologies. For example, you could create a custom component for Microsoft Word 2007 and integrate Text-to-Speech capabilities, so that your computer (or your customers’ computers) can speak the text contained inside a Microsoft Word document.

In this article I will show you how to create a custom task pane for Microsoft Word 2007 and how to integrate the Text-to-Speech capabilities that the .NET Framework has provided since version 3.0.

Creating the project

Download the code

Two options are available when developing solutions for Microsoft Word and Microsoft Excel 2007 with Visual Studio 2008: application-level solutions and document-level solutions. Application-level solutions are custom components that affect the host application every time it is loaded. Document-level solutions are custom components that affect single documents or spreadsheets when they are opened from within the application. In this case we are going to develop an application-level solution, so that our custom task pane is available for every document opened in Word 2007.

First, let’s create a new Visual Basic 2008 project. With Visual Studio 2008 open, select the New|Project command from the File menu. When the New Project window appears, browse the Office templates folder and select the Word 2007 add-in template. Our new project should be called TextToSpeechWordAddin, as shown in Figure 1:

Figure 1 – Selecting the Word add-in project template

When the new project has been created, you have to add a reference to the System.Speech.dll assembly, which gives us a managed API for accessing the Text-to-Speech engine.

Visual Studio 2008 adds a code file called ThisAddIn.vb to the project. In this file you can find a class called ThisAddIn, which represents the instance of the custom component. This class exposes two events that mark the two main moments of an add-in’s lifecycle: Startup and Shutdown. As you can easily guess, the first one is raised when the add-in is loaded into memory, while the second one is raised when the application is closing (and therefore when the add-in is being unloaded). It’s really important to understand that closing a task pane in the UI does not unload it: even if you close a task pane inside an Office application, it remains loaded in memory until you close the application.
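
For reference, the ThisAddIn.vb file generated by the template looks roughly like this, with empty Startup and Shutdown handlers that we will fill in later:

Public Class ThisAddIn

    Private Sub ThisAddIn_Startup(ByVal sender As Object, _
        ByVal e As System.EventArgs) Handles Me.Startup
        'Code placed here runs when the add-in is loaded
    End Sub

    Private Sub ThisAddIn_Shutdown(ByVal sender As Object, _
        ByVal e As System.EventArgs) Handles Me.Shutdown
        'Code placed here runs when the add-in is being unloaded
    End Sub

End Class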

As we said before, we are going to create a custom task pane. To create one we need a new user control, which will contain the Windows Forms controls that allow performing the requested actions (as I will explain in detail in the next section). Visual Studio 2008 handles custom task panes through a collection called CustomTaskPanes, of type CustomTaskPaneCollection, which is a collection of CustomTaskPane objects. So our first task is to declare a CustomTaskPane object at class level (this implies an Imports Microsoft.Office.Tools statement), which will represent our custom component:

Private speechTaskPane As CustomTaskPane

The next step is writing code in the event handler for the Startup event. In this handler we have to add our custom task pane to the CustomTaskPanes collection. This collection exposes an Add method which receives two arguments: the first one is a new instance of the user control that constitutes the real task pane (and that we will implement in the next section of this article), while the second one is a description which will be shown as the header of the task pane:

Private Sub ThisAddIn_Startup(ByVal sender As Object, _
   ByVal e As System.EventArgs) Handles Me.Startup
    speechTaskPane = Me.CustomTaskPanes.Add(New SpeechControl, "Document's speech")
    speechTaskPane.Visible = True
End Sub

SpeechControl is the name for our custom control, which we are going to implement in the next section.

Customizing the task pane

As we said before, a custom task pane is based on a .NET user control. So let’s add a new user control to our project via the Project|Add User Control command. When the Add New Item window appears, type SpeechControl.vb in the Name text box.

Our new user control has to allow selecting voices and controlling speech reproduction. Figure 2 represents what our control will look like:

Figure 2 – Layout of the new custom control

So let’s add some Windows Forms controls onto our user control’s design surface:

  • a ComboBox control, which will list all the available installed voices. Set the Name property of this control to ComboBox1;
  • a Button control for starting the Text-to-Speech engine. Set the Name property of this button to SpeakButton;
  • a Button control for stopping the Text-to-Speech engine. Set the Name property of this button to CancelButton;
  • a TrackBar control for controlling the Text-to-Speech speed rate. Set the Name property of this control to speedTrackBar;
  • a TrackBar control for controlling the volume. Set the Name property of this control to volumeTrackBar.

Particular attention must be paid to the TrackBar controls. As we will see in the next section, the speed rate for the text-to-speech engine must be a number between -10 and 10, and a normal speed rate is obtained at -1. Because of this, set the Minimum, Maximum and Value properties of speedTrackBar to -10, 10 and -1 respectively. The volume must be a number between 0 and 100, so set the Minimum, Maximum and Value properties of volumeTrackBar to 0, 100 and 50 respectively.
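
If you prefer to configure these ranges in code rather than in the Properties window, a minimal sketch (the article itself assumes you use the designer) is to set them in the user control’s constructor:

Public Sub New()
    'Required call for the Windows Forms designer
    InitializeComponent()
    'Speed rate accepted by the speech engine: -10 to 10 (-1 as described above)
    speedTrackBar.Minimum = -10
    speedTrackBar.Maximum = 10
    speedTrackBar.Value = -1
    'Volume accepted by the speech engine: 0 to 100
    volumeTrackBar.Minimum = 0
    volumeTrackBar.Maximum = 100
    volumeTrackBar.Value = 50
End Sub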

Once we have designed our control, we can take a deeper look at the speech engine features of the .NET Framework by writing the specific code.

Handling Text-to-Speech features

To let our computer speak we have to create a new instance of a class called SpeechSynthesizer, defined in the System.Speech.Synthesis namespace. This class exposes properties and methods that can start, stop and control the audio reproduction of the specified text. First we need to add the Imports statements:

Imports System.Speech.Synthesis
Imports System.Collections.ObjectModel

And then, we have to put this declaration at class level:

Private mySynth As New SpeechSynthesizer

The next step is to write a method which retrieves the list of installed “voices”. Voices are components which allow vocal synthesis of the given text and implement pronunciation and text recognition features according to specific grammar rules. By default, Windows Vista ships with a voice called Microsoft Anna, which is an English female voice. This voice recognizes English words, punctuation and grammar rules, so if your language is different from English you should find and install a specific voice for your culture. In .NET a single voice is represented by the InstalledVoice class. Retrieving the list of all installed voices is really simple, since we can call the GetInstalledVoices method on the SpeechSynthesizer class. This method returns a ReadOnlyCollection of InstalledVoice objects. At this point, our method should be similar to this:

    'Retrieves all the voices installed for the current culture
    Private Sub GetInstalledVoices(ByVal synth As SpeechSynthesizer)
        Dim voices As ReadOnlyCollection(Of InstalledVoice) = _
          synth.GetInstalledVoices(Globalization.CultureInfo.CurrentCulture)
        If voices.Count = 0 Then
            'No voices installed, so disable the controls and stop here
            SpeakButton.Enabled = False
            volumeTrackBar.Enabled = False
            speedTrackBar.Enabled = False
            Exit Sub
        End If
        Try
            For Each v As InstalledVoice In voices
                Dim voiceInformation As VoiceInfo = v.VoiceInfo
                ComboBox1.Items.Add(voiceInformation.Name)
            Next
        Catch ex As Exception
            'TODO: Handle errors...
        End Try
    End Sub
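
Note that the method above only lists voices matching the current culture. If you prefer the combo box to show every voice on the machine, the parameterless GetInstalledVoices overload can be used instead; the following is just a sketch of such a variation (GetAllInstalledVoices is a hypothetical name, not part of the original sample):

    'Lists every installed voice, regardless of culture
    Private Sub GetAllInstalledVoices(ByVal synth As SpeechSynthesizer)
        For Each v As InstalledVoice In synth.GetInstalledVoices()
            'Only add voices that are enabled on this machine
            If v.Enabled Then
                ComboBox1.Items.Add(v.VoiceInfo.Name)
            End If
        Next
    End Sub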

So when our control is loaded, the combo box must be populated. This is achieved simply by calling the GetInstalledVoices method described above in the Load event handler of the user control itself.

Private Sub SpeechControl_Load(ByVal sender As Object, _
    ByVal e As System.EventArgs) Handles Me.Load
    GetInstalledVoices(mySynth)
End Sub

The next step is to write some code to handle volume and speed rate changes. As we mentioned above, the volume must be a number between 0 and 100. This number becomes the value of the Volume property of the active SpeechSynthesizer instance (mySynth, in this case). When the user changes the position of the slider in the volumeTrackBar control, a ValueChanged event is raised, so this is the event we have to handle in code.

Private Sub volumeTrackBar_ValueChanged(ByVal sender As Object, _
   ByVal e As System.EventArgs) Handles volumeTrackBar.ValueChanged
    'Set volume
    mySynth.Volume = volumeTrackBar.Value
End Sub

Both the slider position and the Volume property are Integer values, so no casting is needed. We can then set the speed rate by assigning the Rate property of the SpeechSynthesizer class, handling the ValueChanged event of the speedTrackBar.

Private Sub speedTrackBar_ValueChanged(ByVal sender As Object, _
    ByVal e As System.EventArgs) Handles speedTrackBar.ValueChanged
    'Set speed
    mySynth.Rate = speedTrackBar.Value
End Sub

As we mentioned before, the speed rate must be a number between -10 and 10, and a normal speed is set at -1. The next step is to handle the Click event of the CancelButton button, which is raised when the user wants to stop text-to-speech. This is a really simple task:

Private Sub CancelButton_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles CancelButton.Click
     mySynth.SpeakAsyncCancelAll()
     CancelButton.Enabled = False
End Sub

It is worth noting that changing the volume and speech rate will not affect speech that is already playing.
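
If you need new volume or rate settings to take effect immediately, one possible approach (not covered in the original article; RestartSpeech is a hypothetical helper name) is to cancel the current speech and start it again from the active document:

Private Sub RestartSpeech()
    'Only restart if the synthesizer is actually speaking
    If mySynth.State = SynthesizerState.Speaking Then
        mySynth.SpeakAsyncCancelAll()
        'Start again from the beginning of the active document
        Dim document As Word.Document = Globals.ThisAddIn.Application.ActiveDocument
        mySynth.SpeakAsync(document.Content.Text)
    End If
End Sub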

The last step is writing code for handling the Click event raised when the user clicks the SpeakButton. First let’s write code for the event handler:

Private Sub SpeakButton_Click(ByVal sender As System.Object, _
  ByVal e As System.EventArgs) Handles SpeakButton.Click
    'If no voice is selected, no action is taken
    If String.IsNullOrEmpty(ComboBox1.Text) = True Then Exit Sub
    'Select the specified voice
    mySynth.SelectVoice(ComboBox1.Text)
    'Get the instance of the active Microsoft Word 2007 document
    Dim document As Word.Document = Globals.ThisAddIn.Application.ActiveDocument
    'Let it speak!
    mySynth.SpeakAsync(document.Content.Text)
    CancelButton.Enabled = True
End Sub

The event handler performs the following tasks:

  • Checks whether a voice has been selected, based on the content of the combo box;
  • Tells the speech engine that the voice to be used is the one selected in the combo box (SelectVoice method);
  • Retrieves an instance of the active document in Microsoft Word 2007. A Word document is represented by the Microsoft.Office.Interop.Word.Document interface and is obtained by reading the ActiveDocument property exposed by the Application object, which represents the running instance of Microsoft Word 2007;
  • Starts the speech engine by calling the SpeakAsync method. This method receives the String to be played; in this case the string is the text of the active document. The SpeechSynthesizer class exposes another, similar method, simply called Speak, which plays the text synchronously, so you can do nothing else until reproduction has ended. By contrast, SpeakAsync allows the user to do other things while listening to the speech (a small variation is sketched right after this list).
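
As a small variation on the handler above (an assumption of mine, not part of the original sample), you could speak only the text currently selected in the document, through the Selection property exposed by the same Application object; SpeakSelection is a hypothetical helper name:

Private Sub SpeakSelection()
    'Speak only the text currently selected in the document, if any
    Dim selection As Word.Selection = Globals.ThisAddIn.Application.Selection
    If selection IsNot Nothing AndAlso selection.Text.Trim().Length > 0 Then
        mySynth.SpeakAsync(selection.Text)
    End If
End Sub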

Now that we have finished writing our code, we are ready to run our add-in in Microsoft Word 2007.

The custom task pane in action

Building Office solutions requires the same steps you perform for other kinds of applications, so press F5 to compile the project and start an instance of Microsoft Word 2007. As you can see in Figure 3 (taken from the Italian localization of Word 2007), our custom task pane is shown in the host application.

Figure 3 – The custom task pane running inside Microsoft Word 2007

Type some text, select the Microsoft Anna voice from the combo box (or any other voice you have installed) and click the Speak button to start the Text-to-Speech engine and see the magic happen. Just click the Stop speaking button to stop the engine.

It’s really amazing how Text-to-Speech can enhance an application’s UI and how easily it can be integrated into various kinds of applications, such as Visual Studio solutions for the Microsoft Office System.

Just remember that if you wish to deploy add-ins for the 2007 Microsoft Office System using ClickOnce, your assembly needs FullTrust permissions; in this particular case, just using the SpeechSynthesizer class requires FullTrust. For further information you can take a look at this article in the MSDN Library: Deploying Solutions for 2007 Office System with ClickOnce Using Visual Studio Tools for the Office System (3.0). Moreover, you can manage custom add-ins through the COM Add-ins window, which is reachable from the Word Options dialog (you can open it by clicking the Office button in the Ribbon).

Useful resources

Microsoft Visual Studio Tools for Office Developer Portal

Visual Basic Developer Center

System.Speech.Synthesis namespace

About the author

Alessandro Del Sole is a Team Member in the Italian “Visual Basic Tips & Tricks” Community. He writes lots of Italian and English language community articles and books about .NET development. He also enjoys writing freeware and open-source developer tools. You can visit Alessandro’s blog.