Microsoft Speech Platform
Text-to-Speech (TTS) Overview
ISpVoice is the API for text-to-speech (TTS) in the Microsoft Speech Platform. Using this interface, applications can add the TTS support needed to speak text, modify speech characteristics, change voices, and respond to real-time events while speaking.
Applications obtain access to the ISpVoice interface methods by creating a COM object. As the name implies, an ISpVoice object represents a single instance of a TTS voice, and every ISpVoice object is an independent voice. Even if two ISpVoice objects select the same voice token (for example, "en-US_Helen"), each one can be changed and modified independently of the other.
When an application first creates an ISpVoice object, the object is initialized to the default voice token specified in the following registry key: HKEY_CURRENT_USER\Software\Microsoft\Speech Server\v11.0\Voices\DefaultTokenId. As a result, the new object is immediately ready to speak text, and no special initialization is needed. At this point, the application can call Speak or SpeakStream to speak any Unicode text data.
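The sketch below models this object behavior with a hypothetical, portable stand-in rather than the real COM interface. `MockVoice` and `kDefaultTokenId` are illustrative names, not Speech Platform APIs, and the token value simply reuses the example above. Each instance starts with the default token and keeps its own settings, so changing the rate of one instance leaves another instance that uses the same token unchanged.

```cpp
#include <string>

// Hypothetical stand-in for one ISpVoice instance; not a Speech Platform type.
// The token value is illustrative only (the real default comes from the registry).
const std::string kDefaultTokenId = "en-US_Helen";

class MockVoice {
public:
    // Like a newly created ISpVoice object, start with the default voice token.
    MockVoice() : token_(kDefaultTokenId) {}

    void set_voice(const std::string& token) { token_ = token; }
    const std::string& voice() const { return token_; }

    void set_rate(long rate) { rate_ = rate; }
    long rate() const { return rate_; }

private:
    std::string token_;
    long rate_ = 0;  // each instance keeps its own setting
};
```

With the real API, each independent voice would correspond to its own ISpVoice COM object.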
Synchronous vs. Asynchronous Speaking
The two speaking methods can generate speech either synchronously (the method does not return until the text has been completely spoken) or asynchronously (the method returns immediately and speaking continues as a background process). Asynchronous operation is typically preferred when the application needs to do something else while speaking, such as highlighting text, painting an animation, or monitoring controls. Otherwise, speaking synchronously is the simplest approach.
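The following sketch illustrates the difference between the two modes. It is a hypothetical model (`ModelVoice` is not a Speech Platform type) that simulates speaking with short sleeps: `speak(text, true)` plays the role of an asynchronous Speak call, and `wait_until_done` is a rough analogue of the WaitUntilDone method. With the real API, asynchronous speaking is requested through a flag passed to Speak (SPF_ASYNC in the SAPI headers). Finishing earlier requests before starting a new one is a design choice of this sketch.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <future>
#include <string>
#include <thread>

// Hypothetical model of synchronous vs. asynchronous speaking; it does not
// call the Speech Platform. Each word is "spoken" by sleeping briefly.
class ModelVoice {
public:
    // async == false: return only after the whole text has been spoken.
    // async == true:  start speaking on a background task and return at once.
    void speak(const std::string& text, bool async) {
        // Finish any earlier asynchronous request first, so requests are
        // spoken in the order they were made.
        if (pending_.valid()) pending_.wait();
        if (async) {
            pending_ = std::async(std::launch::async, [this, text] { render(text); });
        } else {
            render(text);
        }
    }

    // Rough analogue of WaitUntilDone: true if speaking finished within
    // the timeout, false if the timeout elapsed first.
    bool wait_until_done(std::chrono::milliseconds timeout) {
        if (!pending_.valid()) return true;  // nothing was spoken asynchronously
        return pending_.wait_for(timeout) == std::future_status::ready;
    }

    std::size_t words_spoken() const { return words_.load(); }

private:
    void render(const std::string& text) {
        bool in_word = false;
        for (char c : text) {
            const bool is_space = (c == ' ');
            if (!is_space && !in_word) {
                std::this_thread::sleep_for(std::chrono::milliseconds(20));
                ++words_;  // one more word "spoken"
            }
            in_word = !is_space;
        }
    }

    std::atomic<std::size_t> words_{0};
    std::future<void> pending_;  // declared last so it is destroyed first,
                                 // which waits for any running task
};
```

A caller that speaks asynchronously remains free to do other work and can call `wait_until_done` later; a synchronous call simply blocks until the text is finished.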
Get Status Information
During asynchronous speech, applications can get current status information (text position, speech done state, bookmarks, etc.) in one of two ways. The simplest way is to periodically poll the ISpVoice object using the GetStatus method. The other way is to initialize the ISpVoice object so that it sends real-time events to the application as they happen.
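The polling approach can be sketched as follows. `PollableVoice`, `ModelStatus`, and `poll_until_done` are hypothetical stand-ins: the model reports only a done flag and the character offset of the current word, loosely mirroring the running state and input word position that GetStatus returns. The 2 ms polling interval is an arbitrary choice for this sketch; an application might instead poll from a timer or its message loop.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical status snapshot, loosely modeled on the running state and
// input word position reported by GetStatus. Not a Speech Platform type.
struct ModelStatus {
    bool done;
    std::size_t word_pos;  // character offset of the word being spoken
};

class PollableVoice {
public:
    void speak_async(const std::string& text) {
        if (worker_.joinable()) worker_.join();  // one utterance at a time
        done_.store(false);
        word_pos_.store(0);
        worker_ = std::thread([this, text] {
            for (std::size_t i = 0; i < text.size(); ++i) {
                // A word starts at offset 0 or right after a space.
                if (text[i] != ' ' && (i == 0 || text[i - 1] == ' ')) {
                    word_pos_.store(i);
                    std::this_thread::sleep_for(std::chrono::milliseconds(10));
                }
            }
            done_.store(true);
        });
    }

    ModelStatus get_status() const { return {done_.load(), word_pos_.load()}; }

    ~PollableVoice() { if (worker_.joinable()) worker_.join(); }

private:
    std::atomic<bool> done_{false};
    std::atomic<std::size_t> word_pos_{0};
    std::thread worker_;
};

// Poll at a fixed interval until speech is done, recording each new word
// position (for example, to highlight the current word on screen).
std::vector<std::size_t> poll_until_done(PollableVoice& voice) {
    std::vector<std::size_t> seen;
    for (;;) {
        const ModelStatus s = voice.get_status();
        if (seen.empty() || seen.back() != s.word_pos) seen.push_back(s.word_pos);
        if (s.done) break;
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
    }
    return seen;
}
```

Polling can miss a word that starts and ends between two polls; the event-based approach avoids that by queuing each event as it happens.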
Modify Voice Attributes
Often with TTS, voice output needs to be modified from its default settings. There are two ways to do this: by calling certain ISpVoice API methods, or by embedding XML that conforms to the Speech Synthesis Markup Language (SSML) Version 1.0 within the text to speak. Typically, the API methods act as global settings that affect speech regardless of the currently selected voice or the document being spoken, while SSML tags usually have a much narrower scope, affecting only the speaking style within a single document. For information about modifying voice attributes using SSML, see Use SSML to Create Prompts and Control TTS.
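As an illustration of the SSML approach, the helper below wraps plain text in a minimal SSML 1.0 document that uses the `prosody` element to change the speaking rate for that document only. The helper names and the choice of `prosody rate` are illustrative assumptions. The resulting string would then be passed to Speak; how Speak recognizes XML input is covered in the Speak reference. Plain text is XML-escaped so that characters such as `&` and `<` are not misread as markup.

```cpp
#include <string>

// Escape the characters that are special in XML text and attribute values.
std::string xml_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Wrap plain text in an SSML 1.0 document whose <prosody> element changes
// the speaking rate for this document only. "rate" is an SSML value such
// as "slow", "medium", or "fast".
std::string make_ssml(const std::string& text, const std::string& lang,
                      const std::string& rate) {
    return "<speak version=\"1.0\" "
           "xmlns=\"http://www.w3.org/2001/10/synthesis\" "
           "xml:lang=\"" + xml_escape(lang) + "\">"
           "<prosody rate=\"" + xml_escape(rate) + "\">" +
           xml_escape(text) +
           "</prosody></speak>";
}
```

For example, `make_ssml("Fish & chips", "en-US", "slow")` produces a document in which only that sentence is spoken slowly, while a rate set through SetRate would keep applying to everything the voice speaks.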
Manage Audio Output
Audio output for TTS is not restricted to sound card hardware. TTS functionality in the Speech Platform supports, directly or indirectly, almost any audio configuration an application may require. Whether the destination is a PC sound card, a buffer in memory, or special telephony hardware, ISpVoice provides several audio control methods for changing the audio path from its default configuration.
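When the destination is a buffer in memory rather than a sound card, the application must size that buffer for the audio format it selects. For uncompressed PCM, the byte count is the sample rate multiplied by the bytes per sample, the number of channels, and the duration in seconds. The helper below is a hypothetical sketch of that arithmetic; the formats in the usage note are examples, not Speech Platform defaults.

```cpp
#include <cstdint>

// Bytes needed to hold uncompressed PCM audio:
// samples/second * bytes/sample * channels * seconds.
std::uint64_t pcm_bytes(std::uint32_t samples_per_sec, std::uint16_t bits_per_sample,
                        std::uint16_t channels, double seconds) {
    const std::uint64_t bytes_per_sec =
        static_cast<std::uint64_t>(samples_per_sec) * (bits_per_sample / 8) * channels;
    return static_cast<std::uint64_t>(bytes_per_sec * seconds);
}
```

For example, 2.5 seconds of 16 kHz, 16-bit mono audio needs 16000 * 2 * 1 * 2.5 = 80,000 bytes, and one second of 44.1 kHz, 16-bit stereo audio needs 176,400 bytes.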
Speak Text
| Method | Description |
|---|---|
| Speak | Speaks a text string or file. |
| SpeakStream | Speaks a text stream or plays an audio (WAV) stream. |
Get Real-time Status
| Method | Description |
|---|---|
| GetStatus | Returns current speech and event status information. |
| WaitUntilDone | Delays until either the voice has completed speaking or the specified time interval has elapsed. |
| SpeakCompleteEvent | Returns an event handle that will be signaled when speech is done. |
| Pause | Pauses the output speech at the nearest alert boundary. |
| Skip | Skips ahead or backward to a new input text position while speaking. |
Change Voice Attributes
| Method | Description |
|---|---|
| SetRate | Sets the speaking rate in real time. |
| GetRate | Returns the current speaking rate. |
| SetVolume | Sets the speech volume level in real time. |
| GetVolume | Returns the current speech volume level. |
| SetVoice | Sets the identity of the voice used for synthesis. |
| GetVoice | Retrieves the object token that identifies the current voice. |
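Applications that expose rate and volume controls to users may want to clamp input before calling SetRate or SetVolume. The ranges in the sketch below (rate from -10 to 10, volume from 0 to 100) are taken from the SAPI 5 ISpVoice reference and are assumed to apply here; confirm them against the SetRate and SetVolume reference pages. The helper names are hypothetical.

```cpp
#include <algorithm>

// Assumed ranges, based on the SAPI 5 ISpVoice reference:
// rate -10 (slowest) to 10 (fastest), 0 = normal; volume 0 to 100.
constexpr long kMinRate = -10;
constexpr long kMaxRate = 10;
constexpr int kMinVolume = 0;
constexpr int kMaxVolume = 100;

// Clamp a requested rate into the range SetRate is assumed to accept.
long clamp_rate(long requested) {
    return std::min(std::max(requested, kMinRate), kMaxRate);
}

// Clamp a requested volume; SetVolume takes an unsigned 16-bit value.
unsigned short clamp_volume(int requested) {
    return static_cast<unsigned short>(
        std::min(std::max(requested, kMinVolume), kMaxVolume));
}
```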
Process Events in Real Time (inherited from ISpEventSource)
| Method | Description |
|---|---|
| SetInterest | Sets the type of events to queue. |
| GetEvents | Returns the queued events. |
| GetInfo | Returns information about the event queue. |
| SetNotifySink | Sets up the instance to make free-threaded calls through ISpNotifySink::Notify. |
| SetNotifyWindowMessage | Sets a window handle to receive notifications as window messages. |
| SetNotifyCallbackFunction | Sets a callback function to receive notifications. |
| SetNotifyCallbackInterface | Sets an object to receive notifications. |
| SetNotifyWin32Event | Sets up a Win32 event object to be used by this instance for notifications. |
| WaitForNotifyEvent | A blocking call that waits for a notification. |
| GetNotifyEventHandle | Retrieves the Win32 event handle associated with this notify source. |
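The interest-mask pattern behind SetInterest and GetEvents can be modeled as follows. Everything in the sketch is hypothetical: the `EventId` values, `event_bit`, and `EventQueueModel` are illustrative stand-ins, and the real event identifiers and mask macro (SPFEI in the SAPI headers) are defined by the platform. The model covers only filtering and draining the queue; the notification methods listed above are what would tell an application when to call GetEvents.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical event IDs for this sketch; the real IDs differ.
enum class EventId : unsigned { StartInputStream = 0, EndInputStream = 1, WordBoundary = 2, Bookmark = 3 };

constexpr std::uint64_t event_bit(EventId id) {
    return std::uint64_t{1} << static_cast<unsigned>(id);
}

struct Event {
    EventId id;
    std::size_t position;  // input text offset associated with the event
};

// Models the SetInterest / GetEvents pattern: the synthesizer raises every
// event, but only the events selected by the interest mask are queued.
class EventQueueModel {
public:
    void set_interest(std::uint64_t mask) { interest_ = mask; }

    // Called by the (simulated) synthesizer as speech progresses.
    void raise(EventId id, std::size_t position) {
        if (interest_ & event_bit(id)) queue_.push_back({id, position});
    }

    // Remove and return up to max_count queued events, oldest first.
    std::vector<Event> get_events(std::size_t max_count) {
        std::vector<Event> out;
        while (!queue_.empty() && out.size() < max_count) {
            out.push_back(queue_.front());
            queue_.pop_front();
        }
        return out;
    }

    std::size_t queued() const { return queue_.size(); }

private:
    std::uint64_t interest_ = 0;
    std::deque<Event> queue_;
};
```

Because events not selected by the mask are never queued, an application interested only in word boundaries and the end of the stream does not have to drain unrelated events.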
Manage Audio Output
| Method | Description |
|---|---|
| SetOutput | Sets the current output object. A value of NULL may be used to select the default audio device. |
| GetOutputStream | Retrieves a pointer to the current output stream. |
| GetOutputObjectToken | Retrieves the object token for the current output object. |
| SetPriority | Sets the priority for the voice. |
| GetPriority | Retrieves the current voice priority level. |
| SetAlertBoundary | Specifies which event should be used as the insertion point for alerts. |
| GetAlertBoundary | Retrieves the event that is currently being used as the insertion point for alerts. |
| IsUISupported | Determines whether the specified type of UI is supported. |
| DisplayUI | Displays the requested UI. |
| SetSyncSpeakTimeout | Sets the timeout interval, in milliseconds, after which synchronous Speak and SpeakStream calls to this instance of the voice time out. |
| GetSyncSpeakTimeout | Retrieves the timeout interval for synchronous speech operations for this ISpVoice instance. |