Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

ISpVoice

The ISpVoice interface enables an application to perform speech synthesis operations. Applications can speak text strings and text files, or play audio files through this interface. All of these can be done synchronously or asynchronously.
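As an illustration of the synchronous and asynchronous modes described above, the following is a minimal sketch rather than a definitive implementation. It assumes a Windows build with the Speech Platform SDK headers (sapi.h) on the include path and ole32.lib linked. SpeakTwice is a hypothetical helper name, and the non-Windows stub exists only so that the sample compiles elsewhere.

```cpp
// Minimal sketch: speak one string synchronously, then one asynchronously.
// SpeakTwice is a hypothetical helper, not part of the Speech Platform API.
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>

// Returns true if both Speak calls and the final wait succeed.
bool SpeakTwice()
{
    if (FAILED(::CoInitialize(NULL)))
        return false;

    ISpVoice* pVoice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                    IID_ISpVoice, (void**)&pVoice);
    if (SUCCEEDED(hr))
    {
        // Synchronous: Speak returns after the text has been rendered.
        hr = pVoice->Speak(L"This call blocks until speech finishes.",
                           SPF_DEFAULT, NULL);

        if (SUCCEEDED(hr))
        {
            // Asynchronous: Speak queues the text and returns immediately.
            hr = pVoice->Speak(L"This call returns right away.",
                               SPF_ASYNC, NULL);
        }

        if (SUCCEEDED(hr))
        {
            // Block until the queued speech has completed.
            hr = pVoice->WaitUntilDone(INFINITE);
        }

        pVoice->Release();
    }

    ::CoUninitialize();
    return SUCCEEDED(hr);
}
#else
// Non-Windows builds have no Speech Platform; this stub reports failure.
bool SpeakTwice() { return false; }
#endif
```

In an asynchronous call the application remains free to do other work between Speak and WaitUntilDone. An application that cannot block could use SpeakCompleteEvent or the ISpEventSource notifications instead.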

A voice is an instance of a speech synthesis (text-to-speech, or TTS) engine that specifies a voice token to use for synthesizing speech from text. Applications can choose a specific TTS voice token using ISpVoice::SetVoice. If no voice token is selected, the TTS engine will use the default voice token, which is specified at the following registry key: HKEY_CURRENT_USER\Software\Microsoft\Speech Server\v11.0\Voices\DefaultTokenId.

Applications can modify the characteristics of a voice (for example, rate, pitch, and volume) by embedding Speech Synthesis Markup Language (SSML) XML tags into the text to be spoken. See Use SSML to Create Prompts and Control TTS. Some attributes, such as rate and volume, can be changed in real time using ISpVoice::SetRate and ISpVoice::SetVolume. Applications can set the priority of a voice using ISpVoice::SetPriority.
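To make the SSML approach concrete, the following sketch builds a small SSML document that wraps the text in a prosody element. EscapeXml and BuildSsml are hypothetical helper names, the en-US language tag is an assumption, and the rate and volume labels ("slow", "loud") come from the SSML 1.0 prosody element. Which labels and values a particular voice honors is not confirmed here.

```cpp
#include <string>

// Hypothetical helper: escapes the characters that are special in XML text.
std::wstring EscapeXml(const std::wstring& text)
{
    std::wstring out;
    for (wchar_t ch : text)
    {
        switch (ch)
        {
        case L'&': out += L"&amp;"; break;
        case L'<': out += L"&lt;";  break;
        case L'>': out += L"&gt;";  break;
        default:   out += ch;       break;
        }
    }
    return out;
}

// Hypothetical helper: wraps text in an SSML <prosody> element.
// The rate and volume arguments are inserted as attribute values unchanged,
// so callers are assumed to pass valid SSML values such as "slow" or "loud".
std::wstring BuildSsml(const std::wstring& text,
                       const std::wstring& rate,
                       const std::wstring& volume)
{
    return L"<speak version=\"1.0\""
           L" xmlns=\"http://www.w3.org/2001/10/synthesis\""
           L" xml:lang=\"en-US\">"
           L"<prosody rate=\"" + rate + L"\" volume=\"" + volume + L"\">"
           + EscapeXml(text) +
           L"</prosody></speak>";
}
```

The resulting string could then be passed to ISpVoice::Speak through its c_str() pointer. Whether a flag is required to mark the text as XML depends on the Speak flags the platform supports, so consult the Speak reference before relying on this.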

ISpVoice inherits from the ISpEventSource interface. An ISpVoice object forwards events back to the application when the corresponding audio data has been rendered to the output device.

Associated Class IDs

The following class IDs (CLSID) may be used with this interface.

  • CLSID_SpVoice

See Application Object Classes for a complete CLSID listing for all interfaces.

Methods in Vtable Order

ISpVoice Methods                  Description
ISpEventSource inherited methods  All methods of ISpEventSource are accessible from this interface.
SetOutput                         Sets the current output object. A value of NULL may be used to select the default audio device.
GetOutputObjectToken              Retrieves the object token for the current audio output object.
GetOutputStream                   Retrieves a pointer to the current output stream.
Pause                             Pauses the voice at the nearest alert boundary and closes the output device.
Resume                            Sets the output device to the RUN state and resumes rendering.
SetVoice                          Sets the identity of the voice used for text synthesis.
GetVoice                          Retrieves the object token that identifies the voice used in text synthesis.
Speak                             Speaks the contents of a text string or file.
SpeakStream                       Speaks the contents of a stream.
GetStatus                         Retrieves the current rendering and event status associated with this ISpVoice instance.
Skip                              Causes the voice to skip forward or backward the specified number of items within the text of the current speak call.
SetPriority                       Sets the priority for the voice: normal, alert, or over.
GetPriority                       Retrieves the current voice priority level.
SetAlertBoundary                  Specifies which event should be used as the insertion point for alerts.
GetAlertBoundary                  Retrieves the event that is currently being used as the insertion point for alerts.
SetRate                           Sets the text rendering rate adjustment in real time.
GetRate                           Retrieves the current text rendering rate adjustment.
SetVolume                         Sets the synthesizer output volume level in real time.
GetVolume                         Retrieves the current output volume level of the synthesizer.
WaitUntilDone                     Blocks the caller until either the voice has completed speaking or the specified time interval has elapsed.
SetSyncSpeakTimeout               Sets the timeout interval, in milliseconds, after which synchronous Speak and SpeakStream calls to this instance of the voice time out.
GetSyncSpeakTimeout               Retrieves the timeout interval for synchronous speech operations for this ISpVoice instance.
SpeakCompleteEvent                Returns an event handle that will be signaled when the voice has completed speaking all pending requests.
IsUISupported                     Determines if the specified type of UI is supported.
DisplayUI                         Displays the requested UI.