SpVoice (Events) Interface (SAPI 5.4)

Microsoft Speech API 5.4

SpVoice (Events)

The SpVoice (Events) automation object defines the types of events that can be received by an SpVoice object from a text-to-speech (TTS) engine.

In order to understand voice events, it is necessary to distinguish between the TTS engine, which synthesizes speech from text, and the SpVoice object, which applications employ to communicate with the engine. The TTS engine is somewhat like a server, and the SpVoice object like a client. The voice object sends the engine a request to speak a string of text. The engine processes the request as soon as it can. The interval between a speech request and the production of the speech is unpredictable. SpVoice events overcome this difficulty by providing applications with real-time feedback from the engine as it speaks, making it possible to synchronize application functions with speech. For example, an application can use a voice object's Viseme event to drive animations that display mouth movements as the engine speaks.

The voice object initiates requests with the Speak and SpeakStream methods, which send text strings and audio files to the TTS engine. These methods can be called synchronously or asynchronously. Because a synchronous speech request suspends execution of the calling application while the engine speaks the stream, events raised while the stream is being spoken are received only after the stream has been spoken. Applications that need to receive events as real-time feedback should use asynchronous Speak and SpeakStream calls.
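The asynchronous pattern described above can be sketched in Python. This is a minimal illustration, not part of the SAPI documentation. It assumes a Windows machine with the pywin32 package installed, the "SAPI.SpVoice" ProgID, and the value 1 for SVSFlagsAsync. The VoiceEvents class, the speak_with_events function, and the polling loop are hypothetical names and choices made for this example. The handler signatures follow the StartStream, Word, and EndStream events listed on this page.

```python
import time

# Assumption: SpeechVoiceSpeakFlags.SVSFlagsAsync has the value 1.
SVS_FLAGS_ASYNC = 1


class VoiceEvents:
    """Hypothetical event sink; pywin32 maps SAPI event X to method OnX."""

    def __init__(self):
        self.log = []      # events in the order they were received
        self.done = False  # set once EndStream arrives

    def OnStartStream(self, StreamNumber, StreamPosition):
        self.log.append(("StartStream", StreamNumber))

    def OnWord(self, StreamNumber, StreamPosition, CharacterPosition, Length):
        # CharacterPosition and Length locate the word in the input text.
        self.log.append(("Word", CharacterPosition, Length))

    def OnEndStream(self, StreamNumber, StreamPosition):
        self.log.append(("EndStream", StreamNumber))
        self.done = True


def speak_with_events(text, timeout=30.0):
    """Speak text asynchronously and collect events while the engine speaks."""
    # Imported here so the module still loads on machines without pywin32.
    import pythoncom
    import win32com.client

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    handler = win32com.client.WithEvents(voice, VoiceEvents)

    # The asynchronous call returns immediately; events arrive while we pump.
    voice.Speak(text, SVS_FLAGS_ASYNC)

    deadline = time.monotonic() + timeout
    while not handler.done and time.monotonic() < deadline:
        pythoncom.PumpWaitingMessages()  # let COM deliver pending events
        time.sleep(0.01)
    return handler.log
```

Calling speak_with_events("Hello world") on a suitable machine should return a log that starts with a StartStream entry and ends with an EndStream entry. The exact Word entries depend on the installed voice.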

Examples of voice events are the beginning and the end of a text stream, and the boundaries of visemes, phonemes, words, and sentences. The SpeechVoiceEvents enumeration defines a constant for each type of voice event. Use one or more SpeechVoiceEvents constants to set the EventInterests property of a voice object. Only the types of events specified by the EventInterests property will be sent by the TTS engine. The default setting of this property specifies all voice event types except AudioLevel.
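Because EventInterests is a combination of SpeechVoiceEvents constants, it behaves like a bit mask. The sketch below mirrors the enumeration as a Python IntFlag to show how interests combine. The member names and numeric values are reproduced from memory of the SpeechVoiceEvents reference page and should be checked against it. The names DEFAULT_INTERESTS and lip_sync_interests are hypothetical.

```python
from enum import IntFlag


class SpeechVoiceEvents(IntFlag):
    """Local mirror of the SAPI enumeration (values are assumptions)."""
    SVEStartInputStream = 2
    SVEEndInputStream = 4
    SVEVoiceChange = 8
    SVEBookmark = 16
    SVEWordBoundary = 32
    SVEPhoneme = 64
    SVESentenceBoundary = 128
    SVEViseme = 256
    SVEAudioLevel = 512
    SVEPrivate = 32768
    SVEAllEvents = 33790  # bitwise OR of every flag above


# The default described above: every event type except AudioLevel.
DEFAULT_INTERESTS = SpeechVoiceEvents(
    SpeechVoiceEvents.SVEAllEvents.value & ~SpeechVoiceEvents.SVEAudioLevel.value
)

# An application that only animates a mouth and highlights words could
# narrow its interests to two event types.
lip_sync_interests = (
    SpeechVoiceEvents.SVEViseme | SpeechVoiceEvents.SVEWordBoundary
)
```

With a real voice object, the integer value of such a mask is what would be assigned to the EventInterests property.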

When using Visual Basic, you must use the "WithEvents" keyword to define an SpVoice object which receives events.

Events in file streams

When a voice object speaks into a filestream object, the TTS engine will embed event data in the file stream if all the following conditions are true:

  • The voice object is defined using the "WithEvents" keyword
  • The voice object's EventInterests property specifies at least one event type
  • The audio output contains event conditions of a type specified in the voice's EventInterests
  • The filestream object is opened for writing with its "DoEvents" parameter True

When the TTS engine speaks a filestream object which contains embedded events, it will send events to the voice if all the following conditions are true:

  • The voice object is defined using the "WithEvents" keyword
  • The voice object's EventInterests property specifies at least one event type
  • The file stream contains an embedded event of a type specified in the voice's EventInterests
  • The filestream object is opened for reading with its "DoEvents" parameter True

When the TTS engine speaks a filestream object for a voice, if the voice's EventInterests specify StartStream and EndStream events, the engine will send the voice a StartStream and an EndStream event, even if these events are not embedded in the stream. If StartStream and EndStream events are also embedded in that file stream, the engine will send the voice two StartStream events and two EndStream events: one pair generated for the playback itself and one pair read from the stream.
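The playback rules above can be summarized as a small model. The following Python sketch does not call SAPI. It only counts which events a voice would receive under one reading of the conditions listed in this section. The function name, its parameters, and the use of plain event-name strings are hypothetical. The model also assumes that the engine-generated StartStream and EndStream events do not depend on the "DoEvents" setting, which the text does not state explicitly.

```python
from collections import Counter

# Event types the engine generates itself whenever it speaks a stream.
STREAM_EVENTS = ("StartStream", "EndStream")


def playback_event_counts(embedded, interests, with_events=True, do_events=True):
    """Count the events a voice would receive when a file stream is spoken.

    embedded    -- event names stored in the file stream, in order
    interests   -- set of event names selected by EventInterests
    with_events -- whether the voice was defined so that it receives events
    do_events   -- whether the stream was opened for reading with DoEvents True
    """
    counts = Counter()
    if not with_events or not interests:
        return counts  # the voice cannot receive anything

    # Generated by the engine for this playback, embedded or not.
    for name in STREAM_EVENTS:
        if name in interests:
            counts[name] += 1

    # Embedded events are replayed only if the stream was opened with DoEvents.
    if do_events:
        for name in embedded:
            if name in interests:
                counts[name] += 1
    return counts
```

For a stream with embedded StartStream, Word, Word, and EndStream events and interests covering all three types, the model counts two StartStream events, two EndStream events, and two Word events, which matches the doubling described above.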

Automation Interfaces

The SpVoice (Events) automation object has the following elements:

Events
AudioLevel Event
Bookmark Event
EndStream Event
EnginePrivate Event
Phoneme Event
Sentence Event
StartStream Event
Viseme Event
VoiceChange Event
Word Event