Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

SPEVENTENUM

Microsoft Speech Platform

SPEVENTENUM lists the events possible from the Microsoft Speech Platform.

Developers should use the CSpEvent helper class to decode events easily and clearly.

`

typedef enum SPEVENTENUM
{
    SPEI_UNDEFINED,

    //--- TTS engine
    SPEI_START_INPUT_STREAM,
    SPEI_END_INPUT_STREAM,
    SPEI_VOICE_CHANGE,
    SPEI_TTS_BOOKMARK,
    SPEI_WORD_BOUNDARY,
    SPEI_PHONEME,
    SPEI_SENTENCE_BOUNDARY,
    SPEI_VISEME,
    SPEI_TTS_AUDIO_LEVEL,

    //--- Engine vendors use these reserved bits
    SPEI_TTS_PRIVATE,
    SPEI_MIN_TTS,
    SPEI_MAX_TTS,

    //--- Speech Recognition
    SPEI_END_SR_STREAM,
    SPEI_SOUND_START,
    SPEI_SOUND_END,
    SPEI_PHRASE_START,
    SPEI_RECOGNITION,
    SPEI_HYPOTHESIS,
    SPEI_SR_BOOKMARK,
    SPEI_PROPERTY_NUM_CHANGE,
    SPEI_PROPERTY_STRING_CHANGE,
    SPEI_FALSE_RECOGNITION,
    SPEI_INTERFERENCE,
    SPEI_REQUEST_UI,
    SPEI_RECO_STATE_CHANGE,
    SPEI_START_SR_STREAM,
    SPEI_RECO_OTHER_CONTEXT,
    SPEI_SR_AUDIO_LEVEL,
    SPEI_SR_RETAINEDAUDIO,

    //--- Engine vendors use this reserved value.
    SPEI_SR_PRIVATE,

    SPEI_ACTIVE_CATEGORY_CHANGED,

    //--- Reserved for system use.
    SPEI_RESERVED5,
    SPEI_RESERVED6,

    SPEI_MIN_SR,
    SPEI_MAX_SR,

    //--- Reserved: Do not use
    SPEI_RESERVED1,
    SPEI_RESERVED2,
    SPEI_RESERVED3

} SPEVENTENUM;

`
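The SPEI_MIN_TTS/SPEI_MAX_TTS and SPEI_MIN_SR/SPEI_MAX_SR entries bracket the TTS and SR event ranges, so a simple range check can tell which subsystem an event ID belongs to. The following is a minimal, portable sketch of that idea. It does not include the platform header: the `SketchEvent` enum, its numeric values, and the `classify_event` function are all hypothetical stand-ins introduced for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for SPEVENTENUM. The numeric values are placeholders
// for this sketch; real code takes the enum from the Speech Platform header.
enum SketchEvent {
    SKETCH_UNDEFINED     = 0,
    SKETCH_MIN_TTS       = 1,
    SKETCH_WORD_BOUNDARY = 5,
    SKETCH_MAX_TTS       = 15,
    SKETCH_MIN_SR        = 34,
    SKETCH_RECOGNITION   = 38,
    SKETCH_MAX_SR        = 55
};

enum class EventSource { Unknown, Tts, Sr };

// Classify an event ID by checking which bracketed range it falls in.
inline EventSource classify_event(int id) {
    if (id >= SKETCH_MIN_TTS && id <= SKETCH_MAX_TTS) return EventSource::Tts;
    if (id >= SKETCH_MIN_SR && id <= SKETCH_MAX_SR) return EventSource::Sr;
    return EventSource::Unknown;
}

// Usage: spot checks against the placeholder values above.
inline void check_classification() {
    assert(classify_event(SKETCH_WORD_BOUNDARY) == EventSource::Tts);
    assert(classify_event(SKETCH_RECOGNITION) == EventSource::Sr);
    assert(classify_event(SKETCH_UNDEFINED) == EventSource::Unknown);
}
```

In real code the same comparison would be written against SPEI_MIN_TTS, SPEI_MAX_TTS, SPEI_MIN_SR, and SPEI_MAX_SR directly, so it stays correct whatever values the header assigns.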

Elements

  • SPEI_START_INPUT_STREAM
    The input stream (text or audio) from a Speak or SpeakStream call has begun synthesizing to the output. The event is fired by the Speech Platform.

  • SPEI_END_INPUT_STREAM
    The input stream (text or audio) from a Speak or SpeakStream call has finished synthesizing to the output. The event is fired by the Speech Platform.

  • SPEI_VOICE_CHANGE
    The Speech Platform fires this event for voice changes within a single input stream of a Speak call. wParam is SPF_PERSIST_XML if the current Speak call specifies SPF_PERSIST_XML; otherwise, it is zero. lParam is the current voice object token. elParamType must be SPET_LPARAM_IS_TOKEN.

  • SPEI_TTS_BOOKMARK
    The bookmark element is used to insert a bookmark into the output stream. If an application specifies interest in bookmark events, it will receive the bookmark events during synthesis. wParam is the current bookmark name (in base 10) converted to a long integer. If the name of the current bookmark is not an integer, wParam will be zero. lParam is the bookmark string. elParamType must be SPET_LPARAM_IS_STRING.

  • SPEI_WORD_BOUNDARY
    A word is beginning to synthesize. Markup language (XML) markers are counted in the boundaries and offsets. wParam is the character length of the word in the current input stream being synthesized. lParam is the character position within the current text input stream of the word being synthesized.

  • SPEI_PHONEME
    A phoneme was returned by the TTS engine. The high word of wParam is the duration, in milliseconds, of the current phoneme element. The low word of wParam is the ID of the next phoneme element. The high word of lParam is the phoneme element feature defined in SPVFEATURE. This value will be zero if the current phoneme element is not a primary stress or emphasis. The low word of lParam is the ID of the current phoneme element being synthesized.

    When the engine synthesizes a phoneme composed of more than one phoneme element, it raises an event for each element. For example, when a Japanese TTS engine speaks the phoneme "KYA," which is composed of the phoneme elements "KI" and "XYA," it raises an SPEI_PHONEME event for each element. Because the element "KI" in this case modifies the sound of the element following it, rather than initiating a sound, the duration of its SPEI_PHONEME event is zero.

  • SPEI_SENTENCE_BOUNDARY
    A sentence is beginning to synthesize. wParam is the character length of the sentence including punctuation in the current input stream being synthesized. lParam is the character position within the current text input stream of the sentence being synthesized.

  • SPEI_VISEME
    A viseme was determined by the synthesis engine. The high word of wParam is the duration, in milliseconds, of the current viseme. The low word of wParam is the next viseme, of type SPVISEMES. The high word of lParam is the viseme feature defined in SPVFEATURE. This value will be zero if the current viseme is not a primary stress or emphasis. The low word of lParam is the current viseme being synthesized.

  • SPEI_TTS_AUDIO_LEVEL
    This event is fired by the Speech Platform. lParam is 0, and wParam is the current audio level from zero to 100.

  • SPEI_TTS_PRIVATE
    Reserved for private/internal use by the TTS Engine.

  • SPEI_MIN_TTS
    Minimum event enumeration value for TTS events.

  • SPEI_MAX_TTS
    Maximum event enumeration value for TTS events.

  • SPEI_END_SR_STREAM
    The SR engine has finished receiving an audio input stream. LPARAM holds the SR engine's final HRESULT code (see CSpEvent::EndStreamResult). WPARAM indicates whether the audio input stream object was released (see CSpEvent::InputStreamReleased).

  • SPEI_SOUND_START
    The SR engine determined that audible sound is available through the input stream.

  • SPEI_SOUND_END
    The SR engine has determined that audible sound is no longer available through the input stream, or that the sound stream has been inactive for a period.

  • SPEI_PHRASE_START
    The SR engine is starting to recognize a phrase. Note that this MUST be followed by either an SPEI_FALSE_RECOGNITION or SPEI_RECOGNITION event.

  • SPEI_RECOGNITION
    The SR engine is returning a full recognition, that is, its best guess at a text representation of the audio data. lParam is a pointer to an ISpRecoResult object (see CSpEvent::RecoResult).

  • SPEI_HYPOTHESIS
    The SR engine is returning a partial phrase recognition, effectively its best guess up to that point in the stream. lParam is a pointer to an ISpRecoResult object (see CSpEvent::RecoResult).

  • SPEI_SR_BOOKMARK
    A bookmark event is returned when the SR engine has processed the stream up to the position of a bookmark. lParam is an application-specified value set using ISpRecoContext::Bookmark. wParam is SPREF_AutoPause if ISpRecoContext::Bookmark was called with SPBO_PAUSE, and NULL otherwise.

  • SPEI_PROPERTY_NUM_CHANGE
    A numeric property supported by the SR engine was changed. LPARAM is a string pointer to the name of the property that changed (see CSpEvent::PropertyName). WPARAM contains the new value (see CSpEvent::PropertyNumValue).

  • SPEI_PROPERTY_STRING_CHANGE
    A string property supported by the SR engine was changed. LPARAM is a string pointer to the name of the property that changed (see CSpEvent::PropertyName). The new property value immediately follows the null terminator of the property name (see CSpEvent::PropertyStringValue).

  • SPEI_FALSE_RECOGNITION
    Apparent speech without valid recognition. An SR engine can optionally return a result object, which will be referenced by the LPARAM member (see CSpEvent::RecoResult).

  • SPEI_INTERFERENCE
    The SR engine determined that interference in the sound stream is preventing a successful recognition. lParam is any combination of SPINTERFERENCE flags (see CSpEvent::Interference).

  • SPEI_REQUEST_UI
    The SR engine is requesting that a specific user interface be displayed. LPARAM is a null-terminated string (see CSpEvent::RequestTypeOfUI). Microsoft engines do not support display of graphical user interfaces (GUIs) in the Speech Platform. Calls to any ::DisplayUI method will fail.

  • SPEI_RECO_STATE_CHANGE
    The recognizer state has changed. WPARAM is the new recognizer state (see SPRECOSTATE and CSpEvent::RecoState).

  • SPEI_START_SR_STREAM
    The SR engine has reached the start of a new audio stream.

  • SPEI_SR_AUDIO_LEVEL
    The audio input stream object fires this event. wParam is the current audio level from zero to 100.

  • SPEI_SR_RETAINEDAUDIO
    Returns the audio that was sent to the recognizer.

  • SPEI_RECO_OTHER_CONTEXT
    A recognition was sent to another context.

  • SPEI_SR_PRIVATE
    Reserved for private/internal use by the SR engine.

  • SPEI_ACTIVE_CATEGORY_CHANGED
    The active category on the speech recognizer has changed. wParam and lParam are null.

  • SPEI_RESERVED5
    Reserved for system use.

  • SPEI_RESERVED6
    Reserved for system use.

  • SPEI_MIN_SR
    Minimum event enumeration value for speech recognition events.

  • SPEI_MAX_SR
    Maximum event enumeration value for speech recognition events.

  • SPEI_RESERVED1
    Reserved for internal use by the Speech Platform. See SPFEI Remarks section.

  • SPEI_RESERVED2
    Reserved for internal use by the Speech Platform. See SPFEI Remarks section.

  • SPEI_RESERVED3
    Reserved for future use. Do not use.
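Several of the elements above pack more than one value into wParam or lParam. As a rough, portable illustration of two of those layouts, the sketch below splits the high and low words described for SPEI_PHONEME and SPEI_VISEME, and locates the property value that follows the property name's null terminator for SPEI_PROPERTY_STRING_CHANGE. It does not use the platform headers: `high_word`, `low_word`, `SpeechUnit`, `decode_speech_unit`, and `property_string_value` are hypothetical names, and the parameters are assumed to fit in 32 bits. In a real application, CSpEvent is the recommended way to read these fields.

```cpp
#include <cassert>
#include <cstdint>
#include <cwchar>

// Hypothetical helpers: split a 32-bit value into its 16-bit halves.
// (On Windows, the HIWORD and LOWORD macros do the same job.)
inline std::uint16_t high_word(std::uint32_t v) {
    return static_cast<std::uint16_t>(v >> 16);
}
inline std::uint16_t low_word(std::uint32_t v) {
    return static_cast<std::uint16_t>(v & 0xFFFFu);
}

// Hypothetical decoded view of an SPEI_PHONEME or SPEI_VISEME payload.
struct SpeechUnit {
    std::uint16_t duration_ms; // high word of wParam
    std::uint16_t next_id;     // low word of wParam
    std::uint16_t feature;     // high word of lParam (zero if no stress or emphasis)
    std::uint16_t current_id;  // low word of lParam
};

// Unpack the four fields from the two parameters.
inline SpeechUnit decode_speech_unit(std::uint32_t wParam, std::uint32_t lParam) {
    return SpeechUnit{high_word(wParam), low_word(wParam),
                      high_word(lParam), low_word(lParam)};
}

// SPEI_PROPERTY_STRING_CHANGE layout: "name\0value\0". Given a pointer to
// the name, the value starts one character past the name's terminator.
inline const wchar_t* property_string_value(const wchar_t* name) {
    assert(name != nullptr);
    return name + std::wcslen(name) + 1;
}
```

For example, calling `decode_speech_unit((120u << 16) | 7u, 42u)` yields a duration of 120 milliseconds, a next ID of 7, a feature of zero, and a current ID of 42.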