ISpeechPhraseElement Code Example (SAPI 5.4)

Microsoft Speech API 5.4

Interface: ISpeechPhraseElement


The following Visual Basic form code displays the properties of the ISpeechPhraseElement object. The ISpeechPhraseElement object is contained in the ISpeechRecoResult object returned by a RecoContext's Recognition event. This code shows two ways to create a recognition result object: by recognizing audio (here, a .wav file spoken by a voice) and by calling the EmulateRecognition method.

The example also demonstrates the use of the SpFileStream object in conjunction with the AudioOutputStream of the voice and the AudioInputStream of the recognizer.

To run this code, create a form with the following controls:

  • Two command buttons called Command1 and Command2
  • A list box called List1
  • A text box called Text1

Paste this code into the Declarations section of the form.

The code will set the command button captions as shown in the illustration.

The Form_Load procedure creates a recognizer, a recognition context, and a grammar object. It loads the grammar object with sol.xml, the Solitaire grammar from the SAPI sample code. It then activates the command and control (C and C) and dictation components of the grammar, and places a Solitaire command in the text box. Users can enter whatever text they like in the text box, but best recognition results will be obtained from phrases matching the rules in the C and C grammar; for example, sentences such as "Move the black ten to the jack of diamonds," or "Play the red queen."

The command button captioned Recognition speaks text from the text box into an audio file, and then performs speech recognition of that file. The command button captioned EmulateRecognition simply calls the EmulateRecognition method.

When the speech recognition (SR) engine completes recognition, it raises a Recognition event that returns an ISpeechRecoResult object. The Recognition procedure iterates over each ISpeechPhraseElement in the Elements collection of the result's PhraseInfo property (a collection of ISpeechPhraseElement objects), and displays selected phrase element properties in columns in the list box.


AudioStreamOffset and AudioSizeBytes

The first two columns show the AudioStreamOffset and AudioSizeBytes properties. These two properties indicate the boundaries of an element in the input audio stream. AudioStreamOffset points to the beginning of the element and AudioSizeBytes is the element's length. The sum of an element's AudioStreamOffset and AudioSizeBytes is the same as the AudioStreamOffset of the next element.

In an ISpeechPhraseElement object created by the EmulateRecognition method, the AudioStreamOffset and AudioSizeBytes properties are zero.
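
The adjacency relationship described above can be checked with a small helper. This is a minimal sketch and not part of the original sample: the function name ElementsAreContiguous is hypothetical, and it assumes the offsets fit in a Long, as the sample's own A1 and A2 variables do.

```vb
' Hypothetical helper (not part of the original sample).
' Returns True if every element ends exactly where the next one begins.
Private Function ElementsAreContiguous _
   (ByVal Elements As SpeechLib.ISpeechPhraseElements) As Boolean

    Dim ii As Long
    Dim EndOfThis As Long

    ElementsAreContiguous = True

    ' Compare each element with its successor (the collection is zero-based)
    For ii = 0 To Elements.Count - 2
        EndOfThis = Elements.Item(ii).AudioStreamOffset + _
                    Elements.Item(ii).AudioSizeBytes
        If EndOfThis <> Elements.Item(ii + 1).AudioStreamOffset Then
            ElementsAreContiguous = False
            Exit Function
        End If
    Next ii
End Function
```

It could be called from the Recognition event procedure, for example as Debug.Print ElementsAreContiguous(Result.PhraseInfo.Elements). For an emulated result the check passes trivially, because every offset and size is zero.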

AudioTimeOffset and AudioSizeTime

The next two columns are the AudioTimeOffset and AudioSizeTime properties, which delimit the phrase elements in 100-nanosecond units of time. AudioTimeOffset indicates the beginning time of the element, and AudioSizeTime is its time length. The sum of an element's AudioTimeOffset and AudioSizeTime is approximately the same as the AudioTimeOffset of the next element.

In an ISpeechPhraseElement object created by the EmulateRecognition method, the AudioTimeOffset and AudioSizeTime properties are zero.
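
Because 100-nanosecond units are awkward to read, it may help to convert them before display. The following is a minimal sketch; the function name HundredNsToMs is hypothetical and is not part of the original sample.

```vb
' Hypothetical helper (not part of the original sample).
' Converts a time value in 100-nanosecond units to milliseconds.
Private Function HundredNsToMs(ByVal Units As Long) As Double
    ' 1 millisecond = 10,000 units of 100 nanoseconds
    HundredNsToMs = Units / 10000#
End Function
```

For instance, an AudioSizeTime of 5,000,000 units corresponds to 500 milliseconds.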


DisplayAttributes

The next column is the DisplayAttributes property, which defines how the text of the element is displayed relative to text from other phrase elements. The SpeechDisplayAttributes enumeration lists the possible values of this property. All elements in the example above have a DisplayAttributes property of two, which is the value of the SDA_One_Trailing_Space constant, indicating that the element should be displayed with a trailing space.
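
One way to act on DisplayAttributes when assembling display text is sketched below. The function name TrailingSpaces is hypothetical. The sketch assumes that the SpeechDisplayAttributes values can be tested as bit flags and that the enumeration also defines SDA_Two_Trailing_Spaces; only SDA_One_Trailing_Space is named in this topic, so verify both assumptions against the enumeration's reference page.

```vb
' Hypothetical helper (not part of the original sample).
' Returns the trailing spaces requested by a DisplayAttributes value.
' Assumption: the SDA_ constants are bit flags.
Private Function TrailingSpaces(ByVal Attrs As Long) As String
    If (Attrs And SDA_Two_Trailing_Spaces) <> 0 Then
        TrailingSpaces = "  "
    ElseIf (Attrs And SDA_One_Trailing_Space) <> 0 Then
        TrailingSpaces = " "
    Else
        TrailingSpaces = ""
    End If
End Function
```

A caller could then build a sentence with an expression such as PhraseElem.DisplayText & TrailingSpaces(PhraseElem.DisplayAttributes).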

EngineConfidence, ActualConfidence, and RequiredConfidence

The three numbers in parentheses are the three property values involving confidence in the recognition of the phrase element. The first of these is the EngineConfidence property, which represents the SR engine's level of confidence in the recognition. The ActualConfidence property reduces the EngineConfidence to one of three confidence levels: low, normal or high. The RequiredConfidence property specifies the confidence level that the ActualConfidence property must equal or surpass.

In an ISpeechPhraseElement object created by the EmulateRecognition method, the EngineConfidence and RequiredConfidence properties are zero. If the emulated phrase matches C and C rules, the ActualConfidence property is one; otherwise it is zero.
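
The three-level ActualConfidence value can be turned into a readable label. The sketch below is not part of the original sample: the function name ConfidenceName is hypothetical, and it assumes the SpeechEngineConfidence constants SECLowConfidence, SECNormalConfidence, and SECHighConfidence from the SAPI type library.

```vb
' Hypothetical helper (not part of the original sample).
' Maps an ActualConfidence or RequiredConfidence value to a label.
Private Function ConfidenceName(ByVal Level As Long) As String
    Select Case Level
        Case SECLowConfidence
            ConfidenceName = "low"
        Case SECNormalConfidence
            ConfidenceName = "normal"
        Case SECHighConfidence
            ConfidenceName = "high"
        Case Else
            ConfidenceName = "unknown (" & Level & ")"
    End Select
End Function
```

In the Recognition event procedure, Format(C2) could be replaced with ConfidenceName(C2) to show the label instead of the number.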

DisplayText and LexicalForm

The DisplayText property is the next column in the example. It consists of the recognized text of the phrase element, with normalization of numbers, ordinals, and currency values. The LexicalForm property, not shown in this example, returns the same text, but without normalization.


Pronunciation

To the right of the DisplayText is data from the Pronunciation property. Each number represents a phoneme, and the phonemes represent the pronunciation of the phrase element.

In an ISpeechPhraseElement object created by the EmulateRecognition method, the Pronunciation property is Empty.

RetainedStreamOffset and RetainedSizeBytes

The RetainedStreamOffset and RetainedSizeBytes properties are not shown in this example. If the current recognition context is retaining audio data, then RetainedStreamOffset and RetainedSizeBytes are the same as AudioStreamOffset and AudioSizeBytes, respectively; otherwise, both properties are zero.

In an ISpeechPhraseElement object created by the EmulateRecognition method, the RetainedStreamOffset and RetainedSizeBytes properties are zero.

Option Explicit

Const WAVEFILENAME = "C:\ISpeechPhraseElement.wav"

Dim MyRecognizer As SpeechLib.SpInprocRecognizer
Dim MyGrammar As SpeechLib.ISpeechRecoGrammar
Dim MyFileStream As SpeechLib.SpFileStream
Dim PhraseElem As SpeechLib.ISpeechPhraseElement
Dim MyVoice As SpeechLib.SpVoice

Dim WithEvents MyRecoContext As SpeechLib.SpInProcRecoContext

Private Sub Command1_Click()
    On Error GoTo EH

    Set MyFileStream = MakeWAVFileFromText(Text1.Text, WAVEFILENAME)
    MyFileStream.Open WAVEFILENAME
    Set MyRecognizer.AudioInputStream = MyFileStream

EH:
    If Err.Number Then ShowErrMsg
End Sub

Private Sub Command2_Click()
    On Error GoTo EH

    MyRecoContext.Recognizer.EmulateRecognition Text1.Text

EH:
    If Err.Number Then ShowErrMsg
End Sub

Private Sub Form_Load()
    On Error GoTo EH

    ' Create Recognizer, RecoContext, Grammar, and Voice
    Set MyRecognizer = New SpInprocRecognizer
    Set MyRecoContext = MyRecognizer.CreateRecoContext
    Set MyGrammar = MyRecoContext.CreateGrammar(16)
    Set MyVoice = New SpVoice
    Set MyVoice.Voice = MyVoice.GetVoices("gender=male").Item(0)

    ' Load Grammar with solitaire XML, set active
    MyGrammar.CmdLoadFromFile "C:\sol.xml", SLOStatic
    MyGrammar.CmdSetRuleIdState 0, SGDSActive               'Set C and C rules active
    MyGrammar.DictationSetState SGDSActive                  'Set Dictation active

    Text1.Text = "play the eight of clubs"
    Command1.Caption = "&Recognition"
    Command2.Caption = "&EmulateRecognition"

EH:
    If Err.Number Then ShowErrMsg
End Sub

Private Function PhonesToString(ByVal arrV As Variant) As String
    Dim ii As Integer, S As String

    On Error GoTo EH

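    ' An emulated result has no pronunciation data, so arrV may be Empty
    If IsEmpty(arrV) Then
        PhonesToString = ""
    Else
        ' Join the phoneme IDs into a comma-separated string
        For ii = 0 To UBound(arrV)
            If Len(S) Then
                S = S & "," & arrV(ii)
            Else
                S = arrV(ii)
            End If
        Next ii
        PhonesToString = S
    End If

EH:
    If Err.Number Then ShowErrMsg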
End Function

Private Sub MyRecoContext_Recognition _
   (ByVal StreamNumber As Long, _
    ByVal StreamPosition As Variant, _
    ByVal RecognitionType As SpeechLib.SpeechRecognitionType, _
    ByVal Result As SpeechLib.ISpeechRecoResult)

    Dim X As String
    Dim T As String
    Dim A1 As Long, A2 As Long
    Dim T1 As Long, T2 As Long
    Dim C1 As Single, C2 As Integer, C3 As Integer

    On Error GoTo EH

    For Each PhraseElem In Result.PhraseInfo.Elements

        'Audio data
        A1 = PhraseElem.AudioStreamOffset
        A2 = PhraseElem.AudioSizeBytes
        X = Format(A1, "000000") & " " & Format(A2, "000000") & "  "

        'Time data
        T1 = PhraseElem.AudioTimeOffset
        T2 = PhraseElem.AudioSizeTime
        X = X & Format(T1, "000000000") & " " & Format(T2, "000000000") & "  "

        'Display attributes
        X = X & Format(PhraseElem.DisplayAttributes) & " "

        'Confidence values
        C1 = PhraseElem.EngineConfidence
        C2 = PhraseElem.ActualConfidence
        C3 = PhraseElem.RequiredConfidence
        T = "(" & Format(C1) & " " & Format(C2) & " " & Format(C3) & ")"
        X = X & Left(T & "         ", 14)

        'Text and pronunciation
        X = X & Left(PhraseElem.DisplayText & "              ", 14)
        X = X & PhonesToString(PhraseElem.Pronunciation)

        List1.AddItem X
    Next

EH:
    If Err.Number Then ShowErrMsg
End Sub

Private Sub MyRecoContext_EndStream _
   (ByVal StreamNumber As Long, _
    ByVal StreamPosition As Variant, _
    ByVal StreamReleased As Boolean)

    On Error GoTo EH

    'Recognition uses the Filestream, EmulateReco does not
    If ActiveControl.Caption = Command1.Caption Then MyFileStream.Close
    List1.AddItem ""

EH:
    If Err.Number Then ShowErrMsg
End Sub

Private Function MakeWAVFileFromText _
   (ByVal strText As String, _
    ByVal strFName As String) _
    As SpFileStream

    On Error GoTo EH

    ' Declare identifiers:
    Dim FileStream As SpFileStream
    Dim Voice As SpVoice

    ' Instantiate Voice and FileStream objects:
    Set Voice = New SpVoice
    Set FileStream = New SpFileStream

    ' Open specified .wav file, set voice output
    ' to file, and speak synchronously:
    FileStream.Open strFName, SSFMCreateForWrite, True
    Set Voice.AudioOutputStream = FileStream
    Voice.Speak strText, SVSFIsXML

    ' Close file and return reference to FileStream object:
    FileStream.Close
    Set MakeWAVFileFromText = FileStream

EH:
    If Err.Number Then ShowErrMsg
End Function

Private Sub ShowErrMsg()

    ' Declare identifiers:
    Const NL = vbNewLine
    Dim T As String

    T = "Desc: " & Err.Description & NL
    T = T & "Err #: " & Err.Number
    MsgBox T, vbExclamation, "Run-Time Error"

End Sub