SpeechRecoContext SoundStart Event (SAPI 5.4)

Microsoft Speech API 5.4

Interface: ISpeechRecoContext Events

SoundStart Event

The SoundStart event occurs when the SR engine encounters the start of sound in the audio input stream.

SoundStart indicates that the sound level is high enough to be a voice. When that sound stops, the engine generates a SoundEnd event. A recognition attempt occurs only after a SoundEnd event, so a long period of continuous speech may take correspondingly long to process.

Light background noise does not register as an input sound. Conversely, a loud noise is treated as the start of an input sound. If the sound remains constant, a time-out occurs and a SoundEnd event is sent.

SpeechRecoContext.SoundStart(
     StreamNumber As Long,
     StreamPosition As Variant
)


  • StreamNumber
    Specifies the stream number.
  • StreamPosition
    Specifies the position within the stream. If downsampling an audio stream, StreamPosition will be the byte position within the converted stream.
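Because StreamPosition is a byte offset, the positions reported by a SoundStart event and the SoundEnd event that follows it can be subtracted to estimate how much audio the sound covered. The following is a minimal sketch rather than part of the original sample. It assumes a separate form with a label called Label1, and the module-level variable soundStartPos is a hypothetical name introduced for illustration.

```vb
Option Explicit

Public WithEvents RC As SpSharedRecoContext
Public myGrammar As ISpeechRecoGrammar

' Hypothetical helper variable: stream position of the most recent SoundStart.
' It stays Empty while no sound is in progress.
Private soundStartPos As Variant

Private Sub Form_Load()
    Set RC = New SpSharedRecoContext
    Set myGrammar = RC.CreateGrammar
    myGrammar.DictationSetState SGDSActive
End Sub

Private Sub RC_SoundStart(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    ' Remember where the sound began.
    soundStartPos = StreamPosition
    Label1.Caption = "Sound in progress on stream " & StreamNumber
End Sub

Private Sub RC_SoundEnd(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    ' Report the byte span only if a matching SoundStart was seen.
    If Not IsEmpty(soundStartPos) Then
        Label1.Caption = "Sound spanned " & _
            CStr(CDec(StreamPosition) - CDec(soundStartPos)) & " bytes"
        soundStartPos = Empty
    End If
End Sub
```

The result is a byte count, not a duration. Converting it to time would require the bytes-per-second rate of the audio format in use, which this sketch does not look up.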


For speech processing, the SR engine must go through the following sequence: stream start, sound start, and phrase start. A stream start indicates that a valid stream is ready for audio input. The stream persists unless the recognition context is disabled or the associated grammar is deactivated. The sound start indicates that a sound level has been detected. However, the SR engine can abandon that recognition attempt if the input sound is questionable, for example if the sound stays at a constant level or is above or below predetermined sound levels. If the sound level is acceptable and variable, a phrase start is initiated and is assumed to be the beginning of a recognition attempt.
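The sequence described above can be observed by logging each event as it arrives. The following is a sketch under stated assumptions, not a definitive implementation: it assumes a separate form containing a list box called List1 (a hypothetical control not used elsewhere on this page), it assumes the recognition context's default event interests include the StartStream and PhraseStart events, and LogEvent is a helper name introduced here for illustration.

```vb
Option Explicit

Public WithEvents RC As SpSharedRecoContext
Public myGrammar As ISpeechRecoGrammar

Private Sub Form_Load()
    Set RC = New SpSharedRecoContext
    Set myGrammar = RC.CreateGrammar
    myGrammar.DictationSetState SGDSActive
End Sub

' Hypothetical helper: appends one line per event to the assumed List1 control.
Private Sub LogEvent(ByVal EventName As String, ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    List1.AddItem EventName & " (stream " & StreamNumber & ", position " & CStr(StreamPosition) & ")"
End Sub

Private Sub RC_StartStream(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    LogEvent "Stream start", StreamNumber, StreamPosition
End Sub

Private Sub RC_SoundStart(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    LogEvent "Sound start", StreamNumber, StreamPosition
End Sub

Private Sub RC_PhraseStart(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    LogEvent "Phrase start", StreamNumber, StreamPosition
End Sub

Private Sub RC_SoundEnd(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    LogEvent "Sound end", StreamNumber, StreamPosition
End Sub
```

If the engine rejects a sound as questionable, the log would be expected to show a sound start and sound end with no phrase start between them.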


The following Visual Basic form code demonstrates the use of the SoundStart and SoundEnd events. The application displays notifications that a sound has begun or ended, along with the stream position at which the sound ended. It also displays the text of a successful recognition.

To run this code, create a form with the following controls:

  • Two labels called Label1 and Label2

Paste this code into the Declarations section of the form.

The Form_Load procedure creates and activates a dictation grammar.

Public WithEvents RC As SpSharedRecoContext
Public myGrammar As ISpeechRecoGrammar

Private Sub Form_Load()
    Set RC = New SpSharedRecoContext
    Set myGrammar = RC.CreateGrammar
    myGrammar.DictationSetState SGDSActive
End Sub

' Show the recognized text after a successful recognition.
Private Sub RC_Recognition(ByVal StreamNumber As Long, ByVal StreamPosition As Variant, ByVal RecognitionType As SpeechLib.SpeechRecognitionType, ByVal Result As SpeechLib.ISpeechRecoResult)
    Label1.Caption = Result.PhraseInfo.GetText
End Sub

' Show the stream position at which the sound stopped.
Private Sub RC_SoundEnd(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    Label2.Caption = "Sound end at position: " & StreamPosition
End Sub

' Show that the engine has detected the start of a sound.
Private Sub RC_SoundStart(ByVal StreamNumber As Long, ByVal StreamPosition As Variant)
    Label2.Caption = "Sound start"
End Sub