How to output transcription at the word level

Sophie 0 Reputation points
2024-05-17T08:41:50.08+00:00

With the provided callback function, the text is output as you describe: either after a short pause or after a maximum of 15 seconds. Is it possible to output it word by word, so that the text appears while the person is still speaking?


def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('\tText={}'.format(evt.result.text))
        print('\tSpeaker ID={}'.format(evt.result.speaker_id))
Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Gowtham CP 2,920 Reputation points
    2024-05-17T09:11:07.3166667+00:00

    Hello Sophie,

    Thanks for reaching out on Microsoft Q&A!

    import azure.cognitiveservices.speech as speechsdk

    def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            words = evt.result.text.split()  # Split the finalized text on whitespace
            for word in words:
                print(f"\tWord: {word}")  # Print each word on its own line
            print(f"\tSpeaker ID={evt.result.speaker_id}")

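    To print the result word by word, I modified the callback to split the recognized text into individual words and print each one on its own line. Note that the "transcribed" event still fires only once a phrase is finalized (after a short pause or the 15-second maximum), so this changes the formatting of the output, not its timing. To see text while speaking, connect a handler to the transcriber's "transcribing" event instead, which delivers interim hypotheses that may still be revised before the final result arrives.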
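
As a rough sketch of the live-output idea: if the callback receives a growing interim hypothesis for the current phrase, it can compare each new hypothesis with the previous one and print only the words that were added or changed. The helper `new_words` and the class `PartialPrinter` below are hypothetical names used for illustration and are not part of the Speech SDK. The wiring in the trailing comments assumes a configured `conversation_transcriber` whose `transcribing` and `transcribed` events are connected as shown; please check this against the SDK version you use.

```python
from typing import List


def new_words(previous: str, current: str) -> List[str]:
    """Return the words of `current` that come after its shared prefix with `previous`."""
    prev_words = previous.split()
    cur_words = current.split()
    # Count the leading words that both hypotheses have in common.
    shared = 0
    for old, new in zip(prev_words, cur_words):
        if old != new:
            break
        shared += 1
    return cur_words[shared:]


class PartialPrinter:
    """Remembers the last interim hypothesis and prints only what changed."""

    def __init__(self) -> None:
        self.last_text = ""

    def update(self, text: str) -> List[str]:
        added = new_words(self.last_text, text)
        for word in added:
            print(f"\tWord: {word}")
        self.last_text = text
        return added

    def reset(self) -> None:
        # Call this when the final result for a phrase arrives.
        self.last_text = ""


# Hypothetical wiring (assumes `conversation_transcriber` is already configured):
# printer = PartialPrinter()
# conversation_transcriber.transcribing.connect(lambda evt: printer.update(evt.result.text))
# conversation_transcriber.transcribed.connect(lambda evt: printer.reset())
```

If the service revises an earlier word in a later hypothesis, this sketch prints again from the first changed word onward, so the console can show corrections as extra lines. A UI that redraws the whole current line would avoid that.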

    If you found this solution helpful, consider accepting it.