How to output transcription at the word level

Question

With the provided callback function, the text is output as you described: either after a short pause or after a maximum of 15 seconds. Is it possible to output the text word by word, so that it can be seen while the person is still speaking?


def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('	Text={}'.format(evt.result.text))
        print('	Speaker ID={}'.format(evt.result.speaker_id))

Answer

Hello Sophie,

Thanks for reaching out on Microsoft Q&A!

import azure.cognitiveservices.speech as speechsdk

# Called once per finalized phrase (the "transcribed" event).
def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        words = evt.result.text.split()  # Split the recognized text into words
        for word in words:
            print(f"\tWord: {word}")  # Output each word on its own line
        print('\tSpeaker ID={}'.format(evt.result.speaker_id))

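To print the result word by word, I modified the callback to split the recognized text into individual words and print each one on its own line. Note that the transcribed event only fires once a phrase has been finalized (after a short pause or the maximum segment length), so this change alone does not make the text appear while you are speaking. For output during speech, connect a handler to the transcriber's transcribing event, which delivers intermediate results while the phrase is still being recognized. These partial results can change until the final result arrives.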
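
For text that appears while the speaker is still talking, one option is to work from intermediate results rather than final ones. The following is a minimal sketch, not part of the Speech SDK: `WordStreamer` is a hypothetical helper that assumes each intermediate result carries the full partial text recognized so far, and returns only the words that are new or revised since the previous call. The commented lines at the end show how it might be attached to the transcriber's `transcribing` event; the variable name `conversation_transcriber` is an assumption.

```python
class WordStreamer:
    """Hypothetical helper: tracks partial text and yields only new words."""

    def __init__(self):
        self._shown = []  # words returned so far for the current phrase

    def new_words(self, partial_text):
        words = partial_text.split()
        # Count how many leading words are unchanged since the last call.
        common = 0
        for old, new in zip(self._shown, words):
            if old != new:
                break
            common += 1
        self._shown = words
        # Everything after the unchanged prefix is new or was revised.
        return words[common:]

    def reset(self):
        # Call when a phrase is finalized so the next phrase starts fresh.
        self._shown = []


# Possible wiring (assumes an existing conversation_transcriber object):
#
# streamer = WordStreamer()
#
# def transcribing_cb(evt):
#     for word in streamer.new_words(evt.result.text):
#         print(word, end=' ', flush=True)
#
# def transcribed_cb(evt):
#     streamer.reset()
#     print()
#
# conversation_transcriber.transcribing.connect(transcribing_cb)
# conversation_transcriber.transcribed.connect(transcribed_cb)
```

Because the recognizer may revise earlier words as more audio arrives, the helper re-emits words from the first point where the new partial text differs from the previous one. The printed stream can therefore contain corrections, and the final text from the transcribed event remains the authoritative result.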

If you found this solution helpful, consider accepting it.
