question

HalvorHellandBarndonVisitoAS-0848 asked, romungi-MSFT edited

Speech to text - Transcription in segments: Overlap post-processing

Hello!
We are using microsoft.cognitiveservices.speech for transcription and subtitling of video, focusing on Norwegian. We are currently testing the efficiency and accuracy of transcribing in segments. Some testing and reviews show that transcription quality can be improved by overlapping segments that are processed in parallel and then post-processing the overlap.

Are there any documented experiences with such overlap, or any recommended tools? We expect the quality gain to come from cross-referencing words in the overlapped part of both segments against the transcripts of the preceding and following segments, using either vectorized distance between words or reverse-search relevance.
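To make the idea concrete, here is a minimal sketch of one way to post-process the overlap. It is not a documented Azure tool, and it swaps the vector-distance idea for something simpler: it looks for the longest run of words that both transcripts agree on inside the overlap (using Python's difflib) and stitches the two transcripts at that run. The `Word` type and the `merge_overlap` function are hypothetical names, and the sketch assumes each word carries start and end times measured from the start of the full audio, sorted by time.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Word:
    """Hypothetical word record; times are seconds from the start of the full audio."""
    text: str
    start: float
    end: float


def merge_overlap(prev, nxt, overlap_start, overlap_end):
    """Stitch two time-sorted word lists that share [overlap_start, overlap_end]."""
    # Words of each transcript that touch the overlap window.
    prev_idx = [i for i, w in enumerate(prev) if w.end > overlap_start]
    nxt_idx = [i for i, w in enumerate(nxt) if w.start < overlap_end]
    a = [prev[i].text.lower() for i in prev_idx]
    b = [nxt[i].text.lower() for i in nxt_idx]

    # Longest run of words that both transcripts agree on inside the overlap.
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    if m.size == 0:
        # No agreement at all: fall back to cutting at the middle of the overlap.
        cut = (overlap_start + overlap_end) / 2
        return ([w for w in prev if (w.start + w.end) / 2 < cut]
                + [w for w in nxt if (w.start + w.end) / 2 >= cut])

    # Keep the earlier transcript through the agreed run, then continue with
    # the later transcript right after the same run.
    keep_prev = prev[: prev_idx[m.a] + m.size]
    keep_next = nxt[nxt_idx[m.b] + m.size:]
    return keep_prev + keep_next


# Made-up example: segment 1 ends at 10 s, segment 2 starts at 8 s.
seg1 = [Word("vi", 7.0, 7.4), Word("tester", 7.5, 8.1), Word("tale", 8.2, 8.6),
        Word("til", 8.7, 8.9), Word("tekst", 9.0, 9.5), Word("seg", 9.6, 10.0)]
seg2 = [Word("ale", 8.0, 8.6), Word("til", 8.7, 8.9), Word("tekst", 9.0, 9.5),
        Word("i", 9.55, 9.6), Word("segmenter", 9.6, 10.4), Word("nå", 10.5, 10.8)]
merged = merge_overlap(seg1, seg2, overlap_start=8.0, overlap_end=10.0)
print(" ".join(w.text for w in merged))
```

The point of stitching at an agreed run is that words cut off at a segment boundary ("seg" at the end of the first segment, "ale" at the start of the second in the made-up data) are dropped in favour of the transcript that heard them in full. A distance-based comparison could replace the exact word match if the two transcripts rarely agree verbatim.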

azure-cognitive-services, azure-speech

1 Answer

romungi-MSFT answered, romungi-MSFT edited

@HalvorHellandBarndonVisitoAS-0848 Have you tried the batch transcription API for long audio, on a mono channel with diarization enabled and word-level timestamps set to true? This feature of the speech-to-text API should let you submit the full audio track of your video and get the transcription back. Note that diarization has a limitation: with the REST API, only two voices can be distinguished.
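As a rough illustration of those settings, the sketch below only builds the JSON body for a batch transcription request; it does not send anything. The property names (`diarizationEnabled`, `wordLevelTimestampsEnabled`) and the `nb-NO` locale follow the v3.0 batch transcription REST API as I understand it, so please check them against the current reference before relying on them. The function name `build_transcription_request` and the audio URL are made up for the example.

```python
import json


def build_transcription_request(content_url, locale="nb-NO"):
    """Build a request body for a batch transcription job (assumed v3.0 shape)."""
    return {
        "contentUrls": [content_url],   # URL the service can read the mono audio from
        "locale": locale,               # nb-NO = Norwegian Bokmål
        "displayName": "Video subtitle transcription",
        "properties": {
            "diarizationEnabled": True,          # speaker separation (two voices)
            "wordLevelTimestampsEnabled": True,  # per-word offsets for subtitling
        },
    }


# Placeholder URL, for illustration only.
body = build_transcription_request("https://example.com/audio/video-track-mono.wav")
print(json.dumps(body, indent=2))

# Assumption: the body is POSTed to
#   https://<region>.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions
# with the Speech resource key in the Ocp-Apim-Subscription-Key header.
```

With word-level timestamps in the result, subtitle cues can be cut from a single transcript, so no overlap between segments has to be reconciled.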

The short audio API with timestamps enabled is a good fit for short audio files, where there is no need to cross-reference text from different transcripts.

Another tool that works well for getting transcripts directly from video files is Azure Video Indexer, but it supports a limited set of languages and Norwegian is currently not listed among them.

Azure Media Services also offers built-in presets for analyzing audio through its API, but these presets currently do not support Norwegian either.



