Hello!
We are using microsoft.cognitiveservices.speech for transcription and subtitling of video, with a focus on Norwegian. We are currently testing the efficiency and accuracy of transcribing the audio in segments. Our tests and reviews so far suggest that transcription quality can be improved by letting segments that are processed in parallel overlap, and then post-processing the overlapping parts.
Are there any documented experiences with this kind of overlap, or any recommended tools for it? We expect the quality gain to come from cross-referencing the overlapped regions, using either a vector distance between the two overlapping transcripts or a reverse search for relevant words in them, together with the transcripts of the preceding and following segments.
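To make the question concrete, here is a minimal sketch of the kind of overlap post-processing we have in mind. It is only an illustration under our own assumptions, not part of the Speech SDK: the function name `merge_overlapping` and the `max_overlap_words` parameter are hypothetical, and plain word matching from Python's standard `difflib` stands in for the vector-distance or reverse-search comparison described above. It assumes each segment has already been transcribed to plain text.

```python
from difflib import SequenceMatcher


def merge_overlapping(prev_text: str, next_text: str, max_overlap_words: int = 30) -> str:
    """Stitch two transcripts whose audio segments overlap (hypothetical helper)."""
    prev_words = prev_text.split()
    next_words = next_text.split()

    # Only the end of the previous transcript and the start of the next one
    # can belong to the shared audio, so restrict the comparison to those.
    tail = prev_words[-max_overlap_words:]
    head = next_words[:max_overlap_words]

    # Find the longest run of identical words (case-insensitive) in the overlap.
    matcher = SequenceMatcher(
        a=[w.lower() for w in tail],
        b=[w.lower() for w in head],
        autojunk=False,
    )
    match = matcher.find_longest_match(0, len(tail), 0, len(head))

    if match.size == 0:
        # No common words found: fall back to plain concatenation.
        return " ".join(prev_words + next_words)

    # Keep the previous transcript up to the end of the matched run, then
    # continue with the next transcript right after the same run. Words cut
    # off at a segment boundary fall outside the run and are dropped.
    tail_start = len(prev_words) - len(tail)
    cut_prev = tail_start + match.a + match.size
    cut_next = match.b + match.size
    return " ".join(prev_words[:cut_prev] + next_words[cut_next:])


if __name__ == "__main__":
    a = "vi tester transkribering av video med overlapp mell"
    b = "deo med overlapp mellom segmentene"
    print(merge_overlapping(a, b))
```

In this sketch the truncated words at the segment edges ("mell", "deo") are discarded because they lie outside the longest matching run, which is the effect we hope to get from the overlap. A real solution would probably also need word timestamps and some tolerance for words that are recognized differently in the two segments.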