question

GunnarSylthe-1468 asked RuneSkaar edited

Speech to text - Norwegian: Capitalization

We're using Microsoft.CognitiveServices.Speech for transcription/subtitling of video clips, mostly Norwegian material. We have noticed that the spelling of proper nouns, for example, is impressively correct, but that capitalization has been missing. As of this weekend, however, something seems to have changed: now there is suddenly TOO MUCH capitalization going on. For example, all occurrences of the word "nok" are written in all caps, which makes it look like the abbreviation for the Norwegian currency (NOK). The same thing happens for certain other words, like "FRA" and "ET". Also, seemingly random words in the middle of sentences are capitalized. Is this a bug MS is aware of, so that we can expect a fix soon?

dotnet-csharp, azure-cognitive-services

Hi, are you using online transcription or batch transcription?

0 Votes 0 ·

Online, using Microsoft.CognitiveServices.Speech.SpeechRecognizer from C#.

0 Votes 0 ·

Okay, thanks, we're reviewing your feedback and will get back to you soon. Thanks.

1 Vote 1 ·

1 Answer

GiftA-MSFT answered RuneSkaar edited

A fix has been rolled out, and the issue should be resolved now. Let us know if it is not. Sorry for the inconvenience. Thanks.


Thanks for the swift response. It's perhaps a little better now. However, I'm sorry to say we're still seeing many of the same problems. A few concrete examples: Every occurrence of certain words is in all caps, such as JA, ET, FRA... Other words are consistently incorrectly capitalized, such as Jo, Hva, Kan, Når, Det, Med, Hun, Jeg, Men, Selv, Vi... The list goes on. So if you could have another go at fixing this, it would be greatly appreciated.

But let me also say that there has been a great improvement in capitalizing words that SHOULD be capitalized! Proper nouns such as Mallorca, Gran Canaria, Spania etc. are now capitalized correctly. Kudos where kudos is due! :-)

1 Vote 1 ·

Hi, we've made some changes. Can you confirm whether you observe improvements?

1 Vote 1 ·

Hi again! Yes, we noticed considerable improvements this morning. However, it seems to vary...? Could it be that you're still rolling out the changes, and that our results would depend on which server we're connecting to when starting a session...? Because with some sessions, the results are about the same as yesterday, whereas others give MUCH better results!

Thanks for being so responsive, we really appreciate it.

0 Votes 0 ·

Thanks for your feedback, we'll get back to you soon!

0 Votes 0 ·

There is definitely an improvement compared to how it was a few days ago, but there are still some short Norwegian words that are consistently written in capital letters no matter where they are in the sentence.

For example
- JA - yes
- FOR - for
- NOK - enough (while NOK in capital letters means Norwegian kroner)
- ET - a (ET spørsmål - a question)
- FRA - from
- OPP - up

We had not seen these errors until they suddenly appeared last week, but now they occur quite consistently.

0 Votes 0 ·

Thanks for your feedback. We rolled out changes today. Please feel free to share updates by tomorrow and let us know your observations.

0 Votes 0 ·

Good morning! Unfortunately, we're still seeing much of the same. The results may still depend on which server we get connected to (?), but so far today my trials have been disappointing, I'm sorry to say. I'll try to attach a screenshot of a short example from this morning, where I've marked incorrect capitalizations in red.

[Attached screenshot: 81053-2021-03-24-08-08-52.jpg]


0 Votes 0 ·

Even though the quality of the transcription has improved a bit, it is still a big problem that some Norwegian words (FOR, ET, NOK, FRA, OPP, ...) are consistently transcribed in capital letters.

This is actually such a big problem that we had to shut down a proof-of-concept solution that we have had up and running for a while, in which we offer automatically generated Norwegian subtitles to end users.

Therefore, I hope you can prioritize finding a solution to these problems.

0 Votes 0 ·

Hi all, thanks for your feedback, we are still investigating this issue. Our assumption is that there might be some delay in deployment to production. Will share updates as soon as possible. Thanks.

0 Votes 0 ·

Looking much, much better now, thank you! Still SOME incorrectly capitalized words, but a huge improvement. Thank you for being so responsive, it's appreciated!

0 Votes 0 ·

Can also confirm that it looks much better today. The annoying errors with capital letters in words like ET, FRA, UT, NOK, OPP, ... now seem to be completely gone. Thanks for the help :-)

0 Votes 0 ·

Glad to be of help!

0 Votes 0 ·

I thought I saw a comment from you asking for examples of words that are still incorrectly capitalized. Late answer due to Easter holidays, but as far as I can tell, there are now only a very few problematic words remaining. Predominantly "Skal" (shall/will); this word seems to always be transcribed with a capital S. Other words, such as "Det" (it), "Dette" (this) and "Etter" (after) are sometimes transcribed with a capital letter even in the middle of sentences, and sometimes correctly.

0 Votes 0 ·