Speech language detection for speech-to-text

Question

Hi,
We want to use your solution as a speech-to-text service.

Our use case is the following: we want to get the transcript of an audio file, but we do not know which language the audio is in.

I noticed that language detection has some limits:

Language identification currently has a limit of four languages for single-shot recognition, and 10 languages for continuous recognition.

Is the limitation about the number of candidate languages to search among, or about the number of different languages that can be detected in the audio?
As we do not know which language the audio is in, we may need to detect the language among all the ones you support.
Detecting a language among more than two candidates seems to take quite long. Is that expected?

Moreover, I tried detecting the language of an English audio file with German and English as candidates, and the result was German, which is quite odd. What could be the cause of this?

Do you have a sample audio file to test your feature with?

Thanks a lot for your valuable answers,
Laure Florent

Answer

Hello,

Sorry for the delay. The limitation means that you can specify at most 4 candidate languages for single-shot recognition. For the languages supported by language detection, please refer to the table below; around 30 languages are currently supported:

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#speech-to-text

For more sample code and sample input, please refer to: https://github.com/Azure-Samples/cognitive-services-speech-sdk
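
As a rough illustration of the limit, here is a minimal Python sketch, assuming the Speech SDK for Python (the azure-cognitiveservices-speech package) is installed. The helper names check_candidates and detect_language_once, and the key, region, and wav_path parameters, are hypothetical and only for illustration. The limits of 4 and 10 are the ones quoted in the question; please check the current documentation before relying on them.

```python
# Limits as quoted in the question; they may change between service versions.
SINGLE_SHOT_LIMIT = 4
CONTINUOUS_LIMIT = 10


def check_candidates(locales, continuous=False):
    """Return the de-duplicated candidate locales, or raise if over the limit.

    Hypothetical helper: the limit applies to how many candidate
    languages are passed to the service for one recognition.
    """
    limit = CONTINUOUS_LIMIT if continuous else SINGLE_SHOT_LIMIT
    unique = list(dict.fromkeys(locales))  # drop duplicates, keep order
    if not unique:
        raise ValueError("at least one candidate locale is required")
    if len(unique) > limit:
        raise ValueError(
            f"{len(unique)} candidate locales given, but the limit is {limit}"
        )
    return unique


def detect_language_once(key, region, wav_path, locales):
    """Single-shot recognition with language identification (sketch).

    key, region and wav_path are placeholders for your own resource
    key, resource region and audio file.
    """
    # Imported here so that check_candidates works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    candidates = check_candidates(locales)
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=candidates
    )
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect,
        audio_config=audio_config,
    )
    result = recognizer.recognize_once()
    detected = speechsdk.AutoDetectSourceLanguageResult(result).language
    return detected, result.text
```

For example, detect_language_once(key, region, "sample.wav", ["en-US", "de-DE"]) would return the detected locale together with the transcript, while passing five or more distinct locales raises a ValueError before any request is sent.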

Hope this helps.

Regards,
Yutong
