Can Azure Speech to Text support more audio file formats, like OGG OPUS or MP3?

Question

Does Azure Speech to Text only support WAV files?

I have files in OGG OPUS format from WhatsApp, but I cannot use this Azure service to convert that speech audio into text.
I had to use another cloud for this.

Can Azure Speech to Text accept OGG OPUS or MP3?

I tried NAudio to read or convert the OGG OPUS files to WAV, but it does not work. Converting would also increase the file size.
https://github.com/naudio/Vorbis/issues/9

On the other cloud, I just sent the file and got the text back. Quick and easy. But I am an Azure fan and would like to have this on Azure.

Answer

@Tony Thanks for the question. The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Besides WAV / PCM, the Speech SDK also accepts compressed input formats, including MP3 and OGG/OPUS, by decoding them through GStreamer, which must be installed on the machine running the SDK.
Here is the doc for supported input formats and samples.
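
To illustrate the compressed-input path described above, here is a minimal sketch using the Python Speech SDK (`azure-cognitiveservices-speech`). It assumes GStreamer is installed and that you supply your own key and region. The helper names `container_format_name` and `transcribe_compressed` and the extension mapping are illustrative, not part of the SDK. Note that `recognize_once` returns only the first utterance, so this suits short voice notes rather than long recordings.

```python
from pathlib import Path

# Illustrative mapping: file extension -> member name on
# speechsdk.AudioStreamContainerFormat (assumed extensions, adjust as needed).
CONTAINER_FORMATS = {
    ".ogg": "OGG_OPUS",
    ".opus": "OGG_OPUS",
    ".mp3": "MP3",
    ".flac": "FLAC",
}


def container_format_name(path):
    """Return the container format name for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTAINER_FORMATS:
        raise ValueError(f"no compressed format mapped for {suffix!r}")
    return CONTAINER_FORMATS[suffix]


def transcribe_compressed(path, key, region):
    """Send one compressed audio file to Speech to Text; return the first utterance."""
    # Imported lazily so the helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    container = getattr(
        speechsdk.AudioStreamContainerFormat, container_format_name(path)
    )
    stream_format = speechsdk.audio.AudioStreamFormat(
        compressed_stream_format=container
    )
    stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)

    # Push the raw compressed bytes; the SDK decodes them through GStreamer.
    stream.write(Path(path).read_bytes())
    stream.close()  # signals end of audio

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return ""  # no match or canceled; inspect result.reason for details
```

A call would look like `transcribe_compressed("voice-note.ogg", "<your-key>", "<your-region>")`, with no intermediate conversion to WAV.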

The Python script below converts audio files of any size to text:
https://github.com/caiomsouza/Microsoft-Cognitive-Services/blob/master/speech-to-text/speech-to-text-all-files_large_files.py
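
For long recordings, the usual approach with the Speech SDK is continuous recognition instead of a single `recognize_once` call. The sketch below is not taken from the linked script; it is a minimal illustration that assumes you have already built a `speechsdk.SpeechRecognizer` for your file. The function name `collect_transcript` is hypothetical. It relies only on the recognizer's `recognized`, `session_stopped`, and `canceled` event signals and the `start_continuous_recognition` / `stop_continuous_recognition` methods.

```python
import threading


def collect_transcript(recognizer, timeout=None):
    """Run continuous recognition until the session ends; return the joined text."""
    pieces = []
    finished = threading.Event()

    def on_recognized(evt):
        # Each event carries one finalized phrase; skip empty results.
        if evt.result.text:
            pieces.append(evt.result.text)

    def on_stop(evt):
        # Fired when the audio is exhausted or recognition is canceled.
        finished.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    finished.wait(timeout)  # block until the session stops (or the timeout passes)
    recognizer.stop_continuous_recognition()
    return " ".join(pieces)
```

Because the phrases arrive as events on a background thread, the function blocks on a `threading.Event` rather than polling, and returns whatever was recognized if the optional timeout expires first.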
