tonyhenrique asked

Can Azure Speech to Text support more audio file formats, such as OGG Opus and MP3?

Does Azure Speech to Text only support WAV files?

I have files in OGG Opus format from WhatsApp, but I cannot use this Azure service to convert that speech audio into text.
I had to use another cloud provider for this.

Can Azure Speech to Text accept OGG Opus or MP3?

I tried NAudio to read the OGG Opus files or convert them to WAV, but it does not work. Converting would also increase the file size.
https://github.com/naudio/Vorbis/issues/9

On the other cloud, I just sent the file and got the text back. Quick and easy. But I am an Azure fan and would like to have this on Azure.

azure-cognitive-services azure-speech

1 Answer

ramr-msft answered

@tonyhenrique Thanks for the question. The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Besides WAV/PCM, the compressed input formats listed in the documentation (which include MP3 and OGG/Opus) are also supported through GStreamer.
See the documentation on supported input formats and the accompanying samples.
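
As a rough illustration of the compressed-input route, here is a minimal Python sketch using the Speech SDK (azure-cognitiveservices-speech). It assumes GStreamer is installed and visible to the SDK. The helper names container_name_for and transcribe_compressed, the extension mapping, and the key and region arguments are made up for this example. It pushes the whole file into the stream at once and recognizes a single utterance, so treat it as a starting point and not a complete solution.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the member name of
# speechsdk.AudioStreamContainerFormat (an assumption for this sketch).
CONTAINER_BY_EXTENSION = {
    ".ogg": "OGG_OPUS",
    ".opus": "OGG_OPUS",
    ".mp3": "MP3",
}


def container_name_for(path):
    """Return the SDK container-format member name for a file, by extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTAINER_BY_EXTENSION:
        raise ValueError(f"No compressed container format mapped for {suffix!r}")
    return CONTAINER_BY_EXTENSION[suffix]


def transcribe_compressed(path, key, region):
    """Send a compressed audio file to Speech to Text; return the first utterance."""
    # Imported lazily so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    container = getattr(speechsdk.AudioStreamContainerFormat, container_name_for(path))
    stream_format = speechsdk.audio.AudioStreamFormat(compressed_stream_format=container)
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Push the raw compressed bytes; decoding is delegated to GStreamer by the SDK.
    with open(path, "rb") as audio_file:
        push_stream.write(audio_file.read())
    push_stream.close()

    # recognize_once() stops after the first utterance; longer recordings
    # would need continuous recognition instead.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return ""
```

A call would look like transcribe_compressed("voice-note.ogg", "<your-key>", "<your-region>"), where the file name, key, and region are placeholders.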

The Python script below transcribes audio files of any size:
https://github.com/caiomsouza/Microsoft-Cognitive-Services/blob/master/speech-to-text/speech-to-text-all-files_large_files.py
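
If GStreamer is not an option, another route is to convert the file to the default WAV format mentioned above (16 kHz, 16-bit, mono PCM) before sending it. The sketch below is one possible way to do that by calling the ffmpeg command-line tool from Python. It assumes ffmpeg is installed and on the PATH; the function names build_ffmpeg_command and convert_to_wav are invented for this example and are not part of any Azure library. As noted in the question, the resulting WAV file will be larger than the compressed original.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts src to 16-bit mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file (for example OGG Opus or MP3)
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to 16 kHz (or 8 kHz if requested)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM samples
        str(dst),
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

For example, convert_to_wav("voice-note.ogg", "voice-note.wav") would write a WAV file that can then be passed to the service like any other WAV input.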

