Can Azure Speech to Text support more audio file formats, such as OGG OPUS and MP3?

Tony 1 Reputation point
2021-03-06T16:19:09.86+00:00

Does Azure Speech to Text only support WAV files?

I have files in OGG OPUS format from WhatsApp, but I cannot use this Azure service to convert that speech audio into text.
I had to use another cloud provider for this.

Can Azure Speech to Text accept OGG OPUS or MP3?

I tried NAudio to read or convert the OGG OPUS files to WAV, but it does not work. Converting would also increase the file size.
https://github.com/naudio/Vorbis/issues/9

On the other cloud, I just sent the file and got the text back. Quick and easy. But I am an Azure fan and would like to have this on Azure.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
1,410 questions
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
2,409 questions

1 answer

  1. Ramr-msft 17,616 Reputation points
    2021-03-08T10:19:16.97+00:00

    @Tony Thanks for the question. The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Beyond WAV/PCM, compressed input formats such as MP3 and OGG/OPUS are also supported through GStreamer, which must be installed alongside the Speech SDK.
    Here is the doc for supported input formats and samples.
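
As a rough illustration of the compressed-input path described above, here is a minimal Python sketch. It assumes the `azure-cognitiveservices-speech` package and GStreamer are installed. The extension mapping, the helper names `container_format_name` and `transcribe_compressed`, and the `key`/`region` parameters are hypothetical and chosen only for this example. It recognizes a single utterance, so longer recordings would likely need continuous recognition instead.

```python
from pathlib import Path

# Extension -> member name of speechsdk.AudioStreamContainerFormat.
# This mapping is an illustrative assumption, not part of the SDK.
CONTAINER_BY_EXTENSION = {
    ".ogg": "OGG_OPUS",
    ".opus": "OGG_OPUS",
    ".mp3": "MP3",
    ".flac": "FLAC",
}


def container_format_name(path: str) -> str:
    """Return the container format name for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTAINER_BY_EXTENSION:
        raise ValueError(f"No compressed format mapped for {suffix!r}")
    return CONTAINER_BY_EXTENSION[suffix]


def transcribe_compressed(path: str, key: str, region: str) -> str:
    """Recognize a single utterance from a compressed audio file."""
    # Imported here so the helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    container = getattr(
        speechsdk.AudioStreamContainerFormat, container_format_name(path)
    )
    stream_format = speechsdk.audio.AudioStreamFormat(
        compressed_stream_format=container
    )
    stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Push the raw compressed bytes; the SDK decodes them through GStreamer.
    stream.write(Path(path).read_bytes())
    stream.close()

    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    raise RuntimeError(f"Recognition did not succeed: {result.reason}")
```

A call might look like `transcribe_compressed("voice_note.ogg", key="<your-key>", region="<your-region>")`, where the file name and credentials are placeholders.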

    The Python script below converts audio files of any size to text:
    https://github.com/caiomsouza/Microsoft-Cognitive-Services/blob/master/speech-to-text/speech-to-text-all-files_large_files.py
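
If a file has already been converted to WAV, it can be useful to confirm that it matches the default format mentioned above (16 kHz or 8 kHz, 16-bit, mono PCM) before sending it. The sketch below uses only the Python standard library. The function name `matches_default_format` is a hypothetical helper for this example and is not part of any Azure SDK.

```python
import wave


def matches_default_format(path: str) -> bool:
    """Check a WAV file for 8 or 16 kHz, 16-bit, mono, uncompressed PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1              # mono
            and wav.getsampwidth() == 2          # 2 bytes = 16-bit samples
            and wav.getframerate() in (8000, 16000)
            and wav.getcomptype() == "NONE"      # uncompressed PCM
        )
```

Files that fail this check could either be resampled first or sent through the compressed-input path if their format is supported.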
